DataNath/notebooklm_source_automation

NotebookLM source automation

Produced by Nathan Purvis - Databasyx co-founder | Data Engineer @ The Information Lab

Contact

GitHub | LinkedIn | Twitter | Alteryx Community
Email: Nathan@databasyx.com

The problem

Google's NotebookLM is a tool - powered by Google Gemini - that allows us to quickly generate resources like study guides, briefing documents and audio overviews. To create these assets, we need to make a new notebook and provide sources that the model can then pull from. These sources can be:

  • File uploads (PDF, .txt, markdown & audio, e.g. mp3)
  • Google Drive: Docs or Slides
  • Links: Website or YouTube
  • Paste text: Manually paste in text like meeting notes

However, as pointed out by colleagues and in various Reddit posts, the process for adding link-based sources is incredibly cumbersome; users need to continuously:

  • Press 'Add source'
  • Select 'Website' or 'YouTube'
  • Paste the source URL
  • Hit enter/press 'Insert'

This might be fine for a handful of sources but, given you can create notebooks of up to 300 sources, this is less than ideal when scaled.

The solution

Given we have a repeated pattern of behaviour in terms of how sources are added, this process is a perfect candidate for browser automation, and that's exactly what is used here. Using Playwright - a library created specifically for end-to-end testing and general browser automation tooling - we can easily loop through the steps outlined above to create a new notebook populated with your desired sources.
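As a rough illustration of that loop, here is a minimal sketch rather than the project's actual code. It assumes `page` behaves like a Playwright sync `Page` that already has a notebook open. The function name `add_link_sources` is hypothetical, and every label used to locate an element ('Add source', the source type text, the first textbox) is an assumption about NotebookLM's interface that may not match the live page.

```python
def add_link_sources(page, urls, source_type="Website"):
    """Repeat the four manual steps once per URL on an open notebook page.

    `page` is expected to behave like a Playwright sync `Page`.
    All element labels below are assumptions about NotebookLM's UI.
    """
    added = 0
    for url in urls:
        # 1. Press 'Add source'
        page.get_by_role("button", name="Add source").click()
        # 2. Select 'Website' or 'YouTube'
        page.get_by_text(source_type, exact=True).click()
        # 3. Paste the source URL into the first text box (assumed target)
        page.get_by_role("textbox").first.fill(url)
        # 4. Hit enter to insert the source
        page.keyboard.press("Enter")
        added += 1
    return added
```

Passing the page in, instead of creating it inside the function, keeps the loop separate from browser start-up and login handling.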

How do I use this?

Follow the steps below to use this yourself!

1. Clone this repository

git clone https://github.com/DataNath/notebooklm_source_automation.git

If you run this from Documents, for example, it will create a new notebooklm_source_automation subdirectory there containing the project's contents.

2. Move into the new directory

cd notebooklm_source_automation

3. Create a virtual environment (optional)

python -m venv .venv

This step isn't strictly necessary, but it is good practice: a virtual environment isolates the project's packages from the rest of your Python installation.

4. Activate your virtual environment

For Windows users:

.venv\scripts\activate

For Mac/Linux users:

source .venv/bin/activate

Skip this step if you didn't create a virtual environment in step 3.

5. Install required packages

pip install -r requirements.txt

This will install Playwright and its transitive dependencies.

6. Install Chromium browser

playwright install chromium

This installs the Chromium browser that this project runs on.

7. Provide your source links

The project is set up to read a list of up to 300 (NotebookLM's limit) link-based sources from the relevant file within /sources. By cloning this repository these will already exist as empty files (other than a header) for you to populate.

Warning

If you create your own file(s) and overwrite the existing ones, make sure the schema is identical, i.e. a single field with a header on the first row and a maximum of 300 source rows starting on the second row.
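To make the schema concrete, the sketch below reads such a file and enforces the 300-row limit. It is an illustration under assumptions, not the project's loader: it assumes the files are CSV-style text with one URL per row, and the `load_sources` name and `MAX_SOURCES` constant are hypothetical.

```python
import csv
from pathlib import Path

MAX_SOURCES = 300  # NotebookLM's per-notebook limit, as stated above


def load_sources(path):
    """Return the URLs from a one-column file, skipping the header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    # Row 1 is the header; sources start on the second row.
    # Blank rows are ignored so stray empty lines don't count as sources.
    urls = [row[0].strip() for row in rows[1:] if row and row[0].strip()]
    if len(urls) > MAX_SOURCES:
        raise ValueError(
            f"{path} has {len(urls)} sources; the limit is {MAX_SOURCES}"
        )
    return urls
```

Failing early on an oversized file avoids discovering the limit halfway through a browser run.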

8. Set your Google login state

python set_login_state.py

A browser will launch and prompt you to log in to Google. Once complete, hit ENTER - the script will terminate and you should see a state.json file appear in your directory. This is used to persist authentication and browser session data, saving you from having to log in before every run. Don't worry, this is already in .gitignore!
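For readers curious how this kind of persistence can work, the sketch below uses Playwright's `storage_state` feature. It is not a copy of set_login_state.py: the function names and the login URL are assumptions made for illustration, and only the state.json file name comes from the step above.

```python
from pathlib import Path

STATE_PATH = Path("state.json")  # file name taken from the step above


def save_login_state(state_path=STATE_PATH):
    """Open a visible browser, wait for a manual login, then save the session."""
    # Imported here so the helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://accounts.google.com")  # assumed login URL
        input("Log in to Google in the browser, then press ENTER here...")
        # Writes cookies and local storage for this context to disk.
        context.storage_state(path=str(state_path))
        browser.close()


def has_saved_state(state_path=STATE_PATH):
    """True when a previously saved session file can be reused."""
    return Path(state_path).is_file()
```

A later run can then start already authenticated by passing the file back in, e.g. `browser.new_context(storage_state="state.json")`.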

Note

I haven't tested/checked exactly how long the login state persists but, for context, I only had to re-run the login script once whilst developing the initial release.

9. Run!

python main.py

This will prompt you to provide two things in the terminal:

  • A source type (currently only 'Website' or 'YouTube')
  • A name for the new notebook
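The two prompts could be handled along the lines of the sketch below. This is an assumption about one reasonable approach, not the code in main.py: the function names, prompt wording and case-insensitive matching are all hypothetical.

```python
SOURCE_TYPES = ("Website", "YouTube")  # the two types currently supported


def parse_source_type(raw):
    """Map free-typed input to one of the supported source types."""
    cleaned = raw.strip().lower()
    for option in SOURCE_TYPES:
        if cleaned == option.lower():
            return option
    raise ValueError(f"Source type must be one of {SOURCE_TYPES}, got {raw!r}")


def prompt_for_run_settings(ask=input):
    """Ask for the source type and notebook name, re-asking on bad input."""
    while True:
        try:
            source_type = parse_source_type(ask("Source type (Website/YouTube): "))
            break
        except ValueError as error:
            print(error)
    notebook_name = ask("Notebook name: ").strip()
    return source_type, notebook_name
```

Taking `ask` as a parameter means the prompts can be exercised without a terminal, for example by passing a function that returns canned answers.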

Feedback and/or issues

Please feel free to leave any feedback or suggestions for improvement. If you spot any issues, let me know and I'll endeavour to address them as soon as I can!

About

Automating browser interactions with Playwright to systematically add website sources to Google's NotebookLM.
