Crawl a codebase and emit markdown document(s) for uploading to NotebookLM as sources.
One-time setup if you don't have pipx yet:
brew install pipx
pipx ensurepath
# open a new terminal after this

Public repo:
pipx install git+https://github.com/bsreeram08/codebase2nlm.git

Specific branch, tag, or commit:
pipx install git+https://github.com/bsreeram08/codebase2nlm.git@main
pipx install git+https://github.com/bsreeram08/codebase2nlm.git@v0.1.0
pipx install git+https://github.com/bsreeram08/codebase2nlm.git@<commit-sha>

Private repo (uses your SSH key):
pipx install git+ssh://git@github.com/bsreeram08/codebase2nlm.git

Update to the latest version on the default branch:
pipx upgrade codebase2nlm
# or force a clean reinstall
pipx install --force git+https://github.com/bsreeram08/codebase2nlm.git

Install from a local clone:
git clone https://github.com/bsreeram08/codebase2nlm.git
cd codebase2nlm
pipx install .

Uninstall:
pipx uninstall codebase2nlm

Usage:
# crawl the current directory
codebase2nlm
# crawl a specific project
codebase2nlm ~/projects/myapp
# custom output location
codebase2nlm ~/projects/myapp -o ~/Desktop/myapp-notebooklm
# tweak the word limit per output file
codebase2nlm ~/projects/myapp --max-words 400000
# tweak line and size limits too (NotebookLM-friendly defaults are used automatically)
codebase2nlm ~/projects/myapp --max-lines 95000 --max-mb 190
# ignore .gitignore (still respects .crawlignore and built-in skips)
codebase2nlm ~/projects/myapp --no-gitignore

Output lands in <PATH>/notebooklm_output/ by default. Upload the resulting
codebase.md (or each codebase_partNN.md if the codebase was too large for a
single source) to NotebookLM.
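
Before uploading, it can be handy to sanity-check the generated files against the limits this README mentions. The following is a minimal sketch, not part of codebase2nlm itself: the helper names `summarize_outputs` and `find_problems` are hypothetical, words are counted by splitting on whitespace (NotebookLM may count differently), and the limit constants are simply the figures quoted in this document.

```python
from pathlib import Path

# Figures quoted in this README; assumptions, not official NotebookLM values.
WORD_LIMIT = 500_000
SIZE_LIMIT_MB = 200
MAX_SOURCES = 50


def summarize_outputs(out_dir):
    """Return one (name, words, lines, megabytes) row per generated markdown file."""
    rows = []
    for path in sorted(Path(out_dir).glob("codebase*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        words = len(text.split())          # whitespace-separated tokens
        lines = len(text.splitlines())
        megabytes = path.stat().st_size / (1024 * 1024)
        rows.append((path.name, words, lines, megabytes))
    return rows


def find_problems(rows):
    """Return human-readable warnings for rows that reach a limit."""
    problems = []
    if len(rows) > MAX_SOURCES:
        problems.append(f"{len(rows)} files exceed the {MAX_SOURCES}-source limit")
    for name, words, _lines, megabytes in rows:
        if words >= WORD_LIMIT:
            problems.append(f"{name}: {words} words")
        if megabytes >= SIZE_LIMIT_MB:
            problems.append(f"{name}: {megabytes:.1f} MB")
    return problems


if __name__ == "__main__":
    # Assumes the default output folder under the current directory.
    rows = summarize_outputs("notebooklm_output")
    for name, words, lines, megabytes in rows:
        print(f"{name}: {words} words, {lines} lines, {megabytes:.2f} MB")
    for problem in find_problems(rows):
        print("WARNING:", problem)
```

Run it from the crawled project's directory, or pass a different folder to `summarize_outputs` if you used -o.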

- Honors .gitignore and .crawlignore at the repo root (gitignore syntax).
- Skips common noise: .git, node_modules, __pycache__, .venv, lockfiles, etc.
- Lists binary files in the tree tagged (binary — contents omitted), skips their contents.
- Produces a full ASCII file tree at the top, followed by every text file's contents in labeled code fences.
- Auto-splits into codebase_part01.md, codebase_part02.md, ... when any NotebookLM per-source limit would be exceeded:
  - word count (default target: 450k to stay below 500k),
  - line count (default target: 95k lines),
  - upload size (default target: 190MB to stay below 200MB).
- Oversized files are automatically chunked into labeled sections like (chunk 1 of N) instead of being left as a single over-limit section.
- Warns when output creates more than 50 parts (NotebookLM notebook source count limit).
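
The splitting and chunking behavior described in the list above can be pictured with a small greedy packer. This is an illustrative sketch under stated assumptions, not the tool's actual implementation: `chunk_section` and `pack_parts` are hypothetical names, and only a line budget is tracked here, whereas the real tool also budgets words and upload size.

```python
def chunk_section(label, lines, max_lines):
    """Split one oversized section into labeled pieces of at most max_lines lines."""
    total = -(-len(lines) // max_lines)  # ceiling division
    chunks = []
    for i in range(total):
        piece = lines[i * max_lines:(i + 1) * max_lines]
        chunks.append((f"{label} (chunk {i + 1} of {total})", piece))
    return chunks


def pack_parts(sections, max_lines):
    """Greedily group (label, lines) sections into parts of at most max_lines lines.

    Returns a list of parts, each part being the list of section labels it holds.
    """
    parts, current, used = [], [], 0
    for label, lines in sections:
        if len(lines) <= max_lines:
            pieces = [(label, lines)]
        else:
            # A single file that cannot fit in any part is chunked first.
            pieces = chunk_section(label, lines, max_lines)
        for piece_label, piece in pieces:
            # Start a new part when this piece would overflow the current one.
            if current and used + len(piece) > max_lines:
                parts.append(current)
                current, used = [], 0
            current.append(piece_label)
            used += len(piece)
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    demo = [
        ("a.py", ["x"] * 3),
        ("b.py", ["x"] * 3),
        ("big.py", ["x"] * 10),
    ]
    for number, part in enumerate(pack_parts(demo, max_lines=4), start=1):
        print(f"codebase_part{number:02d}.md:", part)
```

With a 4-line budget, the demo puts a.py and b.py in separate parts and splits big.py into three labeled chunks, one per part. A greedy pass keeps files in their original order, which is usually what a reader of the tree at the top of the output expects.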