NotebookLM Wiki Pipeline

Turn large Google Drive PDFs into Obsidian wiki notes without loading the full PDF text into the Claude or Codex context window. NotebookLM reads the source, and the agent receives only the structured answer it needs to create a reusable note.

vNext update: reuse one NotebookLM notebook per topic, and scope each new note-generation query to the newly attached or selected PDF source.

That is the main product value:

Topic notebook = reusable knowledge container
MCP query with source_ids = target-PDF-only extraction

/pdf-to-wiki https://drive.google.com/file/d/YOUR_FILE_ID "K-IFRS 1109 Financial Instruments" --topic audit-accounting

The screenshot above is an actual NotebookLM notebook screen from a live MCP test using original public-safe demo PDFs. The topic notebook contains three related infrastructure sources: clean energy grid planning, urban water resilience, and public transit operations. For note generation, the MCP call used source_ids=[target_source_id], and NotebookLM returned sources_used with only the target clean-energy source.

Why This Matters

The old safe pattern was:

1 PDF = 1 NotebookLM notebook

That avoids source contamination, but it does not scale well. Users who process many PDFs about the same topic end up with scattered one-off notebooks.

The vNext pattern is:

1 topic = 1 reusable NotebookLM notebook
1 new wiki note = query only 1 target source inside that notebook

For example, a public-infrastructure notebook can contain:

clean_energy_grid_report.pdf
water_resilience_brief.pdf
public_transit_operations_note.pdf

When you want a wiki note for only the clean-energy grid PDF, call:

notebook_query(
    notebook_id="public-infrastructure-topic-notebook",
    source_ids=["target:clean-energy-grid-report"],
    query="Summarize insights using only the Clean Energy Grid Planning Report."
)

The notebook remains reusable for future topic-level questions, but the note extraction remains grounded in one selected PDF.

User Benefits

Reuse topic notebooks instead of creating one notebook per PDF.
Keep related PDFs together for later cross-document questions.
Generate a new wiki note from only the newly attached or selected source.
Reduce answer contamination by passing source_ids=[target_source_id].
Record target_source_id, sources_used, and query_scope in the completion report.
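
The last two benefits can be checked mechanically once a query returns. The following is a minimal sketch, assuming the agent has already pulled the list of source IDs out of the NotebookLM response; the function name and the scope_ok and unexpected_sources fields are hypothetical and not part of this repository.

```python
from typing import Dict, List


def build_scope_report(target_source_id: str, sources_used: List[str]) -> Dict[str, object]:
    """Summarize whether a query stayed inside the target source.

    Hypothetical helper: the field names scope_ok and unexpected_sources
    are illustrative additions, not fields defined by the pipeline.
    """
    # Any source other than the target counts as contamination.
    unexpected = [s for s in sources_used if s != target_source_id]
    return {
        "target_source_id": target_source_id,
        "sources_used": list(sources_used),
        "query_scope": "target_source_only",
        # The scope holds only if the target was used and nothing else was.
        "scope_ok": target_source_id in sources_used and not unexpected,
        "unexpected_sources": unexpected,
    }


if __name__ == "__main__":
    print(build_scope_report("src-energy", ["src-energy"]))
```

A report with scope_ok set to False would be a reasonable signal to stop before writing the note.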

Architecture

Google Drive PDF
  |
  | pass Drive URL or file ID only
  v
Topic registry
  |
  | choose topic notebook by --topic or routing keywords
  v
Reusable NotebookLM topic notebook
  |
  | notebook_query(source_ids=[target_source_id], query=...)
  v
NotebookLM answer grounded in the target PDF
  |
  | agent formats Markdown and wikilinks
  v
Obsidian wiki note
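
The first step of the diagram passes only a Drive URL or file ID. A minimal sketch of that normalization, assuming share links of the /file/d/<id> form used throughout this README plus the older ?id= query form; extract_drive_file_id is a hypothetical helper, not a function shipped in scripts/.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical helper patterns; Drive IDs are treated here as URL-safe tokens.
_FILE_PATH_RE = re.compile(r"/file/d/([A-Za-z0-9_-]+)")
_BARE_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def extract_drive_file_id(url_or_id: str) -> str:
    """Return the Drive file ID from a share URL, or the input if it is already a bare ID."""
    value = url_or_id.strip()
    if _BARE_ID_RE.match(value):
        return value
    parsed = urlparse(value)
    match = _FILE_PATH_RE.search(parsed.path)
    if match:
        return match.group(1)
    # Assumption: some share links carry the ID in an ?id= query parameter.
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"No Drive file ID found in: {url_or_id!r}")


if __name__ == "__main__":
    print(extract_drive_file_id("https://drive.google.com/file/d/YOUR_FILE_ID/view"))
```

Only the ID leaves this step, which keeps the PDF content itself out of the agent context.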

Installation

Connect Google Drive in Claude.

claude.ai -> Settings -> Integrations -> Google Drive -> Connect

Install notebooklm-mcp-cli.

uv tool install notebooklm-mcp-cli

Log in to NotebookLM.

nlm login

Register the MCP server with Claude Code.

nlm setup add claude-code

Install the slash command.

cp commands/pdf-to-wiki.md ~/.claude/commands/pdf-to-wiki.md

Configure the output directory in ~/.claude/commands/pdf-to-wiki.md.

OUTPUT_DIR=~/your-obsidian-vault/AI_Generated

Topic Registry

Copy the example registry and fill in your own NotebookLM notebook IDs.

cp config/notebooks.example.json config/notebooks.local.json

Example:

{
  "default_policy": "single_source_notebook",
  "default_extraction_mode": "source_scoped_topic_query",
  "topics": [
    {
      "id": "public-infrastructure",
      "label": "Public Infrastructure",
      "notebook_id": "NOTEBOOKLM_NOTEBOOK_ID_FOR_PUBLIC_INFRASTRUCTURE",
      "routing_keywords": ["clean energy", "water resilience", "public transit", "infrastructure"],
      "sources": []
    }
  ]
}
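
To illustrate how a registry of this shape could drive routing, here is a minimal sketch that prefers an explicit topic ID and otherwise counts routing_keywords matches in the document title. It is an assumption about one plausible approach, not the actual logic in scripts/notebook_registry.py; choose_topic is a hypothetical name.

```python
from typing import Optional


def choose_topic(registry: dict, title: str, topic: Optional[str] = None) -> Optional[dict]:
    """Pick a topic entry by explicit ID, else by keyword hits in the title.

    Hypothetical sketch; returns None when nothing matches.
    """
    topics = registry.get("topics", [])
    if topic is not None:
        # An explicit --topic value wins over keyword routing.
        for entry in topics:
            if entry["id"] == topic:
                return entry
        return None
    lowered = title.lower()
    best, best_hits = None, 0
    for entry in topics:
        hits = sum(1 for kw in entry.get("routing_keywords", []) if kw.lower() in lowered)
        if hits > best_hits:
            best, best_hits = entry, hits
    return best


if __name__ == "__main__":
    registry = {
        "topics": [
            {
                "id": "public-infrastructure",
                "notebook_id": "NOTEBOOKLM_NOTEBOOK_ID_FOR_PUBLIC_INFRASTRUCTURE",
                "routing_keywords": ["clean energy", "water resilience", "public transit"],
            }
        ]
    }
    match = choose_topic(registry, "Clean Energy Grid Planning Report")
    print(match["id"] if match else "no topic matched")
```

When no topic matches, a caller could fall back to whatever default_policy the registry declares.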

Check the routing decision locally:

python3 scripts/notebook_registry.py \
  "https://drive.google.com/file/d/YOUR_FILE_ID/view" \
  --title "Clean Energy Grid Planning Report" \
  --topic public-infrastructure \
  --registry config/notebooks.local.json

Expected decision shape:

{
  "topic_id": "public-infrastructure",
  "notebook_action": "reuse_topic_notebook",
  "extraction_mode": "source_scoped_topic_query",
  "extraction_notebook_action": "reuse_topic_notebook",
  "topic_notebook_action": "query_target_source_in_topic",
  "source_action": "add_source"
}

MCP Calls

Add the PDF to the topic notebook:

source_add(
  notebook_id="{topic_notebook_id}",
  source_type="drive",
  document_id="{drive_file_id}",
  doc_type="pdf",
  wait=True,
  wait_timeout=120.0
)

Then query only the target source:

notebook_query(
  notebook_id="{topic_notebook_id}",
  source_ids=["{target_source_id}"],
  query="{target-PDF-only prompt}"
)

The prompt should also state the scope:

primary_scope: target PDF only
source_scoped_query: query only this target source_id
topic_notebook_context: other PDFs in the same notebook may be used only for a separate comparison section
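
One way to make sure every query carries these scope lines is to build the prompt from a small template. The sketch below assumes the scope header is simply prepended to the extraction instructions; build_scoped_prompt and its parameters are hypothetical.

```python
def build_scoped_prompt(document_title: str, target_source_id: str, instructions: str) -> str:
    """Prepend the three scope statements to the extraction instructions.

    Hypothetical helper: the exact wording sent to NotebookLM is an assumption.
    """
    lines = [
        "primary_scope: target PDF only",
        f"source_scoped_query: query only this target source_id ({target_source_id})",
        "topic_notebook_context: other PDFs in the same notebook may be used "
        "only for a separate comparison section",
        "",
        f"Target document: {document_title}",
        instructions.strip(),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_scoped_prompt(
            "Clean Energy Grid Planning Report",
            "target-source-id",
            "Summarize insights using only this report.",
        )
    )
```

The resulting string would be passed as the query argument alongside source_ids, so the scope is stated in both the call parameters and the prompt text.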

Source Verification Gate

The live test found an important operational gap: Drive search may return the wrong PDF when filenames are generic. Before generating the final note, run a short source verification query:

Using only target_source_id, verify the document title, author, and topic.
If it is not the requested document, stop and report source_mismatch.

This prevents the pipeline from summarizing a wrong source that happened to match the search query.
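
The gate itself can be a small comparison between the requested title and the title NotebookLM reports for the target source. This is a minimal sketch under stated assumptions: the reported title has already been extracted from the verification answer, and a word-overlap threshold of 0.6 is an arbitrary illustrative choice. The function names and the "verified" status are hypothetical; only source_mismatch comes from the text above.

```python
import re


def _tokens(text: str) -> set:
    """Lowercase alphanumeric word set, used for a loose title comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_source(requested_title: str, reported_title: str, min_overlap: float = 0.6) -> str:
    """Return "verified" or "source_mismatch" based on shared title words.

    Hypothetical helper: the threshold and the "verified" label are assumptions.
    """
    wanted = _tokens(requested_title)
    found = _tokens(reported_title)
    if not wanted:
        # Nothing to compare against, so refuse to proceed.
        return "source_mismatch"
    overlap = len(wanted & found) / len(wanted)
    return "verified" if overlap >= min_overlap else "source_mismatch"


if __name__ == "__main__":
    print(verify_source("Clean Energy Grid Planning Report", "Urban Water Resilience Brief"))
```

A word-overlap check is deliberately crude; it only needs to catch the case where a generic filename pulled in an unrelated document.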

Output Note Metadata

Generated notes should preserve the routing and query scope:

source: notebooklm
drive_url: {drive_url}
drive_file_id: {drive_file_id}
notebook_id: {topic_notebook_id}
target_source_id: {target_source_id}
sources_used: [{target_source_id}]
notebook_policy: reuse_topic_notebook
extraction_mode: source_scoped_topic_query
query_scope: target_source_only
topic: {topic_id}
created: {YYYY-MM-DD}
tags: [ai-generated, pdf-analysis]
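
A minimal sketch of rendering these fields as note front matter follows. It assumes the metadata is written as a YAML block delimited by --- lines at the top of the note, which is how Obsidian reads note properties; render_front_matter is a hypothetical helper and does no YAML escaping.

```python
from datetime import date
from typing import List, Optional


def render_front_matter(
    drive_url: str,
    drive_file_id: str,
    notebook_id: str,
    target_source_id: str,
    sources_used: List[str],
    topic: str,
    created: Optional[date] = None,
) -> str:
    """Render the routing and query-scope metadata as a front matter block.

    Hypothetical helper: values are written as-is, without YAML quoting.
    """
    created = created or date.today()
    fields = [
        ("source", "notebooklm"),
        ("drive_url", drive_url),
        ("drive_file_id", drive_file_id),
        ("notebook_id", notebook_id),
        ("target_source_id", target_source_id),
        ("sources_used", "[" + ", ".join(sources_used) + "]"),
        ("notebook_policy", "reuse_topic_notebook"),
        ("extraction_mode", "source_scoped_topic_query"),
        ("query_scope", "target_source_only"),
        ("topic", topic),
        ("created", created.isoformat()),
        ("tags", "[ai-generated, pdf-analysis]"),
    ]
    body = "\n".join(f"{key}: {value}" for key, value in fields)
    return f"---\n{body}\n---\n"


if __name__ == "__main__":
    print(
        render_front_matter(
            "https://drive.google.com/file/d/YOUR_FILE_ID/view",
            "YOUR_FILE_ID",
            "topic-notebook-id",
            "target-source-id",
            ["target-source-id"],
            "public-infrastructure",
        )
    )
```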

Testing

Run deterministic routing tests:

python3 -m unittest discover -v

Run a CLI smoke test:

python3 scripts/notebook_registry.py \
  "https://drive.google.com/file/d/YOUR_FILE_ID/view" \
  --title "Clean Energy Grid Planning Report" \
  --topic public-infrastructure

Guardrails

Do not download full Drive PDF content into the agent context.
Do not paste extracted PDF text into prompts.
Use Drive only for metadata and source IDs.
Use NotebookLM source_add for PDF ingestion.
Use notebook_query(source_ids=[target_source_id]) for new note extraction.
Keep topic-level comparison questions separate from target-source note generation.

Project Structure

.
├── config/
│   └── notebooks.example.json
├── commands/
│   └── pdf-to-wiki.md
├── scripts/
│   └── notebook_registry.py
├── tests/
│   └── test_notebook_registry.py
├── docs/
│   ├── assets/
│   └── adr/
└── examples/
