Skip to content

ArthurYangX/nano-NotebookLM

Repository files navigation

nano-NotebookLM

A self-hosted, open-source study assistant — chat with citations, structured LaTeX notes, exam-prep with a self-evolving question bank, and an editable knowledge graph.

License: Apache 2.0 Python 3.10+ Tests Code style PRs welcome

English | 简体中文

Not affiliated with Google. NotebookLM is a trademark of Google LLC.

hero-web-1080.mp4

Overview

Upload course PDFs / PPTX / DOCX / Markdown → automatic knowledge graph

  • vector index → chat with page-accurate citations, structured LaTeX notes, practice quizzes, exam prep with a self-evolving question bank, and an editable mind map.

Bring your own model: DeepSeek · OpenAI · Anthropic Claude · Gemini · and 11+ more named cloud providers, or any local runner that speaks OpenAI's /v1/chat/completions (Ollama / vLLM / LM Studio / llama.cpp). See the Provider matrix for the full list.


Quick Start

# 1. clone + install (uv is ~10× faster than pip; either works)
git clone https://github.com/ArthurYangX/nano-NotebookLM && cd nano-NotebookLM
uv venv && source .venv/bin/activate && uv pip install -e ".[test]"
# or: python -m venv .venv && source .venv/bin/activate && pip install -e ".[test]"

# 2. configure — copy the template and fill AT LEAST ONE LLM key
#    (OPENAI_API_KEY, or ANTHROPIC_API_KEY, or LOCAL_LLM_BASE_URL+LOCAL_LLM_MODEL)
cp .env.example .env

# 3. run (Ctrl-C to stop; or use ./dev.sh below to manage as a background process)
python api/server.py

→ Open http://localhost:8000 in your browser

Then click the ⚙ Settings gear (top-right of the topbar) to manage LLM providers — add or swap a provider, switch the active default, or hit Test for a 5-second connectivity ping. All without restarting the server. The .env values you just filled in seed the first row; once Settings exists, the UI is the source of truth (writes go to artifacts/providers.json).


Prefer a managed lifecycle? ./dev.sh wraps the same commands:

./dev.sh install   # creates .venv + installs deps + seeds .env
./dev.sh up        # starts the server in background, waits for /api/health
./dev.sh status    # pid + /api/status snapshot
./dev.sh logs      # tail /tmp/nano-nlm.log
./dev.sh down      # stops the server

Install uv for ~10× faster setup: curl -LsSf https://astral.sh/uv/install.sh | sh — full docs at https://docs.astral.sh/uv/getting-started/installation/. Plain pip works identically if you'd rather skip it.


Docker (no Python install needed)

git clone https://github.com/ArthurYangX/nano-NotebookLM && cd nano-NotebookLM
cp .env.example .env   # fill at least one LLM key
docker compose up      # → http://localhost:8000

Course data, KGs, notes, and sessions persist in ./artifacts/ on the host. The default image is ~2 GB — most of which is torch (CPU build) + sentence-transformers for the local embedding option; PDF extraction uses pymupdf. To bake MinerU OCR into the image (heavy: ~5-7 GB total), build with --build-arg WITH_MINERU=1:

docker compose build --build-arg WITH_MINERU=1
docker compose up

Want to host a public demo? See huggingface_space/DEPLOY.md for a one-command push to HuggingFace Spaces (free CPU tier works).

Optional extras (only install what you need)

What Install Without it
MinerU — OCR / layout-aware PDF extractor for scanned slides uv pip install -e ".[mineru]" (heavy: ~3-5GB w/ models) Upload engine=mineru returns an error; engine=pymupdf (default) still works
tectonic — LaTeX → PDF compiler for Notes export brew install tectonic / apt install tectonic "Export PDF" button is hidden; in-browser KaTeX preview still works
LibreOffice (soffice) — .pptx → PDF sidecar for MinerU brew install --cask libreoffice / apt install libreoffice .pptx with engine=mineru silently falls back to python-pptx (no OCR)

GPU: MinerU auto-detects CUDA on Linux (1–2 s/page); Apple Silicon defaults to CPU (~12 s/page) — auto-detect never picks MPS because the pipeline backend hangs there. Override with MINERU_DEVICE_MODE=cuda|cpu|mps in .env. Escape hatch: if CUDA is detected but breaks at runtime (stale driver / OOM), set MINERU_DEVICE_MODE=cpu to force CPU.

Network requirement. The frontend loads React / KaTeX / d3-force / CodeMirror / IBM Plex fonts from public CDNs (jsdelivr, unpkg, esm.sh, Google Fonts) — first page-load needs internet. Fully air-gapped deployments will need to vendor those assets locally.

Looking for architecture and code map? See CLAUDE.md. Want to contribute? See CONTRIBUTING.md.


Why nano-NotebookLM

Capability nano-NotebookLM open-notebook Google NotebookLM ChatGPT (upload PDF)
Fully self-hosted, your data never leaves
Bring your own LLM (cloud or local) ✅ 20+ providers (any OpenAI-compatible endpoint) ✅ 18+ providers ❌ Gemini only ❌ OpenAI only
Page-accurate citations, click to jump ⚠️ basic refs ⚠️ inline quote only
Editable knowledge graph
LaTeX notes (KaTeX + tectonic PDF)
Exam prep with self-evolving question bank
Background upload pipeline, resumable ⚠️ async, no stages ⚠️ session-bound
Cross-course retrieval ⚠️ within-notebook only
Podcast / multi-speaker audio generation ✅ (2-speaker)
Audio / video / web-page ingestion 🔜 future (today: PDF/PPTX/DOCX/MD) ⚠️ varies
Cost at scale Local GPU / API Local GPU / API Free tier capped Subscription

Choosing between nano-NotebookLM and open-notebook: open-notebook is strong at multi-modal ingestion (audio / video / web) and podcast generation. nano-NotebookLM is strong at the deep-reading loop — page-accurate citations into a built-in PDF reader, an editable knowledge graph, LaTeX notes (KaTeX + tectonic PDF), and an exam-prep mode that grows a question bank around the topics you got wrong. Provider coverage is comparable — both are OpenAI-compatible, so any /v1/chat/completions endpoint plugs into either.


Features

  • Chat with page-accurate citations — RAG (BM25 + FAISS + RRF) + knowledge-graph retriever (concept-cosine seed + BFS hop expansion). Every answer links back to the source page in the built-in PDF reader.
  • LaTeX notes — per-source-file streaming generation with a global review pass. KaTeX in the browser; optional tectonic compile to PDF.
  • Practice quizzes + Exam Prep — generates questions, grades them, and auto-generates variants of the ones you got wrong so the bank grows in the directions you actually need.
  • Editable knowledge graph — d3-force layout with relation filters, double-click edit, shift-drag to connect, N to add child, Del to remove. Edits persist as an overlay so re-extraction never clobbers your work.
  • Reader — built-in PDF / PPTX preview, click any citation chip in a chat answer or note to jump to the exact page.
  • Background upload pipeline — close the tab and come back; the ingest job keeps running.

See it in action

LaTeX Notes
LaTeX Notes — per-file streaming generation, KaTeX preview, click any citation chip to jump to the source page.
Knowledge Graph
Knowledge Graph — d3-force layout with relation filters (part-of / depends-on / related / example-of). Edit overlay survives re-extraction.
Exam Prep
Exam Prep — topics weighted by mastery; wrong answers auto-spawn variant questions so the bank grows where you need it.
Chat with cited sources
Chat with cited sources — every answer ends with chips like ch3.pdf, Page 32/51 pointing at the exact source file and page; click to jump there. Bilingual sectioned format (课件覆盖 / 补充背景) makes it explicit which part came from your slides vs general background.

Provider matrix

Provider Type OPENAI_BASE_URL Suggested model
DeepSeek ★ (used during development) Cloud, compat https://api.deepseek.com/v1 deepseek-v4-pro
OpenAI Cloud, native https://api.openai.com/v1 gpt-4o-mini
Anthropic Claude Cloud, native (uses Anthropic SDK) claude-sonnet-4-5
Moonshot (Kimi) Cloud, compat https://api.moonshot.cn/v1 moonshot-v1-8k
Zhipu GLM Cloud, compat https://open.bigmodel.cn/api/paas/v4 glm-4-flash
MiniMax Cloud, compat https://api.minimax.chat/v1 abab6.5-chat
Groq Cloud, compat https://api.groq.com/openai/v1 llama-3.3-70b-versatile
Together Cloud, compat https://api.together.xyz/v1 meta-llama/Llama-3.3-70B-Instruct-Turbo
Gemini (OpenAI mode) Cloud, compat https://generativelanguage.googleapis.com/v1beta/openai/ gemini-2.0-flash
xAI (Grok) Cloud, compat https://api.x.ai/v1 grok-4
OpenRouter Cloud, compat https://openrouter.ai/api/v1 openai/gpt-4o-mini (any vendor/model id)
Perplexity Cloud, compat https://api.perplexity.ai (no /v1) sonar-pro
Mistral Cloud, compat https://api.mistral.ai/v1 mistral-large-latest
DashScope (Qwen) Cloud, compat https://dashscope-intl.aliyuncs.com/compatible-mode/v1 qwen-plus
Fireworks Cloud, compat https://api.fireworks.ai/inference/v1 accounts/fireworks/models/deepseek-v3p1
SiliconFlow Cloud, compat https://api.siliconflow.cn/v1 Qwen/Qwen2.5-72B-Instruct
Cerebras Cloud, compat https://api.cerebras.ai/v1 llama-3.3-70b
Ollama Local http://localhost:11434/v1 qwen3:14b (suggested as open source model)
vLLM Local http://localhost:8001/v1 (pick any free port; not 8000 — that's the app server) Qwen/Qwen2.5-7B-Instruct
LM Studio Local http://localhost:1234/v1 (model loaded in LM Studio)
llama.cpp server Local http://localhost:8080/v1 (GGUF model loaded)

You can mix freely — add a cloud OpenAI key for high-quality KG extraction, point chat at a local 7B for privacy, and the Settings UI swaps between them per-task without restart.

Editing providers from the UI (no restart)

.env is just the first-boot seed. On first start the server synthesises artifacts/providers.json from your env vars and from then on the Settings page is the source of truth: add a second OpenAI-compatible endpoint, swap a model, set the active default, or one-click Test → 5s ping — all without restarting. Keys can stay in .env (api_key_ref: env:VAR) or be stored inline if you prefer (api_key_ref: literal:sk-…; written 0o600, never echoed in any response).

Endpoint Purpose
GET /api/providers List configured providers (redacted)
PUT /api/providers/{id} Create or update
DELETE /api/providers/{id} Remove (refuses default / last row)
POST /api/providers/{id}/test 5-token ping with 5s timeout
POST /api/providers/default Switch active default

Embeddings

Three switchable presets, picked from the Settings UI (no restart, no destructive rebuild — each preset gets its own FAISS namespace under indices/faiss/<preset>/, so toggling is a path-route):

Preset Model Notes
local_mini paraphrase-multilingual-MiniLM-L12-v2 Offline, 50+ languages, CJK-friendly (default)
openai_large text-embedding-3-large Best cross-lingual quality, costs money
bge_m3 BAAI/bge-m3 Strong CJK + EN, runs locally (heavier)

The first switch to a never-used preset kicks off a one-shot background rebuild of every course's index; the banner in the topbar tracks progress. Switching back to an already-built preset is instant.

The env vars in .env.example (EMBEDDING_MODE / EMBEDDING_MODEL / EMBEDDING_API_*) only seed the first-run default — once the Settings preset is set, it wins.


Main API endpoints

Endpoint Purpose
POST /api/chat RAG + KG retrieval chat with citations and intent routing.
POST /api/agent/stream Multi-turn tool-calling agent (NDJSON stream).
POST /api/notes/full-course/stream Per-file LaTeX note generation with review pass; incremental cache.
POST /api/quiz Practice quiz generation.
POST /api/exam-prep/* Topic planning, question seeding, quiz draw, submit + auto-variant.
GET/POST /api/mindmap/{course_id} Knowledge graph read; student edit ops.
POST /api/upload/{course_id} Upload files; returns {task_id, course_id} immediately.
GET /api/upload/status/{task_id} Poll background ingest progress (resume on tab reopen).
GET /api/status Configured backends, embedding mode, version, latency p50.

Example:

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is a receptive field?", "course_id": null, "backend": "openai-main"}'

backend is optional — set it to any provider id from GET /api/providers (e.g. "openai-main", "claude-main", or a user-added "openai-alt") to override the default routing for a single call. Unknown ids fall back to the active default with a server-side warn log, so a stale localStorage chip value won't 422 mid-conversation.


Project layout

api/server.py            FastAPI entry point
frontend/                React 18 (CDN, no build), served statically
nano_notebooklm/
  ├── ai/                LLM router + openai/claude/local backends
  ├── ingest/            PDF/PPTX/DOCX extractors + chunking
  ├── kb/                FAISS + BM25 + RRF hybrid + graph search
  ├── kg/                Two-stage knowledge graph extraction
  ├── skills/            QA, notes, quiz, exam-prep, report, mastery
  └── orchestrator/      Skill routing, multi-turn agent loop, memory
scripts/                 ingest + index + embedding helpers
tests/                   pytest suite — runs offline, no LLM keys needed
artifacts/               (gitignored) per-course chunks, indices, KG, notes
docs/screenshots/        README assets

Development

uv pip install -e ".[test]"    # or: pip install -e ".[test]"
pytest                         # unit + API smoke; no LLM keys required
pytest tests/test_api_smoke.py # quick subset

The frontend has no build step — it's React via the CDN and Babel standalone. Just edit a .jsx file and refresh the browser.

See CONTRIBUTING.md for the contributor checklist and CLAUDE.md for the code-map / conventions.


Roadmap

Recently shipped:

  • Background upload pipeline with NDJSON stage events
  • UI-managed provider matrix (add / swap / test without restart)
  • Three-tier embedding presets (local MiniLM / OpenAI 3-large / BGE-M3)
  • MinerU OCR ingest for scanned PDFs
  • Multi-turn chat with history-aware query rewriting
  • Selection-driven notes generation (per-file checkbox, incremental cache)
  • Central i18n table (frontend/i18n.js) — full zh + en UI parity

Planned (issues welcome):

  • Web URL as a source type (fetch + readable-content extraction, then through the existing chunk / embed / KG pipeline)
  • Vite build option (opt-in, CDN stays default)
  • Mastery-driven exam-prep difficulty curve
  • Cross-course graph linking

Production notes

nano-NotebookLM is designed for single-user / small-team self-hosting. There is no authentication, no rate limiting, no multi-tenant isolation, and no persistent task queue. If you expose it on the public internet:

  • Put it behind a reverse proxy with HTTP basic auth (or OAuth).
  • Disable force=true regen endpoints externally — they call the LLM on demand without per-IP throttling.
  • Move artifacts/ to a persistent volume.

License

Released under the Apache License 2.0. Free to use, modify, and redistribute — including in commercial products — provided the original copyright notice, license text, and NOTICE file are retained, and any modified files are marked as changed. See NOTICE for attribution requirements.

Acknowledgements

  • Inspired by Google's NotebookLM. Not affiliated with Google. NotebookLM is a trademark of Google LLC.
  • Naming convention follows nanoGPT and nano-vLLM — small, self-hosted, single-file-friendly homages.
  • Knowledge graph layout: d3-force.
  • PDF rendering: PyMuPDF for extraction, PDF.js in the browser. LaTeX → PDF via tectonic.
  • Embeddings: sentence-transformers, multilingual MiniLM-L12-v2 default.
  • OCR for scanned PDFs: MinerU.

About

An open-source AI study assistant that turns course materials into notes, knowledge graphs, cited answers, and exam prep questions.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors