NotebookTg is an open-source, Telegram-first research assistant for source-grounded Q&A, notebook organization, summaries, study materials, and future audio-ready workflows.
It is inspired by the general idea of source-aware research assistants, but all code, prompts, structure, and branding in this repository are original and released under Apache License 2.0.
- Creates personal notebooks for research topics, classes, reports, or team workspaces
- Accepts PDF, TXT, DOCX, Markdown, CSV, and EPUB uploads
- Extracts text, chunks it, embeds it, and stores notebook-scoped retrieval data
- Answers questions against notebook sources only
- Returns grounded answers with inline citation markers
- Falls back clearly when the answer is not present in the sources
- Generates summaries, FAQs, quizzes, flashcards, timelines, briefing notes, podcast scripts, and source comparisons
- Lets each notebook define persona, answer style, and answer length preferences
- Saves notes as first-class notebook sources and can promote the latest assistant reply into a note source
- Supports read-only notebook sharing through share tokens for future web and dashboard surfaces
- Exposes both a Telegram bot and a FastAPI backend
- Telegram bot with notebook CRUD, notebook settings, sharing, note saving, grounded Q&A, and study-generation commands
- File upload handlers for PDF, TXT, DOCX, MD, CSV, and EPUB
- PostgreSQL-backed metadata models with Alembic migrations
- Qdrant-backed vector retrieval with a memory backend for tests
- OpenAI-compatible provider abstraction for chat and embeddings
- Notebook-scoped grounded Q&A with citation mapping
- Summary, FAQ, quiz, flashcard, timeline, briefing, podcast-script, and compare generation
- Dockerized local development
- Basic tests for CRUD, ingestion, chunking, retrieval, citations, notebook preferences, share state, fallback behavior, and bot handlers
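The grounded Q&A flow above can be sketched as follows. This is an illustrative helper, not the repository's actual API: it numbers retrieved chunks so the model can emit inline citation markers, and makes the no-sources fallback explicit.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Number retrieved chunks so the model can cite them as [1], [2], ..."""
    if not chunks:
        # Fallback: no retrieved context, so instruct an explicit refusal.
        return (
            "No sources matched. Reply that the answer is not in the notebook.\n"
            f"Question: {question}"
        )
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using ONLY the numbered sources below and cite them inline "
        "with markers like [1].\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )
```

The citation mapper can then translate the `[n]` markers in the model's reply back to source metadata.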
The repository includes screenshot placeholders in docs/screenshots/README.md.
See docs/architecture.md for the full diagram and subsystem notes.
Primary packages:
- notebooktg/app/bot: Telegram UX and handlers
- notebooktg/app/api: FastAPI routes
- notebooktg/app/services: notebook, upload, note, sharing, answer, and generation workflows
- notebooktg/app/ingestion: extractors and ingestion pipeline
- notebooktg/app/retrieval: chunking, vector search, citations
- notebooktg/app/llm: provider abstraction
- notebooktg/app/db: sessions and migrations
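For orientation, the chunking step in the retrieval path typically looks something like this. This is a simplified sketch; the real code in notebooktg/app/retrieval may use different window sizes, overlap, and token-aware splitting:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows so retrieval keeps local context."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

Each chunk is then embedded and stored with its notebook ID so retrieval stays notebook-scoped.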
```bash
python3 -m venv .venv
source .venv/bin/activate
cp .env.example .env
make setup
```

Update .env:
- set `BOT_TOKEN`
- set `GROQ_API_KEY`
- keep `AI_BASE_URL=https://api.groq.com/openai/v1`
- use `DEFAULT_MODEL=openai/gpt-oss-120b` for normal notebook chat flows
- use `TOOL_MODEL=groq/compound` for tool-oriented or heavier reasoning requests
- adjust `ALLOWED_EXTENSIONS` if you want to narrow or expand supported upload formats
- optionally set `EMBEDDING_API_KEY`

Minimal local bot mode works with the defaults:

```
DATABASE_URL=sqlite+aiosqlite:///./data/notebooktg.db
VECTOR_BACKEND=memory
EMBEDDING_PROVIDER=local_hash
HYBRID_SEARCH=false
ENABLE_REDIS=false
SQL_ECHO=false
```
For the full shared backend setup, change:

```
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/notebooktg
VECTOR_BACKEND=qdrant
QDRANT_URL=http://localhost:6333
```
Minimal single-process bot:
```bash
python main.py
```

With PostgreSQL and Qdrant running via Docker:

```bash
docker compose up -d postgres qdrant
make migrate
make api
make bot
```

The full stack can run in Docker Compose:
```bash
cp .env.example .env
docker compose up --build
```

Services:
- API: http://localhost:8000
- Qdrant: http://localhost:6333
- PostgreSQL: localhost:5432
Key settings are documented in .env.example.
Core variables:
- `BOT_TOKEN`: Telegram bot token
- `GROQ_API_KEY`: Groq API key used by the OpenAI-compatible chat client
- `AI_BASE_URL`: OpenAI-compatible Groq endpoint root
- `DEFAULT_MODEL`: main chat model used for normal Q&A and notebook generation
- `TOOL_MODEL`: advanced model used when routing detects tool/code/web-style requests
- `DATABASE_URL`: async SQLAlchemy database URL
- `VECTOR_BACKEND`: `qdrant` or `memory`
- `HYBRID_SEARCH`: enable sparse+dense retrieval when optional sparse dependencies are installed
- `QDRANT_URL`: Qdrant base URL
- `LLM_PROVIDER`: currently `openai_compatible`
- `EMBEDDING_PROVIDER`: currently `openai_compatible`
- `EMBEDDING_MODEL`: embedding model name
- `EMBEDDING_DIMENSION`: vector dimension used by the collection
- `ENABLE_REDIS`: opt into Redis-backed FSM, caching, and ARQ queues
- `SQL_ECHO`: enable SQLAlchemy SQL echo logging for debugging
- `UPLOAD_DIR`: local source storage directory
- `ALLOWED_EXTENSIONS`: defaults to `pdf,txt,docx,md,csv,epub`
Legacy compatibility:
- `LLM_API_KEY` and `AI_API_KEY` map to `GROQ_API_KEY`
- `LLM_BASE_URL` maps to `AI_BASE_URL`
- `LLM_MODEL` and `AI_MODEL` map to `DEFAULT_MODEL`
NotebookTg uses a small explicit router in notebooktg/app/llm/routing.py.
- Normal chat, grounded Q&A, summaries, quizzes, flashcards, and timelines default to `DEFAULT_MODEL`
- Compare requests always use `TOOL_MODEL`
- Requests that explicitly mention tools, browsing, shell usage, debugging, or code execution also route to `TOOL_MODEL`
The heuristic is intentionally simple so contributors can adjust it in one place without changing the bot handlers or provider code.
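A heuristic of that shape might look like the sketch below. The hint keywords and function signature are illustrative; see notebooktg/app/llm/routing.py for the real rules:

```python
# Illustrative keyword hints; the actual router defines its own list.
TOOL_HINTS = ("tool", "browse", "shell", "debug", "execute", "run code")

def pick_model(task: str, text: str, default_model: str, tool_model: str) -> str:
    """Route compare tasks and tool-flavored requests to the heavier model."""
    if task == "compare":
        return tool_model
    lowered = text.lower()
    if any(hint in lowered for hint in TOOL_HINTS):
        return tool_model
    return default_model
```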
NotebookTg uses an OpenAI-compatible abstraction for both chat and embeddings, so you can connect providers that expose compatible APIs. Typical options include:
- OpenAI
- OpenRouter-backed compatible endpoints
- Groq-compatible gateways if they expose the required routes
- Local gateways that mimic the OpenAI API surface
When changing providers, verify:
- the chat completion path is compatible with `/chat/completions`
- the embeddings path is compatible with `/embeddings`
- `EMBEDDING_DIMENSION` matches the chosen embedding model
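One way to sanity-check a new base URL is to confirm the full endpoint URLs it will produce. This is an illustrative helper, not part of the repository's client code:

```python
def endpoint(base_url: str, path: str) -> str:
    """Join an OpenAI-compatible base URL with a route, tolerating stray slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

For example, a Groq base URL plus `/chat/completions` should yield the documented completions route with no doubled slashes.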
Common commands:
```bash
make setup
make lint
make test
make migrate
make api
make bot
```

The project targets modern async Python and is written with SQLAlchemy 2-style typing, Pydantic settings, aiogram, and FastAPI.
- /newnotebook <title>
- /notebooks
- /use_notebook <id>
- /delete_notebook <id>
- /settings
- /setpersona <text>
- /setstyle <compact|balanced|detailed>
- /setlength <short|medium|long>
- /share_notebook
- /unshare_notebook
- /sources
- /delete_source <id>
- /asksource <id> <question>
- /summary
- /faq
- /briefing
- /podcast
- /flashcards
- /quiz
- /timeline
- /compare <id1> <id2> [id3...]
- /savenote <title> | <text>
- /save_last_note [title]
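The /savenote syntax separates the title from the note body with a pipe. Parsing it might look like the following hypothetical helper (not the bot's actual handler code):

```python
def parse_savenote(args: str) -> tuple[str, str]:
    """Split '/savenote <title> | <text>' arguments into (title, text)."""
    title, sep, text = args.partition("|")
    if not sep or not title.strip() or not text.strip():
        raise ValueError("expected: /savenote <title> | <text>")
    return title.strip(), text.strip()
```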
The test suite uses an in-memory vector store and SQLite-backed async sessions, so it can validate notebook workflows without external services.
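A minimal in-memory store of the kind the tests rely on could be sketched as below. The class name and methods are illustrative; the real backend sits behind the `VECTOR_BACKEND` setting:

```python
import math

class MemoryVectorStore:
    """Notebook-scoped cosine-similarity search over in-process vectors."""

    def __init__(self) -> None:
        self._items: dict[str, list[tuple[list[float], str]]] = {}

    def add(self, notebook_id: str, vector: list[float], text: str) -> None:
        self._items.setdefault(notebook_id, []).append((vector, text))

    def search(self, notebook_id: str, query: list[float], k: int = 3) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank only this notebook's vectors, best match first.
        scored = sorted(
            self._items.get(notebook_id, []),
            key=lambda item: cosine(item[0], query),
            reverse=True,
        )
        return [text for _, text in scored[:k]]
```

Because each notebook's vectors are keyed separately, a query against one notebook can never return another notebook's sources, which mirrors the notebook-scoped retrieval guarantee.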
```bash
make test
```

This repository includes:
- Apache 2.0 `LICENSE`
- `NOTICE`
- `CONTRIBUTING.md`
- `CODE_OF_CONDUCT.md`
- `SECURITY.md`
- issue templates and PR template
- `.env.example`
- `.pre-commit-config.yaml`
See docs/roadmap.md.
Planned next steps:
- voice transcription
- audio overviews rendered from the new podcast-script output
- OCR and richer image/audio ingestion
- source deduplication improvements
- shared notebooks with richer public and collaborative surfaces
- web dashboard support on top of the same services
Apache License 2.0. See LICENSE.