SardorchikDev/NotebookTg

NotebookTg

NotebookTg is an open-source, Telegram-first research assistant for source-grounded Q&A, notebook organization, summaries, and study materials, with audio workflows planned for later.

It is inspired by the general idea of source-aware research assistants, but all code, prompts, structure, and branding in this repository are original and released under Apache License 2.0.

What It Does

  • Creates personal notebooks for research topics, classes, reports, or team workspaces
  • Accepts PDF, TXT, DOCX, Markdown, CSV, and EPUB uploads
  • Extracts text, chunks it, embeds it, and stores notebook-scoped retrieval data
  • Answers questions against notebook sources only
  • Returns grounded answers with inline citation markers
  • Falls back clearly when the answer is not present in the sources
  • Generates summaries, FAQs, quizzes, flashcards, timelines, briefing notes, podcast scripts, and source comparisons
  • Lets each notebook define persona, answer style, and answer length preferences
  • Saves notes as first-class notebook sources and can promote the latest assistant reply into a note source
  • Supports read-only notebook sharing through share tokens for future web and dashboard surfaces
  • Exposes both a Telegram bot and a FastAPI backend
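
As an illustration of the "chunks it" step above, here is a minimal sketch of overlapping character-window chunking. The function name chunk_text and the chunk_size and overlap defaults are assumptions made for this example; the repository's actual chunker may split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so the
        # tail is not repeated as a shorter trailing chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of storing some text twice.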

MVP Features

  • Telegram bot with notebook CRUD, notebook settings, sharing, note saving, grounded Q&A, and study-generation commands
  • File upload handlers for PDF, TXT, DOCX, MD, CSV, and EPUB
  • PostgreSQL-backed metadata models with Alembic migrations
  • Qdrant-backed vector retrieval with a memory backend for tests
  • OpenAI-compatible provider abstraction for chat and embeddings
  • Notebook-scoped grounded Q&A with citation mapping
  • Summary, FAQ, quiz, flashcard, timeline, briefing, podcast-script, and compare generation
  • Dockerized local development
  • Basic tests for CRUD, ingestion, chunking, retrieval, citations, notebook preferences, share state, fallback behavior, and bot handlers
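
Citation mapping can be pictured roughly as follows. This is a sketch under assumptions, not the repository's code: RetrievedChunk, build_context, and cited_chunks are hypothetical names, and the [n] marker format is assumed from the "inline citation markers" wording above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical shape of one retrieval hit."""
    source_id: int
    source_title: str
    text: str


def build_context(chunks: list[RetrievedChunk]) -> tuple[str, dict[int, RetrievedChunk]]:
    """Number the chunks from 1 and render them as prompt context."""
    marker_map = {index: chunk for index, chunk in enumerate(chunks, start=1)}
    lines = [f"[{index}] ({chunk.source_title}) {chunk.text}"
             for index, chunk in marker_map.items()]
    return "\n".join(lines), marker_map


def cited_chunks(answer: str, marker_map: dict[int, RetrievedChunk]) -> list[RetrievedChunk]:
    """Return the chunks whose markers appear in the answer, in first-use order.

    Markers that do not correspond to a retrieved chunk are ignored.
    """
    seen: list[int] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match)
        if index in marker_map and index not in seen:
            seen.append(index)
    return [marker_map[index] for index in seen]
```

Dropping unknown markers means a model that cites a chunk number it was never given cannot produce a citation pointing at a source the user does not have.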

Screenshots

The repository includes screenshot placeholders in docs/screenshots/README.md.

Architecture Overview

See docs/architecture.md for the full diagram and subsystem notes.

Primary packages:

  • notebooktg/app/bot: Telegram UX and handlers
  • notebooktg/app/api: FastAPI routes
  • notebooktg/app/services: notebook, upload, note, sharing, answer, and generation workflows
  • notebooktg/app/ingestion: extractors and ingestion pipeline
  • notebooktg/app/retrieval: chunking, vector search, citations
  • notebooktg/app/llm: provider abstraction
  • notebooktg/app/db: sessions and migrations

Quickstart

1. Local Python setup

python3 -m venv .venv
source .venv/bin/activate
cp .env.example .env
make setup

Update .env:

  • set BOT_TOKEN
  • set GROQ_API_KEY
  • keep AI_BASE_URL=https://api.groq.com/openai/v1
  • use DEFAULT_MODEL=openai/gpt-oss-120b for normal notebook chat flows
  • use TOOL_MODEL=groq/compound for tool-oriented or heavier reasoning requests
  • adjust ALLOWED_EXTENSIONS if you want to narrow or expand supported upload formats
  • optionally set EMBEDDING_API_KEY
  • minimal local bot mode works with the defaults:
    • DATABASE_URL=sqlite+aiosqlite:///./data/notebooktg.db
    • VECTOR_BACKEND=memory
    • EMBEDDING_PROVIDER=local_hash
    • HYBRID_SEARCH=false
    • ENABLE_REDIS=false
    • SQL_ECHO=false
  • for the full shared backend setup, change:
    • DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/notebooktg
    • VECTOR_BACKEND=qdrant
    • QDRANT_URL=http://localhost:6333
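
The EMBEDDING_PROVIDER=local_hash default suggests an embedding that needs no API key. How the repository implements it is not described here; the sketch below shows one common way to build such an embedding (feature hashing over whitespace tokens). The function name hash_embed and the dimension default are assumptions.

```python
import hashlib
import math


def hash_embed(text: str, dimension: int = 64) -> list[float]:
    """Deterministic bag-of-words embedding via feature hashing.

    Illustrative only: each token is hashed to a bucket and a sign,
    then the vector is L2-normalized.
    """
    vector = [0.0] * dimension
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimension
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[index] += sign

    norm = math.sqrt(sum(value * value for value in vector))
    # An empty text (or tokens that cancel out) yields the zero vector.
    return [value / norm for value in vector] if norm else vector
```

An embedding like this captures token overlap rather than meaning, which is usually enough for local smoke tests but not for retrieval quality comparable to a trained model.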

2. Run locally

Minimal single-process bot:

python main.py

3. Start full infrastructure

docker compose up -d postgres qdrant
make migrate

4. Run the API or bot

make api
make bot

Docker Usage

The full stack can run in Docker Compose:

cp .env.example .env
docker compose up --build

Services:

  • API: http://localhost:8000
  • Qdrant: http://localhost:6333
  • PostgreSQL: localhost:5432

Environment Variables

Key settings are documented in .env.example.

Core variables:

  • BOT_TOKEN: Telegram bot token
  • GROQ_API_KEY: Groq API key used by the OpenAI-compatible chat client
  • AI_BASE_URL: OpenAI-compatible Groq endpoint root
  • DEFAULT_MODEL: main chat model used for normal Q&A and notebook generation
  • TOOL_MODEL: advanced model used when routing detects tool/code/web-style requests
  • DATABASE_URL: async SQLAlchemy database URL
  • VECTOR_BACKEND: qdrant or memory
  • HYBRID_SEARCH: enable sparse+dense retrieval when optional sparse dependencies are installed
  • QDRANT_URL: Qdrant base URL
  • LLM_PROVIDER: currently openai_compatible
  • EMBEDDING_PROVIDER: openai_compatible, or local_hash for the minimal local setup
  • EMBEDDING_MODEL: embedding model name
  • EMBEDDING_DIMENSION: vector dimension used by the collection
  • ENABLE_REDIS: opt into Redis-backed FSM, caching, and ARQ queues
  • SQL_ECHO: enable SQLAlchemy SQL echo logging for debugging
  • UPLOAD_DIR: local source storage directory
  • ALLOWED_EXTENSIONS: defaults to pdf,txt,docx,md,csv,epub
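
For example, a comma-separated ALLOWED_EXTENSIONS value could be parsed and applied to an uploaded filename like this. The helper names are hypothetical, and the repository's own upload validation may differ.

```python
from pathlib import Path

# Default taken from the variable list above.
DEFAULT_ALLOWED = "pdf,txt,docx,md,csv,epub"


def parse_allowed_extensions(raw: str = DEFAULT_ALLOWED) -> frozenset[str]:
    """Normalize 'pdf, .TXT,md' style input to a set of lowercase extensions."""
    return frozenset(
        part.strip().lower().lstrip(".")
        for part in raw.split(",")
        if part.strip()
    )


def is_allowed(filename: str, allowed: frozenset[str]) -> bool:
    """Check the final suffix of a filename against the allowed set."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return bool(suffix) and suffix in allowed


if __name__ == "__main__":
    allowed = parse_allowed_extensions()
    print(is_allowed("Report.PDF", allowed))
    print(is_allowed("setup.exe", allowed))
```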

Legacy compatibility:

  • LLM_API_KEY and AI_API_KEY map to GROQ_API_KEY
  • LLM_BASE_URL maps to AI_BASE_URL
  • LLM_MODEL and AI_MODEL map to DEFAULT_MODEL
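
The alias mapping above can be sketched as a lookup that tries the current name first and then each legacy name. The precedence order is an assumption for this example, and resolve_setting is a hypothetical helper; the project itself reads configuration through Pydantic settings.

```python
from collections.abc import Mapping
from typing import Optional

# Current name -> legacy names, as listed above.
ALIASES: dict[str, tuple[str, ...]] = {
    "GROQ_API_KEY": ("LLM_API_KEY", "AI_API_KEY"),
    "AI_BASE_URL": ("LLM_BASE_URL",),
    "DEFAULT_MODEL": ("LLM_MODEL", "AI_MODEL"),
}


def resolve_setting(name: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value among the name and its legacy aliases.

    Assumed precedence: the current name wins over any legacy name.
    """
    for key in (name, *ALIASES.get(name, ())):
        value = env.get(key)
        if value:
            return value
    return None


if __name__ == "__main__":
    import os
    print(resolve_setting("DEFAULT_MODEL", os.environ))
```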

Model Routing

NotebookTg uses a small explicit router in notebooktg/app/llm/routing.py.

  • Normal chat, grounded Q&A, summaries, quizzes, flashcards, and timelines default to DEFAULT_MODEL
  • Compare requests always use TOOL_MODEL
  • Requests that explicitly mention tools, browsing, shell usage, code execution, or step-by-step debugging also route to TOOL_MODEL

The heuristic is intentionally simple so contributors can adjust it in one place without changing the bot handlers or provider code.
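
A router following the three rules above might look like the sketch below. It is not the contents of routing.py: the function name choose_model, the task labels, and the keyword list are assumptions, and the model names are the defaults from the Quickstart.

```python
DEFAULT_MODEL = "openai/gpt-oss-120b"
TOOL_MODEL = "groq/compound"

# Hypothetical keyword list; plain substring matching keeps the rule easy to read
# but will also match words such as "toolbox".
TOOL_HINTS = ("tool", "browse", "browsing", "shell", "debug", "code execution")


def choose_model(task: str, user_text: str = "") -> str:
    """Pick a model name from the task type and the raw request text."""
    if task == "compare":
        return TOOL_MODEL
    lowered = user_text.lower()
    if any(hint in lowered for hint in TOOL_HINTS):
        return TOOL_MODEL
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(choose_model("summary", "Summarize chapter 2"))
    print(choose_model("qa", "Please debug this traceback step by step"))
```

Keeping the decision in a pure function like this is what makes it possible to change the heuristic without touching handlers or provider code.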

Choosing an LLM Provider

NotebookTg uses an OpenAI-compatible abstraction for both chat and embeddings, so you can connect providers that expose compatible APIs. Typical options include:

  • OpenAI
  • OpenRouter-backed compatible endpoints
  • Groq-compatible gateways if they expose the required routes
  • Local gateways that mimic the OpenAI API surface

When changing providers, verify:

  • the chat completion path is compatible with /chat/completions
  • the embeddings path is compatible with /embeddings
  • EMBEDDING_DIMENSION matches the chosen embedding model
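
The checks above can be partly automated. The sketch below derives the two paths from AI_BASE_URL and validates the length of an embedding vector you have already obtained from the provider; both helper names are hypothetical, and no network call is made here.

```python
from collections.abc import Sequence


def endpoint_urls(base_url: str) -> dict[str, str]:
    """Join the base URL with the two OpenAI-style paths, tolerating a trailing slash."""
    root = base_url.rstrip("/")
    return {
        "chat": f"{root}/chat/completions",
        "embeddings": f"{root}/embeddings",
    }


def check_embedding_dimension(vector: Sequence[float], expected: int) -> None:
    """Raise if a returned embedding does not match EMBEDDING_DIMENSION."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, "
            f"but EMBEDDING_DIMENSION is {expected}"
        )


if __name__ == "__main__":
    print(endpoint_urls("https://api.groq.com/openai/v1"))
```

A dimension mismatch matters because the vector collection is created with a fixed size, so vectors of a different length cannot be stored in it.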

Local Development

Common commands:

make setup
make lint
make test
make migrate
make api
make bot

The project targets modern async Python and is written with SQLAlchemy 2-style typing, Pydantic settings, aiogram, and FastAPI.

Telegram Commands

  • /newnotebook <title>
  • /notebooks
  • /use_notebook <id>
  • /delete_notebook <id>
  • /settings
  • /setpersona <text>
  • /setstyle <compact|balanced|detailed>
  • /setlength <short|medium|long>
  • /share_notebook
  • /unshare_notebook
  • /sources
  • /delete_source <id>
  • /asksource <id> <question>
  • /summary
  • /faq
  • /briefing
  • /podcast
  • /flashcards
  • /quiz
  • /timeline
  • /compare <id1> <id2> [id3...]
  • /savenote <title> | <text>
  • /save_last_note [title]
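
To illustrate the /savenote <title> | <text> syntax, here is a minimal argument parser. parse_savenote is a hypothetical helper, and splitting on the first "|" only is an assumption; the real handler may treat the separator differently.

```python
def parse_savenote(args: str) -> tuple[str, str]:
    """Split '<title> | <text>' on the first '|' and trim both parts."""
    title, separator, text = args.partition("|")
    title, text = title.strip(), text.strip()
    if not separator or not title or not text:
        raise ValueError("usage: /savenote <title> | <text>")
    return title, text


if __name__ == "__main__":
    print(parse_savenote("Reading list | Chapter 3 | revisit section 2"))
```

Splitting only on the first separator lets the note body itself contain "|" characters.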

Testing

The test suite uses an in-memory vector store and SQLite-backed async sessions, so it can validate notebook workflows without external services.

make test
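
An in-memory vector store of the kind used for tests can be very small. The sketch below is an assumption about the general shape (a per-notebook list scored by cosine similarity), not the repository's memory backend; the class and method names are hypothetical.

```python
import math
from collections import defaultdict


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryVectorStore:
    """Brute-force, notebook-scoped vector search kept entirely in process memory."""

    def __init__(self) -> None:
        # notebook_id -> list of (chunk_id, vector)
        self._items: dict[int, list[tuple[str, list[float]]]] = defaultdict(list)

    def add(self, notebook_id: int, chunk_id: str, vector: list[float]) -> None:
        self._items[notebook_id].append((chunk_id, vector))

    def search(self, notebook_id: int, query: list[float],
               top_k: int = 3) -> list[tuple[str, float]]:
        """Return up to top_k (chunk_id, score) pairs from one notebook only."""
        scored = [(chunk_id, _cosine(query, vector))
                  for chunk_id, vector in self._items.get(notebook_id, [])]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Scoping every search by notebook_id mirrors the notebook-scoped retrieval described earlier, so a test can assert that chunks from one notebook never surface in another.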

Repository Hygiene

This repository includes:

  • Apache 2.0 LICENSE
  • NOTICE
  • CONTRIBUTING.md
  • CODE_OF_CONDUCT.md
  • SECURITY.md
  • issue templates and PR template
  • .env.example
  • .pre-commit-config.yaml

Roadmap

See docs/roadmap.md.

Planned next steps:

  • voice transcription
  • audio overviews rendered from the new podcast-script output
  • OCR and richer image/audio ingestion
  • source deduplication improvements
  • shared notebooks with richer public and collaborative surfaces
  • web dashboard support on top of the same services

License

Apache License 2.0. See LICENSE.
