RAG Document Assistant

A local Retrieval-Augmented Generation (RAG) app. Upload PDF or text documents, ask questions in natural language, and get answers generated by Claude that are grounded in your documents and backed by numbered citations pointing at the exact source and page.

Embeddings run locally (no embedding API calls); only answer generation uses the Anthropic Claude API.

Architecture

            +-------------------------------------------------+
            |                   app.py (Streamlit UI)         |
            |  sidebar: upload | doc list | clear KB | API key|
            |  main:    query box | answer | cited sources    |
            +-------+-------------------------+----------------+
                    |                         |
          ingest    |                         |  query
                    v                         v
        +------------------+        +------------------+
        |     ingest.py    |        |   retriever.py   |
        | load -> chunk -> |        | embed query ->   |
        | embed -> upsert  |        | similarity search|
        +--------+---------+        +--------+---------+
                 |                           |
                 v                           v
        +-------------------------------------------------+
        |   ChromaDB (persistent)  +  all-MiniLM-L6-v2    |
        +-------------------------------------------------+
                 | top-k chunks
                 v
        +------------------+
        |      rag.py      |  build context + citations -> Claude -> answer
        +------------------+
                 ^
            config.py (shared constants)

Separation of concerns

File	Responsibility
`config.py`	Single source of truth for all tunable constants.
`ingest.py`	Write path: load PDF/txt, chunk, embed, upsert into ChromaDB.
`retriever.py`	Read path: embed the query, run vector similarity search.
`rag.py`	LLM path: build the cited-context prompt and call the Claude API.
`app.py`	UI only: orchestrates the modules above; holds no business logic.
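As a rough illustration of what "single source of truth" could look like, here is a minimal sketch of a `config.py`. The constant names are hypothetical (the actual file may name or group them differently); the values are the ones stated elsewhere in this README.

```python
# Hypothetical sketch of config.py. Constant names are illustrative
# assumptions; the values are taken from this README.
EMBEDDING_MODEL = "all-MiniLM-L6-v2"   # local sentence-transformers model
CHUNK_SIZE_TOKENS = 256                # sized to the embedder's input limit
CHUNK_OVERLAP_TOKENS = 50              # tokens shared by neighboring chunks
TOP_K = 5                              # chunks returned per query
CHROMA_DIR = "chroma_db"               # on-disk location of the vector store
LLM_MODEL = "claude-sonnet-4-6"        # model used for answer generation

# Sanity check: the overlap must be smaller than the chunk size,
# otherwise a sliding chunk window would never advance.
assert 0 <= CHUNK_OVERLAP_TOKENS < CHUNK_SIZE_TOKENS
```

Keeping these values in one module means the ingest and retrieval paths cannot drift apart, for example by embedding documents with one model and queries with another.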

Request flow

1. Ingest: Each document is split into overlapping, token-bounded chunks. Every chunk is embedded locally with all-MiniLM-L6-v2 and stored in a persistent ChromaDB collection along with its source and page metadata.
2. Retrieve: The question is embedded with the same model; ChromaDB returns the top-k (default 5) most similar chunks by cosine similarity.
3. Generate: Those chunks are formatted into a numbered context block and sent to Claude with instructions to answer only from the context and to cite each claim with [n]. The UI shows the answer plus an expandable, numbered list of the exact source chunks (with page and similarity score).
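The Retrieve and Generate steps above can be sketched in plain Python. This is a simplified stand-in, not the app's code: in the app, ChromaDB performs the similarity search and the embeddings come from all-MiniLM-L6-v2. The function names, the chunk dictionary layout, and the toy 2-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, k=5):
    """Return the k (score, chunk) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_context(scored_chunks):
    """Format chunks as a numbered block so the model can cite them as [n]."""
    blocks = []
    for n, (_score, chunk) in enumerate(scored_chunks, start=1):
        header = f"[{n}] {chunk['source']} (page {chunk['page']})"
        blocks.append(f"{header}\n{chunk['text']}")
    return "\n\n".join(blocks)

# Toy data: hypothetical chunks with hand-made 2-dimensional "embeddings".
chunks = [
    {"text": "Revenue increased 12%.", "source": "report.pdf", "page": 3,
     "embedding": [0.9, 0.1]},
    {"text": "The office moved in May.", "source": "memo.txt", "page": 1,
     "embedding": [0.1, 0.9]},
]
query_vec = [1.0, 0.0]  # stands in for the embedded question

ranked = top_k_chunks(query_vec, chunks, k=1)
context = build_context(ranked)
print(context)
```

The numbering in the context block is what ties each [n] in the answer back to a specific source and page in the UI.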

Setup

Requires Python 3.10+.

Quick setup (one command)

A setup script creates the virtual environment, installs dependencies, and seeds a .env file for you:

# Windows (PowerShell)
.\setup.ps1
#   if you get an execution-policy error, run instead:
#   powershell -ExecutionPolicy Bypass -File setup.ps1

# macOS / Linux
bash setup.sh

Manual setup (alternative)

# 1. Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1      # Windows (PowerShell)
source .venv/bin/activate         # macOS/Linux

# 2. Install dependencies
pip install -r requirements.txt

# 3. Seed your environment file
cp .env.example .env              # Windows: copy .env.example .env

Either way, the API key is optional at setup time: add your Anthropic key to .env (get one at https://console.anthropic.com/), or just leave it blank and the app will prompt for it in the sidebar on first use.
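The fallback described above (key from .env, otherwise from the sidebar) could look roughly like the sketch below. The helper name `resolve_api_key`, the environment variable name `ANTHROPIC_API_KEY`, and the choice to prefer the environment over the sidebar are assumptions for illustration; check .env.example for the name the app actually reads.

```python
import os

def resolve_api_key(sidebar_value="", env=None):
    """Return the API key from the environment, else from the sidebar, else None.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    The variable name ANTHROPIC_API_KEY is an assumption, not confirmed by this README.
    """
    env = os.environ if env is None else env
    key = (env.get("ANTHROPIC_API_KEY") or "").strip()
    if key:
        return key
    # A blank or whitespace-only sidebar entry counts as "no key provided".
    return sidebar_value.strip() or None
```

A return value of `None` is the signal for the UI to keep prompting instead of calling the API.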

Run

streamlit run app.py

First run note: on the very first launch, sentence-transformers downloads the all-MiniLM-L6-v2 embedding model (~90 MB). The app may appear to pause while the download completes (how long depends on your connection); the model is then cached locally and reused on every later run.

Then, in the browser:

1. Upload one or more PDF / .txt files in the sidebar and click Ingest documents.
2. Type a question in the main panel and click Ask.
3. Read the grounded answer and expand the Sources to verify each citation.

Ingested documents persist on disk (in chroma_db/) between runs. Use Clear knowledge base in the sidebar to wipe everything.

Design notes

Why 256-token chunks? all-MiniLM-L6-v2 truncates its input at 256 tokens. Chunking any larger would mean the tail of every chunk never influences its own embedding, silently degrading retrieval. Chunks are therefore sized to the embedder's real limit (256 tokens, 50-token overlap), measured with the embedding model's own tokenizer so the units line up exactly.
Real vector search, not keywords. Retrieval is cosine similarity over dense embeddings, so a question like "how did sales do?" can match a chunk that says "revenue increased 12%" even with no shared words.
Local embeddings. All embedding happens on-device via sentence-transformers, so document text leaves your machine only at answer-generation time, when the retrieved chunks are sent to the Claude API.
Grounded + cited. The system prompt forbids outside knowledge and requires [n] citations; if the retrieved context is insufficient, the model is told to say so rather than guess.
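To make the chunk-size reasoning concrete, here is a minimal sketch of a sliding window with 256-token chunks and a 50-token overlap. It operates on an already-tokenized list; the app measures tokens with the embedding model's own tokenizer, whereas the dummy token list and the function name `chunk_tokens` here are assumptions for illustration.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

# Dummy "document" of 600 tokens standing in for real tokenizer output.
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # prints [256, 256, 188]
```

Each chunk starts 206 tokens after the previous one, so the last 50 tokens of one chunk reappear as the first 50 of the next, and no chunk exceeds the 256-token limit of the embedder.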

Tech stack

UI: Streamlit
Embeddings: sentence-transformers (all-MiniLM-L6-v2)
Vector store: ChromaDB (persistent, local)
LLM: Anthropic Claude (claude-sonnet-4-6)
PDF parsing: pypdf
