Dioxis34/RAG-Document-Assistant

RAG Document Assistant

A local Retrieval-Augmented Generation (RAG) app. Upload PDF or text documents, ask questions in natural language, and get answers generated by Claude that are grounded in your documents and backed by numbered citations pointing at the exact source and page.

Embeddings run locally (no embedding API calls); only answer generation uses the Anthropic Claude API.

Architecture

            +-------------------------------------------------+
            |                   app.py (Streamlit UI)         |
            |  sidebar: upload | doc list | clear KB | API key|
            |  main:    query box | answer | cited sources    |
            +-------+-------------------------+---------------+
                    |                         |
          ingest    |                         |  query
                    v                         v
        +------------------+        +------------------+
        |     ingest.py    |        |   retriever.py   |
        | load -> chunk -> |        | embed query ->   |
        | embed -> upsert  |        | similarity search|
        +--------+---------+        +--------+---------+
                 |                           |
                 v                           v
        +-------------------------------------------------+
        |   ChromaDB (persistent)  +  all-MiniLM-L6-v2    |
        +-------------------------------------------------+
                 | top-k chunks
                 v
        +------------------+
        |      rag.py      |  build context + citations -> Claude -> answer
        +------------------+
                 ^
            config.py (shared constants)

Separation of concerns

File            Responsibility
config.py       Single source of truth for all tunable constants.
ingest.py       Write path: load PDF/.txt, chunk, embed, upsert into ChromaDB.
retriever.py    Read path: embed the query, run vector similarity search.
rag.py          LLM path: build the cited-context prompt and call the Claude API.
app.py          UI only: orchestrates the modules above; holds no business logic.
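For illustration, a config.py along these lines would hold the values this README mentions. This is a hypothetical sketch: the constant names are assumptions, and only the values themselves (256-token chunks, 50-token overlap, top-k of 5, the two model names, and the chroma_db/ directory) come from this document.

```python
"""Hypothetical sketch of config.py: constant names are assumed; values come from this README."""

# Embedding model used on both the write path (ingest) and the read path (query).
EMBEDDING_MODEL = "all-MiniLM-L6-v2"

# Chunking: sized to the embedder's 256-token input limit, with 50 tokens of overlap.
CHUNK_SIZE_TOKENS = 256
CHUNK_OVERLAP_TOKENS = 50

# Retrieval: number of chunks returned per question.
TOP_K = 5

# On-disk location of the persistent ChromaDB store.
CHROMA_DIR = "chroma_db"

# Claude model used for answer generation.
CLAUDE_MODEL = "claude-sonnet-4-6"

# Fail at import time if an edit makes the chunking parameters inconsistent.
assert 0 <= CHUNK_OVERLAP_TOKENS < CHUNK_SIZE_TOKENS, "overlap must be smaller than chunk size"
assert TOP_K > 0, "TOP_K must be positive"
```

Keeping EMBEDDING_MODEL in one place means ingest.py and retriever.py cannot silently drift onto different embedders, which would make query vectors incomparable with stored chunk vectors.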

Request flow

  1. Ingest — Each document is split into overlapping, token-bounded chunks. Every chunk is embedded locally with all-MiniLM-L6-v2 and stored in a persistent ChromaDB collection along with its source and page metadata.
  2. Retrieve — The question is embedded with the same model; ChromaDB returns the top-k (default 5) most similar chunks by cosine similarity.
  3. Generate — Those chunks are formatted into a numbered context block and sent to Claude with instructions to answer only from the context and to cite each claim with [n]. The UI shows the answer plus an expandable, numbered list of the exact source chunks (with page and similarity score).
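The retrieve and generate steps can be sketched in plain Python as follows. This is a minimal, self-contained illustration under stated assumptions, not the app's retriever.py or rag.py: embeddings are tiny hand-written vectors standing in for all-MiniLM-L6-v2 output, an in-memory list stands in for ChromaDB (which performs the search itself over an approximate nearest-neighbour index rather than a brute-force loop), and the prompt wording plus the names Chunk, top_k, build_context and build_user_message are hypothetical.

```python
"""Minimal sketch of retrieval (top-k by cosine similarity) and numbered-context building.

Assumptions: toy vectors replace real embeddings, a list replaces ChromaDB,
and the prompt text is illustrative rather than the app's actual system prompt.
"""
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    page: int
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 5) -> list[tuple[Chunk, float]]:
    """Score every chunk against the query and return the k best, highest score first."""
    scored = [(c, cosine_similarity(query_vec, c.embedding)) for c in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_context(hits: list[tuple[Chunk, float]]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    blocks = []
    for n, (chunk, _score) in enumerate(hits, start=1):
        blocks.append(f"[{n}] (source: {chunk.source}, page {chunk.page})\n{chunk.text}")
    return "\n\n".join(blocks)


def build_user_message(question: str, context: str) -> str:
    """Combine the numbered context and the user's question into one message."""
    return f"Context:\n{context}\n\nQuestion: {question}"


# Illustrative instructions mirroring the rules described in this README.
SYSTEM_PROMPT = (
    "Answer using only the numbered context. "
    "Cite every claim with its [n] marker. "
    "If the context does not contain the answer, say so instead of guessing."
)


if __name__ == "__main__":
    store = [
        Chunk("Revenue increased 12% year over year.", "report.pdf", 3, [0.9, 0.1, 0.0]),
        Chunk("The office moved to a new building.", "memo.txt", 1, [0.0, 0.2, 0.9]),
        Chunk("Operating costs were flat.", "report.pdf", 4, [0.6, 0.5, 0.1]),
    ]
    hits = top_k([1.0, 0.2, 0.0], store, k=2)
    print(build_user_message("how did sales do?", build_context(hits)))
```

Because the context is numbered in retrieval order, the [n] markers in Claude's answer map one-to-one onto the numbered source list shown beneath it.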

Setup

Requires Python 3.10+.

Quick setup (one command)

A setup script creates the virtual environment, installs dependencies, and seeds a .env file for you:

# Windows (PowerShell)
.\setup.ps1
#   if you get an execution-policy error, run instead:
#   powershell -ExecutionPolicy Bypass -File setup.ps1

# macOS / Linux
bash setup.sh

Manual setup (alternative)

# 1. Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1      # Windows (PowerShell)
source .venv/bin/activate         # macOS/Linux

# 2. Install dependencies
pip install -r requirements.txt

# 3. Seed your environment file
cp .env.example .env              # Windows: copy .env.example .env

Either way, the API key is optional at setup time: add your Anthropic key to .env (get one at https://console.anthropic.com/), or just leave it blank and the app will prompt for it in the sidebar on first use.
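A hypothetical sketch of the lookup order described above: use a non-blank key from the environment if one is set, otherwise fall back to whatever the user types in the sidebar. It assumes the variable is named ANTHROPIC_API_KEY (the name the official anthropic Python SDK reads by default) and that .env has already been loaded into the environment; resolve_api_key is an illustrative name, not a function from this repo.

```python
"""Hypothetical sketch of API-key resolution: environment (.env) first, then sidebar input."""
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_api_key(sidebar_key: str | None = None,
                    env: Mapping[str, str] | None = None) -> str | None:
    """Return the key from the environment if set, else from the sidebar, else None."""
    source = os.environ if env is None else env
    # Assumed variable name; a blank value in .env counts as "not set".
    from_env = source.get("ANTHROPIC_API_KEY", "").strip()
    if from_env:
        return from_env
    if sidebar_key and sidebar_key.strip():
        return sidebar_key.strip()
    # No key anywhere yet: the UI should keep showing the sidebar prompt.
    return None
```

Treating a blank entry as missing is what lets the seeded .env file ship with an empty key without breaking the sidebar fallback.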

Run

streamlit run app.py

First run note: on the very first launch, sentence-transformers downloads the all-MiniLM-L6-v2 embedding model (~90 MB). The app may appear to pause for a few seconds while this happens; the model is cached locally for every run afterward.

Then, in the browser:

  1. Upload one or more PDF / .txt files in the sidebar and click Ingest documents.
  2. Type a question in the main panel and click Ask.
  3. Read the grounded answer and expand the Sources to verify each citation.

Ingested documents persist on disk (in chroma_db/) between runs. Use Clear knowledge base in the sidebar to wipe everything.

Design notes

  • Why 256-token chunks? all-MiniLM-L6-v2 truncates its input at 256 tokens. Chunking any larger would mean the tail of every chunk never influences its own embedding, silently degrading retrieval. Chunks are therefore sized to the embedder's real limit (256 tokens, 50-token overlap), measured with the embedding model's own tokenizer so the units line up exactly.
  • Real vector search, not keywords. Retrieval is cosine similarity over dense embeddings, so a question like "how did sales do?" can match a chunk that says "revenue increased 12%" even with no shared words.
  • Local embeddings. All embedding happens on-device via sentence-transformers, so the only document text sent to a third party is the top-k retrieved chunks (5 by default) included in the Claude prompt at answer-generation time.
  • Grounded + cited. The system prompt forbids outside knowledge and requires [n] citations; if the retrieved context is insufficient, the model is told to say so rather than guess.
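The overlapping-window chunking from the first design note can be sketched as below. Each window starts size - overlap tokens after the previous one, so with the defaults consecutive chunks begin 256 - 50 = 206 tokens apart and share 50 tokens. Assumption: a whitespace split stands in for the embedding model's own tokenizer, so the "tokens" here are words; chunk_tokens and chunk_text are hypothetical names.

```python
"""Sketch of token-bounded chunking with overlap (whitespace split stands in for the real tokenizer)."""


def chunk_tokens(tokens: list[str], size: int = 256, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of at most `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end; skip a redundant tail chunk
    return chunks


def chunk_text(text: str, size: int = 256, overlap: int = 50) -> list[str]:
    """Whitespace-tokenize, chunk, and rejoin each window into a string."""
    return [" ".join(window) for window in chunk_tokens(text.split(), size, overlap)]
```

To match the behaviour described above, the whitespace split and join would be replaced by the embedding model's tokenizer (encode to count tokens, decode to recover text); the windowing logic itself stays the same.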

Tech stack

  • UI: Streamlit
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Vector store: ChromaDB (persistent, local)
  • LLM: Anthropic Claude (claude-sonnet-4-6)
  • PDF parsing: pypdf
