RAG Knowledge Base

A retrieval-augmented generation system that lets you upload documents (PDF, Markdown, plain text) and ask questions about them with AI-powered answers grounded in your actual data.


Why This Exists

Most chatbot demos just wrap an LLM API with no retrieval layer. This project implements the full RAG pipeline that enterprises actually use: document ingestion, intelligent chunking, vector embeddings, semantic search, and grounded answer generation with source citations.

Architecture

┌──────────────┐     ┌──────────────────────────────────────────┐
│   Next.js    │     │             FastAPI Backend              │
│   Frontend   │────▶│                                          │
│              │     │  Upload ──▶ Extract ──▶ Chunk ──▶ Embed  │
│  • Chat UI   │     │                              │           │
│  • Streaming │     │                              ▼           │
│  • Upload    │     │  Query ──▶ Embed ──▶ Search ──▶ Generate │
│  • Sources   │     │                   ChromaDB      OpenAI   │
└──────────────┘     └──────────────────────────────────────────┘

Features

  • Document ingestion — Upload PDF, Markdown, or TXT files. Text is extracted, split into overlapping chunks, and embedded into a vector store.
  • Semantic search — Queries are embedded and matched against document chunks using cosine similarity via ChromaDB.
  • Streaming responses — Answers stream token-by-token via Server-Sent Events for real-time UI updates.
  • Source citations — Every answer includes the source documents and relevance scores so you can verify claims.
  • Conversation history — The LLM receives recent chat context for follow-up questions.
  • Document management — View all ingested documents with chunk counts, delete individual sources.
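The semantic-search feature reduces to cosine similarity between embedding vectors. ChromaDB computes this internally; the toy sketch below (plain Python, made-up 3-dimensional vectors standing in for real 1536-dimensional embeddings) only illustrates the idea:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings from text-embedding-3-small have 1536 dimensions.
query   = [0.1, 0.9, 0.2]
chunk_a = [0.1, 0.8, 0.3]   # points roughly the same way as the query
chunk_b = [0.9, 0.1, 0.0]   # points a different way

print(cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b))  # True
```

Chunks are ranked by this score and the best matches become the context for the answer.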

Tech Stack

| Component        | Technology                                | Why                                             |
|------------------|-------------------------------------------|-------------------------------------------------|
| Backend API      | FastAPI                                   | Async, fast, automatic OpenAPI docs             |
| Vector store     | ChromaDB                                  | Local, no external service needed, persistent   |
| Embeddings       | OpenAI text-embedding-3-small             | High quality, low cost ($0.02/1M tokens)        |
| LLM              | GPT-4o-mini                               | Fast, cheap, good at following RAG instructions |
| Text splitting   | LangChain RecursiveCharacterTextSplitter  | Handles code, prose, and mixed content          |
| PDF parsing      | PyPDF                                     | Lightweight, no system dependencies             |
| Frontend         | Next.js 15 + Tailwind CSS                 | Modern React with great DX                      |
| Containerization | Docker Compose                            | One command to run everything                   |

Quick Start

Prerequisites

  • Python 3 and Node.js
  • An OpenAI API key
  • Docker (optional, for the containerized setup)

1. Clone and configure

git clone https://github.com/ctonneslan/rag-knowledge-base.git
cd rag-knowledge-base

# Set up environment
cp backend/.env.example backend/.env
# Edit backend/.env and add your OPENAI_API_KEY

2. Start the backend

cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload

Backend runs at http://localhost:8000. API docs at http://localhost:8000/docs.

3. Start the frontend

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:3000.

Docker (alternative)

# Make sure backend/.env exists with your API key
docker compose up --build

API Endpoints

| Method | Endpoint            | Description                             |
|--------|---------------------|-----------------------------------------|
| GET    | /health             | Health check                            |
| POST   | /documents/upload   | Upload a document (multipart form)      |
| GET    | /documents          | List all ingested documents             |
| DELETE | /documents/{source} | Delete a document by source name        |
| POST   | /chat               | Ask a question (streaming SSE response) |

How the RAG Pipeline Works

  1. Ingestion: Documents are parsed, then split into ~1000-character chunks with 200-character overlap using recursive character splitting. This preserves context across chunk boundaries.

  2. Embedding: Each chunk is embedded using OpenAI's text-embedding-3-small model (1536 dimensions). Embeddings are stored in ChromaDB with source metadata.

  3. Retrieval: When a user asks a question, the query is embedded and the top 5 most similar chunks are retrieved using cosine similarity.

  4. Generation: Retrieved chunks are injected into the LLM prompt as context. The model is instructed to only use provided context and cite sources. Responses stream back via SSE.
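Step 1's overlapping split can be sketched in a few lines. This is a simplified stand-in for LangChain's RecursiveCharacterTextSplitter (which also respects paragraph and sentence boundaries), using the defaults described above:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks whose tails overlap, so content
    that straddles a boundary appears in both neighbouring chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)

print(len(chunks))                          # 3 (starts at 0, 800, 1600)
print(len(chunks[0]))                       # 1000
print(chunks[0][-200:] == chunks[1][:200])  # True: 200 characters shared
```

Each of these chunks is then embedded (step 2) and stored in ChromaDB alongside its source metadata.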

Project Structure

rag-knowledge-base/
├── backend/
│   ├── main.py           # FastAPI app, routes, CORS
│   ├── config.py         # Settings from environment
│   ├── ingestion.py      # Text extraction + chunking
│   ├── vectorstore.py    # ChromaDB operations + embeddings
│   ├── rag.py            # RAG pipeline + streaming generation
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── app/          # Next.js app router pages
│   │   ├── components/   # Chat UI + document sidebar
│   │   └── lib/api.ts    # API client with streaming support
│   └── Dockerfile
├── docker-compose.yml
└── README.md

Configuration

All settings are in backend/.env:

| Variable           | Default        | Description               |
|--------------------|----------------|---------------------------|
| OPENAI_API_KEY     | (required)     | Your OpenAI API key       |
| CHROMA_PERSIST_DIR | ./chroma_data  | Where ChromaDB stores data |
| CHUNK_SIZE         | 1000           | Characters per chunk      |
| CHUNK_OVERLAP      | 200            | Overlap between chunks    |
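The actual config.py isn't reproduced here; as a rough stdlib-only sketch of how these variables might be loaded (field names taken from the table above, defaults matching it):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chroma_persist_dir: str = "./chroma_data"
    chunk_size: int = 1000
    chunk_overlap: int = 200

def load_settings() -> Settings:
    """Read settings from the environment, falling back to the table's defaults.
    OPENAI_API_KEY has no default, so a missing key fails fast with a KeyError."""
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        chroma_persist_dir=os.environ.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
    )
```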

License

MIT
