Retrieval-Augmented Generation built entirely by hand. No LangChain. No LlamaIndex. Every algorithm implemented from first principles.
```
┌─────────────────────────────────────┐
│            INDEXING PHASE           │
└─────────────────────────────────────┘

  PDF / TXT
      │
      ▼
┌──────────┐   raw text   ┌───────────┐    chunks    ┌─────────────┐
│ingestion │ ────────────►│  chunker  │ ────────────►│ embeddings  │
└──────────┘              └───────────┘              └─────────────┘
  PyPDF2                  recursive split            all-MiniLM-L6
                          chunk_size=512             mean pooling
                          overlap=64                 L2 normalize
                                │                          │
                                │   chunks + embeddings    │
                                ▼                          ▼
                           ┌─────────┐               ┌────────────┐
                           │  BM25   │               │VectorStore │
                           └─────────┘               └────────────┘
                           TF-IDF math               NumPy arrays
                           fit(corpus)               cosine search
```
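The chunking step above (chunk_size=512, overlap=64) can be sketched in plain Python. This is an illustrative simplification with hypothetical function names, not the exact code in `src/chunker.py`:

```python
def split_recursive(text, chunk_size, separators):
    """Split on the coarsest separator first; recurse with finer ones
    only for pieces that are still too long."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            pieces.extend(split_recursive(part, chunk_size, rest))
        elif part.strip():
            pieces.append(part)
    return pieces

def chunk_text(text, chunk_size=512, overlap=64,
               separators=("\n\n", "\n", ". ", " ")):
    """Pack the split pieces into chunks, carrying an `overlap`-character
    tail from each finished chunk into the next one."""
    chunks, current = [], ""
    for piece in split_recursive(text, chunk_size, separators):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # carry the tail as overlap
        current = (current + " " + piece).strip()
    if current:
        chunks.append(current)
    return chunks
```

The repo's deque-window approach presumably tracks the overlap without re-slicing strings; the tail slice here is just the simplest way to show the same effect.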
```
┌─────────────────────────────────────┐
│             QUERY PHASE             │
└─────────────────────────────────────┘

   User Query
       │
       ├───────────────────────────────────────────┐
       │                                           │
       ▼                                           ▼
  ┌──────────┐  query vec  ┌────────────┐     ┌─────────┐
  │embeddings│ ───────────►│VectorStore │     │  BM25   │
  └──────────┘             │ .search()  │     │get_top_n│
                           └────────────┘     └─────────┘
                                 │                 │
                           dense results     sparse results
                                 │                 │
                                 └────────┬────────┘
                                          │
                                          ▼
                                   ┌───────────┐
                                   │ retriever │
                                   │    RRF    │
                                   └───────────┘
                                   Reciprocal Rank
                                    Fusion (k=60)
                                          │
                                   top candidates
                                          │
                                          ▼
                                  ┌─────────────┐
                                  │  reranker   │
                                  │cross-encoder│
                                  └─────────────┘
                                  ms-marco-MiniLM
                                  joint attention
                                          │
                                   reranked chunks
                                          │
                                          ▼
                                   ┌───────────┐
                                   │ generator │
                                   │ raw HTTP  │
                                   └───────────┘
                                  /chat/completions
                                 source attribution
                                          │
                                          ▼
                                  Answer + Sources
```
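The fusion step in the diagram fits in a few lines. A minimal illustration of RRF over two ranked ID lists (the real `retriever.py` may carry scores and chunk metadata through as well):

```python
def rrf_fuse(dense_ids, sparse_ids, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank)
    over every ranked list the document appears in."""
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c" is third in the dense list but first in the sparse list,
# which together edges out "b" (second in both):
print(rrf_fuse(["a", "b", "c"], ["c", "b", "d"]))  # → ['c', 'b', 'a', 'd']
```

The large constant k=60 damps the advantage of any single rank-1 hit, so agreement between the dense and sparse lists matters more than one list's top position.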
| File | Role | Key Technology |
|---|---|---|
| `src/ingestion.py` | Load PDF and text files | PyPDF2 |
| `src/chunker.py` | Split text into overlapping chunks | Recursive separator algorithm |
| `src/embeddings.py` | Batch sentence embeddings | HF Transformers + mean pooling |
| `src/vectorstore.py` | Exact cosine similarity index | NumPy dot product |
| `src/bm25.py` | Sparse lexical retrieval | Okapi BM25 from scratch |
| `src/retriever.py` | Hybrid dense+sparse fusion | Reciprocal Rank Fusion |
| `src/reranker.py` | Cross-encoder re-ranking | MS-MARCO MiniLM |
| `src/generator.py` | Grounded answer generation | Raw `requests` HTTP call |
| `src/evaluation.py` | Pipeline quality metrics | Precision, Recall, MRR |
| `app.py` | Interactive demo UI | Streamlit |
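Because every stored vector is L2-normalized once at index time, cosine similarity collapses into a single dot product at query time. A minimal sketch of such an index (illustrative; the actual `src/vectorstore.py` API may differ):

```python
import numpy as np

class VectorStore:
    """Exact cosine search: L2-normalize at add time, then similarity
    is one matrix-vector dot product per query."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, embeddings):
        # Normalize rows so that dot product == cosine similarity.
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, embeddings / norms])

    def search(self, query, top_k=5):
        q = query / np.linalg.norm(query)
        sims = self.vectors @ q               # cosine scores, shape (N,)
        idx = np.argsort(-sims)[:top_k]       # highest similarity first
        return list(zip(idx.tolist(), sims[idx].tolist()))
```

Exact search is O(N·d) per query, which is perfectly fine at this corpus size; no ANN structure is needed.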
```shell
git clone https://github.com/Archit-Konde/RAG.git
cd RAG
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API key
streamlit run app.py
```

- Fork this repo
- Create a new Space (SDK: Streamlit)
- Push or link the repo — Spaces reads the YAML frontmatter above
- Enter your API key in the sidebar (no `.env` needed on Spaces)
```shell
# All tests (note: embeddings/reranker tests download models on first run)
pytest tests/ -v

# Fast tests only (no model downloads)
pytest tests/ -v --ignore=tests/test_embeddings.py --ignore=tests/test_reranker.py

# With coverage report
pytest tests/ --cov=src --cov-report=term-missing
```

Evaluated on a 25-question QA set over a 30-section HTTP/1.1 protocol corpus (~12,000 characters).
Ground truth was verified by inspecting chunk boundaries before authoring test cases.
Reproduce with: `python scripts/run_benchmark.py`
| Metric | Dense only | Sparse only | Hybrid (RRF) | Hybrid + Rerank |
|---|---|---|---|---|
| Precision@5 | 0.2240 | 0.2160 | 0.2240 | 0.2240 |
| Recall@5 | 1.0000 | 0.9600 | 1.0000 | 1.0000 |
| MRR | 0.9733 | 0.8933 | 0.9800 | 1.0000 |
Hybrid + Rerank achieves MRR = 1.0 — the cross-encoder placed the most relevant chunk at rank 1 for every query. Precision@5 is low by design: with roughly one relevant chunk per question, top-5 retrieval necessarily returns about four non-relevant chunks alongside the correct one, capping Precision@5 near 0.2.
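The three metrics are short to state precisely. A sketch with hypothetical signatures (the repo's `src/evaluation.py` may organize this differently):

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of all relevant chunks that appear in the top k."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank: 1/rank of the first relevant hit,
    averaged over queries (0 if nothing relevant is retrieved)."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)
```

With one relevant chunk per query, a perfect top-5 list scores Precision@5 = 0.2, Recall@5 = 1.0, which matches the ceiling visible in the table.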
No framework abstractions — every algorithm is implemented directly:
- `chunker.py`: Recursive separator-based splitting with a deque-window overlap
- `bm25.py`: Okapi BM25 with Robertson–Spärck Jones IDF from the formula up
- `vectorstore.py`: Cosine similarity = dot product after L2 normalization
- `retriever.py`: RRF with `score = Σ 1/(k + rank)` across dense + sparse lists
- `embeddings.py`: HF `AutoModel` + manual mean pooling (not sentence-transformers)
- `reranker.py`: Cross-encoder raw logit scoring (not softmax — ranking only needs order)
- `generator.py`: `requests.post` to `/chat/completions` — works with any OpenAI-compatible API
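As one example of the from-scratch approach, Okapi BM25 fits in a short class. This is an illustrative sketch assuming whitespace tokenization and the common Robertson–Spärck Jones IDF (with the +1 smoothing that keeps scores non-negative); `src/bm25.py` may differ in detail:

```python
import math
from collections import Counter

class BM25:
    """Okapi BM25 scoring built from the formula up."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [doc.lower().split() for doc in corpus]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency -> smoothed IDF per term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1)
                    for t, f in df.items()}

    def get_top_n(self, query, n=5):
        scores = []
        for i, tf in enumerate(self.tfs):
            s = 0.0
            for term in query.lower().split():
                if term not in tf:
                    continue
                # Saturating term frequency with length normalization.
                num = tf[term] * (self.k1 + 1)
                den = tf[term] + self.k1 * (
                    1 - self.b + self.b * self.doc_len[i] / self.avgdl)
                s += self.idf.get(term, 0.0) * num / den
            scores.append((i, s))
        return sorted(scores, key=lambda x: -x[1])[:n]
```

The k1 parameter saturates term frequency (repeating a word helps less and less) and b controls how strongly long documents are penalized.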
Learning document → `docs/LEARNING.md` — full math derivations for each algorithm, suitable for a blog post.
- Project Page — terminal-styled landing page
- Live Demo — HuggingFace Spaces
- Learning Document — full math derivations (blog post source)