Retrieval-Augmented Generation built entirely by hand. No LangChain. No LlamaIndex. Every algorithm implemented from first principles.
```
┌─────────────────────────────────────┐
│            INDEXING PHASE           │
└─────────────────────────────────────┘

  PDF / TXT
      │
      ▼
┌──────────┐   raw text   ┌───────────┐    chunks    ┌─────────────┐
│ingestion │ ────────────►│  chunker  │ ────────────►│ embeddings  │
└──────────┘              └───────────┘              └─────────────┘
  PyPDF2                  recursive split            all-MiniLM-L6
                          chunk_size=512             mean pooling
                          overlap=64                 L2 normalize
                                │                          │
                                │   chunks + embeddings    │
                                ▼                          ▼
                           ┌─────────┐               ┌────────────┐
                           │  BM25   │               │VectorStore │
                           └─────────┘               └────────────┘
                           TF-IDF math               NumPy arrays
                           fit(corpus)               cosine search
```
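The chunking step above (chunk_size=512, overlap=64) can be sketched in plain Python. This is an illustrative simplification with hypothetical function names, not the exact code in `src/chunker.py`:

```python
def split_recursive(text, chunk_size, separators):
    """Split on the coarsest separator first; recurse with finer ones
    only for pieces that are still too long."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            pieces.extend(split_recursive(part, chunk_size, rest))
        elif part.strip():
            pieces.append(part)
    return pieces

def chunk_text(text, chunk_size=512, overlap=64,
               separators=("\n\n", "\n", ". ", " ")):
    """Pack the split pieces into chunks, carrying an `overlap`-character
    tail from each finished chunk into the next one."""
    chunks, current = [], ""
    for piece in split_recursive(text, chunk_size, separators):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # carry the tail as overlap
        current = (current + " " + piece).strip()
    if current:
        chunks.append(current)
    return chunks
```

The repo's deque-window approach presumably tracks the overlap without re-slicing strings; the tail slice here is just the simplest way to show the same effect.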
```
┌─────────────────────────────────────┐
│             QUERY PHASE             │
└─────────────────────────────────────┘

   User Query
       │
       ├───────────────────────────────────────────┐
       │                                           │
       ▼                                           ▼
  ┌──────────┐  query vec  ┌────────────┐     ┌─────────┐
  │embeddings│ ───────────►│VectorStore │     │  BM25   │
  └──────────┘             │ .search()  │     │get_top_n│
                           └────────────┘     └─────────┘
                                 │                 │
                           dense results     sparse results
                                 │                 │
                                 └────────┬────────┘
                                          │
                                          ▼
                                   ┌───────────┐
                                   │ retriever │
                                   │    RRF    │
                                   └───────────┘
                                   Reciprocal Rank
                                    Fusion (k=60)
                                          │
                                   top candidates
                                          │
                                          ▼
                                  ┌─────────────┐
                                  │  reranker   │
                                  │cross-encoder│
                                  └─────────────┘
                                  ms-marco-MiniLM
                                  joint attention
                                          │
                                   reranked chunks
                                          │
                                          ▼
                                   ┌───────────┐
                                   │ generator │
                                   │ raw HTTP  │
                                   └───────────┘
                                  /chat/completions
                                 source attribution
                                          │
                                          ▼
                                  Answer + Sources
```
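The fusion step in the diagram fits in a few lines. A minimal illustration of RRF over two ranked ID lists (the real `retriever.py` may carry scores and chunk metadata through as well):

```python
def rrf_fuse(dense_ids, sparse_ids, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank)
    over every ranked list the document appears in."""
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c" is third in the dense list but first in the sparse list,
# which together edges out "b" (second in both):
print(rrf_fuse(["a", "b", "c"], ["c", "b", "d"]))  # → ['c', 'b', 'a', 'd']
```

The large constant k=60 damps the advantage of any single rank-1 hit, so agreement between the dense and sparse lists matters more than one list's top position.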
| File | Role | Key Technology |
|---|---|---|
| `src/ingestion.py` | Load PDF and text files | PyPDF2 |
| `src/chunker.py` | Split text into overlapping chunks | Recursive separator algorithm |
| `src/embeddings.py` | Batch sentence embeddings | HF Transformers + mean pooling |
| `src/vectorstore.py` | Exact cosine similarity index | NumPy dot product |
| `src/bm25.py` | Sparse lexical retrieval | Okapi BM25 from scratch |
| `src/retriever.py` | Hybrid dense+sparse fusion | Reciprocal Rank Fusion |
| `src/reranker.py` | Cross-encoder re-ranking | MS-MARCO MiniLM |
| `src/generator.py` | Grounded answer generation | Raw `requests` HTTP call |
| `src/evaluation.py` | Pipeline quality metrics | Precision, Recall, MRR |
| `app.py` | Interactive demo UI | Streamlit |
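Because every stored vector is L2-normalized once at index time, cosine similarity collapses into a single dot product at query time. A minimal sketch of such an index (illustrative; the actual `src/vectorstore.py` API may differ):

```python
import numpy as np

class VectorStore:
    """Exact cosine search: L2-normalize at add time, then similarity
    is one matrix-vector dot product per query."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, embeddings):
        # Normalize rows so that dot product == cosine similarity.
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, embeddings / norms])

    def search(self, query, top_k=5):
        q = query / np.linalg.norm(query)
        sims = self.vectors @ q               # cosine scores, shape (N,)
        idx = np.argsort(-sims)[:top_k]       # highest similarity first
        return list(zip(idx.tolist(), sims[idx].tolist()))
```

Exact search is O(N·d) per query, which is perfectly fine at this corpus size; no ANN structure is needed.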
```shell
git clone https://github.com/Archit-Konde/RAG.git
cd RAG
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API key
streamlit run app.py
```

- Fork this repo
- Create a new Space (SDK: Streamlit)
- Push or link the repo — Spaces reads the YAML frontmatter above
- Enter your API key in the sidebar (no `.env` needed on Spaces)
```shell
# All tests (note: embeddings/reranker tests download models on first run)
pytest tests/ -v

# Fast tests only (no model downloads)
pytest tests/ -v --ignore=tests/test_embeddings.py --ignore=tests/test_reranker.py

# With coverage report
pytest tests/ --cov=src --cov-report=term-missing
```

Evaluated on a 25-question QA set over a 30-section HTTP/1.1 protocol corpus (~12,000 characters).
Ground truth was verified by inspecting chunk boundaries before authoring test cases.
Reproduce with: `python scripts/run_benchmark.py`
| Metric | Dense only | Sparse only | Hybrid (RRF) | Hybrid + Rerank |
|---|---|---|---|---|
| Precision@5 | 0.2240 | 0.2160 | 0.2240 | 0.2240 |
| Recall@5 | 1.0000 | 0.9600 | 1.0000 | 1.0000 |
| MRR | 0.9733 | 0.8933 | 0.9800 | 1.0000 |
Hybrid + Rerank achieves MRR = 1.0 — the cross-encoder placed the most relevant chunk at rank 1 for every query. Precision@5 is low by design: with roughly one relevant chunk per question, top-5 retrieval necessarily returns about four non-relevant chunks alongside the correct one, capping Precision@5 near 0.2.
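The three metrics are short to state precisely. A sketch with hypothetical signatures (the repo's `src/evaluation.py` may organize this differently):

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of all relevant chunks that appear in the top k."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank: 1/rank of the first relevant hit,
    averaged over queries (0 if nothing relevant is retrieved)."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)
```

With one relevant chunk per query, a perfect top-5 list scores Precision@5 = 0.2, Recall@5 = 1.0, which matches the ceiling visible in the table.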
No framework abstractions — every algorithm is implemented directly:
- `chunker.py`: Recursive separator-based splitting with a deque-window overlap
- `bm25.py`: Okapi BM25 with Robertson–Spärck Jones IDF from the formula up
- `vectorstore.py`: Cosine similarity = dot product after L2 normalization
- `retriever.py`: RRF with `score = Σ 1/(k + rank)` across dense + sparse lists
- `embeddings.py`: HF `AutoModel` + manual mean pooling (not sentence-transformers)
- `reranker.py`: Cross-encoder raw logit scoring (not softmax — ranking only needs order)
- `generator.py`: `requests.post` to `/chat/completions` — works with any OpenAI-compatible API
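As one example of the from-scratch approach, Okapi BM25 fits in a short class. This is an illustrative sketch assuming whitespace tokenization and the common Robertson–Spärck Jones IDF (with the +1 smoothing that keeps scores non-negative); `src/bm25.py` may differ in detail:

```python
import math
from collections import Counter

class BM25:
    """Okapi BM25 scoring built from the formula up."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [doc.lower().split() for doc in corpus]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency -> smoothed IDF per term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1)
                    for t, f in df.items()}

    def get_top_n(self, query, n=5):
        scores = []
        for i, tf in enumerate(self.tfs):
            s = 0.0
            for term in query.lower().split():
                if term not in tf:
                    continue
                # Saturating term frequency with length normalization.
                num = tf[term] * (self.k1 + 1)
                den = tf[term] + self.k1 * (
                    1 - self.b + self.b * self.doc_len[i] / self.avgdl)
                s += self.idf.get(term, 0.0) * num / den
            scores.append((i, s))
        return sorted(scores, key=lambda x: -x[1])[:n]
```

The k1 parameter saturates term frequency (repeating a word helps less and less) and b controls how strongly long documents are penalized.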
Learning document → `docs/LEARNING.md` — full math derivations for each algorithm, suitable for a blog post.
- Project Page — terminal-styled landing page
- Live Demo — HuggingFace Spaces
- Learning Document — full math derivations (blog post source)