This repository is a hands-on RAG system built from scratch — every concept explained, every line of code written and understood.
RAG = Retrieval-Augmented Generation
It means: instead of asking an LLM to remember everything, we give it the ability to search a knowledge base and answer from real retrieved documents.
```
User Query
    ↓
Embed query into vector
    ↓
Search vector database for similar documents
    ↓
Pass retrieved documents + query to LLM
    ↓
LLM generates grounded, accurate answer
```
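The steps above can be sketched in a few lines of Python. This is a minimal toy, not the repo's implementation: `embed` here is a bag-of-words stand-in for a real model like SentenceTransformer, and the final LLM call is replaced by just building the prompt string.

```python
import re

def embed(text):
    """Toy 'embedding': a set of lowercase word tokens.
    A real pipeline would use a dense model like SentenceTransformer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two binary bag-of-words vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** 0.5 * len(b) ** 0.5)

def retrieve(query, docs, top_k=1):
    """Embed the query and rank documents by similarity (steps 2-3)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "Jaws is a 1975 thriller directed by Steven Spielberg.",
    "Kota Factory is an Indian series about engineering students.",
]
query = "Which movie did Steven Spielberg direct?"
context = retrieve(query, docs)[0]

# Step 4: the retrieved document plus the query become the LLM prompt
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping the toy `embed` for a real embedding model and sending `prompt` to an LLM gives you the full loop — exactly what notebooks 01 through 05 build up to.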
```
Cortex-RAG/
├── 01_Embeddings_Basics_to_Advanced.ipynb   ← Start here
├── 02_Netflix_Semantic_Search_Pipeline.ipynb ← Full pipeline
├── 03_Vector_Databases_Chroma.ipynb          ← Coming soon
├── 04_LLM_Response_Generation.ipynb          ← Coming soon
├── 05_Complete_RAG_System.ipynb              ← Final project
├── Netflix_Dataset.csv
├── My_New_Netflix_Dataset.csv
└── README.md
```
What you learn:
| Concept | Description |
|---|---|
| One-Hot Encoding | Why it fails to capture meaning |
| Embedding Matrix | How dense vectors solve the problem |
| Cosine Similarity | Measure meaning — not just spelling |
| SentenceTransformer | State-of-the-art pretrained embeddings |
| Semantic Search | Ask anything — find meaning not keywords |
| Save to CSV | Persist embeddings for reuse |
Key insight: King and Queen are close in embedding space. King and Pizza are far apart. That is how machines understand language.
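A quick way to see this geometry with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions; these values are purely illustrative, not real model output):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors — not real model output
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.8, 0.9, 0.1])
pizza = np.array([0.1, 0.0, 0.9])

print(cosine(king, queen))  # close to 1: near in meaning
print(cosine(king, pizza))  # close to 0: far apart
```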
Full end-to-end pipeline on real data:
EDA Dashboard — 6 Charts:
| Chart | What it shows |
|---|---|
| Pie chart | Movies vs TV Shows split |
| Horizontal bar | Top 10 content-producing countries |
| Genre bars | Most popular genres on Netflix |
| Line chart | Content release growth over years |
| Bar chart | Audience rating distribution |
| Dual line | Movies vs TV Shows trend over time |
Semantic Search Results:
| Query | Top Result |
|---|---|
| "Romantic movies" | Ankahi Kahaniya — love stories |
| "Action movies" | Prey — survival thriller |
| "Steven Spielberg" | Jaws — exact director match |
| "Comedy movies" | Relevant comedy titles |
| "Indian content" | Kota Factory, Indian productions |
Custom vs scikit-learn cosine similarity: both give identical results. Writing it by hand teaches the math; scikit-learn's vectorized implementation runs faster at scale.
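That equivalence is easy to check on random vectors (assuming scikit-learn is installed; the array shapes are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def custom_cosine(a, b):
    """Cosine similarity from first principles: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
doc_embs = rng.normal(size=(5, 384))   # five fake description embeddings
query = rng.normal(size=384)

custom = np.array([custom_cosine(query, d) for d in doc_embs])
library = cosine_similarity(query.reshape(1, -1), doc_embs)[0]

print(np.allclose(custom, library))  # True, up to floating-point error
```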
Week 1 — Notebook 01 + 02 (Embeddings + Semantic Search)
Week 2 — Notebook 03 (Chroma vector database)
Week 3 — Notebook 04 (HuggingFace LLM response)
Week 4 — Notebook 05 (Complete RAG pipeline)
This repo builds on top of foundational ML work done in:
ML with Scikit-Learn — github.com/ather-ops/ML-with-Scikit-Learn
That repo covers the complete classical ML pipeline:
- End-to-end pipelines for classification and regression
- Feature engineering, EDA, model evaluation
- ROC curves, AUC, confusion matrix, threshold tuning
- Production-ready code patterns
Cortex-RAG is the next level — moving from classical ML into modern AI with embeddings and language models.
```bash
pip install pandas numpy matplotlib seaborn
pip install sentence-transformers scikit-learn
pip install chromadb transformers   # for notebooks 03-05
```