This project is a hands-on exploration of how modern AI retrieval systems work internally.
Instead of jumping straight into frameworks or full RAG pipelines, this project builds up the core foundation step by step:
Text → Embeddings → Similarity Search → Retrieval
The goal is not just to "use AI tools", but to understand what actually happens behind systems like:
- ChatGPT Retrieval
- RAG Pipelines
- AI Search Engines
- Vector Databases
- AI Document Search Systems
We are building a small semantic retrieval engine.
Traditional search systems work using exact keyword matching.
Example:
Query: "CEO of OpenAI"
A document matches only if it contains those exact words.
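A minimal sketch of strict keyword matching, assuming a simple lowercase word tokenizer. The helpers `tokenize` and `keyword_match` are hypothetical illustrations, not part of this project. Every query word must appear in the document, so a paraphrase finds nothing:

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase the text and keep only alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(query: str, document: str) -> bool:
    # Hypothetical strict keyword search: every query word must appear in the document.
    return tokenize(query) <= tokenize(document)


doc = "The CEO of the company is Sam Altman."
print(keyword_match("CEO of the company", doc))  # all query words appear
print(keyword_match("Who runs OpenAI?", doc))    # "who", "runs", "openai" are missing
```

The second query asks the same question as the first, but shares no words with the document, so exact matching cannot connect them.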
Semantic search works differently.
It tries to capture the meaning of the text.
Example:
"Who runs OpenAI?"
can still retrieve:
"The CEO of the company is Sam Altman."
even though the words are different.
This is the core idea behind modern AI retrieval systems.
An embedding model converts text into dense numerical vectors called embeddings.
Example:
"The CEO of OpenAI"
↓
[0.12, -0.44, 0.91, ...]
These vectors capture semantic meaning rather than exact wording.
Texts with similar meanings produce vectors that lie closer together in vector space.
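To make "closer together" concrete, here is a sketch with hand-made 3-dimensional vectors. The numbers are invented for illustration and were not produced by any model; a real model such as all-MiniLM-L6-v2 outputs 384-dimensional vectors.

```python
import math

# Hypothetical toy "embeddings": values are invented, not produced by any model.
embeddings = {
    "Who runs OpenAI?": [0.9, 0.1, 0.2],
    "The CEO of the company is Sam Altman.": [0.8, 0.2, 0.3],
    "Bananas are rich in potassium.": [0.1, 0.9, -0.4],
}

query = embeddings["Who runs OpenAI?"]
for text, vec in embeddings.items():
    # Euclidean distance: a smaller value means the two vectors are closer in space.
    print(f"{math.dist(query, vec):.3f}  {text}")
```

With these toy values, the CEO sentence sits much closer to the query than the unrelated sentence about bananas.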
We use:
SentenceTransformer('all-MiniLM-L6-v2')
This loads a pretrained embedding model optimized for semantic similarity tasks.
Its job is to transform text into embeddings.
Instead of:
- keyword search
- exact matching
we perform:
- meaning-based retrieval
This allows related sentences to be retrieved even when the wording changes.
After converting text into vectors, we compare them mathematically using cosine similarity.
A higher cosine similarity score means:
- the vectors point in more similar directions
- the meanings are more similar
Example:
0.92 → highly similar
0.15 → weak similarity
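Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). A from-scratch sketch on hypothetical toy vectors (invented values, not real embeddings):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented for illustration.
query = [0.9, 0.1, 0.2]
related = [0.8, 0.2, 0.3]
unrelated = [0.1, 0.9, -0.4]

print(round(cosine_similarity(query, related), 4))    # high score
print(round(cosine_similarity(query, unrelated), 4))  # low score
```

scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity for whole matrices of embeddings at once, which is presumably why scikit-learn appears in this project's dependencies.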
Instead of retrieving only one result, we retrieve the Top-K most relevant sentences.
Example:
top_k = 2
This mirrors how real retrieval systems select context before passing it to an LLM.
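Selecting the Top-K results is a matter of keeping the K highest scores. A sketch using `heapq.nlargest` from the standard library; the `top_k` helper, the scores, and the extra sentences are hypothetical examples:

```python
import heapq


def top_k(scores: list[float], sentences: list[str], k: int = 2) -> list[tuple[float, str]]:
    # Return the k highest-scoring (score, sentence) pairs, best first.
    return heapq.nlargest(k, zip(scores, sentences), key=lambda pair: pair[0])


# Hypothetical similarity scores for one query against four stored sentences.
sentences = [
    "The CEO of the company is Sam Altman.",
    "The headquarters of the company is located in San Francisco.",
    "OpenAI develops AI models.",
    "Bananas are rich in potassium.",
]
scores = [0.81, 0.32, 0.55, 0.05]

for score, sentence in top_k(scores, sentences, k=2):
    print(f"{score:.2f}  {sentence}")
```

`heapq.nlargest` avoids fully sorting the list, which matters once there are many stored sentences and K is small.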
The system now supports multiple queries.
For each query:
- Generate query embedding
- Compare against stored sentence embeddings
- Rank by similarity
- Retrieve Top-K matches
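The four steps above can be sketched as one loop over the queries. To keep the example self-contained, it swaps the real model for a hypothetical `toy_embed` function (word counts over a tiny vocabulary). That stand-in is lexical, not semantic; in the real project, `model.encode` from sentence-transformers plays this role. The sentences and queries are toy data.

```python
import math
import re


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(queries, sentences, embed, top_k=2):
    # Embed the stored sentences once, up front.
    sentence_vecs = [embed(s) for s in sentences]
    results = {}
    for query in queries:
        query_vec = embed(query)  # 1. generate the query embedding
        # 2. compare against every stored sentence embedding
        scored = [(cosine(query_vec, v), s) for v, s in zip(sentence_vecs, sentences)]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # 3. rank by similarity
        results[query] = scored[:top_k]  # 4. keep the Top-K matches
    return results


# Hypothetical stand-in embedder: word counts over a tiny fixed vocabulary.
# It is lexical, NOT semantic; the real project would call model.encode instead.
VOCAB = ["ceo", "headquarters", "san", "francisco", "company", "goal"]


def toy_embed(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


sentences = [
    "The CEO of the company is Sam Altman.",
    "The headquarters of the company is located in San Francisco.",
    "The company's goal is to build safe AI.",
]
queries = ["Who is the CEO?", "Where are the headquarters?"]

for query, matches in retrieve(queries, sentences, toy_embed, top_k=1).items():
    print("Query:", query)
    for score, sentence in matches:
        print(f"  {score:.4f}  {sentence}")
```

Passing the embedder in as a parameter keeps the ranking logic independent of which model produces the vectors.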
One important observation:
Semantic similarity ≠ factual understanding
Example:
A query about the CEO may sometimes retrieve:
- company-related information
- organization-related information
instead of the exact factual sentence.
Why?
Because embeddings capture semantic closeness, not strict factual reasoning.
This is one of the major challenges in real-world AI retrieval systems.
Production systems improve retrieval using:
- Better embedding models
- Re-ranking models
- Hybrid search
- Metadata filtering
- Vector databases
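Of the techniques above, hybrid search is the easiest to sketch: blend a semantic score with a keyword score. Below is one simple form, a weighted sum, assuming hypothetical semantic scores and a hypothetical weight `alpha = 0.7`. Production systems often use other fusion methods (for example, BM25 combined with reciprocal rank fusion). In this toy case, semantic similarity alone prefers the general company sentence, echoing the CEO example above, while the keyword signal pulls the factual sentence back to the top.

```python
import re


def keyword_score(query: str, document: str) -> float:
    # Fraction of the query's words that also appear in the document.
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    d = set(re.findall(r"[a-z0-9]+", document.lower()))
    return len(q & d) / len(q) if q else 0.0


def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    # Weighted blend; alpha is a hypothetical tuning knob (1.0 = pure semantic).
    return alpha * semantic + (1 - alpha) * keyword


query = "CEO of OpenAI"
# Hypothetical semantic (cosine) scores, invented for illustration.
docs = {
    "The CEO of OpenAI is Sam Altman.": 0.70,
    "OpenAI is an AI research company.": 0.72,
}

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(docs[d], keyword_score(query, d)),
    reverse=True,
)
print(ranked[0])
```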
This project focuses on understanding the foundation first.
User Query
↓
Embedding Model
↓
Query Vector
↓
Cosine Similarity Search
↓
Top-K Retrieval
↓
Relevant Sentences
This is already the core retrieval backbone behind:
- RAG systems
- AI search
- semantic document retrieval
- vector database search
Tech stack:
- Python
- sentence-transformers
- scikit-learn
Install dependencies:
pip install sentence-transformers scikit-learn
Embedding model: all-MiniLM-L6-v2
A lightweight and fast sentence-transformer model for semantic similarity tasks.
Example queries:
multiple_queries = [
    "Can you tell me about the CEO of OpenAI?",
    "Where is the company headquartered?",
    "Which company's headquarters are in San Francisco?",
    "The main goal of OpenAI?"
]
Example output:
Query: Where is the company headquartered?
Relevant Sentence:
The headquarters of the company is located in San Francisco.
Similarity Score: 0.7475
This project will gradually evolve into a complete mini-RAG pipeline.
Next concepts:
- Chunking
- Vector Databases
- FAISS / ChromaDB
- Storing embeddings
- Retrieval from documents
- Context injection into LLMs
- Full RAG pipeline
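Chunking, the first item on this roadmap, usually means splitting long documents into smaller, overlapping pieces before embedding them. A minimal word-based sketch, assuming a hypothetical `chunk_words` helper with arbitrary default sizes; real pipelines often split by tokens, sentences, or paragraphs instead.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    # Split text into windows of chunk_size words; consecutive windows share `overlap` words.
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(text, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary from being cut off from its context in every chunk.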
The focus of this project is not just building features.
The focus is understanding:
- how retrieval actually works
- why embeddings matter
- how semantic search differs from traditional search
- why vector databases exist
- how modern RAG systems are built internally