Mini RAG - Semantic Search Fundamentals

This project is a hands-on exploration of how modern AI retrieval systems work internally.

Instead of jumping straight into frameworks or full RAG pipelines, this project builds up the core foundation step by step:

Text → Embeddings → Similarity Search → Retrieval

The goal is not just to "use AI tools", but to understand what actually happens behind systems like:

ChatGPT Retrieval
RAG Pipelines
AI Search Engines
Vector Databases
AI Document Search Systems

What We Are Building

We are building a small semantic retrieval engine.

Traditional search systems rely on exact keyword matching.

Example:

Query: "CEO of OpenAI"

A document matches only if it contains those exact words.

Semantic search works differently.

It compares texts by their meaning rather than by their exact words.

Example:

"Who runs OpenAI?"

can still retrieve:

"The CEO of the company is Sam Altman."

even though the words are different.

This is the core idea behind modern AI retrieval systems.
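
To make the contrast concrete, here is a minimal sketch of a naive keyword matcher. The helpers `tokenize` and `keyword_match` are illustrative names, not part of this project, and the first test query is invented for the example. Real keyword engines are more sophisticated, but the limitation shown is the same in spirit: the question that semantic search can answer shares no usable wording with the sentence.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_match(query, document):
    """Return True only if every query word appears in the document."""
    document_words = set(tokenize(document))
    return all(word in document_words for word in tokenize(query))


document = "The CEO of the company is Sam Altman."

print(keyword_match("CEO of the company", document))  # True: every word is present
print(keyword_match("Who runs OpenAI?", document))    # False: the wording differs
```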

Concepts Covered So Far

1. Embeddings

The model converts text into dense numerical vectors called embeddings.

Example:

"The CEO of OpenAI"
      ↓
[0.12, -0.44, 0.91, ...]

These vectors capture semantic meaning instead of exact words.

Texts with similar meaning produce vectors that are closer together in vector space.

2. Sentence Transformers

We use:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

This is a pretrained embedding model optimized for semantic similarity tasks.

Its job is to transform text into embeddings.
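
A minimal sketch of that step, assuming the `sentence-transformers` package is installed. The helper name `embed_texts` and its optional `model` parameter are illustrative choices, not something this project defines; `SentenceTransformer(...)` and `.encode(...)` are the library's own calls.

```python
def embed_texts(texts, model=None):
    """Return one embedding vector per input text.

    `model` can be any object with an `encode(list_of_texts)` method.
    If omitted, the pretrained all-MiniLM-L6-v2 model is loaded.
    """
    if model is None:
        # Imported lazily so this module loads even without the dependency.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts)


# Usage with the real model (weights are downloaded on first run):
#   vectors = embed_texts(["The CEO of OpenAI", "Who runs OpenAI?"])
#   print(vectors.shape)  # (2, 384): this model produces 384-dimensional vectors
```

Loading the model is the slow part, so in practice it is worth creating it once and passing it in rather than reloading it on every call.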

3. Semantic Search

Instead of:

keyword search
exact matching

we perform:

meaning-based retrieval

This allows related sentences to be retrieved even when the wording changes.

4. Cosine Similarity

After converting text into vectors, we compare them mathematically using cosine similarity.

A higher cosine similarity score means:

the vectors point in more similar directions
the texts are likely to be closer in meaning

Example:

0.92 → highly similar
0.15 → weak similarity
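
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1 and ignores vector magnitude. The sketch below implements it in plain Python for clarity. The three-dimensional vectors are invented toy values (real embeddings have hundreds of dimensions), and returning 0.0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]

print(round(cosine_similarity(query, close), 2))  # 0.98
print(round(cosine_similarity(query, far), 2))    # 0.01
```

scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity for whole matrices of vectors at once.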

5. Top-K Retrieval

Instead of retrieving only one result, we retrieve the Top-K most relevant sentences.

Example:

top_k = 2

Real retrieval systems do the same: they select the Top-K results and pass them to the LLM as context.
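
Selecting the Top-K results is just a matter of ranking the similarity scores. A minimal sketch, using the standard library's `heapq.nlargest`; the function name `top_k_indices` and the score values are hypothetical.

```python
import heapq


def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


# Hypothetical similarity scores for four stored sentences.
scores = [0.15, 0.92, 0.40, 0.75]
top_k = 2

print(top_k_indices(scores, top_k))  # [1, 3]
```

Returning indices rather than scores makes it easy to look up the matching sentences afterwards.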

6. Multi-Query Retrieval

The system now supports multiple queries.

For each query:

Generate query embedding
Compare against stored sentence embeddings
Rank by similarity
Retrieve Top-K matches
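
The four steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the project's `main.py`: `retrieve` and `toy_embed` are hypothetical names, and `toy_embed` is a stand-in that counts three hand-picked keywords so the example runs without downloading a model. It is not a semantic embedder.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(queries, sentences, embed, top_k=2):
    """For each query, return the top_k (sentence, score) pairs, best first.

    `embed` is any callable that maps a list of texts to a list of vectors.
    """
    # Stored sentences are embedded once and reused for every query.
    sentence_vectors = embed(sentences)
    results = {}
    for query, query_vector in zip(queries, embed(queries)):
        scored = [
            (sentence, cosine_similarity(query_vector, vector))
            for sentence, vector in zip(sentences, sentence_vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[query] = scored[:top_k]
    return results


def toy_embed(texts):
    """Stand-in embedder (NOT a real model): counts of three keywords."""
    keywords = ("ceo", "headquarter", "goal")
    return [[text.lower().count(word) for word in keywords] for text in texts]


sentences = [
    "The CEO of the company is Sam Altman.",
    "The headquarters of the company is located in San Francisco.",
]
results = retrieve(["Where is the company headquartered?"], sentences, toy_embed, top_k=1)
for query, matches in results.items():
    print(query, "->", matches)
```

To use a real model, pass the `encode` method of a loaded `SentenceTransformer` as `embed`; the rest of the loop stays the same.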

Real World Insight

One important observation:

Semantic similarity ≠ factual understanding

Example:

A query about the CEO may sometimes retrieve:

company-related information
organization-related information

instead of the exact factual sentence.

Why?

Because embeddings capture semantic closeness, not strict factual reasoning.

This is one of the major challenges in real-world AI retrieval systems.

Why Modern RAG Systems Are More Advanced

Production systems improve retrieval using:

Better embedding models
Re-ranking models
Hybrid search
Metadata filtering
Vector databases

This project focuses on understanding the foundation first.

Current Retrieval Flow

User Query
    ↓
Embedding Model
    ↓
Query Vector
    ↓
Cosine Similarity Search
    ↓
Top-K Retrieval
    ↓
Relevant Sentences

This is already the core retrieval backbone behind:

RAG systems
AI search
semantic document retrieval
vector database search
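
At larger scale, the "Cosine Similarity Search" stage is usually done as one matrix operation rather than a Python loop. The sketch below shows that idea with NumPy, which is an assumption here (it is not listed under this project's technologies, although both listed libraries depend on it). The function name `cosine_search` and the two-dimensional vectors are invented for illustration, and the code assumes no stored vector is all zeros.

```python
import numpy as np


def cosine_search(query_vector, sentence_vectors, top_k=2):
    """Return (index, score) pairs for the top_k rows most similar to the query."""
    matrix = np.asarray(sentence_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # After L2-normalisation, a plain dot product equals cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = matrix @ query
    # argsort is ascending, so reverse it and keep the first top_k indices.
    best = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in best]


# Toy vectors, invented for illustration only.
stored = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(cosine_search([1.0, 0.0], stored, top_k=2))
```

Vector databases apply the same principle, typically with precomputed normalised vectors and approximate indexes so they do not have to score every stored row.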

Technologies Used

Python
sentence-transformers
scikit-learn

Installation

Install dependencies:

pip install sentence-transformers scikit-learn

Model Used

all-MiniLM-L6-v2

A lightweight and fast sentence-transformer model for semantic similarity tasks.

Example Queries

multiple_queries = [
    "Can you tell me about the CEO of OpenAI?",
    "Where is the company headquartered?",
    "Which company's headquarters are in San Francisco?",
    "The main goal of OpenAI?"
]

Example Output

Query: Where is the company headquartered?

Relevant Sentence:
The headquarters of the company is located in San Francisco.

Similarity Score: 0.7475

What Comes Next

This project will gradually evolve into a complete mini-RAG pipeline.

Next concepts:

Chunking
Vector Databases
FAISS / ChromaDB
Storing embeddings
Retrieval from documents
Context injection into LLMs
Full RAG pipeline

Main Learning Goal

The focus of this project is not just building features.

The focus is understanding:

how retrieval actually works
why embeddings matter
how semantic search differs from traditional search
why vector databases exist
how modern RAG systems are built internally
