# **Understanding the RAG Architecture**

In [1]:
# Data Ingestion Pipeline

import os
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma
from dotenv import load_dotenv
from langchain_ollama import OllamaEmbeddings

load_dotenv()


False

In [2]:
def load_documents(directory_path="data/txt_docs"):
    """Load all the text documents from the specified directory."""
    print("Loading documents from directory:", directory_path)

    # Check that the directory exists before trying to load from it
    if not os.path.exists(directory_path):
        raise FileNotFoundError(f"The directory '{directory_path}' does not exist. Please create it and add your documents.")
    
    # Load documents using DirectoryLoader
    loader = DirectoryLoader(directory_path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs={"encoding": "utf-8"})
    documents = loader.load()

    if len(documents) == 0:
        raise ValueError(f"No text documents found in the directory '{directory_path}'. Please add some .txt files.")
    
    for i, doc in enumerate(documents):
        print(f"\nDoc {i+1}:")
        print(f"Document {i+1}: {doc.metadata['source']} (Length: {len(doc.page_content)} characters)")
        print(f"Content Preview: {doc.page_content[:200]}...\n")  # Print the first 200 characters of the document
        print("Metadata:", doc.metadata)

    return documents

In [4]:
# 2. Split Documents into Chunks

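def split_documents(documents, chunk_size=1000, chunk_overlap=200):
    """Split documents into smaller chunks, optionally overlapping by chunk_overlap characters."""
    print(f"Splitting {len(documents)} documents into chunks (chunk_size={chunk_size}, chunk_overlap={chunk_overlap})")

    # CharacterTextSplitter splits on a single separator ("\n\n" by default), so a
    # paragraph longer than chunk_size is kept whole and triggers a size warning.
    text_splitter = CharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )

    chunks = text_splitter.split_documents(documents)

    # Preview only the first 5 chunks to keep the output readable
    for i, chunk in enumerate(chunks[:5]):
        print(f"\n--- Chunk {i+1} ---")
        print(f"Source: {chunk.metadata['source']}")
        print(f"Chunk Length: {len(chunk.page_content)} characters")
        print(f"Chunk Content Preview: {chunk.page_content[:200]}...")
        print("-" * 50)

    if len(chunks) > 5:
        print(f"\n...and {len(chunks) - 5} more chunks.")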
    
    return chunks
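
To make the idea of chunk size and overlap concrete, here is a minimal sketch in plain Python. The `chunk_text` helper is hypothetical and is not how `CharacterTextSplitter` works internally (that class splits on a separator and merges the pieces); it only illustrates fixed-size windows where each chunk repeats the last `chunk_overlap` characters of the previous one.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: cut text into fixed-size character windows with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
print([len(p) for p in pieces])          # [12, 12, 12, 6]
print(pieces[0][-4:] == pieces[1][:4])   # True: neighbouring chunks share 4 characters
```

If hard size limits matter, LangChain's `RecursiveCharacterTextSplitter` is usually a better fit than `CharacterTextSplitter`, because it falls back to smaller separators when a piece is still too long.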

In [3]:
# 3. Create Vector Store with Chroma

def create_vector_store(chunks, collection_name="text_files"):
    """Create and persist chromaDB vector store."""

    embedding_model = OllamaEmbeddings(model="nomic-embed-text")

    # Create Chroma vector store
    print("--- Creating Chroma Vector Store ---")

    vector_store = Chroma.from_documents(
        documents = chunks,
        embedding = embedding_model,
        persist_directory="/chroma_db",
        collection_name=collection_name,
        collection_metadata={"hnsw:space": "cosine"}
    )
    print("--- Finished creating Chroma Vector Store ---")
    print("Vector store created and saved to /chroma_db directory.")

    return vector_store

What is HNSW?

HNSW stands for Hierarchical Navigable Small World. It is:

* A fast approximate nearest neighbor (ANN) search algorithm
* Used internally by Chroma for similarity search

In [5]:
def main():
    # 1. Load the files
    docs = load_documents(directory_path="../data/txt_docs")
    
    # 2. Split the documents into chunks
    chunks = split_documents(docs, chunk_overlap=0)

    # 3. Create and persist the vector store
    create_vector_store(chunks, collection_name="text_files")
    
main()

Loading documents from directory: ../data/txt_docs


Created a chunk of size 1055, which is longer than the specified 1000
Created a chunk of size 1436, which is longer than the specified 1000
Created a chunk of size 1039, which is longer than the specified 1000
Created a chunk of size 1078, which is longer than the specified 1000
Created a chunk of size 1043, which is longer than the specified 1000
Created a chunk of size 1019, which is longer than the specified 1000
Created a chunk of size 1068, which is longer than the specified 1000
Created a chunk of size 1211, which is longer than the specified 1000
Created a chunk of size 1450, which is longer than the specified 1000
Created a chunk of size 1762, which is longer than the specified 1000
Created a chunk of size 1038, which is longer than the specified 1000
Created a chunk of size 1120, which is longer than the specified 1000
Created a chunk of size 1076, which is longer than the specified 1000
Created a chunk of size 1090, which is longer than the specified 1000
Created a chunk of s


Doc 1:
Document 1: ..\data\txt_docs\Google.txt (Length: 232201 characters)
Content Preview: ﻿Google
Google LLC (/ˈɡuːɡəl/ ⓘ , GOO-gəl) is an Google LLC
American multinational corporation and technology
company focusing on online advertising, search engine
technology, cloud computing, compute...

Metadata: {'source': '..\\data\\txt_docs\\Google.txt'}

Doc 2:
Document 2: ..\data\txt_docs\Microsoft.txt (Length: 201014 characters)
Content Preview: ﻿Microsoft
Microsoft Corporation is an American multinational Microsoft Corporation
corporation and technology conglomerate
headquartered in Redmond, Washington.[2] Founded
in 1975, the company became...

Metadata: {'source': '..\\data\\txt_docs\\Microsoft.txt'}

Doc 3:
Document 3: ..\data\txt_docs\Nvidia.txt (Length: 148417 characters)
Content Preview: ﻿Nvidia
Nvidia Corporation[a] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an Nvidia Corporation
American technology company headquartered in
Santa Clara, California. Founded in 1993 by Jensen
Huang (president an

## **Retrieval Pipeline**

In [8]:
# Retrieval Pipeline

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

embedding_model = OllamaEmbeddings(model="nomic-embed-text")

vector_store = Chroma(
    embedding_function=embedding_model,
    persist_directory="/chroma_db",
    collection_name="text_files",
    collection_metadata={"hnsw:space": "cosine"}
)

# Search for similar documents
query = "What is the capital of France?"
results = vector_store.similarity_search(query, k=5)
results

[Document(id='b52c9b89-70c5-44ce-a00c-93362272e2b2', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content="Facilities\nSpaceX is headquartered at the SpaceX Starbase near Brownsville, Texas, where it manufactures and\nlaunches its Starship vehicle. However most of the company's operations are based out of its office in\nHawthorne, California where it was previously headquartered, where it builds Falcon rockets and Dragon\nspacecraft, and where it houses its mission control.\n\nThe company also operates a Starlink satellite manufacturing facilities in Redmond, Washington, a rocket\ndevelopment and test facility in McGregor, Texas,[212] and maintains an office in the Washington, D.C.\narea, close to key government customers.[213]\n\nSpaceX has two active launch sites in Florida, one active launch site in California and one active launch\nsite at Starbase in Texas."),
 Document(id='b2a76be9-de5e-4b83-9f6d-766eced835ab', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'},

In [None]:
# Search for similar documents

query = "Which island did spaceX launch lease for their launches?"
results = vector_store.similarity_search(query, k=5)
results

[Document(id='9c32a864-dd91-43bf-b28e-5413cc7b2ece', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content='On April 14, 2014, SpaceX signed a 20-year lease for\nKennedy Space Center Launch Complex 39A.[249] The pad\nwas subsequently modified to support Falcon 9 and Falcon SpaceX west coast launch facility at\n\nVandenberg Space Force Base, during\nHeavy launches. As of 2024 it is the only pad that supports the launch of CASSIOPE\nFalcon Heavy launches. SpaceX launched its first crewed\nmission to the ISS from Launch Pad 39A on May 30,\n2020.[250] Pad 39A has been prepared since 2019 to eventually accommodate Starship launches. With\ndelays in launch FAA permits for Boca Chica, Texas, the 39A Starship preparation was accelerated in\n2022.[251]'),
 Document(id='abd86865-406d-4232-ad35-e04aba9a25f9', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content='On April 14, 2014, SpaceX signed a 20-year lease for\nKennedy Space Center Launch Complex 39A.[249] The pad

In [10]:
retrieval_results = vector_store.as_retriever(search_kwargs={"k": 5})

query = "Which island did spaceX launch lease for their launches?"
results = retrieval_results.invoke(query)
results

[Document(id='9c32a864-dd91-43bf-b28e-5413cc7b2ece', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content='On April 14, 2014, SpaceX signed a 20-year lease for\nKennedy Space Center Launch Complex 39A.[249] The pad\nwas subsequently modified to support Falcon 9 and Falcon SpaceX west coast launch facility at\n\nVandenberg Space Force Base, during\nHeavy launches. As of 2024 it is the only pad that supports the launch of CASSIOPE\nFalcon Heavy launches. SpaceX launched its first crewed\nmission to the ISS from Launch Pad 39A on May 30,\n2020.[250] Pad 39A has been prepared since 2019 to eventually accommodate Starship launches. With\ndelays in launch FAA permits for Boca Chica, Texas, the 39A Starship preparation was accelerated in\n2022.[251]'),
 Document(id='abd86865-406d-4232-ad35-e04aba9a25f9', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content='On April 14, 2014, SpaceX signed a 20-year lease for\nKennedy Space Center Launch Complex 39A.[249] The pad

The two cells above retrieve the same documents in two different ways, and the difference between them is an important LangChain design concept.

The two approaches are:

---

## 🔹 Method 1 — Direct Similarity Search

```python
results = vector_store.similarity_search(query, k=5)
```

### What this does:

* Directly queries the vector database
* Returns top-k most similar documents
* No abstraction layer
* No retriever logic
* No re-ranking
* No metadata filtering (unless manually added)

### Flow:

```
Query → Embed → Vector Store → Top-k Docs
```

It is a **raw vector search call**.

---

## 🔹 Method 2 — Using a Retriever

```python
retrieval_results = vector_store.as_retriever(search_kwargs={"k": 5})
results = retrieval_results.invoke(query)
```

### What this does:

* Wraps the vector store inside a Retriever interface
* Returns top-k documents
* Designed to plug into RAG chains
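* Supports `search_type` options such as MMR and score thresholds, plus metadata filtering through `search_kwargs`
* Shares its interface with more advanced retrievers that are built as separate components:

  * Hybrid search
  * Re-ranking
  * Parent-child retrieval
  * Multi-query retrieval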

### Flow:

```
Query → Retriever → Vector Store → Top-k Docs
```

It adds an abstraction layer.

---

# 🎯 So What’s the Actual Difference?

### 🔹 `similarity_search()`

* Low-level
* Direct call
* Simple
* Good for testing/debugging

### 🔹 `as_retriever()`

* High-level
* Standard LangChain interface
* Used inside RAG pipelines
* Extensible
* Production-ready

---

# 🧠 Why Retrievers Exist

LangChain RAG chains expect:

```
retriever.invoke(query)
```

Not:

```
vector_store.similarity_search()
```

Because retrievers are standardized.

Example:

```python
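# Assumes `llm` and `retrieval_results` are defined elsewhere in the notebook.
# Note: RetrievalQA is a legacy chain and is marked deprecated in recent LangChain releases.
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retrieval_results,  # must be a retriever object, not a list of documents
)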
```

You cannot pass `similarity_search()` there.

You must pass a retriever.
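
The point about standardization can be sketched without LangChain at all. In the toy example below, `ToyVectorStore`, `ToyRetriever`, and `answer` are hypothetical names invented for illustration, and the word-overlap scoring is a stand-in for real embeddings. What it shows is that the "chain" only depends on an `invoke(query)` method, so the search configuration is set once and hidden from the caller.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical wrapper: configured once, then called with only the query."""

    def __init__(self, store, k=4):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


def answer(question, retriever):
    """A stand-in 'chain' that relies only on the invoke() interface."""
    docs = retriever.invoke(question)
    return " | ".join(docs)


store = ToyVectorStore([
    "SpaceX builds Falcon rockets",
    "Microsoft acquired GitHub",
    "Nvidia designs graphics processors",
])
retriever = ToyRetriever(store, k=1)
print(answer("who acquired github", retriever))  # Microsoft acquired GitHub
```

Swapping in a different retrieval strategy then means replacing `ToyRetriever` with another object that has the same `invoke` method; `answer` does not change.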

---

# 🚀 Advanced Example

Retriever allows:

```python
vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "lambda_mult": 0.5}
)
```

Now you get:

* Diversity-aware retrieval (MMR)

Plain `similarity_search()` does not do this; when calling the vector store directly you would use `max_marginal_relevance_search()` instead.

---

# 🔥 In Simple Terms

| similarity_search | as_retriever      |
| ----------------- | ----------------- |
| Direct search     | Wrapped interface |
| Basic             | Extensible        |
| Manual usage      | Chain compatible  |
| Debugging         | Production RAG    |

---


- *It’s like wrapping the vector store inside a reusable function-like layer.*

✔ It adds a reusable layer

✔ You configure once

✔ You pass query easily

✔ It integrates cleanly with chains

✔ It’s cleaner architecture

In the background it still calls a search method on the vector store.
> ❗ But **which search method** depends on how the retriever is configured.

Let’s break it down properly.

---

# 🔹 Case 1 — Default Retriever

If you do:

```python
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
```

Then internally:

```
retriever.invoke(query)
    → vector_store.similarity_search(query, k=5)
```

So yes — it calls `similarity_search()` behind the scenes.

Nothing magical.

---

# 🔹 Case 2 — MMR Retriever

If you do:

```python
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5}
)
```

Now internally:

```
retriever.invoke(query)
    → vector_store.max_marginal_relevance_search()
```

So now it is NOT plain similarity search.

It uses MMR (Maximal Marginal Relevance):

* Picks relevant documents
* Ensures diversity
* Avoids redundant chunks

---

# 🔹 Case 3 — Hybrid Search (Example: BM25 + Vector)

If you build a hybrid retriever:

* It may call:

  * vector search
  * keyword search (BM25)
  * Then fuse results (e.g., RRF)

So in that case:

```
retriever.invoke(query)
    → vector_search()
    → bm25_search()
    → combine_results()
```

So retriever is flexible.

---

# 🔹 Case 4 — Advanced Retrievers

Examples:

* MultiQueryRetriever
  → Generates multiple query variations
  → Calls search multiple times

* ParentDocumentRetriever
  → Retrieves child chunks
  → Then maps to parent document

* ContextualCompressionRetriever
  → Retrieves documents
  → Then filters/compresses them with LLM

In these cases, it’s much more than simple search.

---

# 🧠 So What Is a Retriever Really?

It is:

> A retrieval strategy wrapper.

The vector store only knows:

```
How to search vectors
```

The retriever knows:

```
When to search
How many times
What type of search
How to post-process results
```

---

# 🔥 Clean Concept

| Component    | Responsibility            |
| ------------ | ------------------------- |
| Vector Store | Store & search embeddings |
| Retriever    | Decide how to retrieve    |
| RAG Chain    | Use retrieved docs + LLM  |

---

# System Design Perspective

Good RAG architecture follows:

```
User Query
   ↓
Retriever (strategy layer)
   ↓
Vector Store (storage layer)
   ↓
LLM (reasoning layer)
```

If you remove the retriever:

```
User Query
   ↓
Vector Store (hardcoded)
   ↓
LLM
```


No strategy layer.

That’s the limitation.


# 1️⃣ RAG (Retrieval-Augmented Generation)

**Definition:**
A system architecture where an LLM generates answers using externally retrieved documents as context.

**Conceptual flow:**

```
Query → Retrieve documents → Provide to LLM → Generate answer
```

Purpose:

* Reduce hallucination
* Use private knowledge base
* Avoid fine-tuning

---

# 2️⃣ Vector Retrieval

**Definition:**
Retrieving documents based on embedding similarity between the query vector and document vectors.

Steps:

1. Convert query → embedding
2. Compare with stored embeddings
3. Return top-k closest vectors

Used for:

* Semantic search
* Meaning-based retrieval
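
The three steps above can be sketched as an exact (brute-force) search in plain Python. The vectors and document ids below are made-up toy values, not real embeddings, and `top_k` is a hypothetical helper; a vector database does the same comparison but with an index so it does not have to scan every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (assumed values for illustration only)
doc_vecs = {
    "spacex": [0.9, 0.1, 0.0],
    "microsoft": [0.1, 0.9, 0.1],
    "nvidia": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # step 1 would normally come from the embedding model

print([doc_id for doc_id, _ in top_k(query_vec, doc_vecs, k=2)])  # ['spacex', 'microsoft']
```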

---

# 3️⃣ BM25

**Definition:**
A probabilistic keyword-based ranking algorithm used in traditional information retrieval.

It scores documents based on:

* Term frequency
* Inverse document frequency
* Document length normalization

Used for:

* Exact keyword matching
* Technical queries
* Structured data
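
As a rough illustration of how those three ingredients combine, here is a minimal BM25 scorer in plain Python. It is a sketch under assumptions: whitespace tokenization, the commonly used parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)`. Production libraries differ in tokenization and IDF details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "microsoft acquired github in 2018",
    "nvidia released a new graphics card",
    "spacex landed a rocket on a drone ship",
]
scores = bm25_scores("github acquired", docs)
print(scores.index(max(scores)))  # 0: only the first document contains the query terms
```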

---

# 4️⃣ Hybrid Retrieval

**Definition:**
Combining vector retrieval (semantic) and keyword retrieval (BM25) to improve recall.

Common fusion methods:

* Weighted sum
* RRF (Reciprocal Rank Fusion)

Purpose:

* Capture both semantic meaning and exact terms

---

# 5️⃣ Cosine Similarity

**Definition:**
A similarity metric that measures the cosine of the angle between two vectors.

Formula:

```
cos(θ) = (A · B) / (||A|| ||B||)
```

Range:

* 1 → identical direction
* 0 → unrelated
* -1 → opposite

Used when:

* Embeddings are normalized
* Direction matters more than magnitude

Most common metric in embedding systems.
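
The formula and the three range values can be checked with a few lines of plain Python. The vectors are arbitrary toy values; results are computed in floating point, so they come out very close to 1, 0, and -1 rather than exactly equal.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [1.0, 2.0]
same_direction = cosine_similarity(a, [2.0, 4.0])    # longer vector, same direction
orthogonal = cosine_similarity(a, [-2.0, 1.0])       # perpendicular vectors
opposite = cosine_similarity(a, [-1.0, -2.0])        # reversed direction

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Note that `[2.0, 4.0]` is twice as long as `a` yet still scores about 1, which is what "direction matters more than magnitude" means in practice.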

---

# 6️⃣ Dot Product

**Definition:**
The sum of the products of corresponding elements of two vectors.

Formula:

```
A · B = Σ AiBi
```

Difference from cosine:

* Dot product includes magnitude
* Cosine ignores magnitude (normalized)

Used in:

* Many large-scale embedding systems
* When embeddings are pre-normalized

---

# 7️⃣ Euclidean Distance (L2 Distance)

**Definition:**
Measures straight-line distance between two vectors.

Formula:

```
√Σ(Ai − Bi)²
```

Lower value = more similar.

Used when:

* Absolute distance (including vector magnitude) matters

Note: it is not always ideal for high-dimensional embeddings.
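
The three metrics are closely related once vectors are normalized to unit length: the dot product equals the cosine similarity, and the squared Euclidean distance equals `2 - 2 * cos(θ)`, so all three produce the same ranking. The short check below uses toy vectors and hypothetical helper names to verify this numerically.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = normalize([3.0, 4.0])  # becomes [0.6, 0.8]
b = normalize([4.0, 3.0])  # becomes [0.8, 0.6]

cos_ab = dot(a, b)                    # for unit vectors, dot product == cosine similarity
squared_l2 = l2_distance(a, b) ** 2   # squared straight-line distance

print(round(cos_ab, 6), round(squared_l2, 6), round(2 - 2 * cos_ab, 6))
```

This is why the choice between cosine, dot product, and L2 matters mostly when embeddings are not normalized.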

---

# 8️⃣ HNSW (Hierarchical Navigable Small World)

**Definition:**
An Approximate Nearest Neighbor (ANN) algorithm used to efficiently search large vector spaces.

What it does:

* Builds graph structure between vectors
* Enables fast nearest neighbor search
* Trades slight accuracy for speed

Important:
HNSW is NOT a similarity metric.
It is a search strategy.

The metric (cosine/dot/L2) is what HNSW uses to compare vectors while it builds and traverses the graph.

---

# 9️⃣ MMR (Maximal Marginal Relevance)

**Definition:**
A document selection strategy that balances relevance to the query and diversity among selected documents.

Formula concept:

```
Score = λ * relevance(query, doc)
        − (1 − λ) * similarity(doc, selected_docs)
```

Purpose:

* Reduce redundancy
* Increase diversity in retrieved context

Not a search algorithm.
Not a metric.
It is a selection strategy.
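
The scoring rule above can be turned into a small greedy selection loop. This is a sketch under assumptions: toy 2-dimensional vectors, cosine similarity for both relevance and redundancy, and the maximum similarity to any already selected document as the redundancy term. The name `lambda_mult` mirrors the LangChain parameter, but `mmr_select` itself is a hypothetical helper, not LangChain's implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest document already chosen
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.10],   # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.6, -0.8],   # 2: less relevant, but covers something different
]

print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is plain top-k by relevance, so the near-duplicate is returned; with `0.5` it is skipped in favour of the more diverse document.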

---

# 🔟 Re-ranking

**Definition:**
A second-stage ranking step that re-scores top retrieved documents using a more powerful model (often cross-encoder).

Process:

1. Retrieve top N (e.g., 20)
2. Pass (query, document) pairs to cross-encoder
3. Re-score precisely
4. Select best few

Purpose:

* Improve precision
* Better final ranking

Often the stage with the largest effect on final answer quality, at the cost of extra latency.
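
The two-stage shape of re-ranking can be sketched as follows. The scorer here, `overlap_score`, is a deliberately simple placeholder (the fraction of query words found in the document) so that the example runs without any model; in a real pipeline that function would call a cross-encoder. All names and sample texts are hypothetical.

```python
def overlap_score(query, doc):
    """Placeholder scorer (assumption): share of query words present in the document.
    A real re-ranker would score the (query, document) pair with a cross-encoder."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, score_pair, top_n=2):
    """Second stage: re-score every first-stage candidate and keep the best top_n."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


# Pretend these came back from the first-stage retriever, in its original order
candidates = [
    "microsoft closed its retail stores in 2020",
    "microsoft acquired github in 2018",
    "github hosts open source code",
]

print(rerank("who acquired github", candidates, overlap_score, top_n=2))
```

Because `score_pair` is passed in as an argument, the placeholder can later be replaced by a model-based scorer without touching `rerank`.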

---

# 1️⃣1️⃣ RRF (Reciprocal Rank Fusion)

**Definition:**
A rank-based fusion algorithm used to combine multiple ranked lists.

Formula:

```
RRF_score = Σ (1 / (k + rank))
```

Where:

* rank = position in each list
* k = constant (usually 60)

Why used:

* Simple
* Stable
* Does not require score normalization

Very common in hybrid retrieval.
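
Because RRF only needs ranks, it fits in a few lines. The sketch below fuses two made-up ranked lists (one standing in for vector search, one for BM25) using the formula above with `k = 60`; the document ids and the helper name are hypothetical. Note that this `k` is the RRF smoothing constant, not the top-k of retrieval.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of ids: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_ranking = ["doc_a", "doc_b", "doc_c"]  # assumed output of vector search
bm25_ranking = ["doc_b", "doc_d", "doc_a"]    # assumed output of keyword search

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` wins because it ranks near the top of both lists, even though it is first in only one of them.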

---

# 1️⃣2️⃣ ANN (Approximate Nearest Neighbor)

**Definition:**
Algorithms that quickly find vectors that are very likely, but not guaranteed, to be the true nearest neighbors.

Examples:

* HNSW
* IVF
* PQ

Used when:

* Dataset is large
* Exact search is slow

---

# 🔥 Final Architecture View

Full advanced pipeline:

```
Query
   ├─→ Embedding → Vector Search (HNSW + Cosine) → MMR (optional)
   └─→ BM25 keyword search
   ↓
RRF Fusion (merge both ranked lists)
   ↓
Re-ranking (Cross-Encoder)
   ↓
Top-k to LLM
```

In [12]:
# Test Question - 2

query = "What was Nvidias's First Graphic Card Called?"
results = retrieval_results.invoke(query)
results

[Document(id='2a89e86d-0d76-47b7-9e53-b27df921eff1', metadata={'source': '..\\data\\txt_docs\\Google.txt'}, page_content='71. Kawamoto, Dawn (April 29, 2004). "Want In on Google\'s IPO?" (https://www.zdnet.com/articl\ne/want-in-on-googles-ipo/). ZDNet. Archived (https://web.archive.org/web/20111228131306/\nhttp://www.zdnet.com/news/want-in-on-googles-ipo/135799) from the original on December\n28, 2011. Retrieved February 19, 2010.\n\n72. Webb, Cynthia L. (August 19, 2004). "Google\'s IPO: Grate Expectations" (https://www.washi\nngtonpost.com/wp-dyn/articles/A14939-2004Aug19.html). The Washington Post.\nWashington, D.C. Archived (https://web.archive.org/web/20121112172750/http://www.washi\nngtonpost.com/wp-dyn/articles/A14939-2004Aug19.html) from the original on November 12,\n2012. Retrieved February 19, 2010.'),
 Document(id='961ec3f4-9b85-4acf-9752-05301cbb8831', metadata={'source': '..\\data\\txt_docs\\Google.txt'}, page_content='71. Kawamoto, Dawn (April 29, 2004). "Want In on Googl

In [None]:
# Test Question - 3

query = "What was Nvidias's Firs Graphic accelerator Called?"
results = retrieval_results.invoke(query)
results

[Document(id='b6c9b0c8-7d9a-4d5a-bba9-2d2776aaeaa1', metadata={'source': '..\\data\\txt_docs\\Nvidia.txt'}, page_content='77. "The World\'s Most Powerful Graphics Card | NVIDIA TITAN V" (https://web.archive.org/web/\n20201112220854/https://www.nvidia.com/en-us/titan/titan-v/). NVIDIA. Archived from the\noriginal (https://www.nvidia.com/en-us/titan/titan-v/) on November 12, 2020. Retrieved\nDecember 8, 2017.\n\n78. "News Archive" (http://nvidianews.nvidia.com/news). NVIDIA Newsroom. Archived (https://w\neb.archive.org/web/20210421043716/https://nvidianews.nvidia.com/news) from the original\non April 21, 2021. Retrieved April 26, 2021.\n\n79. "Google Cloud gets support for Nvidia\'s Tesla P4 inferencing accelerators" (https://techcrunc\nh.com/2018/08/06/google-cloud-gets-support-for-nvidias-tesla-p4-inferencing-accelerators/).\nTech Crunch. August 6, 2018. Archived (https://web.archive.org/web/20201128231831/http\ns://techcrunch.com/2018/08/06/google-cloud-gets-support-for-nvidias-tesla-

In [14]:
# Test Question - 4

query = "What was the name of autonomous spaceport drone ship that have been achievied the first sea landing?"
results = retrieval_results.invoke(query)
results

[Document(id='f490f17f-f075-43c7-856c-18373ba585b7', metadata={'source': '..\\data\\txt_docs\\SpaceX.txt'}, page_content='SpaceX first achieved a successful landing and recovery of a first stage 2014\nin December 2015 with Falcon 9 Flight 20.[59] In April 2016, the\ncompany achieved the first successful landing on the\nautonomous spaceport drone ship (ASDS) Of Course I Still\nLove You in the Atlantic Ocean.[60] By October 2016,\nfollowing the successful landings, SpaceX indicated they\nwere offering their customers a 10% price discount if they\nchoose to fly their payload on a reused Falcon 9 first\nstage.[61]\n\nA second major rocket failure happened in early September\n2016, when a Falcon 9 exploded during a propellant fill Falcon 9 first stage on an autonomous\noperation for a standard pre-launch static fire test. The spaceport drone ship (ASDS) barge after\n\npayload, the AMOS-6 communications satellite valued at the first successful landing at sea,\n\n$200\xa0million, was destroye

In [15]:
# Test Question - 5

query = "Who acquired Github?"
results = retrieval_results.invoke(query)
results

[Document(id='25ff6d4c-4da0-4fb9-b15a-9a77ea25b26f', metadata={'source': '..\\data\\txt_docs\\Microsoft.txt'}, page_content='2020–present: Acquisitions, Xbox Series X/S, and Windows 11\nOn March 26, 2020, Microsoft announced it was acquiring Affirmed Networks for about\n$1.35\xa0 billion.[129][130] Due to the COVID-19 pandemic, Microsoft closed all of its retail stores\nindefinitely due to health concerns.[131] On July 22, 2020, Microsoft announced plans to close its Mixer\nservice, planning to move existing partners to Facebook Gaming.[132]\n\nOn July 31, 2020, it was reported that Microsoft was in talks to acquire TikTok after the Trump\nadministration ordered ByteDance to divest ownership of the application to the U.S.[133] On August 3,\n2020, after speculation on the deal, Donald Trump stated that Microsoft could buy the application,\nhowever, it should be completed by September 15, 2020, and that the United States Department of the\nTreasury should receive a portion if it were to 

In [None]:
# LLM Integration to the Retrieval Pipeline

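# Configure the retriever to use the Maximal Marginal Relevance (MMR) search strategy.
# Note: `score_threshold` is only applied with search_type="similarity_score_threshold",
# so MMR is tuned with `fetch_k` and `lambda_mult` instead.
retrieval_results = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,              # number of documents returned
        "fetch_k": 20,       # candidates fetched before MMR selection
        "lambda_mult": 0.5   # 1.0 = pure relevance, 0.0 = maximum diversity
    }
)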

# 🔵 1️⃣ What is `score_threshold: 0.5`?

When you use:

```python
search_type="similarity_score_threshold"
search_kwargs={
    "k": 5,
    "score_threshold": 0.5
}
```

`score_threshold` means:

> Only return documents whose similarity score is ≥ 0.5.

---

### ⚠️ Important Detail

The meaning of `0.5` depends on the similarity metric.

If using:

* Cosine similarity → range is `[-1, 1]`
* Most embeddings → practically `[0, 1]`

So:

```
0.5 = moderately similar
0.8 = very similar
0.3 = weak similarity
```

If a document’s similarity score is below 0.5 → it gets filtered out.

---

### ⚠️ But With MMR

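`score_threshold` is not an MMR setting: LangChain only applies it when `search_type="similarity_score_threshold"`.
With `search_type="mmr"`, the retriever is tuned through `k`, `fetch_k`, and `lambda_mult` instead.

If the threshold is too high → you may get fewer than `k` results, or none at all.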

---

# 🔵 2️⃣ What are Other `search_type` Options?

In LangChain (for Chroma retriever), common types are:

### 1️⃣ `"similarity"`

Standard top-k similarity search.

```
Retrieve top-k closest vectors.
```

Most basic and most common.

---

### 2️⃣ `"mmr"`

Maximal Marginal Relevance.

```
Retrieve candidates → select diverse + relevant.
```

Used when:

* Chunks overlap
* Redundancy exists

---

### 3️⃣ `"similarity_score_threshold"`

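This works differently.

It still runs a top-k search, but then keeps only:

```
The retrieved documents whose relevance score is at or above the threshold.
```

Example:

```python
search_type="similarity_score_threshold"
search_kwargs={"score_threshold": 0.7}
```

This means:

Return the retrieved documents with score ≥ 0.7, up to `k` of them (LangChain's default `k` is 4 when it is not set).

That could be:

* 2 docs
* 0 docs

It never returns more than `k`, and it does not pad the result up to `k`.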

---

# 🔵 Difference: `similarity` vs `similarity_score_threshold`

### similarity

```
Always returns k documents.
```

Even if similarity is low.

---

### similarity_score_threshold

```
Returns only documents above threshold.
```

May return fewer than k.

Better when:

* You want quality control
* Avoid weak context

---

# 🔵 3️⃣ Where `hnsw:space` Fits In

```python
collection_metadata={"hnsw:space": "cosine"}
```

This is defined at **database level**, not retriever level.

---

# 🔵 What is `"hnsw:space"`?

This defines the **distance metric used by HNSW index**.

Options:

* `"cosine"`
* `"l2"`
* `"ip"` (inner product / dot product)

This tells Chroma:

> When performing ANN search, use cosine distance (or l2, etc.)

---

# 🔵 So Architecture Separation Is:

### Vector DB Level

You define:

* ANN algorithm (HNSW)
* Distance metric (`hnsw:space`)

---

### Retriever Level

You define:

* Search strategy (similarity / mmr / threshold)
* k value
* score_threshold

---

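# 🔵 4️⃣ About Nodes / Chunks

Documents are arranged into chunks before the retriever is created. Some frameworks (for example LlamaIndex) call these chunks nodes; LangChain represents them as `Document` objects.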

Before retrieval:

1. Load documents
2. Split into chunks (nodes)
3. Embed chunks
4. Store embeddings in vector DB
5. HNSW index built
6. Then retriever works on those stored chunks

Retriever does NOT define nodes.

Chunking is done before database storage.

---

# 🔥 Full Clean Architecture

```
Text files (.txt)
 ↓
Chunking (nodes)
 ↓
Embedding
 ↓
Store in Chroma
   - HNSW index
   - hnsw:space = cosine
 ↓
Retriever
   - similarity / mmr / threshold
   - k
   - score_threshold
 ↓
LLM
```

Perfect separation of responsibilities.

---

# 🔵 Final Quick Answers

✔ `score_threshold=0.5` → minimum similarity allowed (used with `similarity_score_threshold`)

✔ `similarity_score_threshold` → retrieve up to `k` documents above the threshold

✔ `search_type="similarity"` → standard top-k

✔ `search_type="mmr"` → diversity-aware selection

✔ `hnsw:space` → defines distance metric at database level

✔ Chunking is done before storing in vector DB

---


In [51]:
retrieval_results = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 5,
        "score_threshold": 0.3
    }
)

In [52]:
query = "What was the first hardware product launched by Microsoft?"

In [39]:
# LLM Initialization

from langchain_ollama import OllamaLLM
from langchain_core.messages import SystemMessage, HumanMessage

# Initializing the Qwen2.5:1.5b model
llm = OllamaLLM(
    model="qwen2.5:1.5b",
    temperature=0.0,
    top_p=1.0
)

In [49]:
def answer_query_with_retrieved_docs(query):
    """Retrieve documents for the query and ask the LLM to answer from them only."""
    # Retrieval step: fetch the most relevant chunks for the query
    results = retrieval_results.invoke(query)

    # Augmentation step: put the retrieved chunks into the prompt as context
    documents_text = "\n".join(f"- {doc.page_content}" for doc in results)

    combined_input = f"""
    Based on the retrieved documents, answer the following question: {query}

    Documents:
    {documents_text}

    Provide a concise and accurate answer to the question using only the information from the retrieved documents. If the answer is not found in the documents, respond with "Answer not found in the system." Do not make up an answer.
    """

    messages = [
        SystemMessage(
            content="You are a helpful assistant that provides accurate and concise answers based on the retrieved documents."
        ),
        HumanMessage(content=combined_input)
    ]

    # Generation step: the LLM answers using the retrieved context
    result = llm.invoke(messages)

    print("User Query:")
    print(query)
    print("Final Answer:")
    print(result)

In [53]:
answer_query_with_retrieved_docs("What was the first hardware product launched by Microsoft?")

User Query:
What was the first hardware product launched by Microsoft?
Final Answer:
The first hardware product launched by Microsoft was the Windows operating system, which was introduced on October 29, 1985.


In [None]:
answer_query_with_retrieved_docs("What was the first hardware product launched by Microsoft?")

User Query:
What was the first hardware product launched by Microsoft?
Final Answer:
The first hardware product launched by Microsoft was the Windows 95 operating system, which was released in October 1995.


In [54]:
answer_query_with_retrieved_docs("Why did Nvidea Joined Sega?")

User Query:
Why did Nvidea Joined Sega?
Final Answer:
Nvidia joined Sega due to their shared interest in gaming technology and the potential for collaboration on new gaming hardware or software advancements. The documents do not provide specific reasons for Nvidia joining Sega but suggest that both companies were interested in advancing computing security, which could have been a factor in their decision to partner together.


In [56]:
answer_query_with_retrieved_docs("When did layoffs happened in Nvidia? and who did it and when?")

User Query:
When did layoffs happened in Nvidia? and who did it and when?
Final Answer:
Based on the provided documents, there are multiple instances of layoffs at Microsoft:

1. In July 2014, Microsoft announced plans to lay off 18,000 employees.
   - This was a significant layoff as it represented about a 14% reduction in workforce for the largest-ever layoff.

2. In September 2014, Microsoft laid off 2,100 people, including 747 people in Seattle–Redmond area.
3. In October 2014, Microsoft revealed that it was almost done with eliminating 18,000 employees, which was its largest-ever layoff sweep.

These are the instances of layoffs mentioned in the documents provided.
