# **Guided Notebook: Module 2 - Hybrid Search**

*This notebook contains the guided hands-on exercise. Fill in the `... # YOUR CODE HERE` sections to complete the module.*

-----

### **Module 2: Improving Recall with Hybrid Search**

**Objective:**
In our first module, we saw a critical **Recall Failure**. Our basic RAG system, using only semantic search, completely missed the correct document chunk for a query about "share repurchases," even though the answer was present in the knowledge base.

The objective of this module is to solve that recall problem by implementing a more powerful **Hybrid Search** system. We will combine traditional keyword-based search with the semantic search we've already learned, so that a chunk can be retrieved either because it matches the query's meaning or because it contains the query's exact terms.

**Learning Objectives:**
By the end of this module, you will be able to:
- Explain the core concept of Hybrid Search and understand the distinct roles of dense (semantic) and sparse (keyword) vectors.
- Implement a hybrid data strategy by creating both dense and sparse embeddings for your documents using open-source models.
- Configure and populate a Qdrant collection that handles a sophisticated hybrid search workload.
- Build a custom retrieval function that performs both dense and sparse searches and fuses the results using **Reciprocal Rank Fusion (RRF)**.
- Diagnose a **Recall Failure** and understand why a narrow search (`k=4`) can cause the system to fail, even with a better algorithm.

**Core Concept: Hybrid Search with Qdrant**
We will create and store two types of vectors for each document chunk:
1.  **Dense Vector (from `bge-m3`):** Captures the *semantic meaning* and conceptual relationships.
2.  **Sparse Vector (from `Splade`):** Captures the *keyword importance*.

When a query comes in, our system will perform two separate searches, one for meaning and one for keywords, and then combine the results. Each search covers the other's blind spot, making our system far more robust against the type of keyword-based failure we saw in Module 1.
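
To make the two vector types concrete, here is a minimal sketch of how each one scores a chunk against a query. All numbers are made up for illustration (they are not outputs of `bge-m3` or `Splade`), and the helper names `dense_dot` and `sparse_dot` are hypothetical.

```python
# Toy illustration only: the numbers below are invented, not model outputs.

def dense_dot(a, b):
    """Dot product of two dense vectors (every dimension is stored)."""
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    return sum(weight * b[index] for index, weight in a.items() if index in b)

# Hypothetical 4-dimensional dense embeddings.
query_dense = [0.5, 0.5, 0.5, 0.5]
chunk_dense = [0.5, 0.5, 0.5, -0.5]

# Hypothetical sparse vectors: only terms with a nonzero weight are stored.
# Here index 17 is shared by the query and the chunk; 42 and 99 are not.
query_sparse = {17: 1.2, 42: 0.8}
chunk_sparse = {17: 0.9, 99: 0.4}

dense_score = dense_dot(query_dense, chunk_dense)
sparse_score = sparse_dot(query_sparse, chunk_sparse)
print(f"dense score: {dense_score:.2f}, sparse score: {sparse_score:.2f}")
```

The sparse score is driven entirely by the one shared term, which is why an exact keyword match can surface a chunk even when its dense similarity is unremarkable.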


### **Step 1: Install Dependencies**

In [None]:
# Install all required libraries
!pip install -q langchain langchain-community langchain-groq langchain_huggingface qdrant-client pypdf fastembed

# Ignore standard warnings
import warnings
warnings.filterwarnings('ignore')

-----

### **Step 2: Setup API Key & Document Loading**

This step remains the same as Module 1. We set up our API key, load the NVIDIA financial report PDF, and split it into chunks.
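
As a reminder of what `chunk_overlap` means, the sketch below slides a fixed-size character window over a string. It is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph breaks, so its real chunk boundaries will differ. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

# Small numbers so the overlap is easy to see; the notebook uses 1000 and 200.
chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last four characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.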

In [None]:
import os
from google.colab import userdata
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# --- Setup API Key ---
# Make sure you have added your GROQ_API_KEY to the Colab secrets manager
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')

# --- Load and Split Document ---
# Make sure you have uploaded the NVIDIA Q1 FY26 PDF to your Colab session
pdf_path = "./NVIDIA-Q1-FY26-Financial-Results.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Use the same chunking strategy as Module 1
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

print(f"Document loaded and split into {len(docs)} chunks.")

-----

### **Step 3: Initialize Qdrant for Hybrid Search**

This is a key step. We will create a Qdrant client and then create a new **collection** that is specifically configured to handle both dense and sparse vectors. This is different from Module 1 where we only had one type of vector.
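
The exercise below asks for `DOT` distance on the dense vector, and Step 4 creates the dense embeddings with `normalize_embeddings` set to `True`. For unit-length vectors the dot product equals cosine similarity, which the following sketch checks on two arbitrary example vectors (the helper names are hypothetical).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Arbitrary example vectors, not real embeddings.
a = [3.0, 4.0]
b = [1.0, 2.0]

dot_of_units = dot(normalize(a), normalize(b))
cosine_of_raw = cosine(a, b)
print(math.isclose(dot_of_units, cosine_of_raw))
```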


In [None]:
from qdrant_client import QdrantClient, models

# Initialize an in-memory Qdrant client
client = QdrantClient(location=":memory:")

# Define the collection name
collection_name = "rag_foundations_m2_guided"

# --- Best Practice: Check if collection exists before creating ---
if client.collection_exists(collection_name=collection_name):
    print(f"Collection '{collection_name}' already exists. Deleting it to start fresh.")
    client.delete_collection(collection_name=collection_name)

print(f"Creating Qdrant collection '{collection_name}' for hybrid search...")

# YOUR CODE HERE
# Use the client.create_collection() method.
# You need to configure two types of vectors inside the collection:
# 1. A 'dense' vector using models.VectorParams with a size of 1024 and 'DOT' distance.
# 2. A 'text-sparse' sparse vector using models.SparseVectorParams.
# HINT: The structure should look like: client.create_collection(collection_name=..., vectors_config=..., sparse_vectors_config=...)
...

print("Collection created successfully.")

-----

### **Step 4: Embed and Store Documents**

Now we will perform the main data processing. We will loop through every document chunk, create both a dense and a sparse vector for it, and then store them together in our new Qdrant collection.
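
Before filling in the cell, it may help to picture the shape of one stored record. The sketch below uses plain Python dicts as stand-ins for the real Qdrant and fastembed objects, so nothing here is the library API; the token indices, weights, and payload values are invented, and `to_index_value_lists` is a hypothetical helper.

```python
# Sketch only: plain dicts stand in for the real Qdrant and fastembed objects.

def to_index_value_lists(token_weights):
    """Split a {token_index: weight} mapping into two parallel, index-sorted lists."""
    indices = sorted(token_weights)
    values = [token_weights[i] for i in indices]
    return indices, values

# Hypothetical sparse output for one chunk: three vocabulary entries with nonzero weight.
token_weights = {2047: 0.8, 15: 1.3, 901: 0.2}
indices, values = to_index_value_lists(token_weights)

record = {
    "id": 0,
    "payload": {"text": "example chunk text", "page": 3},
    "vector": {
        "dense": [0.1, 0.2, 0.3],  # shortened; the collection expects 1024 dimensions
        "text-sparse": {"indices": indices, "values": values},
    },
}
print(record["vector"]["text-sparse"])
```

The key point is that a sparse vector is stored as two parallel lists of equal length, one of positions and one of weights, while the dense vector is a single list with a value in every position.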

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from fastembed import SparseTextEmbedding
from tqdm.auto import tqdm

print("Initializing local embedding models...")
# 1. Initialize our embedding models
dense_embed_model = HuggingFaceEmbeddings(
    model_name="BAAI/bge-m3", model_kwargs={"device": "cpu"}, encode_kwargs={"normalize_embeddings": True}
)
sparse_embed_model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
print("Models initialized.")

# 2. Embed and prepare all documents for upsert
print("Embedding and preparing all documents for upsert...")
points_to_upsert = []
for i, doc in enumerate(tqdm(docs, desc="Processing All Documents")):
    doc_text = doc.page_content

    # YOUR CODE HERE (Part 1)
    # Create the dense vector for 'doc_text' using the 'dense_embed_model'.
    # HINT: Use the .embed_query() method.
    dense_vec = ...

    # YOUR CODE HERE (Part 2)
    # Create the sparse vector for 'doc_text' using the 'sparse_embed_model'.
    # HINT: The .embed() method returns a generator, so you must convert it to a list first.
    sparse_vec = ...

    # YOUR CODE HERE (Part 3)
    # Create a Qdrant PointStruct to hold all the data.
    # It needs an id, a payload (with the text and metadata), and a vector dictionary.
    # The vector dictionary should have keys 'dense' and 'text-sparse'.
    # For the sparse vector, you must convert its indices and its values to lists.
    # HINT: models.PointStruct(id=..., payload=..., vector={'dense':..., 'text-sparse':...})
    point = ...

    points_to_upsert.append(point)

# 3. Upsert the points to Qdrant
# YOUR CODE HERE (Part 4)
# Upload the prepared points to your Qdrant collection.
# HINT: Use the client.upsert() method.
...

print(f"Successfully embedded and upserted all {len(docs)} documents.")

-----

### **Step 5: Build the Hybrid RAG Chain with RRF**

Now we'll build our retrieval function. This function performs two separate searches in Qdrant (one for dense vectors, one for sparse) and then intelligently combines the results using **Reciprocal Rank Fusion (RRF)** before passing them to the LLM.
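
RRF ignores the raw similarity scores and uses only each document's position in each result list: a document's fused score is the sum of `1 / (k + rank)` over the lists it appears in, with ranks starting at 1. The sketch below runs that formula on two made-up rankings; the IDs and the helper name `rrf_fuse` are hypothetical, and `k=60` matches the constant used in the cell below.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused, scores

# Made-up result lists, best match first.
dense_ranking = ["A", "B", "C"]
sparse_ranking = ["C", "A", "D"]

fused, scores = rrf_fuse([dense_ranking, sparse_ranking])
for doc_id in fused:
    print(doc_id, round(scores[doc_id], 5))
```

Documents that appear in both lists ("A" and "C") rise above documents that appear in only one, even when the single-list document held a higher position in its own list.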

In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document

# Initialize the Groq LLM
llm = ChatGroq(temperature=0, model_name="meta-llama/llama-4-scout-17b-16e-instruct")

# --- Helper function to visualize the context ---
def pretty_print_docs(docs):
    print(f"Found {len(docs)} documents to pass to the LLM.\n")
    for i, doc in enumerate(docs):
        source = doc.metadata.get('source', 'Unknown Source')
        page = doc.metadata.get('page', 'Unknown Page')
        print(f"  [{i+1}] Source: {source} (Page: {page})")
        print(f"      Content: '{doc.page_content[:150]}...'")
    print("-" * 50)

# --- Custom Retrieval Function with RRF ---
def qdrant_hybrid_retrieve_rrf(query: str, top_k=4) -> list[Document]:
    """
    Performs hybrid search and returns a list of LangChain Document objects
    fused with Reciprocal Rank Fusion (RRF).
    """
    # YOUR CODE HERE (Part 1)
    # Create the dense and sparse vectors for the input 'query'.
    # Remember to handle the instruction prefix for the dense model!
    # HINT: query = f"query: {query}"
    ...
    dense_query_vec = ...
    sparse_query_vec = ...

    # YOUR CODE HERE (Part 2)
    # Perform the two separate searches (dense and sparse) using the client.search() method.
    # Remember to use models.NamedVector and models.NamedSparseVector to specify which vector to search.
    dense_results = ...
    sparse_results = ...

    # --- RRF Fusion Logic (This part is provided for you) ---
    rrf_scores = {}
    doc_lookup = {}
    k_constant = 60

    # Process dense results
    for rank, result in enumerate(dense_results):
        if result.id not in rrf_scores:
            rrf_scores[result.id] = 0
            doc_lookup[result.id] = Document(page_content=result.payload.get('text', ''), metadata={k: v for k, v in result.payload.items() if k != 'text'})
        rrf_scores[result.id] += 1 / (k_constant + rank + 1)

    # Process sparse results
    for rank, result in enumerate(sparse_results):
        if result.id not in rrf_scores:
            rrf_scores[result.id] = 0
            doc_lookup[result.id] = Document(page_content=result.payload.get('text', ''), metadata={k: v for k, v in result.payload.items() if k != 'text'})
        rrf_scores[result.id] += 1 / (k_constant + rank + 1)

    sorted_ids = sorted(rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True)
    combined_documents = [doc_lookup[doc_id] for doc_id in sorted_ids]

    print(f"\n--- RRF Fusion Results (Hybrid Search with k={top_k}) ---")
    pretty_print_docs(combined_documents)

    return combined_documents

# --- Build the RAG Chain (This part is provided for you) ---
def format_docs(docs):
    return "\n---\n".join(doc.page_content for doc in docs)

prompt_template = "Answer the question based only on the following context:\n\nContext:\n{context}\n\nQuestion: {question}"
prompt = ChatPromptTemplate.from_template(prompt_template)
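rag_chain = (
    # Retrieve and format the context in one callable. Piping two plain dicts
    # together with `|` would merge them (Python 3.9+ dict union) instead of
    # chaining them, so the retriever would never run.
    {
        "context": lambda question: format_docs(qdrant_hybrid_retrieve_rrf(question)),
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)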
print("RAG chain with Qdrant hybrid retrieval is ready.")

-----

### **Step 6: Test the Hybrid RAG Chain**

This is the moment of truth. First, we will test the query that failed in Module 1 to see if our new hybrid search retriever has solved the problem. Then, we will try our new, more difficult query to see if we can find the limits of our current system.
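
When reading the results, keep in mind that fusion can only rerank what the two searches return. The sketch below uses invented rankings in which the relevant chunk (labelled `target`) sits just outside the top 4 of both lists; the IDs and the function name `fuse_top_k` are hypothetical.

```python
def fuse_top_k(dense_ranking, sparse_ranking, top_k, k=60):
    """Truncate each ranking to top_k, then fuse the survivors with RRF."""
    scores = {}
    for ranking in (dense_ranking[:top_k], sparse_ranking[:top_k]):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Invented rankings: "target" is 6th in the dense list and 5th in the sparse list.
dense_ranking = ["c1", "c2", "c3", "c4", "c5", "target", "c7"]
sparse_ranking = ["c8", "c2", "c9", "c1", "target", "c3"]

narrow = fuse_top_k(dense_ranking, sparse_ranking, top_k=4)
wide = fuse_top_k(dense_ranking, sparse_ranking, top_k=6)
print("target in narrow results:", "target" in narrow)
print("target in wide results:", "target" in wide)
```

With `top_k=4` the relevant chunk is cut from both lists before fusion, so no fusion formula can bring it back; widening the search is the only way to recover it.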

In [None]:
# --- Run the Test Queries ---
# This part is provided for you

# Query #1: The query that failed in Module 1
module_1_failure_query = "How much did NVIDIA spend on share repurchases in the first quarter of fiscal year 2026?"

# Query #2: Our new, more difficult query for this module
module_2_failure_query = "What was the exact value for \"Tax withholding related to common stock from stock plans\" for the period ending April 27, 2025?"

print("--- Testing Query #1 (The Module 1 Failure) ---")
print(f"Query: {module_1_failure_query}\n")
answer_1 = rag_chain.invoke(module_1_failure_query)
print('\033[92m' + f"Answer: {answer_1}\n" + '\033[0m')
print("-" * 100)


print("\n\n--- Testing Query #2 (Our New Challenge) ---")
print(f"Query: {module_2_failure_query}\n")
answer_2 = rag_chain.invoke(module_2_failure_query)
print('\033[91m' + f"Answer: {answer_2}\n" + '\033[0m')
print("-" * 100)