#**RAG with Hybrid Search**
**✅ What is Hybrid Search?**

Hybrid Search combines:

- Dense retrieval (semantic search using embeddings, e.g. FAISS, **Pinecone**)

- Sparse retrieval (keyword-based search, e.g. **BM25**, TF-IDF)

It tries to balance two strengths:

- Keyword coverage (not missing documents that contain the exact query terms: sparse methods are good at this)

- Semantic understanding (matching on meaning and context even when the wording differs: dense methods excel here)
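
To make the sparse side concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not the encoder used later in this demo; the function name `bm25_scores`, the whitespace tokenizer, the parameter defaults, and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical sentences, not taken from the dataset
docs = [
    "employees receive 20 days of annual leave",
    "time off rules for staff vacations",
    "holiday calendar for the year",
]
scores = bm25_scores("annual leave", docs)
```

Only the first sentence shares literal terms with the query, so it is the only one with a non-zero score. The second sentence is about the same topic but is invisible to keyword matching, which is the gap dense retrieval is meant to fill.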

**🧠 Why combine them?**

Some queries benefit from:

- Exact keywords (like names, dates, jargon)

- Semantic similarity (phrasing, paraphrasing, synonyms)

By covering both cases, hybrid search tends to retrieve more relevant context, which in turn improves generation quality in RAG.

**🔧 How does it work?**

The query is sent to both retrievers (sparse and dense).

The results from both retrievers are merged, scored, deduplicated, and optionally reranked.

The top-k results are passed to the LLM to generate the final response.
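
The merge step can be sketched with Reciprocal Rank Fusion (RRF), one common way to combine two ranked lists. This is an illustration of the general idea, not necessarily what Pinecone does internally; the function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=3):
    """Fuse several ranked lists of document IDs into one deduplicated ranking."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a document that appears
            # in both lists accumulates both contributions under a single key.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(fused, key=fused.get, reverse=True)
    return ordered[:top_k]

# Hypothetical result lists from the two retrievers
sparse_hits = ["leave_policy", "holiday_calendar", "benefits"]
dense_hits = ["benefits", "leave_policy", "training"]

top_docs = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents that both retrievers rank highly rise to the top, and because scores are keyed by document ID, duplicates collapse automatically.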

**🛠️ Tools that support Hybrid RAG in this demo**

Pinecone supports hybrid search natively, so we can use **PineconeHybridSearchRetriever** directly in our RAG pipeline.



###**Install Dependencies**

In [1]:
!pip install gradio pinecone pinecone-text langchain-community langchain-openai langchain-pinecone langchain-huggingface

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.2.0-py3-none-any.whl.metadata (941 bytes)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers>=2.6.0->langchain-huggingface)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers>=2.6.0->langchain-huggingface)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers>=2.6.0->langchain-huggingface)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers>=2.6.0->langchain-huggingface)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12

###**Download the ```hrdataset.zip``` file from the CloudYuga GitHub repo**
The file is saved in the notebook's current working directory
(e.g., /content/ in Google Colab).


In [2]:
!wget https://github.com/cloudyuga/mastering-genai-w-python/raw/refs/heads/main/hrdataset.zip


--2025-06-05 08:29:32--  https://github.com/cloudyuga/mastering-genai-w-python/raw/refs/heads/main/hrdataset.zip
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/cloudyuga/mastering-genai-w-python/refs/heads/main/hrdataset.zip [following]
--2025-06-05 08:29:32--  https://raw.githubusercontent.com/cloudyuga/mastering-genai-w-python/refs/heads/main/hrdataset.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9530 (9.3K) [application/zip]
Saving to: ‘hrdataset.zip.2’


2025-06-05 08:29:32 (22.5 MB/s) - ‘hrdataset.zip.2’ saved [9530/9530]



###**Unzip ```hrdataset.zip``` file**
- This automatically creates the hrdataset folder in the current working directory (/content/ in Google Colab)

In [3]:
!unzip hrdataset.zip

Archive:  hrdataset.zip
replace hrdataset/policies/leave_policies.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: hrdataset/policies/leave_policies.md  
  inflating: hrdataset/policies/training_and_development.md  
  inflating: hrdataset/policies/employee_benefits.md  
  inflating: hrdataset/policies/holiday_calendar.md  
  inflating: hrdataset/policies/events_calendar.md  
  inflating: hrdataset/surveys/Employee_Culture_Survey_Responses.csv  
  inflating: hrdataset/employees/108_Rajesh_Kulkarni.md  
  inflating: hrdataset/employees/106_Neha_Malhotra.md  
  inflating: hrdataset/employees/103_Anjali_Das.md  
  inflating: hrdataset/employees/105_Sunita_Patil.md  
  inflating: hrdataset/employees/101_Priya_Sharma.md  
  inflating: hrdataset/employees/102_Rohit_Mehra.md  
  inflating: hrdataset/employees/104_Karan_Kapoor.md  
  inflating: hrdataset/employees/109_Meera_Iyer.md  
  inflating: hrdataset/employees/110_Aditya_Jain.md  
  inflating: hrdataset/employees/107_Amit_Verma.md

###**Set OpenAI & Pinecone API key**

In [4]:
# Retrieve the API keys from Colab's secrets
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')

In [5]:
# Set the keys as environment variables
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
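
The cells above only work inside Colab, because `google.colab` is not importable elsewhere. A minimal sketch of a fallback, assuming the keys are already exported as environment variables when running outside Colab; the helper name `get_secret` is hypothetical.

```python
import os

def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
    except ImportError:
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value

# Example usage (commented out so the cell does not fail when a key is missing):
# os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
# os.environ["PINECONE_API_KEY"] = get_secret("PINECONE_API_KEY")
```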

###**Define a Model**

In [6]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

###**Create an Index in Pinecone with ```dotproduct``` metric**

In [7]:
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone()
INDEX_NAME = "hybrid-search-index"
# Create index if it doesn't exist
if INDEX_NAME not in [i.name for i in pc.list_indexes()]:
    pc.create_index(
        name=INDEX_NAME,
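        dimension=384,  # must match the embedding model: 384 for all-MiniLM-L6-v2; use 1536 for OpenAI text-embedding-3-small
        metric="dotproduct",  # sparse-dense (hybrid) queries are scored with a dot product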
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(INDEX_NAME)
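
Why ```dotproduct```? In a sparse-dense index, a record's score is the dot product over the dense part plus the dot product over the sparse part, and the query can be weighted toward one side with a factor `alpha`. The sketch below shows that arithmetic in plain Python. The function names and the toy vectors are assumptions for illustration; in the real pipeline the retriever and Pinecone handle this for us.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a query: dense values by alpha, sparse values by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {idx: v * (1 - alpha) for idx, v in sparse.items()}
    return scaled_dense, scaled_sparse

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Dense dot product plus sparse dot product over the shared indices."""
    dense_part = sum(q * d for q, d in zip(q_dense, d_dense))
    sparse_part = sum(w * d_sparse[idx] for idx, w in q_sparse.items() if idx in d_sparse)
    return dense_part + sparse_part

# Toy vectors (made up): the sparse part maps token index -> weight
query_dense, query_sparse = [1.0, 0.0], {7: 2.0}
doc_dense, doc_sparse = [0.5, 0.5], {7: 1.0, 9: 3.0}

q_dense, q_sparse = hybrid_scale(query_dense, query_sparse, alpha=0.5)
score = hybrid_score(q_dense, q_sparse, doc_dense, doc_sparse)
```

With `alpha=1.0` the score reduces to pure dense similarity, and with `alpha=0.0` to pure keyword matching.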

###**Read All Markdown Files in hrdataset/**

In [8]:
import glob
from langchain_core.documents import Document

markdown_files = glob.glob("hrdataset/**/*.md", recursive=True)
documents = []

for path in markdown_files:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
        documents.append(Document(page_content=text, metadata={"source": path}))

###**Chunk the Documents**

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)
print(f"✅ Total Chunks Prepared: {len(chunks)}")

✅ Total Chunks Prepared: 20
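
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter in plain Python. It is a sketch only: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this version ignores. The function name and the sample text are assumptions.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic 1200-character sample
sample = "".join(chr(97 + i % 26) for i in range(1200))
pieces = sliding_window_chunks(sample)
```

Each chunk repeats the last 100 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.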


###**Embed + Store records into Pinecone Index using LangChain**

In [10]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

# You can also use OpenAIEmbeddings if preferred
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vectorstore = PineconeVectorStore.from_documents(
        documents=chunks,
        embedding=embeddings,
        index_name=INDEX_NAME
    )

print("✅ All records stored in Pinecone!")


✅ All records stored in Pinecone!


###**Create a Hybrid Search Retriever, Retrieve Records from Pinecone Based on the Query, and Generate a Response Using the LLM**

In [11]:
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.retrievers import PineconeHybridSearchRetriever
import nltk

nltk.download('punkt_tab')
index = pc.Index(INDEX_NAME)

# Define embeddings and sparse encoder
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
bm25_encoder = BM25Encoder.default()

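# Initialize the retriever.
# Note: records stored via PineconeVectorStore carry dense vectors only, so the
# BM25 part of the query has no sparse values to match. To index sparse values
# as well, upsert the chunk texts through retriever.add_texts(...) instead.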
retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings,
    sparse_encoder=bm25_encoder,
    index=index,
    text_key="text"
)


def generate_answer(question):
    # Retrieve relevant documents with the hybrid retriever
    results = retriever.invoke(question)

    # Combine the retrieved chunks into a single context string
    context = " ".join(result.page_content for result in results)
    prompt = (
        "You are an expert in generating answers based on the following "
        f"retrieved HR content and the question: '{question}'. "
        "Please provide a complete summarized answer in 100 words.\n\n"
        f"{context}"
    )

    # Generate the answer using the LLM
    response = llm.invoke(prompt)
    return context, response.content

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


###**Gradio Interface**

In [12]:
import gradio as gr

# Gradio interface: one text input, two text outputs
query_interface = gr.Interface(
    fn=generate_answer,
    inputs=gr.Textbox(label="Enter your query"),
    outputs=[
            gr.Textbox(label="Retrieved Context"),  # The chunks returned by the retriever
            gr.Textbox(label="Generated Answer")    # The answer generated by the LLM
        ],
    title="Query HR Dataset",
    description="Enter a query to retrieve information from the HR dataset. The app will generate an answer using an LLM."
)

if __name__ == "__main__":
    query_interface.launch()


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://e33a150a2e17060647.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
