### Hybrid Search in RAG

  
🔍 What’s Hybrid RAG?

Hybrid RAG is an advanced RAG technique that merges two retrieval methods:

 1️⃣ Semantic Search (Dense): Great for understanding context.

 2️⃣ Keyword Search (Sparse): Ideal for capturing exact matches.

Together they deliver context-aware retrieval that traditional, single-retriever RAG setups often miss.

Hybrid search draws on the strengths of both dense and sparse retrieval to improve the relevance and accuracy of the retrieved passages, which in turn improves the quality of the answer the LLM generates.

Dense Retrieval:

Uses embeddings (vector representations) of queries and documents to measure their similarity.

Effective at capturing semantic meaning and retrieving conceptually related content even when the exact keywords are absent.

Examples: DPR (Dense Passage Retrieval), Sentence-BERT.
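
To make the similarity step concrete, here is a minimal sketch of ranking documents by cosine similarity. The three-dimensional vectors and the document names are made up for illustration; a real embedding model such as the one used later in this notebook produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: not produced by any real model).
doc_vectors = {
    "doc_pretraining": [0.9, 0.1, 0.0],
    "doc_finetuning": [0.2, 0.8, 0.1],
    "doc_hardware": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.3, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
print(ranked)
```

No keyword overlap is involved: the ranking depends only on how closely the vectors point in the same direction.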

Sparse Retrieval:

Relies on exact term matching between the query and documents, often using inverted indices.

Effective at retrieving highly relevant documents when the query terms are known and precise.

Examples: BM25, TF-IDF.
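
As an illustration of term-matching scores, the sketch below implements one common BM25 formulation in plain Python. The whitespace tokenizer, the toy corpus, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example; libraries such as `rank_bm25` may differ in details like the IDF variant.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in the corpus against the query."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # no exact match, no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

corpus = [
    "bert uses a masked language model objective",
    "the feature based approach extracts fixed features from bert",
    "gpus speed up matrix multiplication",
]
tokenized = [doc.split() for doc in corpus]
scores = bm25_scores("feature based approach".split(), tokenized)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Documents that share no term with the query score exactly zero, which is the precision (and the blind spot) of sparse retrieval.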

  
Hybrid Search in RAG:

Hybrid search combines dense and sparse retrieval so that each covers the other's weak spots:

 Dense Retrieval: Broadens the search by capturing the semantic meaning of the query, so documents without exact keyword matches are still considered if they are contextually relevant.

 Sparse Retrieval: Adds precision by retrieving documents that contain the exact keywords or phrases from the query, which is particularly useful for specific or technical terms.

 Hybrid RAG helps by blending retrieval methods for:

  ✅ Improved relevance in search results.

  ✅ Better performance with unfamiliar terms or concepts.
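
One simple way to blend the two signals is to normalize each retriever's scores and take a weighted sum. The sketch below assumes min-max normalization and a mixing weight `alpha`; the document ids and scores are invented, and this is only one of several possible fusion schemes, not necessarily the one used by the libraries in this notebook.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def blend(dense, sparse, alpha=0.5):
    """Weighted sum of normalized scores; a missing document counts as 0."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    return {
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }

# Hypothetical scores: cosine similarities (dense) and BM25 scores (sparse).
dense_scores = {"a": 0.82, "b": 0.80, "c": 0.40}
sparse_scores = {"b": 7.5, "c": 2.5, "d": 5.0}

blended = blend(dense_scores, sparse_scores)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Document `b` wins because both retrievers rate it highly, while `a` and `d` each survive on the strength of a single retriever.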


In [None]:
!pip install langchain langchain_community langchain_groq langchain_chroma chromadb langchain_huggingface sentence-transformers


In [None]:
!pip install pypdf



In [None]:
!pip install rank_bm25



In [None]:
#Importing required libraries
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma

In [None]:
from google.colab import userdata
groq_api_key = userdata.get("GROQ_API_KEY")

In [None]:
#LLM
llm = ChatGroq(model = "openai/gpt-oss-20b", api_key = groq_api_key)

In [None]:
#Download the data
!wget "https://arxiv.org/pdf/1810.04805.pdf"


In [None]:
#Load Document
loader = PyPDFLoader("1810.04805.pdf")
documents = loader.load()

In [None]:
len(documents)

16

In [None]:
#Splitting Documents into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
docs = text_splitter.split_documents(documents)

In [None]:
#After split documents
len(docs)

83
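
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is an illustrative sketch only: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this version ignores, and the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.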

In [None]:
#Embedding
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")


In [None]:
#Create Vectorstore
vectorstores = Chroma.from_documents(docs, embeddings)

In [None]:
#VectorStore Retriever
retriever = vectorstores.as_retriever()

In [None]:
#Keyword Retriever (BM25, backed by the rank_bm25 package installed above)

from langchain_community.retrievers import BM25Retriever
keyword_retriever = BM25Retriever.from_documents(docs)

In [None]:
#Ensemble Retriever
from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(retrievers=[retriever, keyword_retriever], weights = [0.5, 0.5])
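
As far as the LangChain documentation describes it, `EnsembleRetriever` merges the ranked lists of its retrievers with weighted Reciprocal Rank Fusion. The sketch below is an independent, simplified version of that idea rather than the library's code; the constant `c=60` and the chunk ids are assumptions for illustration.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: each list adds weight / (rank + c) to a document."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked results from the dense and the keyword retriever.
dense_ranking = ["chunk_12", "chunk_40", "chunk_07"]
sparse_ranking = ["chunk_40", "chunk_55", "chunk_12"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, the cosine similarities and BM25 scores never need to be put on a common scale, which is the main appeal of rank fusion.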

In [None]:
ensemble_retriever.invoke("Describe the Feature-based Approach with BERT?")


[Document(id='b2127fc4-d777-48e8-8e74-b401f9aa5877', metadata={'moddate': '2019-05-28T00:07:51+00:00', 'page': 1, 'creator': 'LaTeX with hyperref package', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2', 'source': '1810.04805.pdf', 'total_pages': 16, 'subject': '', 'page_label': '2', 'producer': 'pdfTeX-1.40.17', 'creationdate': '2019-05-28T00:07:51+00:00', 'title': '', 'keywords': '', 'author': '', 'trapped': '/False'}, page_content='trained left-to-right and right-to-left LMs.\n• We show that pre-trained representations reduce\nthe need for many heavily-engineered task-\nspeciﬁc architectures. BERT is the ﬁrst ﬁne-\ntuning based representation model that achieves\nstate-of-the-art performance on a large suite\nof sentence-level and token-level tasks, outper-\nforming many task-speciﬁc architectures.\n• BERT advances the state of the art for eleven\nNLP tasks. The code and pre-trained mod-\nels are available at https://github

In [None]:
#Improving Retrieval with Reranker
#Here we use an open source cross-encoder reranker model
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain.retrievers import ContextualCompressionRetriever

# download an open-source reranker model - BAAI/bge-reranker-v2-m3
reranker = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-v2-m3")
reranker_compressor = CrossEncoderReranker(model=reranker, top_n=5)


In [None]:
# Retriever  - Uses a Reranker model to rerank retrieval results from the previous retriever
final_retriever = ContextualCompressionRetriever(
    base_compressor=reranker_compressor,
    base_retriever=ensemble_retriever
)
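
The retrieve-then-rerank pattern can be sketched without any model: score every candidate against the query and keep the `top_n` best. The `overlap_score` function here is a hypothetical stand-in for the cross-encoder (it just counts shared tokens) and the candidate passages are invented.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Re-score first-stage candidates and keep the top_n highest scoring."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]

def overlap_score(query, text):
    """Hypothetical scorer: fraction of query tokens that occur in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens)

candidates = [
    "BERT is pre-trained on unlabeled text",
    "the feature-based approach uses BERT as a fixed feature extractor",
    "fine-tuning updates all parameters",
]
top = rerank("feature-based approach with BERT", candidates, overlap_score, top_n=2)
print(top)
```

A real cross-encoder reads the query and the passage together, so it is slower than the first-stage retrievers but usually more accurate, which is why it is applied only to the short candidate list.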

### RAG Pipeline

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}

Question: {input}

Answer:

"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": final_retriever, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
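
In the chain above the retriever returns a list of document objects, and that list is placed into `{context}` as-is, metadata included. A common refinement is to join only the text of each chunk first. The sketch below uses a small `Doc` dataclass as a hypothetical stand-in for a LangChain document so that it runs on its own.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text only)."""
    page_content: str

def format_docs(docs):
    """Join the text of the retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

context = format_docs([Doc("First chunk."), Doc("Second chunk.")])
print(context)
```

Under the assumption that the pipe operator accepts a plain function, the chain could use `final_retriever | format_docs` as the `"context"` entry; that wiring is not run here.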


In [None]:
response = rag_chain.invoke("Describe the Feature-based Approach with BERT?")

In [None]:
print(response)

**Feature‑based approach with BERT**  

In the feature‑based strategy the pre‑trained BERT model is *not* fine‑tuned on the downstream task. Instead, the model is used as a static feature extractor:

1. **Input representation** – Sentences (or documents) are tokenised with BERT’s case‑preserving WordPiece model and fed into the pre‑trained Transformer.  
2. **Feature extraction** – From the BERT encoder we take the hidden representations of each token. Empirically, concatenating the last four hidden layers gives the best results for token‑level tasks such as Named‑Entity Recognition (NER).  
3. **Downstream model** – These frozen feature vectors are then passed to a lightweight, task‑specific classifier (e.g., a linear layer or a small neural network). For NER this is a tagging model; for sentence‑level tasks it can be a single‑label classifier.  
4. **Advantages**  
   * **Computational efficiency** – The expensive BERT computation is performed once per data point, after which many ch