# Hybrid Search RAG Pipeline
This notebook demonstrates how to create a hybrid search Retrieval-Augmented Generation (RAG) pipeline using the LangChain library. The pipeline combines BM25 keyword search, which rewards exact term matches, with dense vector search, which matches passages by meaning, so that the retriever can surface relevant chunks that either method alone might miss.

## Installation and Imports
First, let's install the necessary dependencies:

In [1]:
!pip install langchain langchain_community chromadb requests sentence-transformers pypdf


Now, we can import the required modules:

In [3]:
import os
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

## Set up Environment Variables
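The Hugging Face Hub API token is only needed if you use the Hugging Face Hub LLM that is commented out in the LLM section; the Ollama model this notebook actually runs is local and needs no token. To use the Hub, store the token in the `HUGGINGFACEHUB_API_TOKEN` environment variable, reading it with `getpass` or, in Google Colab, from the `userdata` secrets store.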

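In [4]:
# Only needed in Google Colab, to read the token from the secrets store
# from google.colab import userdata

In [7]:
# import os
# from getpass import getpass

# # In Colab, read the token from the secrets store; elsewhere, prompt for it instead
# HUGGINGFACEHUB_API_TOKEN = userdata.get("HUGGINGFACEHUB_API_TOKEN")
# # HUGGINGFACEHUB_API_TOKEN = getpass("Hugging Face API token: ")

# # Set the API token in the environment variable
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN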

## Load and Split Documents
Here we load the PDF documents from the specified directory and split them into smaller chunks using the RecursiveCharacterTextSplitter. The chunk size is set to 500 characters with a 50-character overlap.

In [25]:

# Load your documents (assuming they are PDFs in a directory)

path = "./data/"
loader = PyPDFDirectoryLoader(path)
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
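As a rough illustration of what `chunk_size=500` and `chunk_overlap=50` mean, the sketch below slides a fixed-size character window over a string so that consecutive chunks share 50 characters. This is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`), so its chunks usually end at natural boundaries and are often shorter than 500 characters. The function `sliding_window_chunks` and the `demo_*` names are hypothetical, chosen so they do not overwrite the notebook's `chunks`.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share
    `chunk_overlap` characters. A simplified stand-in, not LangChain's splitter."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows


demo_text = "".join(str(i % 10) for i in range(1200))  # 1,200 characters of dummy text
demo_chunks = sliding_window_chunks(demo_text)
print(len(demo_chunks))                          # 3
print([len(c) for c in demo_chunks])             # [500, 500, 300]
print(demo_chunks[0][-50:] == demo_chunks[1][:50])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk more often than it would without overlap.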


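## Create the Prompt Template
The prompt instructs the model to answer only from the retrieved CONTEXT and to reply "I don't know" otherwise. It uses Zephyr-style chat markers (`<|system|>`, `<|user|>`, `<|assistant|>`, `</s>`), matching the `HuggingFaceH4/zephyr-7b-beta` model that is commented out in the LLM section.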

In [26]:
from langchain.prompts.prompt import PromptTemplate

prompt_template = """
<|system|>
You are an AI Assistant that follows instructions extremely well.
Please be truthful and give direct answers. Please tell 'I don't know' if user query is not in CONTEXT

CONTEXT: {context}
</s>
<|user|>
{query}
</s>
<|assistant|>
Your answer:
"""
prompt = PromptTemplate.from_template(prompt_template)

In [27]:
prompt.format(context="This is a context", query="this is a query")

"\n<|system|>\nYou are an AI Assistant that follows instructions extremely well.\nPlease be truthful and give direct answers. Please tell 'I don't know' if user query is not in CONTEXT\n\nCONTEXT: This is a context\n</s>\n<|user|>\nthis is a query\n</s>\n<|assistant|>\nYour answer:\n"

## Initialize Embeddings and Vector Store
We initialize the Hugging Face embeddings model and use it to create a Chroma vector store from the document chunks.

In [30]:
# Larger alternatives (only the last assignment took effect in the original cell):
# emb_model = "thenlper/gte-large"
# emb_model = "thenlper/gte-base"
emb_model = "BAAI/bge-small-en-v1.5"
embeddings = HuggingFaceEmbeddings(model_name=emb_model)

In [31]:
vectorstore = Chroma.from_documents(chunks, embeddings)


In [32]:
query = "what is electrolysis"
search = vectorstore.similarity_search(query)

search

[Document(metadata={'page': 8, 'source': 'data/lekl101.pdf'}, page_content='Talking about the T ext\nDiscuss in groups\n1. In spite of all the rationality that human beings are capable of,\nmost of us are suggestible and yield to archaic superstitions.\n2. Dreams and clairvoyance are as much an element of the poetic\nvision as religious superstition.\nAppreciation\n1. The story hinges on a gold ring shaped like a serpent with\nemerald eyes. Comment on the responses that this image\nevokes in the r eader .\n2. The craft of a master story-teller lies in the ability to interweave'),
 Document(metadata={'page': 8, 'source': 'data/ncert short story.pdf'}, page_content='Talking about the T ext\nDiscuss in groups\n1. In spite of all the rationality that human beings are capable of,\nmost of us are suggestible and yield to archaic superstitions.\n2. Dreams and clairvoyance are as much an element of the poetic\nvision as religious superstition.\nAppreciation\n1. The story hinges on a gold ring 
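The search above returns the chunks closest to the query in embedding space (LangChain's `similarity_search` returns 4 by default). The toy sketch below shows the idea with brute-force cosine similarity over made-up 3-dimensional vectors; `cosine`, `top_k_by_cosine`, and `toy_index` are hypothetical. Chroma itself uses an approximate nearest-neighbour index and, as far as we know, L2 distance by default, so its scores differ from this sketch; for unit-length vectors, however, L2 distance and cosine similarity produce the same ranking.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k documents whose vectors are most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Made-up "embeddings" for three documents, for illustration only.
toy_index = {
    "chemistry": [0.9, 0.1, 0.0],
    "story": [0.0, 0.2, 0.9],
    "physics": [0.7, 0.6, 0.1],
}
print(top_k_by_cosine([1.0, 0.2, 0.0], toy_index, k=2))  # ['chemistry', 'physics']
```

A nearest-neighbour search always returns `k` results, even when none of them is relevant, which is why the story passages above still come back for a chemistry query.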

## Create BM25 and Vector Retrievers
Here, we create the BM25 and vector retrievers. The BM25 retriever builds its own in-memory keyword index directly from the document chunks, while the vector retriever wraps the Chroma vector store created above.

In [33]:
!pip install rank_bm25


In [35]:
bm25_retriever = BM25Retriever.from_documents(chunks)
vector_retriever = vectorstore.as_retriever()
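To make the keyword side concrete, here is a minimal sketch of Okapi BM25 scoring in pure Python. By default, LangChain's `BM25Retriever` builds a `rank_bm25` `BM25Okapi` index over whitespace-split text; that library also defaults to `k1=1.5` and `b=0.75` but uses a slightly different IDF formula, so scores will not match exactly. The lower-cased whitespace tokenization, the smoothed IDF, and the names `bm25_scores` and `demo_corpus` are assumptions made for this illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25.
    Tokenization is lower-cased whitespace splitting (a simplifying assumption)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequencies within this document
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # BM25 only rewards exact token matches
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency weight, normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


demo_corpus = [
    "electrolysis splits water into hydrogen and oxygen",
    "the story hinges on a gold ring",
    "an electrolyte conducts electricity in solution",
]
demo_scores = bm25_scores("what is electrolysis", demo_corpus)
print(max(range(len(demo_corpus)), key=demo_scores.__getitem__))  # 0
```

Note that the third document scores zero: "electrolyte" is a different token from "electrolysis", which is exactly the kind of near-miss the vector retriever can catch.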

## Set Up the EnsembleRetriever
The `EnsembleRetriever` merges the ranked results of the BM25 and vector retrievers using weighted Reciprocal Rank Fusion. The `weights` parameter is set to 0.5 for each retriever, giving them equal importance in the ensemble.

In [36]:
from langchain.retrievers.ensemble import EnsembleRetriever

retrievers = [bm25_retriever, vector_retriever]
ensemble_retriever = EnsembleRetriever(retrievers=retrievers, weights=[0.5, 0.5])
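To our understanding, `EnsembleRetriever` scores each document as the sum over retrievers of `weight / (c + rank)`, with ranks starting at 1 and a constant `c` that defaults to 60, then sorts by that fused score. The sketch below reimplements this idea over plain document ids rather than `Document` objects; `weighted_rrf`, `bm25_hits`, and `vector_hits` are hypothetical names and the rankings are made up.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion:
    score(d) = sum_i weights[i] / (c + rank_i(d)), with ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_hits = ["a", "b", "c"]    # hypothetical BM25 ranking
vector_hits = ["c", "a", "d"]  # hypothetical vector-search ranking
print(weighted_rrf([bm25_hits, vector_hits], weights=[0.5, 0.5]))  # ['a', 'c', 'b', 'd']
```

Document `c` appears in both lists and overtakes `b`, which only BM25 ranked highly. Because RRF uses ranks rather than raw scores, it can combine BM25 scores and vector similarities without normalizing them to a common scale.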


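## Initialize the Large Language Model
The notebook runs Llama 3 (8B) locally through Ollama, which assumes a running Ollama server with the `llama3:8b` model already pulled (`ollama pull llama3:8b`). The commented-out block shows the alternative of calling `HuggingFaceH4/zephyr-7b-beta` through the Hugging Face Hub, which needs the API token set up earlier.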

In [58]:
# from langchain_community.llms import HuggingFaceHub

# llm = HuggingFaceHub(
#     repo_id="HuggingFaceH4/zephyr-7b-beta",
#     task="text-generation",
#     model_kwargs={
#         "max_new_tokens": 512,
#         "top_k": 30,
#         "temperature": 0.1,
#         "repetition_penalty": 1.1,
#         "return_full_text":False
#     },
# )

from langchain_community.llms import Ollama
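# Sampling settings passed to Ollama; temperature 0.0 makes the output (near-)deterministic.
# The commented-out keys are Hugging Face Hub parameters left over from the block above.
model_kwargs = {
    # "max_new_tokens": 512,
    "top_k": 2,
    "temperature": 0.0,
    # "repetition_penalty": 1.1,
    # "return_full_text": False,
}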

llm = Ollama(
    model="llama3:8b",
    **model_kwargs
)
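To illustrate what `top_k=2` and `temperature=0.0` mean, here is a conceptual sketch of top-k sampling with temperature over a made-up table of token scores. It is not Ollama's implementation (Ollama's sampler runs inside its own llama.cpp-based runtime and may differ in detail); `sample_next_token` and `toy_logits` are hypothetical. In this sketch a temperature of 0 reduces to always picking the highest-scoring token, so `top_k` has no effect at that setting.

```python
import math
import random
from typing import Optional


def sample_next_token(logits: dict[str, float], top_k: int = 2, temperature: float = 0.0,
                      rng: Optional[random.Random] = None) -> str:
    """Pick the next token from raw scores using top-k filtering and temperature.
    temperature == 0 is treated as greedy decoding (highest-scoring token)."""
    # Keep only the top_k highest-scoring candidate tokens.
    candidates = sorted(logits, key=logits.__getitem__, reverse=True)[:top_k]
    if temperature == 0.0:
        return candidates[0]  # greedy: deterministic choice
    rng = rng or random.Random()
    # Softmax weights over the remaining candidates, with scores divided by the temperature
    # (subtracting the max score keeps exp() numerically stable).
    max_logit = logits[candidates[0]]
    weights = [math.exp((logits[t] - max_logit) / temperature) for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


toy_logits = {"electrolysis": 3.1, "electrolyte": 2.9, "ring": 0.4}
print(sample_next_token(toy_logits))  # electrolysis
```

With a temperature above 0, the sketch samples between the two surviving candidates, and "ring" can never be chosen because top-k filtering has already removed it.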

## Create the RAG Pipeline

In [59]:
from langchain_core.output_parsers import StrOutputParser

In [60]:
from langchain_core.runnables import RunnablePassthrough

In [61]:
output_parser = StrOutputParser()

In [62]:
retriever = ensemble_retriever
chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | llm
    | output_parser
)
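The chain above is written in LangChain Expression Language (LCEL). As a rough mental model, the sketch below traces the same four steps in plain Python with hypothetical stand-ins (`fake_retrieve`, `echo_llm`, `demo_template`) in place of the real retriever, model, and prompt. One deliberate difference: the sketch joins chunk texts with a hypothetical `format_docs` helper, whereas the notebook's chain passes the retriever's list of `Document` objects straight to the prompt, so the template receives their Python representation, metadata included. Piping the retriever into a similar helper that joins `doc.page_content` (for example `retriever | format_docs`) is a common way to give the model plain-text context.

```python
from typing import Callable


def format_docs(docs: list[str]) -> str:
    """Join retrieved chunk texts with blank lines so the prompt gets plain text."""
    return "\n\n".join(docs)


def run_rag_chain(
    query: str,
    retrieve: Callable[[str], list[str]],
    template: str,
    generate: Callable[[str], str],
) -> str:
    """Plain-Python approximation of
    {"context": retriever, "query": RunnablePassthrough()} | prompt | llm | StrOutputParser()
    """
    # Step 1: the dict feeds the same input to both branches; the passthrough returns it unchanged.
    inputs = {"context": format_docs(retrieve(query)), "query": query}
    # Step 2: the prompt template substitutes both values into its placeholders.
    prompt_text = template.format(**inputs)
    # Steps 3-4: the LLM completes the prompt; the output parser yields a plain string.
    return str(generate(prompt_text))


# Hypothetical stand-ins for the retriever and the LLM.
def fake_retrieve(q: str) -> list[str]:
    return ["Electrolytes conduct electricity.", "Ions move between electrodes."]


def echo_llm(prompt_text: str) -> str:
    return prompt_text  # returns its prompt so you can see what the model would receive


demo_template = "CONTEXT: {context}\nQUESTION: {query}"
print(run_rag_chain("what is an electrolyte", fake_retrieve, demo_template, echo_llm))
```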

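## Run a Query
Both queries below are answered with "I don't know". The similarity search earlier showed that the chunks retrieved for a chemistry question come from a short story, and the prompt tells the model to answer only from CONTEXT, so the model declines rather than answering from its own knowledge.
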
In [63]:
query = "what is an electrolyte"


In [64]:
response = chain.invoke(query)

In [65]:
print(response)
# An electrolyte is a substance that, when dissolved in a solvent (such as water), becomes an electrically conductive solution. Electrolytes contain ions (charged particles) that can move through a membrane or between two electrodes, allowing electrical current to flow. Examples of electrolytes include table salt (sodium chloride), lemon juice (citric acid), and baking soda (sodium bicarbonate). In the human body, electrolytes such as sodium, potassium, calcium, magnesium, and chloride play important roles in various physiological processes, including muscle contractions, nerve impulses, and maintaining proper fluid balance.


I don't know. The context of our conversation appears to be about a story, specifically about waves and cars, rather than chemistry or biology where the term "electrolyte" would typically be discussed. If you'd like to provide more context or clarify what you mean by "electrolyte", I'll do my best to help!


In [66]:
print(chain.invoke("what is electrolysis?"))
# Electrolysis is a chemical process that uses electricity to break down a compound, usually in a solution, into its constituent elements or simpler compounds. In other words, it is the process of using electric current to drive nonspontaneous chemical reactions. Electrolysis is commonly used in industry for the production of metals such as aluminum, chlorine, and sodium hydroxide (caustic soda). It can also be used to purify water by removing impurities like minerals and gases through a process called electrodeionization.


I don't know. The provided context does not contain any information about electrolysis.
