<a href="https://colab.research.google.com/github/UdaraChamidu/Generative-AI/blob/main/hybrid_search_rag_langchain_openai_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hybrid Search RAG** using LangChain and OpenAI

In [4]:
!pip install pypdf -q
!pip install langchain -q
!pip install langchain_community -q
!pip install langchain_openai -q
!pip install langchain_chroma -q
!pip install rank_bm25 -q  # calculates the sparse (BM25) vectors
# langchain-chroma 0.2.2 requires numpy<2.0.0, so keep NumPy on 1.x instead of upgrading to 2.x
!pip install "numpy<2" -q

In [5]:
# Import necessary libraries
import os
from google.colab import userdata

### Initialize OpenAI LLM

In [6]:
from langchain_openai import ChatOpenAI

# Set OpenAI API key
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

# Initialize the ChatOpenAI model
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0
)

### Initialize Embedding Model

In [8]:
# for semantic search
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

### Load PDF Document

In [10]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("codeprolk.pdf")

docs = loader.load()

### Split Documents into Chunks

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunks of at most 250 characters, with 30 characters shared between neighbouring chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=30)

chunks = splitter.split_documents(docs)

In [12]:
len(chunks)

33
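
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and newlines, which this sketch ignores. The helper name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 250, chunk_overlap: int = 30) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 600-character sample text, purely for illustration.
sample = "".join(chr(ord("a") + i % 26) for i in range(600))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # → [250, 250, 160]
```

The last 30 characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still partly visible in both chunks.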

### Create Semantic Search Retriever

In [13]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(chunks, embedding_model)

vectorstore_retreiver = vectorstore.as_retriever(search_kwargs={"k": 2})
# k = number of documents to retrieve

In [14]:
vectorstore_retreiver

# Everything up to this point is the same as in the earlier, semantic-search-only setup.

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7cb3200ce3d0>, search_kwargs={'k': 2})
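
Conceptually, the semantic retriever embeds the query and returns the `k` chunks whose vectors lie closest to it. A minimal sketch with hand-made 3-dimensional vectors and cosine similarity follows. The vectors, chunk labels, and the `top_k_by_cosine` helper are illustrative assumptions, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec: list[float], chunk_vecs: dict, k: int = 2) -> list[str]:
    """Return the labels of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda label: cosine_similarity(query_vec, chunk_vecs[label]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" (assumed values, not real model output).
toy_vectors = {
    "videos": [0.9, 0.1, 0.0],
    "community": [0.6, 0.4, 0.1],
    "pricing": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k_by_cosine(query, toy_vectors, k=2))  # → ['videos', 'community']
```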

### Create Keyword Search Retriever

In [16]:
# The hybrid-search-specific steps start here.

from langchain_community.retrievers import BM25Retriever

keyword_retriever = BM25Retriever.from_documents(chunks)

keyword_retriever.k = 2  # number of chunks the keyword retriever returns

In [17]:
keyword_retriever

BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7cb319340850>, k=2)
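
BM25 scores a chunk by how often the query's terms occur in it, discounting terms that are common across the corpus and normalizing for chunk length. The sketch below is a pure-Python illustration under stated assumptions: lowercased whitespace tokenization, parameters `k1=1.5` and `b=0.75`, and a smoothed IDF that stays non-negative. It is not the exact code in `rank_bm25`, and `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document in corpus for the given query."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF (assumed variant): rare terms weigh more, never negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


toy_corpus = [
    "popular videos on the channel",
    "webinars and live coding sessions",
    "interactive learning tools",
]
toy_scores = bm25_scores("popular videos", toy_corpus)
best = max(range(len(toy_corpus)), key=toy_scores.__getitem__)
print(toy_corpus[best])  # → popular videos on the channel
```

Because the score depends on exact term matches, a keyword retriever can surface chunks that an embedding search misses, and the reverse is also true.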

### Create Hybrid Search Retriever

In [19]:
from langchain.retrievers import EnsembleRetriever

# List every retriever to combine; the weights set how much each one contributes to the final ranking.
ensemble_retriever = EnsembleRetriever(
    retrievers=[vectorstore_retreiver, keyword_retriever],
    weights=[0.5, 0.5],
)

In [20]:
ensemble_retriever

EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7cb3200ce3d0>, search_kwargs={'k': 2}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7cb319340850>, k=2)], weights=[0.5, 0.5])
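
`EnsembleRetriever` merges the ranked lists from its retrievers with weighted Reciprocal Rank Fusion: each document earns `weight / (rank + c)` from every list it appears in, and the summed scores decide the final order. The sketch below reimplements that idea in plain Python on string ids. `weighted_rrf` and the chunk ids are hypothetical, and `c=60` is the commonly used constant that I am assuming here.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into a single ranking."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # sorted() is stable, so tied ids keep the order in which they were first seen.
    return sorted(scores, key=scores.get, reverse=True)


semantic_hits = ["chunk_a", "chunk_b"]  # ranked output of the vector retriever
keyword_hits = ["chunk_a", "chunk_c"]   # ranked output of the BM25 retriever
print(weighted_rrf([semantic_hits, keyword_hits], weights=[0.5, 0.5]))
# → ['chunk_a', 'chunk_b', 'chunk_c']
```

A chunk returned by both retrievers accumulates score from each list, which is why it rises to the top. Shifting the weights to, say, `[0.2, 0.8]` would move `chunk_c` ahead of `chunk_b`.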

### Define Prompt Template

In [21]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Define a message template for the chatbot
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Create a chat prompt template from the message
prompt = ChatPromptTemplate.from_messages([("human", message)])

### Create RAG Chain with Hybrid Search

In [22]:
chain = (
    {
      "context": ensemble_retriever,  # retrieve context with the hybrid (ensemble) retriever
      "question": RunnablePassthrough()
    }
    | prompt
    | llm
)
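
In the chain above, the retriever's output (a list of `Document` objects) is substituted directly into `{context}`, so the prompt receives the list's string representation, metadata included. A common alternative is to join only the `page_content` of each chunk before it reaches the prompt. The sketch below shows such a helper on a stand-in class so that it runs without LangChain; `FakeDocument` and `format_docs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document: only the attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join chunk texts with blank lines so the prompt sees plain text only."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    FakeDocument("First chunk.", {"page": 0}),
    FakeDocument("Second chunk.", {"page": 1}),
]
context_text = format_docs(retrieved)
print(context_text)

# In the chain, the helper would sit right after the retriever, for example:
#     "context": ensemble_retriever | format_docs
```

Keeping metadata out of the prompt saves tokens and makes the context easier for the model to read, at the cost of losing page numbers unless they are added back deliberately.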

### Invoke RAG Chain with Example Questions

In [23]:
response = chain.invoke("what are the popular videos in codeprolk")

print(response.content)

The popular videos in CodePRO LK are those that have assisted viewers in their learning journeys.


In [None]:
# Compare the chunks returned by keyword_retriever, vectorstore_retreiver, and ensemble_retriever

In [24]:
for doc in keyword_retriever.invoke("what are the popular videos in codeprolk"):
  print(doc.page_content)
  print("---------------------")

appreciation and sharing how the videos have assisted them in their learning journeys. 
Impact 
The CodePRO LK YouTube channel has played a significant role in democratizing tech
---------------------
industry, ensuring that learners are well-prepared for real-world challenges. 
Enhanced Learning Tools 
The platform plans to integrate more interactive and adaptive learning tools to personalize the
---------------------


In [25]:
for doc in vectorstore_retreiver.invoke("what are the popular videos in codeprolk"):
  print(doc.page_content)
  print("---------------------")

appreciation and sharing how the videos have assisted them in their learning journeys. 
Impact 
The CodePRO LK YouTube channel has played a significant role in democratizing tech
---------------------
CodePRO LK is committed to strengthening its community through regular engagement 
activities such as webinars, live coding sessions, hackathons, and tech talks. These events
---------------------


In [26]:
for doc in ensemble_retriever.invoke("what are the popular videos in codeprolk"):
  print(doc.page_content)
  print("---------------------")

appreciation and sharing how the videos have assisted them in their learning journeys. 
Impact 
The CodePRO LK YouTube channel has played a significant role in democratizing tech
---------------------
CodePRO LK is committed to strengthening its community through regular engagement 
activities such as webinars, live coding sessions, hackathons, and tech talks. These events
---------------------
industry, ensuring that learners are well-prepared for real-world challenges. 
Enhanced Learning Tools 
The platform plans to integrate more interactive and adaptive learning tools to personalize the
---------------------
