<a href="https://colab.research.google.com/github/Sankalpa0011/Simple-RAG-App-LangChain-And-OpenAI/blob/main/Simple_RAG_App_LangChain_And_OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Install Necessary Packages**

In [1]:
!pip install langchain -qU
!pip install langchain-openai -qU    # Embedding Model
!pip install langchain-chroma -qU    # Database


## **Import Necessary Libraries**

In [2]:
import os
from google.colab import userdata

## **Initialize OpenAI LLM**

In [3]:
from langchain_openai import ChatOpenAI

# Set OpenAI API Key
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# Initialize the ChatOpenAI Model
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0
)
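
`userdata.get` is only available inside Google Colab. When running the notebook elsewhere, the key has to come from another source. The sketch below is one possible fallback, not part of the original notebook: `load_openai_key` and its `ask` parameter are hypothetical names. It reads `OPENAI_API_KEY` from the environment and prompts for it only when the variable is missing.

```python
import os
from getpass import getpass


def load_openai_key(ask=getpass):
    """Return the OpenAI key from the environment, prompting only if it is missing.

    `ask` is a hypothetical hook (defaults to getpass) so the prompt can be
    replaced in scripts or tests.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = ask("OpenAI API key: ")
        # Store it so that ChatOpenAI and OpenAIEmbeddings can pick it up later
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling `load_openai_key()` once before creating `ChatOpenAI` would then stand in for the `userdata.get` line on a local machine.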

## **Initialize Embedding Model**

In [4]:
from langchain_openai import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings Model
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

## **Create Embedded Documents**

In [5]:
from langchain_core.documents import Document

# Define a list of documents with content and metadata
documents = [
    Document(
        page_content="""The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide. India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign. The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South Africa.""",
        metadata={"source": "cricket news"},
    ),
    Document(
        page_content="""The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally. In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain. Both ties promise thrilling encounters, featuring some of the best talents in world football.""",
        metadata={"source": "football news"},
    ),
    Document(
        page_content="""As election season heats up, the latest developments reveal a highly competitive atmosphere across several key races. The presidential election has seen intense campaigning from all major candidates, with recent polls indicating a tight race. Incumbent President Jane Doe is seeking re-election on a platform of economic stability and healthcare reform, while her main rival, Senator John Smith, focuses on education and climate change initiatives.""",
        metadata={"source": "election news"},
    ),
    Document(
        page_content="""The AI revolution continues to transform industries and reshape the global economy. Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs. Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.""",
        metadata={"source": "ai revolution news"}
    )
]

## **Create A Vector Store**

In [6]:
# Create a vector store using the documents and embedding model
from langchain_chroma import Chroma

vector_store = Chroma.from_documents(
    documents,
    embedding=embedding_model
)

## **Perform Similarity Search**

In [7]:
results = vector_store.similarity_search("Test match")

for result in results:
  print("-----------------")
  print(result.page_content)
  print(result.metadata)

-----------------
The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide. India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign. The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South Africa.
{'source': 'cricket news'}
-----------------
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally. In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain. Both ties promise thrilling encounters, featuring some of the best talents in world football.
{'source': 'footbal

In [8]:
results = vector_store.similarity_search("Machine learning")

for result in results:
  print("-----------------")
  print(result.page_content)
  print(result.metadata)

-----------------
The AI revolution continues to transform industries and reshape the global economy. Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs. Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}
-----------------
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally. In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain. Both ties promise thrilling encounters, featuring some of the best talents in world football.
{'source': 'football news'}
-----------------
As election season heats up, the latest developments reveal a highly competitive atmosphere across sever

## **Embed A Query And Perform Similarity Search By Vector**

In [9]:
# Embed a query using the embedding model
query_embedding = embedding_model.embed_query("Machine learning")

# Check first ten values
query_embedding[:10]

[-0.009323582984507084,
 -0.002345611108466983,
 -0.0008498057723045349,
 -0.02188452146947384,
 0.04223865643143654,
 0.01093637477606535,
 0.007651930674910545,
 0.03967231884598732,
 -0.01960071362555027,
 0.022508447989821434]

In [10]:
# Print the length of the query embedding
len(query_embedding)  # the length is the same for every query

1536

In [11]:
results = vector_store.similarity_search_by_vector(query_embedding, k=3)

for result in results:
  print("-----------------")
  print(result.page_content)
  print(result.metadata)

-----------------
The AI revolution continues to transform industries and reshape the global economy. Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs. Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}
-----------------
The world of football is buzzing with excitement as major tournaments and league matches continue to captivate fans globally. In the UEFA Champions League, the semi-final matchups have been set, with defending champions Real Madrid set to face Manchester City, while Bayern Munich will take on Paris Saint-Germain. Both ties promise thrilling encounters, featuring some of the best talents in world football.
{'source': 'football news'}
-----------------
As election season heats up, the latest developments reveal a highly competitive atmosphere across sever
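
To give a feel for what a search by vector does, here is a minimal pure-Python sketch that ranks stored vectors against a query vector. It rests on several assumptions: cosine similarity is used as the score (the metric Chroma actually applies depends on its collection settings), the 3-dimensional vectors are made-up stand-ins for the real 1536-dimensional embeddings, and `cosine_similarity`, `top_k`, `toy_store`, and `toy_query` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored_vectors, k=3):
    """Return the k (label, score) pairs most similar to the query, best first."""
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in stored_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up toy vectors, NOT real embeddings
toy_store = {
    "ai revolution news": [0.9, 0.1, 0.0],
    "football news": [0.1, 0.9, 0.1],
    "election news": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

ranked = top_k(toy_query, toy_store, k=2)
for label, score in ranked:
    print(label, round(score, 3))
```

With `k=2`, only the two closest entries are returned, in the same way that `k=3` limits the result list in the cell above.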

## **Create Retriever**

In [16]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1}
)

# Retrieve the top document for each query
# (invoke replaces the deprecated get_relevant_documents)
result1 = retriever.invoke("machine learning")
result2 = retriever.invoke("test match")

# Print results
print("Results for 'machine learning':")
for doc in result1:
    print("-----------------")
    print(doc.page_content)
    print(doc.metadata)

print("\nResults for 'test match':")
for doc in result2:
    print("-----------------")
    print(doc.page_content)
    print(doc.metadata)

Results for 'machine learning':
-----------------
The AI revolution continues to transform industries and reshape the global economy. Significant advancements in artificial intelligence have led to breakthroughs in healthcare, with AI-driven diagnostics improving patient outcomes and reducing costs. Autonomous systems are becoming increasingly prevalent in logistics and transportation, enhancing efficiency and safety.
{'source': 'ai revolution news'}

Results for 'test match':
-----------------
The T20 World Cup 2024 is in full swing, bringing excitement and drama to cricket fans worldwide. India's team, captained by Rohit Sharma, is preparing for a crucial match against Ireland, with standout player Jasprit Bumrah expected to play a pivotal role in their campaign. The tournament has already seen controversy, particularly concerning the pitch conditions at Nassau County International Cricket Stadium in New York, which came under fire after a low-scoring game between Sri Lanka and South

## **Create Prompt Template**

In [18]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Define the prompt template
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

# Create a chat prompt template from the message
prompt = ChatPromptTemplate.from_messages([
    ("human", message)
])

## **Chain Retriever And Prompt Template With LLM**

In [19]:
chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
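
The data flow of this chain can be sketched in plain Python without calling any API. This is only an illustration under stated assumptions: `ToyDocument`, `toy_retriever`, `format_docs`, and `build_prompt` are hypothetical stand-ins, and the retriever always returns one fixed document. Note one deliberate difference: the chain above passes the retrieved `Document` list to the prompt as-is, whereas this sketch joins the `page_content` strings first, which is a common way to keep metadata out of the context.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""


def toy_retriever(question):
    """Hypothetical retriever: ignores the question and returns one fixed document."""
    return [
        ToyDocument(
            "The semi-final matchups have been set.",
            {"source": "football news"},
        )
    ]


def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question):
    # Mirrors the dict step of the chain: the same question feeds the
    # retriever ("context") and is passed through unchanged ("question")
    inputs = {
        "context": format_docs(toy_retriever(question)),
        "question": question,
    }
    # Mirrors the prompt step: fill the placeholders in the template
    return TEMPLATE.format(**inputs)


print(build_prompt("Football updates?"))
```

In the real chain, the filled prompt is then sent to `llm`, and the model's reply is what `chain.invoke` returns.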

In [20]:
response = chain.invoke("current state of 2024 t20 world cup")  # a question answerable from our own documents (the RAG case)
print(response.content)

The T20 World Cup 2024 is currently in full swing, with India's team led by Rohit Sharma gearing up for a crucial match against Ireland. Jasprit Bumrah is expected to play a key role in their campaign. The tournament has already faced controversy over pitch conditions at Nassau County International Cricket Stadium in New York.


In [21]:
response = chain.invoke("How are you")  # an ordinary conversational message
print(response.content)

I am doing well, thank you.


In [22]:
response = chain.invoke("Football updates?")
print(response.content)

The semi-final matchups in the UEFA Champions League have been set, with Real Madrid facing Manchester City and Bayern Munich taking on Paris Saint-Germain.
