# Building a Chatbot

- We will build an LLM-powered chatbot that can hold a conversation and remember previous interactions.
- This chatbot uses only a language model to converse. Several related concepts build on the same idea.

The related concepts include:
- Conversational RAG: enable a chatbot experience over an external source of data.
- Agent: build a chatbot that can take actions.

In [1]:
import os 
from dotenv import load_dotenv
load_dotenv()


groq_api_key = os.getenv("GROQ_API_KEY")
# Avoid displaying the key in notebook output; treat it as a secret.

In [6]:
from langchain_groq import ChatGroq
model = ChatGroq(model = "Gemma2-9b-It" , groq_api_key=groq_api_key)
model 

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x0000020FE4F2D9C0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000020FE4F2E6E0>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [None]:
from langchain_core.messages import HumanMessage
model.invoke([HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")])

AIMessage(content="Hi Ashis,\n\nIt's great to meet you!  That's awesome that you're an upcoming software engineer.  \n\nWhat kind of software engineering are you most interested in?  Do you have a favorite language or area you're focusing on?\n\nI'm happy to chat about anything related to software engineering, or just general tech topics if you'd like.  \n\nLooking forward to hearing more about you! 😊 \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 96, 'prompt_tokens': 24, 'total_tokens': 120, 'completion_time': 0.174545455, 'prompt_time': 0.002127426, 'queue_time': 0.23692686500000001, 'total_time': 0.176672881}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-567f0fb9-f290-46d1-b049-7287fda0235c-0', usage_metadata={'input_tokens': 24, 'output_tokens': 96, 'total_tokens': 120})

In [None]:
'''
The earlier turns are passed back to the model by hand here, so it can use that context to answer the follow-up question.
'''


from langchain_core.messages import AIMessage

model.invoke([
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ") ,
    AIMessage(content = "Hi Ashis,\n\nIt's great to meet you!  That's awesome that you're an upcoming software engineer.") ,
    HumanMessage(content = "Hey say what's my name and what do I do ?") 
])

AIMessage(content="You're Ashis, and you're an upcoming software engineer!  \n\nIt's nice to know a little bit about you.  \n\nWhat kind of software engineering are you most interested in? 💻  What are you working on right now?\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 56, 'prompt_tokens': 74, 'total_tokens': 130, 'completion_time': 0.101818182, 'prompt_time': 0.007177035, 'queue_time': 0.234395185, 'total_time': 0.108995217}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-99035500-0b22-4feb-abc9-a86a17fe6906-0', usage_metadata={'input_tokens': 74, 'output_tokens': 56, 'total_tokens': 130})

### Message History 
- Message history stores the previous messages (for example, in a database). When the user asks a new question, the stored conversation is used to produce the answer.
- It acts as a wrapper that makes the model stateful by tracking its inputs and outputs.
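
To make the wrapper idea concrete, here is a minimal plain-Python sketch. It is not LangChain's implementation: `histories`, `fake_model`, and `chat` are hypothetical names, and the fake model only reports how many messages it received, so no API key is needed.

```python
from collections import defaultdict

# Hypothetical in-memory store: session id -> list of (role, text) pairs.
histories = defaultdict(list)

def fake_model(messages):
    """Stand-in for an LLM call: reports how many messages it received."""
    return f"I can see {len(messages)} message(s) in this conversation."

def chat(session_id, user_text):
    """Stateful wrapper: load the history, call the model, save both turns."""
    history = histories[session_id]
    history.append(("human", user_text))
    reply = fake_model(history)
    history.append(("ai", reply))
    return reply

print(chat("chat1", "Hi, my name is Ashis."))
print(chat("chat1", "What is my name?"))   # sees the earlier turns
print(chat("chat2", "What is my name?"))   # new session, empty history
```

Each session id gets its own list, so a second session starts with no knowledge of the first.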

In [9]:
!pip install langchain_community



In [10]:
'''  
ChatMessageHistory → Stores chat history for a particular session.
BaseChatMessageHistory → A base class for handling chat histories.
RunnableWithMessageHistory → Allows chaining chat history with a LangChain model.
'''


from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store={}   # maps each session id to its chat history

''' 
Session ids keep the chat histories of different users separate.
'''

def get_session_history(session_id:str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


In [None]:
with_message_history = RunnableWithMessageHistory(model, get_session_history)   # wrap the model so each call loads and saves the session history
config = {"configurable" : {"session_id" : "chat1"}}   # session id is hardcoded for this demo

response = with_message_history.invoke([
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")
] , config=config)

In [13]:
response.content

"Hi Ashis, it's great to meet you!  That's awesome that you're pursuing a career in software engineering. It's a really exciting field with lots of possibilities.\n\nWhat areas of software engineering are you most interested in? Web development, mobile apps, game development, data science?  \n\nI'm happy to chat more about it and see if I can be of any help as you start your journey.  Good luck!\n"

In [14]:
# Ask a follow-up question in the same session

with_message_history.invoke(
    [
    HumanMessage(content = "Hey say what's my name and what do I do ?")
    ] , config=config
)

AIMessage(content="You're Ashis, and you're an upcoming software engineer!  😊  \n\nIs there anything else you'd like to tell me about yourself or your software engineering goals? I'm happy to listen and learn more.  \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 53, 'prompt_tokens': 273, 'total_tokens': 326, 'completion_time': 0.096363636, 'prompt_time': 0.010621049, 'queue_time': 0.23753139, 'total_time': 0.106984685}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-46c012cd-b8bf-4c88-81b2-e7f048d04a35-0', usage_metadata={'input_tokens': 273, 'output_tokens': 53, 'total_tokens': 326})

In [16]:
# Now changing the session id 
config1 = {"configurable" : {"session_id": "chat2"}}
resp = with_message_history.invoke(
    [
        HumanMessage(content = "Hey say what's my name and what do I do ?")
    ]
    ,config=config1
)

In [None]:
resp.content

# As an AI, I have no memory of past conversations and do not know your name or what you do.

"As an AI, I have no memory of past conversations and do not know your name or what you do.\n\nIf you'd like to tell me, I'm happy to learn! 😄  \n\n"

In [18]:
# The history is stored per session id, so each question is answered using only the messages stored for that session.

### Prompt Templates 

In [None]:
# Instead of passing a raw list of messages by hand, use a prompt template with a MessagesPlaceholder keyed as "messages"


from langchain_core.prompts import ChatPromptTemplate , MessagesPlaceholder 
# prompt = ChatPromptTemplate.from_messages(
#     [
#         ("system" , "You are a helpful assistant . Answer all the question to the best of your ability") , 
#         MessagesPlaceholder(variable_name="messages")
#     ]
# )


# To make the prompt more flexible, add a variable such as {language}
prompt = ChatPromptTemplate.from_messages(
    [
        ("system" , "You are a helpful assistant . Answer all the question to the best of your ability in {language}") , 
        MessagesPlaceholder(variable_name="messages")
    ]
)

chain = prompt|model
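
As a rough mental model of what `prompt | model` does, the following plain-Python sketch fills a system template, places the message list after it, and hands the result to a model. `render_prompt`, `fake_model`, and `run_chain` are hypothetical stand-ins, not LangChain APIs.

```python
def render_prompt(system_template, messages, **variables):
    """Fill the system template, then place the message list after it."""
    system = ("system", system_template.format(**variables))
    return [system, *messages]

def fake_model(prompt_messages):
    """Stand-in for an LLM: echoes the system instruction it was given."""
    return f"(following: {prompt_messages[0][1]})"

def run_chain(inputs):
    """Rough equivalent of prompt | model: the prompt's output feeds the model."""
    prompt_messages = render_prompt(
        "Answer all questions to the best of your ability in {language}.",
        inputs["messages"],
        language=inputs["language"],
    )
    return fake_model(prompt_messages)

print(run_chain({"messages": [("human", "Hi, my name is Ashis.")], "language": "Hindi"}))
```

The input is a dict because the template needs two things: the list for the placeholder and a value for every named variable.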

In [None]:
# chain.invoke({"messages":[HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")]})

chain.invoke( 
    {
        "messages":[HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")] ,
        "language":"Hindi"
    }
)

AIMessage(content="Hi Ashis, it's nice to meet you!  It's great to hear you're an upcoming software engineer. That's an exciting field to be in. \n\nI'm here to help in any way I can.  What can I do for you today?\n\nDo you have any questions about software engineering, programming, or anything else? I can try my best to answer them!\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 88, 'prompt_tokens': 40, 'total_tokens': 128, 'completion_time': 0.16, 'prompt_time': 0.002334245, 'queue_time': 0.23235461, 'total_time': 0.162334245}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-a2a4c4bc-71a9-42d6-a56a-51c41b8d4e5e-0', usage_metadata={'input_tokens': 40, 'output_tokens': 88, 'total_tokens': 128})

In [None]:
# with_message_history = RunnableWithMessageHistory(chain, get_session_history)

with_message_history = RunnableWithMessageHistory(chain, get_session_history , input_messages_key="messages")

In [37]:
config = {"configurable" : {"session_id": "chat4"}}

In [38]:
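# The prompt expects a `language` variable, so pass a dict rather than a bare list
resp = with_message_history.invoke(
    {
        "messages": [HumanMessage(content = "Hey say what's my name and what do I do ?")],
        "language": "Hindi"
    }
    ,config=config
)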

In [39]:
resp

AIMessage(content="As an AI, I have no memory of past conversations and don't know your name or what you do. \n\nIf you'd like to tell me, I'm happy to learn! 😊  \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 47, 'prompt_tokens': 38, 'total_tokens': 85, 'completion_time': 0.085454545, 'prompt_time': 0.002506165, 'queue_time': 0.23866799, 'total_time': 0.08796071}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-1fcf57a1-42b4-4b8e-a372-c7eac8e933ff-0', usage_metadata={'input_tokens': 38, 'output_tokens': 47, 'total_tokens': 85})

### Managing the Conversation History

In [None]:
from langchain_core.messages import SystemMessage , trim_messages
'''
trim_messages reduces the number of messages sent to the model. The trimmer lets us specify how many
tokens to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages.

trim_messages() limits the number of tokens in a chat history while keeping the conversation meaningful.
This keeps the input within the model's token limit, which matters because LLMs have a finite context window.
'''

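
The "keep the last messages" strategy can be sketched in plain Python. This is an illustration under simplifying assumptions, not LangChain's implementation: `count_tokens` and `trim_last` are hypothetical helpers, and one token is counted per whitespace-separated word instead of using a real tokenizer.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def trim_last(messages, max_tokens):
    """Keep the system message plus the most recent messages within max_tokens."""
    system = [m for m in messages if m[0] == "system"][:1]
    rest = [m for m in messages if m[0] != "system"]
    budget = max_tokens - sum(count_tokens(text) for _, text in system)
    kept = []
    # Walk backwards from the newest message, keeping whole messages only.
    for role, text in reversed(rest):
        cost = count_tokens(text)
        if cost > budget:
            break
        kept.append((role, text))
        budget -= cost
    kept.reverse()
    # Drop leading non-human messages so the history starts on a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept

sample = [
    ("system", "you are a good assistant"),
    ("human", "I like chocolate ice cream"),
    ("ai", "nice"),
    ("human", "I like to play cricket"),
    ("ai", "That's awesome"),
]
print(trim_last(sample, max_tokens=13))
```

With a budget of 13, the oldest human message no longer fits, and the orphaned "nice" reply is dropped as well so that the trimmed history begins with a human message.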

In [47]:
trimmer = trim_messages(
    max_tokens=40,         # Limit the total message history to 40 tokens
    strategy="last",       # Keep the most recent messages; discard older ones
    token_counter=model,   # Use the model's token counter to count tokens
    include_system=True,   # Always keep the system message
    allow_partial=False,   # Do not cut a message in half to fit the limit
    start_on="human"       # The trimmed history must start with a human message
)


In [48]:
# set of messages 

messages = [
    SystemMessage(content = "you're a good assistant") ,
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ") ,
    AIMessage(content = "Hi Ashis") ,
    HumanMessage(content = "I like chocolate icecreme ") ,
    AIMessage(content="nice") ,
    HumanMessage(content="I like to play cricket") ,
    AIMessage(content="That's awesome") ,
    HumanMessage(content="I am from India") ,
    AIMessage(content="That's cool")
]

In [49]:
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like to play cricket', additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's awesome", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I am from India', additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's cool", additional_kwargs={}, response_metadata={})]

In [50]:
# if using chain

from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

chain = (
    RunnablePassthrough.assign( messages=itemgetter("messages") | trimmer ) | prompt | model
)

response = chain.invoke(
    {
        "messages": messages + [HumanMessage(content="What I love to play ?")],
        "language":"Hindi"
    }
)

response.content


'You said you love to play **cricket**! 🏏  \n\nIs there anything else you enjoy doing besides cricket?\n'

In [52]:
# Wrap the trimming chain with message history

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

config = {"configurable" : {"session_id": "chat5"}}
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="What country I belongs to ?")],
        "language":"Hindi"
    }
    ,config=config
)
response.content

'You said you are from India!  🇮🇳 🏏 \n\n\nLet me know if you have any other questions about cricket or anything else. 😊\n'

### Working with Vector Stores and Retrievers

LangChain implements a Document abstraction, which represents a unit of text and its associated metadata. It has two attributes:

- page_content: a string representing the content 
- metadata: a dict containing arbitrary metadata .

In [62]:
from langchain_core.documents import Document

# A Document represents a chunk of a larger source; for example, each Document could be one page of a book.

documents = [
    Document(
        page_content="LangChain simplifies working with LLMs.",
        metadata={"source": "blog", "author": "John Doe"}
    ),
    Document(
        page_content="AI is transforming industries worldwide.",
        metadata={"source": "news", "date": "2025-03-20"}
    ),
    Document(
        page_content="OpenAI's ChatGPT is a powerful language model.",
        metadata={"source": "research", "keywords": "OpenAI"}
    ),
    Document(
        page_content="Machine learning models require large datasets for training.",
        metadata={"source": "research_paper", "publisher": "IEEE"}
    ),
    Document(
        page_content="Deep learning has revolutionized image and speech recognition.",
        metadata={"source": "conference", "event": "NeurIPS 2024"}
    ),
    Document(
        page_content="Quantum computing could potentially accelerate AI computations.",
        metadata={"source": "tech_magazine", "issue": "March 2025"}
    ),
    Document(
        page_content="Cybersecurity is becoming increasingly important in the AI era.",
        metadata={"source": "whitepaper", "organization": "CyberSec Corp"}
    )
]

In [63]:
documents

[Document(metadata={'source': 'blog', 'author': 'John Doe'}, page_content='LangChain simplifies working with LLMs.'),
 Document(metadata={'source': 'news', 'date': '2025-03-20'}, page_content='AI is transforming industries worldwide.'),
 Document(metadata={'source': 'research', 'keywords': 'OpenAI'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(metadata={'source': 'research_paper', 'publisher': 'IEEE'}, page_content='Machine learning models require large datasets for training.'),
 Document(metadata={'source': 'conference', 'event': 'NeurIPS 2024'}, page_content='Deep learning has revolutionized image and speech recognition.'),
 Document(metadata={'source': 'tech_magazine', 'issue': 'March 2025'}, page_content='Quantum computing could potentially accelerate AI computations.'),
 Document(metadata={'source': 'whitepaper', 'organization': 'CyberSec Corp'}, page_content='Cybersecurity is becoming increasingly important in the AI era.')]

In [55]:
# Hugging Face hosts open-source embedding models and LLMs
hf_token = os.getenv("HF_TOKEN")
if hf_token:  # avoid a TypeError when the variable is unset
    os.environ["HF_TOKEN"] = hf_token

In [57]:
llm = ChatGroq(model="Gemma2-9b-It" , groq_api_key=groq_api_key)
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x0000020F90D738E0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000020F90D72080>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [58]:
# Before storing the documents, convert them into embeddings
# using a Hugging Face embedding model
!pip install langchain_huggingface



In [64]:
from langchain_huggingface import HuggingFaceEmbeddings

# sentence-transformers/all-MiniLM-L6-v2

embeddings = HuggingFaceEmbeddings(
    model_name = "all-MiniLM-L6-v2",
)

In [65]:
# Vector stores
# Each document is converted into an embedding vector and stored in the vector store

from langchain_chroma import Chroma 
vectorstore = Chroma.from_documents(documents , embedding=embeddings)

In [66]:
vectorstore.similarity_search("OpenAI's ChatGPT")

[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(id='4714fc34-66a5-4da8-90aa-e6c29c46597f', metadata={'author': 'John Doe', 'source': 'blog'}, page_content='LangChain simplifies working with LLMs.'),
 Document(id='0f1f65aa-8bd4-411a-8f87-ea2a04e3fca8', metadata={'date': '2025-03-20', 'source': 'news'}, page_content='AI is transforming industries worldwide.'),
 Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]

In [67]:
# for async query 
await vectorstore.asimilarity_search("OpenAI's ChatGPT")

[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(id='4714fc34-66a5-4da8-90aa-e6c29c46597f', metadata={'author': 'John Doe', 'source': 'blog'}, page_content='LangChain simplifies working with LLMs.'),
 Document(id='0f1f65aa-8bd4-411a-8f87-ea2a04e3fca8', metadata={'date': '2025-03-20', 'source': 'news'}, page_content='AI is transforming industries worldwide.'),
 Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]
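
To show what a similarity search does conceptually, here is a toy sketch in plain Python. It assumes a bag-of-words "embedding" and cosine similarity; `embed`, `cosine`, and `similarity_search` are hypothetical helpers, whereas a real embedding model such as all-MiniLM-L6-v2 returns dense vectors that capture meaning rather than exact word overlap.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_search(query, texts, k=2):
    """Return the k texts whose embeddings are closest to the query embedding."""
    q = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]

texts = [
    "LangChain simplifies working with LLMs.",
    "Machine learning models require large datasets for training.",
    "Deep learning has revolutionized image and speech recognition.",
]
print(similarity_search("deep learning for speech", texts, k=1))
```

The query and every document go through the same `embed` function, which is why the store needs the embedding model both when indexing and when searching.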

### Retrievers

- LangChain VectorStore objects do not subclass Runnable, so they cannot be integrated directly into LangChain Expression Language (LCEL) chains.
- LangChain Retrievers, however, are Runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous invoke and batch operations).
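
The value of a shared interface can be sketched with a small class. `KeywordRetriever` below is hypothetical and uses substring matching instead of embeddings; it only illustrates the `invoke` and `batch` pair that lets a component be called the same way as any other step in a chain.

```python
class KeywordRetriever:
    """Hypothetical retriever exposing invoke and batch methods."""

    def __init__(self, texts, k=1):
        self.texts = texts
        self.k = k

    def invoke(self, query):
        """Return up to k texts that contain the query (case-insensitive)."""
        hits = [t for t in self.texts if query.lower() in t.lower()]
        return hits[: self.k]

    def batch(self, queries):
        """Run invoke once per query and collect the results in order."""
        return [self.invoke(q) for q in queries]

demo_retriever = KeywordRetriever([
    "OpenAI's ChatGPT is a powerful language model.",
    "Machine learning models require large datasets for training.",
])
print(demo_retriever.batch(["OpenAI", "machine learning"]))
```

Because `batch` is defined in terms of `invoke`, any object that follows this contract can be swapped into a pipeline without changing the calling code.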

In [69]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)     # k=1: return only the top match for each query in the batch
retriever.batch(['OpenAi' , 'Machine learning'])

[[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model.")],
 [Document(id='6b16486b-f0d0-47f9-b645-bbf47324cd26', metadata={'publisher': 'IEEE', 'source': 'research_paper'}, page_content='Machine learning models require large datasets for training.')]]

In [70]:
# The preferred way to query the vector store is through a retriever

retriever = vectorstore.as_retriever(
    search_type="similarity" , 
    search_kwargs={"k":1}
)

retriever.invoke("Deep Learning")

[Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]

In [71]:
# chaining the retriever 
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """    

Answer this question using the provided context only .
{question}

Context:
{context}

"""


prompt = ChatPromptTemplate.from_messages([("human" , message)])

In [None]:
rag_chain = {"context" : retriever , "question":RunnablePassthrough() } | prompt | llm   
# The retriever fetches the matching documents as context, while RunnablePassthrough forwards the question unchanged
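
The data flow of the dict step can be sketched without LangChain. In this simplified illustration, `retrieve`, `TEMPLATE`, and `build_prompt` are hypothetical names, and retrieval ranks texts by shared words rather than by embedding similarity.

```python
def retrieve(question, texts, k=1):
    """Toy retriever: rank texts by how many words they share with the question."""
    words = set(question.lower().split())
    scored = sorted(texts, key=lambda t: len(words & set(t.lower().split())), reverse=True)
    return scored[:k]

TEMPLATE = "Answer this question using the provided context only.\n{question}\n\nContext:\n{context}\n"

def build_prompt(question, texts):
    """Mirror the dict step: context comes from retrieval, question passes through."""
    inputs = {"context": retrieve(question, texts), "question": question}
    return TEMPLATE.format(question=inputs["question"], context="\n".join(inputs["context"]))

texts = [
    "OpenAI's ChatGPT is a powerful language model.",
    "AI is transforming industries worldwide.",
]
print(build_prompt("tell me about openai's chatgpt", texts))
```

The same input string is used twice: once as the search query and once as the question in the final prompt, which is the role `RunnablePassthrough()` plays in the chain.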

In [73]:
response = rag_chain.invoke("tell me about OpenAI")

In [74]:
response.content

'OpenAI created ChatGPT, a powerful language model.  \n'