# Building a Chatbot

- We will build an LLM-powered chatbot that can hold a conversation and remember previous interactions.
- This chatbot relies only on a language model to hold the conversation. There are several related concepts as well.

Related ideas include:
- Conversational RAG: enable a chatbot experience over an external source of data.
- Agent: build a chatbot that can take actions.

In [15]:
import os 
from dotenv import load_dotenv
load_dotenv()


groq_api_key = os.getenv("GROQ_API_KEY")
# Do not echo the key as cell output: notebooks save outputs, which would leak the secret.

In [16]:
from langchain_groq import ChatGroq
model = ChatGroq(model = "Gemma2-9b-It" , groq_api_key=groq_api_key)
model 

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001A0066BC5B0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001A0066BD810>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [17]:
from langchain_core.messages import HumanMessage
model.invoke([HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")])

AIMessage(content="Hi Ashis, it's nice to meet you! \n\nThat's great to hear you're an upcoming software engineer. What area of software engineering are you most interested in?  \n\nI'm happy to chat about anything related to software development, from programming languages to career advice.  \n\nLet me know how I can help you on your journey!\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 78, 'prompt_tokens': 24, 'total_tokens': 102, 'completion_time': 0.141818182, 'prompt_time': 0.002165226, 'queue_time': 0.232007462, 'total_time': 0.143983408}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-32b57ac9-6715-413b-bae5-633acd4641fc-0', usage_metadata={'input_tokens': 24, 'output_tokens': 78, 'total_tokens': 102})

In [18]:
'''
The model itself is stateless. It "remembers" the earlier context only because we
resend the previous messages along with the new question.
'''


from langchain_core.messages import AIMessage

model.invoke([
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ") ,
    AIMessage(content = "Hi Ashis,\n\nIt's great to meet you!  That's awesome that you're an upcoming software engineer.") ,
    HumanMessage(content = "Hey say what's my name and what do I do ?") 
])

AIMessage(content="You said your name is Ashis and you are an upcoming software engineer.  \n\nIs there anything else you'd like to tell me about yourself or your work? I'm happy to chat!\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 74, 'total_tokens': 118, 'completion_time': 0.08, 'prompt_time': 0.003886053, 'queue_time': 0.233978018, 'total_time': 0.083886053}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-4772b3f3-ea19-492a-8759-bd0fdad2f938-0', usage_metadata={'input_tokens': 74, 'output_tokens': 44, 'total_tokens': 118})

### Message History 
- Message history stores the previous messages (in memory or in a database). When the user asks a new question, the stored messages are sent along with it, so the model can answer based on the earlier conversation.
- It acts as a wrapper around the model that makes it stateful by tracking the model's inputs and outputs.
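
As a rough illustration of what a stateful wrapper does, the sketch below keeps one message list per session and replays it on every call. It is plain Python, not LangChain: `history_store`, `fake_model`, `chat`, and the `(role, text)` tuple format are hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict

# session_id -> list of (role, text) tuples; a stand-in for a real database.
history_store = defaultdict(list)


def fake_model(messages):
    """Hypothetical stand-in for an LLM call: reports how many messages it saw."""
    return f"I received {len(messages)} message(s)."


def chat(session_id, user_text):
    """Replay the stored history plus the new message, then record both turns."""
    history = history_store[session_id]
    messages = history + [("human", user_text)]
    reply = fake_model(messages)
    history.append(("human", user_text))
    history.append(("ai", reply))
    return reply


first = chat("chat1", "Hi, my name is Ashis.")
second = chat("chat1", "What is my name?")   # sees the two earlier messages too
other = chat("chat2", "What is my name?")    # new session, so no earlier messages
```

The second call in `chat1` receives three messages, while the first call in `chat2` receives only one, which is why a new session id behaves like a fresh conversation.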

In [19]:
!pip install langchain_community



In [20]:
'''  
ChatMessageHistory → Stores chat history for a particular session.
BaseChatMessageHistory → A base class for handling chat histories.
RunnableWithMessageHistory → Allows chaining chat history with a LangChain model.
'''


from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}   # maps each session id to its chat history

'''
Session ids keep the chat histories of different users separate.
'''

def get_session_history(session_id:str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


In [21]:
with_message_history = RunnableWithMessageHistory(model, get_session_history)   # wrap the model so each call loads and saves the session's history
config = {"configurable" : {"session_id" : "chat1"}}   # hardcoded session id for this demo

response = with_message_history.invoke([
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")
] , config=config)

In [22]:
response.content

"Hi Ashis, it's nice to meet you!  \n\nThat's awesome that you're an upcoming software engineer. What are you most excited about in your journey?  \n\nDo you have any specific areas of software development that you're interested in?  \n\nI'm happy to chat about anything related to software engineering, or just general tech stuff. 😊  \n\n"

In [23]:
# Ask a follow-up question in the same session

with_message_history.invoke(
    [
    HumanMessage(content = "Hey say what's my name and what do I do ?")
    ] , config=config
)

AIMessage(content="You are Ashis, and you are an upcoming software engineer!  \n\nIs there anything else you'd like me to remember about you? 😁  \n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 126, 'total_tokens': 160, 'completion_time': 0.061818182, 'prompt_time': 0.0055341, 'queue_time': 0.236159008, 'total_time': 0.067352282}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-ef701dcf-503b-474a-bff9-518ae1011a73-0', usage_metadata={'input_tokens': 126, 'output_tokens': 34, 'total_tokens': 160})

In [24]:
# Now changing the session id 
config1 = {"configurable" : {"session_id": "chat2"}}
resp = with_message_history.invoke(
    [
        HumanMessage(content = "Hey say what's my name and what do I do ?")
    ]
    ,config=config1
)

In [25]:
resp.content

# As an AI, I have no memory of past conversations and do not know your name or what you do.

"As an AI, I have no memory of past conversations and do not know your name or what you do.\n\nIf you'd like to tell me, I'm happy to learn! 😊  \n\n"

In [26]:
# The history is stored per session id, so a question is answered using only the messages stored for that session.

### Prompt Templates 

In [27]:
# Instead of passing a raw list of messages, use a prompt template with a MessagesPlaceholder keyed as "messages"


from langchain_core.prompts import ChatPromptTemplate , MessagesPlaceholder 
# prompt = ChatPromptTemplate.from_messages(
#     [
#         ("system" , "You are a helpful assistant . Answer all the question to the best of your ability") , 
#         MessagesPlaceholder(variable_name="messages")
#     ]
# )


# To make the prompt more flexible, add a template variable such as {language}
prompt = ChatPromptTemplate.from_messages(
    [
        ("system" , "You are a helpful assistant . Answer all the question to the best of your ability in {language}") , 
        MessagesPlaceholder(variable_name="messages")
    ]
)

chain = prompt|model

In [28]:
# chain.invoke({"messages":[HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")]})

chain.invoke( 
    {
        "messages":[HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ")] ,
        "language":"Hindi"
    }
)

AIMessage(content='नमस्ते अशिस!  मुझे जानकर बहुत अच्छा लगा। एक आगामी सॉफ्टवेयर इंजीनियर बनना बहुत अच्छा लक्ष्य है।  \n\nआप किस तरह की मदद चाहते हैं? \n\nमैं आपको सॉफ्टवेयर इंजीनियरिंग के बारे में जानकारी दे सकता हूँ, प्रोग्रामिंग भाषाओं के बारे में बता सकता हूँ, या आपके प्रोजेक्ट्स के लिए सुझाव दे सकता हूँ। \n\nकृपया पूछने में संकोच न करें! 😊\n\n', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 127, 'prompt_tokens': 42, 'total_tokens': 169, 'completion_time': 0.230909091, 'prompt_time': 0.002666705, 'queue_time': 0.232009865, 'total_time': 0.233575796}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-1357f7cd-a27d-4aa7-bc32-7a3847a4da94-0', usage_metadata={'input_tokens': 42, 'output_tokens': 127, 'total_tokens': 169})
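
To make the role of the placeholder concrete, here is a minimal plain-Python sketch of how a template with a `{language}` variable and a message placeholder could expand into a flat message list. `render_prompt` and the tuple-based message format are hypothetical and only mimic the idea; they are not how `ChatPromptTemplate` is implemented.

```python
def render_prompt(template, variables):
    """Expand a template into a flat list of (role, text) messages."""
    rendered = []
    for entry in template:
        if isinstance(entry, str):
            # A placeholder entry names a variable that holds a list of messages.
            rendered.extend(variables[entry])
        else:
            role, text = entry
            # Fill {language}-style fields; unused variables are ignored.
            rendered.append((role, text.format(**variables)))
    return rendered


template = [
    ("system", "You are a helpful assistant. Answer in {language}."),
    "messages",  # placeholder: replaced by the whole message list
]

rendered = render_prompt(
    template,
    {"language": "Hindi", "messages": [("human", "Hi, my name is Ashis.")]},
)
```

The result is the system message with the language filled in, followed by every message passed under the `messages` key, which is the list the model finally receives.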

In [29]:
# with_message_history = RunnableWithMessageHistory(chain, get_session_history)

with_message_history = RunnableWithMessageHistory(chain, get_session_history , input_messages_key="messages")

In [30]:
config = {"configurable" : {"session_id": "chat4"}}

In [31]:
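# With input_messages_key="messages" the input must be a dict. Passing a bare list raises
# "ValueError: The input to RunnablePassthrough.assign() must be a dict."
resp = with_message_history.invoke(
    {
        "messages": [HumanMessage(content = "Hey say what's my name and what do I do ?")],
        "language": "English"   # the prompt template also expects {language}
    }
    ,config=config
)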

In [None]:
resp

AIMessage(content="As an AI, I have no memory of past conversations and don't know your name or what you do. \n\nIf you'd like to tell me, I'm happy to learn! 😊  \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 47, 'prompt_tokens': 38, 'total_tokens': 85, 'completion_time': 0.085454545, 'prompt_time': 0.002506165, 'queue_time': 0.23866799, 'total_time': 0.08796071}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-1fcf57a1-42b4-4b8e-a372-c7eac8e933ff-0', usage_metadata={'input_tokens': 38, 'output_tokens': 47, 'total_tokens': 85})

### Managing the Conversation History

In [None]:
from langchain_core.messages import SystemMessage , trim_messages
'''
trim_messages reduces the number of messages we send to the model. The trimmer lets us specify how many
tokens we want to keep, along with other parameters such as whether to always keep the system message and
whether to allow partial messages.

It limits the number of tokens in the chat history while keeping the conversation meaningful, which keeps
each request within the model's token limit.
'''

In [None]:
trimmer = trim_messages(
    max_tokens=40,         # Limit the trimmed history to 40 tokens
    strategy="last",       # Keep the most recent messages; discard older ones
    token_counter=model,   # Use the model's own token counter
    include_system=True,   # Always keep the system message
    allow_partial=False,   # Do not cut a message in half
    start_on="human"       # The kept history must start with a human message
)


In [None]:
# set of messages 

messages = [
    SystemMessage(content = "you're a good assistant") ,
    HumanMessage(content = "Hi , My name is Ashis and I am a upcoming software Engineer ") ,
    AIMessage(content = "Hi Ashis") ,
    HumanMessage(content = "I like chocolate icecreme ") ,
    AIMessage(content="nice") ,
    HumanMessage(content="I like to play cricket") ,
    AIMessage(content="That's awesome") ,
    HumanMessage(content="I am from India") ,
    AIMessage(content="That's cool")
]

In [None]:
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like to play cricket', additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's awesome", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I am from India', additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's cool", additional_kwargs={}, response_metadata={})]
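
The trimming behaviour can be approximated in a few lines of plain Python. This is only a sketch of the "keep the newest messages that fit" idea: `count_tokens` (one token per word) and `trim_last` are hypothetical helpers, and a real tokenizer would count differently.

```python
def count_tokens(message):
    """Crude estimate: one token per whitespace-separated word (an assumption)."""
    role, text = message
    return len(text.split())


def trim_last(messages, max_tokens):
    """Keep the system message plus the newest messages that fit the budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break  # never keep a partial message
        kept.append(message)
        budget -= cost
    kept.reverse()

    # Drop leading messages until the kept history starts with a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


messages = [
    ("system", "you're a good assistant"),
    ("human", "I like chocolate ice cream"),
    ("ai", "nice"),
    ("human", "I like to play cricket"),
    ("ai", "That's awesome"),
    ("human", "I am from India"),
    ("ai", "That's cool"),
]

trimmed = trim_last(messages, max_tokens=17)
```

With this word-count budget the sketch keeps the system message and the last four messages and drops the ice cream exchange, the same shape as the trimmed history printed above.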

In [None]:
# Using the trimmer inside a chain

from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

chain = (
    RunnablePassthrough.assign( messages=itemgetter("messages") | trimmer ) | prompt | model
)

response = chain.invoke(
    {
        "messages": messages + [HumanMessage(content="What I love to play ?")],
        "language":"Hindi"
    }
)

response.content


'You said you love to play **cricket**! 🏏  \n\nIs there anything else you enjoy doing besides cricket?\n'

In [None]:
# Wrap the chain with message history

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

config = {"configurable" : {"session_id": "chat5"}}
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="What country I belongs to ?")],
        "language":"Hindi"
    }
    ,config=config
)
response.content

'You said you are from India!  🇮🇳 🏏 \n\n\nLet me know if you have any other questions about cricket or anything else. 😊\n'

### Working with Vector Stores and Retrievers

LangChain implements a Document abstraction, which represents a unit of text and its associated metadata. It has two attributes:

- page_content: a string representing the content.
- metadata: a dict containing arbitrary metadata.

In [None]:
from langchain_core.documents import Document

# A Document represents a chunk of a larger source; for example, each Document can represent one page of a book.

documents = [
    Document(
        page_content="LangChain simplifies working with LLMs.",
        metadata={"source": "blog", "author": "John Doe"}
    ),
    Document(
        page_content="AI is transforming industries worldwide.",
        metadata={"source": "news", "date": "2025-03-20"}
    ),
    Document(
        page_content="OpenAI's ChatGPT is a powerful language model.",
        metadata={"source": "research", "keywords": "OpenAI"}
    ),
    Document(
        page_content="Machine learning models require large datasets for training.",
        metadata={"source": "research_paper", "publisher": "IEEE"}
    ),
    Document(
        page_content="Deep learning has revolutionized image and speech recognition.",
        metadata={"source": "conference", "event": "NeurIPS 2024"}
    ),
    Document(
        page_content="Quantum computing could potentially accelerate AI computations.",
        metadata={"source": "tech_magazine", "issue": "March 2025"}
    ),
    Document(
        page_content="Cybersecurity is becoming increasingly important in the AI era.",
        metadata={"source": "whitepaper", "organization": "CyberSec Corp"}
    )
]

In [None]:
documents

[Document(metadata={'source': 'blog', 'author': 'John Doe'}, page_content='LangChain simplifies working with LLMs.'),
 Document(metadata={'source': 'news', 'date': '2025-03-20'}, page_content='AI is transforming industries worldwide.'),
 Document(metadata={'source': 'research', 'keywords': 'OpenAI'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(metadata={'source': 'research_paper', 'publisher': 'IEEE'}, page_content='Machine learning models require large datasets for training.'),
 Document(metadata={'source': 'conference', 'event': 'NeurIPS 2024'}, page_content='Deep learning has revolutionized image and speech recognition.'),
 Document(metadata={'source': 'tech_magazine', 'issue': 'March 2025'}, page_content='Quantum computing could potentially accelerate AI computations.'),
 Document(metadata={'source': 'whitepaper', 'organization': 'CyberSec Corp'}, page_content='Cybersecurity is becoming increasingly important in the AI era.')]

In [None]:
# With Hugging Face we can use open-source embeddings and LLMs

os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN")

In [None]:
llm = ChatGroq(model="Gemma2-9b-It" , groq_api_key=groq_api_key)
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x0000020F90D738E0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000020F90D72080>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [None]:
# Before storing the documents, convert them into embeddings
# Here we use Hugging Face embeddings
!pip install langchain_huggingface



In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# sentence-transformers/all-MiniLM-L6-v2

embeddings = HuggingFaceEmbeddings(
    model_name = "all-MiniLM-L6-v2",
)

In [None]:
# Vector stores
# Each document is converted into an embedding vector and stored in the vector store

from langchain_chroma import Chroma 
vectorstore = Chroma.from_documents(documents , embedding=embeddings)

In [None]:
vectorstore.similarity_search("OpenAI's ChatGPT")

[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(id='4714fc34-66a5-4da8-90aa-e6c29c46597f', metadata={'author': 'John Doe', 'source': 'blog'}, page_content='LangChain simplifies working with LLMs.'),
 Document(id='0f1f65aa-8bd4-411a-8f87-ea2a04e3fca8', metadata={'date': '2025-03-20', 'source': 'news'}, page_content='AI is transforming industries worldwide.'),
 Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]

In [None]:
# for async query 
await vectorstore.asimilarity_search("OpenAI's ChatGPT")

[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model."),
 Document(id='4714fc34-66a5-4da8-90aa-e6c29c46597f', metadata={'author': 'John Doe', 'source': 'blog'}, page_content='LangChain simplifies working with LLMs.'),
 Document(id='0f1f65aa-8bd4-411a-8f87-ea2a04e3fca8', metadata={'date': '2025-03-20', 'source': 'news'}, page_content='AI is transforming industries worldwide.'),
 Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]
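
What a similarity search does can be sketched without any vector database. The example below uses a toy bag-of-words "embedding" and cosine similarity; `embed`, `cosine`, and `similarity_search` are hypothetical helpers for illustration, and a real embedding model such as all-MiniLM-L6-v2 returns dense vectors that capture meaning rather than exact word overlap.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query, texts, k=1):
    """Return the k texts whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


texts = [
    "LangChain simplifies working with LLMs.",
    "Deep learning has revolutionized image and speech recognition.",
    "Machine learning models require large datasets for training.",
]

top = similarity_search("deep learning", texts, k=1)
```

The query shares two words with the deep learning sentence and one with the machine learning sentence, so the former ranks first.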

### Retrievers

- LangChain VectorStore objects do not subclass Runnable, so they cannot be integrated directly into LangChain Expression Language (LCEL) chains.
- LangChain Retrievers are Runnables, so they implement a standard set of methods (e.g. synchronous and asynchronous invoke and batch operations).
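
To show why a shared interface matters, the sketch below wraps an ordinary function in a tiny class that offers `invoke`, `batch`, and `|` composition. `MiniRunnable` and `search` are hypothetical and far simpler than LangChain's Runnable; the point is only that anything exposing the same methods can be piped together.

```python
class MiniRunnable:
    """Toy stand-in for a Runnable: wraps a function, adds invoke/batch/pipe."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # a | b means: feed the output of a into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


def search(query):
    """Hypothetical lookup standing in for a vector store search method."""
    corpus = {"openai": "OpenAI's ChatGPT is a powerful language model."}
    return corpus.get(query.lower(), "no match")


mini_retriever = MiniRunnable(search)
shout = MiniRunnable(str.upper)
pipeline = mini_retriever | shout

results = mini_retriever.batch(["OpenAI", "cricket"])
piped = pipeline.invoke("OpenAI")
```

A bare search method has none of these methods, which is why it needs to be wrapped before it can take part in a chain.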

In [None]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)     # wrap the search method as a Runnable; k=1 returns only the top match for each query
retriever.batch(['OpenAi' , 'Machine learning'])

[[Document(id='b1adf05b-e8cd-48ef-926e-2701c1640871', metadata={'keywords': 'OpenAI', 'source': 'research'}, page_content="OpenAI's ChatGPT is a powerful language model.")],
 [Document(id='6b16486b-f0d0-47f9-b645-bbf47324cd26', metadata={'publisher': 'IEEE', 'source': 'research_paper'}, page_content='Machine learning models require large datasets for training.')]]

In [None]:
# The usual way to query the vector store is through a retriever

retriever = vectorstore.as_retriever(
    search_type="similarity" , 
    search_kwargs={"k":1}
)

retriever.invoke("Deep Learning")

[Document(id='8ed66233-4e57-4776-986e-b36d67ed6b05', metadata={'event': 'NeurIPS 2024', 'source': 'conference'}, page_content='Deep learning has revolutionized image and speech recognition.')]

In [None]:
# chaining the retriever 
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """    

Answer this question using the provided context only .
{question}

Context:
{context}

"""


prompt = ChatPromptTemplate.from_messages([("human" , message)])

In [None]:
rag_chain = {"context" : retriever , "question":RunnablePassthrough() } | prompt | llm   
# The retriever fetches matching documents as the context, while RunnablePassthrough forwards the question unchanged

In [None]:
response = rag_chain.invoke("tell me about OpenAI")

In [None]:
response.content

'OpenAI created ChatGPT, a powerful language model.  \n'

### Building a Conversational Q&A Chatbot



In [None]:
from langchain_groq import ChatGroq
llm = ChatGroq(model="Gemma2-9b-It" , groq_api_key=groq_api_key)
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001A04DB76290>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001A04DBBC7C0>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [None]:
os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN")
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name = "all-MiniLM-L6-v2")


In [32]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader   # loads a whole web page, which we then use as context for the bot
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [35]:
from langchain.chains import create_retrieval_chain     # chains a retriever with a question-answering chain
from langchain.chains.combine_documents import create_stuff_documents_chain   # combines all the documents and passes them to the prompt template

In [37]:
import bs4    # used to extract only selected parts of the web page

loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/" ,) ,
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_=("post-content" , "post-title" , "post-header")
        )
    )
)
docs = loader.load()
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [39]:
# Once we have the docs, it is better to break them into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000 , chunk_overlap=200 )
splits = text_splitter.split_documents(docs)
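
The effect of `chunk_size` and `chunk_overlap` can be seen with a much simpler splitter. The sketch below cuts fixed-size character windows that share a few characters with their neighbours; `split_with_overlap` is a hypothetical helper and is not how `RecursiveCharacterTextSplitter` works internally, since that class prefers to split on separators such as paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

Each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.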

In [41]:
# Now store the chunks in the vector store
vectorstore = Chroma.from_documents(splits , embedding=embeddings)

In [43]:
# Now convert the vector store into a retriever
retriever = vectorstore.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001A00A97F100>, search_kwargs={})

In [44]:
# Now creating the Prompt Template 
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system" , system_prompt) ,
        ("human" , "{input}")
    ]
)

In [45]:
# Now chain the retriever, the prompt, and the LLM


# The retrieved documents fill the {context} slot of the system prompt
question_answer_chain = create_stuff_documents_chain(llm , prompt)  # combines all the documents and passes them as context to the prompt
rag_chain = create_retrieval_chain(retriever , question_answer_chain)  # runs the retriever, then the question-answering chain

In [49]:
response = rag_chain.invoke({"input":"What is task Decomposition ?"})
response

{'input': 'What is task Decomposition ?',
 'context': [Document(id='a6367689-c279-4d8c-93b8-8abf9e4cf394', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),
  Document(id='34f74f32-9341-4a55-94f2-52ec4fba5da0', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomou

In [50]:
response['answer']

'Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. \n\nThis makes it easier for an agent to plan and execute the task, as it can focus on completing one subtask at a time. Chain of thought (CoT) and Tree of Thoughts (ToT) are prompting techniques used to help models perform task decomposition.  \n'
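
The flow of a retrieval chain can be sketched with stub components: retrieve documents for the input, stuff them into one context string, and return the input, the documents, and the answer together. `stuff_documents`, `retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names for this illustration, not LangChain functions.

```python
def stuff_documents(docs):
    """Join every retrieved document into one context string."""
    return "\n\n".join(docs)


def retrieval_chain(inputs, retrieve, answer):
    """Retrieve context for inputs["input"], then answer with it."""
    docs = retrieve(inputs["input"])
    reply = answer(context=stuff_documents(docs), question=inputs["input"])
    # Mirror the result keys of the response above: input, context, answer.
    return {**inputs, "context": docs, "answer": reply}


def fake_retrieve(query):
    """Hypothetical retriever that always returns the same two chunks."""
    return ["Chunk about planning.", "Chunk about task decomposition."]


def fake_answer(context, question):
    """Hypothetical LLM call: reports the size of the prompt it was given."""
    return f"Answered {question!r} using {len(context)} characters of context."


result = retrieval_chain(
    {"input": "What is task decomposition?"}, fake_retrieve, fake_answer
)
```

Returning the retrieved documents next to the answer is what makes it possible to inspect which chunks the answer was based on.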

#### Working on adding Chat History


- We need chat history because a follow-up question (for example "Tell me more about it") only makes sense in light of the earlier conversation, so the chain must interpret each new question in that context.
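
One way to picture history-aware retrieval: when there is no history, search with the question as typed; otherwise first rewrite the question into a standalone one. The sketch below uses stub functions, and `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical. In a real chain an LLM would do the rewriting.

```python
def history_aware_retrieve(inputs, rewrite_question, retrieve):
    """Search with the raw input, or with a rewritten one when history exists."""
    if not inputs.get("chat_history"):
        query = inputs["input"]
    else:
        query = rewrite_question(inputs["chat_history"], inputs["input"])
    return retrieve(query)


def fake_rewrite(chat_history, question):
    """Hypothetical LLM rewrite: swaps 'it' for the topic of the last human turn."""
    last_topic = [text for role, text in chat_history if role == "human"][-1]
    return question.replace("it", last_topic)


def fake_retrieve(query):
    """Hypothetical retriever that echoes the query it was asked to search."""
    return [f"documents matching: {query}"]


no_history = history_aware_retrieve(
    {"input": "Tell me more about it", "chat_history": []},
    fake_rewrite,
    fake_retrieve,
)
with_history = history_aware_retrieve(
    {"input": "Tell me more about it", "chat_history": [("human", "self-reflection")]},
    fake_rewrite,
    fake_retrieve,
)
```

Without the rewrite step, the retriever would search for the literal words "Tell me more about it", which match nothing useful in the vector store.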

In [51]:
from langchain.chains import create_history_aware_retriever    
# The history-aware retriever uses the chat history to rewrite the latest question before retrieving documents
from langchain_core.prompts import MessagesPlaceholder
# MessagesPlaceholder marks where a list of messages (here the chat history) is inserted into the prompt

contextual_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do not answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
# The prompt above is the system message for the question-rewriting step

contextual_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system" , contextual_q_system_prompt) ,
        MessagesPlaceholder(variable_name="chat_history") ,   # the conversation history is inserted here
        ("human" , "{input}")
    ]
) 



In [52]:
# Instead of the previous retriever, build a history-aware retriever
history_aware_retriever = create_history_aware_retriever( llm , retriever , contextual_q_prompt)

In [53]:
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001A00A97F100>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessag

In [55]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system" , system_prompt) ,
        MessagesPlaceholder(variable_name="chat_history") ,
        ("human" , "{input}")
    ]
)

In [56]:
# Now creating a chain 
question_answer_chain = create_stuff_documents_chain(llm , qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever , question_answer_chain)

In [59]:
from langchain_core.messages import HumanMessage , AIMessage
chat_history=[]
question = " what is self reflection ?"
response1 = rag_chain.invoke({"chat_history":chat_history , "input":question})


# now appending it to chat_history
chat_history.extend(
    [
        HumanMessage(content=question) ,
        AIMessage(content=response1['answer'])

    ]
)

In [61]:
print(response1['answer'])

Self-reflection is a process where the agent learns from its failures.  

It involves showing the LLM pairs of (failed trajectory, ideal reflection) and storing up to three reflections in the agent's working memory. These reflections then guide the agent's future plan changes when querying the LLM. 



In [60]:
question2 = " Tell me more about it "
response2 = rag_chain.invoke({"chat_history":chat_history , "input":question2})
print(response2['answer'])

Self-reflection, in the context of this framework, is a way for the agent to improve its planning by learning from its mistakes. 

It works by providing the LLM with examples of failed trajectories and the ideal reflections that could have been used to improve them.  The agent stores these reflections in its memory and uses them as context when making future plans, helping it avoid repeating past errors. 





In [63]:
# now appending it to chat_history
chat_history.extend(
    [
        HumanMessage(content=question2) ,
        AIMessage(content=response2['answer'])

    ]
)

In [64]:
chat_history

[HumanMessage(content=' what is self reflection ?', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Self-reflection is a process where the agent learns from its failures.  \n\nIt involves showing the LLM pairs of (failed trajectory, ideal reflection) and storing up to three reflections in the agent's working memory. These reflections then guide the agent's future plan changes when querying the LLM. \n", additional_kwargs={}, response_metadata={}),
 HumanMessage(content=' Tell me more about it ', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Self-reflection, in the context of this framework, is a way for the agent to improve its planning by learning from its mistakes. \n\nIt works by providing the LLM with examples of failed trajectories and the ideal reflections that could have been used to improve them.  The agent stores these reflections in its memory and uses them as context when making future plans, helping it avoid repeating past errors. \n\n\n'

In [65]:
# A common practice is to key the chat history by session id when working with LCEL
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}   # maps each session id to its chat history

'''
Session ids keep the chat histories of different users separate.
'''

def get_session_history(session_id:str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]



conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    # rag_chain returns its reply under "answer". Using "output" here makes the history
    # callback fail with KeyError('output'), so the turn is never saved.
    output_messages_key="answer"
)

In [66]:
# Now we can invoke the chain with a conversational session id
conversational_rag_chain.invoke(
    {"input": "What is task decomposition ?"} ,
    config = {
        "configurable":{"session_id":"abc123"} 
    },
)['answer']


'Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. \n\nThis makes it easier for an agent to plan and execute the task effectively.  Chain of thought (CoT) and Tree of Thoughts are prompting techniques used to facilitate task decomposition. \n'
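
The three key names tell the history wrapper where to read the user's message, where to inject the saved history, and where to find the reply to save. A plain-Python sketch of that bookkeeping, with hypothetical names (`run_with_history`, `fake_rag_chain`, `session_store`) and stub logic, might look like this:

```python
session_store = {}


def run_with_history(chain, inputs, session_id,
                     input_key="input", history_key="chat_history",
                     output_key="answer"):
    """Inject saved history, call the chain, then save the new turn."""
    history = session_store.setdefault(session_id, [])
    result = chain({**inputs, history_key: list(history)})
    # A wrong output_key fails here with KeyError, after the chain has already run.
    reply = result[output_key]
    history.append(("human", inputs[input_key]))
    history.append(("ai", reply))
    return result


def fake_rag_chain(inputs):
    """Hypothetical chain that returns its reply under the "answer" key."""
    seen = len(inputs["chat_history"])
    return {"input": inputs["input"], "answer": f"reply after {seen} earlier messages"}


first = run_with_history(fake_rag_chain, {"input": "What is task decomposition?"}, "abc123")
second = run_with_history(fake_rag_chain, {"input": "Tell me more"}, "abc123")
```

Because the reply is looked up only after the chain has run, a mismatched output key still produces an answer but leaves the turn unsaved, so later questions in that session would lose their context.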