<a href="https://colab.research.google.com/github/amal2334/NLP-and-LLM-Project/blob/main/chatbots.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Building Specialized Chatbots with LangChain: From Prompts to RAG**

In [None]:
!pip install python-dotenv
from dotenv import load_dotenv
import os

# Load .env file
load_dotenv()

# Access the keys
hf_token = os.getenv("HUGGINGFACE_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")

# Check if loaded (optional for testing)
print("HuggingFace key loaded:", hf_token is not None)
print("Groq key loaded:", groq_api_key is not None)


HuggingFace key loaded: True
Groq key loaded: True
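
The check above only prints whether each key was found. As a minimal sketch, assuming you would rather stop right away when a key is missing, the lookup can be wrapped in a small helper. `require_env` and `DEMO_API_KEY` are hypothetical names used for illustration; they are not part of `python-dotenv` or of this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both "not set" and "set to an empty string".
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file")
    return value


# Demo with a placeholder variable so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "placeholder-value"
demo_key = require_env("DEMO_API_KEY")
print("Demo key loaded:", bool(demo_key))
```

In the notebook, the same helper could be called with `"GROQ_API_KEY"` and `"HUGGINGFACE_API_KEY"` after `load_dotenv()`.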


# **Table of Contents**
In this notebook, we'll explore:

- **Introduction**

- **Setup & Downloads**
  - Install required libraries

  - Import core modules

  - Set API keys and environment

- **1. Human Text Generation with Context**
  - Mini Chatbot: ✈️ Travel Assistant that recommends travel ideas based on user input

- **2. Conversation Management and Multi-User Support**
  - Mini Chatbot: 📅 Personal Productivity Coach that helps users plan goals and routines, with memory per user

- **3. Prompt Engineering**
  - Mini Chatbot: 🛒 Shopping Assistant that helps users choose the right laptop based on their preferences

- **4. Multilingual Capabilities with Chat History**
  - Mini Chatbot: 🗣️ Language Learning Assistant with Memory that helps users learn and practice Arabic, English, and other languages

- **5. Token Trimming for Long Conversations**
  - Mini Chatbot: 📞 Customer Support Assistant that assists users in long chats without losing coherence

- **6. Retrieval-Augmented Generation (RAG)**
  - Mini Chatbot: 📖 AI Study Assistant (RAG + Memory) that answers research questions using real blog content
  - Tracks chat history for follow-up questions

- Conclusion
- Final Thoughts


# **Introduction**

- In this project, we build a modern Q&A chatbot using Large Language Models (LLMs) and the LangChain framework. Along the way, we explore generative AI and create a functional, context-aware conversational system.

# **Overview**
- Conversational AI is transforming how humans interact with computers, enabling more natural and intuitive interfaces. The techniques covered in this notebook form the foundation of many modern AI applications, from customer service bots to virtual assistants and knowledge management systems.

# **Environment Setup**
The key libraries we'll use include:
- `langchain`: Core framework for working with LLMs
- `langchain_community`: Community-contributed components and integrations
- `langchain_core`: Essential components like message types and prompt templates
- `langchain_chroma`: Vector database integration for knowledge retrieval


**Note:**
- Throughout this notebook, we'll use API keys from Groq and Hugging Face.

In [None]:
!pip install -U langchain-groq langchain-community langchain_chroma python-dotenv bs4
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load API keys from the .env file
load_dotenv()



In [None]:
from langchain_groq import ChatGroq
model=ChatGroq(model="Gemma2-9b-It",groq_api_key=groq_api_key)
model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7e847a9679d0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7e8478153ed0>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'))

**Note:**

The foundation of our chatbot is the Large Language Model (LLM). In this project, we're using **Gemma2 9B-IT**, an instruction-tuned model accessed here through the Groq API via `ChatGroq`.

**What is Gemma2?**
Gemma2 is Google's lightweight open model family, designed to deliver strong performance while staying efficient for its size. The "9B" refers to the number of parameters (9 billion), and "IT" indicates it is instruction-tuned, meaning it is optimized for following instructions and generating helpful responses.

**What is Groq?**
Groq is a hosted inference service: the model runs on Groq's servers, and the notebook calls it over an API using the key in `GROQ_API_KEY`. Ollama, by contrast, is a framework for running open-source LLMs locally; it is not used in this notebook.



# **1-Human Text Generation with Context**

In [None]:
from langchain_core.messages import AIMessage
from langchain_core.messages import HumanMessage
from langchain_core.messages import SystemMessage

model.invoke(
    [

        HumanMessage(content="Hi , My name is Amal and I am a data analyst , later on inshallah a data scientis"),
        AIMessage(content="Hello Amal! It's nice to meet you. \n\nAs a data  Analyst, what kind of projects are you working on these days? \n\nI'm always eager to learn more about the exciting work being done in the field of AI and data analysis.\n"),
        HumanMessage(content="Hey What's my name and what do I do?")
    ]
)

AIMessage(content="You are Amal, and you are a data analyst, aspiring to become a data scientist!  😄  \n\nIs there anything else you'd like to tell me about yourself or your work?  I'm happy to chat!\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 109, 'total_tokens': 159, 'completion_time': 0.090909091, 'prompt_time': 0.005083973, 'queue_time': 0.016439864, 'total_time': 0.095993064}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--c7747a25-1652-4b32-a59a-3b30ff352854-0', usage_metadata={'input_tokens': 109, 'output_tokens': 50, 'total_tokens': 159})

In [None]:
model.invoke(
    [
        SystemMessage(content="You are a helpful assistant"),
        HumanMessage(content="Hi , My name is Ajay and I am a data analyst"),
        AIMessage(content="Hello Ajay ! It's nice to meet you. \n\nAs a data Analyst, what kind of projects are you working on these days? \n\nI'm always eager to learn more about the exciting work being done in the field of AI and data analysis.\n"),
        HumanMessage(content="Hey What's my name and what do I do?")
    ]
)

AIMessage(content="You are Ajay, a data analyst!  \n\nIs there anything specific you'd like to talk about regarding your work as a data analyst?  Perhaps you have a question about a particular technique or are looking for insights on a dataset?  I'm here to help!\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 59, 'prompt_tokens': 103, 'total_tokens': 162, 'completion_time': 0.107272727, 'prompt_time': 0.005102172, 'queue_time': 0.019285664, 'total_time': 0.112374899}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--236d8b54-f94b-4739-92d0-8d4ce774f81b-0', usage_metadata={'input_tokens': 103, 'output_tokens': 59, 'total_tokens': 162})

# **Mini Chatbot: Travel Assistant**

In [None]:
model.invoke(
    [
        SystemMessage(content="You are a smart travel assistant who helps people plan their vacations."),
        HumanMessage(content="Hi, I’m Sarah and I’d like to plan a trip to Italy next month."),
        AIMessage(content="Hi Sarah! That sounds exciting. I'd love to help you plan your trip to Italy. Do you have any cities or activities in mind?"),
        HumanMessage(content="What’s my name and where do I want to go?")
    ]
)


AIMessage(content="You're Sarah and you want to go to Italy!  \n\nIs there a particular region or type of experience you're hoping for? For example, are you interested in: \n\n* **Art and history:**  Rome, Florence, Venice\n* **Food and wine:** Tuscany, Emilia Romagna\n* **Beaches and relaxation:** Amalfi Coast, Sicily\n* **Hiking and nature:** Dolomites, Cinque Terre\n\nTell me more about your dream Italian vacation! 🇮🇹 ☀️ 🍷  🍕\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 109, 'prompt_tokens': 95, 'total_tokens': 204, 'completion_time': 0.198181818, 'prompt_time': 0.00427938, 'queue_time': 0.016511577, 'total_time': 0.202461198}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--fabf9676-0365-47b4-8565-ac59b53b69cb-0', usage_metadata={'input_tokens': 95, 'output_tokens': 109, 'total_tokens': 204})

# **Note:**
Message Types in Conversational AI

In a conversational system, we need to distinguish between different types of messages. LangChain provides several message classes to represent different participants in a conversation:

1. **HumanMessage**: Represents messages from the user
2. **AIMessage**: Represents responses from the AI assistant
3. **SystemMessage**: Contains instructions or context for the AI that aren't part of the visible conversation
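
To make the three roles concrete, here is a minimal plain-Python sketch of how a role-tagged conversation can be flattened into the role/content list that chat APIs commonly accept. The `Message` class, `ROLE_MAP`, and `to_chat_payload` are hypothetical stand-ins, and the `system`/`user`/`assistant` role names are an assumption about an OpenAI-style payload; this is not LangChain's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's message classes."""
    role: str      # "system", "human", or "ai"
    content: str


# Assumed mapping from the notebook's message types to OpenAI-style chat roles.
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}


def to_chat_payload(messages: list[Message]) -> list[dict[str, str]]:
    """Convert role-tagged messages into a list of role/content dictionaries."""
    return [{"role": ROLE_MAP[m.role], "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a helpful assistant"),
    Message("human", "Hi, my name is Amal and I am a data analyst"),
    Message("ai", "Hello Amal! It's nice to meet you."),
    Message("human", "What's my name and what do I do?"),
]
payload = to_chat_payload(conversation)
for item in payload:
    print(item["role"], "->", item["content"])
```

The whole list is sent on every call, which is why the model can answer "What's my name?" in the examples above: the earlier turns are part of the input.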

# **2-Conversation management and Multi-User Support**
- We can use a Message History class to wrap our model and make it stateful. This will keep track of inputs and outputs of the model, and store them in some datastore.

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store={}

def get_session_history(session_id:str)->BaseChatMessageHistory:
    if session_id not in store:
        store[session_id]=ChatMessageHistory()
    return store[session_id]

with_message_history=RunnableWithMessageHistory(model,get_session_history)

In [None]:
config={"configurable":{"session_id":"chat1"}}

In [None]:
response=with_message_history.invoke(
    [HumanMessage(content="Hi , My name is Amal and I am a data analyst")],
    config=config
)

In [None]:
response.content

"Hello Amal! It's nice to meet you.  \n\nWhat kind of data analysis do you specialize in? What are you working on these days?  I'm always interested in learning more about how people use data to solve problems. 😊 \n\n"

In [None]:
with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

AIMessage(content='Your name is Amal.  😊  I remember!  \n\nIs there anything else I can help you with?\n', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 89, 'total_tokens': 115, 'completion_time': 0.047272727, 'prompt_time': 0.005739685, 'queue_time': 0.021543041, 'total_time': 0.053012412}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--5f91318c-b3f8-455e-9f23-193fc8ff2472-0', usage_metadata={'input_tokens': 89, 'output_tokens': 26, 'total_tokens': 115})

In [None]:
## change the config-->session id
config1={"configurable":{"session_id":"chat2"}}
response=with_message_history.invoke(
    [HumanMessage(content="Whats my name")],
    config=config1
)
response.content

"As an AI, I have no memory of past conversations and do not know your name. If you'd like to tell me, I'm happy to remember it! 😊\n"

# **Note:**
- Model memory lets the AI refer back to what was said earlier in a conversation. In LangChain, this history is stored per session ID (here, in the `store` dictionary).
- Within the same session, the model can recall your name, role, or anything else you mentioned. When you change the session ID, it is like starting a brand-new chat: the model sees none of the earlier messages.
- Memory therefore works only within a session, and switching sessions starts from an empty history.
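
The session isolation described above can be sketched without any LLM. In this toy version, `session_store` and `chat` are hypothetical names, and the "model" is just a string search for an earlier "my name is ..." turn; the point is only that each session ID reads and writes its own history.

```python
from collections import defaultdict

# Hypothetical in-memory store: one independent message list per session ID.
session_store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def chat(session_id: str, user_text: str) -> str:
    """Record the user turn and produce a toy reply from that session's history only."""
    history = session_store[session_id]
    history.append(("human", user_text))
    # Toy "model": look for an earlier "my name is ..." turn in this session.
    name = None
    for role, text in history:
        lowered = text.lower()
        if role == "human" and "my name is " in lowered:
            start = lowered.index("my name is ") + len("my name is ")
            words = text[start:].split()
            if words:
                name = words[0].strip(".,!?")
    reply = f"Your name is {name}." if name else "I don't know your name yet."
    history.append(("ai", reply))
    return reply


chat("chat1", "Hi, my name is Amal and I am a data analyst")
same_session = chat("chat1", "What's my name?")
new_session = chat("chat2", "What's my name?")
print(same_session)
print(new_session)
```

The second question succeeds in `chat1` and fails in `chat2` because the two IDs never share a history list.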

In [None]:
response=with_message_history.invoke(
    [HumanMessage(content="Hey My name is Ajay")],
    config=config1
)
response.content

"Hi Ajay, it's nice to meet you!  How can I help you today? 😊  \n\n"

In [None]:
response=with_message_history.invoke(
    [HumanMessage(content="Whats my name")],
    config=config1
)
response.content

'Your name is Ajay! 😊  \n\nI remember you told me a little while ago.  Is there anything else I can help you with?  \n'

# **Note:**
- If we switch sessions, it’s like talking to the AI for the first time.

- We must build memory step by step within each session

# **Mini Chatbot: Personal Productivity Coach**

In [None]:

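# Needed for the prompt below (this cell runs before the Prompt Engineering section)
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Fresh in-memory store: one chat history per session ID
store = {}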

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

system_prompt = (
    "You are a personal productivity coach. "
    "Help users build routines, focus on goals, and stay productive. "
    "Remember what each user tells you during the session."
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="messages"),
    ]
)


chain = prompt | model


with_memory_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

user1_session = "amal123"
user2_session = "Ajay456"

# Amal's message
response1 = with_memory_chain.invoke(
    {"messages": [HumanMessage(content="give me some advices on how i can  build a Fashion brand")]},
    config={"configurable": {"session_id": user1_session}}
)

# Ajay 's message
response2 = with_memory_chain.invoke(
    {"messages": [HumanMessage(content="What would a meaningful life look like to you.")]},
    config={"configurable": {"session_id": user2_session}}
)

response1.content


"That's fantastic! Building a fashion brand is an exciting journey.  It takes passion, creativity, and a solid plan. I'm here to help you structure your approach and stay focused. \n\nFirst, let's talk about **defining your brand**. \n\nTell me:\n\n* **What kind of fashion are you passionate about?** (e.g., streetwear, sustainable clothing, luxury wear, etc.)\n* **Who is your target audience?** (e.g., age, lifestyle, values)\n* **What makes your brand unique?** (e.g., design aesthetic, materials, ethical practices)\n* **What is your brand's story?** (What inspired you to start this brand?)\n\nOnce we have a clear vision for your brand, we can move on to building a **productive routine** that will help you bring it to life.  \n\nThink about:\n\n* **What are your core tasks?** (e.g., design, sourcing, marketing, sales)\n* **How much time can you realistically dedicate to your brand each week?**\n* **What tools and resources do you need?** (e.g., design software, manufacturing connections

In [None]:
response2.content

"That's a great question! To me, a meaningful life is about living with intention and purpose. It's about waking up each day feeling excited about what you're going to accomplish and contribute to the world. \n\nIt's about nurturing strong relationships, pursuing passions that set your soul on fire, and constantly learning and growing.  \n\nIt's not about perfection or achieving some arbitrary checklist of accomplishments, but about finding joy in the journey and making a positive impact, no matter how small, on the lives of others. \n\nNow, tell me, what does a meaningful life look like to you? What are some things that would bring you a sense of fulfillment and purpose?  \n\nI'm here to help you create a life that aligns with your values and aspirations. We can work together to build routines, set goals, and develop strategies that empower you to live your best life.\n"

# **3-Prompt Engineering**

In [None]:
from langchain_core.prompts import ChatPromptTemplate,MessagesPlaceholder
prompt=ChatPromptTemplate.from_messages(
    [
        ("system","You are a helpful assistant. Answer all the questions to the best of your ability."),
        MessagesPlaceholder(variable_name="messages")
    ]
)

chain=prompt|model

In [None]:
chain.invoke({"messages":[HumanMessage(content="Hi My name is Amal")]})

AIMessage(content="Hello Amal! It's nice to meet you.\n\nWhat can I do for you today? 😊  I'm ready for any questions you have!  \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 37, 'prompt_tokens': 31, 'total_tokens': 68, 'completion_time': 0.067272727, 'prompt_time': 0.002566647, 'queue_time': 0.016089950999999998, 'total_time': 0.069839374}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--5976fbea-067d-489d-8732-f97436a1ccd5-0', usage_metadata={'input_tokens': 31, 'output_tokens': 37, 'total_tokens': 68})

In [None]:
with_message_history=RunnableWithMessageHistory(chain,get_session_history)

In [None]:
config = {"configurable": {"session_id": "chat3"}}
response=with_message_history.invoke(
    [HumanMessage(content="Hi My name is Amal")],
    config=config
)

response

AIMessage(content="Hello Amal! It's nice to meet you.\n\nI'm happy to help answer any questions you have to the best of my ability. What can I do for you today? 😊  \n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 31, 'total_tokens': 74, 'completion_time': 0.078181818, 'prompt_time': 0.002614587, 'queue_time': 0.08027486, 'total_time': 0.080796405}, 'model_name': 'Gemma2-9b-It', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run--3c78536b-6238-4051-a61e-339dbcf49f1a-0', usage_metadata={'input_tokens': 31, 'output_tokens': 43, 'total_tokens': 74})

In [None]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'Your name is Amal.  You told me at the beginning! 😊 \n'

# **Mini chatbot: Shopping Assistant**

In [None]:

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Answer all the questions to the best of your ability"),
        MessagesPlaceholder(variable_name="messages")
    ]
)

chain = prompt | model
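# Name the input key explicitly so history is merged into "messages"
with_message_history = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="messages")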

config = {"configurable": {"session_id": "shop_user1"}}

response = with_message_history.invoke(
    {"messages": [HumanMessage(content="Hi, I’m looking for a new laptop for programming.")]},
    config=config
)

response = with_message_history.invoke(
    {"messages": [HumanMessage(content="Hi I care about GPU and RAM ")]},
    config=config
)

print(response.content)


Great! You're prioritizing GPU and RAM, which are important for certain types of programming. 

To give you the best recommendations, I need a little more information:

1. **What's your budget?** 
2. **What kind of programming will you be doing?** (Data science, game development, web development, etc.)  Knowing this will help me determine how powerful of a GPU and how much RAM you'll need. 



Let me know, and I'll help you find a laptop that fits your needs!



# **4-Multilingual Capabilities with Chat History**

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model

In [None]:
response=chain.invoke({"messages":[HumanMessage(content="Hi My name is Amal")],"language":"Arabic"})
response.content

'مرحباً يا أمل، يسعدني معرفتك 😊 \n\nما الذي يمكنني أن أساعدك به اليوم؟\n'

In [None]:
response=chain.invoke({"messages":[HumanMessage(content="Hi My name is Ajay")],"language":"Telugu"})
response.content

'నమస్తే Ajay! మీకు సహాయం చేయడానికి నేను ఇక్కడ ఉన్నాను. ఏమి అడుగుతున్నారు? 😊\n'

In [None]:
response=chain.invoke({"messages":[HumanMessage(content="Hi My name is selcuk")],"language":"Turkish"})
response.content

'Merhaba Selçuk! \n\nNasıl yardımcı olabilirim? 😊 \n'

In [None]:
response=chain.invoke({"messages":[HumanMessage(content="Hi My name is Amal")],"language":"English"})
response.content

"Hello Amal! It's nice to meet you. \n\nWhat can I do for you today? 😄  \n\n"

In [None]:
with_message_history=RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

In [None]:
config = {"configurable": {"session_id": "chat4"}}
response=with_message_history.invoke(
    {'messages': [HumanMessage(content="Hi,I am Amal")],"language":"English"},
    config=config
)
response.content

"Hello Amal! 👋\n\nIt's nice to meet you. How can I help you today? 😊  \n"

In [None]:
response = with_message_history.invoke(
    {"messages": [HumanMessage(content="whats my name?")], "language": "English"},
    config=config,
)

In [None]:
response.content

'Your name is Amal.  😊 I remember! \n\nIs there anything else I can help you with?\n'

# **Mini Chatbot: Language Learning Assistant with Memory**

- This mini chatbot is a language tutor that responds in your chosen language **(for example, English or Spanish)** and remembers what you've learned during the session. It can help you practice vocabulary, correct your mistakes, and recall earlier lessons through the session's chat history, much like a personal language coach who guides you step by step.

In [None]:



prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a supportive language tutor. Respond in {language}. Encourage and correct the student. Remember what they learned."),
    MessagesPlaceholder(variable_name="messages")
])


chain = prompt | model

store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

config = {"configurable": {"session_id": "english_learning_amal"}}


In [None]:
response1 = with_message_history.invoke(
    {
        "messages": [HumanMessage(content="How do I say 'libro 'in English?")],
        "language": "English"
    },
    config=config
)
print("Bot:", response1.content)


Bot: That's a great question!  "Libro" is the Spanish word for **book**. 

It's wonderful that you're exploring different languages! 🎉  Is there anything else you'd like to translate or practice?  I'm happy to help. 📚  



In [None]:
response2 = with_message_history.invoke(
    {
        "messages": [HumanMessage(content="What new word did I learn?")],
        "language": "English"
    },
    config=config
)
print("Bot:", response2.content)


Bot: You learned that "libro" in Spanish means "**book**" in English!  

Remember, learning new words from different languages is like adding colorful pieces to a big puzzle!  

Would you like to try another word?  😊  








# **5-Token Trimming for Long Conversation**

In [None]:
from langchain_core.messages import SystemMessage,trim_messages
trimmer=trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm Amal"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

In [None]:
from operator import itemgetter

from langchain_core.runnables import RunnablePassthrough

chain=(
    RunnablePassthrough.assign(messages=itemgetter("messages")|trimmer)
    | prompt
    | model

)

response=chain.invoke(
    {
    "messages":messages + [HumanMessage(content="What ice cream do i like")],
    "language":"English"
    }
)
response.content

"As a large language model, I have no memory of past conversations or personal information about you, including your ice cream preferences.\n\nWhat's your favorite flavor? 😊  \n"

In [None]:
response = chain.invoke(
    {
        "messages": messages + [HumanMessage(content="what math problem did i ask")],
        "language": "English",
    }
)
response.content

'You asked "what\'s 2 + 2"  😊  \n\n\n\nIs there anything else I can help you with?\n'

**Note:**

- The model remembers the math problem ("what’s 2 + 2") because that message appears near the end of the conversation, and the  trimming strategy keeps the most recent messages within a 45-token limit. Since the math question and its answer are still within that limit, they are included when the model is called.
- In contrast, earlier messages like "I like vanilla ice cream" are further back in the history and may be trimmed out, so the model can no longer see or remember them.
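
The trimming behaviour described above can be sketched in plain Python. This is a simplified illustration of a "keep the last messages" strategy, not the `trim_messages` implementation: `count_tokens` and `trim_last` are hypothetical helpers, tokens are approximated as whitespace-separated words (the notebook instead uses `token_counter=model`, so real counts differ), and `max_tokens=20` is chosen only to suit this crude counter.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_last(messages, max_tokens, include_system=True, start_on="human"):
    """Keep the newest messages that fit in max_tokens, optionally pinning the system message."""
    system = messages[0] if include_system and messages and messages[0][0] == "system" else None
    rest = messages[1:] if system else list(messages)
    budget = max_tokens - (count_tokens(system[1]) if system else 0)

    kept = []
    for role, text in reversed(rest):          # walk from newest to oldest
        cost = count_tokens(text)
        if cost > budget:
            break                              # no partial messages are kept
        kept.append((role, text))
        budget -= cost
    kept.reverse()

    # Drop leading messages until the window starts on the requested role.
    while kept and kept[0][0] != start_on:
        kept.pop(0)
    return ([system] if system else []) + kept


history = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm Amal"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
    ("human", "thanks"),
    ("ai", "no problem!"),
    ("human", "what math problem did i ask"),
]
trimmed = trim_last(history, max_tokens=20)
for role, text in trimmed:
    print(f"{role}: {text}")
```

With this budget the math question survives while the ice cream message falls outside the window, mirroring the two responses above.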










In [None]:
## Let's wrap this in the message history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages",
)
config={"configurable":{"session_id":"chat5"}}

In [None]:
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="whats my name?")],
        "language": "English",
    },
    config=config,
)

response.content

"As an AI, I don't have memory of past conversations or personal information about you.  So I don't know your name. 😊\n\nWhat's your name?\n"

**Note:**
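- The model replied "As an AI, I don't have memory" for two reasons: session `chat5` was new, so no history was stored for it, and the `hi! I'm Amal` message in the supplied list falls outside the 45-token window kept by the trimmer. The model therefore never saw the name.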



# **Mini Customer Support chatbot**

In [None]:


trimmer = trim_messages(
    max_tokens=100,
    strategy="last",  # Keep the most recent messages
    token_counter=model,
    include_system=True,
    allow_partial=False
)


prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful customer support assistant."),
    MessagesPlaceholder(variable_name="messages")
])

chain = (
    RunnablePassthrough.assign(messages=itemgetter("messages") | trimmer)
    | prompt
    | model
)

messages = [
    SystemMessage(content="You are a helpful customer support assistant."),
    HumanMessage(content="Hi, I need help with a refund."),
    AIMessage(content="Sure, can you share your order number?"),
    HumanMessage(content="It's #12345"),
    AIMessage(content="Thanks. I see your order."),
    HumanMessage(content="Also, I was charged twice."),
    AIMessage(content="Sorry to hear that. We'll fix it."),
    HumanMessage(content="And I never received the item."),
    AIMessage(content="Let me check that for you."),
    HumanMessage(content="Can I still get my money back?"),
]

response = chain.invoke({
    "messages": messages + [HumanMessage(content="When can I expect a refund?")],
})

print("Bot:", response.content)


Bot: I understand your frustration.  Let me look into the details of your order and investigate why you were charged twice and didn't receive the item.  

Once I have more information, I can give you a more accurate timeframe for when you can expect a refund. 

Please bear with me while I look into this for you.



- In this customer **support chatbot**, token **trimming** is used to keep the conversation within a safe limit by automatically removing older messages and preserving the most recent ones.
- This ensures that the chatbot focuses only on the relevant part of the conversation, especially during long chats.
- When combined with message history, the system can track each user’s session separately while still **applying token limits** to avoid exceeding the model’s capacity.
- This approach keeps the **chatbot efficient, responsive, and context-aware**, even during extended support interactions.

# **6- Retrieval-Augmented Generation (RAG)**

In [None]:

# Reuse the Groq key loaded from .env earlier; never hard-code secrets in a notebook
groq_api_key = os.getenv("GROQ_API_KEY")


llm=ChatGroq(groq_api_key=groq_api_key,model_name="Llama3-8b-8192")

llm


ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7e84783a9950>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7e847b3acad0>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [None]:

# Expose the Hugging Face token under the name the Hugging Face libraries read
os.environ['HF_TOKEN'] = hf_token
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [None]:

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain



**Overview:**
- We load a webpage and use **BeautifulSoup** to **parse and filter the HTML content**. By targeting specific class names, we extract only the relevant sections (the title, header, and main content) and ignore the rest of the page, such as navigation and footers.
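
To illustrate what class-based filtering does, here is a small sketch built on Python's standard `html.parser` rather than BeautifulSoup. `ClassFilter` and the sample `page` markup are hypothetical, and this is not how `bs4.SoupStrainer` is implemented; it only shows the idea of keeping text from elements with the wanted classes.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassFilter(HTMLParser):
    """Collect text only from elements whose class attribute matches a wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0          # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: only the title and content blocks should survive.
page = """
<nav class="menu"><a href="/">Home</a></nav>
<h1 class="post-title">LLM Powered Autonomous Agents</h1>
<div class="post-content"><p>Agents use <b>planning</b> and memory.</p></div>
<footer class="footer">Powered by Hugo</footer>
"""
parser = ClassFilter(["post-content", "post-title", "post-header"])
parser.feed(page)
extracted = " ".join(parser.chunks)
print(extracted)
```

The navigation and footer text are dropped because their elements carry none of the wanted class names.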

In [None]:
import bs4
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs=loader.load()
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [None]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)
splits=text_splitter.split_documents(docs)
vectorstore=Chroma.from_documents(documents=splits,embedding=embeddings)
retriever=vectorstore.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7e8471708250>, search_kwargs={})

In [None]:
## Prompt Template
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [None]:
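# Combine the retrieved documents into the prompt's {context} slot, then answer with the LLM.
question_answer_chain = create_stuff_documents_chain(llm, prompt)
# Retrieve documents for the input first, then pass them to the question-answer chain.
rag_chain = create_retrieval_chain(retriever, question_answer_chain)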

# **Note:**
- This setup creates a RAG-based question-answering chain: the retriever fetches the chunks most relevant to the question, and the language model generates a concise answer from them. Grounding the answer in retrieved text rather than in the model's memory alone improves accuracy and relevance, which suits document-based chat, knowledge assistants, and smart search systems.
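
The retrieve-then-stuff flow can be sketched in plain Python. This is not how LangChain implements it internally; `toy_retrieve` (word overlap instead of vector search), `build_messages`, `toy_rag_chain`, and `fake_llm` are all hypothetical stand-ins that only mirror the shape of the chain: fetch documents, join them into `{context}`, fill the prompt, call the model.

```python
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, say that you don't know."
    "\n\n{context}"
)


def toy_retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared words with the question (stand-in for vector search)."""
    words = set(question.lower().split())
    scored = [(len(words & set(chunk.lower().split())), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score > 0]


def build_messages(question: str, docs: list[str]) -> list[tuple[str, str]]:
    """'Stuff' all retrieved documents into the single {context} slot."""
    context = "\n\n".join(docs)
    return [("system", SYSTEM_TEMPLATE.format(context=context)), ("human", question)]


def toy_rag_chain(question: str, chunks: list[str], llm) -> dict:
    docs = toy_retrieve(question, chunks)
    answer = llm(build_messages(question, docs))
    return {"input": question, "context": docs, "answer": answer}


def fake_llm(messages: list[tuple[str, str]]) -> str:
    # Hypothetical model stub: pretends to follow the system prompt by
    # refusing when the context slot is empty.
    system_text = messages[0][1]
    return "I don't know." if system_text.endswith("\n\n") else "Answer grounded in context."


chunks = [
    "Self-reflection lets agents refine past action decisions and correct mistakes.",
    "Task decomposition breaks a large task into smaller subgoals.",
]
result = toy_rag_chain("What is self-reflection", chunks, fake_llm)
print(result["context"])
print(result["answer"])
```

The returned dictionary deliberately uses the same `input`, `context`, and `answer` keys that appear in the chain's output in this notebook.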


In [None]:
response = rag_chain.invoke({"input": "What is Self-Reflection"})
response

{'input': 'What is Self-Reflection',
 'context': [Document(id='1a91b406-1322-4799-a773-4fdb6f3ff976', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)'),
  Document(id='f1ef5291-a353-4fb7-8d

In [None]:
response['answer']

'Self-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.'

In [None]:
rag_chain.invoke({"input": "How do we achieve it"})

{'input': 'How do we achieve it',
 'context': [Document(id='7d9a4f06-8eba-455a-bc01-4c1142ff7dca', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='}\n]\nChallenges#\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:'),
  Document(id='93591d70-cbd2-44c6-9fdc-08ed0e12d7c7', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to com

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [None]:
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7e8471708250>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChu

In [None]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [None]:
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []
question = "What is Self-Reflection"
response1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

# Record the first exchange so the follow-up can refer back to it.
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=response1["answer"]),
    ]
)

# Pass the follow-up question (question2), not the original one.
question2 = "Tell me more about it?"
response2 = rag_chain.invoke({"input": question2, "chat_history": chat_history})
print(response2["answer"])

According to the context, Self-Reflection is a mechanism that allows agents to reflect on their past actions and correct previous mistakes, enabling them to improve iteratively.


In [None]:
chat_history

[HumanMessage(content='What is Self-Reflection', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Self-Reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.', additional_kwargs={}, response_metadata={})]

# **Note:**
- This setup demonstrates a conversational RAG (Retrieval-Augmented Generation) system that can understand and answer questions based not only on retrieved documents but also on previous interactions.
- The first user question is answered using relevant content retrieved from a knowledge source. That question and answer are saved as chat history.
- When the user asks a follow-up question like “Tell me more about it?”, the system uses the stored history to understand what “it” refers to and generate a meaningful, context-aware answer.
- This shows how combining document retrieval with chat history enables more natural, coherent, and intelligent multi-turn conversations.
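
The branching visible in the printed `RunnableBranch` can be sketched as follows: with an empty `chat_history` the raw input goes straight to the retriever, otherwise the question is first rewritten into a standalone one. This is a minimal sketch under assumptions; `history_aware_query` and `stub_rewrite` are hypothetical names, and the stub only appends the last human message where the real chain would call the LLM with the contextualizing prompt.

```python
def history_aware_query(inputs: dict, rewrite) -> str:
    """Return the text to search with: raw input, or a rewritten standalone question."""
    if not inputs.get("chat_history"):
        return inputs["input"]  # no history, nothing to resolve
    return rewrite(inputs["chat_history"], inputs["input"])


def stub_rewrite(chat_history: list[tuple[str, str]], question: str) -> str:
    # Hypothetical stand-in for the LLM rewrite step: attach the most recent
    # human message so the query no longer depends on the pronoun "it".
    last_human = [text for role, text in chat_history if role == "human"][-1]
    return f"{question} (in the context of: {last_human})"


history = [
    ("human", "What is Self-Reflection"),
    ("ai", "Self-reflection is a vital aspect that allows autonomous agents to improve iteratively."),
]

print(history_aware_query({"input": "Tell me more about it?", "chat_history": []}, stub_rewrite))
print(history_aware_query({"input": "Tell me more about it?", "chat_history": history}, stub_rewrite))
```

Only the search query is rewritten here; the question-answer step still receives the original input together with the chat history.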

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [None]:
conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This can be done through various methods, such as using language models like LLM with simple prompting or by relying on external classical planners. Task decomposition is essential for planning and achieving complex goals, as it allows agents to focus on one step at a time and make progress towards completing the overall task.'

In [None]:
conversational_rag_chain.invoke(
    {"input": "who is the president of US ?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

"I don't know."

In [None]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'Common ways of task decomposition include using language models with simple prompts, task-specific instructions, or human inputs.'

In [None]:
store

{'abc123': InMemoryChatMessageHistory(messages=[HumanMessage(content='What is Task Decomposition?', additional_kwargs={}, response_metadata={}), AIMessage(content='Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This can be done through various methods, such as using language models like LLM with simple prompting or by relying on external classical planners. Task decomposition is essential for planning and achieving complex goals, as it allows agents to focus on one step at a time and make progress towards completing the overall task.', additional_kwargs={}, response_metadata={}), HumanMessage(content='who is the president of US ?', additional_kwargs={}, response_metadata={}), AIMessage(content="I don't know.", additional_kwargs={}, response_metadata={}), HumanMessage(content='What are common ways of doing it?', additional_kwargs={}, response_metadata={}), AIMessage(content='Common ways of task decomposition include using langua

# **Note:**
- The **conversational AI** could answer “What are common ways of doing it?” because the chat history for that session held the earlier question about “Task Decomposition.”
- With that history, the history-aware retriever could resolve what “it” referred to and retrieve the passages of the linked article that discuss task decomposition strategies.
- In contrast, when asked “**who is the president of US ?**”, the model responded “**I don't know**” because the article does not contain that information and the system prompt instructs the model to answer from the retrieved context and to say it doesn't know otherwise. A RAG chain set up this way answers from what it retrieves, so even a clear question goes unanswered if the source documents do not cover it.
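
The per-session bookkeeping can be approximated in a few lines of plain Python. This is a sketch of the idea rather than the behaviour of `RunnableWithMessageHistory` itself; `with_history` and `echo_chain` are hypothetical, and the stub chain only reports how much history it received.

```python
from collections import defaultdict

# One message list per session id, created on first use.
store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def with_history(chain, session_id: str, user_input: str) -> str:
    """Load the session's history, call the chain, then record the new exchange."""
    history = store[session_id]
    result = chain({"input": user_input, "chat_history": list(history)})
    history.append(("human", user_input))
    history.append(("ai", result["answer"]))
    return result["answer"]


def echo_chain(inputs: dict) -> dict:
    # Hypothetical chain stub: reports how many earlier messages it was given.
    return {"answer": f"seen {len(inputs['chat_history'])} earlier messages"}


print(with_history(echo_chain, "abc123", "What is Task Decomposition?"))
print(with_history(echo_chain, "abc123", "What are common ways of doing it?"))
print(with_history(echo_chain, "other", "Hello"))
```

Because histories are keyed by session id, the second call in session `abc123` sees the first exchange, while the new session `other` starts from an empty history.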

# **Mini AI Study Chatbot with RAG + Chat History**



In [None]:
session_id = "student_ai_study"

question1 = "What is Task Decomposition?"
response1 = conversational_rag_chain.invoke(
    {"input": question1},
    config={"configurable": {"session_id": session_id}},
)
print("👩‍🎓", question1)
print("🤖", response1["answer"])


question2 = "What are some common ways of doing it?"
response2 = conversational_rag_chain.invoke(
    {"input": question2},
    config={"configurable": {"session_id": session_id}},
)
print("\n👩‍🎓", question2)
print("🤖", response2["answer"])

question3 = "What evidence supports the existence of dark matter in galaxies?"
response3 = conversational_rag_chain.invoke(
    {"input": question3},
    config={"configurable": {"session_id": session_id}},
)
print("\n👩‍🎓", question3)
print("🤖", response3["answer"])


👩‍🎓 What is Task Decomposition?
🤖 Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks.

👩‍🎓 What are some common ways of doing it?
🤖 Common ways of doing task decomposition include using LLM with simple prompting, task-specific instructions, or human inputs.

👩‍🎓 What evidence supports the existence of dark matter in galaxies?
🤖 I don't know.


# **Conclusion**
We have built a Q&A chatbot with several advanced capabilities:

1. **Context-Aware Conversations**: The chatbot remembers previous exchanges and maintains coherent dialogues
2. **Multi-User Support**: The system keeps a separate history per session, so it can handle multiple conversations simultaneously
3. **Multilingual Communication**: The assistant can converse in various languages
4. **Memory Management**: The chatbot handles long conversation history efficiently through token trimming
5. **Knowledge Integration**: External information can be retrieved and used to ground answers through RAG


# **Final Thoughts**

The field of conversational AI is evolving rapidly, with new models and techniques emerging regularly. By understanding the fundamental components covered in this project, we have built a strong foundation that will allow us to adapt to these changes and create increasingly sophisticated AI applications.

Building effective AI systems is an iterative process that benefits from continuous testing and refinement. As we continue to experiment and learn, the chatbots will become more capable, helpful, and natural in their interactions.

**Thank you for exploring this fascinating field with us!**