In [15]:
import os
import pprint
from langchain.agents import create_agent
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
load_dotenv(dotenv_path="/home/bipin/Documents/genai/bot/learning/.env")
llm = ChatOpenAI(model="gpt-5-nano")

In [16]:
simple_agent = create_agent(
    model=llm,
    tools=[],  # No tools for now, just conversation
    system_prompt="You are a helpful assistant. Answer questions concisely."
)

print("Simple agent created successfully!")

Simple agent created successfully!


In [17]:
response1 = simple_agent.invoke({
    "messages": [{"role": "user", "content": "Hi! My name is Bipin."}]
})

In [18]:
response1

{'messages': [HumanMessage(content='Hi! My name is Bipin.', additional_kwargs={}, response_metadata={}, id='65be5ff8-690c-42ba-8586-a9bbb09a6e58'),
  AIMessage(content='Nice to meet you, Bipin! How can I help today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 279, 'prompt_tokens': 30, 'total_tokens': 309, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 256, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CnxwBgcDJcnPNXhGm6d96AX8tWnc1', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b2f43-6ffa-7f52-b3b4-9a06ad57bcd4-0', usage_metadata={'input_tokens': 30, 'output_tokens': 279, 'total_tokens': 309, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 

In [20]:
response1['messages'][-1].content

'Nice to meet you, Bipin! How can I help today?'

In [21]:
response2 = simple_agent.invoke({
    "messages": [{"role": "user", "content": "What's my name?"}]
})

In [22]:
response2['messages'][-1].content

'I don’t know your name. What would you like me to call you? Tell me your name and I’ll use it in this chat.'

## The Problem - No Memory!

### What Just Happened?

You probably noticed the chatbot **doesn't remember your name**! This is because:

1. Each `invoke()` call is independent
2. We only sent the current message, not the conversation history
3. The agent has no way to remember previous interactions

### Why This Is a Problem

- Users expect chatbots to remember context
- Real conversations build on previous messages
- Without memory, the chatbot feels robotic and unhelpful

### The Solution: Short-Term Memory

We need to:
1. Store conversation history
2. Send the full history with each new message
3. Use a "checkpointer" to persist the conversation
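
The first two steps can be sketched without any framework. This is a minimal illustration, not how LangChain does it internally: `fake_model` is a hypothetical stand-in for a real LLM call, and `history` and `ask` are names invented for this sketch.

```python
# Sketch: carrying conversation history by hand (no LangChain required).
# `fake_model` is a hypothetical stand-in for a real LLM call.

def fake_model(messages):
    """Reply with the user's name if any earlier user message stated it."""
    for msg in messages:
        if msg["role"] == "user" and "My name is " in msg["content"]:
            name = msg["content"].split("My name is ")[1].rstrip(".!")
            return f"Your name is {name}."
    return "I don't know your name."

history = []  # Step 1: store the conversation history

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)  # Step 2: send the full history every time
    history.append({"role": "assistant", "content": reply})
    return reply

# Stateless call: only the current message is sent, so the name is unknown
stateless_reply = fake_model([{"role": "user", "content": "What's my name?"}])

# Stateful calls: each one sees everything said so far
ask("Hi! My name is Bipin.")
stateful_reply = ask("What's my name?")

print(stateless_reply)
print(stateful_reply)
```

A checkpointer automates this bookkeeping, so the calling code only has to send the newest message.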

# Part 2: Adding Short-Term Memory

### Step 1: Import Memory Components

* `MemorySaver` stores conversation history in process memory (RAM), so the history is lost when the kernel or program restarts
* For production, you'd use a database checkpointer (SQLite, Postgres, etc.)

In [2]:
from langgraph.checkpoint.memory import MemorySaver

### Step 2: Create an Agent with a Checkpointer

In [23]:
# Create a checkpointer to store conversation history
memory = MemorySaver()

# Create an agent WITH memory
agent_with_memory = create_agent(
    model=llm,
    tools=[],
    system_prompt="You are a helpful assistant. Remember details from our conversation.",
    checkpointer=memory  # This enables memory!
)

print("Agent with memory created!")

Agent with memory created!


### Step 3: Understanding Thread IDs

To maintain separate conversations, we use **thread IDs**. Think of a thread as a conversation session.

- Each thread has its own memory
- Same thread ID = same conversation
- Different thread ID = new conversation
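
Thread isolation can be pictured as one message list per thread ID. The sketch below is a toy model under that assumption, not the real checkpointer implementation: `threads` and `remember` are hypothetical names, and only the shape of the config dictionary matches what the agent expects.

```python
from collections import defaultdict

# Hypothetical stand-in for a checkpointer: one message list per thread ID.
threads = defaultdict(list)

def remember(config, text):
    """Append a message to the thread named in the config and return that thread's history."""
    thread_id = config["configurable"]["thread_id"]
    threads[thread_id].append(text)
    return threads[thread_id]

config_a = {"configurable": {"thread_id": "conversation_1"}}
config_b = {"configurable": {"thread_id": "conversation_2"}}

remember(config_a, "Hi! My name is Bipin.")
history_a = remember(config_a, "What's my name?")  # same thread: sees both messages
history_b = remember(config_b, "What's my name?")  # new thread: starts empty

print(len(history_a))  # 2
print(len(history_b))  # 1
```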

In [24]:
# Configuration with thread ID
config = {"configurable": {"thread_id": "conversation_1"}}

# First message in the conversation
response1 = agent_with_memory.invoke(
    {"messages": [{"role": "user", "content": "Hi! My name is Bipin."}]},
    config  # Pass the config to maintain thread
)
response1["messages"][-1].content

'Nice to meet you, Bipin! I’ll remember your name in this chat. How can I help you today? \nWould you like to chat, learn something, get help with a task, plan something, or practice a skill? Tell me what you’re interested in or any problem you’re working on.'

In [25]:
# Second message - the bot should remember the name now!
response2 = agent_with_memory.invoke(
    {"messages": [{"role": "user", "content": "What's my name?"}]},
    config  # Same config = same conversation thread
)

print("Bot:", response2["messages"][-1].content)

Bot: Your name is Bipin. Nice to meet you again, Bipin! What would you like help with today?


In [26]:
# Let's have a longer conversation to test memory
response3 = agent_with_memory.invoke(
    {"messages": [{"role": "user", "content": "I like chicken and Butter"}]},
    config
)

response3["messages"][-1].content

'Nice to hear that, Bipin! Chicken with butter is a delicious combo. Want a quick recipe or just some ideas? Here are three easy options:\n\n1) Garlic Butter Chicken Skillet (about 20 minutes)\n- Ingredients: chicken breasts or thighs, salt, pepper, 2–3 tbsp olive oil, 3–4 tbsp butter, 3 cloves garlic, 1/2 cup chicken broth or wine, chopped parsley.\n- Steps: Season chicken and sear in oil until browned. Remove. In the same pan, melt butter, sauté garlic until fragrant, add broth to deglaze, simmer a couple minutes. Return chicken, cook through, finish with parsley.\n\n2) Lemon Butter Chicken (about 25 minutes)\n- Ingredients: chicken thighs, salt, pepper, 2–3 tbsp butter, 2 cloves garlic, 1/2 cup chicken broth, juice of 1 lemon, thyme.\n- Steps: Sear chicken until golden. Remove. In same pan, melt butter, sauté garlic, add broth and lemon juice, simmer a few minutes, return chicken, cook until done, finish with thyme.\n\n3) Creamy Butter Chicken (Murgh Makhani) — a bit more involved\n

In [27]:
print(response3["messages"][-1].content)

Nice to hear that, Bipin! Chicken with butter is a delicious combo. Want a quick recipe or just some ideas? Here are three easy options:

1) Garlic Butter Chicken Skillet (about 20 minutes)
- Ingredients: chicken breasts or thighs, salt, pepper, 2–3 tbsp olive oil, 3–4 tbsp butter, 3 cloves garlic, 1/2 cup chicken broth or wine, chopped parsley.
- Steps: Season chicken and sear in oil until browned. Remove. In the same pan, melt butter, sauté garlic until fragrant, add broth to deglaze, simmer a couple minutes. Return chicken, cook through, finish with parsley.

2) Lemon Butter Chicken (about 25 minutes)
- Ingredients: chicken thighs, salt, pepper, 2–3 tbsp butter, 2 cloves garlic, 1/2 cup chicken broth, juice of 1 lemon, thyme.
- Steps: Sear chicken until golden. Remove. In same pan, melt butter, sauté garlic, add broth and lemon juice, simmer a few minutes, return chicken, cook until done, finish with thyme.

3) Creamy Butter Chicken (Murgh Makhani) — a bit more involved
- A flavorfu

In [28]:
response4 = agent_with_memory.invoke(
    {"messages": [{"role": "user", "content": "What do I like?"}]},
    config
)

print("Bot:", response4["messages"][-1].content)

Bot: From our chat, you like chicken and butter. So buttery, flavorful chicken dishes are right up your alley, especially quick-to-make ones.

Would you like me to tailor a recipe or a short menu around those prefs? Tell me your available time, ingredients on hand, and how spicy or creamy you like it, and I’ll customize.


In [30]:
len(response4['messages'])

8

In [None]:
print("\n=== Full Conversation History ===")
for i, msg in enumerate(response4["messages"], 1):
    role = msg.__class__.__name__.replace("Message", "")
    content = msg.content
    print(f"{i}. [{role}]: {content}")

In [None]:
new_config = {"configurable": {"thread_id": "conversation_2"}}

response_new = agent_with_memory.invoke(
    {"messages": [{"role": "user", "content": "What's my name?"}]},
    new_config
)

print("Bot (new conversation):", response_new["messages"][-1].content)

In [None]:
def chat_with_bot():
    """Run an interactive chat loop on a single conversation thread."""

    # Create a new conversation thread
    thread_config = {"configurable": {"thread_id": "session 1"}}
    
    while True:
        # Get user input
        user_message = input("You: ")
        
        # Exit condition
        if user_message.lower() in ['quit', 'exit', 'bye']:
            print("Goodbye!")
            break
        
        # Send message to agent
        response = agent_with_memory.invoke(
            {"messages": [{"role": "user", "content": user_message}]},
            thread_config
        )
        
        # Print bot response
        print(f"Bot: {response['messages'][-1].content}\n")
# chat_with_bot()

### The Memory Problem
- Without memory, chatbots can't maintain context
- Each conversation starts from scratch
- Poor user experience

### Key Concepts

1. **create_agent()** - The standard way to build agents in LangChain
2. **Checkpointer** - Saves conversation state (MemorySaver, SQLite, Postgres)
3. **Thread ID** - Identifies a conversation session
4. **Short-term memory** - Remembers within a conversation thread


### Behind the Scenes

When you use a checkpointer:

1. **First message**: Agent processes it and saves the state
2. **Second message**: Agent loads previous state, adds new message, processes, saves updated state
3. **Continue**: This repeats for every message

### Message Types

- **HumanMessage**: Messages from the user
- **AIMessage**: Messages from the assistant
- **SystemMessage**: Instructions for the agent (system prompt)
- **ToolMessage**: Results from tool executions

### The Cost Problem

As conversations grow longer (e.g., 20+ questions), the prompt sent to the model grows with every message, because the full history is resent on each call. This leads to:

- **Higher API costs**: You're charged for every token in the conversation history
- **Slower responses**: More tokens = more processing time
- **Context window limits**: Eventually, you'll hit the model's maximum token limit
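
A rough worked example shows how quickly this adds up. It assumes, purely for illustration, that every message (user or assistant) is 100 tokens and ignores the system prompt; real token counts vary.

```python
# Assumption for illustration: every message (user or assistant) is 100 tokens.
TOKENS_PER_MESSAGE = 100

def prompt_tokens_at_turn(turn):
    """Tokens sent on a given turn when the full history is resent.

    Before turn k there are 2 * (k - 1) stored messages, plus the new user message.
    """
    return (2 * (turn - 1) + 1) * TOKENS_PER_MESSAGE

turns = 20
per_turn = [prompt_tokens_at_turn(k) for k in range(1, turns + 1)]
total_with_history = sum(per_turn)
total_without_history = turns * TOKENS_PER_MESSAGE

print(per_turn[0], per_turn[-1])  # 100 3900
print(total_with_history)         # 40000
print(total_without_history)      # 2000
```

The per-call prompt grows linearly with the number of turns, so the cumulative input tokens over the whole conversation grow roughly quadratically.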

### The Solution: Two Intelligent Approaches

1. **Message Trimming**: Keep only recent messages (fast but loses context)
2. **Message Summarization**: Compress old messages into a summary (preserves context, at the cost of an extra LLM call)



## Approach 1: Message Trimming with Middleware

The simplest approach: automatically keep only the last N messages. LangChain provides middleware that runs **before** the model is called.

In [6]:
# Import required modules for message trimming
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langchain.agents import AgentState
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from typing import Any

# Define a middleware function that trims messages
@before_model
def trim_messages_middleware(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """
    Keep only the last few messages to fit context window.
    This runs BEFORE each model call.
    """
    messages = state["messages"]
    
    # If we have 5 or fewer messages, no need to trim
    if len(messages) <= 5:
        return None  # No changes needed
    
    # Keep the first message (in this notebook, the opening user message)
    first_msg = messages[0]
    
    # Keep the last 4 messages (2 turns of conversation)
    recent_messages = messages[-4:]
    
    # Create new message list
    new_messages = [first_msg] + recent_messages
    
    # Return the trimmed messages
    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),  # Remove all existing messages
            *new_messages  # Add back only the messages we want to keep
        ]
    }

print("Trim middleware function created!")

Trim middleware function created!


In [7]:
from langgraph.checkpoint.memory import MemorySaver

trim_memory = MemorySaver()

agent_with_trimming = create_agent(
    model=llm,
    tools=[],
    middleware=[trim_messages_middleware],  # Add our trimming middleware
    checkpointer=trim_memory
)

print("Agent with message trimming created!")

Agent with message trimming created!


In [8]:
# Simulate a long conversation
trim_config = {"configurable": {"thread_id": "trim_test"}}

# Send multiple messages
messages_to_send = [
    "Hi! My name is Bipin.",
    "I live in Mumbai.",
    "I work as a software engineer.",
    "I love playing cricket.",
    "My favorite food is biryani.",
    "I have a dog named Max.",
    "What's my name?",  # This should still work
    "Where do I work?",  # This might be forgotten due to trimming
]

print("Sending messages to agent with trimming...\n")

for i, user_msg in enumerate(messages_to_send, 1):
    response = agent_with_trimming.invoke(
        {"messages": [{"role": "user", "content": user_msg}]},
        trim_config
    )
    bot_response = response["messages"][-1].content
    
    print(f"{i}. You: {user_msg}")
    print(f"   Bot: {bot_response}")
    print(f"   Messages in history: {len(response['messages'])}")
    print()

Sending messages to agent with trimming...

1. You: Hi! My name is Bipin.
   Bot: Hi Bipin! Nice to meet you. How can I help today? I can answer questions, explain concepts, help with writing or coding, plan things, practice problems, or brainstorm ideas. Tell me a bit about what you’d like to do or the goal you have in mind.
   Messages in history: 2

2. You: I live in Mumbai.
   Bot: Nice to meet you, Bipin! Since you’re in Mumbai, I can tailor tips and info to your city. What would you like to do today? Here are a few quick ideas:

- Plan a weekend in Mumbai: top places to visit and easy routes to beat traffic
- Weather updates and monsoon tips
- Quick Marathi or Hindi phrases for daily use
- Food and dining ideas by neighborhood
- Help with coding, math, or exam/practice problems
- Local tips for living or commuting in the city

Tell me what you’re aiming for, and I’ll tailor the help.
   Messages in history: 4

3. You: I work as a software engineer.
   Bot: Nice to know, Bipin. Si

In [10]:
response

{'messages': [HumanMessage(content='Hi! My name is Bipin.', additional_kwargs={}, response_metadata={}, id='3ac2fa1f-710b-48d1-a3eb-a490e3da2218'),
  AIMessage(content='Nice! Max sounds like a champ. Here are a few ways I can help with Max and your cricket routine:\n\n- Quick dog-care checklist\n  - Vaccinations, microchip, flea/tick prevention, regular grooming\n  - Feeding amounts by weight, water hygiene, and a simple weekly weight check\n  - Safe chew toys and crate/space for downtime\n\n- Training and enrichment for Max\n  - Basic commands: sit, stay, recall, leash manners\n  - Fun activities: fetch, scent games, short agility-style drills in a park or backyard\n  - 5–10 minute mini-sessions you can slot between nets or practice days\n\n- Fitness-friendly ideas for you both\n  - Short warm-ups with Max before or after cricket practice (e.g., a brisk 5-minute walk, a few quick sprints with a leash, some recall work)\n  - Gentle play sessions to burn energy on rest days (tug, fetch,

### Key Observations with Trimming

- The agent still remembers your name, because the middleware always keeps the first message ("Hi! My name is Bipin.")
- But it might forget details from the middle of the conversation, like where you work (trimmed messages)
- The message count stays bounded (doesn't grow indefinitely)
- **Cost savings**: Only the first message and the most recent messages are sent to the LLM each time
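
The bounded message count can be checked with a small simulation of the same rule (keep the first message plus the last four once the history exceeds five). This is a sketch on plain strings, assuming one user message and one reply per turn; `trim` is a hypothetical helper that mirrors the middleware's logic, not the middleware itself.

```python
def trim(messages, threshold=5, keep_last=4):
    """Same rule as the middleware: keep the first message plus the last `keep_last`."""
    if len(messages) <= threshold:
        return messages
    return [messages[0]] + messages[-keep_last:]

history = []
counts = []
for turn in range(1, 9):
    history.append(f"user-{turn}")
    history = trim(history)              # runs before the model call
    history.append(f"assistant-{turn}")  # stand-in for the model's reply
    counts.append(len(history))

print(counts)   # [2, 4, 6, 6, 6, 6, 6, 6]
print(history)  # ['user-1', 'assistant-6', 'user-7', 'assistant-7', 'user-8', 'assistant-8']
```

Under these assumptions the history never exceeds six messages, and everything between the first message and the last two turns is dropped.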

### Pros and Cons of Trimming

**Pros:**
- Fast and simple
- Prompt size stays bounded, so the cost per call is capped
- No extra LLM calls needed

**Cons:**
- Loses information from trimmed messages
- May forget important context from earlier in the conversation


## Approach 2: Message Summarization (Intelligent Compression)

Instead of deleting old messages, we can **summarize** them! This preserves important context while reducing token count.

In [12]:
# Import summarization middleware
from langchain.agents.middleware import SummarizationMiddleware

# Create an agent with automatic summarization
summary_memory = MemorySaver()

agent_with_summarization = create_agent(
    model=llm,
    tools=[],
    middleware=[
        SummarizationMiddleware(
            model=llm,
            trigger=("tokens", 500),
            keep=("messages", 4),
        ),
    ],
    checkpointer=summary_memory
)

print("Agent with summarization middleware created!")

Agent with summarization middleware created!


In [13]:
# Test summarization with a long conversation
summary_config = {"configurable": {"thread_id": "summary_test"}}

# Same messages as before
messages_to_send = [
    "Hi! My name is Bipin.",
    "I live in Mumbai.",
    "I work as a software engineer.",
    "I love playing cricket.",
    "My favorite food is biryani.",
    "I have a dog named Max.",
    "I also enjoy reading science fiction books.",
    "My favorite author is Isaac Asimov.",
    "What's my name?",
    "What do you know about me?",  # This should include summarized info!
]

print("Sending messages to agent with summarization...\n")

for i, user_msg in enumerate(messages_to_send, 1):
    response = agent_with_summarization.invoke(
        {"messages": [{"role": "user", "content": user_msg}]},
        summary_config
    )
    bot_response = response["messages"][-1].content
    
    print(f"{i}. You: {user_msg}")
    print(f"   Bot: {bot_response[:150]}...")  # Truncate for readability
    print()

Sending messages to agent with summarization...

1. You: Hi! My name is Bipin.
   Bot: Hi Bipin! Nice to meet you. How can I help today? 

You can ask me to explain a topic, solve a problem, draft or edit something (emails, essays, resum...

2. You: I live in Mumbai.
   Bot: Nice to meet you, Bipin! Cool to know you’re in Mumbai—that’s a vibrant place with lots of possibilities. I can tailor help to your location. What wou...

3. You: I work as a software engineer.
   Bot: Nice to meet you again, Bipin. Since you’re a software engineer in Mumbai, I can help with resume/LinkedIn optimization for local roles, interview pre...

4. You: I love playing cricket.
   Bot: Awesome! Cricket is such a Mumbai-thing. I can help you weave it nicely with your software career. Here are a few options you might want:

- Find loca...

5. You: My favorite food is biryani.
   Bot: Nice pair of favorites! Biryani and cricket fit nicely into a Mumbai-based software career story. Here are a few ready-to-use s

In [14]:
response

{'messages': [HumanMessage(content='Here is a summary of the conversation to date:\n\n- Bipin, Mumbai-based software engineer.\n\n- Goals: career guidance for the Mumbai market; resume and LinkedIn optimization for local roles; interview preparation; upskilling/learning plan; overall career strategy; salary benchmarks; networking plan.\n\n- Requests: resume/LinkedIn review; mock interview; stack-specific learning plan; networking plan; salary benchmarks.\n\n- Personal branding elements: Max the dog; cricket; reading science fiction.\n\n- Current science fiction detail: favorite author Isaac Asimov.\n\n- Tailoring preferences: customize to stack and years of experience; target Mumbai roles/companies; tone preference (formal vs casual).\n\n- Immediate next step: draft a full-stack–specific resume and LinkedIn version that mentions Max; need stack and years of experience to tailor.', additional_kwargs={}, response_metadata={}, id='59c521a1-7008-413c-a5b0-262c4543b67d'),
  AIMessage(conten

### How Summarization Works

When the token count exceeds the threshold (500 tokens):

1. The middleware takes old messages (except the last 4)
2. Sends them to the summarization model with a prompt asking for a summary of the conversation (this notebook passes the same `llm`, gpt-5-nano; a cheaper model could be used instead)
3. Replaces the old messages with a single summary message
4. Keeps recent messages intact for immediate context

**Result:** The agent remembers details from the beginning of the conversation (via summary) AND recent context!
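
The steps above can be sketched on plain strings. This is a simplified illustration, not the middleware's actual code: `count_tokens` approximates tokens by counting words, and `fake_summarize` is a hypothetical stand-in for the LLM summarization call.

```python
def count_tokens(messages):
    """Crude token estimate: whitespace-separated words (an assumption, not a real tokenizer)."""
    return sum(len(m.split()) for m in messages)

def fake_summarize(messages):
    """Hypothetical stand-in for the LLM summarization call."""
    return f"Summary of {len(messages)} earlier messages"

def compress(messages, max_tokens=500, keep=4):
    """If over budget, replace all but the last `keep` messages with one summary."""
    if count_tokens(messages) <= max_tokens or len(messages) <= keep:
        return messages
    old, recent = messages[:-keep], messages[-keep:]
    return [fake_summarize(old)] + recent

short_chat = ["hello there"] * 6
long_chat = ["word " * 100] * 10  # about 1000 "tokens"

print(len(compress(short_chat)))  # 6
print(compress(long_chat)[0])     # Summary of 6 earlier messages
print(len(compress(long_chat)))   # 5
```

The short conversation passes through unchanged, while the long one collapses to one summary plus the four most recent messages.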

---

## Comparing the Three Approaches

| Approach | Memory | Cost Efficiency | Context Retention | Best For |
|----------|--------|----------------|-------------------|----------|
| **No Management** | Unlimited growth | Poor (prompt grows with every turn) | Perfect | Short conversations |
| **Trimming** | Recent messages only | Excellent | Limited | Cost-sensitive, recent context matters |
| **Summarization** | Summary + recent | Good | Excellent | Long conversations, full context needed |