# Module 5: Persistence & Short-Term Memory in LangChain 1.0

**Building on Previous Modules:**
- Modules 1-3: Built agents, workflows, and tools
- Module 4: Connected to external data sources
- Module 5: **Make your agents remember conversations!**

**What you'll learn:**
- 🧠 Short-term memory concepts
- 💾 LangGraph persistence with checkpointing
- 💬 LangChain conversation memory
- ✂️ Message trimming strategies
- 📝 Message summarization
- 🎯 Production-ready memory patterns

**Real HR Use Case:**
Build an HR assistant that:
- Remembers conversation context
- Handles multi-turn conversations
- Manages long conversation histories
- Maintains state across interactions

**Time:** 2-3 hours

## Setup: Install Dependencies

In [None]:
# Install LangChain 1.0 and required packages
!pip install --pre -U langchain langchain-openai langgraph langchain-community
!pip install langgraph-checkpoint-sqlite  # For SQLite persistence

## Setup: Configure API Keys & Imports

In [None]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

# Common imports
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from typing import Annotated

print("✅ Setup complete!")

---
# Part 1: Understanding Memory Concepts 🧠

## Why Memory Matters

Without memory, every conversation is like meeting someone for the first time:

```
User: "My name is Priya"
Agent: "Hello! How can I help you?"

User: "What's my name?" 
Agent: "I don't know your name."  ❌ NO MEMORY
```

With memory:
```
User: "My name is Priya"
Agent: "Hello Priya! How can I help you?"

User: "What's my name?"
Agent: "Your name is Priya!"  ✅ HAS MEMORY
```

## Types of Memory

### 1. **Short-Term Memory** (This Module)
- Remembers current conversation
- Stored in thread/session
- Scoped to one thread; lost on restart unless the checkpointer persists it
- Example: "What did I just ask?"

### 2. **Long-Term Memory** (Future Module)
- Remembers across sessions
- Stored in database
- Persists permanently
- Example: "Remember my preferences"

---
# Part 2: LangGraph Persistence (Checkpointing) 💾

**Key Concept:** LangGraph saves "checkpoints" of the graph state at each step.

**Use Cases:**
- Resume interrupted workflows
- Time-travel through execution
- Human-in-the-loop approvals
- Conversation memory
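
To make the checkpoint idea concrete, here is a minimal plain-Python sketch. It is not LangGraph's implementation: `ToyCheckpointer` and `run_turn` are hypothetical names invented for illustration, and the "state" is just a dict holding a message list. It shows a snapshot being saved after each step, keyed by thread ID, so that a run can resume from the latest snapshot or inspect an earlier one.

```python
from collections import defaultdict
from copy import deepcopy


class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one snapshot list per thread."""

    def __init__(self):
        self._history = defaultdict(list)  # thread_id -> list of state snapshots

    def put(self, thread_id: str, state: dict) -> int:
        """Save a copy of the state and return its step index."""
        self._history[thread_id].append(deepcopy(state))
        return len(self._history[thread_id]) - 1

    def latest(self, thread_id: str) -> dict:
        """Return the most recent state, or an empty state for a new thread."""
        snapshots = self._history[thread_id]
        return deepcopy(snapshots[-1]) if snapshots else {"messages": []}

    def at_step(self, thread_id: str, step: int) -> dict:
        """Time travel: return the state as it was after a given step."""
        return deepcopy(self._history[thread_id][step])


def run_turn(saver: ToyCheckpointer, thread_id: str, user_text: str) -> dict:
    """Load the thread's state, append the new message, and checkpoint it."""
    state = saver.latest(thread_id)
    state["messages"].append(user_text)
    saver.put(thread_id, state)
    return state


saver = ToyCheckpointer()
run_turn(saver, "conversation_1", "How many leave days does employee 101 have?")
state = run_turn(saver, "conversation_1", "Who did I just ask about?")

print(len(state["messages"]))              # both turns are in this thread
print(saver.at_step("conversation_1", 0))  # state as of the first turn
print(saver.latest("conversation_2"))      # a different thread starts empty
```

The same three ideas (save per step, look up by thread, read an older snapshot) are what the labs below rely on when they pass a `thread_id` in the config.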

## Lab 1.1: Basic Checkpointing with InMemorySaver

In [None]:
from langgraph.checkpoint.memory import InMemorySaver

# Create HR assistant tools
@tool
def get_employee_leave(employee_id: Annotated[str, "Employee ID"]) -> str:
    """Get leave balance for an employee."""
    leave_data = {
        "101": "Priya Sharma: 12 days",
        "102": "Rahul Verma: 8 days",
        "103": "Anjali Patel: 15 days"
    }
    return leave_data.get(employee_id, "Employee not found")

# Create agent WITH memory
checkpointer = InMemorySaver()  # Stores checkpoints in memory

hr_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[get_employee_leave],
    checkpointer=checkpointer,  # Enable memory!
    prompt="You are an HR assistant. Be helpful and remember conversation context."
)

print("✅ HR Agent with memory created!")

### Test Memory with Thread IDs

In [None]:
# Thread ID identifies a conversation
config = {"configurable": {"thread_id": "conversation_1"}}

# First message
print("=" * 70)
print("Turn 1: Asking about Priya")
print("=" * 70)
result1 = hr_agent.invoke(
    {"messages": [{"role": "user", "content": "How many leave days does employee 101 have?"}]},
    config
)
print(result1['messages'][-1].content)

# Follow-up question - tests memory!
print("\n" + "=" * 70)
print("Turn 2: Follow-up question (tests memory!)")
print("=" * 70)
result2 = hr_agent.invoke(
    {"messages": [{"role": "user", "content": "Who did I just ask about?"}]},
    config  # Same thread_id!
)
print(result2['messages'][-1].content)
print("\n✅ Agent remembered the previous question!")

### Understanding Thread Isolation

In [None]:
# Different thread = Different conversation
config2 = {"configurable": {"thread_id": "conversation_2"}}

print("=" * 70)
print("New Thread: Different conversation")
print("=" * 70)
result3 = hr_agent.invoke(
    {"messages": [{"role": "user", "content": "Who did I just ask about?"}]},
    config2  # Different thread_id!
)
print(result3['messages'][-1].content)
print("\n✅ New thread has no memory of previous conversation!")

## Lab 1.2: Persistent Checkpointing with SQLite

**InMemorySaver** loses its data when the program restarts.  
**SqliteSaver** persists checkpoints to disk when it is given a file path!

In [None]:
from langgraph.checkpoint.sqlite import SqliteSaver

# Create a file-backed SQLite checkpointer so state survives restarts
# (":memory:" would keep the database in RAM and lose it on exit)
with SqliteSaver.from_conn_string("hr_checkpoints.db") as checkpointer:
    persistent_agent = create_agent(
        model="openai:gpt-4o-mini",
        tools=[get_employee_leave],
        checkpointer=checkpointer,
        prompt="You are an HR assistant with persistent memory."
    )
    
    config = {"configurable": {"thread_id": "persistent_conversation"}}
    
    # First interaction
    result = persistent_agent.invoke(
        {"messages": [{"role": "user", "content": "I'm employee 102, Rahul"}]},
        config
    )
    print("Turn 1:", result['messages'][-1].content)
    
    # Later interaction (same thread)
    result = persistent_agent.invoke(
        {"messages": [{"role": "user", "content": "What's my name?"}]},
        config
    )
    print("\nTurn 2:", result['messages'][-1].content)

print("\n✅ SQLite checkpointer preserves state!")
print("💡 In production, use PostgreSQL for scalability")
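
The difference between a file-backed database and an in-memory one can be checked with nothing but the standard library. The sketch below is an illustration only: it uses `sqlite3` directly with a made-up `threads` table and hypothetical helpers `save_messages` and `load_messages`, not the schema or API that `SqliteSaver` uses. Opening a fresh connection stands in for restarting the program.

```python
import json
import os
import sqlite3
import tempfile

CREATE_SQL = "CREATE TABLE IF NOT EXISTS threads (thread_id TEXT PRIMARY KEY, messages TEXT)"


def save_messages(db_path: str, thread_id: str, messages: list) -> None:
    """Write the message list for one thread, then close the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(CREATE_SQL)
        conn.execute(
            "INSERT OR REPLACE INTO threads VALUES (?, ?)",
            (thread_id, json.dumps(messages)),
        )
        conn.commit()
    finally:
        conn.close()


def load_messages(db_path: str, thread_id: str) -> list:
    """Open a new connection (simulating a restart) and read the thread back."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(CREATE_SQL)
        row = conn.execute(
            "SELECT messages FROM threads WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []
    finally:
        conn.close()


# File-backed database: data is still there after reconnecting
db_path = os.path.join(tempfile.mkdtemp(), "toy_checkpoints.db")
save_messages(db_path, "persistent_conversation", ["I'm employee 102, Rahul"])
restored = load_messages(db_path, "persistent_conversation")

# ":memory:" database: every connection gets its own empty database
save_messages(":memory:", "persistent_conversation", ["I'm employee 102, Rahul"])
lost = load_messages(":memory:", "persistent_conversation")

print("file-backed:", restored)
print("in-memory:  ", lost)
```

This is why a real file path (or a server database such as PostgreSQL) is needed when conversations must outlive the process.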

---
# Part 3: LangChain Short-Term Memory 💬

**Focus:** Managing conversation history in agents.

**Challenge:** LLMs have limited context windows!
- GPT-4 family: roughly 8K to 128K tokens, depending on the model
- Long conversations can exceed these limits
- Response quality and latency can degrade with very long context

**Solutions:**
1. Trim messages (remove old ones)
2. Summarize conversation
3. Delete specific messages
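
A quick way to see why this becomes a problem: the whole history is re-sent on every call, so the input size grows with each turn. The sketch below assumes a rough heuristic of about 4 characters per token (an approximation, not a real tokenizer) and uses hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def history_tokens(messages: list[str]) -> int:
    """Estimated tokens needed to send the full history."""
    return sum(estimate_tokens(m) for m in messages)


history = []
sent_per_turn = []
for turn in range(1, 6):
    history.append(f"User message {turn}: " + "x" * 400)
    # The full history, including this new message, goes to the model
    sent_per_turn.append(history_tokens(history))
    history.append(f"Assistant reply {turn}: " + "y" * 400)

for turn, tokens in enumerate(sent_per_turn, start=1):
    print(f"Turn {turn}: ~{tokens} input tokens")
```

Each turn costs more than the one before it, which is what trimming, summarizing, and deleting are meant to control.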

## Lab 2.1: Basic Conversation Memory

In [None]:
# Create simple HR agent with conversation memory
hr_chatbot = create_agent(
    model="openai:gpt-4o-mini",
    tools=[],
    checkpointer=InMemorySaver(),
    prompt="""You are a friendly HR assistant. 
    Help employees with their queries and remember the conversation context."""
)

config = {"configurable": {"thread_id": "hr_chat_1"}}

# Multi-turn conversation
def chat(message: str):
    result = hr_chatbot.invoke(
        {"messages": [{"role": "user", "content": message}]},
        config
    )
    print(f"User: {message}")
    print(f"Assistant: {result['messages'][-1].content}\n")

print("=" * 70)
print("HR Chatbot Conversation")
print("=" * 70 + "\n")

chat("Hi! I'm Priya from the Engineering department.")
chat("I want to apply for leave next week.")
chat("What documents do I need?")
chat("What's my name and department again?")  # Tests memory!

print("✅ Agent maintained context throughout conversation!")

## Lab 2.2: Message Trimming - Managing Long Conversations

**Problem:** Long conversations can exceed token limits.  
**Solution:** Trim older messages and keep the recent ones.

In [None]:
from langchain_core.messages.utils import trim_messages, count_tokens_approximately
from langchain_core.messages import BaseMessage
from langchain.agents import AgentState

def pre_model_hook(state: AgentState) -> dict[str, list[BaseMessage]]:
    """
    Trim messages before sending to LLM.
    Keeps only recent messages within token limit.
    """
    trimmed = trim_messages(
        state["messages"],
        strategy="last",  # Keep the most recent messages that fit within max_tokens
        token_counter=count_tokens_approximately,
        max_tokens=500,  # Keep only 500 tokens
        start_on="human",
        end_on=("human", "tool"),
    )
    return {"llm_input_messages": trimmed}

# Create agent with trimming
trimming_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[],
    pre_model_hook=pre_model_hook,  # Trim before each LLM call
    checkpointer=InMemorySaver(),
    prompt="You are an HR assistant."
)

config = {"configurable": {"thread_id": "trimming_test"}}

print("=" * 70)
print("Testing Message Trimming")
print("=" * 70 + "\n")

# Send many messages to fill context
for i in range(10):
    trimming_agent.invoke(
        {"messages": [{"role": "user", "content": f"Message number {i+1}"}]},
        config
    )
    print(f"Sent message {i+1}")

# Ask about an early message (forgotten only if the history has outgrown max_tokens)
result = trimming_agent.invoke(
    {"messages": [{"role": "user", "content": "What was my first message?"}]},
    config
)

print(f"\nAgent response: {result['messages'][-1].content}")
print(f"Messages kept in checkpoint: {len(result['messages'])}")
print("\n✅ Only the LLM input is trimmed; the checkpoint still holds the full history!")
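
The "keep the last messages that fit" strategy can be sketched in plain Python. This is a simplified illustration, not the logic inside `trim_messages`: messages are plain dicts, `estimate_tokens` is a hypothetical 4-characters-per-token heuristic, and `trim_last` is an invented name. It walks backwards from the newest message until the budget is used up, then drops leading messages until the window starts on a user turn.

```python
def estimate_tokens(message: dict) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def trim_last(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages that fit the budget, starting on a user turn."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break  # the next older message would exceed the budget
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # Do not start the window in the middle of an exchange
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


# Six messages of 40 characters each, so roughly 10 tokens per message
messages = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 40}
    for i in range(6)
]
trimmed = trim_last(messages, max_tokens=35)

print(len(messages), "->", len(trimmed), "messages")
print([m["role"] for m in trimmed])
```

With a 35-token budget only three messages fit, and the oldest of those is an assistant reply, so it is dropped too and the model sees one complete user/assistant exchange.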

## Lab 2.3: Message Deletion - Removing Specific Messages

**Use Case:** Remove sensitive information or unwanted messages.

In [None]:
from langchain_core.messages import RemoveMessage

def delete_old_messages(state: AgentState):
    """
    Delete oldest 2 messages if we have more than 4.
    Keeps conversation history manageable.
    """
    messages = state["messages"]
    if len(messages) > 4:
        # Remove the oldest 2 messages
        return {"messages": [RemoveMessage(id=m.id) for m in messages[:2]]}
    return {}

# Create agent with message deletion
deleting_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[],
    post_model_hook=delete_old_messages,  # Delete after each response
    checkpointer=InMemorySaver(),
    prompt="You are an HR assistant."
)

config = {"configurable": {"thread_id": "deletion_test"}}

print("=" * 70)
print("Testing Message Deletion")
print("=" * 70 + "\n")

# Send several messages
messages = [
    "My name is Priya",
    "I work in Engineering",
    "I need to check my leave balance",
    "Can you help me?"
]

for msg in messages:
    result = deleting_agent.invoke(
        {"messages": [{"role": "user", "content": msg}]},
        config
    )
    print(f"✓ Sent: {msg}")
    print(f"  Messages in state: {len(result['messages'])}")

print("\n✅ Old messages automatically deleted to keep history manageable!")
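
Deleting by ID can also target one specific message, which is closer to the sensitive-information use case than dropping the oldest two. The sketch below is a toy model of that idea in plain Python: `apply_updates` is a hypothetical reducer in which an update marked `"remove": True` deletes the message with the same ID and any other update is appended. It is meant to illustrate the concept behind `RemoveMessage`, not to reproduce LangGraph's reducer.

```python
def apply_updates(messages: list[dict], updates: list[dict]) -> list[dict]:
    """Toy reducer: removal markers delete by id, everything else is appended."""
    remove_ids = {u["id"] for u in updates if u.get("remove")}
    result = [m for m in messages if m["id"] not in remove_ids]
    result.extend(u for u in updates if not u.get("remove"))
    return result


history = [
    {"id": "m1", "role": "user", "content": "My name is Priya"},
    {"id": "m2", "role": "assistant", "content": "Hello Priya!"},
    {"id": "m3", "role": "user", "content": "My bank account number is 12345"},
    {"id": "m4", "role": "assistant", "content": "Noted."},
]

# Mark only the sensitive message for removal
removals = [
    {"id": m["id"], "remove": True}
    for m in history
    if "bank account" in m["content"]
]
cleaned = apply_updates(history, removals)

# A normal update is still appended as usual
extended = apply_updates(
    cleaned, [{"id": "m5", "role": "user", "content": "Can you help me?"}]
)

print([m["id"] for m in cleaned])
print([m["id"] for m in extended])
```

In a real agent, keep in mind that removing one half of a tool call and its result can leave the history in a shape the model provider rejects, so removals are usually done in matching pairs.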

## Lab 2.4: Message Summarization - Best of Both Worlds

**Problem:** Trimming loses information.  
**Solution:** Summarize old messages instead of deleting them!

In [None]:
# Note: This requires langmem package
# !pip install langmem

try:
    from langmem.short_term import SummarizationNode, RunningSummary
    from langchain.agents import AgentState
    
    # Custom state with context for summaries
    class StatefulAgentState(AgentState):
        context: dict[str, RunningSummary]
    
    # Create summarization node
    model = ChatOpenAI(model="gpt-4o-mini")
    summarizer = SummarizationNode(
        token_counter=count_tokens_approximately,
        model=model,
        max_tokens=500,
        max_summary_tokens=150,
        output_messages_key="llm_input_messages",
    )
    
    # Create agent with summarization
    summarizing_agent = create_agent(
        model=model,
        tools=[],
        pre_model_hook=summarizer,  # Summarize before LLM call
        state_schema=StatefulAgentState,
        checkpointer=InMemorySaver(),
    )
    
    config = {"configurable": {"thread_id": "summary_test"}}
    
    print("=" * 70)
    print("Testing Message Summarization")
    print("=" * 70 + "\n")
    
    # Long conversation
    summarizing_agent.invoke({"messages": "Hi, I'm Priya from Engineering department"}, config)
    summarizing_agent.invoke({"messages": "I joined the company in 2020"}, config)
    summarizing_agent.invoke({"messages": "I'm a Senior Developer working on AI projects"}, config)
    summarizing_agent.invoke({"messages": "I have 12 days of leave remaining this year"}, config)
    
    # Check if context is maintained
    result = summarizing_agent.invoke({"messages": "Tell me everything about me"}, config)
    
    print(f"Agent response: {result['messages'][-1].content}")
    
    if 'context' in result and 'running_summary' in result['context']:
        print(f"\nSummary maintained: {result['context']['running_summary'].summary}")
    
    print("\n✅ Older turns condensed into a running summary instead of being dropped!")
    
except ImportError:
    print("⚠️  langmem not installed. Run: pip install langmem")
    print("Summarization provides the best balance between context and token limits!")
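
The running-summary pattern can be sketched without an LLM. In the example below, `fake_summarize` is a deliberately crude placeholder for the model call (it keeps the first three words of each message), and `compact` is a hypothetical helper; neither is part of `langmem`. The point is the shape of the technique: fold older messages into the summary, keep the most recent ones verbatim, and accept that the summary is lossy.

```python
def fake_summarize(previous_summary: str, messages: list[str]) -> str:
    """Placeholder for an LLM call: keep the first three words of each message."""
    snippets = [" ".join(m.split()[:3]) for m in messages]
    parts = ([previous_summary] if previous_summary else []) + snippets
    return "; ".join(parts)


def compact(summary: str, messages: list[str], keep_last: int, summarize):
    """Fold everything except the newest `keep_last` messages into the summary."""
    if len(messages) <= keep_last:
        return summary, messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    return summarize(summary, old), recent


messages = [
    "Hi, I'm Priya from Engineering",
    "I joined the company in 2020",
    "I'm a Senior Developer",
    "I have 12 days of leave remaining",
]
summary, recent = compact("", messages, keep_last=2, summarize=fake_summarize)

print("Summary:", summary)
print("Recent: ", recent)
```

Here the name survives in the summary but the joining year does not, which is the trade-off to watch for with a real summarizer as well.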

## Lab 2.5: Accessing Memory in Tools

**Use Case:** Tools need access to conversation state.

In [None]:
from langchain.agents.tool_node import InjectedState
from langchain.agents import AgentState
from typing import Annotated

# Custom state with user info
class HRAgentState(AgentState):
    employee_id: str
    department: str

@tool
def check_my_leave(
    state: Annotated[HRAgentState, InjectedState]
) -> str:
    """Check leave balance for the current employee."""
    # Access state directly in tool!
    employee_id = state.get("employee_id", "unknown")
    department = state.get("department", "unknown")
    
    leave_data = {
        "101": 12,
        "102": 8,
        "103": 15
    }
    
    days = leave_data.get(employee_id, 0)
    return f"Employee {employee_id} from {department} has {days} days of leave."

# Create agent with state access
stateful_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[check_my_leave],
    state_schema=HRAgentState,
    checkpointer=InMemorySaver(),
)

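# Invoke with initial state (a thread_id is required because a checkpointer is attached)
config = {"configurable": {"thread_id": "stateful_test"}}
result = stateful_agent.invoke({
    "messages": "Check my leave balance",
    "employee_id": "101",
    "department": "Engineering"
}, config)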

print("=" * 70)
print("Tool with State Access")
print("=" * 70)
print(result['messages'][-1].content)
print("\n✅ Tool accessed employee state from conversation!")
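
Conceptually, state injection means the model only sees and fills the ordinary arguments, while the runtime supplies the state argument at call time. The plain-Python sketch below illustrates that split. The names `INJECTED`, `model_visible_params`, and `call_tool` are hypothetical, and the mechanism (matching a parameter called `state`) is a simplification; the real `InjectedState` works through type annotations.

```python
import inspect

LEAVE_DAYS = {"101": 12, "102": 8, "103": 15}
INJECTED = {"state"}  # parameter names the runtime fills in (assumed convention)


def check_my_leave(state: dict) -> str:
    """Tool that needs no model arguments, only the injected state."""
    employee_id = state.get("employee_id", "unknown")
    days = LEAVE_DAYS.get(employee_id, 0)
    return f"Employee {employee_id} has {days} days of leave."


def apply_leave(days: int) -> str:
    """Tool that takes a model-supplied argument and no state."""
    return f"Leave application for {days} days submitted."


def model_visible_params(func) -> list[str]:
    """Parameters the model is asked to fill: everything except injected ones."""
    return [
        name for name in inspect.signature(func).parameters
        if name not in INJECTED
    ]


def call_tool(func, model_args: dict, state: dict) -> str:
    """Merge model-supplied arguments with the injected state, then call."""
    kwargs = dict(model_args)
    if "state" in inspect.signature(func).parameters:
        kwargs["state"] = state
    return func(**kwargs)


state = {"employee_id": "101", "department": "Engineering"}
print(model_visible_params(check_my_leave))
print(call_tool(check_my_leave, {}, state))
print(call_tool(apply_leave, {"days": 5}, state))
```

Keeping the employee ID out of the model-visible arguments also means the model cannot be talked into looking up someone else's record by passing a different ID.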

## Lab 2.6: Production Pattern - Complete HR Assistant

**Combining all memory patterns for production use.**

In [None]:
from langchain.agents import AgentState
from langchain_core.messages import BaseMessage

# Define comprehensive state
# (AgentState is a TypedDict, so fields cannot carry default values)
class ProductionHRState(AgentState):
    employee_id: str
    employee_name: str
    department: str
    conversation_summary: str

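# Trim hook
from langchain_core.messages.utils import trim_messages

def production_trim_hook(state: ProductionHRState) -> dict:
    """Keep at most the last 10 messages to manage token usage."""
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=len,  # Count each message as one unit
        max_tokens=10,  # So the limit is 10 messages
        start_on="human",  # Avoid starting on an orphaned AI or tool message
    )
    return {"llm_input_messages": trimmed}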

# HR Tools
@tool
def get_employee_info(state: Annotated[ProductionHRState, InjectedState]) -> str:
    """Get current employee information from state."""
    name = state.get("employee_name", "Unknown")
    dept = state.get("department", "Unknown")
    emp_id = state.get("employee_id", "Unknown")
    return f"Employee: {name} (ID: {emp_id}) from {dept} department"

@tool
def apply_leave(days: Annotated[int, "Number of days"]) -> str:
    """Apply for leave."""
    return f"Leave application for {days} days submitted successfully!"

# Create production agent
production_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[get_employee_info, apply_leave],
    state_schema=ProductionHRState,
    pre_model_hook=production_trim_hook,
    checkpointer=InMemorySaver(),
    prompt="""You are a professional HR assistant.
    
    Capabilities:
    - Remember employee information throughout conversation
    - Help with leave applications
    - Maintain conversation context
    
    Always be professional and helpful."""
)

# Test production agent
config = {"configurable": {"thread_id": "production_session_1"}}

print("=" * 70)
print("Production HR Assistant")
print("=" * 70 + "\n")

# First interaction
result = production_agent.invoke({
    "messages": "Hi, I'm Priya Sharma, employee 101 from Engineering",
    "employee_id": "101",
    "employee_name": "Priya Sharma",
    "department": "Engineering"
}, config)
print(f"Turn 1: {result['messages'][-1].content}\n")

# Follow-up
result = production_agent.invoke(
    {"messages": "I want to apply for 5 days of leave"},
    config
)
print(f"Turn 2: {result['messages'][-1].content}\n")

# Context check
result = production_agent.invoke(
    {"messages": "What's my employee information?"},
    config
)
print(f"Turn 3: {result['messages'][-1].content}")

print("\n✅ Production agent with full memory capabilities!")

---
# Summary & Best Practices

## Memory Strategies Comparison

| Strategy | Pros | Cons | Use When |
|----------|------|------|----------|
| **Basic Memory** | Simple, complete history | Can exceed token limits as history grows | Short conversations |
| **Trim Messages** | Stays under limit | Loses old info | Long conversations, recent context matters |
| **Delete Messages** | Fine control | Manual management | Need to remove specific messages |
| **Summarization** | Retains key info, manages tokens | More complex, extra LLM call, summaries are lossy | Long conversations, need earlier context |

## Production Checklist

✅ **Use a persistent checkpointer** (SQLite/PostgreSQL, not InMemorySaver)  
✅ **Implement token management** (trim or summarize)  
✅ **Use thread IDs** for conversation isolation  
✅ **Add custom state** for application-specific data  
✅ **Inject state into tools** for context-aware operations  
✅ **Monitor token usage** in production  
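
For the thread ID item in particular, one possible convention is to derive the ID from the employee and a session label, so that two employees can never land in the same conversation. The helper below is a hypothetical sketch: `make_config` and the `hr:<employee>:<session>` format are assumptions for illustration, while the `{"configurable": {"thread_id": ...}}` shape is the one used throughout this module.

```python
def make_config(employee_id: str, session: str = "default") -> dict:
    """Build an invoke config whose thread_id is unique per employee and session."""
    if not employee_id:
        raise ValueError("employee_id is required to isolate conversations")
    return {"configurable": {"thread_id": f"hr:{employee_id}:{session}"}}


priya_config = make_config("101")
rahul_config = make_config("102")
priya_leave_config = make_config("101", session="leave-request")

print(priya_config)
print(rahul_config)
print(priya_leave_config)
```

The resulting dict can be passed as the second argument to `invoke`, exactly like the hand-written `config` values in the labs.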

## Key Takeaways

1. **LangGraph Persistence**
   - Checkpoints save state at each step
   - Thread IDs isolate conversations
   - Essential for stateful workflows

2. **LangChain Memory**
   - Manages conversation history
   - Multiple strategies for token management
   - Hooks allow customization

3. **Production Patterns**
   - Combine persistence + memory
   - Use database checkpointers
   - Implement token management
   - Add custom state for app data

## Next Steps

- Explore long-term memory (cross-session)
- Implement semantic memory (vector stores)
- Add memory to multi-agent systems
- Deploy with production databases

---

**Remember:** Good memory management is crucial for production agents!

# Exercises

## Exercise 1: Custom Trimming Strategy
Create a trimming function that keeps:
- System message (always)
- Last 3 user messages
- Last 3 assistant messages
- All tool call messages

## Exercise 2: Department-Specific Memory
Build an agent where state includes:
- employee_id
- department
- recent_requests (list)

Create tools that use this state.

## Exercise 3: Multi-User System
Create a system that manages conversations for multiple employees:
- Different thread_id per employee
- Shared knowledge base
- Employee-specific state

## Exercise 4: Smart Summarization
Implement a system that:
- Summarizes every 10 messages
- Keeps summaries in state
- Uses summaries when answering questions

## Bonus: Production Deployment
Deploy the production agent with:
- PostgreSQL checkpointer
- Token usage monitoring
- Multi-user support
- Error handling and recovery