# LangChain Memory: Conversation Memory Patterns

## Introduction

**Memory** lets LLM applications retain conversation history and maintain context across multiple interactions. It is essential for chatbots and other multi-turn conversations.

### What is Memory?

Memory enables:
- **Conversation continuity**: Remember what was said before
- **Context awareness**: Understand references ("it", "that", "the previous answer")
- **Personalization**: Remember user preferences
- **Multi-turn reasoning**: Build on previous exchanges
- **State management**: Track conversation state

### Memory vs Stateless

| Stateless (No Memory) | With Memory |
|----------------------|-------------|
| Each call independent | Remembers history |
| No context awareness | Contextual responses |
| Can't reference past | Can reference past |
| Simpler | More complex |
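
The difference comes down to what is sent to the model on each call. Here is a minimal sketch, assuming a hypothetical `fake_llm` stand-in (not a real model or a LangChain API) that simply reports how many messages it received:

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stand-in for a chat model: reports how many messages it saw."""
    return f"I can see {len(messages)} message(s)."


def stateless_call(user_input: str) -> str:
    """Each call sends only the current message."""
    return fake_llm([("human", user_input)])


class StatefulChat:
    """Keeps the transcript and resends it on every call."""

    def __init__(self) -> None:
        self.history: List[Message] = []

    def send(self, user_input: str) -> str:
        self.history.append(("human", user_input))
        reply = fake_llm(self.history)
        self.history.append(("ai", reply))
        return reply


chat = StatefulChat()
first = chat.send("My name is Alice")
second = chat.send("What's my name?")

print("Stateless:", stateless_call("What's my name?"))
print("Stateful: ", second)
```

The stateful version is more complex only because something has to own the transcript and decide how much of it to resend. The memory classes in the rest of this notebook handle that bookkeeping.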

### When to Use Memory?

| ✅ Use Memory For | ❌ Don't Use For |
|-------------------|------------------|
| Chatbots | Single Q&A |
| Multi-turn conversations | Independent requests |
| Context-dependent tasks | Stateless APIs |
| Personalized interactions | Batch processing |

---

## Installation & Setup

In [None]:
import os
from getpass import getpass

# Set API key
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API Key: ")

print("API key configured!")
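
The examples below import from several packages. As an optional check, this sketch reports which ones are missing. The distribution names in `REQUIRED` are assumptions inferred from the imports used in this notebook, and `missing_packages` is a hypothetical helper:

```python
from importlib import metadata
from typing import Iterable, List

# Distribution names assumed from the imports used in this notebook
REQUIRED = [
    "langchain",
    "langchain-core",
    "langchain-openai",
    "langchain-community",
    "tiktoken",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```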

---

## Example 1: No Memory (Baseline)

See the problem without memory:

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Simple chain without memory
prompt = ChatPromptTemplate.from_template("{input}")
llm = ChatOpenAI(model="gpt-4")
chain = prompt | llm | StrOutputParser()

# First message
response1 = chain.invoke({"input": "My name is Alice"})
print(f"User: My name is Alice")
print(f"AI: {response1}\n")

# Second message - no memory!
response2 = chain.invoke({"input": "What's my name?"})
print(f"User: What's my name?")
print(f"AI: {response2}")
print("\n❌ Problem: AI doesn't remember the conversation!")

---

## Example 2: ConversationBufferMemory (Legacy)

Store all messages in memory:

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Create memory
memory = ConversationBufferMemory(return_messages=True)

# Create prompt with history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

llm = ChatOpenAI(model="gpt-4")

# Helper function to manage memory
def chat(user_input: str) -> str:
    # Get history from memory
    history = memory.load_memory_variables({})["history"]
    
    # Create chain
    chain = prompt | llm | StrOutputParser()
    
    # Get response
    response = chain.invoke({"history": history, "input": user_input})
    
    # Save to memory
    memory.save_context({"input": user_input}, {"output": response})
    
    return response

# Conversation with memory
print("User: My name is Alice")
response1 = chat("My name is Alice")
print(f"AI: {response1}\n")

print("User: What's my name?")
response2 = chat("What's my name?")
print(f"AI: {response2}")
print("\n✅ Success: AI remembers the conversation!")

---

## Example 3: RunnableWithMessageHistory (Modern Approach)

The recommended way to add memory in 2025:

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

# Create chain
llm = ChatOpenAI(model="gpt-4")
chain = prompt | llm | StrOutputParser()

# Create message history store
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Wrap chain with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Configuration for session
config = {"configurable": {"session_id": "user_alice"}}

# Conversation
print("User: My name is Alice and I love Python")
response1 = chain_with_history.invoke(
    {"input": "My name is Alice and I love Python"},
    config=config
)
print(f"AI: {response1}\n")

print("User: What's my name?")
response2 = chain_with_history.invoke(
    {"input": "What's my name?"},
    config=config
)
print(f"AI: {response2}\n")

print("User: What programming language do I like?")
response3 = chain_with_history.invoke(
    {"input": "What programming language do I like?"},
    config=config
)
print(f"AI: {response3}")

---

## Example 4: Multiple Sessions

Manage separate conversations for different users:

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Setup (same as before)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# User 1 conversation
config_alice = {"configurable": {"session_id": "alice"}}
response = chain_with_history.invoke(
    {"input": "My favorite color is blue"},
    config=config_alice
)
print(f"Alice: My favorite color is blue")
print(f"AI: {response}\n")

# User 2 conversation (different session)
config_bob = {"configurable": {"session_id": "bob"}}
response = chain_with_history.invoke(
    {"input": "My favorite color is red"},
    config=config_bob
)
print(f"Bob: My favorite color is red")
print(f"AI: {response}\n")

# Check Alice's memory
response = chain_with_history.invoke(
    {"input": "What's my favorite color?"},
    config=config_alice
)
print(f"Alice: What's my favorite color?")
print(f"AI: {response}\n")

# Check Bob's memory
response = chain_with_history.invoke(
    {"input": "What's my favorite color?"},
    config=config_bob
)
print(f"Bob: What's my favorite color?")
print(f"AI: {response}")

---

## Example 5: ConversationBufferWindowMemory

Keep only the last `k` exchanges (each exchange is one user message plus one AI reply):

In [None]:
from langchain.memory import ConversationBufferWindowMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Keep only last 2 interactions (4 messages)
memory = ConversationBufferWindowMemory(
    k=2,  # Keep last 2 exchanges
    return_messages=True
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

llm = ChatOpenAI(model="gpt-4")

def chat(user_input: str) -> str:
    history = memory.load_memory_variables({})["history"]
    chain = prompt | llm | StrOutputParser()
    response = chain.invoke({"history": history, "input": user_input})
    memory.save_context({"input": user_input}, {"output": response})
    return response

# Multiple messages
messages = [
    "My name is Alice",
    "I live in New York",
    "I like Python programming",
    "What's my name?"  # Likely fails: with k=2, the name exchange is already outside the window
]

for msg in messages:
    response = chat(msg)
    print(f"User: {msg}")
    print(f"AI: {response}\n")

# The first message is outside the window by now (only the last 2 exchanges are sent)
response = chat("What did I tell you first?")
print(f"User: What did I tell you first?")
print(f"AI: {response}")
print("\n⚠️ May not remember early messages (outside window)")
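
To see exactly which exchanges survive a window of `k=2`, here is a plain-Python sketch. `WindowSketch` is a hypothetical stand-in built on `collections.deque`, not LangChain's implementation, and the AI replies are placeholders:

```python
from collections import deque
from typing import Deque, List, Tuple

Exchange = Tuple[str, str]  # (user message, AI reply)


class WindowSketch:
    """Hypothetical stand-in that keeps only the last k exchanges."""

    def __init__(self, k: int) -> None:
        self.exchanges: Deque[Exchange] = deque(maxlen=k)

    def save(self, user: str, ai: str) -> None:
        # When the deque is full, the oldest exchange is dropped automatically
        self.exchanges.append((user, ai))

    def visible_user_messages(self) -> List[str]:
        return [user for user, _ in self.exchanges]


window = WindowSketch(k=2)
for text in ["My name is Alice", "I live in New York", "I like Python programming"]:
    window.save(text, "(reply)")

# History the model would receive when the fourth message arrives
print(window.visible_user_messages())
```

After three exchanges only the last two remain, so the exchange that introduced the name is no longer part of the history when the fourth message is sent.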

---

## Example 6: ConversationSummaryMemory

Summarize conversation to save tokens:

In [None]:
from langchain.memory import ConversationSummaryMemory
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Create summary memory
llm = ChatOpenAI(model="gpt-4", temperature=0)
memory = ConversationSummaryMemory(
    llm=llm,
    return_messages=True
)

# Simulate conversation
memory.save_context(
    {"input": "Hi! My name is Alice and I'm a software engineer."},
    {"output": "Hello Alice! Nice to meet you. How can I help you today?"}
)

memory.save_context(
    {"input": "I work on Python backend services and I love design patterns."},
    {"output": "That's great! Design patterns are very useful in backend development."}
)

memory.save_context(
    {"input": "I'm particularly interested in the factory and observer patterns."},
    {"output": "Those are excellent choices for backend systems!"}
)

# Get summarized history
history = memory.load_memory_variables({})
print("Summarized conversation:")
print(history["history"])
print("\n✅ Summary preserves key facts while reducing tokens")

---

## Example 7: ConversationSummaryBufferMemory

Hybrid: Keep recent messages + summary of old ones:

In [None]:
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

# Keep recent messages + summarize old ones
llm = ChatOpenAI(model="gpt-4", temperature=0)
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=100,  # When to start summarizing
    return_messages=True
)

# Add many messages
conversations = [
    ({"input": "My name is Alice"}, {"output": "Nice to meet you, Alice!"}),
    ({"input": "I'm a Python developer"}, {"output": "That's great!"}),
    ({"input": "I work at a tech company"}, {"output": "Interesting!"}),
    ({"input": "I love building APIs"}, {"output": "APIs are fundamental!"}),
]

for inp, out in conversations:
    memory.save_context(inp, out)

# Check memory (should have summary + recent messages)
history = memory.load_memory_variables({})
print("Memory contents:")
print(history["history"])
print("\n✅ Balances detail (recent) with efficiency (summary)")
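
The hybrid idea can be sketched without an LLM. The version below is an illustration under loud assumptions: `estimate_tokens` counts whitespace-separated words rather than real tokens, and `join_summarizer` is a hypothetical placeholder that concatenates text where a real summarizer would call a model. It is not how `ConversationSummaryBufferMemory` is implemented.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


class SummaryBufferSketch:
    """Keeps recent messages verbatim and folds evicted ones into a summary."""

    def __init__(self, max_tokens: int, summarize: Callable[[str, List[str]], str]) -> None:
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.buffer: List[str] = []

    def add(self, message: str) -> None:
        self.buffer.append(message)
        evicted: List[str] = []
        # Evict the oldest messages until the buffer fits (always keep the newest)
        while len(self.buffer) > 1 and sum(estimate_tokens(m) for m in self.buffer) > self.max_tokens:
            evicted.append(self.buffer.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, evicted)


def join_summarizer(previous: str, evicted: List[str]) -> str:
    """Hypothetical summarizer: a real one would ask an LLM to condense the text."""
    return " | ".join(([previous] if previous else []) + evicted)


memory = SummaryBufferSketch(max_tokens=8, summarize=join_summarizer)
for msg in [
    "My name is Alice",
    "I'm a Python developer",
    "I work at a tech company",
    "I love building APIs",
]:
    memory.add(msg)

print("Summary:", memory.summary)
print("Recent: ", memory.buffer)
```

With the small budget used here, only the newest message stays verbatim and the three older ones end up in the summary.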

---

## Example 8: Persistent Memory with SQLite

Store conversation history in database:

In [None]:
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create prompt and chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

# Function to get SQL-backed message history
def get_session_history(session_id: str):
    return SQLChatMessageHistory(
        session_id=session_id,
        connection="sqlite:///chat_history.db"  # older langchain-community releases name this argument connection_string
    )

# Wrap with history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Conversation (persisted to SQLite)
config = {"configurable": {"session_id": "user_123"}}

response1 = chain_with_history.invoke(
    {"input": "My favorite food is pizza"},
    config=config
)
print(f"User: My favorite food is pizza")
print(f"AI: {response1}\n")

# Later conversation (history loaded from DB)
response2 = chain_with_history.invoke(
    {"input": "What's my favorite food?"},
    config=config
)
print(f"User: What's my favorite food?")
print(f"AI: {response2}")
print("\n✅ Conversation persisted to SQLite database")
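
To make "persisted" concrete, here is a sketch of session-keyed storage using only the standard-library `sqlite3` module. The table layout and the helper names (`open_history_db`, `add_message`, `load_messages`) are assumptions for illustration. They are not the schema or API that `SQLChatMessageHistory` uses.

```python
import sqlite3
from typing import List, Tuple


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one row per message."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_messages(conn: sqlite3.Connection, session_id: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for one session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_history_db()  # use a file path instead of ":memory:" to survive restarts
add_message(conn, "user_123", "human", "My favorite food is pizza")
add_message(conn, "user_123", "ai", "Noted!")
print(load_messages(conn, "user_123"))
```

Filtering by `session_id` on every read is what keeps one user's history out of another user's prompt.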

---

## Example 9: Custom Memory - Token Counting

Track token usage and trim when needed:

In [None]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
import tiktoken

class TokenLimitedMemory:
    """Memory that enforces token limit."""
    
    def __init__(self, max_tokens: int = 1000):
        self.messages = []
        self.max_tokens = max_tokens
        self.encoding = tiktoken.encoding_for_model("gpt-4")
    
    def count_tokens(self, messages):
        """Approximate token count: message content only, ignoring per-message overhead."""
        total = 0
        for msg in messages:
            total += len(self.encoding.encode(msg.content))
        return total
    
    def add_message(self, role: str, content: str):
        """Add message and trim if needed."""
        if role == "human":
            self.messages.append(HumanMessage(content=content))
        else:
            self.messages.append(AIMessage(content=content))
        
        # Trim old messages if over limit
        while self.count_tokens(self.messages) > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)
    
    def get_messages(self):
        return self.messages

# Use custom memory
memory = TokenLimitedMemory(max_tokens=200)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

def chat(user_input: str) -> str:
    history = memory.get_messages()
    response = chain.invoke({"history": history, "input": user_input})
    memory.add_message("human", user_input)
    memory.add_message("ai", response)
    return response

# Test
print(f"Max tokens: {memory.max_tokens}\n")

response = chat("Tell me a short story about a robot")
print(f"Tokens after message 1: {memory.count_tokens(memory.get_messages())}")

response = chat("Make it longer")
print(f"Tokens after message 2: {memory.count_tokens(memory.get_messages())}")

print(f"\nMessages in memory: {len(memory.get_messages())}")
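
Dropping messages one at a time can leave the history starting with an AI reply whose question is gone. One possible refinement is sketched below in plain Python. `trim_history` is a hypothetical helper, and `estimate_tokens` counts words as a crude stand-in for `tiktoken`.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content), where role is "human" or "ai"


def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word (not tiktoken)."""
    return len(text.split())


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Drop the oldest messages until the total fits, then drop a leading AI reply."""
    trimmed = list(messages)  # work on a copy so the caller's list is untouched
    while len(trimmed) > 1 and sum(estimate_tokens(c) for _, c in trimmed) > max_tokens:
        trimmed.pop(0)
    # Avoid starting the history with an orphaned AI reply
    while len(trimmed) > 1 and trimmed[0][0] == "ai":
        trimmed.pop(0)
    return trimmed


history = [
    ("human", "Tell me a short story about a robot"),
    ("ai", "Once upon a time a small robot learned to paint"),
    ("human", "Make it longer"),
    ("ai", "The robot painted every wall in the city"),
]

for role, content in trim_history(history, max_tokens=22):
    print(f"{role}: {content}")
```

With a budget of 22, removing the first human message alone would already fit, but the sketch also removes the reply that went with it so the remaining history begins with a human turn.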

---

## Example 10: Memory with Agents

Add memory to agents for context-aware tool use:

In [None]:
from langchain import hub
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create tool
@tool
def get_user_preference(preference_type: str) -> str:
    """Get user's preference (mock data)."""
    prefs = {
        "color": "blue",
        "food": "pizza",
        "language": "Python"
    }
    return prefs.get(preference_type.lower(), "Unknown")

tools = [get_user_preference]

# Create agent
prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

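# Add memory (one history per session_id, so sessions never share context)
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

agent_with_memory = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)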

config = {"configurable": {"session_id": "test"}}

# Conversation
print("User: What's my favorite color?")
result = agent_with_memory.invoke({"input": "What's my favorite color?"}, config=config)
print(f"AI: {result['output']}\n")

print("User: What about food?")
result = agent_with_memory.invoke({"input": "What about food?"}, config=config)
print(f"AI: {result['output']}")

---

## Memory Type Comparison

### ConversationBufferMemory
- **Stores**: All messages
- **Pros**: Complete context, simple
- **Cons**: Grows unbounded, expensive
- **Use for**: Short conversations

### ConversationBufferWindowMemory
- **Stores**: Last `k` exchanges (2 × `k` messages)
- **Pros**: Fixed size, predictable cost
- **Cons**: Loses old context
- **Use for**: Long conversations with recent context

### ConversationSummaryMemory
- **Stores**: Summary of all messages
- **Pros**: Compact size, retains key info
- **Cons**: Loses details, extra LLM calls
- **Use for**: Very long conversations

### ConversationSummaryBufferMemory
- **Stores**: Recent messages + summary of old
- **Pros**: Best of both worlds
- **Cons**: More complex
- **Use for**: Production chatbots
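
The cost difference between an unbounded buffer and a window can be estimated with simple arithmetic. The sketch below assumes every exchange costs the same number of tokens (100 is an illustrative figure, not a measurement) and counts only the history that is resent on each turn:

```python
from typing import Optional


def history_tokens(turn: int, tokens_per_exchange: int, k: Optional[int] = None) -> int:
    """Tokens of history sent on a given turn (1-based), assuming uniform exchanges.

    k=None models a full buffer; an integer k models a window of the last k exchanges.
    """
    previous_exchanges = turn - 1
    if k is not None:
        previous_exchanges = min(previous_exchanges, k)
    return previous_exchanges * tokens_per_exchange


def total_history_tokens(turns: int, tokens_per_exchange: int, k: Optional[int] = None) -> int:
    """Sum of history tokens sent across a whole conversation."""
    return sum(history_tokens(t, tokens_per_exchange, k) for t in range(1, turns + 1))


buffer_total = total_history_tokens(turns=50, tokens_per_exchange=100)
window_total = total_history_tokens(turns=50, tokens_per_exchange=100, k=5)

print(f"Full buffer:  {buffer_total} history tokens over 50 turns")
print(f"Window (k=5): {window_total} history tokens over 50 turns")
```

Under these assumptions the buffer total grows quadratically with the number of turns, while the window total grows linearly once the window is full. Summary-based memory trades some of that saving for extra LLM calls.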

---

## Best Practices

### ✅ Do

1. **Use RunnableWithMessageHistory** (modern approach)
2. **Set token limits** (prevent context overflow)
3. **Use session IDs** (separate user conversations)
4. **Persist to database** for production (SQLite, Redis, Postgres)
5. **Choose appropriate memory type** based on use case
6. **Monitor memory size** (track tokens/messages)
7. **Implement cleanup** (delete old sessions)

### ❌ Don't

1. **Don't use unbounded memory** (will hit context limits)
2. **Don't share sessions** across users (privacy issue)
3. **Don't store sensitive data** without encryption
4. **Don't ignore token costs** (memory increases costs)
5. **Don't forget to clear memory** when appropriate
6. **Don't use memory for stateless APIs** (unnecessary overhead)
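
Two of the practices above, per-user session IDs and cleanup of old sessions, can be sketched together. `SessionStore` is a hypothetical in-memory class for illustration; the explicit `now` parameter exists only to make the example deterministic.

```python
import time
from typing import Dict, List, Optional


class SessionStore:
    """Hypothetical in-memory store: one message list per session, with idle expiry."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle_seconds = max_idle_seconds
        self._messages: Dict[str, List[str]] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, session_id: str, now: Optional[float] = None) -> List[str]:
        """Return the session's message list, creating it on first use."""
        now = time.time() if now is None else now
        self._last_used[session_id] = now
        return self._messages.setdefault(session_id, [])

    def cleanup(self, now: Optional[float] = None) -> List[str]:
        """Delete sessions idle for longer than max_idle_seconds; return their IDs."""
        now = time.time() if now is None else now
        expired = [
            sid for sid, used in self._last_used.items()
            if now - used > self.max_idle_seconds
        ]
        for sid in expired:
            del self._messages[sid]
            del self._last_used[sid]
        return expired


store = SessionStore(max_idle_seconds=3600)
store.get("alice", now=0).append("My favorite color is blue")
store.get("bob", now=3000).append("My favorite color is red")

removed = store.cleanup(now=4000)
print("Removed sessions:", removed)
```

In a real deployment the same effect is usually delegated to the storage layer, for example an expiry setting on the backing store.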

---

## Production Memory Patterns

### Pattern 1: Redis for Distributed Systems

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_session_history(session_id: str):
    return RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379",
        ttl=3600  # Auto-expire after 1 hour
    )
```

### Pattern 2: PostgreSQL for Analytics

```python
from langchain_community.chat_message_histories import PostgresChatMessageHistory

def get_session_history(session_id: str):
    return PostgresChatMessageHistory(
        session_id=session_id,
        connection_string="postgresql://user:pass@localhost/dbname",
        table_name="chat_history"
    )
```

### Pattern 3: Hybrid Memory Strategy

```python
# Recent messages in Redis (fast)
# Older messages in Postgres (cheap)
# Summaries for very old conversations
```
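
A minimal sketch of the tiering logic, assuming plain dictionaries as stand-ins for Redis (hot tier) and Postgres (cold tier). `TieredHistory` and its methods are hypothetical names, and summarizing very old conversations is left out:

```python
from typing import Dict, List


class TieredHistory:
    """Hypothetical two-tier history: a small hot tier plus a cold archive."""

    def __init__(self, hot_limit: int) -> None:
        self.hot_limit = hot_limit
        self.hot: Dict[str, List[str]] = {}   # stand-in for Redis
        self.cold: Dict[str, List[str]] = {}  # stand-in for Postgres

    def add(self, session_id: str, message: str) -> None:
        hot = self.hot.setdefault(session_id, [])
        hot.append(message)
        # Spill the oldest messages into the cold tier once the hot tier is full
        while len(hot) > self.hot_limit:
            self.cold.setdefault(session_id, []).append(hot.pop(0))

    def recent(self, session_id: str) -> List[str]:
        """Messages to send with the next prompt (hot tier only)."""
        return list(self.hot.get(session_id, []))

    def full(self, session_id: str) -> List[str]:
        """Complete transcript, oldest first, for analytics or audits."""
        return self.cold.get(session_id, []) + self.hot.get(session_id, [])


history = TieredHistory(hot_limit=2)
for i in range(1, 5):
    history.add("user_123", f"message {i}")

print("Hot: ", history.recent("user_123"))
print("Cold:", history.cold["user_123"])
```

Only the hot tier is read on the request path, so prompt size stays bounded while the full transcript remains available.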

---

## Practice Exercises

In [None]:
# Exercise 1: Build a chatbot with memory that tracks user preferences
# Store: name, favorite_color, favorite_food
# Answer questions about stored preferences

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Your code here:
# Create a chatbot that remembers user preferences
# ...

In [None]:
# Exercise 2: Implement a custom memory that stores only Q&A pairs
# Ignore conversational fluff, keep only questions and answers

# Your code here:
# Create custom memory class
# ...

In [None]:
# Exercise 3: Build a multi-user chatbot with SQLite persistence
# Each user has separate conversation history
# History persists across sessions

# Your code here:
# Use SQLChatMessageHistory
# Test with multiple users
# ...

---

## Key Takeaways

### ✅ What We Learned

1. **Memory Types**: Buffer, Window, Summary, SummaryBuffer
2. **Modern Approach**: RunnableWithMessageHistory (recommended)
3. **Session Management**: Separate conversations per user
4. **Persistence**: SQLite, Redis, PostgreSQL
5. **Token Management**: Monitor and limit memory size
6. **Custom Memory**: Build domain-specific memory logic
7. **Agent Memory**: Add context to tool-using agents
8. **Production Patterns**: Distributed, persistent, hybrid strategies

### 📊 Memory Selection Guide

| Use Case | Memory Type | Storage |
|----------|-------------|--------|
| Short chat (<10 messages) | ConversationBuffer | In-memory |
| Medium chat (10-50 messages) | ConversationBufferWindow | In-memory/Redis |
| Long chat (50+ messages) | ConversationSummaryBuffer | Redis/Postgres |
| Production chatbot | ConversationSummaryBuffer | Redis + Postgres |
| Analytics required | ConversationBuffer | Postgres |

### 📚 Next Steps

- Combine memory with RAG for document-aware conversations
- Implement memory pruning and cleanup strategies
- Build multi-agent systems with shared memory
- Monitor memory performance and costs

---

## Resources

- [Memory Documentation](https://python.langchain.com/docs/modules/memory/)
- [Chat Message Histories](https://python.langchain.com/docs/integrations/memory/)
- [RunnableWithMessageHistory](https://python.langchain.com/docs/expression_language/how_to/message_history)
- [Memory Types](https://python.langchain.com/docs/modules/memory/types/)

---

**Congratulations!** You've completed the LangChain tutorial series. You now know:
- ✅ LangChain basics and models
- ✅ Prompt engineering
- ✅ LCEL for chain composition
- ✅ RAG for document Q&A
- ✅ Agents for tool-using systems
- ✅ Memory for conversational AI

**Next**: Build production LLM applications! 🚀