# 🎯 Introduction

LangChain memory systems provide different strategies for managing conversation history in AI applications. Each memory type addresses specific challenges related to context retention, memory efficiency, and conversation management.

# 📋 Memory Architecture Overview

## Core Concepts

- **Memory Buffer**: Storage mechanism for conversation messages  
- **Message History**: Sequence of user and AI interactions  
- **Memory Variables**: Processed memory content for prompt templates  
- **Context Window**: Active memory content sent to language models  

## Memory System Layers

- **Storage Layer**: Where messages are physically stored  
- **Strategy Layer**: How messages are retrieved and managed  
- **Processing Layer**: How memory is formatted for consumption  
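
To make the three layers concrete, here is a minimal pure-Python sketch. The names `MessageStore`, `keep_all`, `format_history`, and `load_history` are hypothetical and are not part of LangChain. The `Human:` / `AI:` prefixes mirror the string format that LangChain prints in the examples below.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content), e.g. ("human", "Hi")


@dataclass
class MessageStore:
    """Storage layer: keeps every message in chronological order."""
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))


def keep_all(messages: List[Message]) -> List[Message]:
    """Strategy layer: a buffer-style strategy that selects every message."""
    return list(messages)


def format_history(messages: List[Message]) -> str:
    """Processing layer: render the selected messages as one prompt string."""
    prefixes = {"human": "Human", "ai": "AI"}
    return "\n".join(f"{prefixes[role]}: {content}" for role, content in messages)


def load_history(store: MessageStore,
                 strategy: Callable[[List[Message]], List[Message]]) -> str:
    # Storage -> strategy -> processing
    return format_history(strategy(store.messages))


store = MessageStore()
store.add("human", "Hi, I'm Sourav from Databricks")
store.add("ai", "Hello Sourav! Nice to meet you.")
history = load_history(store, keep_all)
print(history)
```

Swapping `keep_all` for a different strategy function changes which messages reach the prompt without touching storage or formatting, which is the separation the layer list describes.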

# 🧠 ConversationBufferMemory

## Theory

`ConversationBufferMemory` implements a simple linear storage approach where all conversation messages are retained in chronological order. This memory type prioritizes complete context preservation over memory efficiency.

### Key Principles:

- **Complete Retention**: Every message is preserved  
- **Linear Access**: Messages stored in chronological sequence  
- **No Filtering**: All messages available for retrieval  
- **Unbounded Growth**: Memory size increases with conversation length  

## Use Cases

- Short to medium-length conversations  
- Applications requiring complete conversation history  
- Debugging and conversation analysis  
- Prototyping and development environments  

## Implementation

```python
from langchain.memory import ConversationBufferMemory

# Create memory instance
memory = ConversationBufferMemory()

# Add messages manually
memory.chat_memory.add_user_message("Hi, I'm Sourav from Databricks")
memory.chat_memory.add_ai_message("Hello Sourav! Nice to meet you.")
memory.chat_memory.add_user_message("I work with Apache Spark")
memory.chat_memory.add_ai_message("That's great! Spark is powerful for big data.")
memory.chat_memory.add_user_message("What's my name?")

# Check what's stored
print("=== All Messages in Buffer ===")
for i, message in enumerate(memory.chat_memory.messages):
    print(f"{i+1}. {message.type}: {message.content}")

print(f"\nTotal messages: {len(memory.chat_memory.messages)}")

# Get memory as variables (what would be passed to prompt)
memory_vars = memory.load_memory_variables({})
print(f"\nMemory as string:\n{memory_vars['history']}")

# Get buffer content directly
print(f"\nDirect buffer access:\n{memory.buffer}")
```


```
=== All Messages in Buffer ===
1. human: Hi, I'm Sourav from Databricks
2. ai: Hello Sourav! Nice to meet you.
3. human: I work with Apache Spark
4. ai: That's great! Spark is powerful for big data.
5. human: What's my name?

Total messages: 5

Memory as string:
Human: Hi, I'm Sourav from Databricks
AI: Hello Sourav! Nice to meet you.
Human: I work with Apache Spark
AI: That's great! Spark is powerful for big data.
Human: What's my name?

Direct buffer access:
Human: Hi, I'm Sourav from Databricks
AI: Hello Sourav! Nice to meet you.
Human: I work with Apache Spark
AI: That's great! Spark is powerful for big data.
Human: What's my name?
```




## ✅ Advantages & ❌ Limitations

### Advantages:

- ✅ **Complete Context**: No information loss  
- ✅ **Simple Implementation**: Straightforward to use and understand  
- ✅ **Full History Access**: All messages available for analysis  
- ✅ **Deterministic Behavior**: Predictable memory retrieval  

### Limitations:

- ❌ **Memory Growth**: Unbounded memory consumption  
- ❌ **Performance Degradation**: Slower with long conversations  
- ❌ **Token Limitations**: May exceed LLM context limits  
- ❌ **Cost Implications**: More tokens sent to LLM  
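
The token and cost limitations compound. Because every call resends the whole history, the prompt for call *n* carries all *n* messages, and the total tokens sent over a conversation grow as n(n+1)/2 times the message size, i.e. quadratically. The arithmetic below is a sketch under two hypothetical simplifications: every message is 50 tokens, and one call is made per message.

```python
from typing import List


def buffer_prompt_sizes(num_messages: int, tokens_per_message: int) -> List[int]:
    """Prompt size (in tokens) of each call when the full buffer is sent.

    Assumes a uniform token count per message and one call after each new
    message, sending every message seen so far.
    """
    return [i * tokens_per_message for i in range(1, num_messages + 1)]


sizes = buffer_prompt_sizes(num_messages=100, tokens_per_message=50)
print(sizes[-1])   # 5000: the last prompt carries all 100 messages
print(sum(sizes))  # 252500: total tokens sent, 50 * (100 * 101 / 2)
```

Doubling the conversation length roughly quadruples the total tokens sent, which is what makes long unbounded buffers expensive even before the context limit is reached.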

# 🪟 ConversationBufferWindowMemory

## Theory

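`ConversationBufferWindowMemory` implements a sliding-window approach: when history is loaded, only the most recent **k** exchanges are returned, where one exchange is a user message plus the AI reply (so up to 2k messages). The full history is still stored in `chat_memory`; the window only limits what is sent to the model. This balances context preservation with a bounded prompt size.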

### Key Principles:

- **Fixed Window Size**: Returns at most the last k exchanges (2k messages)  
- **Recency Bias**: Prioritizes recent interactions  
- **Automatic Exclusion**: Older messages drop out of the loaded history (they remain in `chat_memory`)  
- **Bounded Context**: Predictable prompt size per call  

### Mathematical Model:

> Let `M = [m₁, m₂, ..., mₙ]` represent the message sequence, where `n` is the total number of messages.  
> With a window of **k** exchanges, the loaded history is the last `2k` messages:  
> `Window(M, k) = [mⱼ, ..., mₙ]`, where `j = max(1, n - 2k + 1)`, and the window is empty when `k = 0`.
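
The window rule can be sketched as a plain slice. This is a hypothetical helper named `window`, not LangChain code, but it matches the behavior shown in the example below, where `k=4` returns the last eight of nine stored messages. The explicit `k <= 0` guard matters because `messages[-0:]` in Python returns the whole list rather than an empty one.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def window(messages: List[Message], k: int) -> List[Message]:
    """Return the last k exchanges (2*k messages), or fewer if fewer exist."""
    if k <= 0:
        return []  # guard: messages[-0:] would return everything
    return messages[-2 * k:]


msgs = [("human", "Message 1"), ("ai", "Response 1"),
        ("human", "Message 2"), ("ai", "Response 2"),
        ("human", "Message 3"), ("ai", "Response 3"),
        ("human", "Message 4"), ("ai", "Response 4"),
        ("human", "Message 5")]

visible = window(msgs, k=4)
print(len(visible), visible[0])  # 8 ('ai', 'Response 1')
```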

## 📌 Use Cases

- Long-running conversations  
- Resource-constrained environments  
- Applications where recent context is most important  
- Customer support chatbots  
- Real-time interactive systems  

## Implementation

```python
from langchain.memory import ConversationBufferWindowMemory

# k=4 keeps the last 4 exchanges (8 messages) in the loaded history
memory = ConversationBufferWindowMemory(k=4)

# Add many messages
messages = [
    ("user", "Message 1: Hi, I'm Sourav"),
    ("ai", "Response 1: Hello Sourav!"),
    ("user", "Message 2: I work at Databricks"),
    ("ai", "Response 2: Great company!"),
    ("user", "Message 3: I use Apache Spark"),
    ("ai", "Response 3: Spark is powerful!"),
    ("user", "Message 4: I optimize pipelines"),
    ("ai", "Response 4: Performance is key!"),
    ("user", "Message 5: What's my name?"),  # 9th message: pushes Message 1 out of the window
]

for msg_type, content in messages:
    if msg_type == "user":
        memory.chat_memory.add_user_message(content)
    else:
        memory.chat_memory.add_ai_message(content)

print("=== Window Memory (k=4) ===")
print(f"Window size (k): {memory.k}")
print(f"Messages in Total Memory: {len(memory.chat_memory.messages)}")

# chat_memory still stores every message; the window applies only when loading
for i, message in enumerate(memory.chat_memory.messages):
    print(f"{i+1}. {message.type}: {message.content}")

# Get memory variables: only the last 2*k = 8 messages are returned
memory_vars = memory.load_memory_variables({})
print(f"\nMemory output (only window):\n{memory_vars['history']}")

# Add more messages: storage keeps growing, the loaded window does not
print("\n=== Adding More Messages ===")
memory.chat_memory.add_user_message("Message 6: Do you remember Message 1?")
memory.chat_memory.add_ai_message("Response 6: I only remember recent messages")

print(f"Messages after adding more: {len(memory.chat_memory.messages)}")
for i, message in enumerate(memory.chat_memory.messages):
    print(f"{i+1}. {message.type}: {message.content}")
```


```
=== Window Memory (k=4) ===
Window size (k): 4
Messages in Total Memory: 9
1. human: Message 1: Hi, I'm Sourav
2. ai: Response 1: Hello Sourav!
3. human: Message 2: I work at Databricks
4. ai: Response 2: Great company!
5. human: Message 3: I use Apache Spark
6. ai: Response 3: Spark is powerful!
7. human: Message 4: I optimize pipelines
8. ai: Response 4: Performance is key!
9. human: Message 5: What's my name?

Memory output (only window):
AI: Response 1: Hello Sourav!
Human: Message 2: I work at Databricks
AI: Response 2: Great company!
Human: Message 3: I use Apache Spark
AI: Response 3: Spark is powerful!
Human: Message 4: I optimize pipelines
AI: Response 4: Performance is key!
Human: Message 5: What's my name?

=== Adding More Messages ===
Messages after adding more: 11
1. human: Message 1: Hi, I'm Sourav
2. ai: Response 1: Hello Sourav!
3. human: Message 2: I work at Databricks
4. ai: Response 2: Great company!
5. human: Message 3: I use Apache Spark
6. ai: Response 3: Spark is powerful!
7. human: Message 4: I optimize pipelines
8. ai: Response 4: Performance is key!
9. human: Message 5: What's my name?
10. human: Message 6: Do you remember Message 1?
11. ai: Response 6: I only remember recent messages
```



## 📏 Window Size Selection

### Factors to Consider:

- **Conversation Complexity**: More complex topics need larger windows  
- **Memory Constraints**: Available system memory  
- **Response Quality**: Balance between context and focus  
- **Performance Requirements**: Larger windows = more processing  

### Recommended Window Sizes:

- **Small (2–4 messages, k = 1–2)**: Simple Q&A, quick interactions  
- **Medium (6–10 messages, k = 3–5)**: Standard conversations, customer support  
- **Large (12–20 messages, k = 6–10)**: Complex discussions, technical support  
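
Because `k` counts exchanges, a message budget of M corresponds to `k = M // 2`. One hedged way to pick `k` is to start from a token budget for history. The helper `k_for_budget` below is hypothetical and assumes every message has roughly the same token count; the budget numbers are illustrative only.

```python
def k_for_budget(history_token_budget: int, avg_tokens_per_message: int) -> int:
    """Largest k whose 2*k messages fit the budget, assuming uniform message size."""
    if avg_tokens_per_message <= 0:
        raise ValueError("avg_tokens_per_message must be positive")
    # Each exchange costs two messages (user + AI)
    return history_token_budget // (2 * avg_tokens_per_message)


print(k_for_budget(2000, 100))  # 10 -> 20 messages, the top of the "Large" range
print(k_for_budget(500, 60))    # 4 -> 8 messages
```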

---

## ✅ Advantages & ❌ Limitations

### Advantages:

- ✅ **Bounded Context**: Predictable prompt size (the underlying `chat_memory` still grows)  
- ✅ **Recent Focus**: Maintains relevant recent context  
- ✅ **Performance**: Consistent performance regardless of conversation length  
- ✅ **Automatic Management**: No manual cleanup required  

### Limitations:

- ❌ **Context Loss**: Early conversation information lost  
- ❌ **Fixed Strategy**: Cannot adapt window size dynamically  
- ❌ **Important Information Loss**: Critical early context may be evicted  
- ❌ **Reference Failures**: Cannot reference old information  

# 📝 ConversationSummaryMemory

## Theory

`ConversationSummaryMemory` implements a compression strategy that maintains conversation context through summarization rather than raw message storage. It uses an LLM to distill the essence of the conversation into a running summary while preserving important information.

### Key Principles:

- **Lossy Compression**: Reduces memory footprint through summarization  
- **Context Preservation**: Maintains semantic meaning  
- **Compact Length**: Summary size grows far more slowly than the raw message history  
- **LLM-Based Processing**: An LLM writes and updates the summary  
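
The summary is built progressively: each update takes the current summary plus the newest exchange and returns a new summary, which is the pattern LangChain's summary prompt describes. The sketch below assumes nothing about LangChain internals; `progressive_summary` is a hypothetical helper, and `toy_summarizer` is a stand-in for the LLM call so the example runs offline.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]                  # (user message, AI message)
Summarizer = Callable[[str, str], str]  # (current summary, new lines) -> new summary


def progressive_summary(turns: List[Turn], summarize: Summarizer) -> str:
    summary = ""
    for user_msg, ai_msg in turns:
        new_lines = f"Human: {user_msg}\nAI: {ai_msg}"
        # Only the previous summary and the newest exchange are passed in,
        # so each call's input stays small no matter how long the chat gets.
        summary = summarize(summary, new_lines)
    return summary


def toy_summarizer(summary: str, new_lines: str) -> str:
    """Stand-in for an LLM: keeps only the human lines, joined by '; '."""
    human = [line[len("Human: "):] for line in new_lines.splitlines()
             if line.startswith("Human: ")]
    return "; ".join(filter(None, [summary, *human]))


turns = [("I'm Sourav from Databricks", "Hello Sourav!"),
         ("I work on Spark optimization", "Great!")]
result = progressive_summary(turns, toy_summarizer)
print(result)  # I'm Sourav from Databricks; I work on Spark optimization
```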

## 📊 Information Theory Perspective

- **Raw Messages**: High redundancy, complete information  
- **Summary**: Low redundancy, essential information  
- **Compression Ratio**: `Original_Size / Summary_Size`  
- **Information Loss**: Details sacrificed for efficiency  
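
The compression ratio can be worked out for the eight-message conversation used in the implementation below. The summary here is hand-written for illustration (it is NOT real LLM output), and size is measured in words as a rough proxy; real systems would count tokens.

```python
# The eight messages from the example conversation in this section
transcript_lines = [
    "Human: Hi, I'm Sourav Banerjee from Databricks",
    "AI: Hello Sourav! Nice to meet you. How can I help?",
    "Human: I work on Apache Spark optimization and big data pipelines",
    "AI: That's fascinating! Spark optimization is crucial for performance.",
    "Human: I also work with Delta Lake and MLflow",
    "AI: Great tools! Delta Lake provides ACID transactions for data lakes.",
    "Human: I'm particularly interested in performance tuning",
    "AI: Performance tuning involves query optimization and resource management.",
]
transcript = "\n".join(transcript_lines)

# Hand-written illustrative summary (hypothetical, not produced by an LLM)
summary = ("Sourav Banerjee (Databricks) works on Spark optimization, big data "
           "pipelines, Delta Lake and MLflow, and is interested in performance tuning.")


def size_in_words(text: str) -> int:
    # Word count as a rough size proxy
    return len(text.split())


ratio = size_in_words(transcript) / size_in_words(summary)
print(f"Compression ratio (Original_Size / Summary_Size): {ratio:.1f}x")

# Information loss: a detail present in the transcript but dropped by the summary
print("ACID" in transcript, "ACID" in summary)  # True False
```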

---

## 📌 Use Cases

- Very long conversations  
- Memory-constrained environments  
- Applications requiring historical context without details  
- Knowledge-intensive conversations  
- Multi-session continuity  

---

## 🛠️ Implementation

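> Uses an LLM to update the summary as the conversation progresses. In LangChain's `ConversationSummaryMemory`, every `save_context()` call sends the current summary plus the newest exchange to the LLM and stores the returned summary in `memory.buffer`.  
> More generally, summarization can be triggered based on message count, time, or interaction type.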
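
As a sketch of the message-count trigger mentioned above, the hypothetical `BatchedSummaryMemory` class below calls its summarizer once every `every_n` messages instead of on every turn, keeping not-yet-summarized messages verbatim. It is not a LangChain class, and `count_summarizer` is a stand-in for an LLM so the example runs offline.

```python
from typing import Callable, List

# (previous summary, batch of new messages) -> new summary
Summarizer = Callable[[str, List[str]], str]


class BatchedSummaryMemory:
    """Hypothetical memory that calls the summarizer once every `every_n` messages."""

    def __init__(self, summarize: Summarizer, every_n: int = 4) -> None:
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.summarize = summarize
        self.every_n = every_n
        self.summary = ""
        self.pending: List[str] = []  # messages not yet folded into the summary
        self.summarizer_calls = 0

    def add(self, message: str) -> None:
        self.pending.append(message)
        # Count-based trigger: fold the batch into the summary once it is full
        if len(self.pending) >= self.every_n:
            self.summary = self.summarize(self.summary, self.pending)
            self.pending = []
            self.summarizer_calls += 1

    def context(self) -> str:
        # Older messages appear only via the summary; recent ones verbatim
        return "\n".join(filter(None, [self.summary, *self.pending]))


def count_summarizer(summary: str, batch: List[str]) -> str:
    """Stand-in for an LLM: records only how many messages each batch held."""
    return f"{summary} [{len(batch)} earlier messages]".strip()


batched = BatchedSummaryMemory(count_summarizer, every_n=4)
for i in range(1, 11):
    batched.add(f"Message {i}")
print(batched.summarizer_calls)  # 2
print(batched.context())
```

Batching trades freshness for cost: the LLM runs a quarter as often here, but the last few messages stay unsummarized until the next batch fills.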

```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

# NOTE: ConversationSummaryMemory needs an LLM ONLY for summarization.
# The memory functionality itself is separate from conversation chains.
llm_for_summary = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create summary memory (return_messages=True returns the summary as a SystemMessage)
memory = ConversationSummaryMemory(llm=llm_for_summary, return_messages=True)

# Conversation to summarize, as alternating user/AI messages
conversation_history = [
    ("user", "Hi, I'm Sourav Banerjee from Databricks"),
    ("ai", "Hello Sourav! Nice to meet you. How can I help?"),
    ("user", "I work on Apache Spark optimization and big data pipelines"),
    ("ai", "That's fascinating! Spark optimization is crucial for performance."),
    ("user", "I also work with Delta Lake and MLflow"),
    ("ai", "Great tools! Delta Lake provides ACID transactions for data lakes."),
    ("user", "I'm particularly interested in performance tuning"),
    ("ai", "Performance tuning involves query optimization and resource management."),
]

# save_context() stores each user/AI pair in chat_memory AND asks the LLM to
# fold that exchange into the running summary. Calling
# chat_memory.add_user_message()/add_ai_message() directly would skip summarization.
for (_, user_msg), (_, ai_msg) in zip(conversation_history[0::2], conversation_history[1::2]):
    memory.save_context({"input": user_msg}, {"output": ai_msg})

print("=== Original Messages ===")
for i, message in enumerate(memory.chat_memory.messages):
    print(f"{i+1}. {message.type}: {message.content}")

print(f"\nTotal original messages: {len(memory.chat_memory.messages)}")

# Load the summary (no LLM call here; it was built during save_context)
memory_vars = memory.load_memory_variables({})
summary = memory_vars["history"]  # [SystemMessage(content=<summary>)]

print("\n=== Generated Summary ===")
print(f"Summary: {summary[0].content}")

# The raw summary string is stored in memory.buffer
print("\n=== Buffer Content ===")
print(f"Buffer: {memory.buffer}")

# Add a new exchange; save_context() updates the summary again
print("\n=== Adding New Messages ===")
memory.save_context(
    {"input": "What did we discuss about my work?"},
    {"output": "We discussed your work with Spark optimization at Databricks."},
)

# Only the updated summary is returned (raw messages are not included)
updated_memory = memory.load_memory_variables({})
print(f"Updated memory:\n{updated_memory['history'][0].content}")
```

```
=== Original Messages ===
1. human: Hi, I'm Sourav Banerjee from Databricks
2. ai: Hello Sourav! Nice to meet you. How can I help?
3. human: I work on Apache Spark optimization and big data pipelines
4. ai: That's fascinating! Spark optimization is crucial for performance.
5. human: I also work with Delta Lake and MLflow
6. ai: Great tools! Delta Lake provides ACID transactions for data lakes.
7. human: I'm particularly interested in performance tuning
8. ai: Performance tuning involves query optimization and resource management.

Total original messages: 8
```

The `Generated Summary`, `Buffer Content`, and `Updated memory` sections print LLM-written text, which varies between runs and is not reproduced here. Appending messages directly with `memory.chat_memory.add_user_message()` / `add_ai_message()` bypasses `save_context()`, so no summary is ever built: `memory.buffer` stays empty and `load_memory_variables({})` returns `[SystemMessage(content='', additional_kwargs={}, response_metadata={})]`.
