# Workshop 6: Conversational Memory for Analytics Code Agents

**Today's Goal**: Make our agent conversational.

Real data analysis is iterative. You filter data, inspect results, ask follow-ups, refine queries. Let's see what happens when we try multi-turn conversations with our Workshop 5 agent...

## Setup (Same as Workshop 5)

In [1]:
# Install dependencies
!pip install -q openai pandas matplotlib

In [3]:
import pandas as pd
import os
import re
import sys
from io import StringIO
from openai import OpenAI
from typing import Dict, List, Any, Optional, Callable, Type, Union
from pydantic import BaseModel
import inspect

# Initialize OpenAI client
openai_client = OpenAI()

def generate(
    prompt: str,
    temperature: float = 0,
    response_format: Optional[Type[BaseModel]] = None,
    model: str = "gpt-4o-mini"
) -> Union[str, BaseModel]:
    """
    Generate text using OpenAI's API with optional structured output

    Args:
        prompt: The input prompt for generation
        temperature: Sampling temperature (0-2), default 0
        response_format: Optional Pydantic model class for structured output
        model: The model to use, default "gpt-4o-mini"

    Returns:
        Either a string (regular generation) or a Pydantic model instance (structured output)
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]

    if response_format is not None:
        # Use structured output with Pydantic model
        response = openai_client.beta.chat.completions.parse(
            model=model,
            messages=messages,
            temperature=temperature,
            response_format=response_format
        )
        return response.choices[0].message.parsed
    else:
        # Regular text generation
        response = openai_client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature
        )
        return response.choices[0].message.content.strip()

In [4]:
# Load Pokemon dataset
df = pd.read_csv("./data/pokemon.csv")

# Clean column names
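# regex=False treats '.' as a literal dot (as a regex it would match every character)
df.columns = df.columns.str.replace(' ', '_', regex=False).str.replace('.', '', regex=False).str.lower()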

print(f"✅ Loaded {len(df)} Pokemon with {len(df.columns)} columns")
print(f"Columns: {list(df.columns[:8])}...")
df.head(3)

✅ Loaded 800 Pokemon with 13 columns
Columns: ['#', 'name', 'type_1', 'type_2', 'total', 'hp', 'attack', 'defense']...


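| | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |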


## Recap: Workshop 5 CodeAgent

Let's bring in the CodeAgent from Workshop 5 exactly as we left it.

### Tools (V3 from Workshop 5)

In [10]:
def show_info_v3(data=None):
    """Show dataset structure (columns, types, non-null counts)"""
    target = data if data is not None else df
    buffer = StringIO()
    target.info(buf=buffer)
    response = buffer.getvalue()
    print(response)  # Print so agent can see in observation
    return response

def show_data_v3(data=None, n=5, sort_by=None, ascending=True):
    """Show first n rows, optionally sorted"""
    target = data if data is not None else df
    if sort_by:
        target = target.sort_values(sort_by, ascending=ascending)
    result = target.head(n)
    print(result)  # Print so agent can see the data
    return result

def filter_rows_v3(condition: str, data=None):
    """Filter rows using pandas query syntax"""
    target = data if data is not None else df
    result = target.query(condition)
    print(f"✅ Filtered to {len(result)} rows")  # Print count so agent sees result
    return result

def calculate_statistics_v3(data, column: str, stat_type: str):
    """Calculate statistics. stat_type: mean, median, max, min, sum, count, std"""
    stat_functions = {
        'mean': data[column].mean,
        'median': data[column].median,
        'max': data[column].max,
        'min': data[column].min,
        'sum': data[column].sum,
        'count': data[column].count,
        'std': data[column].std,
    }
    result = stat_functions[stat_type]()
    print(f"{stat_type}({column}) = {result}")  # Print so agent sees the value
    return result

def aggregate_by_v3(data, group_by: str, agg_col: str, agg_func: str):
    """Group and aggregate. agg_func: mean, sum, count, min, max, median"""
    result = data.groupby(group_by)[agg_col].agg(agg_func).reset_index()
    print(result)  # Print so agent sees the aggregated data
    return result

code_agent_tools = {
    'show_info_v3': show_info_v3,
    'show_data_v3': show_data_v3,
    'filter_rows_v3': filter_rows_v3,
    'calculate_statistics_v3': calculate_statistics_v3,
    'aggregate_by_v3': aggregate_by_v3,
}

### Pydantic Models for ReAct-Style Code Generation

In [11]:
class CodeResponse(BaseModel):
    """📋 Structured response for ReAct-style code generation"""
    thought: str  # Agent's reasoning about what to do
    code: str  # Pure Python code as string
    is_final_answer: bool = False  # True when agent has enough observations to answer

### Updated Safe Code Executor
Our executor maintains a **persistent namespace** across multiple `execute()` calls:

```python
# Within a single query (ReAct loop)
Step 1: fire_pokemon = filter_rows_v3(...)  # Creates variable
Step 2: show_data_v3(data=fire_pokemon)     # Variable exists!

# Across conversation turns
Turn 1: fire_pokemon = filter_rows_v3(...)  # Creates variable
Turn 2: calculate_statistics_v3(data=fire_pokemon, ...)  # Still exists!
```

**Why?** This enables:
1. **Within query**: ReAct loop can build on previous steps
2. **Across queries**: Conversation can reference variables from earlier turns

**When to reset:**
```python
# Start new conversation - reset both memory AND executor
memory = SimpleMemory()
executor.reset()  # Fresh namespace

# Turns 1, 2, 3...: keep the same memory and executor (no reset between turns)
code_agent("filter...")  # Variables persist
code_agent("analyze...")  # Can use previous variables
```
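
The persistence described above comes down to passing the same dictionary to every `exec()` call. A minimal standalone sketch (the variable names are illustrative and not taken from the workshop code):

```python
# One shared dict plays the role of the executor's namespace.
namespace = {"__builtins__": {"len": len, "print": print}}

# "Step 1" creates a variable inside the namespace...
exec("fire_ids = [4, 5, 6]", namespace)

# ...and "Step 2", a separate exec() call, can still read it.
exec("count = len(fire_ids)", namespace)
count_before_reset = namespace["count"]
print(count_before_reset)

# Resetting simply means starting over with a fresh dict.
namespace = {"__builtins__": {"len": len, "print": print}}
print("fire_ids" in namespace)
```

After the reset, code that refers to `fire_ids` would fail with a `NameError`, which is why a reset belongs at the start of a conversation and not between turns.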

In [12]:
class SimpleSafeExecutor:
    """🛡️ Execute code in a restricted environment with persistent namespace"""
    
    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        # Minimal safe builtins
        self.safe_builtins = {
            'print': print,
            'len': len,
            'range': range,
            'str': str,
            'int': int,
            'float': float,
            'list': list,
            'dict': dict,
            'True': True,
            'False': False,
            'None': None,
        }
        self.namespace = {}  # Will be initialized in reset()
        self.reset()  # Initialize fresh namespace
    
    def execute(self, code: str, df: pd.DataFrame) -> tuple[bool, str]:
        """Execute code and return (success, output)"""
        output_buffer = StringIO()
        original_stdout = sys.stdout
        
        try:
            # Redirect stdout to capture prints
            sys.stdout = output_buffer
            
            # Add/update df in namespace (in case it was modified)
            self.namespace['df'] = df
            
            # Execute code in PERSISTENT namespace
            # Variables created in one execute() call will exist in the next!
            exec(code, self.namespace)
            
            # Get captured output
            output = output_buffer.getvalue()
            return True, output if output else "✅ Executed successfully"
            
        except Exception as e:
            error_output = output_buffer.getvalue()
            return False, (error_output + f"❌ Error: {type(e).__name__}: {str(e)}")
            
        finally:
            # Restore stdout
            sys.stdout = original_stdout
            output_buffer.close()
    
    def reset(self):
        """Reset namespace to a clean state (call before starting a new conversation)"""
        self.namespace = {
            '__builtins__': self.safe_builtins,
            **self.tools,
        }

# Create executor
executor = SimpleSafeExecutor(code_agent_tools)
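
To see what the restricted `__builtins__` dictionary does in practice, here is a small standalone sketch. The `try_run` helper and the sample snippets are hypothetical, written only for this illustration. It narrows which names generated code can resolve; it should not be read as a complete sandbox.

```python
# Only these names are visible as builtins inside exec().
safe_builtins = {"print": print, "len": len}
namespace = {"__builtins__": safe_builtins}

def try_run(code: str) -> str:
    """Run code in the restricted namespace and report the outcome as text."""
    try:
        exec(code, namespace)
        return "ok"
    except Exception as e:
        return f"{type(e).__name__}: {e}"

allowed = try_run("n = len([1, 2, 3])")       # len is whitelisted
blocked_import = try_run("import os")         # no __import__ available: ImportError
blocked_open = try_run("open('notes.txt')")   # open is not whitelisted: NameError

print(allowed)
print(blocked_import)
print(blocked_open)
```

This is the same mechanism that makes the prompt instruction "Do NOT import anything" enforceable: an `import` statement fails at execution time, and the error text becomes the agent's next observation.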

### Helper: Generate Tool Descriptions

In [13]:
def generate_code_tool_descriptions(tools: Dict[str, Callable]) -> str:
    """Generate tool descriptions for CodeAgent."""
    descriptions = []
    for name, func in tools.items():
        doc = (func.__doc__ or "No description").strip()
        sig = inspect.signature(func)
        descriptions.append(f"{name}{sig}: {doc}")
    return "\n".join(descriptions)
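
As a quick illustration of what one generated description line looks like, the sketch below applies the same `inspect.signature` formatting to a stand-in function (`filter_rows` is a hypothetical stub, not one of the workshop tools):

```python
import inspect

def filter_rows(condition: str, data=None):
    """Filter rows using pandas query syntax"""
    return data

# Same formatting as generate_code_tool_descriptions: name + signature + docstring
doc = (filter_rows.__doc__ or "No description").strip()
description = f"{filter_rows.__name__}{inspect.signature(filter_rows)}: {doc}"
print(description)
```

Because the description is derived from the signature and docstring, keeping both accurate is what keeps the prompt accurate.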

### Baseline CodeAgent (ReAct-Style)

**What changed from Workshop 5?**

Our CodeAgent now follows the **ReAct loop** (Reasoning and Acting):

```
┌─────────────────────────────────────────┐
│              User Query                  │
└─────────────┬───────────────────────────┘
              ▼
        ┌─────────────┐
        │   THOUGHT   │ ← "What do I know? What do I need?"
        └─────┬───────┘
              ▼
        ┌─────────────┐
        │     CODE    │ ← "Execute code to gather info"
        └─────┬───────┘
              ▼
        ┌─────────────┐
        │ OBSERVATION │ ← "What did I learn?"
        └─────┬───────┘
              ▼
    ┌───────────────────┐
    │ Enough info?      │
    │ No → Loop back    │
    │ Yes → Final code  │
    └───────────────────┘
```

**Key differences**:
- Agent **reads execution results** (observations) and decides next step
- Multi-step reasoning (inspect data → filter → analyze)
- Self-correcting (can fix errors based on observations)
- Explicit "I'm done" signal (`is_final_answer=True`)

In [14]:
def code_agent(query: str, max_steps: int = 15, verbose: bool = True) -> str:
    """
    ReAct-style CodeAgent that loops: Thought → Code → Observation → repeat.
    
    Args:
        query: User's question
        max_steps: Maximum reasoning steps
        verbose: Print step details
        
    Returns:
        Final answer string
        
    Note: Call executor.reset() manually before starting a new conversation session!
    """
    history = []  # List of {thought, code, observation}
    
    for step in range(1, max_steps + 1):
        if verbose:
            print(f"\n{'═'*70}")
            print(f"🔄 STEP {step}")
            print('═'*70)
        
        history_context = ""
        if history:
            history_context = "\n\nPrevious steps:\n"
            
            for i, h in enumerate(history, 1):
                history_context += f"Step {i}:\n"
                history_context += f"  Thought: {h['thought']}\n"
                history_context += f"  Code: {h['code']}\n"
                history_context += f"  Observation: {h['observation']}\n\n" 
        
        prompt = f"""You are a Python coding agent that writes code to answer data questions.

Query: {query}

Available tools (all return data, use print for feedback):
{generate_code_tool_descriptions(code_agent_tools)}

Global variables:
- df: Pokemon dataset (pandas DataFrame)

{history_context}

Instructions:
1. Think step-by-step. What do you know from previous observations? What do you need to find out?
2. **Variables persist across steps** - e.g., if you created a variable in Step 2, you can use it in Step 3!
3. **Don't repeat the same action unless required** - review previous steps carefully.
4. If you've gathered enough information to answer, set is_final_answer=True and provide the answer.
5. Otherwise, write code to gather MORE information (use tools to inspect/analyze).
6. Write Python code using ONLY the provided tools and safe builtins.
7. Use print() aggressively to check intermediate results.
8. Do NOT import anything or call pandas methods directly on df.

Respond with:
- thought: Your reasoning about what to do next (reference what you learned!)
- code: Python code to execute
- is_final_answer: True if you can answer now, False otherwise

If is_final_answer=True, your code should print a summary starting with "FINAL ANSWER:"."""
        
        # Get response from LLM with structured output
        response = generate(prompt, temperature=0.1, response_format=CodeResponse)
        
        if verbose:
            print(f"\n💭 THOUGHT: {response.thought}")
        
        # Check if agent is done
        if response.is_final_answer:
            if verbose:
                print(f"\n✅ AGENT FINISHED! Executing final code...\n")
            
            # Execute final code
            success, output = executor.execute(response.code, df)
            
            if verbose:
                print(f"{'═'*70}")
                print(f"📝 FINAL CODE:")
                print('═'*70)
                print(response.code)
                print('═'*70)
                print(f"\n📊 FINAL ANSWER:")
                print(output)
            
            return output
        
        # Not done - execute code to gather observation
        if verbose:
            print(f"\n⚙️  EXECUTING CODE:")
            print('─'*70)
            print(response.code)
            print('─'*70)
        
        success, output = executor.execute(response.code, df)
        
        if verbose:
            preview = output[:300] + "..." if len(output) > 300 else output
            print(f"\n📊 OBSERVATION:")
            print(preview)
        
        if not success:
            # Error case - add to history and let agent try to fix
            history.append({
                "thought": response.thought,
                "code": response.code,
                "observation": output  # executor output already starts with "❌ Error:"
            })
            if verbose:
                print(f"\n⚠️  Error occurred. Agent will try to recover...")
        else:
            # Success - add to history
            history.append({
                "thought": response.thought,
                "code": response.code,
                "observation": output
            })
    
    # Max steps reached
    return f"⚠️ Reached maximum steps ({max_steps}). Last observation: {history[-1]['observation'][:200] if history else 'None'}"

## Part 0: The Memory Problem

Let's try multi-turn conversations and see what happens...

### Demo: Single Query

In [16]:
print("🎯 QUERY: What is the average attack for Fire-type Pokemon?\n")
result = code_agent("What is the average attack for Fire-type Pokemon?")
print("\n" + "="*70)

🎯 QUERY: What is the average attack for Fire-type Pokemon?


══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to find the average attack for Fire-type Pokemon. First, I should filter the dataset to get only the Fire-type Pokemon. Then, I can calculate the mean of their attack values. I will start by checking the structure of the dataset to confirm the column names and types.

⚙️  EXECUTING CODE:
──────────────────────────────────────────────────────────────────────
show_info_v3()
──────────────────────────────────────────────────────────────────────

📊 OBSERVATION:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   name        800 non-null    object
 2   type_1      800 non-null    ob...

══

### Demo: Follow-up Query

In [18]:
print("🎯 TURN 1: Filter Fire types\n")
result1 = code_agent("Filter the dataset to Fire-type Pokemon")

print("\n" + "="*70)
print("🎯 TURN 2: Follow-up (uses 'that'!)\n")
print("="*70)
result2 = code_agent("What's the average attack for that filtered data?")

# ❌ The agent has no idea what "that filtered data" means!

🎯 TURN 1: Filter Fire types


══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to filter the dataset to get only Fire-type Pokemon. I will use the filter_rows_v3 function with the appropriate condition to achieve this. First, I will check the structure of the dataset to confirm the column names and types, especially to ensure that there is a column that indicates the type of Pokemon.

⚙️  EXECUTING CODE:
──────────────────────────────────────────────────────────────────────
show_info_v3(data=df)
──────────────────────────────────────────────────────────────────────

📊 OBSERVATION:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   name        800 non-null    object
 2   type_1      800 non-n

**What went wrong?**

The agent can't see:
- Previous conversation turns
- What variables were created  
- What operations were just performed


**The root cause**: Each query goes to the LLM in isolation. No access to conversation history.

**Why this breaks real workflows**:
- Can't use pronouns ("that", "it", "those results")
- Can't build iteratively (filter → inspect → refine)
- Must repeat context in every query
- Can't reuse variables created in previous turns: they still live in the executor's namespace, but the agent never learns their names
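
One way to see the gap is to list what a namespace holds after a turn. The sketch below is standalone and uses a hypothetical helper, `list_user_variables`, plus a stub tool; neither is part of the workshop code.

```python
def list_user_variables(namespace: dict, tool_names: set) -> list:
    """Hypothetical helper: names that are neither tools, df, nor dunder entries."""
    return sorted(
        name for name in namespace
        if name not in tool_names and name != "df" and not name.startswith("__")
    )

# Stub tool standing in for filter_rows_v3 (returns a plain list here)
tools = {"filter_rows_v3": lambda condition: [4, 5, 6]}
namespace = {"__builtins__": {}, **tools}

# "Turn 1" creates a variable in the shared namespace
exec("fire_pokemon = filter_rows_v3('fire')", namespace)

leftover = list_user_variables(namespace, set(tools))
print(leftover)
```

The variable survives the turn, yet nothing in the next prompt mentions it, so the model has no way to know it can be reused.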

---

**Let's try fixing this.**


## Part 1a: First Attempt - Adding Chat History

**Idea**: What if we show the agent previous conversation turns?

Let's try storing each turn (user query + agent code + result) and injecting recent turns into the next prompt.

**The plan**:
1. Store conversation turns in a list
2. When processing a new query, show the agent the last few turns
3. Agent can reference previous context

Let's build it...

### Implementation: SimpleMemory (Version 1 - Buffer Only)

For now, we'll just implement a simple buffer that stores recent messages. We'll add more sophistication if we run into problems.

In [22]:
class SimpleMemory:
    """Simple buffer memory - stores recent conversation turns."""
    
    def __init__(self):
        self.messages = []  # List of conversation turns
    
    def add_turn(self, user_query: str, generated_code: str, execution_result: str):
        """Add a conversation turn to memory."""
        self.messages.append({"role": "user", "content": user_query})
        self.messages.append({"role": "assistant", "content": f"Code:\n{generated_code}\n\nResult:\n{execution_result}"})
    
    def get_history_str(self, k: int = 4) -> str:
        """Format the k most recent messages (k=4 covers two turns) for injection into the prompt."""
        recent = self.messages[-k:] if self.messages else []
        if not recent:
            return "(no previous conversation)"
        
        lines = []
        for msg in recent:
            if msg["role"] == "user":
                lines.append(f"User: {msg['content']}")
            else:
                # Content is passed through in full, so no trailing "..." is appended
                lines.append(f"Assistant: {msg['content']}")
        return "\n".join(lines)
    
    def show(self):
        """Visualize memory state (for transparency)."""
        print("=" * 60)
        print("AGENT MEMORY STATE")
        print("=" * 60)
        print(f"\n💬 Messages in buffer: {len(self.messages)}")
        for msg in self.messages:
            role_emoji = "👤" if msg["role"] == "user" else "🤖"
            content = msg["content"][:500] + "..." if len(msg["content"]) > 500 else msg["content"]
            print(f"  {role_emoji} {content}")
        print("=" * 60)
        print()
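
The `messages[-k:]` slice in `get_history_str` is what decides which turns the agent can see. A small standalone sketch of that windowing, using made-up message contents:

```python
messages = []
for turn in range(1, 4):  # three turns, two messages each
    messages.append({"role": "user", "content": f"query {turn}"})
    messages.append({"role": "assistant", "content": f"result {turn}"})

k = 4
recent = messages[-k:]  # same slice as get_history_str(k=4)
visible = [m["content"] for m in recent]
print(len(messages), visible)
```

With `k=4`, the buffer still stores all six messages, but only turns 2 and 3 reach the prompt; turn 1 is kept in memory and never shown to the model.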


### CodeAgent with Memory

In [23]:
def code_agent_with_memory(
    query: str,
    memory: SimpleMemory,
    max_steps: int = 15,
    verbose: bool = True
) -> str:
    """
    ReAct-style CodeAgent with chat history.
    
    Note: Variables persist across conversation turns!
    Call executor.reset() before starting a new conversation.
    """
    step_history = []  # Current query's steps: {thought, code, observation}
    
    for step in range(1, max_steps + 1):
        if verbose:
            print(f"\n{'═'*70}")
            print(f"🔄 STEP {step}")
            print('═'*70)
        
        # Build step history context for the current query
        step_context = ""
        if step_history:
            step_context = "\n\nSteps taken so far:\n"
            
            # Number steps from 1 to match the "STEP n" labels printed above
            for i, h in enumerate(step_history, 1):
                step_context += f"Step {i}:\n"
                step_context += f"  Thought: {h['thought']}\n"
                step_context += f"  Observation: {h['observation']}\n\n"
            
        # Build prompt WITH CONVERSATION HISTORY
        conversation_history = memory.get_history_str(k=4)
        
        prompt = f"""You are a Python coding agent that writes code to answer data questions.

Query: {query}

Available tools (all return data, use print for feedback):
{generate_code_tool_descriptions(code_agent_tools)}

Global variables:
- df: Pokemon dataset (pandas DataFrame)

Previous conversation:
{conversation_history}

{step_context}

Instructions:
1. Think step-by-step. Build on the previous conversation when relevant.
2. **Variables persist across conversation turns** - i.e. if a variable was created in a previous conversation turn, you can use it now!
3. If you've gathered enough information to answer, set is_final_answer=True.
4. Otherwise, write code to gather more information.
5. Write Python code using ONLY the provided tools and safe builtins.
6. Use print() to show intermediate results.
7. Do NOT import anything or call pandas methods directly on df.

Respond with:
- thought: Your reasoning about what to do next
- code: Python code to execute
- is_final_answer: True if you can answer now, False otherwise

If is_final_answer=True, your code should print a summary starting with "FINAL ANSWER:"."""
        
        # Get response from LLM
        response = generate(prompt, temperature=0.1, response_format=CodeResponse)
        
        if verbose:
            print(f"\n💭 THOUGHT: {response.thought}")
        
        # Check if done
        if response.is_final_answer:
            if verbose:
                print(f"\n✅ AGENT FINISHED! Executing final code...\n")
            
            success, output = executor.execute(response.code, df)
            
            if verbose:
                print(f"{'═'*70}")
                print(f"📝 FINAL CODE:")
                print('═'*70)
                print(response.code)
                print('═'*70)
                print(f"\n📊 FINAL ANSWER:")
                print(output)
            
            # Save the full interaction to conversation memory
            full_code = "\n\n".join([h['code'] for h in step_history] + [response.code])
            full_observations = "\n\n".join([h['observation'] for h in step_history])
            memory.add_turn(query, full_code, f"Steps:\n{full_observations}\n\nFinal:\n{output}")
            
            return output
        
        # Execute code
        if verbose:
            print(f"\n⚙️  EXECUTING CODE:")
            print('─'*70)
            print(response.code)
            print('─'*70)
        
        success, output = executor.execute(response.code, df)
        
        if verbose:
            preview = output[:300] + "..." if len(output) > 300 else output
            print(f"\n📊 OBSERVATION:")
            print(preview)
        
        step_history.append({
            "thought": response.thought,
            "code": response.code,
            "observation": output if success else f"❌ Error: {output}"
        })
        
        if not success and verbose:
            print(f"\n⚠️  Error occurred. Agent will try to recover...")
    
    # Max steps reached
    result = f"⚠️ Reached maximum steps ({max_steps}). Last observation: {step_history[-1]['observation'][:200] if step_history else 'None'}"
    full_code = "\n\n".join([h['code'] for h in step_history])
    memory.add_turn(query, full_code, result)
    return result


### Demo: Memory in Action

In [24]:
# Fresh memory
memory = SimpleMemory()

print("="*70)
print("TURN 1: Filter Fire types")
print("="*70)
code_agent_with_memory("Filter to Fire-type Pokemon", memory)

print("\n" + "="*70)
print("MEMORY STATE AFTER TURN 1")
print("="*70)
memory.show()

TURN 1: Filter Fire types

══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to filter the dataset to only include Fire-type Pokemon. I will use the filter_rows_v3 function to achieve this by specifying the condition for the 'type' column.

⚙️  EXECUTING CODE:
──────────────────────────────────────────────────────────────────────
filtered_fire_pokemon = filter_rows_v3("type == 'Fire'", data=df)
──────────────────────────────────────────────────────────────────────

📊 OBSERVATION:
❌ Error: UndefinedVariableError: name 'type' is not defined

⚠️  Error occurred. Agent will try to recover...

══════════════════════════════════════════════════════════════════════
🔄 STEP 2
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to filter the dataset to include only Fire-type Pokemon. The error I encountered suggests that I need to use the c

In [25]:
print("="*70)
print("TURN 2: Follow-up with pronoun")
print("="*70)
code_agent_with_memory("Show me the first 5 rows of that", memory)

print("\n" + "="*70)
print("MEMORY STATE AFTER TURN 2")
print("="*70)
memory.show()

TURN 2: Follow-up with pronoun

══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to show the first 5 rows of the filtered Fire-type Pokémon dataset. I will use the `show_data_v3` function to display these rows.

✅ AGENT FINISHED! Executing final code...

══════════════════════════════════════════════════════════════════════
📝 FINAL CODE:
══════════════════════════════════════════════════════════════════════
show_data_v3(data=filtered_fire_pokemon, n=5)
══════════════════════════════════════════════════════════════════════

📊 FINAL ANSWER:
   #                       name type_1  type_2  total  hp  attack  defense  \
4  4                 Charmander   Fire     NaN    309  39      52       43   
5  5                 Charmeleon   Fire     NaN    405  58      64       58   
6  6                  Charizard   Fire  Flying    534  78      84       78   
7  6  CharizardMega Ch

In [26]:
print("="*70)
print("TURN 3: Another follow-up")
print("="*70)
code_agent_with_memory("How many Fire types are there?", memory)

memory.show()

TURN 3: Another follow-up

══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to count the number of Fire-type Pokémon in the filtered dataset. Since I have already filtered the Fire-type Pokémon into the variable 'filtered_fire_pokemon', I can use the count function to get the number of rows in this DataFrame. This will give me the total number of Fire-type Pokémon.

✅ AGENT FINISHED! Executing final code...

══════════════════════════════════════════════════════════════════════
📝 FINAL CODE:
══════════════════════════════════════════════════════════════════════
fire_type_count = len(filtered_fire_pokemon)
print(f'Total number of Fire-type Pokémon: {fire_type_count}')
══════════════════════════════════════════════════════════════════════

📊 FINAL ANSWER:
Total number of Fire-type Pokémon: 52

AGENT MEMORY STATE

💬 Messages in buffer: 6
  👤 Filter to Fire-type Pokemon


**It works! 🎉**

The agent can now:
- ✅ Reference "that" from previous turns  
- ✅ Build on previous queries
- ✅ Use variables created earlier (`filtered_fire_pokemon` persists!)
- ✅ Have natural, iterative conversations

### What We Just Built: Buffer Memory

This is called **buffer memory** (or **short-term memory**):
- Stores recent conversation turns
- Injects them into the next prompt
- Enables context awareness from recent conversation

**Now let's try a longer conversation and see what happens...**

## Part 1b: The Scalability Problem

Buffer memory works great for a few turns. But what about longer conversations?

Let's try a 4-turn conversation and watch what happens to the buffer size...

In [27]:
# Let's build up a 4-turn conversation and watch the buffer grow
memory = SimpleMemory()
executor.reset()

queries = [
    "Filter to Fire-type Pokemon",
    "What's the average attack?",
    "Show top 5 by HP",
    "Now filter to Water-type Pokemon"
]

for i, query in enumerate(queries, 1):
    print(f"\n{'='*70}")
    print(f"TURN {i}: {query}")
    print(f"  Buffer size BEFORE: {len(memory.messages)} messages")
    print('='*70)
    code_agent_with_memory(query, memory, max_steps=3, verbose=False)
    print(f"\n  Buffer size AFTER: {len(memory.messages)} messages")
    print(f"  📈 Growing by 2 each turn (user + assistant)\n")

print("\n" + "="*70)
print("FINAL MEMORY STATE - BUFFER KEEPS GROWING!")
print("="*70)
memory.show()


TURN 1: Filter to Fire-type Pokemon
  Buffer size BEFORE: 0 messages

  Buffer size AFTER: 2 messages
  📈 Growing by 2 each turn (user + assistant)


TURN 2: What's the average attack?
  Buffer size BEFORE: 2 messages

  Buffer size AFTER: 4 messages
  📈 Growing by 2 each turn (user + assistant)


TURN 3: Show top 5 by HP
  Buffer size BEFORE: 4 messages

  Buffer size AFTER: 6 messages
  📈 Growing by 2 each turn (user + assistant)


TURN 4: Now filter to Water-type Pokemon
  Buffer size BEFORE: 6 messages

  Buffer size AFTER: 8 messages
  📈 Growing by 2 each turn (user + assistant)


FINAL MEMORY STATE - BUFFER KEEPS GROWING!
AGENT MEMORY STATE

💬 Messages in buffer: 8
  👤 Filter to Fire-type Pokemon
  🤖 Code:
filtered_fire_pokemon = filter_rows_v3("type == 'Fire'", data=df)

filter_rows_v3("type == 'Fire'", data=df)

filter_rows_v3('type == "Fire"', data=df)

Result:
⚠️ Reached maximum steps (3). Last observation: ❌ Error: ❌ Error: UndefinedVariableError: name 'type' is not defined

**We have a problem.** 🚨

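The buffer grows without bound. Assuming roughly 100 tokens per message (assistant messages that carry full observations can be much larger):
- Turn 1: 2 messages (~200 tokens)
- Turn 2: 4 messages (~400 tokens)
- Turn 3: 6 messages (~600 tokens)
- Turn 4: 8 messages (~800 tokens)

**The math** (if the whole buffer is injected into each prompt; `get_history_str(k=4)` sidesteps this only by dropping everything older than two turns):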
```
Turn 10:  20 messages → ~2,000 tokens in memory alone!
Turn 20:  40 messages → ~4,000 tokens
Turn 50: 100 messages → ~10,000 tokens
```
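
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: 100 tokens per message (the same ballpark figure used above), the whole buffer injected into each prompt, and one LLM call per turn. The function names are illustrative only.

```python
TOKENS_PER_MESSAGE = 100  # assumption: same rough figure as the estimates above
MESSAGES_PER_TURN = 2     # one user message + one assistant message

def buffer_tokens(turns_stored: int) -> int:
    """Approximate size of the buffer after a given number of turns."""
    return turns_stored * MESSAGES_PER_TURN * TOKENS_PER_MESSAGE

def cumulative_input_tokens(total_turns: int) -> int:
    """History tokens sent over a whole conversation, one prompt per turn.
    Turn t sends the buffer built by the previous t - 1 turns."""
    return sum(buffer_tokens(t - 1) for t in range(1, total_turns + 1))

for turns in (10, 20, 50):
    print(turns, buffer_tokens(turns), cumulative_input_tokens(turns))
```

The buffer itself grows linearly, but because it is re-sent on every turn, the total number of history tokens paid for grows quadratically with conversation length. A ReAct loop that makes several LLM calls per turn would multiply this further.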

LLMs have large context windows (128k tokens for `gpt-4o-mini`), but:
- 💰 **Cost**: You pay per token, and the whole history is re-sent as input on every LLM call
- 🐌 **Speed**: More tokens = slower responses
- 📏 **Limits**: Eventually you hit the max

**We need a better solution.**

What if we could keep the benefits of buffer memory (context awareness) without the unbounded growth?

Let's try something...

### Implementation: SummaryMemory (Version 2 - Compression)

Our buffer memory let the agent resolve references to earlier turns, but it **grew without bound** and became expensive to send back to the LLM.  

**Idea**: Compress older turns into a succinct summary while preserving the most recent messages verbatim.  
This mimics how humans remember conversations: recent exchanges are clear, whereas older topics are condensed into high‑level summaries.

We'll implement a `SummaryMemory` class that:

- Tracks full messages like `SimpleMemory`.
- When the buffer exceeds a threshold (`max_messages=6`, i.e. 3 turns), it compresses everything except the last four messages into a summary.
- Builds that summary without an extra LLM call: it keeps the first 60 characters of each older user query and discards the assistant messages.
- Stores the summary separately, removes the original detailed messages, and injects the summary (if present) followed by the recent messages into the prompt.

This reduces prompt length while retaining important context.
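
The compression rule described above can be traced with a standalone sketch. The `compress` function below is a hypothetical stand-in that mirrors the rule (threshold of 6 messages, keep the last 4, summarize using the first 60 characters of each older user query); it is not the class itself, and the message contents are made up.

```python
def compress(messages, summary, max_messages=6, keep=4):
    """Fold the oldest messages into the summary once the buffer is too long."""
    if len(messages) <= max_messages:
        return messages, summary
    cut = len(messages) - keep
    # Only user queries contribute to the summary; assistant messages are dropped
    points = [m["content"][:60].strip() for m in messages[:cut] if m["role"] == "user"]
    if points:
        summary = "; ".join(([summary] if summary else []) + points)
    return messages[cut:], summary

messages, summary, sizes = [], "", []
for turn in range(1, 7):
    messages.append({"role": "user", "content": f"query {turn}"})
    messages.append({"role": "assistant", "content": f"result {turn}"})
    messages, summary = compress(messages, summary)
    sizes.append(len(messages))

print(sizes)
print(summary)
```

Under these settings the buffer oscillates between 4 and 6 messages instead of growing, while the summary gains one short entry per compressed turn. The summary string itself still grows slowly, and everything the assistant produced in older turns is lost.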


In [31]:

class SummaryMemory(SimpleMemory):
    """Summary memory with compression of older turns.

    - Keeps a buffer of recent messages (user/assistant) at full fidelity.
    - Compresses older messages into a running summary string.
    - Summary is included in the prompt before recent messages.
    """
    def __init__(self, max_messages: int = 6):
        super().__init__()
        self.max_messages = max_messages  # threshold for compression
        self.summary = ""  # accumulated summary of older messages

    def _compress(self):
        """Compress old messages into summary if buffer exceeds max_messages."""
        if len(self.messages) > self.max_messages:
            # compress all but the last four messages
            num_to_compress = len(self.messages) - 4
            old_messages = self.messages[:num_to_compress]
            # Build summary from user queries only (assistant code and results are dropped)
            summary_points = []
            for msg in old_messages:
                if msg["role"] == "user":
                    content = msg["content"]
                    trimmed = content[:60].strip()
                    summary_points.append(trimmed)
            if summary_points:
                compressed = "; ".join(summary_points)
                if self.summary:
                    self.summary += "; " + compressed
                else:
                    self.summary = compressed
            # Remove compressed messages from buffer
            self.messages = self.messages[num_to_compress:]

    def add_turn(self, user_query: str, generated_code: str, execution_result: str):
        super().add_turn(user_query, generated_code, execution_result)
        self._compress()

    def get_history_str(self, k: int = 4) -> str:
        parts = []
        if self.summary:
            parts.append(f"Summary: {self.summary}")
        recent = self.messages[-k:] if self.messages else []
        for msg in recent:
            if msg["role"] == "user":
                parts.append(f"User: {msg['content']}")
            else:
                parts.append(f"Assistant: {msg['content'][:200]}...")
        return "\n".join(parts) if parts else "(no previous conversation)"

    def show(self):
        print("=" * 60)
        print("AGENT MEMORY STATE (SummaryMemory)")
        print("=" * 60)
        if self.summary:
            print(f"\n📝 Summary of older messages:\n{self.summary}\n")
        print(f"💬 Messages in buffer: {len(self.messages)}")
        for msg in self.messages:
            role_emoji = "👤" if msg["role"] == "user" else "🤖"
            content = msg["content"]
            if len(content) > 500:
                content = content[:500] + "..."
            print(f"  {role_emoji} {content}")
        print("=" * 60)
        print()


### Demo: Compression in Action

Let's see how `SummaryMemory` behaves on a longer conversation.  
We'll run seven queries. With `max_messages=6`, the first compression happens after the fourth turn (8 messages > 6): the two oldest turns are folded into the summary and the four most recent messages stay verbatim.
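
The buffer arithmetic can be checked without running the agent. This is a minimal sketch, assuming each turn appends one user and one assistant message and that compression keeps the last four messages, as in `_compress`. `simulate_buffer_sizes` is a hypothetical helper, not part of the workshop code.

```python
def simulate_buffer_sizes(num_turns: int, max_messages: int = 6, keep: int = 4) -> list:
    """Return the buffer length after each turn, mirroring SummaryMemory._compress."""
    sizes = []
    buffer_len = 0
    for _ in range(num_turns):
        buffer_len += 2  # one user message + one assistant message
        if buffer_len > max_messages:
            buffer_len = keep  # everything older is folded into the summary
        sizes.append(buffer_len)
    return sizes

print(simulate_buffer_sizes(7))  # [2, 4, 6, 4, 6, 4, 6]
```

The buffer never holds more than six messages after a turn completes, and compression fires on turns 4 and 6.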


In [33]:

# Fresh summary memory
memory = SummaryMemory(max_messages=6)
executor.reset()

queries = [
    "Filter to Fire-type Pokemon",
    "What's the average attack for Fire types?",
    "Show top 3 by HP",
    "Now filter to Water-type Pokemon",
    "Show me the first 5 rows of that",
    "What's the average speed?",
    "What's the highest speed?"
]

for i, q in enumerate(queries, 1):
    print(f"\n{'='*70}")
    print(f"TURN {i}: {q}")
    code_agent_with_memory(q, memory, max_steps=3, verbose=False)
    print(f"📦 Memory after turn {i}:")
    memory.show()



TURN 1: Filter to Fire-type Pokemon
📦 Memory after turn 1:
AGENT MEMORY STATE (SummaryMemory)
💬 Messages in buffer: 2
  👤 Filter to Fire-type Pokemon
  🤖 Code:
filtered_fire_pokemon = filter_rows_v3("type == 'Fire'", data=df)

show_info_v3()

filter_rows_v3("type_1 == 'Fire'", data=df)

Result:
⚠️ Reached maximum steps (3). Last observation: ✅ Filtered to 52 rows



TURN 2: What's the average attack for Fire types?
📦 Memory after turn 2:
AGENT MEMORY STATE (SummaryMemory)
💬 Messages in buffer: 4
  👤 Filter to Fire-type Pokemon
  🤖 Code:
filtered_fire_pokemon = filter_rows_v3("type == 'Fire'", data=df)

show_info_v3()

filter_rows_v3("type_1 == 'Fire'", data=df)

Result:
⚠️ Reached maximum steps (3). Last observation: ✅ Filtered to 52 rows

  👤 What's the average attack for Fire types?
  🤖 Code:
average_attack_fire = calculate_statistics_v3(data=filtered_fire_pokemon, column='attack', stat_type='mean')
print('Average Attack for Fire types:', average_attack_fire)

Result:
Steps:


Final

After all seven turns, notice that the memory no longer contains all previous messages.  
Instead, older turns have been compressed into a succinct summary, while the most recent exchanges remain in full detail.  
This pattern scales much better than a simple buffer: the verbatim buffer stays at a fixed size, and the summary grows by only about 60 characters per compressed user query.

**SummaryMemory vs. BufferMemory**

| Memory type | Growth | Pros | Cons |
|-------------|-------|------|------|
| **Buffer** (SimpleMemory) | Linear with number of turns | Exact recall of recent dialogue | Unbounded growth, high cost |
| **Summary** (SummaryMemory) | Bounded buffer, slowly growing summary | Retains context via summaries | Loses fine details of older turns |

In practice, you might combine both approaches: keep a short buffer (e.g., last 2–3 turns) and summarize everything before that.
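
One way to sketch that combination is to make the summarizer pluggable, so the truncation used above could later be swapped for an LLM-backed function (for example one built on `generate` from Setup). `HybridMemory` and `truncate_summarizer` are hypothetical names for illustration, and the two-argument `add_turn` is a simplification of the workshop signature.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def truncate_summarizer(previous_summary: str, old_messages: List[Message]) -> str:
    """Default summarizer: keep the first 60 characters of each user query."""
    points = [m["content"][:60].strip() for m in old_messages if m["role"] == "user"]
    return "; ".join(p for p in [previous_summary, *points] if p)

class HybridMemory:
    """Hypothetical sketch: short verbatim buffer plus a summary of everything older."""

    def __init__(self, keep_last: int = 4,
                 summarizer: Callable[[str, List[Message]], str] = truncate_summarizer):
        self.keep_last = keep_last
        self.summarizer = summarizer
        self.messages: List[Message] = []
        self.summary = ""

    def add_turn(self, user_query: str, assistant_reply: str) -> None:
        self.messages.append({"role": "user", "content": user_query})
        self.messages.append({"role": "assistant", "content": assistant_reply})
        overflow = len(self.messages) - self.keep_last
        if overflow > 0:
            # Fold the overflow into the summary and keep the newest messages verbatim
            self.summary = self.summarizer(self.summary, self.messages[:overflow])
            self.messages = self.messages[overflow:]

memory = HybridMemory(keep_last=4)
for q in ["Filter to Fire-type Pokemon", "Average attack?", "Top 3 by HP"]:
    memory.add_turn(q, "(assistant reply)")
print(memory.summary)        # Filter to Fire-type Pokemon
print(len(memory.messages))  # 4
```

An LLM-backed summarizer would likely preserve more detail (such as results), at the price of an extra model call each time the buffer overflows.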


## Part 2: The Bigger Picture - Long-Term Memory

**What we've built**: Short-term memory (buffer + compression) for single-session conversations.

**But what about multi-session conversations?**

Imagine a data analyst who uses our agent every day:
- Monday: Analyzes Q3 sales
- Tuesday: Analyzes Q4 sales  
- Wednesday: "Compare to what I did Monday"

Our current memory resets each session. For this use case, we'd need **long-term memory** that persists across sessions.

**We won't build this** (it requires storage and retrieval mechanisms), but let's understand the landscape so you know what exists in production systems...

## Three Types of Long-Term Memory

Production AI systems use three types of long-term memory, inspired by human cognitive architecture:

### Episodic Memory (Past Experiences)

**What**: Memory of specific events with timestamps and context - like a diary of past interactions

**Human analogy**: "I remember last Tuesday I wore a blue shirt and had lunch at that Italian place"

**Agent example**: 
- "Last Tuesday you analyzed Fire Pokemon and found 52 rows with average attack of 84.7"
- "In our previous session (2024-01-15), you created a variable called `sales_2023` with 10,000 rows"
- "Three days ago you asked about legendary Pokemon and filtered to generation 1"
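
To make the idea concrete, here is a minimal sketch of an episodic record with a timestamp and a lookup by day. Everything in it is hypothetical: `Episode`, `EpisodicStore`, and the sample entries are made up for illustration, and a real system would persist episodes to disk or a database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Episode:
    """One remembered interaction (hypothetical record shape)."""
    when: date
    query: str
    outcome: str

class EpisodicStore:
    """In-memory stand-in for a persistent episodic store."""

    def __init__(self) -> None:
        self.episodes: List[Episode] = []

    def record(self, when: date, query: str, outcome: str) -> None:
        self.episodes.append(Episode(when, query, outcome))

    def on_day(self, day: date) -> List[Episode]:
        """Return every episode recorded on the given calendar day."""
        return [e for e in self.episodes if e.when == day]

store = EpisodicStore()
# Illustrative entries only, not real session data
store.record(date(2024, 1, 15), "Analyze Q3 sales", "Created variable sales_2023")
store.record(date(2024, 1, 16), "Analyze Q4 sales", "Computed quarterly totals")

for episode in store.on_day(date(2024, 1, 15)):
    print(f"On {episode.when.isoformat()} you asked: {episode.query!r} -> {episode.outcome}")
```

A request like "compare to what I did Monday" then becomes a date lookup followed by injecting the matching episodes into the prompt.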


### Semantic Memory (General Knowledge)

**What**: Facts and concepts, not tied to specific events - like general knowledge you've learned

**Agent example**:
- "User prefers concise answers without excessive explanations"
- "The Pokemon dataset has 800 rows, 13 columns, types include Fire/Water/Grass/etc."
- "Always use `show_info_v3` before filtering unknown data to avoid column name errors"
- "User's company fiscal year starts in July"

**Key characteristics**:
- **Timeless**: Not tied to when you learned it
- **Factual**: Statements of truth, preferences, domain knowledge
- **Generalizable**: Applies across many situations


### Procedural Memory (How-To Knowledge)

**What**: Rules, procedures, learned behaviors - like muscle memory or "knowing how" to do something

**Key characteristics**:
- **Action-oriented**: How to do things, not what happened or what's true
- **Conditional**: "When X, do Y" patterns
- **Learned behaviors**: Improve through experience

**Implementation approaches**:
1. **System prompts**: Add learned patterns to prompt ("Always do X before Y")
2. **Tool descriptions**: Encode procedures in tool docstrings
3. **Few-shot examples**: Show successful workflows as examples
4. **Reflection**: Agent reviews past failures, generates rules
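
The first and fourth approaches can be combined in a few lines. This is a sketch under stated assumptions: `learn_rule`, `build_system_prompt`, and `BASE_SYSTEM_PROMPT` are hypothetical names, and the rule text is written by hand here, whereas a reflection step would have the LLM generate it from a failed run.

```python
from typing import List

BASE_SYSTEM_PROMPT = "You are a data analysis agent."  # placeholder, not the workshop prompt

def learn_rule(rules: List[str], failure: str, fix: str) -> None:
    """Turn an observed failure and its fix into a 'When X, do Y' rule (no duplicates)."""
    rule = f"When {failure}, {fix}."
    if rule not in rules:
        rules.append(rule)

def build_system_prompt(base: str, rules: List[str]) -> str:
    """Append learned rules to the base prompt so they apply to every future query."""
    if not rules:
        return base
    bullet_list = "\n".join(f"- {r}" for r in rules)
    return f"{base}\n\nLearned rules:\n{bullet_list}"

rules: List[str] = []
learn_rule(rules, "a filter fails on an unknown column name", "call show_info_v3 first")
learn_rule(rules, "a filter fails on an unknown column name", "call show_info_v3 first")
print(build_system_prompt(BASE_SYSTEM_PROMPT, rules))
```

The duplicate call is ignored, so the prompt gains one rule no matter how often the same failure recurs.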

**We actually use this!** Our tool descriptions are procedural memory:
```python
def filter_rows_v3(condition: str, data=None):
    """Filter rows using pandas query syntax"""  # ← Procedural memory!
    # Agent learns HOW to use this tool from the description
    # "use query syntax" → agent knows to write 'type_1 == "Fire"' not df[df['type_1'] == 'Fire']
```


### Memory Type Summary

```
MEMORY TYPES TAXONOMY
│
├─ SHORT-TERM (single session) ← WE BUILT THIS ✅
│  ├─ Buffer memory: Recent turns in full detail
│  └─ Compression: Old turns condensed into a summary
│
└─ LONG-TERM (cross-session) ← Production patterns
   ├─ Episodic: Specific past experiences with timestamps
   ├─ Semantic: Facts, preferences, domain knowledge
   └─ Procedural: Rules, workflows, learned behaviors
```

### When to Use Which Memory Type

**Single-session analytics** (our use case):
- ✅ Buffer + Compression (SimpleMemory / SummaryMemory)
- ❌ Don't need: Vector stores, external databases

**Multi-session chatbot**:
- ✅ Buffer for current session
- ✅ Vector store for past sessions (RAG)
- ✅ Entity memory for user preferences

**Production data agent**:
- ✅ Buffer + Compression
- ✅ Semantic memory (dataset schemas, user preferences)
- ✅ Procedural memory (learned query patterns)

**Personalized assistant**:
- ✅ All types! Episodic + Semantic + Procedural
- Vector stores for rich retrieval

### What Makes This Work

**Without Memory + Without ReAct** (Early Workshop 5):
- ❌ Single-shot code generation
- ❌ Each query must be self-contained
- ❌ Can't reference "that" or "it"
- ❌ No multi-step reasoning

**With ReAct, Without Memory** (Workshop 5 ReAct baseline):
- ✅ Multi-step reasoning within a single query
- ✅ Self-correcting based on observations
- ❌ Can't build on previous conversations
- ❌ No memory across queries

**With ReAct + Memory** (Workshop 6 - what we just built!):
- ✅ Multi-step reasoning per query (ReAct loop)
- ✅ Natural follow-ups across queries (chat history)
- ✅ Automatic compression for scalability
- ✅ Iterative refinement
- ✅ Feels like pair programming

### What We Built

```python
class SummaryMemory(SimpleMemory):
    messages: list     # Chat history (recent turns, verbatim)
    summary: str       # Compressed older turns
    max_messages: int  # Buffer size that triggers compression
    
    # ~80 lines total (including compression logic)!
```

**Built on top of Workshop 5's CodeAgent** - but now with **ReAct loop** AND **memory with compression**:
- Same tools, same executor
- **NEW**: Multi-step reasoning (Thought → Code → Observation → repeat)
- **NEW**: Agent reads execution results and decides next steps
- **NEW**: Chat history for conversation continuity
- **NEW**: Automatic compression for scalability!

### Memory Makes Agents Conversational

**Three Key Insights**:

1. **Multiple memory types exist**
   - Short-term: Buffer + Compression (what we built)
   - Long-term: Episodic + Semantic + Procedural (production patterns)

2. **Start simple**
   - Don't need vector databases for single-session analytics
   - Buffer + compression covers most single-session use cases
   - Complexity should match need

3. **Patterns transfer**
   - LangGraph uses checkpointers (same concept, database-backed)
   - ADK uses session state (same concept, cloud-native)
   - Agno uses AgentMemory (same concept, self-managed)
   - You understand the foundation → can learn any framework

### The Core Pattern

**Agent Memory = Store → Retrieve → Inject**

Whether you're using LangChain, LangGraph, ADK, or building from scratch:

1. **Store**: Save conversation turns, facts, or experiences
2. **Retrieve**: Get relevant memory based on current query
3. **Inject**: Add memory to the prompt before sending to LLM

This mental model maps to any framework!
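
A minimal sketch of the three steps, assuming a naive keyword-overlap score for retrieval. `TinyMemory` is a hypothetical class for illustration; production systems typically replace the overlap score with embedding similarity from a vector store.

```python
import re
from typing import List, Set

def _words(text: str) -> Set[str]:
    """Lowercase word set used for a naive keyword-overlap score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

class TinyMemory:
    """Hypothetical minimal memory showing the three steps."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def store(self, text: str) -> None:
        self.items.append(text)

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        """Rank stored items by shared words with the query; drop items with no overlap."""
        query_words = _words(query)
        scored = [(len(query_words & _words(item)), item) for item in self.items]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:top_k]]

    def inject(self, query: str) -> str:
        """Build the prompt: relevant memory first, then the current query."""
        context = "\n".join(f"- {m}" for m in self.retrieve(query)) or "(none)"
        return f"Relevant memory:\n{context}\n\nUser: {query}"

memory = TinyMemory()
memory.store("User filtered the dataset to Fire-type Pokemon")
memory.store("User's company fiscal year starts in July")
print(memory.inject("What is the average attack for Fire-type Pokemon?"))
```

Only the Fire-type note shares words with the query, so it is the only item injected. The buffer and summary memories built earlier are the special case where retrieval means "take the most recent messages plus the summary".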

### Workshop 7 Preview

**Next Week**: 

- We'll see the full agent in action, including visualizations
- We'll see the same agent implemented in an agent framework

### Some resources

**Production Memory Libraries**:
- **Zep**: Dedicated memory database for agents (episodic + semantic)
- **Mem0**: Cross-session user personalization
- **Cognee**: Document knowledge graphs