# 🤖 Workshop 5: ReAct and Code Agents

---

## 🎯 Today's Agenda

Build **multi-step reasoning agents** that can break down complex problems and chain operations.

### What We'll Build

1. 🧠 **ReAct Agent** - Multi-step reasoning with our tools
2. 💾 **Artifact Store** - Tracking intermediate results
3. 💻 **CodeAgent** - Generates code that chains operations
4. 🔒 **Safe Execution** - Secure code execution environment

### Why This Matters

> Single-step agents can't solve complex queries. We need agents that can **think**, **act**, and **learn** from their actions to tackle real-world problems.

---

In [1]:
# Install dependencies
!pip install openai pandas matplotlib



---

## 📦 Part 0: Setup and Recap

Last workshop we built 9 tools but only used them **one at a time**. Today we'll make our agent **smarter**.

### Core Utilities

In [2]:
# Core imports
import pandas as pd
import os
from openai import OpenAI
import matplotlib.pyplot as plt
from typing import TypedDict, List, Callable, Dict, Any, Optional, Type, Union
from pydantic import BaseModel
import inspect
import sys
from io import StringIO

# Initialize OpenAI client
openai_client = OpenAI()

def generate(
    prompt: str,
    temperature: float = 0,
    response_format: Optional[Type[BaseModel]] = None,
    model: str = "gpt-4o-mini"
) -> Union[str, BaseModel]:
    """
    🎨 Generate text using OpenAI's API with optional structured output

    Args:
        prompt: The input prompt for generation
        temperature: Sampling temperature (0-2), default 0
        response_format: Optional Pydantic model class for structured output
        model: The model to use, default "gpt-4o-mini"

    Returns:
        Either a string (regular generation) or a Pydantic model instance (structured output)
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]

    if response_format is not None:
        # Use structured output with Pydantic model
        response = openai_client.beta.chat.completions.parse(
            model=model,
            messages=messages,
            temperature=temperature,
            response_format=response_format
        )
        return response.choices[0].message.parsed
    else:
        # Regular text generation
        response = openai_client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature
        )
        return response.choices[0].message.content.strip()

print("✅ Core utilities loaded")

✅ Core utilities loaded


### Load Dataset

In [3]:
# Load Pokemon dataset
df = pd.read_csv('data/pokemon.csv')
df.columns = (df.columns
              .str.replace(' ', '_', regex=False)
              .str.replace('.', '', regex=False)
              .str.lower())

print(f"📊 Dataset loaded: {df.shape[0]} Pokemon, {df.shape[1]} columns")
print(f"📋 Columns: {', '.join(df.columns.tolist())}")
df.head()

📊 Dataset loaded: 800 Pokemon, 13 columns
📋 Columns: #, name, type_1, type_2, total, hp, attack, defense, sp_atk, sp_def, speed, generation, legendary


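| | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|------|--------|--------|-------|----|--------|---------|--------|--------|-------|------------|-----------|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |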


### 🔧 Tools from Workshop 4

We built 9 tools last week. Let's redefine them here so this notebook is self-contained:

In [14]:
# ═══════════════════════════════════════════════════════════
# EXPLORATION TOOLS
# ═══════════════════════════════════════════════════════════

def load_csv(filepath: str):
    """📂 Load CSV from a given filepath into global dataframe."""
    global df
    df = pd.read_csv(filepath)
    df.columns = (df.columns
      .str.replace(' ', '_', regex=False)
      .str.replace('.', '', regex=False)
      .str.lower())
    return f"✅ Loaded {filepath}: {df.shape[0]} rows, {df.shape[1]} columns"

def show_info():
    """ℹ️ Show dataframe structure: columns, types, missing values."""
    import io
    buffer = io.StringIO()
    df.info(buf=buffer)
    return buffer.getvalue()

def show_data(n: int = 5, sort_by: str = None, ascending: bool = True):
    """👀 Show first n rows of dataframe, optionally sorted."""
    display_df = df
    if sort_by:
        display_df = df.sort_values(sort_by, ascending=ascending)
    return display_df.head(n).to_string()

# ═══════════════════════════════════════════════════════════
# ANALYSIS TOOLS
# ═══════════════════════════════════════════════════════════

def filter_rows(condition: str):
    """🔍 Filter dataframe rows using pandas query syntax. The condition string is the query expression for Pandas df.query"""
    result = df.query(condition)
    return f"Found {len(result)} Pokemon:\n{result.to_string()}"

def calculate_statistics(column: str, stat_type: str):
    """📊 Calculate statistics on a column."""
    stat_functions = {
        'mean': df[column].mean,
        'median': df[column].median,
        'max': df[column].max,
        'min': df[column].min,
        'sum': df[column].sum,
        'count': df[column].count,
        'std': df[column].std
    }
    result = stat_functions[stat_type]()
    return f"{stat_type.capitalize()} of {column}: {result:.2f}"

def aggregate_by(group_by: str, agg_col: str, agg_func: str):
    """📈 Group data and calculate aggregate statistics."""
    result = df.groupby(group_by)[agg_col].agg(agg_func)
    return f"{agg_func.capitalize()} of {agg_col} by {group_by}:\n{result.to_string()}"

# ═══════════════════════════════════════════════════════════
# VISUALIZATION TOOLS
# ═══════════════════════════════════════════════════════════

def create_bar_chart(category_col: str, value_col: str = None, 
                     aggregation: str = 'count', title: str = None):
    """📊 Create a bar chart comparing categories."""
    plt.figure(figsize=(10, 6))
    if aggregation == 'count':
        data = df[category_col].value_counts().sort_index()
        ylabel = 'Count'
    else:
        agg_funcs = {'mean': 'mean', 'sum': 'sum', 'max': 'max', 'min': 'min', 'median': 'median'}
        data = df.groupby(category_col)[value_col].agg(agg_funcs[aggregation])
        ylabel = f'{aggregation.capitalize()} of {value_col}'
    
    data.plot(kind='bar')
    plt.title(title or f'{ylabel} by {category_col}')
    plt.xlabel(category_col)
    plt.ylabel(ylabel)
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
    return f"✅ Bar chart created: {ylabel} by {category_col}"

def create_scatter_plot(x_col: str, y_col: str, color_by: str = None):
    """📈 Create a scatter plot to show relationships."""
    plt.figure(figsize=(10, 6))
    if color_by and color_by in df.columns:
        for category in df[color_by].unique():
            mask = df[color_by] == category
            plt.scatter(df[mask][x_col], df[mask][y_col], label=category, alpha=0.6)
        plt.legend()
    else:
        plt.scatter(df[x_col], df[y_col], alpha=0.6)
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.title(f'{y_col} vs {x_col}')
    plt.tight_layout()
    plt.show()
    return f"✅ Scatter plot created: {y_col} vs {x_col}"

def create_histogram(column: str, bins: int = 20):
    """📊 Create a histogram showing distribution."""
    plt.figure(figsize=(10, 6))
    plt.hist(df[column].dropna(), bins=bins, edgecolor='black', alpha=0.7)
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.title(f'Distribution of {column}')
    plt.tight_layout()
    plt.show()
    return f"✅ Histogram created for {column}"

# ═══════════════════════════════════════════════════════════
# ALL TOOLS
# ═══════════════════════════════════════════════════════════

all_tools = [
    load_csv, show_info, show_data,
    filter_rows, calculate_statistics, aggregate_by,
    create_bar_chart, create_scatter_plot, create_histogram
]

print(f"✅ Loaded {len(all_tools)} tools from Workshop 4")

✅ Loaded 9 tools from Workshop 4


### 🤖 The Single-Step Agent (from Workshop 4)

Our current agent can only pick **ONE** tool at a time:

In [15]:
def generate_tool_descriptions(tools: List[Callable]) -> str:
    """Generate tool descriptions from function introspection."""
    descriptions = []
    for func in tools:
        sig = inspect.signature(func)
        params = []
        for name, param in sig.parameters.items():
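            # Fall back to str() for typing constructs (e.g. Optional[str]) that lack __name__ on some Python versions
            param_type = getattr(param.annotation, "__name__", str(param.annotation)) if param.annotation != inspect.Parameter.empty else "any"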
            default = f" (default: {param.default})" if param.default != inspect.Parameter.empty else ""
            params.append(f"  - {name}: {param_type}{default}")

        doc_lines = (func.__doc__ or "No description").strip().split('\n')
        short_desc = doc_lines[0]

        desc = f"""{func.__name__}: {short_desc}
Parameters:
{chr(10).join(params) if params else '  None'}"""
        descriptions.append(desc)

    return "\n\n".join(descriptions)

# Create tools dictionary
tools_dict_v1 = {func.__name__: func for func in all_tools}

def single_step_agent(query: str):
    """🤖 Agent that picks ONE tool."""
    prompt = f"""You are a data analysis assistant.

Dataset: Pokemon with {len(df)} rows
Columns: {df.columns.tolist()}

Available tools:
{generate_tool_descriptions(all_tools)}

User query: "{query}"

Choose the ONE tool that best answers this query.
Respond with ONLY the tool call, no explanation.

Format: tool_name(param="value")

Your response:"""

    response = generate(prompt).strip()
    print(f"🤖 Agent chose: {response}\n")

    try:
        result = eval(response, {"__builtins__": {}}, tools_dict_v1)
        return result
    except Exception as e:
        return f"Error: {e}"

# Test it
print("🧪 Testing single-step agent:")
print(single_step_agent("What's the average attack of all Pokemon?"))

🧪 Testing single-step agent:
🤖 Agent chose: calculate_statistics(column="attack", stat_type="mean")

Mean of attack: 79.00
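
The `eval` call in `single_step_agent` is what turns the model's text into a real function call. Here is a minimal sketch of that dispatch on its own. The `add` tool, the `dispatch` helper, and the example strings are assumptions made up for illustration; they are not part of the workshop's tool set.

```python
# Hypothetical stand-in tool; not one of the workshop's nine tools.
def add(a: int, b: int) -> int:
    return a + b

tools = {"add": add}

def dispatch(call: str):
    """Evaluate a tool-call string with builtins removed and only `tools` visible."""
    try:
        return eval(call, {"__builtins__": {}}, tools)
    except Exception as e:
        return f"Error: {e}"

ok = dispatch("add(a=2, b=3)")                   # a well-formed tool call
blocked = dispatch("__import__('os').getcwd()")  # builtins are gone, so this fails
print(ok)
print(blocked)
```

Removing builtins blocks the most obvious escapes, but it should not be mistaken for a real sandbox: attribute tricks on ordinary objects can still reach dangerous functionality, which is why safe execution has its own item on today's agenda.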


### ⚠️ The Problem

**What if we need multiple steps?**

Query: *"Which Pokemon type has the highest average attack?"*

- **Step 1:** Group by type and calculate average
- **Step 2:** Find the maximum

> ❌ Single-step agent can only do step 1!
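
To make the two steps concrete, here is a rough sketch in plain Python. The rows are made up so the snippet runs without the dataset; with the real data you would do the same thing on `df`.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; the real data lives in `df`.
rows = [
    {"type_1": "Fire", "attack": 52},
    {"type_1": "Fire", "attack": 84},
    {"type_1": "Grass", "attack": 49},
    {"type_1": "Grass", "attack": 62},
    {"type_1": "Dragon", "attack": 134},
]

# Step 1: group by type and average the attack values.
by_type = defaultdict(list)
for row in rows:
    by_type[row["type_1"]].append(row["attack"])
avg_attack = {t: mean(values) for t, values in by_type.items()}

# Step 2: pick the type with the highest average. This step needs the
# output of step 1, which a single tool call cannot provide.
best_type = max(avg_attack, key=avg_attack.get)
print(best_type, avg_attack[best_type])
```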

---

## 🧠 Part 1: Building a ReAct Agent

Let's add multi-step reasoning using the **ReAct pattern**:

```
Loop:
  1. 💭 Thought: "What should I do next?"
  2. ⚙️  Action: Call a tool
  3. 📊 Observation: See the result
  4. 🔄 Repeat until done
```
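
The loop above can be sketched without any LLM at all. In this toy version a scripted function plays the role of the model, which makes the control flow easy to follow. The names `scripted_policy`, `react_loop`, and `count_items` are hypothetical and exist only for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool for the sketch.
def count_items(items: str) -> int:
    return len(items.split(","))

TOOLS: Dict[str, Callable] = {"count_items": count_items}

def scripted_policy(history: List[Tuple[str, str, str]]) -> Tuple[str, str, dict]:
    """Stand-in for the LLM: returns (thought, tool_name, kwargs); an empty tool_name means done."""
    if not history:
        return ("I should count the items first.", "count_items", {"items": "a,b,c"})
    return (f"The observation says {history[-1][2]}, so I can answer.", "", {})

def react_loop(policy, max_steps: int = 5) -> List[Tuple[str, str, str]]:
    history: List[Tuple[str, str, str]] = []
    for _ in range(max_steps):
        thought, tool_name, kwargs = policy(history)   # Thought
        if not tool_name:                              # no action means we are finished
            history.append((thought, "", ""))
            break
        observation = str(TOOLS[tool_name](**kwargs))  # Action, then Observation
        history.append((thought, tool_name, observation))
    return history

trace = react_loop(scripted_policy)
for thought, action, observation in trace:
    print(thought, "|", action, "|", observation)
```

The real agent below follows the same shape; the main differences are that the policy is an LLM call and the history is fed back into the prompt on every step.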

### Step 1: Define Types

We'll use **Pydantic models** for structured data throughout:

In [16]:
# ═══════════════════════════════════════════════════════════
# PYDANTIC MODELS FOR AGENT HISTORY
# ═══════════════════════════════════════════════════════════

class ThoughtStep(BaseModel):
    """💭 What the agent is thinking"""
    reasoning: str

class ActionStep(BaseModel):
    """⚙️ What tool to call"""
    tool_call: str

class ObservationStep(BaseModel):
    """📊 What the tool returned"""
    result: str
    success: bool

class HistoryEntry(BaseModel):
    """🔄 One complete step in the ReAct loop"""
    step_number: int
    thought: ThoughtStep
    action: ActionStep
    observation: ObservationStep

# ═══════════════════════════════════════════════════════════
# FOR AGENT REASONING (STRUCTURED LLM OUTPUT)
# ═══════════════════════════════════════════════════════════

class ReActResponse(BaseModel):
    """📋 Structured response from ReAct agent"""
    thought: str
    action: str  # Tool call OR empty string
    is_final_answer: bool  # True when ready to answer
    needs_user_input: bool  # True when asking user for clarification

### Step 2: Why Structured Outputs?

#### ❌ Old Way (String Parsing)
```python
response = "THOUGHT: I need to filter\nACTION: filter_rows(...)"
# Parse with split(), regex, etc. - fragile!
if "ANSWER" in response:  # String checking - error-prone!
```

#### ✅ New Way (Structured Outputs with Pydantic)
```python
response = generate(prompt, response_format=ReActResponse)
response.thought  # ✅ Guaranteed to exist
response.action   # ✅ Type-safe access
if response.is_final_answer:  # ✅ Boolean check - clean!
```

#### 🎯 Benefits

| Feature | String Parsing | Structured Outputs |
|---------|----------------|--------------------|
| Markdown issues | ❌ Fragile | ✅ No issues |
| Invalid formats | ❌ Possible | ✅ Validated |
| Type safety | ❌ Manual | ✅ Automatic |
| Control flow | ❌ String checks | ✅ Boolean flags |
| Code cleanliness | ❌ Complex parsing | ✅ Simple access |
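
The "Control flow" row is worth seeing in action. The sketch below uses only the standard library as a stand-in for the Pydantic model (`ParsedStep` and `parse_step` are hypothetical names), so treat it as an illustration of the idea rather than of Pydantic's actual validation.

```python
import json
from dataclasses import dataclass

@dataclass
class ParsedStep:
    """Stdlib stand-in for ReActResponse (the workshop itself uses a Pydantic model)."""
    thought: str
    action: str
    is_final_answer: bool

def parse_step(raw: str) -> ParsedStep:
    data = json.loads(raw)
    step = ParsedStep(**data)  # missing or unexpected keys raise TypeError
    if not isinstance(step.is_final_answer, bool):
        raise TypeError("is_final_answer must be a boolean")
    return step

# A thought that merely mentions the word ANSWER fools the substring check...
text_reply = "THOUGHT: The ANSWER needs a filter first\nACTION: filter_rows(...)"
naive_done = "ANSWER" in text_reply

# ...while the structured reply carries an explicit flag.
json_reply = '{"thought": "The ANSWER needs a filter first", "action": "filter_rows(...)", "is_final_answer": false}'
step = parse_step(json_reply)
print(naive_done, step.is_final_answer)
```

The substring check reports "done" while the structured flag correctly says the agent still has work to do.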

---

Now let's build the ReAct agent:

In [17]:
def build_react_prompt(query: str, tools: List[Callable], history: List[HistoryEntry]) -> str:
    """🎨 Build prompt for ReAct agent with history."""

    # Format history for prompt
    history_text = ""
    if history:
        history_text = "\n\nPrevious steps:\n"
        for entry in history:
            history_text += f"""
Step {entry.step_number}:
Thought: {entry.thought.reasoning}
Action: {entry.action.tool_call}
Observation: {entry.observation.result}
"""

    # Check if data is loaded
    dataset_status = f"Current dataset: Loaded ({len(df)} rows, {len(df.columns)} columns)" if len(df) > 0 else "No dataset loaded"

    prompt = f"""You are a data analysis agent using the ReAct pattern.

Query: {query}

{dataset_status}

Available tools:
{generate_tool_descriptions(tools)}
{history_text}

CORE PRINCIPLE: Only use information you explicitly have.

Tool outputs provide:
- message: natural language summary
- artifact (optional): handle metadata with an `id` you can reuse (e.g., df_0)

When you need a prior result, pass the artifact id via the tool's `df_id` (or similar) parameter.

If you're missing information, ask the user.
If you can answer from existing observations, deliver the final answer instead of calling another tool.

Your response JSON:
- thought: reasoning about the next step
- action: tool call with exact parameter values, or empty string if not calling a tool
- is_final_answer: true if you can answer the user's query, otherwise false
- needs_user_input: true if you need to ask the user for missing information, otherwise false
"""

    return prompt


def synthesize_final_answer(query: str, history: List[HistoryEntry], final_thought: str) -> str:
    """🎯 Synthesize a user-facing answer from the agent's observations."""

    observations_text = ""
    for entry in history:
        observations_text += f"\nStep {entry.step_number}: {entry.action.tool_call}\nResult: {entry.observation.result}\n"

    prompt = f"""You are synthesizing a final answer for a user's data analysis query.

Original query: "{query}"

Agent's final reasoning: {final_thought}

Observations from tool executions:
{observations_text if observations_text else "No tool executions"}

Instructions:
- Provide a clear, direct answer to the user's query
- Base your answer on the observations
- Use natural language, not technical jargon
- Be concise but complete

Final answer:"""

    return generate(prompt, temperature=0.3)


def react_agent(query: str, max_steps: int = 10, verbose: bool = True, tools: Optional[List[Callable]] = None) -> str:
    """🤖 Multi-step ReAct agent using structured outputs."""
    active_tools = tools or all_tools
    tools_dict = {func.__name__: func for func in active_tools}
    history: List[HistoryEntry] = []

    for step in range(1, max_steps + 1):
        if verbose:
            print(f"\n{'═' * 70}")
            print(f"🔄 STEP {step}")
            print('═' * 70)

        prompt = build_react_prompt(query, active_tools, history)
        response = generate(prompt, response_format=ReActResponse)

        if verbose:
            print(f"\n💭 THOUGHT: {response.thought}\n")

        if response.needs_user_input:
            if verbose:
                print("\n❓ AGENT NEEDS USER INPUT")
            return response.thought

        if response.is_final_answer:
            if verbose:
                print("\n✅ AGENT FINISHED! Synthesizing answer from observations...")
            return synthesize_final_answer(query, history, response.thought)

        if verbose:
            print(f"⚙️  ACTION: {response.action}")
            print(f"\n⚙️  EXECUTING: {response.action}")

        try:
            safe_globals = {"__builtins__": {}}
            safe_globals.update(tools_dict)
            evaluation = eval(response.action, safe_globals, {})
            if callable(evaluation):
                evaluation = evaluation()
            observation = ObservationStep(result=str(evaluation), success=True)

            if verbose:
                preview = str(evaluation)
                print(f"\n📊 TOOL RESULT:\n{preview[:300]}..." if len(preview) > 300 else f"\n📊 TOOL RESULT:\n{preview}")
        except Exception as e:
            observation = ObservationStep(result=f"Error: {e}", success=False)
            if verbose:
                print(f"\n❌ TOOL ERROR: {e}")

        history.append(HistoryEntry(
            step_number=step,
            thought=ThoughtStep(reasoning=response.thought),
            action=ActionStep(tool_call=response.action),
            observation=observation,
        ))

    final_thought = "I reached the maximum number of steps. Based on what I've learned:"
    return synthesize_final_answer(query, history, final_thought)

### 🧪 Test the ReAct Agent

In [18]:
query = "Which Pokemon type has the highest average attack?"

print(f"🎯 QUERY: {query}\n")
answer = react_agent(query, max_steps=15)

print(f"\n\n{'═'*70}")
print(f"🎉 FINAL ANSWER: {answer}")
print('═'*70)

🎯 QUERY: Which Pokemon type has the highest average attack?


══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: To find out which Pokemon type has the highest average attack, I need to group the dataset by the Pokemon type and calculate the average attack for each type. Then, I can identify the type with the highest average attack.

⚙️  ACTION: aggregate_by(group_by='type', agg_col='attack', agg_func='mean')

⚙️  EXECUTING: aggregate_by(group_by='type', agg_col='attack', agg_func='mean')

❌ TOOL ERROR: 'type'

══════════════════════════════════════════════════════════════════════
🔄 STEP 2
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to check the structure of the dataset to understand the column names and types, especially to find out how the Pokemon types and attack values are represented.

⚙️  ACTION: show_info()

⚙️  EXECUTING: 

### 🎉 Success!

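The ReAct agent can take multiple steps, and it can recover from mistakes along the way. In the run above, the first call fails because the column is named `type_1`, not `type`; the agent reads the error and inspects the dataset with `show_info()` so it can retry. Once it uses the right column, the flow is:

1. **Aggregate:** Groups by type and calculates average attack
2. **Finish:** Analyzes the results and signals it's done
3. **Synthesis:** System automatically generates user-facing answer from observations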

> **Key insight:** The agent focuses on *gathering information* (tool execution), then we *synthesize* a clean answer.


---

## ⚠️ Part 2: The Limitation

But wait... let's try a query that needs to **chain operations**:

In [19]:
query = "What is the average attack for Fire-type Pokemon?"

print(f"🎯 QUERY: {query}\n")
answer = react_agent(query, max_steps=10)

print(f"\n\n{'═'*70}")
print(f"🎉 FINAL ANSWER: {answer}")
print('═'*70)

🎯 QUERY: What is the average attack for Fire-type Pokemon?


══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to filter the dataset to include only Fire-type Pokemon and then calculate the average attack for that subset.

⚙️  ACTION: filter_rows('type == "Fire"')

⚙️  EXECUTING: filter_rows('type == "Fire"')

❌ TOOL ERROR: name 'type' is not defined

══════════════════════════════════════════════════════════════════════
🔄 STEP 2
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to check the structure of the dataset to find the correct column name for the Pokemon type before filtering for Fire-type Pokemon.

⚙️  ACTION: show_info()

⚙️  EXECUTING: show_info()

📊 TOOL RESULT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------

### 🔍 Let's Check the Correct Answer

In [21]:
# ✅ Correct answer
correct_answer = df[df['type_1'] == 'Fire'].attack.mean()
print(f"✅ Correct answer: {correct_answer:.2f}")

# ❌ What the agent gave us
print(f"❌ Agent's answer: 79.00")

✅ Correct answer: 84.77
❌ Agent's answer: 79.00


### 🐛 What Went Wrong?

The agent tries:

1. **Step 1:** `show_info()` → Understands column names
2. **Step 2:** `filter_rows("type_1 == 'Fire'")` → Gets **string** "Found 52 Pokemon"
3. **Step 3:** `calculate_statistics('attack', 'mean')` → But this uses the **global df**, not filtered data!

> **Problem:** `calculate_statistics` doesn't have access to the filtered data from Step 2!

### 🤔 Why Our ReAct Agent Stalled

Our current tools return plain **strings**, so a tool call loses the actual DataFrame:

```python
def filter_rows(condition: str):
    result = df.query(condition)
    return f"Found {len(result)} Pokemon:\n{result.to_string()}"  # ❌ String only!
```

After `filter_rows` returns `"Found 52 Pokemon..."` in Step 2, the agent has no way to pass that filtered DataFrame to `calculate_statistics` in Step 3.

> **Solution needed:** Give the agent a way to pass around objects (not just strings) so later steps can reuse results.

---

## 💾 Part 3: A Simple Solution

**What if we tracked objects, assigned IDs to them, and let the agent pass handles (identifiers) to tool calls?**

Let's give it a go!

### 💡 The Idea

Each tool will:

1. **Return a friendly `message`** (string) for the LLM to read
2. **Optionally attach an `artifact`** with metadata and a handle/id (e.g., `df_0`)
3. **Accept `df_id`** (or similar) when it needs to reuse a previous result
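
Before the full implementation, the three points above can be sketched in a few lines. This toy version works on plain lists instead of DataFrames, and every name in it (`store`, `save`, `filter_type`, `mean_attack`, `rows_id`) is a hypothetical stand-in chosen for the sketch.

```python
from typing import Any, Dict, Optional

store: Dict[str, Any] = {}  # hypothetical in-memory store for this sketch

def save(obj: Any) -> str:
    """Store an object and return a handle such as rows_0."""
    handle = f"rows_{len(store)}"
    store[handle] = obj
    return handle

# Made-up data standing in for the global dataframe.
DATA = [
    {"type_1": "Fire", "attack": 52},
    {"type_1": "Fire", "attack": 84},
    {"type_1": "Grass", "attack": 49},
]

def filter_type(type_name: str, rows_id: Optional[str] = None) -> Dict[str, Any]:
    rows = store[rows_id] if rows_id else DATA
    kept = [r for r in rows if r["type_1"] == type_name]
    # 1. message for the LLM, 2. artifact handle for later steps
    return {"message": f"Filtered to {len(kept)} rows", "artifact": {"id": save(kept)}}

def mean_attack(rows_id: Optional[str] = None) -> Dict[str, Any]:
    # 3. accept a handle so the tool can reuse a previous result
    rows = store[rows_id] if rows_id else DATA
    value = sum(r["attack"] for r in rows) / len(rows)
    return {"message": f"Mean of attack: {value:.2f}", "artifact": None}

step1 = filter_type("Fire")
step2 = mean_attack(rows_id=step1["artifact"]["id"])
print(step1["message"], "->", step2["message"])
```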

---

### 🏗️ Implementation

In [24]:
from typing import Optional, Dict, Any

# ═══════════════════════════════════════════════════════════
# ARTIFACT STORE
# ═══════════════════════════════════════════════════════════

artifact_store: Dict[str, Any] = {}
artifact_counter = 0

def _save_artifact(obj: Any, kind: str = "df", metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Save an artifact and return its metadata."""
    global artifact_counter
    handle = f"{kind}_{artifact_counter}"
    artifact_counter += 1
    artifact_store[handle] = obj
    meta = metadata.copy() if metadata else {}
    meta["type"] = kind
    meta["id"] = handle
    return meta

def _get_artifact(handle: str) -> Any:
    """Retrieve an artifact by handle."""
    if handle not in artifact_store:
        raise KeyError(f"Unknown artifact handle: {handle}")
    return artifact_store[handle]

def _resolve_df(df_id: Optional[str]):
    """Resolve a dataframe ID or use global df."""
    return _get_artifact(df_id) if df_id else df

def _artifact_response(message: str, artifact_meta: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Create a tool response with optional artifact."""
    return {"message": message, "artifact": artifact_meta}

def reset_artifacts():
    """Clear stored artifacts (helpful between demos)."""
    global artifact_store, artifact_counter
    artifact_store = {}
    artifact_counter = 0
    return "Cleared artifact registry"

### Create v2 Tools with Artifact Support

Now let's create tools that work with artifacts:

In [25]:
# ═══════════════════════════════════════════════════════════
# TOOLS V2: WITH ARTIFACT SUPPORT
# ═══════════════════════════════════════════════════════════

def load_csv_v2(filepath: str) -> Dict[str, Any]:
    """Load CSV and return a dataframe handle + metadata."""
    global df
    df = pd.read_csv(filepath)
    df.columns = (df.columns
        .str.replace(' ', '_', regex=False)
        .str.replace('.', '', regex=False)
        .str.lower())
    meta = _save_artifact(df, metadata={
        "rows": int(df.shape[0]),
        "columns": df.columns.tolist(),
    })
    message = f"Loaded {filepath}: {df.shape[0]} rows, {df.shape[1]} columns"
    return _artifact_response(message, meta)

def show_info_v2(df_id: Optional[str] = None) -> Dict[str, Any]:
    """ℹ️ Return column info; accepts df_id from a previous tool."""
    target = _resolve_df(df_id)
    import io
    buffer = io.StringIO()
    target.info(buf=buffer)
    meta = None
    if df_id:
        meta = {
            "type": "df",  # same kind label that _save_artifact assigns to dataframes
            "id": df_id,
            "rows": int(target.shape[0]),
            "columns": target.columns.tolist(),
        }
    return _artifact_response(buffer.getvalue(), meta)

def show_data_v2(n: int = 5, sort_by: Optional[str] = None, ascending: bool = True, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Preview rows and hand back a new dataframe handle."""
    target = _resolve_df(df_id)
    working = target.sort_values(sort_by, ascending=ascending) if sort_by else target
    preview = working.head(n)
    meta = _save_artifact(preview, metadata={
        "rows": int(preview.shape[0]),
        "columns": preview.columns.tolist(),
        "source_df": df_id or "df_global",
    })
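    message = f"Showing {len(preview)} rows" + (f" (sorted by {sort_by})" if sort_by else "")
    # Include the rows themselves so the LLM can actually read the preview
    message += f":\n{preview.to_string()}"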
    return _artifact_response(message, meta)

def filter_rows_v2(condition: str, df_id: Optional[str] = None) -> Dict[str, Any]:
    """🔍 Filter rows using pandas query syntax. You may pass df_id to chain."""
    target = _resolve_df(df_id)
    result = target.query(condition)
    meta = _save_artifact(result, metadata={
        "rows": int(result.shape[0]),
        "columns": result.columns.tolist(),
        "source_df": df_id or "df_global",
        "condition": condition,
    })
    return _artifact_response(f"Filtered to {len(result)} rows", meta)

def calculate_statistics_v2(column: str, stat_type: str, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Compute a statistic and store the numeric result as an artifact."""
    target = _resolve_df(df_id)
    stat_functions = {
        'mean': target[column].mean,
        'median': target[column].median,
        'max': target[column].max,
        'min': target[column].min,
        'sum': target[column].sum,
        'count': target[column].count,
        'std': target[column].std,
    }
    value = float(stat_functions[stat_type]())
    meta = _save_artifact({
        "column": column,
        "stat_type": stat_type,
        "value": value,
        "source_df": df_id or "df_global",
    }, kind="statistic")
    return _artifact_response(f"{stat_type.capitalize()} of {column}: {value:.2f}", meta)

def aggregate_by_v2(group_by: str, agg_col: str, agg_func: str, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Group and aggregate, returning a new dataframe handle."""
    target = _resolve_df(df_id)
    result = target.groupby(group_by)[agg_col].agg(agg_func).reset_index()
    meta = _save_artifact(result, metadata={
        "rows": int(result.shape[0]),
        "columns": result.columns.tolist(),
        "source_df": df_id or "df_global",
        "group_by": group_by,
        "agg_col": agg_col,
        "agg_func": agg_func,
    })
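    # Include the aggregated values so the LLM can read them, not just the handle
    message = f"{agg_func.capitalize()} of {agg_col} by {group_by}:\n{result.to_string(index=False)}"
    return _artifact_response(message, meta)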

def create_bar_chart_v2(category_col: str, value_col: Optional[str] = None, aggregation: str = 'count', title: Optional[str] = None, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a bar chart; return a figure handle."""
    target = _resolve_df(df_id)
    plt.figure(figsize=(10, 6))
    if aggregation == 'count':
        data = target[category_col].value_counts().sort_index()
        ylabel = 'Count'
    else:
        agg_funcs = {'mean': 'mean', 'sum': 'sum', 'max': 'max', 'min': 'min', 'median': 'median'}
        data = target.groupby(category_col)[value_col].agg(agg_funcs[aggregation])
        ylabel = f'{aggregation.capitalize()} of {value_col}'
    ax = data.plot(kind='bar')
    plt.title(title or f'{ylabel} by {category_col}')
    plt.xlabel(category_col)
    plt.ylabel(ylabel)
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    fig = ax.get_figure()
    plt.show()
    meta = _save_artifact(fig, kind="figure", metadata={
        "title": title or f'{ylabel} by {category_col}',
        "summary": f'{aggregation} of {value_col or "counts"} by {category_col}',
    })
    return _artifact_response("Bar chart created", meta)

def create_scatter_plot_v2(x_col: str, y_col: str, color_by: Optional[str] = None, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a scatter plot; optional color_by groups."""
    target = _resolve_df(df_id)
    plt.figure(figsize=(10, 6))
    if color_by and color_by in target.columns:
        for category in target[color_by].unique():
            mask = target[color_by] == category
            plt.scatter(target[mask][x_col], target[mask][y_col], label=category, alpha=0.6)
        plt.legend()
    else:
        plt.scatter(target[x_col], target[y_col], alpha=0.6)
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.title(f'{y_col} vs {x_col}')
    plt.tight_layout()
    fig = plt.gcf()
    plt.show()
    meta = _save_artifact(fig, kind="figure", metadata={
        "x": x_col,
        "y": y_col,
        "color_by": color_by,
    })
    return _artifact_response(f"Scatter plot created: {y_col} vs {x_col}", meta)

def create_histogram_v2(column: str, bins: int = 20, df_id: Optional[str] = None) -> Dict[str, Any]:
    """Plot a histogram and return the figure handle."""
    target = _resolve_df(df_id)
    plt.figure(figsize=(10, 6))
    plt.hist(target[column].dropna(), bins=bins, edgecolor='black', alpha=0.7)
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.title(f'Distribution of {column}')
    plt.tight_layout()
    fig = plt.gcf()
    plt.show()
    meta = _save_artifact(fig, kind="figure", metadata={
        "column": column,
        "bins": bins,
    })
    return _artifact_response(f"Histogram created for {column}", meta)

# ═══════════════════════════════════════════════════════════
# ALL TOOLS V2
# ═══════════════════════════════════════════════════════════

all_tools_v2 = [
    load_csv_v2, show_info_v2, show_data_v2,
    filter_rows_v2, calculate_statistics_v2, aggregate_by_v2,
    create_bar_chart_v2, create_scatter_plot_v2, create_histogram_v2,
]

tools_dict_v2 = {func.__name__: func for func in all_tools_v2}

### 🧪 Quick Sanity Check

In [27]:
# Test: load data, filter, compute a stat with handles
reset_artifacts()

load_result = load_csv_v2("data/pokemon.csv")
print(f"📂 {load_result['message']}")

full_df_id = load_result["artifact"]["id"]  # handle for the full dataset (df_0)
fire_result = filter_rows_v2("type_1 == 'Fire'", df_id=full_df_id)
print(f"🔍 {fire_result['message']}")

stat_result = calculate_statistics_v2("attack", "mean", df_id=fire_result["artifact"]["id"])
print(f"📊 {stat_result['message']}")

📂 Loaded data/pokemon.csv: 800 rows, 13 columns
🔍 Filtered to 52 rows
📊 Mean of attack: 84.77


### 🧪 Re-run the ReAct Agent with Tool v2

Same question as before, but now the agent can pass `artifact['id']` between steps:

In [28]:
reset_artifacts()

query = "What is the average attack for Fire-type Pokemon?"
print(f"🎯 QUERY: {query}\n")
answer = react_agent(query, max_steps=10, tools=all_tools_v2)

print("\n" + "═"*70)
print(f"🎉 FINAL ANSWER: {answer}")
print("═"*70)

🎯 QUERY: What is the average attack for Fire-type Pokemon?


══════════════════════════════════════════════════════════════════════
🔄 STEP 1
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to filter the dataset to include only Fire-type Pokemon and then calculate the average attack for that subset.

⚙️  ACTION: filter_rows_v2(condition='type == "Fire"')

⚙️  EXECUTING: filter_rows_v2(condition='type == "Fire"')

❌ TOOL ERROR: name 'type' is not defined

══════════════════════════════════════════════════════════════════════
🔄 STEP 2
══════════════════════════════════════════════════════════════════════

💭 THOUGHT: I need to check the column names in the dataset to find the correct name for the type of Pokemon. This will help me filter the dataset correctly for Fire-type Pokemon.

⚙️  ACTION: show_info_v2(df_id=None)

⚙️  EXECUTING: show_info_v2(df_id=None)

📊 TOOL RESULT:
{'message': "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 800 entrie

### 🎉 Success!

Now the agent chains naturally:

1. `filter_rows_v2` → narrows to Fire types (`df_0`)
2. `calculate_statistics_v2` → uses `df_0` to get the correct answer

### 📊 What This Fix Solves

| Feature | Before | After |
|---------|--------|-------|
| Pass data between steps | ❌ No | ✅ Yes |
| Reuse intermediate results | ❌ Lost | ✅ Preserved |
| Chain operations | ❌ Broken | ✅ Works |

### ⚠️ Drawbacks

- **No limits:** Store never expires entries (memory leak risk)
- **Global scope:** Concurrent users would overwrite each other
- **Learning curve:** Model must learn to use handles from prompt
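
The first two drawbacks can be softened without changing the handle idea. The following is a minimal sketch, not the workshop's store: `BoundedArtifactStore`, `store_for`, and the `max_items` limit are hypothetical names introduced here for illustration. It caps the number of stored artifacts (evicting the least recently used one) and keeps a separate store per session so users cannot overwrite each other.

```python
from collections import OrderedDict
from typing import Any, Dict


class BoundedArtifactStore:
    """Hypothetical per-session store that evicts the least recently used artifact."""

    def __init__(self, max_items: int = 3):
        self.max_items = max_items
        self._items = OrderedDict()  # handle -> stored value, oldest first
        self._counter = 0

    def put(self, value: Any, prefix: str = "df") -> str:
        """Store a value and return its handle, evicting old entries if full."""
        handle = f"{prefix}_{self._counter}"
        self._counter += 1
        self._items[handle] = value
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used entry
        return handle

    def get(self, handle: str) -> Any:
        """Fetch a value and mark it as recently used."""
        value = self._items[handle]  # raises KeyError if evicted or unknown
        self._items.move_to_end(handle)
        return value


# One store per session instead of a single module-level global
stores: Dict[str, BoundedArtifactStore] = {}


def store_for(session_id: str) -> BoundedArtifactStore:
    """Return the store for a session, creating it on first use."""
    return stores.setdefault(session_id, BoundedArtifactStore())
```

An evicted handle raises `KeyError`, which a tool could turn into an error message so the agent knows to recompute the intermediate result.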

---

## 💻 Part 4: Building a CodeAgent

### 🤔 How Would We Solve This by Hand?

**Query:** `What is the average attack for Fire-type Pokemon?`

Before asking the agent to chain even more tool calls, let's take a step back and consider how we'd answer it ourselves:

```python
fire_pokemon = df[df['type_1'] == 'Fire']
avg_attack = fire_pokemon.attack.mean()
```

Very simple pandas code! Much simpler than tracking handles.

**💡 What if, instead of juggling handles and tracking artifacts, we let the agent write and execute this kind of code itself?**

#### Approaches

| Approach | How it works | Composition | Security |
|----------|--------------|-------------|----------|
| **Tool calls (strings)** | One tool per step, outputs are text | ❌ No | ✅ Very high |
| **Tool calls + handles** | Tools return handles (`df_id`) | ✅ Yes | ✅ High |
| **Code calling tools** | LLM writes Python with approved helpers | ✅ Yes | ⚠️ Medium |
| **Arbitrary code** | LLM can run any Python | ✅ Maximum | ⚠️ Low |

> We'll try building **Code calling tools** - a balance between flexibility and security.

---

### Step 1: Modify Tools to Return Data

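For generated code to chain tools, each tool must return an object (a DataFrame, Series, or number) instead of a string, and print a short message so the LLM still gets feedback: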

In [30]:
# ═══════════════════════════════════════════════════════════
# TOOLS V3: RETURN DATA + PRINT FEEDBACK
# ═══════════════════════════════════════════════════════════

def filter_rows_v3(condition: str):
    """Filter dataframe rows using pandas query syntax."""
    result = df.query(condition)
    print(f"Filtered to {len(result)} rows")  # 👈 LLM sees this when the generated code is executed
    return result  # 👈 Code gets DataFrame!

def aggregate_by_v3(data, group_by: str, agg_col: str, agg_func: str):
    """Group data and calculate aggregate statistics. Returns a Series."""
    result = data.groupby(group_by)[agg_col].agg(agg_func)
    print(f"Aggregated: {agg_func}({agg_col}) by {group_by}")
    return result

def calculate_statistics_v3(data, column: str, stat_type: str):
    """Calculate statistics on a column."""
    stat_functions = {
        'mean': data[column].mean,
        'median': data[column].median,
        'max': data[column].max,
        'min': data[column].min,
    }
    result = stat_functions[stat_type]()
    print(f"{stat_type.capitalize()} of {column}: {result:.2f}")
    return result

def show_info_v3():
    """Show dataframe structure: columns, types, missing values."""
    df.info()  # 👈 info() prints directly and returns None, so don't wrap it in print()
    return df

def show_data_v3(data, n: int = 5):
    """Show first n rows of dataframe."""
    result = data.head(n)
    print(f"Showing {len(result)} rows")
    print(result)  # 👈 Print the rows so the LLM actually sees them
    return result

# ═══════════════════════════════════════════════════════════
# TOOLS DICTIONARY FOR CODE AGENT
# ═══════════════════════════════════════════════════════════

code_agent_tools = {
    'show_info_v3': show_info_v3,
    'show_data_v3': show_data_v3,
    'filter_rows_v3': filter_rows_v3,
    'aggregate_by_v3': aggregate_by_v3,
    'calculate_statistics_v3': calculate_statistics_v3
}

### Step 2: Build a Safe Executor

We'll build an execution environment where we can safely execute LLM-generated Python code:

#### 🔒 Security Features

- ✅ No imports allowed
- ✅ Limited builtins (no `open`, `exec`, `eval`)
- ✅ Print output capture
- ✅ Only our tools, `df`, and the whitelisted builtins are in scope

In [31]:
class SimpleSafeExecutor:
    """
    🔒 Simple safe Python executor inspired by SmolAgents.
    
    Security features:
    - No imports allowed
    - Limited builtins (no open, exec, eval)
    - Print output capture
    - Only our tools, df, and the whitelisted builtins are in scope
    """
    
    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        
        # Build safe builtins
        self.safe_builtins = {
            # ✅ Safe built-ins only
            'len': len,
            'range': range,
            'enumerate': enumerate,
            'zip': zip,
            'list': list,
            'dict': dict,
            'str': str,
            'int': int,
            'float': float,
            'print': print,
            'min': min,
            'max': max,
            'sum': sum,
            'sorted': sorted,
            # ❌ Deliberately left out (so unavailable): open, exec, eval, compile, __import__
        }
    
    def execute(self, code: str, df) -> tuple[bool, str]:
        """
        Execute code safely and return (success, output).
        
        Args:
            code: Python code to execute
            df: DataFrame to provide
            
        Returns:
            (success: bool, output: str)
        """
        # Capture print output (SmolAgents pattern)
        output_buffer = StringIO()
        original_stdout = sys.stdout
        
        try:
            # Redirect stdout to capture prints
            sys.stdout = output_buffer
            
            # Build safe execution environment
            safe_globals = {
                '__builtins__': self.safe_builtins,
                'df': df,
                **self.tools,
            }
            
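            # Execute code in a single namespace so that functions and
            # comprehensions in generated code can see earlier variables
            exec(code, safe_globals)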
            
            # Get captured output
            output = output_buffer.getvalue()
            return True, output if output else "✅ Executed successfully"
            
        except Exception as e:
            error_output = output_buffer.getvalue()
            return False, (error_output + f"❌ Error: {type(e).__name__}: {str(e)}")
            
        finally:
            # Restore stdout
            sys.stdout = original_stdout
            output_buffer.close()

# Create executor
executor = SimpleSafeExecutor(code_agent_tools)

### Step 3: Test the Executor

Let's verify what's safe and what's blocked:

In [33]:
# Test 1: Safe operations
print("\n✅ Test 1: Safe operations (should work)")
code = """
fire = filter_rows_v3("type_1 == 'Fire'")
print(f"Got {len(fire)} Fire Pokemon")
"""
success, output = executor.execute(code, df)
print(f"Result: {'✅ Success' if success else '❌ Failed'}")
print(f"Output: {output}")


✅ Test 1: Safe operations (should work)
Result: ✅ Success
Output: Filtered to 52 rows
Got 52 Fire Pokemon



In [34]:
# Test 2: Blocked import
print("\n❌ Test 2: Try to import (should be blocked)")
code = """
import os
os.system('ls')
"""
success, output = executor.execute(code, df)
print(f"Result: {'❌ Passed (BAD!)' if success else '✅ Blocked (GOOD!)'}")
print(f"Output: {output}")


❌ Test 2: Try to import (should be blocked)
Result: ✅ Blocked (GOOD!)
Output: ❌ Error: ImportError: __import__ not found


In [35]:
# Test 3: Blocked file access
print("\n❌ Test 3: Try to open file (should be blocked)")
code = """
data = open('/etc/passwd').read()
"""
success, output = executor.execute(code, df)
print(f"Result: {'❌ Passed (BAD!)' if success else '✅ Blocked (GOOD!)'}")
print(f"Output: {output}")


❌ Test 3: Try to open file (should be blocked)
Result: ✅ Blocked (GOOD!)
Output: ❌ Error: NameError: name 'open' is not defined


In [36]:

# Test 4: Chaining operations
print("\n✅ Test 4: Chaining operations (should work)")
code = """
fire = filter_rows_v3("type_1 == 'Fire'")
avg_attack = calculate_statistics_v3(fire, 'attack', 'mean')
"""
success, output = executor.execute(code, df)
print(f"Result: {'✅ Success' if success else '❌ Failed'}")
print(f"Output: {output}")


✅ Test 4: Chaining operations (should work)
Result: ✅ Success
Output: Filtered to 52 rows
Mean of attack: 84.77



### 🎉 Key Observations

| Feature | Status |
|---------|--------|
| Our tools work | ✅ Yes |
| Variables work | ✅ Yes |
| Chaining works | ✅ Yes |
| Imports blocked | ✅ Yes |
| File access blocked | ✅ Yes |

---

### Step 4: Build the CodeAgent

First, define a structured output model for generated code:

In [37]:
class CodeResponse(BaseModel):
    """📋 Structured response for generated Python code"""
    code: str  # Pure Python code as string


def generate_code_tool_descriptions(tools: Dict[str, Callable]) -> str:
    """📚 Generate tool descriptions for CodeAgent."""
    descriptions = []
    for name, func in tools.items():
        doc = (func.__doc__ or "No description").strip()
        sig = inspect.signature(func)
        params = str(sig)
        descriptions.append(f"{name}{params}\n    {doc}")
    return "\n\n".join(descriptions)


def code_agent(query: str, max_attempts: int = 3) -> str:
    """
    🤖 Agent that generates Python code calling our tools (using structured outputs).
    
    Args:
        query: User's question
        max_attempts: Maximum retry attempts on errors
        
    Returns:
        Execution output or error message
    """
    
    error_context = ""
    
    for attempt in range(max_attempts):
        # Build prompt
        prompt = f"""You are a Python coding agent that writes code to answer data questions.

Query: {query}

Available tools (all return data, use print for feedback):
{generate_code_tool_descriptions(code_agent_tools)}

Global variables:
- df: Pokemon dataset (pandas DataFrame)

Instructions:
1. Inspect the data structure first (use show_info_v3 or show_data_v3) so you know column names.
2. Write Python code using ONLY the provided tools and safe builtins.
3. Store intermediate results in variables and reuse them.
4. Chain operations logically to answer the query end-to-end.
5. Use print() to show intermediate steps or debugging info.
6. Do NOT import anything or call pandas methods directly on df.
7. Finish with a concise summary that starts with "FINAL ANSWER:" describing the result.

{error_context}

You will respond with pure executable Python code (no markdown formatting, no explanations).
The code will be directly executed."""
        
        # Get code from LLM with structured output
        response = generate(prompt, temperature=0.1, response_format=CodeResponse)
        code = response.code.strip()
        
        print(f"\n{'═'*70}")
        print(f"📝 GENERATED CODE (Attempt {attempt + 1}):")
        print('═'*70)
        print(code)
        print('═'*70)
        
        # Execute safely
        success, output = executor.execute(code, df)
        
        print(f"\n📊 EXECUTION RESULT:")
        print(output)
        
        if success:
            return output
        else:
            # Retry with error context
            error_context = f"\n\nPrevious attempt failed with error:\n{output}\nPlease fix the code."
    
    return f"Failed after {max_attempts} attempts"


### Step 5: Test the CodeAgent

Let's try the query that had issues with ReAct:

In [38]:
query = "What is the average attack for Fire-type Pokemon?"

print(f"🎯 QUERY: {query}")
result = code_agent(query, max_attempts=10)

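print(f"\n\n{'═'*70}")
# code_agent returns a "Failed after N attempts" string when every attempt errors
print("❌ FAILED" if result.startswith("Failed after") else "✅ SUCCESS!")
print('═'*70)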

🎯 QUERY: What is the average attack for Fire-type Pokemon?

══════════════════════════════════════════════════════════════════════
📝 GENERATED CODE (Attempt 1):
══════════════════════════════════════════════════════════════════════
show_info_v3()

# After inspecting the data structure, we will filter for Fire-type Pokemon.
fire_pokemon = filter_rows_v3("type == 'Fire'")

# Now we will calculate the average attack for these Fire-type Pokemon.
average_attack = calculate_statistics_v3(fire_pokemon, 'attack', 'mean')

# Finally, we will print the average attack.
print(average_attack)

# Summary of the result.
print(f'FINAL ANSWER: The average attack for Fire-type Pokemon is {average_attack}.')
══════════════════════════════════════════════════════════════════════

📊 EXECUTION RESULT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non

### 🎉 What Happened?

The CodeAgent generated Python code like:
```python
fire_pokemon = filter_rows_v3("type_1 == 'Fire'")
avg_attack = calculate_statistics_v3(fire_pokemon, 'attack', 'mean')
```
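
Note that the first attempt in the trace guessed `type` instead of `type_1`: the whole script is generated in one shot, so the model never sees the output of `show_info_v3()` before writing the filter. One possible mitigation, sketched here under assumptions, is to put the schema directly into the prompt. `describe_columns` is a hypothetical helper, and the two columns and dtypes below are an illustrative subset, not the full dataset.

```python
from typing import Dict


def describe_columns(columns: Dict[str, str]) -> str:
    """Render column names and dtypes as prompt text (hypothetical helper)."""
    lines = [f"- {name} ({dtype})" for name, dtype in columns.items()]
    return "Columns in df:\n" + "\n".join(lines)


# Illustrative subset; the dtypes here are assumptions for the example
schema_text = describe_columns({"type_1": "object", "attack": "int64"})
print(schema_text)
```

With pandas, the mapping could be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}` and the resulting text appended to the "Global variables" part of the prompt.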

### 💡 Why Structured Outputs?

- ✅ **No markdown parsing**: LLM returns pure Python code in `response.code`
- ✅ **Reliability**: Guaranteed JSON format
- ✅ **Type safety**: Pydantic validates the response structure
- ✅ **Clean control flow**: Boolean flags instead of string checking
- ✅ **Error handling**: Invalid formats caught before execution
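
Structured outputs guarantee the shape of the response, not that `code` is valid Python or that it follows the prompt's rules. A cheap static check before execution can catch some problems early and feed them back as error context. The sketch below uses the standard `ast` module; `check_generated_code` and its two rules are hypothetical, and a check like this is for early feedback only, not a security boundary.

```python
import ast
from typing import List


def check_generated_code(code: str) -> List[str]:
    """Return a list of problems found without running the code (hypothetical helper)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"SyntaxError: {e.msg} (line {e.lineno})"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: imports are not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(
                f"line {node.lineno}: dunder attribute '{node.attr}' is not allowed"
            )
    return problems


print(check_generated_code("fire = filter_rows_v3(\"type_1 == 'Fire'\")"))  # no problems
print(check_generated_code("import os"))
print(check_generated_code("print('unterminated"))
```

An empty list means the code passed the check and can be handed to the executor. Otherwise the messages can be placed into `error_context` without spending an execution attempt.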

---

### 🧪 Test More Complex Queries

In [39]:
# Test 1: Multi-step analysis
query = "Show me the top 3 Pokemon types by average speed"
print(f"\n🎯 QUERY: {query}")
result = code_agent(query, max_attempts=10)
print("\n" + "═"*70 + "\n")


🎯 QUERY: Show me the top 3 Pokemon types by average speed

══════════════════════════════════════════════════════════════════════
📝 GENERATED CODE (Attempt 1):
══════════════════════════════════════════════════════════════════════
show_info_v3()

# Assuming the relevant columns are 'Type' and 'Speed', we will proceed to calculate the average speed by type.

# Step 1: Group by 'Type' and calculate average speed
average_speed_by_type = aggregate_by_v3(df, 'Type', 'Speed', 'mean')

# Step 2: Sort the average speeds in descending order and get the top 3 types
sorted_average_speed = average_speed_by_type.sort_values(ascending=False)

# Step 3: Get the top 3 types
top_3_types = sorted_average_speed.head(3)

# Print the intermediate results for debugging
print("Average speed by type:", average_speed_by_type)
print("Sorted average speed:", sorted_average_speed)
print("Top 3 types by average speed:", top_3_types)

# Final summary
print("FINAL ANSWER: The top 3 Pokemon types by average speed ar

In [40]:
# Test 2: Filter and calculate
query = "What's the highest defense stat among legendary Pokemon?"
print(f"\n🎯 QUERY: {query}")
result = code_agent(query, max_attempts=10)
print("\n" + "═"*70 + "\n")


🎯 QUERY: What's the highest defense stat among legendary Pokemon?

══════════════════════════════════════════════════════════════════════
📝 GENERATED CODE (Attempt 1):
══════════════════════════════════════════════════════════════════════
show_info_v3()

# Filter for legendary Pokemon
legendary_condition = "is_legendary == True"
filtered_legendary = filter_rows_v3(legendary_condition)

# Calculate the maximum defense stat among legendary Pokemon
max_defense = calculate_statistics_v3(filtered_legendary, 'defense', 'max')

print(f'Highest defense stat among legendary Pokemon: {max_defense}')

# Final summary
print(f'FINAL ANSWER: The highest defense stat among legendary Pokemon is {max_defense}.')
══════════════════════════════════════════════════════════════════════

📊 EXECUTION RESULT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           

---

## 🔒 Part 5: Security Analysis

Let's understand what makes our CodeAgent safe (and what doesn't).

### ✅ What's Blocked

Our `SimpleSafeExecutor` blocks:

In [41]:
print("🔒 Security Tests\n")
print("═" * 70)

tests = [
    ("Import os", "import os\nos.system('ls')"),
    ("Open file", "open('/etc/passwd').read()"),
    ("Use exec", "exec('print(123)')"),
    ("Use eval", "eval('1+1')"),
    ("Use __import__", "__import__('os').system('ls')"),
]

for name, code in tests:
    print(f"\n🧪 Test: {name}")
    success, output = executor.execute(code, df)
    status = "❌ PASSED (BAD!)" if success else "✅ BLOCKED (GOOD!)"
    print(f"  {status}")
    if not success:
        print(f"  Error: {output.split(':')[-1].strip()}")

print("\n" + "═" * 70)

🔒 Security Tests

══════════════════════════════════════════════════════════════════════

🧪 Test: Import os
  ✅ BLOCKED (GOOD!)
  Error: __import__ not found

🧪 Test: Open file
  ✅ BLOCKED (GOOD!)
  Error: name 'open' is not defined

🧪 Test: Use exec
  ✅ BLOCKED (GOOD!)
  Error: name 'exec' is not defined

🧪 Test: Use eval
  ✅ BLOCKED (GOOD!)
  Error: name 'eval' is not defined

🧪 Test: Use __import__
  ✅ BLOCKED (GOOD!)
  Error: name '__import__' is not defined

══════════════════════════════════════════════════════════════════════


### ⚠️ What's Still Risky

However, DataFrame methods are still accessible:

In [42]:
code = """
# Agent could still use pandas methods on df:
result = df[df['type_1'] == 'Fire']
print(f"Filtered using pandas directly: {len(result)} rows")
"""

success, output = executor.execute(code, df)
print(f"⚠️  Direct pandas access works: {success}")
print(f"📊 Output: {output}")

⚠️  Direct pandas access works: True
📊 Output: Filtered using pandas directly: 52 rows



### 🚨 Security Risk

**The problem:** We expose a pandas DataFrame (`df`) in the execution environment, which includes methods that can perform dangerous operations:

- `.to_csv()`, `.to_pickle()` – write files to disk
- `.to_clipboard()` – write to the system clipboard
- `.to_sql()` – write to a database, given a connection
- `.query()`, `.eval()` – evaluate expression strings
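
The DataFrame is not the only way out. Restricting `__builtins__` limits which names can be looked up, but it does not stop attribute traversal on the objects that remain in scope. The sketch below is a standalone stand-in for the executor's environment (it does not use `SimpleSafeExecutor` itself) and relies on a CPython detail: built-in functions expose the real `builtins` module through `__self__`.

```python
# Minimal stand-in for the executor's environment: a tiny builtins whitelist
restricted_globals = {"__builtins__": {"len": len, "print": print}}

escape_code = """
builtins_module = len.__self__                 # the real builtins module (CPython)
os_module = builtins_module.__import__("os")   # import without an import statement
leaked = os_module.__name__                    # harmless proof that the import worked
"""

exec(escape_code, restricted_globals)
print(restricted_globals["leaked"])  # → os
```

The `import os` statement is blocked, yet the same module is reachable in two attribute lookups. This is why a builtins whitelist is best treated as a guard against accidents rather than against a determined attacker.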

### 🛡️ Potential Solutions

For secure code execution, consider some of these options:

#### 📦 Sandboxed Execution Environments

- **[E2B](https://github.com/e2b-dev/E2B)** – Cloud sandboxes built for running AI-generated code
- **[Firecracker](https://firecracker-microvm.github.io/)** – Lightweight microVMs for isolation
- **[gVisor](https://gvisor.dev/)** – Application kernel that sandboxes containers by intercepting system calls

#### 🐳 Container-Based Isolation

- Docker with restricted volumes and network policies
- Podman for rootless containers

#### ☁️ Cloud Sandboxes

- AWS Lambda (isolated serverless execution)
- Google Cloud Run (containerized execution)
- Azure Container Instances

#### 🐍 Python-Specific Sandboxes

- **[RestrictedPython](https://github.com/zopefoundation/RestrictedPython)** – Restricted subset of Python (its own docs note it is not a full sandbox)
- **[PyPy sandbox](https://doc.pypy.org/en/latest/sandbox.html)** – Isolated Python interpreter (experimental prototype)
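
Short of a real sandbox, even a separate process with a timeout closes one gap in `SimpleSafeExecutor`: an infinite loop in generated code would otherwise hang the notebook. The following is a minimal sketch with a hypothetical helper `run_in_subprocess`. It gives process isolation and a wall-clock limit only; the child process still has full filesystem and network access, and `df` and the tools would have to be passed in separately (for example via a file).

```python
import subprocess
import sys
from typing import Tuple


def run_in_subprocess(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter with a wall-clock timeout (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"Timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


print(run_in_subprocess("print(1 + 1)"))
print(run_in_subprocess("while True: pass", timeout_s=1.0))
```

The return shape mirrors `(success, output)` from the executor, so the retry loop in `code_agent` could consume it unchanged.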

---

## 🎓 Part 6: Comparison and Takeaways

### 📊 ReAct vs CodeAgent

| Feature | ReAct (JSON + Handles) | CodeAgent |
|---------|----------------------|------------|
| **Multi-step reasoning** | ✅ Yes | ✅ Yes |
| **Composition** | ✅ Pass artifact handles | ✅ Native variables |
| **Variables** | Handles (IDs) | Python variables |
| **Intermediate results** | ✅ Preserved via store | ✅ Preserved |
| **Structured outputs** | ✅ JSON (thought + action) | ✅ JSON (code) |
| **Answer synthesis** | ✅ Auto-synthesize | ✅ Print statements |
| **Security** | ✅ Simple (tool calls) | ⚠️ Needs sandbox |
| **Flexibility** | ⚠️ Limited to tools | ✅ High |


---

### 🎯 What We Built Today

1. ✅ **ReAct Agent** – Multi-step reasoning with structured outputs + answer synthesis
2. ✅ **Discovered limitation** – Tools can't pass state between steps
3. ✅ **Artifact Store** – Simple solution for tracking intermediate results
4. ✅ **CodeAgent** – Generates Python code with structured outputs
5. ✅ **SimpleSafeExecutor** – Safe execution environment inspired by SmolAgents
6. ✅ **Security awareness** – Understanding constraints and production solutions

---

### 💡 Key Insights

#### Why Structured Outputs?

- ✅ No string parsing
- ✅ Type safety with Pydantic models
- ✅ Cleaner code (no regex or split logic)
- ✅ Boolean flags for control flow
- ✅ Runtime validation

#### Why Code Generation?

- ✅ Variables enable composition
- ✅ One execution context preserves state
- ✅ Natural way to chain operations
- ✅ More flexible than rigid tool sequences

#### Why Code Calling Tools (not arbitrary code)?

- ✅ Tools are vetted and safe
- ✅ Limited attack surface
- ✅ Balance between flexibility and security
- ⚠️ Still needs proper sandboxing for production

---

### 🚀 Next Steps

**Workshop 6:** Memory & Conversations
- Agent remembers context across multiple queries
- Conversation history management
- Long-term memory strategies