# The Agent Loop: Building Production Agents with LangChain 1.0

> **Note:** While this notebook can be adapted to use various LLM providers, we'll be using the Anthropic Claude API. Please follow the best practices outlined in the [SRHG AI Usage Guidelines](https://srhg.enterprise.slack.com/docs/T0HANKTEC/F0AB86J3A1L).

In this notebook, we'll explore the foundational concepts of AI agents and learn how to build production-grade agents using LangChain's new `create_agent` abstraction with middleware support. We'll build a **Stone Ridge Investment Assistant** that can answer questions about Stone Ridge's investment philosophy, market insights, and strategic outlook.

**Learning Objectives:**
- Understand what an "agent" is and how the agent loop works
- Learn the core constructs of LangChain (Runnables, LCEL)
- Master the `create_agent` function and middleware system
- Build an agentic RAG application using Qdrant for Stone Ridge investor letters

## Table of Contents:

- **Part 1:** Introduction to LangChain, LangSmith, and `create_agent`
  - Task 1: Dependencies
  - Task 2: Environment Variables
  - Task 3: LangChain Core Concepts (Runnables & LCEL)
  - Task 4: Understanding the Agent Loop
  - Task 5: Building Your First Agent with `create_agent()`
  - Question #1 & Question #2
  - Activity #1: Create a Custom Tool

- **Part 2:** Middleware - Agentic RAG with Qdrant
  - Task 6: Loading & Chunking Documents
  - Task 7: Setting up Qdrant Vector Database
  - Task 8: Creating a RAG Tool
  - Task 9: Introduction to Middleware
  - Task 10: Building Agentic RAG with Middleware
  - Question #3 & Question #4
  - Activity #2: Enhance the Agent

---
# Part 1
## Introduction to LangChain, LangSmith, and `create_agent`

## Task 1: Dependencies

First, let's ensure we have all the required packages installed. We'll be using:

- **LangChain 1.0+**: The core framework with the new `create_agent` API
- **LangChain-Anthropic**: Anthropic Claude model integrations
- **LangChain-OpenAI**: OpenAI embeddings (we'll use Claude for chat, OpenAI for embeddings)
- **LangSmith**: Observability and tracing
- **Qdrant**: Vector database for RAG
- **PyMuPDF**: PDF parsing for investor letters

## Task 2: Environment Variables

We need to set up our API keys for:
1. **Anthropic** - For Claude models (chat/reasoning)
2. **OpenAI** - For embeddings (text-embedding-3-small)
3. **LangSmith** - For tracing and observability (optional but recommended)

## Task 3: LangChain Core Concepts

Before diving into agents, let's understand the fundamental building blocks of LangChain.

### What is a Runnable?

A **Runnable** is the core abstraction in LangChain - think of it as a standardized component that:
- Takes an input
- Performs some operation
- Returns an output

Every component in LangChain (models, prompts, retrievers, parsers) is a Runnable, which means they all share the same interface:

```python
result = runnable.invoke(input)           # Single input
results = runnable.batch([input1, input2]) # Multiple inputs
for chunk in runnable.stream(input):       # Streaming
    print(chunk)
```
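To make that shared contract concrete, here is a minimal pure-Python sketch. `ToyRunnable` is a hypothetical class written for illustration, not LangChain's implementation. Its `stream` yields the whole result as one chunk, whereas a real chat model streams token chunks.

```python
from typing import Callable, Generic, Iterator, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class ToyRunnable(Generic[In, Out]):
    """Hypothetical stand-in that mimics the shape of the Runnable interface."""

    def __init__(self, func: Callable[[In], Out]) -> None:
        self.func = func

    def invoke(self, value: In) -> Out:
        # Single input -> single output
        return self.func(value)

    def batch(self, values: List[In]) -> List[Out]:
        # Same operation applied to each input, results kept in input order
        return [self.invoke(v) for v in values]

    def stream(self, value: In) -> Iterator[Out]:
        # Non-streaming fallback: yield the complete result as a single chunk
        yield self.invoke(value)


shout = ToyRunnable(lambda text: text.upper() + "!")
print(shout.invoke("hello"))      # HELLO!
print(shout.batch(["a", "b"]))    # ['A!', 'B!']
for piece in shout.stream("hi"):
    print(piece)                  # HI!
```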

### What is LCEL (LangChain Expression Language)?

**LCEL** allows you to chain Runnables together using the `|` (pipe) operator:

```python
chain = prompt | model | output_parser
result = chain.invoke({"query": "Hello!"})
```

This is similar to Unix pipes - the output of one component becomes the input to the next.
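The pipe works because Python lets a class define `__or__`. In LangChain, `a | b` on Runnables builds a sequence that feeds `a`'s output into `b`. The sketch below reproduces only that idea with a hypothetical `Pipeable` class and toy stand-ins for the prompt, model, and parser.

```python
from typing import Any, Callable


class Pipeable:
    """Hypothetical minimal component that supports `|` composition."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Pipeable") -> "Pipeable":
        # The composed component passes this component's output into `other`
        return Pipeable(lambda value: other.invoke(self.invoke(value)))


# Toy stand-ins for prompt | model | output_parser
toy_prompt = Pipeable(lambda d: f"User says: {d['query']}")
toy_model = Pipeable(lambda text: {"content": f"Echo -> {text}"})
toy_parser = Pipeable(lambda msg: msg["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"query": "Hello!"}))  # Echo -> User says: Hello!
```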

## Task 4: Understanding the Agent Loop

### What is an Agent?

An **agent** is a system that uses an LLM to decide what actions to take. Unlike a simple chain that follows a fixed sequence, an agent can:

1. **Reason** about what to do next
2. **Take actions** by calling tools
3. **Observe** the results
4. **Iterate** until the task is complete

### The Agent Loop

The core of every agent is the **agent loop**:

```
                          AGENT LOOP                         
                                                             
      +----------+     +----------+     +----------+         
      |  Model   | --> |   Tool   | --> |  Model   | --> ... 
      |   Call   |     |   Call   |     |   Call   |         
      +----------+     +----------+     +----------+         
           |                                  |              
           v                                  v              
      "Use search"                   "Here's the answer"     
```

1. **Model Call**: The LLM receives the current state and decides whether to:
   - Call a tool (continue the loop)
   - Return a final answer (exit the loop)

2. **Tool Call**: If the model decides to use a tool, the tool is executed and its output is added to the conversation

3. **Repeat**: The loop continues until the model decides it has enough information to answer
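The three steps above can be sketched framework-free. `run_agent_loop` and `scripted_model` are hypothetical names, and the "model" is a scripted function rather than an LLM. The sketch only shows the control flow: call the model, run the requested tool, append the observation, and stop when no tool is requested or a step cap is hit.

```python
from typing import Callable, Dict, List


def run_agent_loop(model: Callable[[List[dict]], dict],
                   tools: Dict[str, Callable[..., str]],
                   user_message: str,
                   max_steps: int = 5) -> List[dict]:
    """Toy agent loop: call the model, run any requested tool, repeat."""
    messages: List[dict] = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):                 # 3. Repeat until the model stops asking for tools
        reply = model(messages)                # 1. Model call
        messages.append(reply)
        tool_call = reply.get("tool_call")
        if tool_call is None:                  # No tool requested: final answer, exit the loop
            return messages
        output = tools[tool_call["name"]](**tool_call["args"])  # 2. Tool call
        messages.append({"role": "tool", "content": output})    # Observation added to history
    return messages                            # Safety cap reached after max_steps model calls


def scripted_model(messages: List[dict]) -> dict:
    """Hypothetical stand-in for an LLM: requests a tool once, then answers from its output."""
    if messages[-1]["role"] == "user":
        return {"role": "ai", "content": "",
                "tool_call": {"name": "multiply", "args": {"a": 25, "b": 48}}}
    return {"role": "ai", "content": f"The answer is {messages[-1]['content']}."}


toy_history = run_agent_loop(scripted_model, {"multiply": lambda a, b: str(a * b)}, "What is 25 * 48?")
for msg in toy_history:
    print(f"[{msg['role']}] {msg['content']}")
```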

### Why `create_agent`?

LangChain 1.0 introduced `create_agent` as the new standard way to build agents. It provides:

- **Simplified API**: One function to create production-ready agents
- **Middleware Support**: Hook into any point in the agent loop
- **Built on LangGraph**: Uses the battle-tested LangGraph runtime under the hood

In [7]:
from langchain_core.tools import tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Use this for any math calculations.
    
    Args:
        expression: A mathematical expression to evaluate (e.g., '2 + 2', '10 * 5')
    """
    try:
        # Restricting builtins limits what eval can reach, but eval is still unsafe on untrusted input
        result = eval(expression, {"__builtins__": {}}, {})
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error evaluating expression: {e}"

@tool
def get_current_time() -> str:
    """Get the current date and time. Use this when the user asks about the current time or date."""
    from datetime import datetime
    return f"The current date and time is: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"

# Create our tool belt
tools = [calculate, get_current_time]

print("Tools created:")
for t in tools:
    print(f"  - {t.name}: {t.description[:60]}...")

Tools created:
  - calculate: Evaluate a mathematical expression. Use this for any math ca...
  - get_current_time: Get the current date and time. Use this when the user asks a...
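Emptying `__builtins__` does not make `eval` safe, because crafted expressions can still reach dangerous objects. One common alternative is to parse the expression with Python's `ast` module and evaluate only arithmetic nodes. The sketch below is an assumption, not part of the original notebook. `safe_eval` is a hypothetical helper that the `calculate` tool could call in place of `eval(...)`. `**` stays enabled for expressions like `1.5 ** 12`, but huge exponents can still be slow, so a production version should cap them.

```python
import ast
import operator

# Operators this sketch permits; any other syntax is rejected
_ALLOWED_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}


def safe_eval(expression: str) -> float:
    """Evaluate a purely arithmetic expression by walking its syntax tree instead of calling eval."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. are never executed
        raise ValueError(f"Unsupported element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("25 * 48"))     # 1200
print(safe_eval("0.15 * 250"))  # 37.5
```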


In [8]:
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic

# Create the Claude model for our agent
claude_model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)

# Create our first agent
simple_agent = create_agent(
    model=claude_model,
    tools=tools,
    system_prompt="You are a helpful assistant that can perform calculations and tell the time. Always explain your reasoning."
)

print("Agent created successfully!")
print(f"Type: {type(simple_agent)}")

Agent created successfully!
Type: <class 'langgraph.graph.state.CompiledStateGraph'>


In [9]:
# Test the agent with a simple calculation
response = simple_agent.invoke(
    {"messages": [{"role": "user", "content": "What is 25 * 48?"}]}
)

# Print the final response
print("Agent Response:")
print(response["messages"][-1].content)

Agent Response:
The answer is 1,200. When you multiply 25 by 48, you get 1,200.


In [11]:
# Let's see the full conversation to understand the agent loop
print("Full Agent Conversation:")
print("=" * 50)
for msg in response["messages"]:
    role = getattr(msg, "type", "unknown")
    content = str(getattr(msg, "content", msg))
    print(f"\n[{role.upper()}]")
    print(content[:500])  # Show at most the first 500 characters of each message

Full Agent Conversation:

[HUMAN]
What time is it, and what is 100 divided by the current hour?

[AI]
[{'text': "I'll help you find the current time and then calculate 100 divided by the current hour.", 'type': 'text'}, {'id': 'toolu_019SAw3Ui4cnqce9QNw5QwLp', 'input': {}, 'name': 'get_current_time', 'type': 'tool_use'}]

[TOOL]
The current date and time is: 2026-02-10 00:55:14

[AI]
The current time is 00:55:14 (12:55:14 AM) on February 10, 2026. The current hour is 0 (midnight hour in 24-hour format).

However, I cannot divide 100 by 0 as that would result in division by zero, which is undefined mathematically. 

If you'd like me to use a different interpretation:
- If we consider the hour in 12-hour format, it would be 12 (midnight as hour 12)
- Or if you meant to wait until a different hour when division would be possible

Would you like me to calculate 100 ÷ 12 instead,


In [12]:
# Stream the agent's response
print("Streaming Agent Response:")
print("=" * 50)

for chunk in simple_agent.stream(
    {"messages": [{"role": "user", "content": "Calculate 15% of 250"}]},
    stream_mode="updates"
):
    for node, values in chunk.items():
        print(f"\n[Node: {node}]")
        if "messages" in values:
            for msg in values["messages"]:
                if hasattr(msg, 'content') and msg.content:
                    print(msg.content)

Streaming Agent Response:

[Node: model]
[{'text': "I'll calculate 15% of 250 for you.", 'type': 'text'}, {'id': 'toolu_01HU7sPshywfRWsuoAFFSshE', 'input': {'expression': '0.15 * 250'}, 'name': 'calculate', 'type': 'tool_use'}]

[Node: tools]
The result of 0.15 * 250 is 37.5

[Node: model]
15% of 250 is **37.5**.

To explain the calculation: 15% means 15/100 = 0.15, so we multiply 250 by 0.15 to get the result.


## ❓ Question #2:

Looking at the `calculate` and `get_current_time` tools we created, why is the **docstring** so important for each tool? How does the agent use this information when deciding which tool to call?

##### ✅ Answer:
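The `@tool` decorator uses each function's docstring as the tool's description. LangChain sends that description to the model on every model call, together with the tool's name and an argument schema inferred from the type hints. The model never sees the function body, so the docstring is its only guide to what each tool does, when to use it, and what arguments it expects. When deciding its next step, the agent matches the latest message against these descriptions to pick the most suitable tool (or none) and to fill in the arguments. For example, "Use this when the user asks about the current time or date" steers time questions to `get_current_time`. The `expression` argument described in `calculate`'s docstring tells the model to pass a string such as `'0.15 * 250'`.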
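To see why only the docstring and signature matter, here is a simplified, framework-free sketch of turning a function into a tool spec. `describe_tool` and `calculate_demo` are hypothetical names. The type mapping is an assumption, and the output shape loosely resembles a name/description/`input_schema` tool definition. It is not LangChain's exact conversion.

```python
import inspect
from typing import Any, Callable, Dict

# Simplified mapping from Python annotations to JSON Schema types (assumption)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a simplified tool spec from a function's name, docstring, and signature."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }


def calculate_demo(expression: str) -> str:
    """Evaluate a mathematical expression. Use this for any math calculations."""
    return expression  # the body is irrelevant here: the model never sees it


spec = describe_tool(calculate_demo)
print(spec["description"])
print(spec["input_schema"])
```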

In [13]:
### YOUR CODE HERE ###

# Create your custom tool
@tool
def get_current_timezone() -> str:
    """Get the current timezone information. Use this when the user asks about the timezone, time zone, or what timezone they are in.
    
    Returns:
        str: The current timezone name and UTC offset
    """
    from datetime import datetime

    # Local time as a timezone-aware datetime; reflects whether DST is currently in effect
    now = datetime.now().astimezone()
    tz_name = now.tzname()

    # Split the UTC offset into sign, hours, and minutes (handles offsets like UTC-03:30)
    offset_minutes_total = int(now.utcoffset().total_seconds() // 60)
    sign = "+" if offset_minutes_total >= 0 else "-"
    offset_hours, offset_minutes = divmod(abs(offset_minutes_total), 60)
    offset_str = f"UTC{sign}{offset_hours:02d}:{offset_minutes:02d}"

    return f"Current timezone: {tz_name} ({offset_str})"

@tool
def get_time_hours_ago(hours: float) -> str:
    """Calculate what the date and time was a specified number of hours ago. Use this when the user asks about past times.
    
    Args:
        hours: The number of hours to go back in time (e.g., 2.5 for 2.5 hours ago)
    
    Returns:
        str: The date and time that many hours ago
    """
    from datetime import datetime, timedelta
    
    try:
        past_time = datetime.now() - timedelta(hours=float(hours))
        return f"{hours} hour(s) ago, it was: {past_time.strftime('%Y-%m-%d %H:%M:%S')}"
    except Exception as e:
        return f"Error calculating past time: {e}"

# Add your tools to the tools list
tools = [calculate, get_current_time, get_current_timezone, get_time_hours_ago]

# Create a new agent with the updated tools
simple_agent = create_agent(
    model=claude_model,
    tools=tools,
    system_prompt="You are a helpful assistant that can perform time-based calculations, tell the time, provide timezone information, and calculate past times. Always explain your reasoning."
)

---
# Part 2
## Middleware - Agentic RAG with Qdrant

## Task 6: Loading & Chunking Documents

We'll use the Stone Ridge 2025 Investor Letter - the same document from Module 2 - to build our investment assistant.

In [16]:
# Split the documents into chunks
text_splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

chunks = text_splitter.split_texts(documents)

print(f"Split into {len(chunks)} chunks")
print(f"\nSample chunk:")
print("-" * 50)
print(chunks[0][:300] + "...")

Split into 133 chunks

Sample chunk:
--------------------------------------------------
2025 Investor Letter
Investor Letter
“Every driver has a limit.  Mine is a little bit further than others.”
— Ayrton Senna, greatest Formula One driver of all time
“I’m not funny.  What I am is brave.”
— Lucille Ball, greatest female comedian of all time
“I’d rather be optimistic and wrong than pe...
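With `chunk_size=500` and `chunk_overlap=100`, each new chunk starts 400 characters after the previous one and re-reads its last 100 characters. That overlap keeps sentences near a boundary retrievable from either side. The sketch below shows only this sliding-window idea with a hypothetical `split_with_overlap` function. The splitter used in this notebook may also split on separators or treat boundaries differently.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Fixed-size character windows; each window re-reads the last `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap                 # each new chunk advances by this much
    last_start = max(len(text) - chunk_overlap, 1)    # avoid a trailing chunk that is pure overlap
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


demo_text = "".join(str(i % 10) for i in range(1200))
demo_chunks = split_with_overlap(demo_text)
print([len(c) for c in demo_chunks])                  # [500, 500, 400]
print(demo_chunks[0][-100:] == demo_chunks[1][:100])  # True: 100 shared characters
```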


In [17]:
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

# Initialize the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# Get embedding dimension
sample_embedding = embedding_model.embed_query("test")
embedding_dim = len(sample_embedding)
print(f"Embedding dimension: {embedding_dim}")

Embedding dimension: 1536
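The dimension matters because the Qdrant collection must be configured for 1536-dimensional vectors. We assume this happens in a cell not shown here, with something like `VectorParams(size=embedding_dim, distance=Distance.COSINE)`. Under that cosine-distance assumption, retrieval ranks stored chunks by the cosine of the angle between their embedding and the query embedding. This pure-Python sketch (hypothetical `cosine_similarity` and `top_k` helpers, with tiny 3-dimensional vectors) shows the ranking.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # Vectors of different sizes cannot be compared (hence the fixed collection size)
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 2) -> List[int]:
    """Indices of the k stored vectors most similar to the query."""
    scores = [cosine_similarity(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]


stored = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
print(top_k([1.0, 0.1, 0.0], stored))  # [0, 1]
```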


In [20]:
# Create the vector store and add documents
from langchain_core.documents import Document

# Convert chunks to LangChain Document objects
langchain_docs = [Document(page_content=chunk) for chunk in chunks]

# Create vector store
vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=collection_name,
    embedding=embedding_model
)

# Add documents to the vector store
vector_store.add_documents(langchain_docs)

print(f"Added {len(langchain_docs)} documents to vector store")

Added 133 documents to vector store


## Task 8: Creating a RAG Tool

Now we'll wrap our retriever as a tool that the agent can use. This is the key to **Agentic RAG** - the agent decides when to retrieve information about Stone Ridge's investment philosophy and strategy.
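The notebook's `search_investment_knowledge` tool is not shown in this export. The sketch below only illustrates the general shape such a tool can take: retrieve the top-k chunks, label them as numbered sources, and return one string for the model to read. Everything here is hypothetical. The placeholder chunks stand in for the investor-letter chunks, and word-overlap scoring replaces the vector search. In the notebook, retrieval would instead go through `vector_store` (for example `vector_store.similarity_search(query, k=...)`), and the function would be decorated with `@tool`.

```python
from typing import List, Tuple

# Hypothetical placeholder chunks standing in for the investor-letter chunks in Qdrant
TOY_CHUNKS = [
    "Chunk about energy and natural gas",
    "Chunk about reinsurance",
    "Chunk about the opening quotes",
]


def toy_retrieve(query: str, k: int = 2) -> List[Tuple[float, str]]:
    """Rank chunks by word-overlap (Jaccard) score, a crude stand-in for vector search."""
    query_words = set(query.lower().split())
    scored = []
    for chunk in TOY_CHUNKS:
        chunk_words = set(chunk.lower().split())
        score = len(query_words & chunk_words) / len(query_words | chunk_words)
        scored.append((score, chunk))
    # Keep only chunks that share at least one word with the query
    return [item for item in sorted(scored, reverse=True) if item[0] > 0][:k]


def search_knowledge_demo(query: str) -> str:
    """Format retrieved chunks into one string, the shape a RAG tool returns to the model."""
    results = toy_retrieve(query)
    if not results:
        return "No relevant information found in the knowledge base."
    return "\n\n".join(f"[Source {i}] (score {score:.2f})\n{text}"
                       for i, (score, text) in enumerate(results, start=1))


print(search_knowledge_demo("natural gas energy investments"))
```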

## Task 9: Introduction to Middleware

**Middleware** in LangChain 1.0 allows you to hook into the agent loop at various points:

```
                       MIDDLEWARE HOOKS                 
                                                        
   +--------------+                    +--------------+ 
   | before_model | --> MODEL CALL --> | after_model  | 
   +--------------+                    +--------------+ 
                                                        
   +-------------------+                                
   | wrap_model_call   |  (intercept and modify calls)  
   +-------------------+                                
```

Common use cases:
- **Logging**: Track what the agent is doing
- **Guardrails**: Filter or modify inputs/outputs
- **Rate limiting**: Control API usage
- **Human-in-the-loop**: Pause for human approval

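LangChain provides middleware in two forms. **Decorator functions** such as `@before_model` and `@after_model` hook into specific points in the agent loop. Middleware classes, such as the built-in `ModelCallLimitMiddleware`, package reusable behavior.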
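The ordering in the diagram can be sketched without LangChain. `before_model` hooks run first and may return state updates. `wrap_model_call` wrappers receive a handler for the inner call, and the first wrapper ends up outermost. `after_model` hooks then see the reply. All names and signatures here are hypothetical simplifications; the real hooks take additional arguments such as a runtime or request object.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, list]
ModelFn = Callable[[State], dict]
Hook = Callable[[State], Optional[dict]]
Wrapper = Callable[[State, ModelFn], dict]


def run_model_step(state: State, model: ModelFn, before: List[Hook],
                   wrappers: List[Wrapper], after: List[Hook]) -> State:
    """One model step with before/wrap/after hooks (a toy, not LangChain's code)."""
    for hook in before:                         # before_model: may return a state update
        state.update(hook(state) or {})

    call = model
    for wrapper in reversed(wrappers):          # wrap_model_call: first wrapper ends up outermost
        call = (lambda w, inner: (lambda s: w(s, inner)))(wrapper, call)
    state["messages"].append(call(state))       # the (possibly wrapped) model call

    for hook in after:                          # after_model: sees the new reply
        state.update(hook(state) or {})
    return state


events: List[str] = []


def log_before(state: State) -> None:
    events.append(f"before: {len(state['messages'])} message(s)")


def log_after(state: State) -> None:
    events.append(f"after: {state['messages'][-1]['content']}")


def shout_wrapper(state: State, handler: ModelFn) -> dict:
    reply = handler(state)                      # delegate to the inner model call
    return {**reply, "content": reply["content"].upper()}


def fake_model(state: State) -> dict:
    return {"role": "ai", "content": "hello"}


final_state = run_model_step({"messages": [{"role": "user", "content": "hi"}]},
                             fake_model, [log_before], [shout_wrapper], [log_after])
print(events)  # ['before: 1 message(s)', 'after: HELLO']
```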

In [24]:
# You can also use the built-in ModelCallLimitMiddleware to prevent runaway agents
from langchain.agents.middleware import ModelCallLimitMiddleware

# This middleware stops the agent after 10 model calls per thread or 5 per run, whichever comes first
call_limiter = ModelCallLimitMiddleware(
    thread_limit=10,  # Max model calls per conversation thread
    run_limit=5,      # Max model calls per single invocation
    exit_behavior="end"  # "end" stops the agent gracefully; "error" raises an exception
)

print("Call limit middleware created!")
print(f"  - Thread limit: {call_limiter.thread_limit}")
print(f"  - Run limit: {call_limiter.run_limit}")

Call limit middleware created!
  - Thread limit: 10
  - Run limit: 5


In [25]:
from langchain.agents import create_agent

# Reset the call counter
model_call_count = 0

# Define our tools - include the RAG tool and the calculator from earlier
rag_tools = [
    search_investment_knowledge,
    calculate,
    get_current_time
]

# Create the Claude model for our RAG agent
claude_rag_model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)

# Create the agentic RAG system with middleware
investment_agent = create_agent(
    model=claude_rag_model,
    tools=rag_tools,
    system_prompt="""You are a helpful Stone Ridge investment assistant with access to a comprehensive knowledge base of investor letters and company information.

Your role is to:
1. Answer questions about Stone Ridge's investment philosophy, market insights, and strategic outlook
2. Always search the knowledge base when the user asks investment-related questions
3. Provide accurate, helpful information based on the retrieved context
4. Be professional and informative in your responses
5. If you cannot find relevant information, say so honestly
6. Include a reminder that information is for educational purposes only and not investment advice when appropriate

Remember: Always cite information from the knowledge base when applicable.""",
    middleware=[
        log_before_model,
        log_after_model,
        call_limiter
    ]
)

print("Investment Agent created with middleware!")

Investment Agent created with middleware!


In [27]:
# Test with a more complex query
print("Testing with complex query")
print("=" * 50)

response = investment_agent.invoke(
    {"messages": [{"role": "user", "content": "What does Stone Ridge say about their energy investments? Also, if they invested $100 million with a 50% return over 12 years, what would be the total value?"}]}
)
print("\n" + "=" * 50)
print("FINAL RESPONSE:")
print("=" * 50)
print(response["messages"][-1].content)

Testing with complex query
[LOG] Model call #7 - Messages in state: 1
[LOG] After model - Tool calls requested: [{'name': 'search_investment_knowledge', 'args': {'query': 'energy investments oil gas energy sector strategy'}, 'id': 'toolu_01N41xSgabPAbxnbTvuhjFi3', 'type': 'tool_call'}, {'name': 'calculate', 'args': {'expression': '100000000 * (1 + 0.5)'}, 'id': 'toolu_019aEvRw8fjqnRP1etijsBG7', 'type': 'tool_call'}]
[LOG] Model call #8 - Messages in state: 4
[LOG] After model - Tool calls requested: [{'name': 'search_investment_knowledge', 'args': {'query': 'Stone Ridge Energy SRE investment philosophy approach natural gas oil'}, 'id': 'toolu_0116y2oaH9GUM3n35yXQ5YxP', 'type': 'tool_call'}]
[LOG] Model call #9 - Messages in state: 6
[LOG] After model - Tool calls requested: []

FINAL RESPONSE:
Based on the Stone Ridge investment knowledge base, here's what Stone Ridge says about their energy investments:

## Stone Ridge Energy (SRE) Investment Approach

**Key Investment Philosophy:**
-
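The test query is ambiguous: "a 50% return over 12 years" can mean 50% in total or 50% per year. The log shows the agent chose the first reading (`100000000 * (1 + 0.5)`). This small worked example computes both readings, plus the annual growth rate the first reading implies, so the agent's answer can be checked.

```python
principal = 100_000_000   # $100 million
rate = 0.50               # 50%
n_years = 12

# Reading 1: 50% total return over the whole 12-year period (what the agent computed)
total_once = principal * (1 + rate)

# Reading 2: 50% per year, compounded annually for 12 years
total_compounded = principal * (1 + rate) ** n_years

# CAGR implied by reading 1: (ending / starting) ** (1 / years) - 1
implied_cagr = (total_once / principal) ** (1 / n_years) - 1

print(f"50% total:            ${total_once:,.0f}")        # $150,000,000
print(f"50%/yr compounded:    ${total_compounded:,.0f}")  # $12,974,633,789
print(f"Implied CAGR (total): {implied_cagr:.2%}")        # 3.44%
```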

### Visualizing the Agent

The agent created by `create_agent` is built on LangGraph, so we can visualize its structure.

---
## ❓ Question #3:

How does **Agentic RAG** differ from traditional RAG? What are the advantages and potential disadvantages of letting the agent decide when to retrieve information from the Stone Ridge investor letters?

##### ✅ Answer:
Traditional RAG runs a fixed pipeline: every query triggers one retrieval step, and the retrieved chunks go into the prompt before a single generation call. Agentic RAG exposes retrieval as a tool inside the agent loop. The model decides whether to retrieve, what query to send, and whether to retrieve again after reading the results, so it can build an answer over multiple steps, using tool calls and LLM reasoning at each step to refine its output.

The main advantage is that the agent can interleave several rounds of automated reasoning with rounds of retrieval to answer complex, multi-part questions. No human has to stitch the output of one tool into the input of another tool or model call. In the complex-query example above, the agent ran a second, refined knowledge-base search after its first one.

The agentic approach also has costs:
- It can call tools unnecessarily, consuming more tokens and slowing down answers to simple questions.
- It can skip retrieval when it should have searched, answering from its own knowledge instead of the investor letters.
- Every model call carries the schemas of all available tools in its context, which adds token cost and latency at each step.

## ❓ Question #4:

Looking at the middleware examples (`log_before_model`, `log_after_model`, and `ModelCallLimitMiddleware`), describe a real-world scenario where middleware would be essential for a production agent. What specific middleware hooks would you use and why?

##### ✅ Answer:
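Middleware is essential in production agents for monitoring, governance, and reliability.

In regulated industries like healthcare and finance, compliance and audit-trail middleware must record PII access, tool usage, and data sources while redacting sensitive information from logs. That record proves what data was accessed and how decisions were made. I would use three hooks for it:
- `before_model` to redact PII from the messages before they reach the model.
- `after_model` to log each response and the tool calls it requests, as `log_after_model` does in this notebook.
- `wrap_tool_call` to record every tool invocation with its arguments and results.

For reliability, `wrap_model_call` is the natural place for retry-with-backoff on transient API failures. It is also where a circuit breaker belongs, failing fast when an upstream service is down to prevent cascading failures.

Finally, rate limiting respects API quotas, through a `before_model` check or the built-in `ModelCallLimitMiddleware`. It keeps a single complex query from exhausting the limits and breaking the service for everyone.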
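Retry-with-backoff is simple enough to sketch without the framework. In a LangChain 1.0 agent, this logic would sit inside a `wrap_model_call` hook around the handler. Here it is a plain helper. `call_with_retry` and `flaky_model_call` are hypothetical names, and treating `ConnectionError`/`TimeoutError` as the transient errors is an assumption; a real provider SDK raises its own exception types.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.5,
                    max_delay: float = 8.0,
                    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Retry a call on transient errors with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise                                               # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ... capped
            time.sleep(delay * random.uniform(0.5, 1.0))            # jitter de-synchronizes clients
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_model_call() -> str:
    """Hypothetical model call that fails twice with a transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient upstream failure")
    return "model reply"


print(call_with_retry(flaky_model_call, base_delay=0.0), attempts["count"])  # model reply 3
```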

---
## 🏗️ Activity #2: Enhance the Agentic RAG System

Now it's your turn! Enhance the investment agent by implementing ONE of the following:

### Option A: Add a New Tool
Create a new tool that the agent can use. Ideas:
- A tool that calculates compound annual growth rate (CAGR)
- A tool that compares investment returns across different time periods
- A tool that formats financial figures with proper notation

### Option B: Create Custom Middleware
Build middleware that adds new functionality:
- Middleware that tracks which tools are used most frequently
- Middleware that adds a compliance disclaimer to investment-related responses
- Middleware that enforces a response length limit

### Option C: Improve the RAG Tool
Enhance the retrieval tool:
- Add metadata filtering by year or topic
- Implement reranking of results for financial relevance
- Add source citations with relevance scores
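Option A's CAGR idea, like the `calculate_comparable_returns` tool in the solution that follows, rests on two standard relations. The first gives the compound annual growth rate from a cumulative return $R_{\text{cum}}$ over $n$ years. The second gives $R_{\text{cum}}$ from a series of annual returns $r_1, \dots, r_n$:

$$
\text{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1 = \left(1 + R_{\text{cum}}\right)^{1/n} - 1,
\qquad
R_{\text{cum}} = \prod_{t=1}^{n} \left(1 + r_t\right) - 1
$$

For example, a 50% cumulative return over 12 years corresponds to a CAGR of about 3.44%.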

# Custom Financial Analysis Agent

In [35]:
import os

import getpass
from langchain.agents import create_agent
from langchain.agents.middleware import before_model, after_model
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
import nest_asyncio
import pandas as pd
from uuid import uuid4


# configure async execution pattern
nest_asyncio.apply()  # Required for async operations in Jupyter


# Set Anthropic, OpenAI, and LangSmith API keys
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key (for embeddings): ")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE9 - The Agent Loop - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key (press Enter to skip): ") or ""

if not os.environ["LANGCHAIN_API_KEY"]:
    os.environ["LANGCHAIN_TRACING_V2"] = "false"

In [50]:
# Define custom tools for the Agent to use
@tool
def load_returns_csv(filename: str) -> str:
    """Load a CSV file from the data/returns folder and return its contents as a formatted table.
    
    Use this tool to load historical investment return data. Available files include:
    - vfiax_returns_2011_2025.csv: VFIAX fund returns from 2011-2025
    - sp500_returns_1928_2025.csv: S&P 500 returns from 1928-2025
    - top_etf_returns_2020_2025.csv: Top performing ETFs from 2020-2025
    
    Args:
        filename: Name of the CSV file in data/returns folder (with or without .csv extension)
    
    Returns:
        A formatted string representation of the DataFrame or error message
    """
    
    try:
        # Add .csv extension if not present
        if not filename.endswith('.csv'):
            filename += '.csv'
        
        # Construct full path
        filepath = os.path.join('data', 'returns', filename)
        
        # Check if file exists
        if not os.path.exists(filepath):
            return f"Error: File '{filename}' not found in data/returns folder. Available files: vfiax_returns_2011_2025.csv, sp500_returns_1928_2025.csv, top_etf_returns_2020_2025.csv"
        
        # Load the CSV
        df = pd.read_csv(filepath)
        
        # Return formatted table with summary
        return f"Loaded {filename}:\n\nShape: {df.shape[0]} rows x {df.shape[1]} columns\nColumns: {', '.join(df.columns.tolist())}\n\nFirst 10 rows:\n{df.head(10).to_string(index=False)}\n\nData available for further analysis."
    
    except Exception as e:
        return f"Error loading CSV: {str(e)}"

@tool
def standardize_percent_representations(filename: str, column_names: str) -> str:
    """Convert string-based percent representations in a pandas DataFrame to standardized float format.
    
    This tool takes a CSV file and column name(s), converts percentage strings (like '50%' or '50.5%')
    to decimal floats (0.50, 0.505), parses values without a '%' sign as plain floats (e.g. '0.5' -> 0.5),
    and overwrites the original file with the result.
    
    Args:
        filename: Name of the CSV file in data/returns folder (with or without .csv extension)
        column_names: Comma-separated column names to convert (e.g., 'Return' or 'Return,Yield')
    
    Returns:
        Success message with conversion details or error message
    """
    
    try:
        # Add .csv extension if not present
        if not filename.endswith('.csv'):
            filename += '.csv'
        
        # Construct full path
        filepath = os.path.join('data', 'returns', filename)
        
        # Check if file exists
        if not os.path.exists(filepath):
            return f"Error: File '{filename}' not found in data/returns folder."
        
        # Load the CSV
        df = pd.read_csv(filepath)
        
        # Parse column names
        columns = [col.strip() for col in column_names.split(',')]
        
        # Validate columns exist
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            return f"Error: Columns not found in DataFrame: {', '.join(missing_cols)}. Available columns: {', '.join(df.columns.tolist())}"
        
        # Track conversions
        conversions = {}
        
        # Convert each specified column
        for col in columns:
            original_dtype = df[col].dtype
            converted_count = 0
            
            def convert_percent(value):
                nonlocal converted_count
                
                # Leave missing values (NaN/None) untouched
                if pd.isna(value):
                    return value
                
                # Numeric values are already decimals; just normalize to float
                if isinstance(value, (int, float)):
                    return float(value)
                
                # Convert to string and strip whitespace
                str_value = str(value).strip()
                
                # Handle empty strings
                if not str_value:
                    return None
                
                try:
                    # Remove % symbol if present
                    if '%' in str_value:
                        converted_count += 1
                        # Remove % and convert to decimal (50% -> 0.50)
                        return float(str_value.replace('%', '')) / 100.0
                    else:
                        # Parse as regular float
                        return float(str_value)
                except ValueError:
                    # If conversion fails, return original value
                    return value
            
            # Apply conversion
            df[col] = df[col].apply(convert_percent)
            
            conversions[col] = {
                'original_dtype': str(original_dtype),
                'new_dtype': str(df[col].dtype),
                'converted_values': converted_count
            }
        
        # Save the updated DataFrame
        df.to_csv(filepath, index=False)
        
        # Build result message
        result_parts = [f"Successfully standardized percent representations in {filename}:"]
        for col, stats in conversions.items():
            result_parts.append(f"  - {col}: Converted {stats['converted_values']} values from {stats['original_dtype']} to {stats['new_dtype']}")
        
        result_parts.append(f"\nUpdated file saved to {filepath}")
        result_parts.append(f"\nSample of converted data:\n{df[columns].head(5).to_string(index=False)}")
        
        return '\n'.join(result_parts)
    
    except Exception as e:
        return f"Error standardizing percent values: {str(e)}"

@tool
def calculate_comparable_returns(filename: str, return_column: str, return_type: str, years: int | None = None) -> str:
    """Calculate and normalize investment returns for fair comparison across different data formats.
    
    Use this tool to convert between cumulative returns and annualized returns, enabling
    apples-to-apples comparison of investment performance.
    
    Args:
        filename: Name of the CSV file in data/returns folder (with or without .csv extension)
        return_column: Name of the column containing return data (e.g., 'Total Return', '5-Year Return')
        return_type: Type of returns in the data - either 'cumulative' (total return over period) 
                     or 'annual' (year-by-year returns)
        years: Number of years for the period (required if return_type='cumulative')
    
    Returns:
        Analysis showing both annualized (CAGR) and cumulative returns for comparison
    """
    
    try:
        # Add .csv extension if not present
        if not filename.endswith('.csv'):
            filename += '.csv'
        
        # Construct full path
        filepath = os.path.join('data', 'returns', filename)
        
        # Check if file exists
        if not os.path.exists(filepath):
            return f"Error: File '{filename}' not found in data/returns folder."
        
        # Load the CSV
        df = pd.read_csv(filepath)
        
        # Validate return column exists
        if return_column not in df.columns:
            return f"Error: Column '{return_column}' not found. Available columns: {', '.join(df.columns.tolist())}"
        
        # Normalize return_type
        return_type = return_type.lower().strip()
        
        if return_type == 'cumulative':
            if years is None or years <= 0:
                return "Error: 'years' parameter is required and must be positive when return_type='cumulative'"
            
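            # Parse cumulative returns into decimals (e.g. '150%' -> 1.5)
            def parse_return(val):
                if pd.isna(val):
                    return None
                if isinstance(val, (int, float)):
                    # Heuristic: bare numbers with magnitude above 1 are treated as percents
                    return float(val) / 100 if abs(val) > 1 else float(val)
                # Strings: an explicit '%' always means divide by 100
                str_val = str(val).strip().replace(',', '')
                try:
                    if str_val.endswith('%'):
                        return float(str_val.rstrip('%')) / 100
                    num = float(str_val)
                    return num / 100 if abs(num) > 1 else num
                except ValueError:
                    return None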
            
            df['parsed_return'] = df[return_column].apply(parse_return)
            df = df.dropna(subset=['parsed_return'])
            
            # Calculate CAGR: (1 + total_return)^(1/years) - 1
            df['CAGR'] = ((1 + df['parsed_return']) ** (1.0 / years)) - 1
            df['cumulative_return'] = df['parsed_return']
            
            # Format results
            result_parts = [f"Analysis of {filename} (Cumulative {years}-Year Returns):\n"]
            result_parts.append(f"{'Investment':<40} {'Cumulative Return':>18} {'CAGR (Annual)':>18}")
            result_parts.append("-" * 80)
            
            for idx, row in df.head(15).iterrows():
                name = str(row.get('Name', row.get('Symbol', f'Row {idx}')))[:38]
                cum_ret = f"{row['cumulative_return']*100:.2f}%"
                cagr = f"{row['CAGR']*100:.2f}%"
                result_parts.append(f"{name:<40} {cum_ret:>18} {cagr:>18}")
            
            result_parts.append(f"\n📊 Summary Statistics:")
            result_parts.append(f"  - Average Cumulative Return: {df['cumulative_return'].mean()*100:.2f}%")
            result_parts.append(f"  - Average CAGR: {df['CAGR'].mean()*100:.2f}%")
            result_parts.append(f"  - Best CAGR: {df['CAGR'].max()*100:.2f}%")
            
        elif return_type == 'annual':
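            # Parse annual returns. Strings with a '%' sign are always percentages;
            # bare numbers with magnitude > 1 are assumed to be percentages (7.5 -> 0.075),
            # smaller ones are assumed to already be fractions (0.075 -> 0.075).
            def parse_return(val):
                if pd.isna(val):
                    return None
                if isinstance(val, (int, float)):
                    return float(val) / 100 if abs(val) > 1 else float(val)
                str_val = str(val).strip()
                is_percent = '%' in str_val
                try:
                    num = float(str_val.replace('%', '').replace(',', ''))
                except ValueError:
                    return None
                return num / 100 if is_percent or abs(num) > 1 else num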
            
            df['parsed_return'] = df[return_column].apply(parse_return)
            df = df.dropna(subset=['parsed_return'])
            
            # Calculate compound return: product of (1 + r) for each year, minus 1
            cumulative = 1.0
            returns_list = df['parsed_return'].tolist()
            for r in returns_list:
                cumulative *= (1 + r)
            cumulative_return = cumulative - 1
            
            # Calculate CAGR
            n_years = len(returns_list)
            cagr = (cumulative ** (1.0 / n_years)) - 1 if n_years > 0 else 0
            
            result_parts = [f"Analysis of {filename} (Annual Returns):\n"]
            result_parts.append(f"{'Year':<10} {'Annual Return':>15}")
            result_parts.append("-" * 30)
            
            for idx, row in df.iterrows():
                year = row.get('Year', row.get('Date', f'Period {idx}'))
                annual_ret = f"{row['parsed_return']*100:.2f}%"
                result_parts.append(f"{str(year):<10} {annual_ret:>15}")
            
            result_parts.append(f"\n📊 Summary Statistics:")
            result_parts.append(f"  - Number of Years: {n_years}")
            result_parts.append(f"  - Cumulative Return ({n_years} years): {cumulative_return*100:.2f}%")
            result_parts.append(f"  - CAGR: {cagr*100:.2f}%")
            result_parts.append(f"  - Average Annual Return: {df['parsed_return'].mean()*100:.2f}%")
            result_parts.append(f"  - Best Year: {df['parsed_return'].max()*100:.2f}%")
            result_parts.append(f"  - Worst Year: {df['parsed_return'].min()*100:.2f}%")
            
        else:
            return f"Error: return_type must be either 'cumulative' or 'annual', got '{return_type}'"
        
        return '\n'.join(result_parts)
    
    except Exception as e:
        return f"Error calculating comparable returns: {str(e)}"

tools = [load_returns_csv, standardize_percent_representations, calculate_comparable_returns]
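`calculate_comparable_returns` reports both an average annual return and a CAGR for annual data, and the two are not the same number. As a quick hand check of the compounding formula used in the `'annual'` branch, here is a minimal standalone sketch on a made-up three-year series (the `compound_stats` helper and the return values are hypothetical, not taken from the data/returns files):

```python
from math import prod


def compound_stats(annual_returns: list[float]) -> dict[str, float]:
    """Compound a list of fractional annual returns (0.10 == 10%)."""
    if not annual_returns:
        raise ValueError("need at least one annual return")
    growth = prod(1 + r for r in annual_returns)  # product of (1 + r) over all years
    n_years = len(annual_returns)
    return {
        "cumulative_return": growth - 1,
        "cagr": growth ** (1.0 / n_years) - 1,  # geometric average growth rate
        "arithmetic_mean": sum(annual_returns) / n_years,
    }


# Hypothetical three-year series: +10%, -5%, +20%
stats = compound_stats([0.10, -0.05, 0.20])
for key, value in stats.items():
    print(f"{key}: {value * 100:.2f}%")
```

With these placeholder inputs it prints a cumulative return of 25.40%, a CAGR of 7.84%, and an arithmetic mean of 8.33%. The arithmetic mean overstates the compounded growth rate whenever returns vary from year to year, so CAGR is the better figure for comparing funds.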


In [51]:
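from langchain.agents import create_agent
from langchain.agents.middleware import after_model
from langchain_anthropic import ChatAnthropic

# Create the Claude model for our agent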
claude_model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)


@after_model
def log_after_model(state, runtime):
    """Log the tool calls (if any) requested by the model after each invocation."""
    messages = state.get("messages", [])
    if messages:
        # An empty list means the model answered directly, which ends the agent loop
        tool_calls = getattr(messages[-1], "tool_calls", None) or []
        print(f"[LOG] After model - Tool calls requested: {tool_calls}")
    return None


# Create the financial analysis agent with middleware
financial_agent = create_agent(
    model=claude_model,
    tools=tools,
    system_prompt="""You are a helpful financial analysis assistant with access to historical return data for various investments.

Your role is to:
1. Help users analyze investment returns and performance data
2. Load CSV files with historical return data when requested
3. Standardize percent representations to float format when needed
4. Provide accurate calculations and insights based on the data
5. Be professional and informative in your responses
6. Always remind users that this is for educational purposes only and not investment advice

Remember: Use the available tools to access and analyze the data.""",
    middleware=[
        log_after_model
    ]
)

print("Financial Analysis Agent created with middleware!")

Financial Analysis Agent created with middleware!
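The `[LOG] After model` lines in the test runs come from `log_after_model`, which is called once after every model invocation; the loop ends when a model response requests no tool calls (the `[]` line). The toy simulation below illustrates that control flow in plain Python. It is a conceptual sketch only: `run_agent_loop`, the scripted model, and the dict-based messages are hypothetical and are not how `create_agent` is implemented internally.

```python
from typing import Callable

Message = dict  # toy stand-in for a chat message


def run_agent_loop(
    model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    messages: list[Message],
    after_model_hooks: list[Callable[[list[Message]], None]],
    max_steps: int = 10,
) -> list[Message]:
    """Toy agent loop: call the model, run hooks, execute tools, repeat."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        for hook in after_model_hooks:  # fires after every model call
            hook(messages)
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls:  # no tool calls -> this reply is the final answer
            return messages
        for call in tool_calls:  # run each requested tool, append its result
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")


# Hypothetical scripted model: one tool call, then a final answer
script = iter([
    {"role": "ai", "content": "", "tool_calls": [{"name": "echo", "args": {"text": "hi"}}]},
    {"role": "ai", "content": "done", "tool_calls": []},
])
log: list[int] = []
final = run_agent_loop(
    model=lambda msgs: next(script),
    tools={"echo": lambda text: text},
    messages=[{"role": "user", "content": "say hi"}],
    after_model_hooks=[lambda msgs: log.append(len(msgs[-1].get("tool_calls", [])))],
)
print(log)  # number of tool calls requested on each model call
print(final[-1]["content"])
```

The hook log shows one entry per model call (`[1, 0]`): the first response asks for one tool call, the second asks for none and is returned as the final answer.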


In [39]:
# Test the financial analysis agent
query = ("Compare the S&P500, VFIAX and ETF return files in the data/returns folder "
         "to determine which strategy would produce the best returns for $10,000 invested in 2020")


print(f"Query: {query}\n")
print("=" * 80)

result = financial_agent.invoke({"messages": [{"role": "user", "content": query}]})

print("\n" + "=" * 80)
print(f"Agent Response:\n{result['messages'][-1].content}")
print("\n" + "=" * 80)
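# Each model invocation appends one AIMessage to the conversation
model_call_count = sum(1 for m in result["messages"] if m.type == "ai")
print(f"Total model calls made: {model_call_count}")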

Query: what were the VFIAX returns like for the last 5 years

[LOG] After model - Tool calls requested: [{'name': 'load_returns_csv', 'args': {'filename': 'vfiax_returns_2011_2025.csv'}, 'id': 'toolu_01HWkfEjD5yPpSw7GTSTFm6a', 'type': 'tool_call'}]
[LOG] After model - Tool calls requested: [{'name': 'load_returns_csv', 'args': {'filename': 'vfiax_returns_2011_2025'}, 'id': 'toolu_01Te6CuTkGZ23wrrxmkDTQWU', 'type': 'tool_call'}]
[LOG] After model - Tool calls requested: [{'name': 'load_returns_csv', 'args': {'filename': 'vfiax_returns_2011_2025.csv'}, 'id': 'toolu_01KKsTzuyq83LZg229gjwEwc', 'type': 'tool_call'}]
[LOG] After model - Tool calls requested: []

Agent Response:
I apologize, but I'm experiencing a technical issue accessing the VFIAX returns file, even though it appears to be listed as available. This might be a temporary system issue.

However, I can provide you with some general context about VFIAX performance over the last 5 years (2020-2024):

**VFIAX (Vanguard S&P 500 Ind

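The first test query asks which strategy would grow $10,000 invested in 2020 the most. Once annual returns have been extracted for each fund, that comparison reduces to compounding the starting balance year by year. A minimal sketch, assuming fractional annual returns; the `ending_value` helper, the fund names, and the return series below are placeholders, not values from the data/returns files:

```python
def ending_value(initial: float, annual_returns: list[float]) -> float:
    """Grow an initial investment through a sequence of fractional annual returns."""
    value = initial
    for r in annual_returns:
        value *= 1 + r  # compound one year at a time
    return value


# Placeholder series for illustration only (not real fund data)
hypothetical = {
    "Fund A": [0.10, -0.05, 0.20],
    "Fund B": [0.08, 0.08, 0.08],
}
results = {name: ending_value(10_000, rets) for name, rets in hypothetical.items()}
best = max(results, key=results.get)
for name, value in results.items():
    print(f"{name}: ${value:,.2f}")
print(f"Best: {best}")
```

Here Fund B ends at $12,597.12 and Fund A at $12,540.00, so Fund B wins even though its simple average annual return (8.00%) is below Fund A's (8.33%): steady returns compound better than volatile ones with a similar average.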
In [52]:
# Test the financial analysis agent
query = ("Compare the S&P500, VFIAX and ETF return files in the data/returns folder "
         "to determine which type of fund had the best returns in the last 5 years: "
         "S&P500 Index Fund, any ETF fund, VFIAX mutual fund")


print(f"Query: {query}\n")
print("=" * 80)

result = financial_agent.invoke({"messages": [{"role": "user", "content": query}]})

print("\n" + "=" * 80)
print(f"Agent Response:\n{result['messages'][-1].content}")
print("\n" + "=" * 80)
# Each model invocation appends one AIMessage to the conversation
model_call_count = sum(1 for m in result["messages"] if m.type == "ai")
print(f"Total model calls made: {model_call_count}")

Query: Compare the S&P500, VFIAX and ETF return files in the data/returns folder to determine which type of fund had the best returns in the last 5 years: S&P500 Index Fund, any ETF fund, VFIAX mutual fund

[LOG] After model - Tool calls requested: [{'name': 'load_returns_csv', 'args': {'filename': 'sp500_returns_1928_2025.csv'}, 'id': 'toolu_015urzAYVyoanq17zcjBVVJb', 'type': 'tool_call'}, {'name': 'load_returns_csv', 'args': {'filename': 'vfiax_returns_2011_2025.csv'}, 'id': 'toolu_01NtNE2eosNVTzgv1YaCXxr7', 'type': 'tool_call'}, {'name': 'load_returns_csv', 'args': {'filename': 'top_etf_returns_2020_2025.csv'}, 'id': 'toolu_01C5C1Sn6vdBJfUYeganuGJD', 'type': 'tool_call'}]
[LOG] After model - Tool calls requested: [{'name': 'standardize_percent_representations', 'args': {'filename': 'vfiax_returns_2011_2025.csv', 'column_names': 'Capital return by NAV,Income return by NAV,Total return by NAV,Benchmark'}, 'id': 'toolu_017P8ThTzEcTP4qs8dHyJSZf', 'type': 'tool_call'}, {'name': 'standardi