The first take-home project is a chance to get hands-on practice with the Week 1 concepts by applying them to an agent task of your choice.

Goal:

- Choose a task involving multi-turn interaction/tool use
- Implement an agent scaffold using either an API directly or one of the frameworks we covered
- Create a small set of "test prompts"
- Create a "reward function" to evaluate your agent
- Test your agent setup with multiple models/prompts
- Examine multiple agent outputs, identify a consistent "problem", then either adjust the setup (prompts/tools) to address it or adjust your evals to measure it
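
The testing and evaluation goals above amount to a small evaluation loop. Here is a minimal sketch, assuming your agent can be wrapped as a function `agent(model, prompt) -> str` and your reward function as `reward(prompt, output) -> float`. All names here are placeholders, and `fake_agent` stands in for real API calls so the sketch runs offline:

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

Agent = Callable[[str, str], str]      # (model, prompt) -> output text
Reward = Callable[[str, str], float]   # (prompt, output) -> score in [0, 1]


def run_eval(agent: Agent, reward: Reward,
             models: list[str], prompts: list[str]) -> dict[str, float]:
    """Run every prompt against every model and return the mean reward per model."""
    scores: dict[str, list[float]] = defaultdict(list)
    for model in models:
        for prompt in prompts:
            output = agent(model, prompt)
            scores[model].append(reward(prompt, output))
    return {model: mean(vals) for model, vals in scores.items()}


# Hypothetical stand-ins so the sketch runs without network access.
def fake_agent(model: str, prompt: str) -> str:
    return "Paris" if model == "strong-model" else "I am not sure"


def contains_paris(prompt: str, output: str) -> float:
    return 1.0 if "paris" in output.lower() else 0.0


results = run_eval(
    fake_agent,
    contains_paris,
    models=["strong-model", "weak-model"],
    prompts=["What is the capital of France?"],
)
print(results)
```

Swapping `fake_agent` for a function that calls your real scaffold gives you a per-model score table, which makes it easier to spot where a weaker model starts to fail.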

Ideas for agent tasks:
- Search agent for your favorite blog/website
- Agent which
- Agent for playing a simple board/card game
- Code agent specialized to only use a specific library, e.g. iterating on a matplotlib plot until it "looks right"
- Terminal-based chat agent with user handoff/confirmation

Ideas for reward functions:
- Format checks using regex
- Deterministic checks (parsing math answers, running code with test cases, solving a puzzle/game)
- Embedding or text overlap similarity to a "ground truth"
- LLM judges which can see the "ground truth"
- LLM judges which evaluate a set of fuzzy criteria + give scores for each
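
The first three ideas above can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: the `<answer>...</answer>` tag convention, the function names, and whitespace tokenization (standing in for a real embedding or overlap metric) are all choices made for the example.

```python
import re
from collections import Counter


def format_reward(output: str) -> float:
    """Format check: 1.0 if the output has exactly one <answer>...</answer> block."""
    blocks = re.findall(r"<answer>.*?</answer>", output, flags=re.DOTALL)
    return 1.0 if len(blocks) == 1 else 0.0


def exact_number_reward(output: str, expected: float, tol: float = 1e-6) -> float:
    """Deterministic check: compare the last number in the output to the expected value."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - expected) <= tol else 0.0


def overlap_f1(output: str, reference: str) -> float:
    """Text overlap: token-level F1 between the output and a ground-truth reference."""
    out_tokens = output.lower().split()
    ref_tokens = reference.lower().split()
    if not out_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    common = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(out_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Cheap checks like these are a good first layer. An LLM judge can then be reserved for the criteria that deterministic checks cannot capture.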

Tips:
- Start simple, get a basic version working, then ramp up complexity
- If your agent "just works" with a fairly powerful model, try it with a weaker model and see what breaks


Bonus goals:
- Try making a "parallel-friendly" version using asyncio + error handling
- Try implementing Best-of-N selection -- can your eval function match your judgment for which outputs are "best"?
- Try testing either a "multi-agent" (parallelized) version of your agent, OR a Client/Server version (e.g. MCP, A2A)


Deliverable:
- A repo, notebook, *or* short writeup detailing your setup + experimentation
  - What approaches did you try?
  - What roadblocks did you run into?
  - Which evaluation methods worked best for your task?
  - What's the smallest model that worked decently well?

In [2]:
# Simplest possible LLM call
from openai import OpenAI

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable
oai = OpenAI()

response = oai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)

The capital of France is Paris.


In [4]:
from openai import OpenAI

class SimpleLineageAgent:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
    def analyze_lineage(self, code: str, target_variable: str) -> str:
        """Analyze the lineage of a target variable through the code"""
        
        system_prompt = """You are a code lineage analysis expert. Your job is to trace how variables are created through data transformations.

When given code and a target variable, analyze:
1. What source tables/files were used
2. What operations were performed (filter, join, groupby, etc.)  
3. The step-by-step flow from sources to the final variable

Be clear, concise, and focus on the data flow."""

        user_prompt = f"""
Variable to analyze: {target_variable}
```
{code}
```
"""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0
        )
        
        return response.choices[0].message.content or "No response"



In [6]:
# Simple test of SimpleLineageAgent on a small PySpark snippet (default model: gpt-4o-mini)

sample_code = '''
# Load data
transactions = spark.table("raw.transactions")
customers = spark.table("raw.customers")

# Filter recent transactions
recent_tx = transactions.filter(col("date") >= "2023-01-01")

# Group by customer
customer_totals = recent_tx.groupBy("customer_id").sum("amount")

# Join with customer names
final_result = customer_totals.join(customers, "customer_id")
'''
    
agent = SimpleLineageAgent()

print("=== Simple Lineage Analysis ===")
print(f"Code:\n{sample_code}")
print("\n" + "="*50)

# Test analyzing final_result
print("Analyzing 'final_result':")
result = agent.analyze_lineage(sample_code, "final_result")
print(result)

print("\n" + "-"*30)

# Test analyzing intermediate variable
print("Analyzing 'customer_totals':")
result2 = agent.analyze_lineage(sample_code, "customer_totals")
print(result2)

=== Simple Lineage Analysis ===
Code:

# Load data
transactions = spark.table("raw.transactions")
customers = spark.table("raw.customers")

# Filter recent transactions
recent_tx = transactions.filter(col("date") >= "2023-01-01")

# Group by customer
customer_totals = recent_tx.groupBy("customer_id").sum("amount")

# Join with customer names
final_result = customer_totals.join(customers, "customer_id")


Analyzing 'final_result':
To analyze the variable `final_result`, we can break down the data flow as follows:

### 1. Source Tables/Files Used
- **`raw.transactions`**: This table contains transaction data, including fields such as `date` and `amount`.
- **`raw.customers`**: This table contains customer data, including fields such as `customer_id` and customer names.

### 2. Operations Performed
- **Filter**: The `transactions` table is filtered to include only recent transactions (those with a date on or after January 1, 2023).
- **GroupBy**: The filtered transactions are grouped by `

In [7]:
# Same test as above, this time with gpt-4.1-mini

sample_code = '''
# Load data
transactions = spark.table("raw.transactions")
customers = spark.table("raw.customers")

# Filter recent transactions
recent_tx = transactions.filter(col("date") >= "2023-01-01")

# Group by customer
customer_totals = recent_tx.groupBy("customer_id").sum("amount")

# Join with customer names
final_result = customer_totals.join(customers, "customer_id")
'''
    
agent = SimpleLineageAgent(model="gpt-4.1-mini")

print("=== Simple Lineage Analysis ===")
print(f"Code:\n{sample_code}")
print("\n" + "="*50)

# Test analyzing final_result
print("Analyzing 'final_result':")
result = agent.analyze_lineage(sample_code, "final_result")
print(result)

print("\n" + "-"*30)

# Test analyzing intermediate variable
print("Analyzing 'customer_totals':")
result2 = agent.analyze_lineage(sample_code, "customer_totals")
print(result2)

=== Simple Lineage Analysis ===
Code:

# Load data
transactions = spark.table("raw.transactions")
customers = spark.table("raw.customers")

# Filter recent transactions
recent_tx = transactions.filter(col("date") >= "2023-01-01")

# Group by customer
customer_totals = recent_tx.groupBy("customer_id").sum("amount")

# Join with customer names
final_result = customer_totals.join(customers, "customer_id")


Analyzing 'final_result':
Analysis of variable: final_result

1. Source tables:
   - raw.transactions
   - raw.customers

2. Operations performed:
   - Load raw.transactions into `transactions`
   - Load raw.customers into `customers`
   - Filter `transactions` to keep only rows with date >= "2023-01-01" → `recent_tx`
   - Group `recent_tx` by "customer_id" and sum the "amount" column → `customer_totals`
   - Join `customer_totals` with `customers` on "customer_id" → `final_result`

3. Step-by-step flow:
   - Start with raw.transactions and raw.customers tables.
   - Filter transaction

In [None]:
from openai import OpenAI

from tools import CodeAnalysisTools  # project-local helper module, not shown in this notebook


class ToolsLineageAgent:
    """Lineage agent that augments the LLM prompt with static code-analysis results"""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.tools = CodeAnalysisTools()
    
    def analyze_lineage_with_tools(self, code: str, target_variable: str) -> str:
        """Analyze lineage by running the tools first and injecting their results into the LLM prompt"""
        
        # First, use tools to extract structured information
        tool_analysis = self.tools.trace_variable_dependencies(code, target_variable)
        
        # Enhanced system prompt with tool information
        system_prompt = f"""You are a code lineage analysis expert with access to code analysis tools.

Your job is to trace how variables are created through data transformations.

Tool Analysis Results:
- Detected sources: {tool_analysis['sources']}
- Detected operations: {tool_analysis['operations']}
- Assignment lines: {tool_analysis['assignment_lines']}

Using this information and the provided code, analyze:
1. What source tables/files were used
2. What operations were performed (filter, join, groupby, etc.)  
3. The step-by-step flow from sources to the target variable

Be clear, concise, and focus on the data flow."""

        user_prompt = f"""
Variable to analyze: {target_variable}
```
{code}
```
"""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0
        )
        
        return response.choices[0].message.content or "No response"

In [2]:
from tools import CodeAnalysisTools

sample_code = """
import pandas as pd

# Load data
sales = pd.read_csv("sales_data.csv")
customers = pd.read_csv("customers.csv")

# Process data
recent_sales = sales.query("date >= '2023-01-01'")
customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
final_result = customer_totals.merge(customers, on='customer_id')
"""

tools = CodeAnalysisTools()

print("🔧 TESTING CODE ANALYSIS TOOLS")
print("=" * 50)

print("📁 Sources found:")
sources = tools.extract_table_sources(sample_code)
print(sources)

print("\n📝 Variable assignments:")
assignments = tools.extract_variable_assignments(sample_code)
for var, lines in assignments.items():
    print(f"  {var}: lines {lines}")

print("\n⚙️ Operations found:")
operations = tools.extract_operations(sample_code)
for op in operations:
    print(f"  Line {op['line']}: {op['operation']} - {op['code']}")

print("\n🎯 Trace 'final_result':")
trace = tools.trace_variable_dependencies(sample_code, "final_result")
print(f"  Sources: {trace['sources']}")
print(f"  Operations: {trace['operations']}")
print(f"  Assignment lines: {trace['assignment_lines']}")

🔧 TESTING CODE ANALYSIS TOOLS
📁 Sources found:
['sales_data.csv', 'customers.csv']

📝 Variable assignments:
  sales: lines [5]
  customers: lines [6]
  recent_sales: lines [9]
  customer_totals: lines [10]
  final_result: lines [11]

⚙️ Operations found:
  Line 9: filter - recent_sales = sales.query("date >= '2023-01-01'")
  Line 10: groupby - customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
  Line 10: reset_index - customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
  Line 10: agg_sum - customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
  Line 10: groupBy - customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
  Line 10: sql_SUM - customer_totals = recent_sales.groupby('customer_id')['amount'].sum().reset_index()
  Line 11: merge - final_result = customer_totals.merge(customers, on='customer_id')

🎯 Trace 'final_result':
  Sources: ['sales_data.csv', 'custo
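
The `tools` module itself is not shown in this notebook. As a rough illustration of how a source-extraction helper could work, here is a regex-based sketch. It is an assumption about the general approach, not the actual `CodeAnalysisTools` implementation, and `extract_sources` and `SOURCE_PATTERNS` are hypothetical names.

```python
import re

# Assumed list of loader calls to recognize; extend for other readers as needed.
SOURCE_PATTERNS = [
    r"spark\.table\(\s*['\"]([^'\"]+)['\"]",
    r"pd\.read_(?:csv|parquet|json)\(\s*['\"]([^'\"]+)['\"]",
]


def extract_sources(code: str) -> list[str]:
    """Return table/file names passed to known loader calls, in order of appearance."""
    found: list[tuple[int, str]] = []
    for pattern in SOURCE_PATTERNS:
        for match in re.finditer(pattern, code):
            found.append((match.start(), match.group(1)))
    # Sort by position in the code, then de-duplicate while preserving order.
    ordered = [name for _, name in sorted(found)]
    return list(dict.fromkeys(ordered))


pandas_code = 'sales = pd.read_csv("sales_data.csv")\ncustomers = pd.read_csv("customers.csv")'
spark_code = 'tx = spark.table("raw.transactions")\ncust = spark.table("raw.customers")'
print(extract_sources(pandas_code))
print(extract_sources(spark_code))
```

Regexes like these miss sources built from variables or f-strings. That limitation is one reason to pair a tool like this with an LLM rather than rely on either alone.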

In [None]:
    # def compare_tool_vs_llm(self, code: str, target_variable: str) -> dict[str, Any]:  # needs: from typing import Any
    #     """Compare tool analysis vs LLM analysis"""

    #     # Tool analysis
    #     tool_result = self.tools.trace_variable_dependencies(code, target_variable)

    #     # Pure LLM analysis (without tools)
    #     system_prompt = """You are a code lineage analysis expert.

    #     Analyze the code to trace how the target variable was created:
    #     1. What source tables/files were used
    #     2. What operations were performed
    #     3. The step-by-step flow"""

    #     user_prompt = f"Target variable: {target_variable}\n\nCode:\n```\n{code}\n```"

    #     llm_response = self.client.chat.completions.create(
    #         model=self.model,
    #         messages=[
    #             {"role": "system", "content": system_prompt},
    #             {"role": "user", "content": user_prompt},
    #         ],
    #         temperature=0,
    #     )

    #     # Enhanced analysis (tools + LLM)
    #     enhanced_response = self.analyze_lineage_with_tools(code, target_variable)

    #     return {
    #         "tool_analysis": tool_result,
    #         "llm_only": llm_response.choices[0].message.content,
    #         "enhanced_analysis": enhanced_response,
    #     }
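
To compare the LLM-only and tool-assisted answers numerically rather than by eye, one option is a deterministic reward that checks which expected source tables each free-text answer mentions. This is a minimal sketch: `source_recall` is a hypothetical name, and the expected sources would have to be labeled by hand for each test prompt.

```python
def source_recall(response: str, expected_sources: list[str]) -> float:
    """Fraction of expected source names that appear verbatim (case-insensitive) in the response."""
    if not expected_sources:
        return 1.0
    text = response.lower()
    hits = sum(1 for src in expected_sources if src.lower() in text)
    return hits / len(expected_sources)


expected = ["raw.transactions", "raw.customers"]
full = source_recall("Sources: raw.transactions and raw.customers", expected)
partial = source_recall("Only raw.transactions is used", expected)
print(full, partial)
```

This only measures whether sources are named. It says nothing about whether the described operations are correct, so it works best alongside an LLM judge or a check on the listed operations.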