# Agent Testing Notebook

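This notebook tests individual components of `agent.py` one cell at a time: first the async tools, then each stage of `AsyncLeadAgent`.

**Workflow**:
1. Modify functions in `agent.py`
2. Re-run the setup cell below, which reloads the `agent` module
3. Run individual test cells to validate the changes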
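
Step 2 matters because names bound with `from agent import ...` are not refreshed by `importlib.reload`; they keep pointing at the old objects until they are imported again. A minimal sketch of that behavior, using a throwaway module called `demo_agent` (a hypothetical stand-in for `agent.py`, not part of this project):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid stale bytecode caches when a file is rewritten within the same second.
sys.dont_write_bytecode = True

# Hypothetical throwaway module standing in for agent.py.
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir) / "demo_agent.py"
module_path.write_text("VERSION = 1\n")
sys.path.insert(0, tmp_dir)

import demo_agent
from demo_agent import VERSION  # binds VERSION in this namespace (star imports behave the same way)

# Simulate editing the module on disk, then reload it.
module_path.write_text("VERSION = 2\n")
importlib.reload(demo_agent)

stale = VERSION              # still the object bound before the reload
fresh = demo_agent.VERSION   # attribute lookup sees the reloaded value

from demo_agent import VERSION  # re-import to rebind the name
rebound = VERSION

print(stale, fresh, rebound)
```

The practical consequence is to re-import names after reloading, or to access them as attributes of the module.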

## Setup Configuration

In [68]:
# Setup environment and configuration
import os
from dotenv import load_dotenv
import dspy
import logging
import importlib
import agent

load_dotenv(override=True)

# Reload agent.py first, then star-import, so the names used below
# come from the freshly reloaded module rather than the stale one
importlib.reload(agent)
from agent import *
print("Agent module reloaded successfully!")

# Disable verbose DSPy and LiteLLM outputs
dspy.settings.configure(show_guidelines=False, show_messages=False, show_cot=False)
logging.getLogger("LiteLLM").setLevel(logging.WARNING)

# Print configuration
print(f"SMALL_MODEL: {SMALL_MODEL}")
print(f"BIG_MODEL: {BIG_MODEL}")
print(f"TEMPERATURE: {TEMPERATURE}")
print(f"MAX_TOKENS: {MAX_TOKENS}")

# Configure DSPy default
default_lm = dspy.LM(
    model=SMALL_MODEL,
    api_key=OPENROUTER_API_KEY,
    api_base=OPENROUTER_BASE_URL,
    temperature=TEMPERATURE,
    max_tokens=MAX_TOKENS
)
dspy.configure(lm=default_lm)
lead_agent = AsyncLeadAgent()

print("\nConfiguration complete!")

Agent module reloaded successfully!
SMALL_MODEL: openrouter/google/gemini-2.5-flash-lite-preview-06-17
BIG_MODEL: openrouter/openai/gpt-4.1-mini
TEMPERATURE: 1.0
MAX_TOKENS: 4000

Configuration complete!


## Test Individual Async Tools

In [69]:
# Test web_search
result = await web_search("DSPy framework", count=2)
print("=== Web Search Test ===")
print(result)

=== Web Search Test ===
Search results for 'DSPy framework':\n\n1. GitHub - stanfordnlp/dspy: DSPy: The framework for programming—not prompting—language models\n   <strong>DSPy</strong>: <strong>The</strong> <strong>framework</strong> for programming—not prompting—language models - stanfordnlp/<strong>dspy</strong>\n   https://github.com/stanfordnlp/dspy\n\n2. DSPy\n   <strong>DSPy</strong> is a declarative <strong>framework</strong> for building modular AI software.\n   https://dspy.ai/


In [70]:
# Test wikipedia_search
result = wikipedia_search("Python programming", sentences=3)
print("=== Wikipedia Search Test ===")
print(result)

=== Wikipedia Search Test ===
Wikipedia – Python (programming language)\n\nPython is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected.


In [71]:
# Test async_batch_call
calls = [
    {"tool_name": "web_search", "args": {"query": "Lamine Yamal", "count": 2}},
    {"tool_name": "wikipedia_search", "args": {"query": "Desire Doue", "sentences": 2}}
]

results = await async_batch_call(calls)
print("=== Async Batch Call Test ===")
for i, result in enumerate(results):
    print(f"\nResult {i+1}:")
    print((result[:200] + "...") if len(result) > 200 else result)

=== Async Batch Call Test ===

Result 1:
web_search: Search results for 'Lamine Yamal':\n\n1. Lamine Yamal - Wikipedia\n   <strong>Lamine</strong> <strong>Yamal</strong> Nasraoui Ebana (born 13 July 2007) is a Spanish professional footballer...

Result 2:
wikipedia_search: Wikipedia – Désiré Doué\n\nDésiré Nonka-Maho Doué (French pronunciation: [deziʁe dwe]; born 3 June 2005) is a French professional footballer who plays as an attacking midfielder or w...
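
The batch call above mixes an awaited tool (`web_search`) with one called synchronously (`wikipedia_search`), and each result comes back prefixed with the tool name. The following is a rough sketch of how such a dispatcher could be written with `asyncio.gather`; it is not the implementation in `agent.py`. The names `batch_call_sketch`, `fake_web_search`, and `fake_wikipedia_search` are hypothetical, and the error handling is an assumption.

```python
import asyncio
import inspect
from typing import Any, Callable


async def batch_call_sketch(
    calls: list[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
) -> list[str]:
    """Run every call concurrently and return results in the input order."""

    async def run_one(call: dict[str, Any]) -> str:
        name = call["tool_name"]
        tool = tools.get(name)
        if tool is None:
            return f"{name}: unknown tool"
        try:
            if inspect.iscoroutinefunction(tool):
                output = await tool(**call["args"])
            else:
                # Run blocking tools in a worker thread so they do not stall the loop.
                output = await asyncio.to_thread(tool, **call["args"])
        except Exception as exc:  # assumption: report failures as strings
            return f"{name}: error: {exc}"
        return f"{name}: {output}"

    return await asyncio.gather(*(run_one(call) for call in calls))


# Hypothetical stand-ins for the real tools.
async def fake_web_search(query: str, count: int = 5) -> str:
    return f"{count} results for '{query}'"


def fake_wikipedia_search(query: str, sentences: int = 3) -> str:
    return f"{sentences} sentences about '{query}'"


demo_calls = [
    {"tool_name": "web_search", "args": {"query": "Lamine Yamal", "count": 2}},
    {"tool_name": "wikipedia_search", "args": {"query": "Desire Doue", "sentences": 2}},
]
demo_tools = {"web_search": fake_web_search, "wikipedia_search": fake_wikipedia_search}

# In a notebook cell, use `await batch_call_sketch(...)` instead of asyncio.run.
demo_results = asyncio.run(batch_call_sketch(demo_calls, demo_tools))
for line in demo_results:
    print(line)
```

`asyncio.gather` preserves input order, which is why the results can be matched back to the calls by index.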


## Test Full AsyncLeadAgent Workflow with Decomposition

In [72]:
# Test AsyncLeadAgent - Query Analysis
test_query = "Desire Doue vs Lamine Yamal, who is better?"
analysis_result = await lead_agent.query_analyzer.acall(query=test_query)

print("=== AsyncLeadAgent Query Analysis Test ===")
print(f"Query: {test_query}")
print("\nAnalysis:")
print(f"Type: {analysis_result.analysis.query_type}")
print(f"Complexity: {analysis_result.analysis.complexity}")
print(f"Main Concepts: {analysis_result.analysis.main_concepts}")
print(f"Key Entities: {analysis_result.analysis.key_entities}")
print(f"Answer Format: {analysis_result.analysis.answer_format}")

=== AsyncLeadAgent Query Analysis Test ===
Query: Desire Doue vs Lamine Yamal, who is better?

Analysis:
Type: depth_first
Complexity: complex
Main Concepts: ['Player comparison', 'Football skills', 'Performance metrics', 'Potential']
Key Entities: ['Desire Doue', 'Lamine Yamal']
Answer Format: Comparative analysis highlighting strengths, weaknesses, and overall impact of each player.


In [79]:
# Test AsyncLeadAgent - Research Planning
# Use the analysis from the previous cell
plan_result = await lead_agent.planner.acall(
    query=test_query,
    analysis=analysis_result.analysis
)

print("=== AsyncLeadAgent Research Planning Test ===")
print(f"Plan has {len(plan_result.plan.steps)} steps:")
for step in plan_result.plan.steps:
    print(f"\nStep {step.id}: {step.description}")
    print(f"  Depends on: {step.depends_on}")


=== AsyncLeadAgent Research Planning Test ===
Plan has 7 steps:

Step 1: Gather detailed statistics for Desire Doue, focusing on goals, assists, key passes, dribbling success rate, aerial duels won, defensive actions (tackles, interceptions), and passing accuracy. Use sites like FBref, WhoScored, and Transfermarkt.
  Depends on: []

Step 2: Gather detailed statistics for Lamine Yamal, focusing on goals, assists, key passes, dribbling success rate, aerial duels won, defensive actions (tackles, interceptions), and passing accuracy. Use sites like FBref, WhoScored, and Transfermarkt.
  Depends on: []

Step 3: Find in-depth analysis of Desire Doue's playing style, strengths (e.g., dribbling, passing range, creativity), and weaknesses (e.g., defensive contribution, consistency, physicality). Look for scouting reports and expert opinions.
  Depends on: []

Step 4: Find in-depth analysis of Lamine Yamal's playing style, strengths (e.g., dribbling, creativity, finishing), and weaknesses (e.g.,
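
Each plan step carries a `depends_on` list, and steps with an empty list can run immediately. One possible way to turn those lists into an execution order is to group steps into "waves" whose dependencies are already satisfied. The sketch below assumes that approach; `StepSketch` and `execution_waves` are hypothetical names, and the dependencies for steps 5 to 7 are invented for illustration because the output above is cut off before them.

```python
from dataclasses import dataclass, field


@dataclass
class StepSketch:
    """Hypothetical stand-in for a plan step: only the fields needed here."""
    id: int
    depends_on: list[int] = field(default_factory=list)


def execution_waves(steps: list[StepSketch]) -> list[list[int]]:
    """Group step ids into waves; every step in a wave can run in parallel."""
    remaining = {step.id: set(step.depends_on) for step in steps}
    done: set[int] = set()
    waves: list[list[int]] = []
    while remaining:
        ready = sorted(sid for sid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unknown dependency in plan")
        waves.append(ready)
        done.update(ready)
        for sid in ready:
            del remaining[sid]
    return waves


# Steps 1-4 have no dependencies, as in the output above.
# The dependencies of steps 5-7 are made up for this example.
demo_steps = [
    StepSketch(1), StepSketch(2), StepSketch(3), StepSketch(4),
    StepSketch(5, depends_on=[1, 2]),
    StepSketch(6, depends_on=[3, 4]),
    StepSketch(7, depends_on=[5, 6]),
]
waves = execution_waves(demo_steps)
print(waves)  # [[1, 2, 3, 4], [5, 6], [7]]
```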

In [58]:
dspy.inspect_history(n=0)

[2025-07-04T11:05:06.886524]

System message:

Your input fields are:
1. `query` (str): The user's research query
Your output fields are:
1. `reasoning` (str): 
2. `analysis` (QueryAnalysis): Strategic analysis for delegation planning
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## query ## ]]
{query}

[[ ## reasoning ## ]]
{reasoning}

[[ ## analysis ## ]]
{analysis}        # note: the value you produce must adhere to the JSON schema: {"type": "object", "description": "Analyze and break down the user's prompt to make sure you fully understand it.", "properties": {"answer_format": {"anyOf": [{"type": "string"}, {"type": "null"}], "description": "The recommended format of the answer such as detailed report, a list of entities, an analysis of different perspectives, etc.", "title": "Answer Format"}, "complexity": {"type": "string", "description": "The complexity of the query", "enum": ["simple", "medium", "comp

In [80]:
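# Test AsyncLeadAgent - Task Decomposition
# Decompose the first plan step, using the plan from the previous cell
decompose_result = await lead_agent.decomposer.acall(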
    query=test_query,
    completed_results=[],
    plan=plan_result.plan,
    current_step=plan_result.plan.steps[0]
)

print("=== DecomposeToTasks Test ===")
print(f"Strategy: {decompose_result.allocation.execution_strategy}")
print(f"Max concurrent: {decompose_result.allocation.max_concurrent}")

for i, task in enumerate(decompose_result.allocation.tasks):
    print(f"\nTask {i+1} (Step {task.id}): {task.description}")
    print(f"  Tools: {task.tools_to_use}")
    print(f"  Budget: {task.tool_budget} | Complexity: {task.complexity}")

=== DecomposeToTasks Test ===
Strategy: The tasks for gathering statistics for Desire Doue can be executed in parallel as they focus on different metrics and can be sourced from the same websites.
Max concurrent: 4

Task 1 (Step 1): Gather detailed statistics for Desire Doue focusing on goals and assists from FBref, WhoScored, and Transfermarkt.
  Tools: ['web_search', 'wikipedia_search', 'parallel_search']
  Budget: 8 | Complexity: medium

Task 2 (Step 2): Gather detailed statistics for Desire Doue focusing on key passes and dribbling success rate from FBref, WhoScored, and Transfermarkt.
  Tools: ['web_search', 'wikipedia_search', 'parallel_search']
  Budget: 8 | Complexity: medium

Task 3 (Step 3): Gather detailed statistics for Desire Doue focusing on aerial duels won and defensive actions (tackles, interceptions) from FBref, WhoScored, and Transfermarkt.
  Tools: ['web_search', 'wikipedia_search', 'parallel_search']
  Budget: 8 | Complexity: medium

Task 4 (Step 4): Gather detaile
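
The allocation reports `Max concurrent: 4`. How `agent.py` enforces that limit is not shown in this notebook; a common way to cap concurrency in asyncio code is an `asyncio.Semaphore`, sketched below. `run_with_limit` and `fake_task` are hypothetical names used only for this illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_limit(
    factories: list[Callable[[], Awaitable[str]]],
    max_concurrent: int,
) -> list[str]:
    """Run all tasks, allowing at most `max_concurrent` to be in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(factory: Callable[[], Awaitable[str]]) -> str:
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(factory) for factory in factories))


async def demo() -> tuple[list[str], int]:
    active = 0
    peak = 0

    async def fake_task(index: int) -> str:
        # Track how many tasks are running at the same time.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for a tool call
        active -= 1
        return f"task {index} done"

    factories = [lambda i=i: fake_task(i) for i in range(8)]
    results = await run_with_limit(factories, max_concurrent=4)
    return results, peak


# In a notebook cell, use `await demo()` instead of asyncio.run.
limited_results, observed_peak = asyncio.run(demo())
print(len(limited_results), "tasks finished; peak concurrency:", observed_peak)
```

Passing factories rather than already-created coroutines keeps each task from starting before it has acquired the semaphore.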

In [63]:
dspy.inspect_history(n=5)

[2025-07-04T13:33:17.508503]

System message:

Your input fields are:
1. `query` (str): The user's research query
2. `analysis` (QueryAnalysis): Strategic analysis from previous step
3. `trajectory` (str):
Your output fields are:
1. `next_thought` (str): 
2. `next_tool_name` (Literal['parallel_search', 'quick_search', 'finish']): 
3. `next_tool_args` (dict[str, Any]):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## query ## ]]
{query}

[[ ## analysis ## ]]
{analysis}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: parallel_search; quick_search; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object", "additionalProperties": true}

[[ ## completed ## ]]
In adhering to th
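
The prompt templates in the history output delimit every field with a `[[ ## name ## ]]` marker. When inspecting raw responses by hand, it can help to split the text back into fields. The sketch below is an illustrative parser written for this notebook, not DSPy's own; `split_fields` is a hypothetical helper, and the sample response (including the `queries` argument) is invented.

```python
import json
import re

# Matches a marker line such as "[[ ## next_tool_name ## ]]".
FIELD_MARKER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)


def split_fields(text: str) -> dict[str, str]:
    """Map each field name to the text between its marker and the next one."""
    fields: dict[str, str] = {}
    matches = list(FIELD_MARKER.finditer(text))
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


# Invented sample response; the argument names are assumptions.
sample_response = """[[ ## next_thought ## ]]
I should look up both players before comparing them.

[[ ## next_tool_name ## ]]
parallel_search

[[ ## next_tool_args ## ]]
{"queries": ["Desire Doue stats", "Lamine Yamal stats"]}

[[ ## completed ## ]]
"""

parsed = split_fields(sample_response)
tool_args = json.loads(parsed["next_tool_args"])
print(parsed["next_tool_name"], tool_args)
```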