# Agent Testing Notebook

This notebook tests individual components of agent.py in isolation.

**Workflow**: 
1. Modify agent.py functions
2. Run the reload cell below
3. Run individual test cells to validate changes
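One detail of the reload step is easy to miss: names pulled in with `from module import ...` keep pointing at the old objects after `importlib.reload`, so the import has to be repeated. The sketch below illustrates this with a throwaway module written to a temporary directory. The module name `demo_agent_reload_example` and the `GREETING` constant are hypothetical stand-ins, not part of agent.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip the bytecode cache so the quick edit below is always picked up.
sys.dont_write_bytecode = True

# Hypothetical stand-in for agent.py, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir) / "demo_agent_reload_example.py"
module_path.write_text("GREETING = 'before edit'\n")
sys.path.insert(0, tmp_dir)

import demo_agent_reload_example
from demo_agent_reload_example import GREETING  # binds the current value

# Simulate editing the file, then reload the module object.
module_path.write_text("GREETING = 'after the edit'\n")
importlib.invalidate_caches()
importlib.reload(demo_agent_reload_example)

stale = GREETING                            # name imported before the reload
fresh = demo_agent_reload_example.GREETING  # attribute on the reloaded module

# Repeating the from-import rebinds the name to the reloaded value.
from demo_agent_reload_example import GREETING as refreshed

print(stale, "|", fresh, "|", refreshed)
```

The same reasoning applies to `from agent import *`: it is safest to run it again after every reload.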

## Setup Configuration

In [53]:
# Setup environment and configuration
import os
from dotenv import load_dotenv
import dspy
import logging
import importlib
import agent

load_dotenv(override=True)

# Reload agent.py after making changes, then re-import its names so the
# notebook uses the reloaded definitions instead of stale ones
importlib.reload(agent)
from agent import *
print("Agent module reloaded successfully!")

# Disable verbose DSPy and LiteLLM outputs
dspy.settings.configure(show_guidelines=False, show_messages=False, show_cot=False)
logging.getLogger("LiteLLM").setLevel(logging.WARNING)

# Print configuration
print(f"SMALL_MODEL: {SMALL_MODEL}")
print(f"BIG_MODEL: {BIG_MODEL}")
print(f"TEMPERATURE: {TEMPERATURE}")
print(f"MAX_TOKENS: {MAX_TOKENS}")

# Configure DSPy default
default_lm = dspy.LM(
    model=SMALL_MODEL,
    api_key=OPENROUTER_API_KEY,
    api_base=OPENROUTER_BASE_URL,
    temperature=TEMPERATURE,
    max_tokens=MAX_TOKENS
)
dspy.configure(lm=default_lm)

print("\nConfiguration complete!")

Agent module reloaded successfully!
SMALL_MODEL: openrouter/google/gemini-2.5-flash-lite-preview-06-17
BIG_MODEL: openrouter/openai/gpt-4.1-mini
TEMPERATURE: 1.0
MAX_TOKENS: 4000

Configuration complete!


## Test Individual Async Tools

In [47]:
# Test web_search
result = await web_search("DSPy framework", count=2)
print("=== Web Search Test ===")
print(result)

=== Web Search Test ===
Search results for 'DSPy framework':\n\n1. GitHub - stanfordnlp/dspy: DSPy: The framework for programming—not prompting—language models\n   <strong>DSPy</strong>: <strong>The</strong> <strong>framework</strong> for programming—not prompting—language models - stanfordnlp/<strong>dspy</strong>\n   https://github.com/stanfordnlp/dspy\n\n2. DSPy\n   <strong>DSPy</strong> is a declarative <strong>framework</strong> for building modular AI software.\n   https://dspy.ai/


In [48]:
# Test wikipedia_search
result = wikipedia_search("Python programming", sentences=3)
print("=== Wikipedia Search Test ===")
print(result)

=== Wikipedia Search Test ===
Wikipedia – Python (programming language)\n\nPython is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected.


In [49]:
# Test async_batch_call
calls = [
    {"tool_name": "web_search", "args": {"query": "Lamine Yamal", "count": 2}},
    {"tool_name": "wikipedia_search", "args": {"query": "Desire Doue", "sentences": 2}}
]

results = await async_batch_call(calls)
print("=== Async Batch Call Test ===")
for i, result in enumerate(results):
    print(f"\nResult {i+1}:")
    print(result[:200] + "..." if len(result) > 200 else result)

=== Async Batch Call Test ===

Result 1:
web_search: Search results for 'Lamine Yamal':\n\n1. Lamine Yamal - Wikipedia\n   <strong>Lamine</strong> <strong>Yamal</strong> Nasraoui Ebana (born 13 July 2007) is a Spanish professional footballer...

Result 2:
wikipedia_search: Wikipedia – Désiré Doué\n\nDésiré Nonka-Maho Doué (French pronunciation: [deziʁe dwe]; born 3 June 2005) is a French professional footballer who plays as an attacking midfielder or w...


## Test Data Models

## Test DSPy Modules

In [50]:
# Test AsyncLeadAgent - Query Analysis
test_query = "Desire Doue vs Lamine Yamal, who is better?"

lead_agent = AsyncLeadAgent()
analysis_result = await lead_agent.query_analyzer.acall(query=test_query)

print("=== AsyncLeadAgent Query Analysis Test ===")
print(f"Query: {test_query}")
print("\nAnalysis:")
print(f"Type: {analysis_result.analysis.query_type}")
print(f"Complexity: {analysis_result.analysis.complexity}")
print(f"Main Concepts: {analysis_result.analysis.main_concepts}")
print(f"Key Entities: {analysis_result.analysis.key_entities}")
print(f"Answer Format: {analysis_result.analysis.answer_format}")

=== AsyncLeadAgent Query Analysis Test ===
Query: Desire Doue vs Lamine Yamal, who is better?

Analysis:
Type: depth_first
Complexity: complex
Main Concepts: ['Player comparison', 'Football skills', 'Performance metrics', 'Potential']
Key Entities: ['Desire Doue', 'Lamine Yamal']
Answer Format: Comparative analysis highlighting strengths, weaknesses, and overall impact of each player.


In [51]:
# Test AsyncLeadAgent - Research Planning
# Use the analysis from the previous cell
plan_result = await lead_agent.planner.acall(
    query=test_query,
    analysis=analysis_result.analysis
)

print("=== AsyncLeadAgent Research Planning Test ===")
print(f"Plan has {len(plan_result.plan.steps)} steps:")

for step in plan_result.plan.steps:
    print(f"\nStep {step.id}: {step.description}")
    print(f"  Budget: {step.budget_calls} tool calls")
    print(f"  Depends on: {step.depends_on}")

=== AsyncLeadAgent Research Planning Test ===
Plan has 4 steps:

Step 1: Gather detailed statistical comparison between Desire Doue and Lamine Yamal across various metrics (goals, assists, dribbles, key passes, defensive contributions, etc.) for the current and recent seasons. Utilize sites like FBref, Sofascore, and WhoScored for data.
  Budget: 5 tool calls
  Depends on: []

Step 2: Analyze and compare the playing styles, strengths, and weaknesses of both Desire Doue and Lamine Yamal, focusing on their offensive and defensive contributions, creative output, and overall impact on their respective teams. Look for expert tactical analyses.
  Budget: 5 tool calls
  Depends on: []

Step 3: Research and compile information on the potential and future outlook for both players, considering their age, development trajectory, and market value. Include any notable awards or recognitions received by each player.
  Budget: 5 tool calls
  Depends on: []

Step 4: Synthesize the gathered statistical
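Each plan step carries a `depends_on` list, which suggests that steps with no unmet dependencies could run in parallel. The sketch below groups steps into such "waves" with the standard-library `graphlib.TopologicalSorter`. The `DemoStep` dataclass, the `execution_waves` helper, and the dependency values are assumptions made for illustration. They are not taken from agent.py or from the plan printed above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class DemoStep:
    """Hypothetical, simplified stand-in for a plan step."""
    id: int
    description: str
    depends_on: list[int] = field(default_factory=list)

def execution_waves(steps: list[DemoStep]) -> list[list[int]]:
    """Group step ids into waves; every step in a wave has all dependencies done."""
    sorter = TopologicalSorter({s.id: s.depends_on for s in steps})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

demo_steps = [
    DemoStep(1, "collect statistics"),
    DemoStep(2, "compare playing styles"),
    DemoStep(3, "assess future outlook"),
    DemoStep(4, "synthesize findings", depends_on=[1, 2, 3]),  # assumed dependencies
]
waves = execution_waves(demo_steps)
print(waves)
```

With these assumed dependencies, the first three steps form one wave and the synthesis step runs alone afterwards.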

In [54]:
dspy.inspect_history(n=10)





[2025-06-28T17:26:28.728910]

System message:

Your input fields are:
1. `query` (str): The user's research query
2. `analysis` (QueryAnalysis): Strategic analysis from previous step
3. `trajectory` (str):
Your output fields are:
1. `reasoning` (str): 
2. `plan` (ResearchPlan): Delegation plan with specific tasks for subagents
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## query ## ]]
{query}

[[ ## analysis ## ]]
{analysis}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## plan ## ]]
{plan}        # note: the value you produce must adhere to the JSON schema: {"type": "object", "$defs": {"PlanStep": {"type": "object", "description": "A single step in the research plan", "properties": {"budget_calls": {"type": "integer", "default": 5, "maximum": 10, "minimum": 1, "title": "Budget Calls"}, "depends_on": {"type": "array", "items": {"type": "integer"}, "title": "Depends On"}, "descri

In [None]:
# Test Full AsyncLeadAgent workflow
test_query_2 = "Research the latest developments in quantum computing"

analysis_full, plan_full = await lead_agent.aforward(test_query_2)

print("=== Full AsyncLeadAgent Workflow Test ===")
print(f"Query: {test_query_2}")
print("\n=== Analysis ===")
print(f"Type: {analysis_full.analysis.query_type}")
print(f"Complexity: {analysis_full.analysis.complexity}")
print(f"Main Concepts: {analysis_full.analysis.main_concepts}")

print("\n=== Plan ===")
print(f"Generated {len(plan_full.plan.steps)} steps:")
for step in plan_full.plan.steps:
    print(f"\nStep {step.id}: {step.description}")
    print(f"  Budget: {step.budget_calls} calls")

## Test Main Orchestration Functions

In [None]:
# Test run_research function
research_query = "Analyze the impact of AI on software development"

analysis_main, plan_main = await run_research(research_query, verbose=True)

print("\n=== run_research Function Test Complete ===")
print(f"Returned analysis type: {type(analysis_main)}")
print(f"Returned plan type: {type(plan_main)}")
print(f"Plan contains {len(plan_main.plan.steps)} steps")

In [None]:
# Test run_research_sync function
sync_query = "Compare Python and JavaScript for web development"

print("=== Testing Synchronous Wrapper ===")
sync_analysis, sync_plan = run_research_sync(sync_query, verbose=True)

print("\nSync function completed successfully!")
print(f"Analysis complexity: {sync_analysis.analysis.complexity}")
print(f"Plan steps: {len(sync_plan.plan.steps)}")
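A synchronous wrapper around async code needs care in a notebook, because `asyncio.run` raises `RuntimeError` when an event loop is already running. How `run_research_sync` handles this is not shown here. The sketch below shows one possible approach: call `asyncio.run` directly when no loop is running, and otherwise run the coroutine on a fresh loop in a helper thread. `demo_research` and `demo_research_sync` are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def demo_research(query: str) -> str:
    """Hypothetical async entry point."""
    await asyncio.sleep(0.01)
    return f"plan for: {query}"

def demo_research_sync(query: str) -> str:
    """Blocking wrapper that works with or without a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): asyncio.run is safe here.
        return asyncio.run(demo_research(query))

    # A loop is already running (for example in Jupyter): run the coroutine
    # on its own loop in a helper thread. Note that this blocks the caller,
    # and therefore the running loop, until the result is available.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, demo_research(query)).result()

# Case 1: called from ordinary synchronous code.
sync_result = demo_research_sync("Compare Python and JavaScript")

# Case 2: called while an event loop is running, as in a notebook cell.
async def _call_from_running_loop() -> str:
    return demo_research_sync("nested call")

nested_result = asyncio.run(_call_from_running_loop())
print(sync_result)
print(nested_result)
```

Exceptions raised inside the coroutine propagate to the caller in both cases, since `Future.result()` re-raises them.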

## Test Error Handling

In [None]:
# Test error handling in async_batch_call
error_calls = [
    {"tool_name": "web_search", "args": {"query": "valid search", "count": 2}},
    {"tool_name": "invalid_tool", "args": {"query": "this will fail"}},
    {"tool_name": "web_search", "args": {}},  # Missing required 'query'
]

print("=== Error Handling Test ===")
error_results = await async_batch_call(error_calls)

for i, result in enumerate(error_results):
    print(f"\nCall {i+1}:")
    if "[ERROR]" in result:
        print(f"❌ Error detected: {result[:100]}...")
    else:
        print(f"✅ Success: {result[:100]}...")

## Performance Testing

In [None]:
# Test performance - multiple parallel searches
import time

performance_calls = [
    {"tool_name": "web_search", "args": {"query": "artificial intelligence trends", "count": 2}},
    {"tool_name": "web_search", "args": {"query": "machine learning algorithms", "count": 2}},
    {"tool_name": "wikipedia_search", "args": {"query": "deep learning", "sentences": 2}},
    {"tool_name": "wikipedia_search", "args": {"query": "neural networks", "sentences": 2}}
]

print("=== Performance Test ===")
start_time = time.perf_counter()  # monotonic clock, suited to timing intervals

perf_results = await async_batch_call(performance_calls)

duration = time.perf_counter() - start_time

print(f"\nCompleted {len(performance_calls)} parallel operations in {duration:.2f} seconds")
print(f"Wall-clock time per operation (amortized): {duration/len(performance_calls):.2f} seconds")
print(f"All operations completed successfully: {all('[ERROR]' not in r for r in perf_results)}")
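When calls run concurrently, the total wall-clock time is close to that of the slowest call, so dividing it by the number of calls gives an amortized figure rather than a per-call latency. The sketch below makes the difference visible by timing the same simulated work sequentially and in parallel. `fake_tool` and its delays are hypothetical and stand in for real network calls.

```python
import asyncio
import time

async def fake_tool(delay: float) -> str:
    """Hypothetical tool that only waits, simulating I/O latency."""
    await asyncio.sleep(delay)
    return "ok"

async def run_sequential(delays: list[float]) -> float:
    start = time.perf_counter()
    for d in delays:
        await fake_tool(d)  # each call waits for the previous one
    return time.perf_counter() - start

async def run_parallel(delays: list[float]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_tool(d) for d in delays))  # all calls overlap
    return time.perf_counter() - start

delays = [0.05] * 4
sequential_s = asyncio.run(run_sequential(delays))
parallel_s = asyncio.run(run_parallel(delays))

print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Comparing the batch duration against a sequential baseline like this is one way to check that the batch really overlaps its calls.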

## Tools Configuration Test

In [None]:
# Test TOOLS dictionary registration
print("=== Tools Configuration Test ===")
print(f"Registered tools: {list(TOOLS.keys())}")

for tool_name, tool in TOOLS.items():
    print(f"\n{tool_name}:")
    print(f"  Type: {type(tool)}")
    print(f"  Name: {getattr(tool, 'name', 'N/A')}")
    print(f"  Description: {(getattr(tool, 'desc', None) or 'N/A')[:50]}...")