# Evaluating the CrewAI Agent

This notebook runs a set of test queries against the CrewAI agent, summarizes the results, inspects the agent's configuration, and measures response time.

## Prerequisites

- A Google Cloud project with Vertex AI enabled
- Application Default Credentials, set up with `gcloud auth application-default login`
- Optional: the `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` environment variables for real web search (mock responses are used otherwise)

## Setup and Authentication

In [None]:
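import os
import sys

import google.auth

# Load Application Default Credentials and the project ID associated with them
credentials, project_id = google.auth.default()
print(f"Project ID: {project_id}")

# google.auth.default() can return None for the project, and os.environ only accepts strings
if project_id:
    os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
else:
    print("⚠ No default project found; set GOOGLE_CLOUD_PROJECT manually")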

# Check for Google Search API credentials
if os.getenv("GOOGLE_API_KEY") and os.getenv("GOOGLE_CSE_ID"):
    print("✓ Google Search API credentials found")
else:
    print("⚠ Google Search API credentials not set")
    print("Web search will use mock responses")
    print("\nTo enable real search:")
    print("  1. Get API key: https://console.cloud.google.com/apis/credentials")
    print("  2. Create CSE: https://programmablesearchengine.google.com/")
    print("  3. Set: GOOGLE_API_KEY and GOOGLE_CSE_ID")

## Import Agent

In [None]:
# Make the parent directory importable so the agent package can be found
sys.path.append('../')

from {{cookiecutter.agent_directory}}.agent import run_agent, create_crew, research_agent

## Define Test Queries

In [None]:
# Define test queries
test_queries = [
    "What time is it?",
    "What are the latest developments in generative AI?",
    "What is CrewAI framework?",
    "What are the benefits of using Google Vertex AI?",
]

## Run Test Queries

In [None]:
import pandas as pd

# Run queries and collect results
results = []
for query in test_queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}")
    
    try:
        # Coerce to str so len() below also works if run_agent returns a result object
        response = str(run_agent(query))
        print(f"Response: {response}\n")
        
        results.append({
            "query": query,
            "response": response,
            "response_length": len(response),
            "success": True
        })
    except Exception as e:
        print(f"Error: {e}\n")
        results.append({
            "query": query,
            "response": f"Error: {e}",
            "response_length": 0,
            "success": False
        })

## Analyze Results

In [None]:
# Create results dataframe
df = pd.DataFrame(results)

print("\nResults Summary:")
print(f"Total queries: {len(df)}")
print(f"Successful: {df['success'].sum()}")
print(f"Failed: {(~df['success']).sum()}")
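# Average over successful responses only; failed queries are recorded with length 0
avg_length = df.loc[df['success'], 'response_length'].mean()
print(f"\nAverage response length (successful queries): {avg_length:.0f} characters")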

display(df)

## Agent Inspection

In [None]:
# Inspect agent configuration
print("Agent Configuration:")
print(f"Role: {research_agent.role}")
print(f"Goal: {research_agent.goal}")
print(f"Number of tools: {len(research_agent.tools)}")
print(f"Tools: {[tool.name for tool in research_agent.tools]}")
print(f"Allow delegation: {research_agent.allow_delegation}")
print(f"Verbose: {research_agent.verbose}")

## Performance Testing

Measure the wall-clock response time for a single query.

In [None]:
import time

# Test query performance
test_query = "What time is it?"

print(f"Testing query: {test_query}")
# perf_counter() is a monotonic clock intended for measuring elapsed time
start_time = time.perf_counter()
response = run_agent(test_query)
end_time = time.perf_counter()

print(f"\nResponse time: {end_time - start_time:.2f} seconds")
print(f"Response: {response}")
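
A single measurement can be noisy, since model latency and network conditions vary between calls. The following is a minimal sketch of repeating the measurement and summarizing it. The helper `time_calls` and the stand-in `fake_agent` are hypothetical names introduced for illustration; in this notebook you would pass `run_agent` instead of `fake_agent`.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[str], object], query: str, repeats: int = 3) -> List[float]:
    """Call fn(query) `repeats` times and return each elapsed time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(query)
        durations.append(time.perf_counter() - start)
    return durations


def fake_agent(query: str) -> str:
    """Hypothetical stand-in for run_agent so the sketch runs on its own."""
    time.sleep(0.01)
    return f"echo: {query}"


durations = time_calls(fake_agent, "What time is it?", repeats=3)
print(f"mean:   {statistics.mean(durations):.3f} s")
print(f"median: {statistics.median(durations):.3f} s")
print(f"max:    {max(durations):.3f} s")
```

Reporting the median alongside the mean helps when one slow call skews the average.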

## Analysis and Insights

### Key Observations:

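1. **Response Quality**: Check whether each response is accurate and actually answers the query
2. **Response Time**: Check whether the measured response time is acceptable for your use case
3. **Tool Usage**: Verify that the agent uses the appropriate tool for each query
4. **Error Handling**: Assess how the agent handles failures and edge cases; failed queries have `success` set to `False` in the results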
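
As a rough first pass at the response-quality check, you could test whether a response mentions terms you expect to see. This is only a sketch of a keyword heuristic, not a real accuracy measure: the helper `keyword_coverage`, the expected keywords, and the sample response text are all hypothetical and would need to be adapted to your own queries.

```python
from typing import Dict, List


def keyword_coverage(response: str, expected_keywords: List[str]) -> float:
    """Return the fraction of expected keywords found in the response (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


# Hypothetical expectations; choose keywords that fit your own queries
expected: Dict[str, List[str]] = {
    "What is CrewAI framework?": ["agent", "crew"],
}

# Hypothetical response text, used here only to exercise the helper
sample_response = "CrewAI is a framework for orchestrating a crew of AI agents."
score = keyword_coverage(sample_response, expected["What is CrewAI framework?"])
print(f"Keyword coverage: {score:.0%}")
```

A low score flags a response for manual review; a high score does not prove the response is correct.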

### Next Steps:

- Tune agent parameters (temperature, model selection)
- Add more specialized tools
- Implement caching for frequently asked questions
- Deploy to a production environment
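
The caching step could start as simply as memoizing identical query strings with `functools.lru_cache`. The sketch below assumes that reusing an earlier answer is acceptable for the query in question; `slow_agent` and `cached_agent` are hypothetical names, with `slow_agent` standing in for `run_agent`.

```python
from functools import lru_cache

# Mutable counter so we can observe how often the underlying agent really runs
calls = {"count": 0}


def slow_agent(query: str) -> str:
    """Hypothetical stand-in for run_agent; counts real invocations."""
    calls["count"] += 1
    return f"answer to: {query}"


@lru_cache(maxsize=128)
def cached_agent(query: str) -> str:
    """Return a cached answer for repeated, identical query strings."""
    return slow_agent(query)


first = cached_agent("What is CrewAI framework?")
second = cached_agent("What is CrewAI framework?")  # served from the cache
print(f"Underlying calls: {calls['count']}")
print(cached_agent.cache_info())
```

Time-sensitive queries such as "What time is it?" should bypass a cache like this, since a stored answer goes stale immediately.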