# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent with various real-world scenarios and see how it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [1]:
from datetime import datetime
from agent import Agent

In [2]:
## TODO: Create the agent's instructions

ECOHOME_SYSTEM_PROMPT = """
You are EcoHome, an AI assistant that helps users save energy and reduce their carbon footprint at home. 

Your goal is to provide practical, actionable advice based on the user's specific home setup and energy usage patterns. 
"""

In [3]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [4]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: San Francisco, CA"
)

In [5]:
print(response["messages"][-1].content)

To minimize costs and maximize solar power for charging your electric car tomorrow (October 7, 2023), here are the key insights:

### Electricity Pricing (Time of Use)
- **Off-Peak Rates** (Lowest Cost):
  - 12 AM - 5 AM: $0.118 to $0.136 per kWh
  - 10 PM - 11 PM: $0.136 to $0.130 per kWh

- **Peak Rates** (Highest Cost):
  - 6 AM - 9 PM: $0.201 to $0.222 per kWh

### Solar Power Generation Forecast
- **Best Solar Generation Hours**:
  - **12 PM - 1 PM**: High solar irradiance (861.8 W/m²)
  - **1 PM - 2 PM**: Moderate solar irradiance (496.6 W/m²)
  - **2 PM - 3 PM**: Lower solar irradiance (681.1 W/m²)

### Recommendations
1. **Charge During Off-Peak Hours**: 
   - The best time to charge your electric car would be during the off-peak hours of **12 AM - 5 AM** to take advantage of the lowest rates.

2. **Consider Solar Generation**:
   - If you have solar panels, consider charging your car during the peak solar generation hours of **12 PM - 2 PM**. This way, you can utilize the ener

In [6]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_electricity_prices
- get_weather_forecast
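The loop above prints one name per tool result. If the order or number of calls matters (for example, to spot the agent calling `get_electricity_prices` twice), you can keep the names in a list. The `set` used later in `evaluate_tool_usage` drops duplicates. The following is a minimal sketch, assuming messages expose `name` and `tool_call_id` attributes as LangChain's `ToolMessage` does. `tool_calls_in_order` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real messages.

```python
from collections import Counter
from types import SimpleNamespace


def tool_calls_in_order(messages):
    """Return tool names in the order the agent called them.

    Mirrors the notebook's check: a message counts as a tool result if it
    carries a truthy tool_call_id. getattr makes it work on any message-like
    object, not only LangChain messages.
    """
    return [msg.name for msg in messages if getattr(msg, "tool_call_id", None)]


# Hypothetical stand-in messages shaped like the run above.
messages = [
    SimpleNamespace(name=None, content="Location: San Francisco, CA"),
    SimpleNamespace(name="get_electricity_prices", tool_call_id="call_1", content="{...}"),
    SimpleNamespace(name="get_weather_forecast", tool_call_id="call_2", content="{...}"),
]
calls = tool_calls_in_order(messages)
print(calls)           # ['get_electricity_prices', 'get_weather_forecast']
print(Counter(calls))  # a repeated tool call would show a count above 1
```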


## 2. Define Test Cases

In [7]:
# TODO: Define comprehensive test cases for the Energy Advisor
# Create 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations

In [8]:
test_cases = [
    {
        "id": "ev_charging_1",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "The response should contain time recommendation, cost analysis and solar consideration",
    },
    {
        "id": "thermostat_setting_1",
        "question": "What is the optimal thermostat setting for my home during a heatwave?",
        "expected_tools": ["get_weather_forecast", "get_energy_saving_tips"],
        "expected_response": "The response should include recommended temperature settings and energy-saving tips",
    },
    {
        "id": "appliance_scheduling_1",
        "question": "When should I run my dishwasher to save on energy costs?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should provide specific time slots for running the dishwasher",
    },
    {
        "id": "solar_power_maximization_1",
        "question": "How can I maximize my solar power usage this weekend?",
        "expected_tools": ["get_weather_forecast", "get_solar_power_tips"],
        "expected_response": "The response should include strategies for maximizing solar power usage",
    },
    {
        "id": "cost_savings_calculation_1",
        "question": "How much can I save by adjusting my thermostat by 2 degrees?",
        "expected_tools": ["calculate_energy_savings"],
        "expected_response": "The response should provide a detailed cost savings calculation",
    },
    {
        "id": "ev_charging_2",
        "question": "Is it cheaper to charge my EV at night or during the day?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should compare costs and provide a recommendation",
    },
    {
        "id": "thermostat_setting_2",
        "question": "What thermostat settings should I use in winter to save energy?",
        "expected_tools": ["get_energy_saving_tips"],
        "expected_response": "The response should include recommended temperature settings and energy-saving tips",
    },
    {
        "id": "appliance_scheduling_2",
        "question": "When is the best time to run my washing machine to save energy?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should provide specific time slots for running the washing machine",
    },
    {
        "id": "solar_power_maximization_2",
        "question": "What are the best practices for using solar power during cloudy days?",
        "expected_tools": ["get_solar_power_tips"],
        "expected_response": "The response should include strategies for maximizing solar power usage on cloudy days",
    },
    {
        "id": "cost_savings_calculation_2",
        "question": "How much can I save by using energy-efficient appliances?",
        "expected_tools": ["calculate_energy_savings"],
        "expected_response": "The response should provide a detailed cost savings calculation based on appliance efficiency",
    }
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
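The length check only verifies how many cases there are. A slightly stricter sketch also catches missing keys and duplicate ids before any model calls are made. `validate_test_cases` and `REQUIRED_KEYS` are hypothetical names, and the key set simply mirrors the dictionaries above.

```python
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases, min_count=10):
    """Raise ValueError if the suite is too small, malformed, or has duplicate ids."""
    if len(cases) < min_count:
        raise ValueError(f"Need at least {min_count} test cases, got {len(cases)}")
    seen_ids = set()
    for case in cases:
        missing = REQUIRED_KEYS - set(case)
        if missing:
            raise ValueError(f"{case.get('id', '<no id>')}: missing keys {sorted(missing)}")
        if case["id"] in seen_ids:
            raise ValueError(f"Duplicate test id: {case['id']}")
        seen_ids.add(case["id"])
        if not isinstance(case["expected_tools"], list):
            raise ValueError(f"{case['id']}: expected_tools must be a list")


# Small hypothetical suite; in the notebook you would pass `test_cases`.
sample = [
    {"id": "ev_1", "question": "q1", "expected_tools": ["get_electricity_prices"],
     "expected_response": "r1"},
    {"id": "ev_2", "question": "q2", "expected_tools": [], "expected_response": "r2"},
]
validate_test_cases(sample, min_count=2)  # passes silently
```

In the notebook, this would be called as `validate_test_cases(test_cases)` in place of the plain length check.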

## 3. Run Agent Tests

In [9]:
CONTEXT = "Location: San Francisco, CA"

In [10]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases):
    print(f"\nTest {i+1}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
        
        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': response,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat()
        }
        test_results.append(result)
                
    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': f"Error: {str(e)}",
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_1
Question: When should I charge my electric car tomorrow to minimize cost and maximize solar power?
--------------------------------------------------

Test 2: thermostat_setting_1
Question: What is the optimal thermostat setting for my home during a heatwave?
--------------------------------------------------

Test 3: appliance_scheduling_1
Question: When should I run my dishwasher to save on energy costs?
--------------------------------------------------

Test 4: solar_power_maximization_1
Question: How can I maximize my solar power usage this weekend?
--------------------------------------------------


In [11]:
test_results

[{'test_id': 'ev_charging_1',
  'question': 'When should I charge my electric car tomorrow to minimize cost and maximize solar power?',
  'response': {'messages': [SystemMessage(content='Location: San Francisco, CA', additional_kwargs={}, response_metadata={}, id='64481c4f-bbb7-4c41-bfbf-fc255d89d907'),
    HumanMessage(content='When should I charge my electric car tomorrow to minimize cost and maximize solar power?', additional_kwargs={}, response_metadata={}, id='a97c5bc8-ae8a-48c2-94a3-a8fef814a0b7'),
    AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 61, 'prompt_tokens': 907, 'total_tokens': 968, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_29330a9688', 'id': 'chat

## 4. Evaluate Responses

In [12]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [13]:
# TODO: Create a response evaluator
def evaluate_response(final_response, expected_response):
    """Evaluate a single response against expected response"""
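    # Naive substring matching for demonstration: the score is the fraction of
    # words in expected_response found anywhere in final_response (so short
    # words like "a" or "the" almost always match).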
    expected_keywords = expected_response.lower().split()
    response_text = final_response.lower()
    
    match_count = sum(1 for kw in expected_keywords if kw in response_text)
    score = match_count / len(expected_keywords) if expected_keywords else 0
    
    return score
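The substring check above has two biases. Filler words such as "the" and "and" nearly always match, which inflates the score. Template words such as "response" and "should" appear in every expected description but rarely in an answer, which deflates it. The following is a minimal sketch of a less noisy variant that matches whole words and drops a stopword list. The list is hand-picked and is an assumption, and `evaluate_response_keywords` is a hypothetical name. It is still purely lexical.

```python
import re

# Hand-picked stopwords (an assumption): common English filler plus the template
# words that recur in every expected_response description in this notebook.
STOPWORDS = {"the", "a", "an", "and", "or", "for", "of", "on", "to", "in",
             "should", "response", "include", "provide", "contain", "based"}


def keywords(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def evaluate_response_keywords(final_response, expected_response):
    """Fraction of expected keywords that appear as whole words in the response."""
    expected = keywords(expected_response)
    if not expected:
        return 0.0
    return len(expected & keywords(final_response)) / len(expected)


score = evaluate_response_keywords(
    "Best time: 1 AM. Cost analysis: off-peak rates are lowest. "
    "Solar consideration: panels peak at noon.",
    "The response should contain time recommendation, cost analysis and solar consideration",
)
print(f"{score:.2f}")  # 0.83
```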

In [14]:
# TODO: Create a tool usage evaluator
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    print("Evaluating tool usage...")

    for msg in messages:
        print(f"Message: {msg.content}, Name: {msg.name}")

    # get all used tool names from messages
    used_tools = [msg.name for msg in messages if msg.model_dump().get("tool_call_id")]
    used_tool_set = set(used_tools)
    expected_tool_set = set(expected_tools)
    
    correct_tools = used_tool_set.intersection(expected_tool_set)
    missing_tools = expected_tool_set.difference(used_tool_set)
    extra_tools = used_tool_set.difference(expected_tool_set)
    
    return {
        "used_tools": list(used_tool_set),
        "correct_tools": list(correct_tools),
        "missing_tools": list(missing_tools),
        "extra_tools": list(extra_tools),
    }   
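The correct, missing, and extra tool sets can be reduced to standard precision and recall. Precision is correct / (correct + extra), the share of called tools that were expected. Recall is correct / (correct + missing), the share of expected tools that were called. The following is a minimal sketch. `tool_usage_metrics` is a hypothetical helper, and treating "nothing expected, nothing called" as a perfect score is an assumption.

```python
def tool_usage_metrics(used_tools, expected_tools):
    """Precision, recall and F1 of the tools used against the expected tools."""
    used, expected = set(used_tools), set(expected_tools)
    if not used and not expected:
        # Nothing expected and nothing called: treat as a perfect match.
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0}
    correct = len(used & expected)
    precision = correct / len(used) if used else 0.0
    recall = correct / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = tool_usage_metrics(
    used_tools=["get_electricity_prices", "get_weather_forecast"],
    expected_tools=["get_weather_forecast", "get_energy_saving_tips"],
)
print(metrics)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Averaging F1 across tests gives one comparable number, whereas raw totals of correct, missing, and extra tools grow with the size of the suite.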

In [15]:
# TODO: Generate a comprehensive evaluation report
# Calculate overall scores and metrics
# Identify strengths and weaknesses
# Provide recommendations for improvement
def generate_evaluation_report(test_results):
    total_response_score = 0
    total_tool_correct = 0
    total_tool_missing = 0
    total_tool_extra = 0
    
    for result in test_results:
        # Evaluate response
        final_response = result['response']['messages'][-1].content if isinstance(result['response'], dict) else ""
        response_score = evaluate_response(
            final_response=final_response,
            expected_response=result['expected_response']
        )
        total_response_score += response_score
        
        # Evaluate tool usage
        messages = result['response']['messages'] if isinstance(result['response'], dict) else []
        tool_evaluation = evaluate_tool_usage(
            messages=messages,
            expected_tools=result['expected_tools']
        )
        
        total_tool_correct += len(tool_evaluation['correct_tools'])
        total_tool_missing += len(tool_evaluation['missing_tools'])
        total_tool_extra += len(tool_evaluation['extra_tools'])
        
        # Print individual test results
        print(f"Test ID: {result['test_id']}")
        print(f"Response Score: {response_score:.2f}")
        print(f"Tool Usage: {tool_evaluation}")
        print("-" * 50)
    
    num_tests = len(test_results)
    avg_response_score = total_response_score / num_tests if num_tests > 0 else 0
    
    print("\n=== Evaluation Summary ===")
    print(f"Average Response Score: {avg_response_score:.2f}")
    print(f"Total Correct Tools Used: {total_tool_correct}")
    print(f"Total Missing Tools: {total_tool_missing}")
    print(f"Total Extra Tools: {total_tool_extra}")
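The TODO above also asks for strengths and weaknesses. A simple starting point is to average scores per scenario, so that weak categories stand out. `generate_evaluation_report` currently prints per-test scores without returning them. This sketch therefore assumes you collect them yourself, for example as `{result['test_id']: response_score}` inside the loop. It also assumes ids follow the `<category>_<n>` pattern used in the test cases. `scores_by_category` is a hypothetical name.

```python
from collections import defaultdict


def scores_by_category(per_test_scores):
    """Average per-test scores by scenario category.

    Assumes ids follow the '<category>_<n>' pattern used above, so
    'ev_charging_1' maps to 'ev_charging'.
    """
    buckets = defaultdict(list)
    for test_id, score in per_test_scores.items():
        category = test_id.rsplit("_", 1)[0]  # strip the trailing number
        buckets[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in sorted(buckets.items())}


# Hypothetical per-test scores, for illustration only.
example = {"ev_charging_1": 0.8, "ev_charging_2": 0.6, "thermostat_setting_1": 0.4}
for category, avg in scores_by_category(example).items():
    print(f"{category:<20} {avg:.2f}")
```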

In [16]:
generate_evaluation_report(test_results)

Evaluating tool usage...
Message: Location: San Francisco, CA, Name: None
Message: When should I charge my electric car tomorrow to minimize cost and maximize solar power?, Name: None
Message: , Name: energy_advisor
Message: {"date": "2023-10-07", "pricing_type": "time_of_use", "currency": "USD", "unit": "per_kWh", "hourly_rates": [{"hour": 0, "rate": 0.107, "period": "off-peak", "demand_charge": 0.0}, {"hour": 1, "rate": 0.138, "period": "off-peak", "demand_charge": 0.0}, {"hour": 2, "rate": 0.114, "period": "off-peak", "demand_charge": 0.0}, {"hour": 3, "rate": 0.12, "period": "off-peak", "demand_charge": 0.0}, {"hour": 4, "rate": 0.138, "period": "off-peak", "demand_charge": 0.0}, {"hour": 5, "rate": 0.135, "period": "off-peak", "demand_charge": 0.0}, {"hour": 6, "rate": 0.208, "period": "peak", "demand_charge": 0.086}, {"hour": 7, "rate": 0.193, "period": "peak", "demand_charge": 0.052}, {"hour": 8, "rate": 0.198, "period": "peak", "demand_charge": 0.064}, {"hour": 9, "rate": 0.18,