# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent against a set of realistic scenarios, then evaluate how well it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [1]:
from datetime import datetime
from agent import Agent

ImportError: cannot import name 'create_react_agent' from 'langgraph.prebuilt' (/home/marius/Documents/Udacity Submissions/Agentic_AI_with_LangChain_and_LangGraph/Energy_Advisor/.venv/lib/python3.12/site-packages/langgraph/prebuilt/__init__.py)

In [None]:
## Agent instructions

ECOHOME_SYSTEM_PROMPT = """
You are EcoHome, a proactive residential energy advisor for homeowners and renters.
Role: deliver actionable, data-backed recommendations that reduce costs and improve energy efficiency.

Steps to follow:
1) Clarify location, timeframe, and devices if missing; state any assumptions.
2) Pull relevant data using tools: weather for solar/thermal context, electricity prices for time-of-use windows, usage and solar history for trends, recent summary when timeframe is unclear, and energy tips for best practices.
3) Analyze patterns (peaks, off-peak windows, forecasted conditions) and decide the best actions.
4) Quantify impact (kWh and USD) with calculate_energy_savings when numbers are available; otherwise give conservative ranges.
5) Present 2-4 prioritized recommendations with reasoning and next steps; note gaps and ask one concise follow-up question if needed.

Key capabilities:
- get_weather_forecast: assess upcoming conditions and solar potential.
- get_electricity_prices: identify off-peak vs peak hours for load shifting.
- query_energy_usage / query_solar_generation: inspect historical consumption and production.
- get_recent_energy_summary: get a quick view when the user provides little context.
- search_energy_tips: retrieve best practices via RAG.
- calculate_energy_savings: quantify savings for proposed actions.

Recommendations guidance:
- Tie every suggestion to retrieved data (price periods, forecast, usage patterns) and make them specific and time-bound.
- Prefer scheduling and load shifting to cheaper hours; suggest thermostat, EV, appliance, and solar-usage tweaks.
- Include expected savings and assumptions; provide quick wins plus one longer-term improvement when relevant.
- If data is missing, state the assumption and request the needed detail succinctly.

Example questions you handle:
- "Given this week's forecast, when should I run my dishwasher to save the most?"
- "How can I cut my EV charging costs in San Diego tomorrow?"
- "Review my past 7 days of usage and suggest ways to reduce peak load."
- "Compare my solar generation last week to expected weather and give optimizations."

Respond concisely, show key tool findings briefly, then deliver the final plan.
"""


In [None]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [None]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: San Francisco, CA"
)

In [None]:
print(response["messages"][-1].content)

In [None]:
print("TOOLS:")
for msg in response["messages"]:
    # Tool results are ToolMessage objects (type "tool"); legacy FunctionMessage uses "function"
    if getattr(msg, "type", None) in ("tool", "function"):
        print("-", msg.name)

## 2. Define Test Cases

In [13]:
# Comprehensive scenario-based test cases for the Energy Advisor
# Covers EV charging, thermostat, appliance scheduling, solar usage, and cost savings calculations.


In [14]:
test_cases = [
    {
        "id": "ev_charging_peak_avoid",
        "question": "When should I charge my EV tomorrow to avoid peak rates and use my rooftop solar?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "Should recommend off-peak/night or mid-day solar window with rate comparison and solar hours.",
    },
    {
        "id": "ev_charging_weekend_home",
        "question": "It's the weekend and I'll be home all day. What is the cheapest charging window for my EV?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "Should highlight weekend pricing profile and suggest a 2-3 hour window with solar alignment.",
    },
    {
        "id": "thermostat_heatwave_peak",
        "question": "How should I set my thermostat this afternoon during a heatwave to stay comfortable but minimize cost?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "search_energy_tips"],
        "expected_response": "Should suggest pre-cooling before peak, target temp band, and ventilation/humidity tips.",
    },
    {
        "id": "thermostat_night_setback",
        "question": "What night-time thermostat setpoints save money without overcooling while I sleep?",
        "expected_tools": ["get_electricity_prices", "search_energy_tips"],
        "expected_response": "Should give a setback range, reference off-peak pricing, and comfort guidance.",
    },
    {
        "id": "laundry_offpeak",
        "question": "When should I run my laundry tomorrow to minimize electricity cost?",
        "expected_tools": ["get_electricity_prices", "search_energy_tips"],
        "expected_response": "Should recommend an off-peak window and mention load shifting benefits.",
    },
    {
        "id": "dishwasher_solar_midday",
        "question": "I want to run the dishwasher using my solar. What time window is best tomorrow?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices", "query_solar_generation"],
        "expected_response": "Should pick a sunny mid-day slot referencing solar output and any peak price overlap.",
    },
    {
        "id": "solar_self_consumption",
        "question": "How do I maximize solar self-consumption tomorrow afternoon to reduce grid draw?",
        "expected_tools": ["get_weather_forecast", "query_solar_generation", "get_recent_energy_summary"],
        "expected_response": "Should suggest shifting flexible loads into high-irradiance hours with expected kWh impact.",
    },
    {
        "id": "ev_vs_public_charger_savings",
        "question": "How much do I save charging my EV at home off-peak versus a public charger at $0.35/kWh?",
        "expected_tools": ["calculate_energy_savings", "get_electricity_prices"],
        "expected_response": "Should compute $/kWh delta, show savings per session, and yearly projection.",
    },
    {
        "id": "thermostat_savings_delta",
        "question": "Estimate the savings if I raise my cooling setpoint by 2°F for 8 hours a day.",
        "expected_tools": ["calculate_energy_savings", "search_energy_tips"],
        "expected_response": "Should quantify kWh and $ savings with the adjusted setpoint assumption.",
    },
    {
        "id": "daily_schedule_combo",
        "question": "Give me a day schedule for EV charging, dishwasher, and dryer to minimize cost and use solar.",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast", "get_recent_energy_summary", "search_energy_tips"],
        "expected_response": "Should provide a staggered schedule with peak avoidance and solar-aware timing per device.",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
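
Beyond the count check, a structural check can catch malformed cases before any agent calls are made. The following is a minimal sketch, not part of the original notebook: `validate_test_cases`, `KNOWN_TOOLS`, and `REQUIRED_KEYS` are hypothetical names, and the tool set is assumed to be exactly the tools listed in `ECOHOME_SYSTEM_PROMPT`.

```python
# Hypothetical helper: sanity-check test cases before spending agent calls on them.
# KNOWN_TOOLS mirrors the tool names listed in ECOHOME_SYSTEM_PROMPT (assumed complete).
KNOWN_TOOLS = {
    "get_weather_forecast",
    "get_electricity_prices",
    "query_energy_usage",
    "query_solar_generation",
    "get_recent_energy_summary",
    "search_energy_tips",
    "calculate_energy_savings",
}
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases):
    """Return a list of human-readable problems; an empty list means the cases look well-formed."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(cases):
        label = case.get("id", f"<case {index}>")
        # Every case needs the four fields the test loop reads later
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
        # Duplicate ids would make per-test results ambiguous
        if label in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(label)
        # Expected tools should be tools the agent can actually call
        unknown = set(case.get("expected_tools", [])) - KNOWN_TOOLS
        if unknown:
            problems.append(f"{label}: unknown tools {sorted(unknown)}")
    return problems
```

Calling `validate_test_cases(test_cases)` on the list above should return an empty list, since every id is unique and every expected tool appears in the system prompt.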



## 3. Run Agent Tests

In [15]:
CONTEXT = "Location: San Francisco, CA"

In [16]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases, start=1):
    print(f"\nTest {i}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
        
        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': response,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat()
        }
        test_results.append(result)
                
    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': f"Error: {str(e)}",
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_peak_avoid
Question: When should I charge my EV tomorrow to avoid peak rates and use my rooftop solar?
--------------------------------------------------

Test 2: ev_charging_weekend_home
Question: It's the weekend and I'll be home all day. What is the cheapest charging window for my EV?
--------------------------------------------------

Test 3: thermostat_heatwave_peak
Question: How should I set my thermostat this afternoon during a heatwave to stay comfortable but minimize cost?
--------------------------------------------------

Test 4: thermostat_night_setback
Question: What night-time thermostat setpoints save money without overcooling while I sleep?
--------------------------------------------------


Number of requested results 5 is greater than number of elements in index 4, updating n_results = 4



Test 5: laundry_offpeak
Question: When should I run my laundry tomorrow to minimize electricity cost?
--------------------------------------------------

Test 6: dishwasher_solar_midday
Question: I want to run the dishwasher using my solar. What time window is best tomorrow?
--------------------------------------------------

Test 7: solar_self_consumption
Question: How do I maximize solar self-consumption tomorrow afternoon to reduce grid draw?
--------------------------------------------------

Test 8: ev_vs_public_charger_savings
Question: How much do I save charging my EV at home off-peak versus a public charger at $0.35/kWh?
--------------------------------------------------

Test 9: thermostat_savings_delta
Question: Estimate the savings if I raise my cooling setpoint by 2°F for 8 hours a day.
--------------------------------------------------

Test 10: daily_schedule_combo
Question: Give me a day schedule for EV charging, dishwasher, and dryer to minimize cost and use solar.
--

In [17]:
test_results

[{'test_id': 'ev_charging_peak_avoid',
  'question': 'When should I charge my EV tomorrow to avoid peak rates and use my rooftop solar?',
  'response': {'messages': [SystemMessage(content='\nYou are EcoHome, a proactive residential energy advisor for homeowners and renters.\nRole: deliver actionable, data-backed recommendations that reduce costs and improve energy efficiency.\n\nSteps to follow:\n1) Clarify location, timeframe, and devices if missing; state any assumptions.\n2) Pull relevant data using tools: weather for solar/thermal context, electricity prices for time-of-use windows, usage and solar history for trends, recent summary when timeframe is unclear, and energy tips for best practices.\n3) Analyze patterns (peaks, off-peak windows, forecasted conditions) and decide the best actions.\n4) Quantify impact (kWh and USD) with calculate_energy_savings when numbers are available; otherwise give conservative ranges.\n5) Present 2-4 prioritized recommendations with reasoning and ne

## 4. Evaluate Responses

In [None]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [None]:
# TODO: Create a response evaluator
def evaluate_response(question, final_response, expected_response):
    """Evaluate a single response against expected response"""
    pass
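
One possible starting point for the response evaluator is a lexical overlap check between the agent's answer and the `expected_response` description. This is only a sketch under stated assumptions: `keyword_overlap_score`, `_keywords`, and `STOP_WORDS` are hypothetical names, and keyword overlap is a crude proxy for the Accuracy, Relevance, and Completeness criteria. An LLM-based judge may track those criteria more faithfully.

```python
import re

# Small stop-word list so filler words do not inflate the overlap (illustrative, not exhaustive).
STOP_WORDS = {"a", "an", "and", "any", "or", "the", "with", "should",
              "of", "to", "in", "for", "per", "into"}


def _keywords(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    tokens = re.findall(r"[a-z0-9$]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def keyword_overlap_score(final_response, expected_response):
    """Return the fraction of expected keywords found in the response, plus the missing ones."""
    expected = _keywords(expected_response)
    if not expected:
        # Nothing to compare against, so report a zero score rather than divide by zero
        return {"score": 0.0, "missing": []}
    found = expected & _keywords(final_response)
    return {
        "score": round(len(found) / len(expected), 2),
        "missing": sorted(expected - found),
    }
```

The `missing` list is returned alongside the score so that a low result can be traced to specific expectations the answer did not mention.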

In [None]:
# TODO: Create a tool usage evaluator
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    pass
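
Tool usage can be scored by comparing the set of tools the agent called with `expected_tools`. The sketch below is one way to do that, not the notebook's reference solution: `score_tool_usage` and `tool_names_from_messages` are hypothetical names, and the extraction step assumes tool results appear in the message list as objects with `type == "tool"` and a `name` attribute, as LangChain `ToolMessage` objects do.

```python
def tool_names_from_messages(messages):
    """Collect tool names from message objects; assumes tool results expose type == 'tool' and a name."""
    return [
        getattr(m, "name", None)
        for m in messages
        if getattr(m, "type", None) == "tool" and getattr(m, "name", None)
    ]


def score_tool_usage(used_tools, expected_tools):
    """Compare the tools an agent called against the expected ones using set precision and recall."""
    used, expected = set(used_tools), set(expected_tools)
    hits = used & expected
    # Precision: share of called tools that were expected
    precision = len(hits) / len(used) if used else 0.0
    # Recall: share of expected tools that were actually called
    recall = len(hits) / len(expected) if expected else 1.0
    return {
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "missing": sorted(expected - used),
        "unexpected": sorted(used - expected),
    }
```

Reporting precision and recall separately distinguishes an agent that skips a needed tool (low recall) from one that calls extra tools (low precision). Extra calls are not always wrong, so `unexpected` is better read as a prompt for review than as an error.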

In [None]:
# TODO: Generate a comprehensive evaluation report
# Calculate overall scores and metrics
# Identify strengths and weaknesses
# Provide recommendations for improvement
def generate_evaluation_report():
    pass
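
A report can be built by aggregating per-test scores into averages and a short list of the weakest cases. The sketch below assumes each test has already been given a response score and a tool score between 0 and 1 by whichever evaluators are implemented above. `summarize_scores` and the dictionary keys `response_score` and `tool_score` are hypothetical.

```python
from statistics import mean


def summarize_scores(per_test_scores):
    """Aggregate per-test scores into averages and list the weakest tests.

    per_test_scores: list of dicts with 'test_id', 'response_score', 'tool_score'
    (hypothetical keys, each score assumed to lie in [0, 1]).
    """
    if not per_test_scores:
        return {"tests": 0, "avg_response": 0.0, "avg_tool": 0.0, "weakest": []}
    # Rank by combined score so the lowest-scoring tests come first
    ranked = sorted(per_test_scores, key=lambda s: s["response_score"] + s["tool_score"])
    return {
        "tests": len(per_test_scores),
        "avg_response": round(mean(s["response_score"] for s in per_test_scores), 2),
        "avg_tool": round(mean(s["tool_score"] for s in per_test_scores), 2),
        "weakest": [s["test_id"] for s in ranked[:3]],
    }
```

The `weakest` list gives a concrete place to start when identifying areas for improvement, for example by revisiting the system prompt for the scenarios that score lowest.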