# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent against a range of real-world scenarios, then score its responses and tool usage to see how well it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [1]:
from datetime import datetime
from agent import Agent

In [2]:
ECOHOME_SYSTEM_PROMPT = """You are EcoHome Energy Advisor, an expert assistant for smart-home energy optimization.

Your goals:
1. Reduce user electricity cost while keeping comfort and practicality.
2. Reduce carbon impact by prioritizing solar generation and lower-demand periods.
3. Provide clear, actionable recommendations with concrete times and expected tradeoffs.

Rules:
- Use tools whenever recommendations depend on weather, prices, historical usage, or savings math.
- Prefer specific schedules (hours/time windows), not vague guidance.
- If data is missing, state assumptions explicitly.
- Include brief reasoning: why this schedule is better.
- Include estimated savings when possible.
- Keep answers concise and practical.

When relevant, optimize around:
- EV charging
- Thermostat/HVAC operation
- Appliance scheduling (dishwasher, laundry, water heater)
- Solar self-consumption and grid export/import balance

If the user asks for multiple options, provide a ranked list with pros/cons."""

In [3]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [4]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context="Location: San Francisco, CA"
)

In [5]:
print(response["messages"][-1].content)

To minimize costs and maximize solar power for charging your electric vehicle (EV) tomorrow (October 7, 2023), follow this schedule:

### Recommended Charging Time:
- **Charge your EV from 10:00 AM to 2:00 PM.**

### Reasoning:
1. **Solar Generation**: 
   - Solar irradiance peaks between 10:00 AM and 2:00 PM, with the highest values around 11:00 AM (833.3 W/m²) and 12:00 PM (700.0 W/m²). This means your solar panels will generate the most energy during this window, allowing you to use more solar power for charging.

2. **Electricity Rates**:
   - The electricity rates during this time are relatively low compared to the peak hours later in the day:
     - 10:00 AM: $0.131/kWh
     - 11:00 AM: $0.1462/kWh
     - 12:00 PM: $0.1318/kWh
   - Charging during this window avoids the higher peak rates that occur in the late afternoon and evening.

### Estimated Savings:
- If you charge your EV for 4 hours (assuming a charging rate of 7 kW), you would consume approximately 28 kWh.
- Charging du

In [6]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_weather_forecast
- get_electricity_prices


## 2. Define Test Cases

In [7]:
# Test cases covering EV charging, thermostat, appliance, solar, usage-history, savings, summary, and tips scenarios.

In [8]:
test_cases = [
    {
        "id": "ev_charging_1",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "The response should contain time recommendation, cost analysis and solar consideration",
    },
    {
        "id": "ev_charging_2",
        "question": "I need 30 kWh for my EV by 7 AM. What is the cheapest charging schedule tonight?",
        "expected_tools": ["get_electricity_prices", "calculate_energy_savings"],
        "expected_response": "The response should include overnight off-peak timing and cost estimate",
    },
    {
        "id": "thermostat_1",
        "question": "What thermostat setting should I use Wednesday afternoon if prices spike?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "The response should include temperature setpoint strategy and peak pricing avoidance",
    },
    {
        "id": "thermostat_2",
        "question": "How can I pre-cool my home to reduce HVAC costs during evening peak hours?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "The response should include pre-cooling window, peak-hour behavior, and comfort tradeoff",
    },
    {
        "id": "appliance_1",
        "question": "When should I run my dishwasher and laundry tomorrow for the lowest bill?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should recommend specific off-peak times for both appliances",
    },
    {
        "id": "appliance_2",
        "question": "I can run my water heater for only 3 hours today. Which hours are best?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should include a 3-hour schedule based on low prices",
    },
    {
        "id": "solar_1",
        "question": "What is the best time tomorrow to run heavy loads to use maximum solar generation?",
        "expected_tools": ["get_weather_forecast"],
        "expected_response": "The response should identify midday solar window and suitable loads",
    },
    {
        "id": "solar_2",
        "question": "Should I delay EV charging to noon tomorrow if it is sunny?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "The response should compare solar availability vs electricity rates",
    },
    {
        "id": "history_1",
        "question": "Based on my last 7 days of usage, what are 3 ways to reduce consumption?",
        "expected_tools": ["query_energy_usage", "search_energy_tips"],
        "expected_response": "The response should reference usage patterns and provide 3 actionable recommendations",
    },
    {
        "id": "savings_1",
        "question": "How much can I save if I reduce HVAC use from 18 kWh/day to 14 kWh/day?",
        "expected_tools": ["calculate_energy_savings"],
        "expected_response": "The response should include daily and annual savings estimates",
    },
    {
        "id": "summary_1",
        "question": "Give me a summary of my energy usage and solar generation over the last 24 hours.",
        "expected_tools": ["get_recent_energy_summary"],
        "expected_response": "The response should include total consumption, cost, device breakdown, and solar generation",
    },
    {
        "id": "tips_1",
        "question": "What are the best energy-saving tips for running my HVAC system more efficiently?",
        "expected_tools": ["search_energy_tips"],
        "expected_response": "The response should include specific HVAC tips like filter changes, thermostat settings, and duct sealing",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
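
The length check above only guards the number of cases. A minimal sketch of a stricter sanity check follows, assuming the tool names that appear elsewhere in this notebook make up the agent's toolbox. `KNOWN_TOOLS`, `REQUIRED_KEYS`, and `validate_test_cases` are hypothetical names introduced here for illustration.

```python
from collections import Counter

# Tool names seen elsewhere in this notebook; treat this set as an
# assumption about the agent's toolbox, not an authoritative list.
KNOWN_TOOLS = {
    "calculate_energy_savings",
    "get_electricity_prices",
    "get_recent_energy_summary",
    "get_weather_forecast",
    "query_energy_usage",
    "query_solar_generation",
    "search_energy_tips",
}
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases):
    """Return a list of problems; an empty list means the cases look consistent."""
    problems = []
    # Duplicate ids would make per-test results ambiguous.
    id_counts = Counter(case.get("id") for case in cases)
    for case_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id: {case_id}")
    for case in cases:
        missing = REQUIRED_KEYS - set(case)
        if missing:
            problems.append(f"{case.get('id')}: missing keys {sorted(missing)}")
        # A misspelled tool name can never be matched by the agent.
        unknown = set(case.get("expected_tools", [])) - KNOWN_TOOLS
        if unknown:
            problems.append(f"{case.get('id')}: unknown tools {sorted(unknown)}")
    return problems


# Deliberately broken sample: a repeated id and a misspelled tool name.
sample_cases = [
    {"id": "ok_1", "question": "q", "expected_tools": ["get_weather_forecast"], "expected_response": "r"},
    {"id": "ok_1", "question": "q", "expected_tools": ["get_wether_forecast"], "expected_response": "r"},
]
for problem in validate_test_cases(sample_cases):
    print(problem)
```

Running `validate_test_cases(test_cases)` before the agent loop would surface typos in `expected_tools` early, before they show up as tool-usage failures.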

## 3. Run Agent Tests

In [9]:
CONTEXT = "Location: San Francisco, CA"

In [10]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases, start=1):
    print(f"\nTest {i}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    error = None
    try:
        # Call the agent and keep the full response (all messages)
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
    except Exception as e:
        # Record the failure so the evaluation step can still score this case
        print(f"Error: {e}")
        error = str(e)
        response = f"Error: {error}"

    # Store the result (same fields for successful and failed runs)
    result = {
        'test_id': test_case['id'],
        'question': test_case['question'],
        'response': response,
        'expected_tools': test_case['expected_tools'],
        'expected_response': test_case['expected_response'],
        'timestamp': datetime.now().isoformat()
    }
    if error is not None:
        result['error'] = error
    test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_1
Question: When should I charge my electric car tomorrow to minimize cost and maximize solar power?
--------------------------------------------------

Test 2: ev_charging_2
Question: I need 30 kWh for my EV by 7 AM. What is the cheapest charging schedule tonight?
--------------------------------------------------

Test 3: thermostat_1
Question: What thermostat setting should I use Wednesday afternoon if prices spike?
--------------------------------------------------

Test 4: thermostat_2
Question: How can I pre-cool my home to reduce HVAC costs during evening peak hours?
--------------------------------------------------

Test 5: appliance_1
Question: When should I run my dishwasher and laundry tomorrow for the lowest bill?
--------------------------------------------------

Test 6: appliance_2
Question: I can run my water heater for only 3 hours today. Which hours are best?
--------------------------------------------------

Test 7: 

In [11]:
test_results

[{'test_id': 'ev_charging_1',
  'question': 'When should I charge my electric car tomorrow to minimize cost and maximize solar power?',
  'response': {'messages': [SystemMessage(content='Location: San Francisco, CA', additional_kwargs={}, response_metadata={}, id='c32e0b44-fb3e-4566-a089-b9eaf386925a'),
    HumanMessage(content='When should I charge my electric car tomorrow to minimize cost and maximize solar power?', additional_kwargs={}, response_metadata={}, id='f5b7dc5c-e511-4139-9783-7696dea2a4dd'),
    AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 61, 'prompt_tokens': 1048, 'total_tokens': 1109, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 1024}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_373a14eb6f', 'id': 

## 4. Evaluate Responses

In [12]:
def _extract_tool_names(messages):
    tool_names = []
    for msg in messages:
        obj = msg.model_dump() if hasattr(msg, "model_dump") else {}
        name = getattr(msg, "name", None) or obj.get("name")
        if obj.get("tool_call_id") and name:
            tool_names.append(name)
    return tool_names
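
The helper above treats any message with a `tool_call_id` and a `name` as a tool result. As a rough illustration of that filter, the sketch below counts tool results per tool name. `StubMessage` is a hypothetical stand-in for the real message classes, which carry more fields, and `count_tool_calls` is a name introduced here.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StubMessage:
    """Hypothetical stand-in for a chat message (assumption, not the real class)."""
    content: str
    name: Optional[str] = None
    tool_call_id: Optional[str] = None

    def model_dump(self):
        return asdict(self)


def count_tool_calls(messages):
    """Count tool results per tool name, using the tool_call_id test from above."""
    counts = Counter()
    for msg in messages:
        obj = msg.model_dump() if hasattr(msg, "model_dump") else {}
        if obj.get("tool_call_id") and obj.get("name"):
            counts[obj["name"]] += 1
    return counts


messages = [
    StubMessage("When should I charge my EV?"),
    StubMessage("{...}", name="get_electricity_prices", tool_call_id="call_1"),
    StubMessage("{...}", name="get_electricity_prices", tool_call_id="call_2"),
    StubMessage("{...}", name="get_weather_forecast", tool_call_id="call_3"),
    StubMessage("Charge between 10 AM and 2 PM."),
]
print(count_tool_calls(messages))
```

Counting rather than listing makes repeated calls to the same tool visible, which a set-based comparison would hide.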

In [13]:
import re

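def evaluate_response(question, final_response, expected_response):
    """
    Score a single response on four dimensions:
    ACCURACY, RELEVANCE, COMPLETENESS, and USEFULNESS.
    Each dimension is scored 0.0 to 1.0.

    The scores are keyword and pattern heuristics: they check that concrete
    data, times, and reasoning words are present, not that the values are correct.
    """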
    text = (final_response or "").lower()
    question_lower = (question or "").lower()
    expected_lower = (expected_response or "").lower()

    # --- ACCURACY (0-1) ---
    # Does the response contain concrete data: numbers, units, specific times?
    accuracy_score = 0.0
    accuracy_notes = []

    has_numbers = bool(re.search(r'\d+\.?\d*', text))
    has_units = any(u in text for u in ["kwh", "kw", "$", "usd", "°f", "°c", "%"])
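    # Match time references as whole tokens; substring checks for "am"/"pm" also hit words like "same"
    has_time_refs = bool(re.search(r"\b(?:\d{1,2}(?::\d{2})?\s*(?:am|pm)|midnight|noon|hours?|o'clock)\b", text))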

    if has_numbers:
        accuracy_score += 0.4
        accuracy_notes.append("Contains numeric data")
    if has_units:
        accuracy_score += 0.3
        accuracy_notes.append("Includes measurement units")
    if has_time_refs:
        accuracy_score += 0.3
        accuracy_notes.append("References specific times")
    if not accuracy_notes:
        accuracy_notes.append("Missing concrete data points")

    # --- RELEVANCE (0-1) ---
    # Does the response address the key terms from the question?
    relevance_score = 0.0
    relevance_notes = []

    question_keywords = [w for w in re.findall(r'\b\w{4,}\b', question_lower)
                         if w not in {"should", "what", "when", "which", "would", "could", "about", "that", "this", "from", "with", "have", "your"}]
    if question_keywords:
        matched_keywords = sum(1 for kw in question_keywords if kw in text)
        relevance_score = round(min(1.0, matched_keywords / max(1, len(question_keywords) * 0.5)), 2)

    # Check alignment with expected response themes
    expected_tokens = re.findall(r'\b\w{4,}\b', expected_lower)
    if expected_tokens:
        matched_expected = sum(1 for t in expected_tokens[:8] if t in text)
        expected_alignment = min(1.0, matched_expected / max(1, min(len(expected_tokens), 8) * 0.4))
        relevance_score = round(min(1.0, (relevance_score + expected_alignment) / 2), 2)

    if relevance_score >= 0.7:
        relevance_notes.append("Strongly addresses the question")
    elif relevance_score >= 0.4:
        relevance_notes.append("Partially addresses the question")
    else:
        relevance_notes.append("Weak alignment with the question")

    # --- COMPLETENESS (0-1) ---
    # Is the response detailed enough with actionable guidance and reasoning?
    completeness_score = 0.0
    completeness_notes = []

    if len(text) > 300:
        completeness_score += 0.4
        completeness_notes.append("Sufficiently detailed response")
    elif len(text) > 150:
        completeness_score += 0.2
        completeness_notes.append("Moderate detail level")
    else:
        completeness_notes.append("Response may be too brief")

    action_keywords = ["recommend", "should", "schedule", "suggest", "advise", "set", "run", "charge", "avoid"]
    if any(k in text for k in action_keywords):
        completeness_score += 0.3
        completeness_notes.append("Contains actionable guidance")
    else:
        completeness_notes.append("Missing actionable guidance")

    reasoning_keywords = ["because", "since", "reason", "due to", "this means", "therefore", "result"]
    if any(k in text for k in reasoning_keywords):
        completeness_score += 0.3
        completeness_notes.append("Includes reasoning/explanation")
    else:
        completeness_notes.append("Missing reasoning")

    # --- USEFULNESS (0-1) ---
    # Does the response provide practical, usable recommendations?
    usefulness_score = 0.0
    usefulness_notes = []

    time_patterns = bool(re.search(r'\d{1,2}\s*(am|pm|:\d{2})', text))
    if time_patterns:
        usefulness_score += 0.35
        usefulness_notes.append("Provides specific time schedules")
    else:
        usefulness_notes.append("Missing specific time recommendations")

    cost_keywords = ["save", "cost", "saving", "cheaper", "expensive", "price", "$", "bill"]
    if any(k in text for k in cost_keywords):
        usefulness_score += 0.35
        usefulness_notes.append("Includes cost/savings information")
    else:
        usefulness_notes.append("Missing cost/savings context")

    practical_keywords = ["tip", "step", "first", "then", "option", "alternative", "instead"]
    if any(k in text for k in practical_keywords):
        usefulness_score += 0.3
        usefulness_notes.append("Offers practical steps or alternatives")
    else:
        usefulness_notes.append("Could offer more practical alternatives")

    overall_score = round((accuracy_score + relevance_score + completeness_score + usefulness_score) / 4, 2)

    return {
        "accuracy": round(accuracy_score, 2),
        "relevance": round(relevance_score, 2),
        "completeness": round(completeness_score, 2),
        "usefulness": round(usefulness_score, 2),
        "overall": overall_score,
        "passed": overall_score >= 0.5,
        "notes": {
            "accuracy": accuracy_notes,
            "relevance": relevance_notes,
            "completeness": completeness_notes,
            "usefulness": usefulness_notes,
        },
    }
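
Several checks in `evaluate_response` use substring tests such as `any(k in text for k in action_keywords)`. Short keywords like "set" or "run" can then match inside unrelated words. The sketch below contrasts that with a whole-word regex; `contains_keyword` is a hypothetical helper, not part of the notebook's evaluator.

```python
import re


def contains_keyword(text, keywords):
    """Return True if any keyword appears as a whole word or phrase in text."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text.lower()) is not None


action_keywords = ["recommend", "set", "run", "avoid"]
text = "Solar output drops quickly around sunset."

# Substring check: "set" is found inside "sunset", so this reports a hit.
print(any(k in text.lower() for k in action_keywords))
# Whole-word check: no keyword stands alone in the sentence.
print(contains_keyword(text, action_keywords))
```

The trade-off is that whole-word matching misses inflected forms such as "recommended" or "runs", so a stricter evaluator would need those variants listed explicitly.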

In [14]:
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used."""
    used_tools = _extract_tool_names(messages)
    used_set = set(used_tools)
    expected_set = set(expected_tools or [])

    matched = sorted(expected_set & used_set)
    missing = sorted(expected_set - used_set)
    extra = sorted(used_set - expected_set)

    precision = len(matched) / max(1, len(used_set))
    recall = len(matched) / max(1, len(expected_set))
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0

    return {
        "used_tools": used_tools,
        "expected_tools": list(expected_set),
        "matched": matched,
        "missing": missing,
        "extra": extra,
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "f1": round(f1, 2),
        "passed": recall >= 0.7,
    }
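
Because `passed` depends on recall alone, two runs with the same F1 can end up with different pass status. The worked example below uses the same formulas on two hypothetical runs. The tool lists are illustrative assumptions, not taken from the recorded results, and `tool_prf` is a name introduced here.

```python
def tool_prf(used, expected):
    """Precision, recall, and F1 over sets of tool names."""
    used_set, expected_set = set(used), set(expected)
    matched = used_set & expected_set
    precision = len(matched) / max(1, len(used_set))
    recall = len(matched) / max(1, len(expected_set))
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Hypothetical run 1: one expected tool is skipped.
# Recall drops to 0.5, so the recall >= 0.7 check fails.
skipped = tool_prf(
    used=["get_electricity_prices"],
    expected=["get_electricity_prices", "calculate_energy_savings"],
)

# Hypothetical run 2: one extra tool is called.
# Precision drops to 0.5, but recall stays at 1.0 and the check passes.
extra = tool_prf(
    used=["get_electricity_prices", "get_weather_forecast"],
    expected=["get_electricity_prices"],
)

print(skipped)  # (precision, recall, f1)
print(extra)
```

Both runs give an F1 of 0.67. This is one possible explanation for tests that show the same tool score but a different status in the per-test breakdown, though the tools actually called in those tests are not shown here.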

In [15]:
def generate_evaluation_report():
    """Generate evaluation metrics across all test_results with overall scores."""
    report_rows = []
    all_tools_used = set()
    all_tools_expected = set()

    for result in test_results:
        if isinstance(result.get("response"), dict):
            messages = result["response"].get("messages", [])
            final_response = messages[-1].content if messages else ""
        else:
            messages = []
            final_response = str(result.get("response", ""))

        response_eval = evaluate_response(
            question=result["question"],
            final_response=final_response,
            expected_response=result["expected_response"],
        )
        tool_eval = evaluate_tool_usage(messages, result["expected_tools"])

        all_tools_used.update(tool_eval["used_tools"])
        all_tools_expected.update(tool_eval["expected_tools"])

        report_rows.append({
            "test_id": result["test_id"],
            "accuracy": response_eval["accuracy"],
            "relevance": response_eval["relevance"],
            "completeness": response_eval["completeness"],
            "usefulness": response_eval["usefulness"],
            "overall": response_eval["overall"],
            "response_passed": response_eval["passed"],
            "tool_f1": tool_eval["f1"],
            "tool_recall": tool_eval["recall"],
            "tool_passed": tool_eval["passed"],
            "used_tools": tool_eval["used_tools"],
            "missing_tools": tool_eval["missing"],
        })

    total = len(report_rows)

    # Overall scores across all tests
    avg_accuracy = round(sum(r["accuracy"] for r in report_rows) / max(1, total), 2)
    avg_relevance = round(sum(r["relevance"] for r in report_rows) / max(1, total), 2)
    avg_completeness = round(sum(r["completeness"] for r in report_rows) / max(1, total), 2)
    avg_usefulness = round(sum(r["usefulness"] for r in report_rows) / max(1, total), 2)
    avg_overall = round(sum(r["overall"] for r in report_rows) / max(1, total), 2)
    avg_tool_f1 = round(sum(r["tool_f1"] for r in report_rows) / max(1, total), 2)
    pass_rate = round(sum(1 for r in report_rows if r["response_passed"] and r["tool_passed"]) / max(1, total), 2)

    # Tool coverage
    tool_coverage = sorted(all_tools_used)
    missing_from_any = sorted(all_tools_expected - all_tools_used)

    summary = {
        "total_tests": total,
        "overall_scores": {
            "accuracy": avg_accuracy,
            "relevance": avg_relevance,
            "completeness": avg_completeness,
            "usefulness": avg_usefulness,
            "combined_overall": avg_overall,
        },
        "tool_metrics": {
            "avg_tool_f1": avg_tool_f1,
            "tools_used": tool_coverage,
            "tools_never_called": missing_from_any,
        },
        "pass_rate": pass_rate,
        "failed_tests": [r["test_id"] for r in report_rows if not (r["response_passed"] and r["tool_passed"])],
        "details": report_rows,
    }

    return summary
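
Averages can hide which dimension drags a given test down. A minimal sketch, assuming rows shaped like the `details` entries built above, tallies the weakest dimension per test. The sample rows are copied from the per-test breakdown printed at the end of this notebook; `weakest_dimension` is a hypothetical helper.

```python
from collections import Counter

DIMENSIONS = ("accuracy", "relevance", "completeness", "usefulness")


def weakest_dimension(row):
    """Return the lowest-scoring dimension for one report row (ties go to the first listed)."""
    return min(DIMENSIONS, key=lambda dim: row[dim])


# Scores copied from the first four rows of the per-test breakdown.
rows = [
    {"test_id": "ev_charging_1", "accuracy": 1.00, "relevance": 0.97, "completeness": 1.00, "usefulness": 0.70},
    {"test_id": "ev_charging_2", "accuracy": 1.00, "relevance": 0.97, "completeness": 1.00, "usefulness": 0.70},
    {"test_id": "thermostat_1", "accuracy": 1.00, "relevance": 1.00, "completeness": 1.00, "usefulness": 0.70},
    {"test_id": "thermostat_2", "accuracy": 1.00, "relevance": 0.97, "completeness": 1.00, "usefulness": 0.70},
]
tally = Counter(weakest_dimension(row) for row in rows)
print(tally.most_common())
```

With the real report, the same tally can be run over the `details` list returned by `generate_evaluation_report()`. A usefulness score of 0.70 can only come from the time-schedule and cost checks (0.35 + 0.35), which suggests the practical-keywords check did not fire for these responses.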

In [16]:
# Generate and display the evaluation report
evaluation_report = generate_evaluation_report()

# --- Overall Combined Scores ---
print("=" * 60)
print("         OVERALL EVALUATION SCORES (All Test Cases)")
print("=" * 60)
scores = evaluation_report["overall_scores"]
print(f"  Accuracy:      {scores['accuracy']:.2f} / 1.00")
print(f"  Relevance:     {scores['relevance']:.2f} / 1.00")
print(f"  Completeness:  {scores['completeness']:.2f} / 1.00")
print(f"  Usefulness:    {scores['usefulness']:.2f} / 1.00")
print(f"  ---")
print(f"  Combined Overall Score:  {scores['combined_overall']:.2f} / 1.00")
print(f"  Overall Pass Rate:       {evaluation_report['pass_rate']:.0%}")

# --- Tool Coverage ---
tools = evaluation_report["tool_metrics"]
print(f"\n  Tool F1 (avg):  {tools['avg_tool_f1']:.2f}")
print(f"  Tools Used:     {', '.join(tools['tools_used'])}")
if tools["tools_never_called"]:
    print(f"  Tools Missing:  {', '.join(tools['tools_never_called'])}")
else:
    print(f"  Tools Missing:  None — all expected tools were called")

# --- Per-Test Breakdown ---
print("\n" + "=" * 60)
print("         PER-TEST BREAKDOWN")
print("=" * 60)
print(f"{'Test ID':<18} {'Acc':>5} {'Rel':>5} {'Comp':>5} {'Use':>5} {'Overall':>8} {'Tools':>6} {'Status':>7}")
print("-" * 60)
for d in evaluation_report["details"]:
    status = "PASS" if d["response_passed"] and d["tool_passed"] else "FAIL"
    print(f"{d['test_id']:<18} {d['accuracy']:>5.2f} {d['relevance']:>5.2f} {d['completeness']:>5.2f} {d['usefulness']:>5.2f} {d['overall']:>8.2f} {d['tool_f1']:>6.2f} {status:>7}")

if evaluation_report["failed_tests"]:
    print(f"\nFailed: {', '.join(evaluation_report['failed_tests'])}")

# --- Strengths ---
print("\n" + "=" * 60)
print("         STRENGTHS")
print("=" * 60)
strengths = []
if scores["accuracy"] >= 0.7:
    strengths.append("High accuracy: responses consistently include concrete data (numbers, units, specific times).")
if scores["relevance"] >= 0.7:
    strengths.append("Strong relevance: responses directly address user questions and align with expected outcomes.")
if scores["completeness"] >= 0.6:
    strengths.append("Good completeness: responses are detailed with actionable guidance and reasoning.")
if scores["usefulness"] >= 0.6:
    strengths.append("Practical usefulness: responses include specific schedules and cost/savings estimates.")
if tools["avg_tool_f1"] >= 0.7:
    strengths.append("Effective tool usage: the agent selects appropriate tools for each query with high F1.")
if not tools["tools_never_called"]:
    strengths.append("Full tool coverage: every expected tool is called at least once across the test suite.")
if not strengths:
    strengths.append("No dimension reached its strength threshold in this run.")
for s in strengths:
    print(f"  + {s}")

# --- Weaknesses ---
print("\n" + "=" * 60)
print("         WEAKNESSES")
print("=" * 60)
weaknesses = []
if scores["accuracy"] < 0.7:
    weaknesses.append("Accuracy could improve: some responses lack specific numbers or measurable data points.")
if scores["relevance"] < 0.7:
    weaknesses.append("Relevance gaps: some responses drift from the core question or miss expected themes.")
if scores["completeness"] < 0.6:
    weaknesses.append("Completeness issues: some responses are brief or lack reasoning for recommendations.")
if scores["usefulness"] < 0.6:
    weaknesses.append("Usefulness gaps: some responses miss specific time schedules or cost estimates.")
if tools["avg_tool_f1"] < 0.7:
    weaknesses.append(f"Tool selection imprecise: average F1 is {tools['avg_tool_f1']:.2f}; agent sometimes skips expected tools or calls extra ones.")
if tools["tools_never_called"]:
    weaknesses.append(f"Some tools never called: {', '.join(tools['tools_never_called'])}.")
if not weaknesses:
    weaknesses.append("No significant weaknesses identified at current thresholds.")
for w in weaknesses:
    print(f"  - {w}")

# --- Recommendations for Improvement ---
print("\n" + "=" * 60)
print("         RECOMMENDATIONS FOR IMPROVEMENT")
print("=" * 60)
recommendations = [
    "1. Enhance the system prompt to explicitly instruct the agent to use the savings calculator tool when comparing usage scenarios, so it provides precise dollar estimates rather than rough approximations.",
    "2. Add prompt guidance for the agent to always check weather forecasts alongside electricity prices for thermostat and HVAC questions, since weather conditions directly affect optimal setpoints.",
    "3. Encourage the agent to cross-reference historical usage data with energy-saving tips (RAG) when giving reduction advice, ensuring recommendations are personalized to the user's actual patterns.",
    "4. Consider adding few-shot examples to the system prompt demonstrating ideal tool combinations for common query types (e.g., EV charging should use weather + pricing + savings calculator).",
    "5. Expand the test suite to cover edge cases such as cloudy-day solar queries, multi-day scheduling, and conflicting optimization goals (cost vs. comfort vs. carbon).",
]
for r in recommendations:
    print(f"  {r}")

print("\n" + "=" * 60)

         OVERALL EVALUATION SCORES (All Test Cases)
  Accuracy:      0.95 / 1.00
  Relevance:     0.96 / 1.00
  Completeness:  0.88 / 1.00
  Usefulness:    0.64 / 1.00
  ---
  Combined Overall Score:  0.86 / 1.00
  Overall Pass Rate:       83%

  Tool F1 (avg):  0.85
  Tools Used:     calculate_energy_savings, get_electricity_prices, get_recent_energy_summary, get_weather_forecast, query_energy_usage, query_solar_generation, search_energy_tips
  Tools Missing:  None — all expected tools were called

         PER-TEST BREAKDOWN
Test ID              Acc   Rel  Comp   Use  Overall  Tools  Status
------------------------------------------------------------
ev_charging_1       1.00  0.97  1.00  0.70     0.92   1.00    PASS
ev_charging_2       1.00  0.97  1.00  0.70     0.92   0.67    FAIL
thermostat_1        1.00  1.00  1.00  0.70     0.93   1.00    PASS
thermostat_2        1.00  0.97  1.00  0.70     0.92   1.00    PASS
appliance_1         1.00  1.00  1.00  0.70     0.93   0.67    PASS
appl