# 📊 Model Evaluation and Performance Analysis

**Comprehensive Analysis of Neural Network Performance for Appliance Energy Prediction**

This notebook provides an in-depth evaluation of our trained neural network model. You'll learn how to assess model performance, identify strengths and weaknesses, and validate the model for real-world deployment.

## 🎯 What You'll Learn
1. **Load and test** the trained neural network model
2. **Comprehensive evaluation** using multiple metrics
3. **Cross-validation** for robust performance assessment
4. **Feature importance** analysis
5. **Model interpretation** and business insights
6. **Deployment readiness** assessment

## 📊 Evaluation Approach
- **Accuracy Metrics**: R², MSE, MAE, MAPE
- **Visual Analysis**: Prediction plots, residual analysis
- **Statistical Tests**: Distribution analysis, bias detection
- **Business Metrics**: Cost implications, practical accuracy

---

In [None]:
# Import all necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Machine learning libraries
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error, r2_score,
    mean_absolute_percentage_error
)
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import load_model
import joblib

# Statistical libraries
from scipy import stats
from scipy.stats import normaltest, shapiro
import warnings
warnings.filterwarnings('ignore')

# Set style and random seeds
plt.style.use('default')
sns.set_palette('husl')
np.random.seed(42)
tf.random.set_seed(42)

# Configure display
pd.set_option('display.max_columns', None)
plt.rcParams['figure.figsize'] = (12, 8)

print("📊 MODEL EVALUATION SETUP COMPLETE!")
print("=" * 40)
print(f"🧠 TensorFlow Version: {tf.__version__}")
print(f"📈 Ready to evaluate neural network performance!")

## 1. 🔄 Loading Trained Model and Data

Let's load our trained neural network and prepare the data for comprehensive evaluation.

In [None]:
# Load the trained model and associated files
print("🔄 LOADING TRAINED MODEL AND DATA")
print("=" * 35)

try:
    # Load the trained neural network
    model = load_model('../models/appliance_energy_predictor.h5')
    print("✅ Neural network model loaded successfully!")
    
    # Load the feature scaler
    scaler = joblib.load('../models/feature_scaler.pkl')
    print("✅ Feature scaler loaded successfully!")
    
    # Load feature names
    feature_names = joblib.load('../models/feature_names.pkl')
    print(f"✅ Feature names loaded ({len(feature_names)} features)")
    
    # Load model metadata
    model_info = joblib.load('../models/model_info.pkl')
    print("✅ Model metadata loaded successfully!")
    
except Exception as e:
    print(f"❌ Error loading model files: {e}")
    print("📝 Make sure you've run the neural network training notebook first!")
    raise  # Stop here: later cells need model, scaler, feature_names, and model_info

# Load the original dataset
print("\n📂 Loading original dataset...")
df = pd.read_csv('../data/raw/appliance_data.csv')
print(f"✅ Dataset loaded: {df.shape[0]} records, {df.shape[1]} features")

# Display model information
print("\n🧠 MODEL INFORMATION:")
print("-" * 25)
for key, value in model_info.items():
    if key != 'feature_names':
        if isinstance(value, float):
            print(f"   📊 {key}: {value:.4f}")
        else:
            print(f"   📋 {key}: {value}")

In [None]:
# Prepare data for evaluation (same preprocessing as training)
print("🔧 PREPARING DATA FOR EVALUATION")
print("=" * 35)

# Feature engineering (same as training)
categorical_cols = ['appliance_type', 'location', 'income_level', 'season', 'usage_pattern']
numerical_cols = ['power_rating_watts', 'usage_hours_per_day', 'efficiency_rating', 
                 'appliance_age_years', 'household_size']

# Encode categorical variables
df_encoded = pd.get_dummies(df[categorical_cols], prefix=categorical_cols)
X_features = pd.concat([df[numerical_cols], df_encoded], axis=1)
y_target = df['daily_consumption_kwh']

# Ensure feature consistency with training data
X_features = X_features.reindex(columns=feature_names, fill_value=0)

print(f"✅ Features prepared: {X_features.shape}")
print(f"🎯 Target variable: {len(y_target)} samples")

# Scale features using the same scaler from training
X_scaled = scaler.transform(X_features)
print("✅ Features scaled using training scaler")

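# Generate predictions for the entire dataset
# Note: if this file is the one used for training, these rows include training
# samples, so the metrics below may be optimistic compared with unseen data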
print("\n🤖 Generating predictions...")
y_pred_all = model.predict(X_scaled, verbose=0).flatten()
print(f"✅ Predictions generated for {len(y_pred_all)} samples")
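
The `reindex` step above is what keeps the evaluation features in the same column order the scaler and model were fitted on. The following is a minimal sketch of that alignment on toy data; `train_columns` and `new_data` are made-up stand-ins for `feature_names` and `df`, not values from this project.

```python
import pandas as pd

# Hypothetical column list, standing in for the saved feature_names
train_columns = ["power_rating_watts", "season_summer", "season_winter"]

# Toy evaluation data containing a category ("monsoon") not seen in training
new_data = pd.DataFrame({
    "power_rating_watts": [1500, 75],
    "season": ["summer", "monsoon"],
})

# One-hot encode, then combine with the numerical column
encoded = pd.concat(
    [new_data[["power_rating_watts"]],
     pd.get_dummies(new_data[["season"]], prefix=["season"], dtype=int)],
    axis=1,
)

# Columns missing at evaluation time are added as zeros;
# columns unknown to training are silently dropped
aligned = encoded.reindex(columns=train_columns, fill_value=0)
dropped = sorted(set(encoded.columns) - set(train_columns))

print(aligned)
print(f"Dropped columns: {dropped}")
```

Because unseen categories are dropped without warning, a row such as the "monsoon" one ends up with all season indicators set to zero. Printing the dropped columns, as done here, is one way to notice that.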

## 2. 📊 Comprehensive Performance Metrics

Let's calculate and analyze multiple performance metrics to get a complete picture of model performance.
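
As a reference for what each metric measures, here is a minimal NumPy sketch that computes them from their definitions. The helper name `regression_metrics` and the toy values are illustrative assumptions; the notebook itself uses the scikit-learn functions. This version assumes the actual values contain no zeros, since MAPE divides by them.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (%) and R² from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                      # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(errors)),
        "mape": np.mean(np.abs(errors) / np.abs(y_true)) * 100,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Toy daily consumption values in kWh
toy_actual = [2.0, 4.0, 6.0, 8.0]
toy_predicted = [2.5, 3.5, 6.5, 7.5]
toy_metrics = regression_metrics(toy_actual, toy_predicted)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.4f}")
```

Every prediction here is off by 0.5 kWh, so MAE and RMSE are both 0.5, while MAPE is larger for the small actual values than for the large ones.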

In [None]:
# Calculate comprehensive performance metrics
print("📊 COMPREHENSIVE PERFORMANCE ANALYSIS")
print("=" * 40)

# Basic regression metrics
mse = mean_squared_error(y_target, y_pred_all)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_target, y_pred_all)
r2 = r2_score(y_target, y_pred_all)
mape = mean_absolute_percentage_error(y_target, y_pred_all) * 100

# Additional metrics
residuals = y_target - y_pred_all
mean_residual = np.mean(residuals)
std_residual = np.std(residuals)

# Percentage of predictions within certain error bounds
within_10_percent = np.mean(np.abs(residuals) <= 0.1 * y_target) * 100
within_20_percent = np.mean(np.abs(residuals) <= 0.2 * y_target) * 100
within_1_kwh = np.mean(np.abs(residuals) <= 1.0) * 100

print("🎯 ACCURACY METRICS:")
print("-" * 20)
print(f"📈 R² Score: {r2:.4f} ({r2*100:.2f}% variance explained)")
print(f"📊 Mean Squared Error: {mse:.4f}")
print(f"📉 Root Mean Squared Error: {rmse:.4f} kWh")
print(f"📋 Mean Absolute Error: {mae:.4f} kWh")
print(f"📈 Mean Absolute Percentage Error: {mape:.2f}%")

print("\n🎯 RESIDUAL ANALYSIS:")
print("-" * 20)
print(f"📊 Mean Residual: {mean_residual:.4f} kWh (bias)")
print(f"📈 Std Residual: {std_residual:.4f} kWh (spread)")

print("\n🎯 PRACTICAL ACCURACY:")
print("-" * 20)
print(f"✅ Within 10% error: {within_10_percent:.1f}% of predictions")
print(f"✅ Within 20% error: {within_20_percent:.1f}% of predictions")
print(f"✅ Within 1 kWh error: {within_1_kwh:.1f}% of predictions")

# Performance interpretation
print("\n💡 PERFORMANCE INTERPRETATION:")
print("-" * 30)
if r2 > 0.9:
    performance_level = "🌟 Outstanding"
elif r2 > 0.8:
    performance_level = "🔥 Excellent"
elif r2 > 0.7:
    performance_level = "✅ Very Good"
elif r2 > 0.6:
    performance_level = "👍 Good"
elif r2 > 0.5:
    performance_level = "⚠️ Fair"
else:
    performance_level = "❌ Poor"

print(f"🏆 Overall Performance: {performance_level}")
print(f"📊 Model explains {r2*100:.1f}% of energy consumption variance")
print(f"📈 Average prediction error: {mae:.2f} kWh/day")
print(f"💰 This translates to ~₹{mae * 6 * 30:.0f}/month error per appliance (at ₹6/kWh, 30 days)")
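
The metrics above come from a single pass over the data. Cross-validation, listed in the learning goals, repeats the fit-and-score cycle on several splits so that the spread across folds is visible. The sketch below shows the mechanics with NumPy only. The names `kfold_indices`, `cross_validated_mae`, and `linear_fit_predict` are hypothetical, and a least-squares model stands in for the neural network, since retraining the Keras model per fold depends on training code not shown in this notebook. The imported `KFold` could supply the index splits instead.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold splitting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

def cross_validated_mae(fit_predict, X, y, n_splits=5):
    """Return the MAE of each fold.

    fit_predict(X_train, y_train, X_test) must train a fresh model
    and return predictions for X_test.
    """
    fold_mae = []
    for train_idx, test_idx in kfold_indices(len(y), n_splits):
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        fold_mae.append(np.mean(np.abs(y[test_idx] - preds)))
    return np.array(fold_mae)

def linear_fit_predict(X_train, y_train, X_test):
    """Stand-in model: ordinary least squares with an intercept."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef

# Toy data: a noiseless linear target, so every fold should score near zero
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0
fold_scores = cross_validated_mae(linear_fit_predict, X_demo, y_demo)
print(f"Mean MAE: {fold_scores.mean():.6f} (std {fold_scores.std():.6f})")
```

For the real model, `fit_predict` would need to rebuild the network, refit the scaler on the training fold only, and train from scratch on each fold.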

## 3. 📈 Visual Performance Analysis

Let's create comprehensive visualizations to understand model performance across different dimensions.

In [None]:
# Create comprehensive performance visualization dashboard
fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=[
        '🎯 Predicted vs Actual', '📊 Residual Distribution',
        '📈 Residuals vs Predicted', '⚡ Performance by Appliance',
        '🌡️ Performance by Season', '💰 Error Distribution'
    ],
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# Plot 1: Predicted vs Actual
fig.add_trace(
    go.Scatter(
        x=y_target, y=y_pred_all,
        mode='markers',
        name='Predictions',
        marker=dict(color='blue', size=4, opacity=0.6)
    ),
    row=1, col=1
)

# Perfect prediction line
min_val, max_val = y_target.min(), y_target.max()
fig.add_trace(
    go.Scatter(
        x=[min_val, max_val], y=[min_val, max_val],
        mode='lines',
        name='Perfect Prediction',
        line=dict(color='red', dash='dash')
    ),
    row=1, col=1
)

# Plot 2: Residual Distribution
fig.add_trace(
    go.Histogram(
        x=residuals,
        nbinsx=30,
        name='Residuals',
        marker_color='lightblue'
    ),
    row=1, col=2
)

# Plot 3: Residuals vs Predicted
fig.add_trace(
    go.Scatter(
        x=y_pred_all, y=residuals,
        mode='markers',
        name='Residual Pattern',
        marker=dict(color='green', size=4, opacity=0.6)
    ),
    row=2, col=1
)

# Zero line
fig.add_trace(
    go.Scatter(
        x=[y_pred_all.min(), y_pred_all.max()], y=[0, 0],
        mode='lines',
        name='Zero Error',
        line=dict(color='red', dash='dash')
    ),
    row=2, col=1
)

# Plot 4: Performance by Appliance
# residuals shares df's index, so grouping by the appliance column aligns by label
appliance_mae = residuals.abs().groupby(df['appliance_type']).mean().sort_values()

fig.add_trace(
    go.Bar(
        x=appliance_mae.index,
        y=appliance_mae.values,
        name='MAE by Appliance',
        marker_color='orange'
    ),
    row=2, col=2
)

# Plot 5: Performance by Season
# Mean absolute residual per season, aligned by index label as above
season_mae = residuals.abs().groupby(df['season']).mean()

fig.add_trace(
    go.Bar(
        x=season_mae.index,
        y=season_mae.values,
        name='MAE by Season',
        marker_color='lightcoral'
    ),
    row=3, col=1
)

# Plot 6: Error Distribution by Range
error_ranges = ['0-0.5', '0.5-1.0', '1.0-2.0', '2.0+']
error_counts = [
    np.sum(np.abs(residuals) <= 0.5),
    np.sum((np.abs(residuals) > 0.5) & (np.abs(residuals) <= 1.0)),
    np.sum((np.abs(residuals) > 1.0) & (np.abs(residuals) <= 2.0)),
    np.sum(np.abs(residuals) > 2.0)
]

fig.add_trace(
    go.Bar(
        x=error_ranges,
        y=error_counts,
        name='Error Distribution',
        marker_color='lightgreen'
    ),
    row=3, col=2
)

# Update layout
fig.update_layout(
    height=1200,
    showlegend=False,
    title_text="📊 Comprehensive Model Performance Dashboard",
    title_x=0.5
)

# Update axes labels
fig.update_xaxes(title_text="Actual Consumption (kWh)", row=1, col=1)
fig.update_yaxes(title_text="Predicted Consumption (kWh)", row=1, col=1)
fig.update_xaxes(title_text="Residuals (kWh)", row=1, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=2)
fig.update_xaxes(title_text="Predicted Consumption (kWh)", row=2, col=1)
fig.update_yaxes(title_text="Residuals (kWh)", row=2, col=1)
fig.update_xaxes(title_text="Appliance Type", row=2, col=2)
fig.update_yaxes(title_text="MAE (kWh)", row=2, col=2)
fig.update_xaxes(title_text="Season", row=3, col=1)
fig.update_yaxes(title_text="MAE (kWh)", row=3, col=1)
fig.update_xaxes(title_text="Error Range (kWh)", row=3, col=2)
fig.update_yaxes(title_text="Count", row=3, col=2)

fig.show()

print("📊 VISUAL ANALYSIS INSIGHTS:")
print("-" * 30)
print(f"🎯 Best performing appliance: {appliance_mae.index[0]} (MAE: {appliance_mae.iloc[0]:.3f} kWh)")
print(f"⚠️ Challenging appliance: {appliance_mae.index[-1]} (MAE: {appliance_mae.iloc[-1]:.3f} kWh)")
print(f"🌡️ Best season: {season_mae.idxmin()} (MAE: {season_mae.min():.3f} kWh)")
print(f"📈 {error_counts[0]}/{len(residuals)} predictions ({error_counts[0] / len(residuals) * 100:.1f}%) have ≤0.5 kWh error")
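
The residual plots show bias and spread visually. A numeric check can complement them. The sketch below estimates the mean residual with an approximate 95% confidence interval using a normal approximation; the helper name `bias_summary` and the toy residuals are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def bias_summary(residuals, z=1.96):
    """Mean residual with an approximate 95% confidence interval.

    Uses the normal approximation mean ± z * standard error, which is
    reasonable for large samples.
    """
    residuals = np.asarray(residuals, dtype=float)
    mean = residuals.mean()
    std_err = residuals.std(ddof=1) / np.sqrt(len(residuals))
    low, high = mean - z * std_err, mean + z * std_err
    return {
        "mean": mean,
        "ci_low": low,
        "ci_high": high,
        # Flag bias only when the interval excludes zero
        "biased": not (low <= 0.0 <= high),
    }

# Toy residuals: symmetric around zero, then shifted by 0.5 kWh
centered = np.linspace(-1.0, 1.0, 201)
shifted = centered + 0.5

for label, values in [("centered", centered), ("shifted", shifted)]:
    summary = bias_summary(values)
    print(f"{label}: mean={summary['mean']:.3f}, "
          f"CI=[{summary['ci_low']:.3f}, {summary['ci_high']:.3f}], "
          f"biased={summary['biased']}")
```

With the notebook's variables, the call would be `bias_summary(residuals)`. A positive mean indicates the model tends to under-predict, because residuals are computed as actual minus predicted.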

## 4. 💼 Business Impact Analysis

Let's analyze the practical business implications of our model's accuracy.
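
The conversion from a daily energy error to a monthly cost error is a simple product of three terms. A small sketch, using the ₹6/kWh rate and 30-day month assumed in this notebook (the function name `monthly_cost_error` is hypothetical):

```python
def monthly_cost_error(error_kwh_per_day, rate_per_kwh=6.0, days_per_month=30):
    """Convert a daily prediction error in kWh to a monthly cost error in INR."""
    return error_kwh_per_day * rate_per_kwh * days_per_month

# A 0.5 kWh/day error at ₹6/kWh over 30 days
print(monthly_cost_error(0.5))  # 90.0
```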

In [None]:
# Business impact analysis
print("💼 BUSINESS IMPACT ANALYSIS")
print("=" * 30)

# Cost implications
electricity_rate = 6.0  # INR per kWh (average Indian rate)
days_per_month = 30

# Calculate monthly cost errors
monthly_cost_errors = np.abs(residuals) * electricity_rate * days_per_month
avg_monthly_cost_error = np.mean(monthly_cost_errors)
max_monthly_cost_error = np.max(monthly_cost_errors)

# Calculate total consumption and costs
total_actual_monthly = y_target.sum() * days_per_month
total_predicted_monthly = y_pred_all.sum() * days_per_month
total_actual_cost = total_actual_monthly * electricity_rate
total_predicted_cost = total_predicted_monthly * electricity_rate

print("💰 COST IMPACT ANALYSIS:")
print("-" * 25)
print(f"   📊 Average monthly cost error: ₹{avg_monthly_cost_error:.2f} per appliance")
print(f"   📈 Maximum monthly cost error: ₹{max_monthly_cost_error:.2f} per appliance")
print(f"   📋 Median monthly cost error: ₹{np.median(monthly_cost_errors):.2f} per appliance")

# Error by appliance type
print("\n⚡ ERROR BY APPLIANCE TYPE:")
print("-" * 30)
for appliance in df['appliance_type'].unique():
    mask = df['appliance_type'] == appliance
    appliance_errors = monthly_cost_errors[mask]
    avg_error = np.mean(appliance_errors)
    print(f"   📊 {appliance}: ₹{avg_error:.2f}/month average error")

# Model reliability assessment
print("\n🛡️ MODEL RELIABILITY ASSESSMENT:")
print("-" * 35)

reliable_predictions = np.sum(np.abs(residuals) <= 0.5) / len(residuals) * 100
acceptable_predictions = np.sum(np.abs(residuals) <= 1.0) / len(residuals) * 100

print(f"   ✅ Highly reliable predictions (≤0.5 kWh error): {reliable_predictions:.1f}%")
print(f"   👍 Acceptable predictions (≤1.0 kWh error): {acceptable_predictions:.1f}%")

# Business recommendations
print("\n📋 BUSINESS RECOMMENDATIONS:")
print("-" * 30)
if reliable_predictions > 70:
    print("   🌟 Excellent reliability - ready for production deployment")
    print("   ✅ Can be used for energy planning and cost estimation")
elif reliable_predictions > 50:
    print("   👍 Good reliability - suitable for most applications")
    print("   ⚠️ Consider confidence intervals for critical decisions")
else:
    print("   ❌ Limited reliability - needs improvement before deployment")
    print("   🔧 Consider collecting more data or feature engineering")

print(f"\n💡 For household energy management, expect the monthly cost estimate")
print(f"   to be off by about ₹{avg_monthly_cost_error:.0f} per appliance on average.")
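
Feature importance, listed in the learning goals, can be approximated for any fitted model by permutation: shuffle one feature column, predict again, and record how much the error grows. The following is a minimal sketch of that idea, not the notebook's own method. `permutation_importance_mae` and `toy_predict` are hypothetical names, and the toy predictor stands in for the trained network.

```python
import numpy as np

def permutation_importance_mae(predict_fn, X, y, n_repeats=5, seed=42):
    """Average increase in MAE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(y - predict_fn(X)))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Break the link between this feature and the target
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            shuffled_mae = np.mean(np.abs(y - predict_fn(X_shuffled)))
            increases.append(shuffled_mae - baseline)
        importances[col] = np.mean(increases)
    return importances

def toy_predict(X):
    """Stand-in model: depends strongly on feature 0, weakly on 1, not on 2."""
    return 2.0 * X[:, 0] + 0.1 * X[:, 1]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = toy_predict(X_demo)
scores = permutation_importance_mae(toy_predict, X_demo, y_demo)
for i, score in enumerate(scores):
    print(f"feature {i}: MAE increase {score:.4f}")
```

With the notebook's objects, `predict_fn` could be `lambda X: model.predict(X, verbose=0).flatten()`, applied to `X_scaled` and `y_target.to_numpy()`. One caveat is that one-hot columns belonging to the same categorical feature would be shuffled independently, so their individual scores should be read with care.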

## 5. 📝 Model Evaluation Summary and Recommendations

Let's create a comprehensive summary of our model evaluation findings.

In [None]:
# Create comprehensive evaluation summary
print("📝 COMPREHENSIVE MODEL EVALUATION SUMMARY")
print("=" * 50)

# Executive Summary
print("🎯 EXECUTIVE SUMMARY:")
print("-" * 20)
print(f"   🧠 Model Type: Neural Network (TensorFlow/Keras)")
print(f"   📊 Dataset Size: {len(df):,} appliances from {df['household_id'].nunique()} households")
print(f"   🎯 Target Variable: Daily Energy Consumption (kWh)")
print(f"   📈 Overall Performance: {performance_level}")
print(f"   📊 Accuracy (R²): {r2:.3f} ({r2*100:.1f}% variance explained)")
print(f"   💰 Average Cost Error: ₹{avg_monthly_cost_error:.2f}/month per appliance")

# Strengths
print("\n💪 MODEL STRENGTHS:")
print("-" * 20)
strengths = []
if r2 > 0.7:
    strengths.append("High predictive accuracy")
if within_20_percent > 80:
    strengths.append("Most predictions within 20% error")
if abs(mean_residual) < 0.1:
    strengths.append("Low systematic bias")

for i, strength in enumerate(strengths, 1):
    print(f"   {i}. ✅ {strength}")

# Deployment Readiness
print("\n🚀 DEPLOYMENT READINESS:")
print("-" * 25)
deployment_score = 0
max_score = 5

# Scoring criteria
if r2 > 0.7: deployment_score += 1
if within_20_percent > 75: deployment_score += 1
if abs(mean_residual) < 0.2: deployment_score += 1
if avg_monthly_cost_error < 100: deployment_score += 1
if reliable_predictions > 60: deployment_score += 1

deployment_percentage = (deployment_score / max_score) * 100

print(f"   📊 Deployment Score: {deployment_score}/{max_score} ({deployment_percentage:.0f}%)")

if deployment_percentage >= 80:
    readiness = "🟢 Ready for Production"
    recommendation = "Deploy with confidence"
elif deployment_percentage >= 60:
    readiness = "🟡 Ready with Monitoring"
    recommendation = "Deploy with careful monitoring"
else:
    readiness = "🔴 Needs Improvement"
    recommendation = "Improve before deployment"

print(f"   🎯 Status: {readiness}")
print(f"   💡 Recommendation: {recommendation}")

# Final Recommendations
print("\n📋 FINAL RECOMMENDATIONS:")
print("-" * 30)
print("   1. 📱 Integrate model into web application for user predictions")
print("   2. 🔄 Implement model monitoring and performance tracking")
print("   3. 📊 Collect user feedback to improve future versions")
print("   4. ⚖️ Add confidence intervals for critical business decisions")
print("   5. 🔧 Consider ensemble methods for improved robustness")

# Save evaluation results
evaluation_results = {
    'r2_score': r2,
    'mae': mae,
    'rmse': rmse,
    'mape': mape,
    'within_20_percent': within_20_percent,
    'avg_monthly_cost_error': avg_monthly_cost_error,
    'deployment_score': deployment_score,
    'deployment_percentage': deployment_percentage,
    'best_appliance': appliance_mae.index[0],
    'worst_appliance': appliance_mae.index[-1]
}

joblib.dump(evaluation_results, '../models/evaluation_results.pkl')
print("\n💾 Evaluation results saved for future reference!")

print("\n" + "="*50)
print("🎉 MODEL EVALUATION COMPLETE!")
print(f"✅ Deployment status: {readiness}")
print(f"🚀 Next step: {recommendation}")
