# ⚡ Energy Consumption Forecasting
## Complete Time Series Analysis & Prediction Pipeline

**Author:** Alexy Louis  
**Email:** alexy.louis.scholar@gmail.com  
**Dataset:** 2 years of hourly energy consumption (17,497 records)

---

### 🎯 Objectives
1. Analyze energy consumption patterns and seasonalities
2. Engineer features for time series prediction
3. Compare classical, ML, and deep learning models
4. Build an ensemble for production-grade forecasting
5. Detect anomalies in consumption patterns

### 🏆 Key Results
- **Best Model:** Ensemble (LightGBM + XGBoost)
- **MAPE:** 2.18%
- **Forecast Horizon:** 7 days (168 hours)


## 1. Setup & Configuration

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Configuration
plt.style.use('seaborn-v0_8-whitegrid')
pd.set_option('display.max_columns', 50)

COLORS = {
    'primary': '#2E86AB',
    'secondary': '#A23B72', 
    'success': '#27AE60',
    'danger': '#E74C3C',
    'warning': '#F39C12',
    'purple': '#9B59B6',
}

print("✅ Libraries loaded successfully!")

## 2. Data Loading & Overview

Our dataset contains:
- **17,497 hourly records** (Jan 2022 - Dec 2023)
- **46 features** including consumption, weather, calendar, and solar data
- **Injected anomalies** for detection testing
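
Two full non-leap years contain 365 × 2 × 24 = 17,520 hours. A count of 17,497 records therefore points either to a slightly shorter date range or to a few missing hours. Daylight-saving transitions can cause such gaps when timestamps are local. The following is a minimal completeness check. It assumes a `DatetimeIndex` like the one built in the loading cell below. `find_missing_hours` is a hypothetical helper, and the demo index is toy data, not the real dataset:

```python
import pandas as pd


def find_missing_hours(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return hourly timestamps absent from `index` between its first and last entry."""
    expected = pd.date_range(index.min(), index.max(), freq="h")
    return expected.difference(index)


# Toy index: 6 consecutive hours with positions 2 and 3 removed
full = pd.date_range("2022-01-01 00:00", periods=6, freq="h")
observed = full.delete([2, 3])
missing = find_missing_hours(observed)
print(len(missing))  # 2
```

On the real data this would be `find_missing_hours(df.index)`. Duplicated timestamps (`df.index.duplicated().sum()`) are worth checking at the same time.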


In [None]:
# Load the main dataset
df = pd.read_csv('data/raw/energy_consumption.csv', parse_dates=['timestamp'])
df.set_index('timestamp', inplace=True)

print(f"Dataset Shape: {df.shape}")
print(f"Date Range: {df.index.min().date()} to {df.index.max().date()}")
print(f"\nColumns: {list(df.columns)[:15]}...")
print(f"\n{df.head()}")

In [None]:
# Summary statistics
print("📊 CONSUMPTION STATISTICS")
print(f"   Mean: {df['consumption_mwh'].mean():.2f} MWh")
print(f"   Std: {df['consumption_mwh'].std():.2f} MWh")
print(f"   Min: {df['consumption_mwh'].min():.2f} MWh")
print(f"   Max: {df['consumption_mwh'].max():.2f} MWh")

print("\n🌡️ WEATHER STATISTICS")
print(f"   Temperature: {df['temperature_c'].min():.1f}°C to {df['temperature_c'].max():.1f}°C")

print("\n⚠️ EXTREME EVENTS")
print(f"   Heat waves: {df['is_heat_wave'].sum():,} hours")
print(f"   Cold snaps: {df['is_cold_snap'].sum():,} hours")
print(f"   Storms: {df['is_storm'].sum():,} hours")

## 3. Exploratory Data Analysis

### 3.1 Time Series Overview

In [None]:
# Display saved visualizations
from IPython.display import Image, display

print("📊 Full Time Series Overview")
display(Image('images/01_time_series_overview.png'))

### 3.2 Seasonality Analysis

In [None]:
print("📊 Multiple Seasonalities Detected:")
print("   - Hourly: Peak at 11am and 6pm")
print("   - Weekly: Lower on weekends (8% reduction)")
print("   - Monthly: U-shaped (high in winter/summer)")
print("   - Annual: Heating/cooling demand cycles")

display(Image('images/02_seasonality_analysis.png'))
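
The hourly and weekly patterns above can be quantified directly from the raw series. The following is a minimal sketch. It assumes an hourly `DatetimeIndex` and a consumption column like `consumption_mwh`. `seasonal_profiles` is a hypothetical helper. The demo series is synthetic, with a built-in 11am bump and an 8% weekend discount; it is not the project's data:

```python
import numpy as np
import pandas as pd


def seasonal_profiles(consumption: pd.Series) -> tuple[pd.Series, float]:
    """Return the mean hour-of-day profile and the weekend reduction vs. weekdays (fraction)."""
    idx = consumption.index
    hourly_profile = consumption.groupby(idx.hour).mean()
    is_weekend = idx.dayofweek >= 5  # Saturday=5, Sunday=6
    weekend_reduction = 1 - consumption[is_weekend].mean() / consumption[~is_weekend].mean()
    return hourly_profile, weekend_reduction


# Synthetic two weeks starting on Monday 2022-01-03
idx = pd.date_range("2022-01-03", periods=14 * 24, freq="h")
base = np.where(idx.hour == 11, 110.0, 100.0)     # bump at 11am
scale = np.where(idx.dayofweek >= 5, 0.92, 1.0)   # 8% lower on weekends
demo = pd.Series(base * scale, index=idx)

profile, reduction = seasonal_profiles(demo)
print(profile.idxmax(), round(reduction, 3))  # 11 0.08
```

Calling `seasonal_profiles(df['consumption_mwh'])` would give the per-hour means and the weekend reduction. The figure's exact numbers may have been computed differently.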

### 3.3 Temperature vs Consumption

In [None]:
print("📊 Non-linear U-shaped relationship:")
print("   - Below 15°C: Heating demand increases")
print("   - 15-22°C: Comfort zone (minimal HVAC)")
print("   - Above 22°C: Cooling demand increases")

display(Image('images/03_temperature_consumption.png'))
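
Tree models can learn a U-shaped response from raw temperature, but linear and classical models need it spelled out. One common approach is a pair of hinge ("degree-hour") features built from the comfort-zone bounds above. The 15°C and 22°C bases come from the text; the function and column names are hypothetical:

```python
import pandas as pd

HEATING_BASE_C = 15.0  # lower edge of the comfort zone quoted above
COOLING_BASE_C = 22.0  # upper edge of the comfort zone quoted above


def degree_hour_features(temperature_c: pd.Series) -> pd.DataFrame:
    """Split temperature into heating and cooling degree-hours (zero inside the comfort zone)."""
    return pd.DataFrame({
        "heating_degree_hours": (HEATING_BASE_C - temperature_c).clip(lower=0),
        "cooling_degree_hours": (temperature_c - COOLING_BASE_C).clip(lower=0),
    })


temps = pd.Series([5.0, 15.0, 18.0, 22.0, 30.0])
feats = degree_hour_features(temps)
print(feats)
```

Each feature is zero inside the comfort zone and grows linearly outside it. This lets a linear model fit separate heating and cooling slopes.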

### 3.4 Extreme Weather Impact

In [None]:
print("⚠️ Extreme Weather Effects:")
print("   - Heat waves: +35% consumption increase")
print("   - Cold snaps: +40% consumption increase")
print("   - Storms: -5% (industrial shutdown)")

display(Image('images/04_extreme_weather_impact.png'))
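
One straightforward way to compute percentages like these is the relative difference in mean consumption between flagged and unflagged hours. The following is a minimal sketch. It assumes the `is_heat_wave`, `is_cold_snap` and `is_storm` flag columns used earlier. `event_uplift` is hypothetical, and the toy frame is illustrative only:

```python
import pandas as pd


def event_uplift(df: pd.DataFrame, flag_col: str, target_col: str = "consumption_mwh") -> float:
    """Relative change in mean consumption during flagged hours vs. all other hours."""
    flagged = df[flag_col].astype(bool)
    return df.loc[flagged, target_col].mean() / df.loc[~flagged, target_col].mean() - 1


toy = pd.DataFrame({
    "consumption_mwh": [100.0, 100.0, 135.0, 135.0],
    "is_heat_wave": [0, 0, 1, 1],
})
print(f"{event_uplift(toy, 'is_heat_wave'):+.0%}")  # +35%
```

This naive comparison mixes the event effect with seasonality: heat waves fall in summer, when baseline demand is already higher. Comparing against unflagged hours from the same month and hour of day gives a cleaner estimate.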

## 4. Feature Engineering

We created **138 features** across 10 categories; the cell below prints how many fall in each category.


In [None]:
# Load engineered features
df_features = pd.read_csv('data/processed/energy_features_model.csv')
print(f"Engineered dataset: {df_features.shape}")

import json
with open('data/processed/feature_list.json', 'r') as f:
    feature_cats = json.load(f)

print("\n📋 FEATURE CATEGORIES:")
for cat, features in feature_cats.items():
    print(f"   {cat}: {len(features)} features")
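
The full 138-feature recipe lives in the processed files, but the core time-series categories can be sketched in a few lines. This sketch assumes an hourly `DatetimeIndex` and a `consumption_mwh` column. The function and feature names are hypothetical and may not match those in `feature_list.json`:

```python
import numpy as np
import pandas as pd


def add_time_series_features(df: pd.DataFrame, target: str = "consumption_mwh") -> pd.DataFrame:
    """Add lag, rolling-window and cyclical calendar features to an hourly frame."""
    out = df.copy()
    y = out[target]
    out["lag_24h"] = y.shift(24)    # same hour yesterday
    out["lag_168h"] = y.shift(168)  # same hour last week
    # shift(1) first so each rolling window ends at the previous hour (no target leakage)
    out["roll_mean_24h"] = y.shift(1).rolling(24).mean()
    out["roll_std_24h"] = y.shift(1).rolling(24).std()
    # Sine/cosine encoding places hour 23 next to hour 0
    hour = out.index.hour.to_numpy()
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)
    return out


idx = pd.date_range("2022-01-03", periods=200, freq="h")  # starts on a Monday
demo = add_time_series_features(pd.DataFrame({"consumption_mwh": np.arange(200.0)}, index=idx))
print(demo[["lag_24h", "roll_mean_24h"]].iloc[24].tolist())
```

A lag shorter than the forecast horizon (24 h here, against a 168-hour horizon) is only known at prediction time under a recursive or horizon-specific forecasting setup.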

## 5. Model Training & Evaluation

### 5.1 Classical Time Series Models (Daily Data)

In [None]:
classical_metrics = pd.read_csv('models/classical_model_metrics.csv')
print("📊 Classical Models (30-day forecast):")
print(classical_metrics.to_string(index=False))

print("\n⚠️ High MAPE indicates classical models struggle with:")
print("   - Multiple overlapping seasonalities")
print("   - Complex weather interactions")
print("   - Extreme event patterns")
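
All comparisons in this notebook use MAPE, the mean of |actual - forecast| / |actual| expressed in percent. Below is a minimal definition. It comes with a seasonal-naive baseline that repeats the last observed week, a common floor that any model should beat. Both functions are illustrative, not the project's evaluation code:

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent (assumes no zero actuals)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


def seasonal_naive(history, horizon: int = 168, season: int = 168) -> np.ndarray:
    """Forecast by repeating the last `season` observations until `horizon` steps are filled."""
    last_season = np.asarray(history, dtype=float)[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]


print(mape([100, 200, 400], [98, 204, 400]))  # about 1.33
```

For the daily classical models, the analogous baseline uses daily totals with `season=7`.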

### 5.2 Prophet Models

In [None]:
prophet_metrics = pd.read_csv('models/prophet_metrics.csv')
print("📊 Prophet Models (7-day hourly forecast):")
print(prophet_metrics.to_string(index=False))

print("\n✅ Prophet handles multiple seasonalities automatically!")
display(Image('images/13_prophet_components.png'))

### 5.3 Machine Learning Models

In [None]:
ml_metrics = pd.read_csv('models/ml_model_metrics.csv')
print("📊 ML Models (7-day hourly forecast):")
print(ml_metrics.to_string(index=False))

print("\n🏆 LightGBM achieves best single-model performance!")
display(Image('images/17_ml_forecast_importance.png'))
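
The ML models are scored on a 7-day hourly horizon, so the test window must be the most recent 168 hours rather than a random sample; otherwise future information leaks into training. The following is a minimal sketch of a chronological holdout. It also includes a rolling-origin variant that checks whether the score holds across several weeks. Both helpers are hypothetical:

```python
import pandas as pd

FORECAST_HORIZON_H = 168  # 7 days x 24 hours


def chronological_split(df: pd.DataFrame, horizon: int = FORECAST_HORIZON_H):
    """Return (train, test) with the final `horizon` rows held out."""
    df = df.sort_index()
    return df.iloc[:-horizon], df.iloc[-horizon:]


def rolling_origin_splits(n_rows: int, horizon: int = FORECAST_HORIZON_H, n_folds: int = 3):
    """Yield (train_idx, test_idx) ranges for consecutive test windows ending at the last row."""
    for k in range(n_folds, 0, -1):
        test_end = n_rows - (k - 1) * horizon
        test_start = test_end - horizon
        yield range(0, test_start), range(test_start, test_end)


idx = pd.date_range("2023-01-01", periods=500, freq="h")
train, test = chronological_split(pd.DataFrame({"y": range(500)}, index=idx))
print(len(train), len(test))  # 332 168
```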

### 5.4 Deep Learning (LSTM)

In [None]:
lstm_metrics = pd.read_csv('models/lstm_metrics.csv')
print("📊 LSTM Models (7-day hourly forecast):")
print(lstm_metrics.to_string(index=False))

display(Image('images/19_lstm_forecast.png'))
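
An LSTM consumes fixed-length windows rather than a flat table, so the hourly series must first be reshaped into `(samples, timesteps, features)` arrays. The following is a minimal sketch using NumPy's `sliding_window_view` (NumPy 1.20+). The 168-hour lookback is an assumption, not a documented setting of the models above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, lookback: int = 168, horizon: int = 1):
    """Return X with shape (samples, lookback, 1) and y with shape (samples, horizon)."""
    x = np.asarray(series, dtype=np.float32)
    if len(x) < lookback + horizon:
        raise ValueError("series is shorter than lookback + horizon")
    windows = sliding_window_view(x, lookback + horizon)  # one row per start position
    X = windows[:, :lookback, np.newaxis]
    y = windows[:, lookback:]
    return X, y


X, y = make_windows(np.arange(10), lookback=3, horizon=2)
print(X.shape, y.shape)  # (6, 3, 1) (6, 2)
```

Inputs are usually scaled before training. The scaler should be fitted on the training split only, so test-set statistics do not leak in.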

## 6. Final Model Comparison

### 6.1 All Models Ranked

In [None]:
final_metrics = pd.read_csv('models/final_model_comparison.csv')
print("🏆 FINAL MODEL RANKINGS:")
print(final_metrics.to_string(index=False))

display(Image('images/22_model_comparison_final.png'))

### 6.2 Forecast Visualization

In [None]:
display(Image('images/23_forecast_comparison.png'))

### 6.3 Error Analysis

In [None]:
display(Image('images/24_error_analysis.png'))

## 7. Anomaly Detection

Multiple methods are combined for more robust detection:
- Statistical (Z-score, IQR)
- Machine Learning (Isolation Forest)
- Prediction Residual Analysis
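
The two statistical rules are simple to express. The residual method applies the same z-score rule to `actual - predicted` instead of raw consumption. The sketch below uses conventional thresholds (|z| > 3 and 1.5 × IQR); these are assumptions, since the notebook does not state its own, and Isolation Forest is omitted. Recall is the share of injected anomaly hours that get flagged, e.g. 23/45 ≈ 51.1%:

```python
import numpy as np
import pandas as pd


def zscore_flags(x: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag points more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return z.abs() > threshold


def iqr_flags(x: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = x.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def recall(flags: pd.Series, truth: pd.Series) -> float:
    """Fraction of true anomalies that were flagged."""
    truth = truth.astype(bool)
    return float((flags & truth).sum() / truth.sum())


# Synthetic demo: Gaussian noise with three injected +50 spikes
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(100, 5, 500))
truth = pd.Series(False, index=values.index)
truth[[50, 200, 400]] = True
values[truth] += 50
combined = zscore_flags(values) | iqr_flags(values)
print(recall(combined, truth))  # 1.0
```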


In [None]:
print("📊 ANOMALY DETECTION RESULTS:")
print("   Injected anomalies: 45 hours")
print("   Methods: Z-score, IQR, Isolation Forest, Residual")
print("   Recall: 51.1% (23/45 detected)")
print("\n   Key Insight: Real-world anomaly detection is challenging")
print("   due to high natural variability in energy data.")

display(Image('images/20_anomaly_detection.png'))

## 8. Conclusions & Key Findings

### 🎯 Model Performance Summary

| Approach | Best Model | MAPE |
|----------|------------|------|
| Classical | SARIMA | ~89% |
| Prophet | Basic | 9.35% |
| ML | LightGBM | 2.19% |
| Deep Learning | Simple LSTM | 2.83% |
| **Ensemble** | **Top-2 (LGB+XGB)** | **2.18%** |

### 📈 Key Insights

1. **Tree-based models excel** at capturing complex feature interactions
2. **Lag features are critical**: same-hour-yesterday consumption is the #1 predictor
3. **Prophet captures seasonality well** but underperforms on point accuracy
4. **LSTMs are competitive** but computationally expensive
5. **Simple ensembles work best**: a plain top-2 average beats a weighted ensemble
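
A top-2 average needs no fitted weights: rank the candidates on a validation window, keep the best two, and average their forecasts. The following is a minimal sketch with hypothetical model names and toy numbers, not the project's predictions. In practice the members would be picked on validation data, and the average would then be scored on the held-out test week:

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs((y_true - np.asarray(y_pred, dtype=float)) / y_true)) * 100)


def select_top_k(val_preds: dict, y_val, k: int = 2) -> list:
    """Names of the k models with the lowest validation MAPE."""
    scores = {name: mape(y_val, p) for name, p in val_preds.items()}
    return sorted(scores, key=scores.get)[:k]


def average_ensemble(preds: dict, members: list) -> np.ndarray:
    """Unweighted mean of the members' forecasts."""
    return np.mean([np.asarray(preds[m], dtype=float) for m in members], axis=0)


y_val = np.array([100.0, 100.0, 100.0, 100.0])
val_preds = {
    "lightgbm": [102.0, 98.0, 101.0, 99.0],  # MAPE 1.5%
    "xgboost": [97.0, 103.0, 99.0, 101.0],   # MAPE 2.0%
    "lstm": [105.0, 95.0, 104.0, 96.0],      # MAPE 4.5%
}
members = select_top_k(val_preds, y_val)
blend = average_ensemble(val_preds, members)
print(members, round(mape(y_val, blend), 2))  # ['lightgbm', 'xgboost'] 0.25
```

The blend helps most when the members' errors are weakly correlated. This contrived example exaggerates the effect, since the two members' errors have opposite signs.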

### 🔮 Production Recommendations

1. Deploy LightGBM + XGBoost ensemble for real-time forecasting
2. Use Prophet for long-term trend analysis and capacity planning
3. Implement automated retraining with drift detection
4. Monitor forecast accuracy by time-of-day (error varies by hour)
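
Monitoring error by time of day amounts to grouping absolute percentage errors by the hour of the forecast timestamp. The following is a minimal sketch. It assumes actuals and forecasts aligned on a shared `DatetimeIndex`. The helper name and the synthetic data, which include a deliberate miss at 6pm, are illustrative:

```python
import pandas as pd


def mape_by_hour(y_true: pd.Series, y_pred: pd.Series) -> pd.Series:
    """MAPE (%) for each hour of day; both series must share a DatetimeIndex."""
    ape = (y_true - y_pred).abs() / y_true.abs() * 100
    return ape.groupby(ape.index.hour).mean()


idx = pd.date_range("2023-06-01", periods=48, freq="h")
actual = pd.Series(100.0, index=idx)
forecast = actual.where(idx.hour != 18, 110.0)  # 10% over-forecast at 18:00 only
per_hour = mape_by_hour(actual, forecast)
print(per_hour.idxmax(), round(per_hour.max(), 2))  # 18 10.0
```

In production, the same grouping over a rolling window (for example, the last 30 days) would show whether a particular hour's error is drifting.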

### 💼 Business Value

- **2.18% MAPE** enables precise grid load balancing
- **Extreme weather modeling** improves emergency preparedness
- **Anomaly detection** identifies equipment issues early


---

## 👤 Author

**Alexy Louis**
- 📧 alexy.louis.scholar@gmail.com
- 💼 [LinkedIn](https://www.linkedin.com/in/alexy-louis-19a5a9262/)
- 🐙 [GitHub](https://github.com/Smooth-Cactus0)

---

*Project completed as part of a Data Analysis Portfolio*
