# **ELECTRICITY LOAD FORECAST**

## Business Understanding
- **Objective**: Forecast next-day household electricity consumption to enable proactive demand management and cost optimization.
- **Stakeholders**:  
  - **Operations Team**: Schedule generation and load balancing  
  - **Finance Team**: Budgeting for peak‐power purchases  
  - **Grid Automation Team**: Automated demand response triggers
- **Problem Statement**:  
  - How can we predict tomorrow’s aggregate power draw with sufficient accuracy to adjust procurement and grid dispatch ahead of time?
- **Key Questions**:  
  1. What is the expected total consumption for the next day?  
  2. How much deviation can we tolerate before incurring penalty costs with our suppliers?

## Metric of Success
- **Primary Success Metric**:  
  - **Mean Absolute Percentage Error (MAPE)** on next-day forecasts  
    - Target: MAPE ≤ 10%
- **Secondary Metrics**:  
  - **Root Mean Square Error (RMSE)**  
    - Target: RMSE ≤ 0.15 kW for daily-averaged data  
  - **Cost Savings**  
    - Reduction in penalty fees or spot‐market purchases (e.g., ≥ 5% per month)  
  - **Model Robustness**  
    - Consistent performance (MAPE and RMSE) across seasonal peaks and troughs  
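
Both error metrics listed above can be computed in a few lines of NumPy. The sketch below is illustrative only: the function names `mape` and `rmse` and the sample values are assumptions, not part of the original notebook. It also assumes no actual value is zero, since MAPE is undefined in that case.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes no zero actuals)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


def rmse(y_true, y_pred):
    """Root Mean Square Error, in the same unit as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up daily-average values in kW, for illustration only
actual = [1.0, 2.0, 4.0]
forecast = [1.1, 1.8, 4.4]

print(f"MAPE: {mape(actual, forecast):.2f}%")
print(f"RMSE: {rmse(actual, forecast):.3f} kW")
```

Every forecast in this toy example is off by 10% of the actual value, so the MAPE sits right at the 10% target, while the RMSE is dominated by the largest absolute miss.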

## **Data Understanding**

### **Dataset Overview**

- **Source**: UCI Individual Household Electric Power Consumption
- **Time Period**: Almost 4 years of measurements (December 2006 to November 2010)
- **Frequency**: 1-minute intervals
- **Features**: 9 variables including power measurements and sub-metering data

### **Feature Description**

- `Date`: Measurement date (dd/mm/yyyy)
- `Time`: Measurement time (hh:mm:ss)
- `Global_active_power`: Total active power consumed (kW)
- `Global_reactive_power`: Total reactive power consumed (kW)
- `Voltage`: Average voltage (V)
- `Global_intensity`: Average current intensity (A)
- `Sub_metering_1`: Kitchen energy consumption (Wh)
- `Sub_metering_2`: Laundry room energy consumption (Wh)
- `Sub_metering_3`: Electric water heater and air conditioner energy consumption (Wh)
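
The power columns are minute-averaged kW, while the sub-metering columns are Wh per minute, so they need a unit conversion before they can be compared. The UCI dataset description gives `Global_active_power * 1000 / 60 - Sub_metering_1 - Sub_metering_2 - Sub_metering_3` as the active energy (Wh) consumed each minute by equipment outside the three sub-meters. The sketch below applies that formula to made-up readings; the column name `Unmetered_Wh` is hypothetical.

```python
import pandas as pd

# Two made-up one-minute readings, for illustration only
sample = pd.DataFrame({
    "Global_active_power": [3.0, 1.2],  # kW, minute-averaged
    "Sub_metering_1": [0.0, 1.0],       # Wh
    "Sub_metering_2": [1.0, 2.0],       # Wh
    "Sub_metering_3": [17.0, 0.0],      # Wh
})

# kW averaged over one minute -> Wh consumed in that minute
total_wh = sample["Global_active_power"] * 1000 / 60

# Hypothetical column: energy not captured by any of the three sub-meters
sub_meters = ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]
sample["Unmetered_Wh"] = total_wh - sample[sub_meters].sum(axis=1)

print(sample)
```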

In [None]:
# Imports

# Standard library imports
import warnings
from pathlib import Path

# Third-party imports - Utilities
import joblib

# Third-party imports - Data manipulation
import pandas as pd
import numpy as np

# Third-party imports - Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")

## **DATA PREPARATION**

In [None]:
# Read the dataset into a DataFrame; the file marks missing values with "?"

df = pd.read_csv("household_power_consumption.txt", sep=";", low_memory=False).replace("?", np.nan)
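
As an alternative to calling `.replace("?", np.nan)` after loading, `read_csv` can treat `"?"` as missing at parse time through its `na_values` parameter, which also lets pandas infer numeric dtypes directly. The sketch below uses a small in-memory stand-in for the file with made-up values; it illustrates the option and is not a change to the notebook's approach.

```python
import io

import pandas as pd

# Made-up rows in the same semicolon-separated layout, for illustration only
raw = io.StringIO(
    "Date;Time;Global_active_power;Voltage\n"
    "01/01/2007;00:00:00;1.500;240.000\n"
    "01/01/2007;00:01:00;?;?\n"
)

# na_values="?" turns the placeholder into NaN while parsing
demo = pd.read_csv(raw, sep=";", na_values="?")

print(demo.dtypes)
print(demo.isna().sum())
```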

In [None]:
# Preview first five rows of the DataFrame

df.head()

In [None]:
# Number of rows and columns (shape of the dataset)

df.shape

In [None]:
# Column dtypes and non-null counts

df.info()

In [None]:
# Checking out columns in my DataFrame

df.columns

In [None]:
# Checking for Missing Values

df.isna().sum()

In [None]:
# Display missing data percentage per column

missing_pct = df.isna().mean() * 100
print(f"Missing values (%) per column:\n{missing_pct}")

In [None]:
# Preview the first five rows that contain missing data

missing_rows = df[df.isna().any(axis=1)]
missing_rows.head()

### **Data Cleaning**

In [None]:
def wrangle(df):
    """
    Clean the raw household power data.

    Builds a DateTime index from the Date and Time columns, converts the
    measurement columns to numeric, and drops rows with missing values.
    """
    # Create a copy to avoid modifying the original
    df_clean = df.copy()

    # Create datetime column and set it as index to help with plotting
    df_clean["DateTime"] = pd.to_datetime(
        df_clean["Date"] + " " + df_clean["Time"], format="%d/%m/%Y %H:%M:%S"
    )
    df_clean.set_index("DateTime", inplace=True)

    # Drop original Date and Time columns
    df_clean.drop(columns=["Date", "Time"], inplace=True)

    # Convert measurement columns from string to numeric
    numeric_columns = ["Global_active_power", "Global_reactive_power",
                       "Voltage", "Global_intensity",
                       "Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]
    for col in numeric_columns:
        df_clean[col] = pd.to_numeric(df_clean[col], errors="coerce")

    # Handle missing values by dropping incomplete rows
    df_clean = df_clean.dropna()

    return df_clean

In [None]:
# Wrangle Data

df = wrangle(df)

In [None]:
# Check dtypes and non-null counts of the cleaned dataset

df.info()

In [None]:
df.head()
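
A few programmatic checks can back up the visual inspection of `df.info()` and `df.head()`. The sketch below is one possible set of checks; `check_clean` is a hypothetical helper that is not part of the original notebook, and it is demonstrated on a tiny stand-in frame with made-up values.

```python
import pandas as pd


def check_clean(frame):
    """Return simple pass/fail checks for a cleaned frame (hypothetical helper)."""
    return {
        "datetime_index": isinstance(frame.index, pd.DatetimeIndex),
        "sorted_index": bool(frame.index.is_monotonic_increasing),
        "no_missing": not frame.isna().any().any(),
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in frame.dtypes),
    }


# Tiny stand-in for the cleaned DataFrame, for illustration only
idx = pd.date_range("2007-01-01 00:00:00", periods=3, freq="min")
demo = pd.DataFrame({"Global_active_power": [1.5, 1.6, 1.4]}, index=idx)

print(check_clean(demo))
```

On the notebook's data, the same helper would be called as `check_clean(df)` after `wrangle` has run.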

## **EXPLORATORY DATA ANALYSIS**

In [None]:
# Basic Statistics

df.describe()

#### **Global Active Power Over Time**

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(df.index, df["Global_active_power"], linewidth=2.5, color='green')
plt.title('Global Active Power Over Time', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Power (kW)', fontsize=12)
plt.tight_layout()
plt.show();

#### **Average Power Consumption by Month**

In [None]:
# Resample to monthly averages ('M' is the month-end alias; pandas 2.2+ prefers 'ME')
monthly = df['Global_active_power'].resample('M').mean()

plt.figure(figsize=(12, 6))
plt.plot(monthly.index, monthly.values, linewidth=2.5, color='green')
plt.title('Average Power Consumption by Month', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Monthly Avg Power (kW)', fontsize=12)
plt.tight_layout()
plt.show();

#### **Average Power Consumption by Week**

In [None]:
# Resample to weekly averages
weekly = df['Global_active_power'].resample('W').mean()

plt.figure(figsize=(12, 6))
plt.plot(weekly.index, weekly.values, linewidth=2.5, color='green')
plt.title('Average Power Consumption by Week', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Weekly Avg Power (kW)', fontsize=12)
plt.tight_layout()
plt.show();

#### **Average Power Consumption by Day**

In [None]:
# Resample to daily averages
daily = df['Global_active_power'].resample('D').mean()

plt.figure(figsize=(12, 6))
plt.plot(daily.index, daily.values, linewidth=2.5, color='green')
plt.title('Average Power Consumption by Day', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Daily Avg Power (kW)', fontsize=12)
plt.tight_layout()
plt.show();
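
Because `wrangle` drops rows with missing values, some days contribute fewer than the 1,440 expected one-minute readings to their daily average, and days with no readings at all come out as NaN from `resample('D').mean()`. Counting readings per day makes those partial days visible. The sketch below uses a synthetic series; the names `power` and `coverage` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute series covering three days, for illustration only
idx = pd.date_range("2007-01-01", periods=3 * 1440, freq="min")
power = pd.Series(np.ones(len(idx)), index=idx, name="Global_active_power")

# Simulate dropped rows: remove all of day two and the second half of day three
keep = np.ones(len(idx), dtype=bool)
keep[1440:2880] = False
keep[2880 + 720:] = False
power = power[keep]

daily_mean = power.resample("D").mean()
daily_count = power.resample("D").count()

# Fraction of the 1,440 expected one-minute readings present each day
coverage = daily_count / 1440

print(pd.DataFrame({"mean_kW": daily_mean, "coverage": coverage}))
```

Days with low coverage can then be reviewed or excluded before the daily series is used for modelling.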