# **Electricity Demand Forecasting**

## **1. Project Overview**

This project focuses on forecasting electricity demand (total load) and predicting electricity prices using historical energy generation and weather data. Accurate forecasting is crucial for efficient grid management, minimizing energy wastage, and optimizing power generation to meet demand at the lowest cost. The two datasets used provide information on energy production from various sources and detailed weather conditions, making it possible to explore the impact of external factors such as weather on energy demand and prices.

## **2. Problem Statement**

The aim is to build models that predict two targets:

- **Total electricity load (demand)**: Accurate load forecasts help energy producers adjust generation in real time.

- **Electricity price**: Accurate price predictions help utilities and businesses with cost planning and optimization.

The models developed should help improve planning and operational decisions in the energy sector.
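To tell whether a trained model actually improves on simple planning rules, it helps to compare it with a trivial forecast. A common reference for hourly load is a seasonal-naive forecast, which predicts each hour from the same hour one day earlier. The sketch below is illustrative only: the helper `seasonal_naive_forecast`, the 24-hour season, and the synthetic load series are assumptions, not part of this project's pipeline.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(series: pd.Series, season: int = 24) -> pd.Series:
    """Hypothetical helper: forecast each value with the value `season` steps earlier."""
    return series.shift(season)

# Synthetic hourly load for two weeks: a daily cycle plus random noise (illustrative only)
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
load = pd.Series(25000 + 5000 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 300, hours.size))

baseline = seasonal_naive_forecast(load)
valid = baseline.notna()  # the first 24 hours have no value from a day earlier
rmse = np.sqrt(((load[valid] - baseline[valid]) ** 2).mean())
print(f"Seasonal-naive RMSE: {rmse:.1f}")
```

A model whose error on the same test hours is not clearly below this baseline is adding little beyond the daily pattern.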

## **3. Dataset Overview**

We use two datasets:

- **Energy Dataset**: Electricity generation by source (e.g., solar, wind, coal), forecasts published in advance (e.g., day-ahead forecasts), actual electricity load, and energy prices.

- **Weather Dataset**: Weather readings such as temperature, pressure, humidity, wind speed, and conditions (e.g., clear, rainy), recorded separately for several cities.
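Because the weather table holds a separate set of readings for each city, joining it to the hourly energy records on the timestamp alone repeats each energy row once per city. One option, sketched below under the assumption that a simple average across cities is an acceptable summary, is to collapse the weather table to one row per timestamp before merging. The toy frames are made up; `time`, `dt_iso`, `total load actual`, `temp`, and `humidity` follow the column names used in this project, while `city_name` is an assumed column name.

```python
import pandas as pd

# Toy energy data: one row per hour (values are made up)
energy = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 01:00"], utc=True),
    "total load actual": [25000.0, 24000.0],
})

# Toy weather data: one row per city per hour ('city_name' is an assumed column name)
weather = pd.DataFrame({
    "dt_iso": pd.to_datetime(["2015-01-01 00:00"] * 2 + ["2015-01-01 01:00"] * 2, utc=True),
    "city_name": ["CityA", "CityB", "CityA", "CityB"],
    "temp": [270.0, 280.0, 271.0, 281.0],
    "humidity": [80, 60, 82, 62],
})

# Naive merge: every energy row is repeated once per city
naive = pd.merge(energy, weather, left_on="time", right_on="dt_iso", how="inner")
print(len(naive))  # 4 rows for 2 hours

# Average the numeric weather readings across cities so each timestamp appears once
weather_hourly = weather.groupby("dt_iso", as_index=False)[["temp", "humidity"]].mean()
merged = pd.merge(energy, weather_hourly, left_on="time", right_on="dt_iso", how="inner")
print(len(merged))  # 2 rows, one per hour
print(merged[["time", "temp", "humidity"]])
```

Averaging discards regional differences; keeping one column per city (for example by pivoting on the city column) is an alternative when those differences matter.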

## **4. Imports**

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')
```

## **5. Pipeline**

```python
def load_data(energy_path, weather_path):
    """
    Load the energy and weather datasets.

    Parameters:
    energy_path (str): Path to the energy dataset.
    weather_path (str): Path to the weather dataset.

    Returns:
    energy (DataFrame): Energy dataset.
    weather (DataFrame): Weather dataset.
    """
    energy = pd.read_csv(energy_path)
    weather = pd.read_csv(weather_path)
    return energy, weather

def clean_data(energy, weather):
    """
    Clean the datasets by dropping empty columns and forward-filling missing
    values, then merge them on their timestamps.

    Parameters:
    energy (DataFrame): Energy dataset.
    weather (DataFrame): Weather dataset.

    Returns:
    data (DataFrame): Cleaned and merged dataset.
    """
    # Drop columns that contain no data (names are spelled exactly as in the CSV)
    energy = energy.drop(columns=['forecast wind offshore eday ahead', 'generation hydro pumped storage aggregated'])

    # Fill each missing value with the last valid value above it in the same column
    energy = energy.ffill()
    weather = weather.ffill()

    # Parse timestamps as timezone-aware UTC datetimes so the two tables align
    energy['time'] = pd.to_datetime(energy['time'], utc=True)
    weather['dt_iso'] = pd.to_datetime(weather['dt_iso'], utc=True)

    # Inner join on the timestamp. The weather table has rows for several cities,
    # so each energy record is repeated once per matching weather row.
    data = pd.merge(energy, weather, left_on='time', right_on='dt_iso', how='inner')
    return data

def feature_engineering(data):
    """
    Create time-based features and select the columns used for modeling.

    Parameters:
    data (DataFrame): Cleaned dataset.

    Returns:
    X (DataFrame): Feature matrix.
    y_price (Series): Target variable for price.
    y_load (Series): Target variable for load.
    """
    # Create time-based features (from UTC timestamps; weekday: Monday=0, Sunday=6)
    data['hour'] = data['time'].dt.hour
    data['day'] = data['time'].dt.day
    data['month'] = data['time'].dt.month
    data['weekday'] = data['time'].dt.weekday

    # Select features and targets ('day' and 'month' are created above but not used).
    # 'total load actual' is also the load target, so it must be excluded from
    # the features before training a load model.
    features = ['generation biomass', 'generation fossil gas', 'generation solar', 'generation wind onshore',
                'total load actual', 'temp', 'pressure', 'humidity', 'wind_speed', 'hour', 'weekday']
    X = data[features]
    y_price = data['price actual']
    y_load = data['total load actual']
    return X, y_price, y_load

def train_models(X_train, y_train):
    """
    Train Random Forest and Gradient Boosting models.

    Parameters:
    X_train (DataFrame): Training feature matrix.
    y_train (Series): Training target variable.

    Returns:
    rf (RandomForestRegressor): Trained Random Forest model.
    gb (GradientBoostingRegressor): Trained Gradient Boosting model.
    """
    rf = RandomForestRegressor(n_estimators=100, random_state=42)
    gb = GradientBoostingRegressor(random_state=42)
    rf.fit(X_train, y_train)
    gb.fit(X_train, y_train)
    return rf, gb

def evaluate_model(model, X_test, y_test):
    """
    Evaluate the model using Mean Squared Error and R-squared metrics.

    Parameters:
    model (Regressor): Trained model.
    X_test (DataFrame): Test feature matrix.
    y_test (Series): Test target variable.

    Returns:
    mse (float): Mean Squared Error.
    r2 (float): R-squared score.
    """
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    return mse, r2
```
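In `feature_engineering`, `'total load actual'` is both a column of `X` and the load target `y_load`. The sketch below, on synthetic data with hypothetical column names (`signal`, `noise_feature`, `target_copy`) and a hypothetical helper `holdout_r2`, illustrates why a feature that duplicates the target is a problem: the model scores almost perfectly on a random hold-out while learning nothing it could use to forecast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends partly on 'signal' and partly on unexplained noise
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({"signal": rng.normal(size=n), "noise_feature": rng.normal(size=n)})
y = X["signal"] + rng.normal(size=n)

def holdout_r2(features: pd.DataFrame) -> float:
    """Fit a random forest on a random 80/20 split and return the test R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.2, random_state=42)
    model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

# Leaky version: an exact copy of the target is included as an input column
r2_leaky = holdout_r2(X.assign(target_copy=y))
r2_honest = holdout_r2(X)
print(f"R2 with target as a feature:    {r2_leaky:.3f}")
print(f"R2 without target as a feature: {r2_honest:.3f}")
```

The leaky score reflects only that the answer was supplied as an input; the honest score is the one that indicates forecasting skill.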

## **6. Execution**

```python
def main():
    # Load data
    energy_path = './datasets/energy_dataset.csv'
    weather_path = './datasets/weather_features.csv'
    energy, weather = load_data(energy_path, weather_path)

    # Clean and preprocess data
    data = clean_data(energy, weather)
    X, y_price, y_load = feature_engineering(data)

    # Remove the load target from the load model's inputs to avoid target leakage
    X_load = X.drop(columns=['total load actual'])

    # Split data (random 80/20 split with a fixed seed)
    X_train_price, X_test_price, y_train_price, y_test_price = train_test_split(
        X, y_price, test_size=0.2, random_state=42)
    X_train_load, X_test_load, y_train_load, y_test_load = train_test_split(
        X_load, y_load, test_size=0.2, random_state=42)

    # Train models
    rf_price, gb_price = train_models(X_train_price, y_train_price)
    rf_load, gb_load = train_models(X_train_load, y_train_load)

    # Evaluate models
    mse_rf_price, r2_rf_price = evaluate_model(rf_price, X_test_price, y_test_price)
    mse_gb_price, r2_gb_price = evaluate_model(gb_price, X_test_price, y_test_price)

    mse_rf_load, r2_rf_load = evaluate_model(rf_load, X_test_load, y_test_load)
    mse_gb_load, r2_gb_load = evaluate_model(gb_load, X_test_load, y_test_load)

    # Print results
    print(f"Random Forest Price - MSE: {mse_rf_price}, R2: {r2_rf_price}")
    print(f"Gradient Boosting Price - MSE: {mse_gb_price}, R2: {r2_gb_price}")
    print(f"Random Forest Load - MSE: {mse_rf_load}, R2: {r2_rf_load}")
    print(f"Gradient Boosting Load - MSE: {mse_gb_load}, R2: {r2_gb_load}")

if __name__ == '__main__':
    main()
```
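`main` splits the rows with `train_test_split`, which shuffles them, so neighboring hours end up in both the training and the test sets. For a forecasting task, a time-ordered hold-out (train on the past, test on the most recent period) is usually a more realistic check. The sketch below assumes the rows can be sorted by a `time` column; the helper `chronological_split` and the synthetic data are hypothetical. It also reports RMSE, which is in the target's own units and is therefore easier to interpret than MSE.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

def chronological_split(df: pd.DataFrame, time_col: str, test_fraction: float = 0.2):
    """Hypothetical helper: earliest rows go to train, the latest rows go to test."""
    ordered = df.sort_values(time_col)
    cutoff = int(len(ordered) * (1 - test_fraction))
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

# Synthetic hourly data with a daily load cycle (illustrative only)
rng = np.random.default_rng(0)
timestamps = pd.date_range("2015-01-01", periods=24 * 60, freq="h", tz="UTC")
df = pd.DataFrame({"time": timestamps, "hour": timestamps.hour})
df["load"] = 25000 + 5000 * np.sin(2 * np.pi * df["hour"] / 24) + rng.normal(0, 300, len(df))

train, test = chronological_split(df, "time")
assert train["time"].max() < test["time"].min()  # no test hour precedes a training hour

model = GradientBoostingRegressor(random_state=42).fit(train[["hour"]], train["load"])
pred = model.predict(test[["hour"]])
rmse = np.sqrt(mean_squared_error(test["load"], pred))  # same units as the target
print(f"Test period starts {test['time'].min()}, RMSE: {rmse:.1f}")
```

For cross-validation rather than a single hold-out, scikit-learn's `TimeSeriesSplit` applies the same past-before-future ordering across several folds.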