# Energy Production 6-Step Ahead Forecasting

This notebook walks through a complete pipeline for forecasting energy production 6 hours ahead. It treats the task as multi-output regression and trains one model per forecast step. The pipeline covers:

- 6-step ahead forecasting: The model predicts the next 6 hours of energy production for any given time, e.g., if run at 8am, it forecasts 9am–2pm.
- Model and features:
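  - Uses XGBoost, a gradient-boosted tree model that works well on tabular data. One regressor is trained for each of the 6 forecast steps.
  - Features include hour, day of week, month, two lagged values (`lag_1`, `lag_2`), and a 3-step rolling mean of the energy production column (`power_x`). The calendar features capture recurring cycles, and the lag and rolling features capture the recent production level.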
- Code and process transparency:
  - All steps are shown: data loading, feature engineering, target creation, train/test split, model training, evaluation, and saving.
  - The code is modular, with clear function definitions and comments explaining the reasoning behind each step.
  - Diagnostic print statements are included to make the process transparent and reproducible.
- Forecasting ability:
  - The function `forecast_next_6_hours` takes one row of features and returns the 6 predicted values, which meets the requirement to forecast a site’s energy potential.

This approach is designed to be clear, reproducible, and easy to adapt for similar forecasting tasks.
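
To make the 6-step framing concrete, here is a minimal sketch on a synthetic hourly series. The frame `toy` and its values are made up for illustration and are not taken from the dataset; only the column names `date`, `power_x`, and `target_t+i` follow the notebook.

```python
import pandas as pd

# Synthetic hourly series; the values are made up and not taken from the dataset.
toy = pd.DataFrame({
    "date": pd.date_range("2024-01-01 00:00", periods=24, freq="60min"),
    "power_x": [float(v) for v in range(24)],
})

# One target column per horizon: target_t+i holds the value observed i hours later.
target_cols = [f"target_t+{i}" for i in range(1, 7)]
for i, col in enumerate(target_cols, start=1):
    toy[col] = toy["power_x"].shift(-i)

# The row stamped 08:00 carries the values observed from 09:00 through 14:00.
row_8am = toy.loc[toy["date"] == pd.Timestamp("2024-01-01 08:00")].iloc[0]
targets_8am = [float(row_8am[col]) for col in target_cols]
print(targets_8am)
```

The last 6 rows of such a frame have no complete future window, which is why rows with missing targets are dropped before training.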

---

# Energy Data Exploration

This section loads the energy dataset and visualizes energy production patterns over time.

In [None]:
import pandas as pd

df = pd.read_csv(r"C:/Users/prese/Downloads/Recruitment Dataset (1).csv", parse_dates=['date'])
df.info()
df.head()

## Visualize Energy Production Patterns

The following plot shows how energy production changes over time.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df['date'], df['power_x'])
plt.title("Energy Production Over Time")
plt.xlabel("Date")
plt.ylabel("Energy (MW)")
plt.show()

In [None]:
def create_features(df):
    """Add calendar, lag, and rolling-mean features; expects rows sorted by `date`."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Calendar features
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['month'] = df['date'].dt.month
    # Previous one and two observations
    df['lag_1'] = df['power_x'].shift(1)
    df['lag_2'] = df['power_x'].shift(2)
    # Mean of the current and two previous observations
    df['rolling_mean_3'] = df['power_x'].rolling(3).mean()
    # The first rows have no lag history, so drop them
    return df.dropna(subset=['power_x', 'lag_1', 'lag_2', 'rolling_mean_3'])
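
The `shift` and `rolling` features count rows, not hours, so they only mean "one hour ago" when the rows are sorted and evenly spaced. The following is a rough sketch of a spacing check, assuming the data is meant to be hourly; `check_regular_spacing` and the `demo` frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def check_regular_spacing(frame, time_col="date", step="1h"):
    """Return the timestamps whose gap from the previous row differs from `step`."""
    ordered = frame.sort_values(time_col)
    gaps = ordered[time_col].diff()
    # The first row has no previous row, so its gap is NaT and is ignored.
    irregular = ordered.loc[gaps.notna() & (gaps != pd.Timedelta(step)), time_col]
    return irregular.tolist()

# Illustrative timestamps with the 02:00 reading missing.
demo = pd.DataFrame({"date": pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00", "2024-01-01 04:00",
])})
print(check_regular_spacing(demo))
```

An empty list means every row follows the previous one by exactly one step. If the list is not empty, the lag columns near those timestamps refer to readings older than their names suggest.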

In [None]:
def create_targets(df, n_steps=6):
    """Add one target column per horizon: target_t+i is `power_x` i rows later."""
    df = df.copy()  # leave the caller's DataFrame untouched
    target_cols = [f'target_t+{i}' for i in range(1, n_steps + 1)]
    for i, col in enumerate(target_cols, start=1):
        df[col] = df['power_x'].shift(-i)
    # The last n_steps rows have no complete future window, so drop them
    return df.dropna(subset=target_cols)

In [None]:
from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
import joblib
import os

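# Build features and targets (rows must be in chronological order for shift/rolling)
df = df.sort_values('date').reset_index(drop=True)
df = create_features(df)
df = create_targets(df, n_steps=6)

# Separate the model inputs from the six target columns
target_cols = [f'target_t+{i}' for i in range(1, 7)]
features = df.drop(columns=['date', 'power_x'] + target_cols)
targets = df[target_cols]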

# Train/test split (shuffle=False keeps time order: the last 20% of rows form the test set)
X_train, X_test, y_train, y_test = train_test_split(features, targets, test_size=0.2, shuffle=False)

# Train XGBoost models for each step
def train_models(X_train, y_train, X_test, y_test):
    models = []
    for i in range(6):
        model = xgb.XGBRegressor()
        model.fit(X_train, y_train.iloc[:, i])
        pred = model.predict(X_test)
        mae = mean_absolute_error(y_test.iloc[:, i], pred)
        print(f"Step {i+1} MAE: {mae:.2f}")
        models.append(model)
    return models

models = train_models(X_train, y_train, X_test, y_test)

# Save models
def save_models(models, path_prefix="models/xgb_model_t+"):
    # Create the output directory only if the prefix includes one
    out_dir = os.path.dirname(path_prefix)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    for i, model in enumerate(models, start=1):
        joblib.dump(model, f"{path_prefix}{i}.pkl")

save_models(models)

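# Forecast function
def forecast_next_6_hours(latest_row, models):
    """Return one prediction per model for a single row of features.

    latest_row is a pandas Series with the same columns, in the same order, as X_train.
    """
    row = latest_row.values.reshape(1, -1)
    return [float(model.predict(row)[0]) for model in models]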

# Example forecast for the first test sample
true_vals = y_test.iloc[0]
pred_vals = forecast_next_6_hours(X_test.iloc[0], models)


In [None]:
import matplotlib.pyplot as plt

# Compare actual and predicted values for the first test sample
plt.plot(range(1, 7), true_vals, label='Actual')
plt.plot(range(1, 7), pred_vals, label='Predicted')
plt.xlabel("Hours Ahead")
plt.ylabel("Energy (MW)")
plt.legend()
plt.title("6-Hour Ahead Forecast")
plt.show()
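
The per-step MAE values printed during training are easier to judge next to a naive reference. The sketch below assumes a persistence baseline, which repeats the last observed value for every horizon. It is not part of the original notebook, and `persistence_mae` is a hypothetical helper.

```python
import numpy as np

def persistence_mae(series, n_steps=6):
    """MAE per horizon of a naive forecast that repeats the last observed value."""
    values = np.asarray(series, dtype=float)
    maes = []
    for h in range(1, n_steps + 1):
        # The forecast for t+h is the value at t, so compare values[:-h] with values[h:].
        maes.append(float(np.mean(np.abs(values[h:] - values[:-h]))))
    return maes

# Synthetic ramp rising by 1 per step: the naive error at horizon h is exactly h.
baseline = persistence_mae(np.arange(20.0))
print(baseline)
```

Applied to the test portion of `power_x`, the resulting six numbers give a floor that the XGBoost models should beat at each step.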
