# 📊 Baseline Modeling – Energy Consumption Forecast

This notebook establishes baseline models for forecasting electricity consumption using the engineered dataset from previous steps.

We will:
- Load the processed feature matrix
- Split the data chronologically into training and testing sets
- Build simple baseline models (e.g., Naive, Mean)
- Evaluate their performance using MAE, RMSE, and R²

These baselines will serve as reference points to assess the performance of more advanced machine learning models in future notebooks.

## 🧰 0. Code: Setup + Load Data

In [None]:
# 📦 Import libraries
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
# 🔄 Load feature dataset
df = pd.read_csv("../data/processed/final_features.csv", parse_dates=["datetime"])

print(df.shape)
df.head()

## 🧪 1. Data Splitting

### 📅 1.1 Train/Test Split

As this is a time series forecasting task, we use a **chronological split** rather than a random one.

We train on data before 2018 and test on data from 2018 onward.

---

#### 📆 Code: Temporal split

In [None]:
# Temporal split
cutoff = "2018-01-01"

train = df[df["datetime"] < cutoff].copy()
test = df[df["datetime"] >= cutoff].copy()

print(f"Train: {train.shape}, Test: {test.shape}")

#### 🕰️ Why Temporal Split Instead of Random Split?

In traditional machine learning tasks, a **random 80/20 split** (using `train_test_split()` from `sklearn`) is commonly used because the data is assumed to be **i.i.d.** (independent and identically distributed).

However, in **time series forecasting**, this assumption does **not hold**.

Using a random split in this case would mix **past and future data** in both the training and test sets, which can result in:

- ❌ Data leakage (seeing future information during training)
- ❌ Over-optimistic performance metrics
- ❌ Unrealistic deployment expectations

Instead, we apply a **chronological split**:
- The **training set** contains all data *before 2018*.
- The **test set** contains all data *from 2018 onwards*.

This better reflects a real-world scenario where we aim to predict **future energy consumption** using **historical data only**.
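
As a rough illustration of the difference, the sketch below builds a small synthetic hourly series (hypothetical values, not the project dataset) and compares a chronological split with a shuffled one. The names `toy`, `toy_train`, `toy_test`, `rand_train`, and `rand_test` are placeholders introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data spanning the cutoff (not the project's dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "datetime": pd.date_range("2017-12-30", periods=96, freq="h"),
        "total_load_actual": rng.normal(28000, 2000, size=96),
    }
)

cutoff = pd.Timestamp("2018-01-01")
toy_train = toy[toy["datetime"] < cutoff]
toy_test = toy[toy["datetime"] >= cutoff]

# Chronological split: every training timestamp precedes every test timestamp
assert toy_train["datetime"].max() < toy_test["datetime"].min()

# Shuffled 80/20 split for comparison: some training rows are later in time
# than some test rows, i.e. the model would train on "future" observations
shuffled = toy.sample(frac=1.0, random_state=0)
rand_train, rand_test = shuffled.iloc[:77], shuffled.iloc[77:]
leaky = rand_train["datetime"].max() > rand_test["datetime"].min()
print(f"Random split mixes past and future: {leaky}")
```

The same ordering check on `train` and `test` is a cheap safeguard whenever the cutoff or the loading step changes.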

## ⚙️ 2. Baseline Models

Before jumping into complex machine learning models, it's important to establish **baseline models**. These help set a reference for evaluating the performance of more advanced approaches.

We implement two simple yet meaningful baselines:

### 🐢 2.1 Naive Model

This model assumes that **the energy load at the current hour is equal to the load one hour ago**:

$$
\hat{y}_t = y_{t-1}
$$

where 

- $\hat{y}_t$ is the predicted energy load at time step $t$, and 
- $y_{t-1}$ is the actual observed load at the previous time step $t-1$

This approach is commonly used as a **baseline for time series forecasting** tasks, especially when the data exhibits **high temporal continuity**, such as electricity consumption, which changes gradually over time.
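
A minimal sketch of the idea on a toy series follows. The load values are made up for illustration, and the sketch assumes that a one-hour lag feature such as `load_lag_1h` was produced by an equivalent `shift(1)` during feature engineering, which this notebook does not show.

```python
import pandas as pd

# Hypothetical hourly loads in MW (illustrative values only)
load = pd.Series(
    [100.0, 104.0, 103.0, 110.0, 108.0],
    index=pd.date_range("2018-01-01", periods=5, freq="h"),
    name="total_load_actual",
)

# Naive forecast: y_hat[t] = y[t-1]; the first hour has no predecessor (NaN)
naive = load.shift(1)

# Drop the first hour, which has no forecast, before scoring
errors = (load - naive).dropna().abs()
print(naive)
print(f"MAE of the naive forecast: {errors.mean():.2f}")
```

Note that the first timestamp has no forecast, so any rows with a missing lag need to be dropped or filled before computing metrics.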


### 📊 2.2 Mean Model

This model predicts the **mean of the training set** for every timestamp in the test set:

$$
\hat{y}_t = \frac{1}{n} \sum_{i=1}^{n} y_i
$$

where 

- $\hat{y}_t$ is the predicted value for time step $t$
- ${y}_{i}$ are the actual observed values in the training set
- and $n$ is the number of training samples

This model acts as a **constant predictor** and sets a **minimum performance threshold**: a more complex model is only worthwhile if it clearly improves on simply predicting the training average.
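
For reference, scikit-learn offers the same constant predictor as `DummyRegressor(strategy="mean")`. The sketch below uses made-up numbers and placeholder names (`X_train`, `X_test`, `mean_model`) to show that it returns the training mean for every test row. It is an alternative way to express the baseline, not the approach this notebook takes.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Hypothetical training targets; their mean is 12.0
y_train = np.array([10.0, 12.0, 14.0])

# The dummy model ignores feature values, so placeholder features suffice
X_train = np.zeros((3, 1))
X_test = np.zeros((4, 1))

mean_model = DummyRegressor(strategy="mean").fit(X_train, y_train)
mean_forecast = mean_model.predict(X_test)

# Every test prediction equals the mean of the training targets
print(mean_forecast)
```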

### 📏 Evaluation Metrics

We will evaluate these baselines using three standard regression metrics:

#### **MAE** – *Mean Absolute Error*
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$
- Measures the average absolute difference between predictions and actual values.  
- Intuitive to interpret, same unit as the target variable.  
- Weights all errors linearly, so large errors are **not** penalized disproportionately.

#### **RMSE** – *Root Mean Squared Error*
$$
\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} }
$$
- Similar to MAE, but penalizes **larger errors more heavily**.  
- Sensitive to outliers.  
- Also in the same unit as the target variable.

#### **R²** – *Coefficient of Determination*
$$
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2}}
$$
- Evaluates how well the model explains the variance of the target variable.  
- R² = 1 → perfect predictions  
- R² = 0 → no better than predicting the mean of the evaluated (here: test) targets
- R² < 0 → **worse** than predicting that mean
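
To make the three formulas concrete, here is a small worked example on made-up numbers. It computes each metric by hand with NumPy and checks the result against scikit-learn. The arrays `y` and `y_hat` are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual values and predictions
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 12.0])

err = y - y_hat  # [1, 0, -1, -3]

mae = np.mean(np.abs(err))  # (1 + 0 + 1 + 3) / 4 = 1.25
rmse = np.sqrt(np.mean(err**2))  # sqrt(11 / 4)
r2 = 1 - np.sum(err**2) / np.sum((y - y.mean()) ** 2)  # 1 - 11 / 20 = 0.45

# The hand-computed values agree with scikit-learn
assert np.isclose(mae, mean_absolute_error(y, y_hat))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y, y_hat)))
assert np.isclose(r2, r2_score(y, y_hat))

print(f"MAE: {mae:.2f} | RMSE: {rmse:.2f} | R²: {r2:.2f}")
```

RMSE exceeds MAE here because the single error of 3 dominates the squared sum, which is the outlier sensitivity described above.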

In [None]:
# Define target variable
target = "total_load_actual"

# Naive model: use previous hour’s load as prediction
naive_preds = test["load_lag_1h"].copy()

# Mean model: use mean of training target for all test predictions
mean_value = train[target].mean()
mean_preds = [mean_value] * len(test)

# Actual values
y_true = test[target]

In [None]:
# Metrics for both models


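def evaluate_model(y_true, y_pred, name):
    """Print MAE, RMSE, and R² for one set of predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    # mean_squared_error returns the MSE; take the square root to get RMSE
    rmse = mean_squared_error(y_true, y_pred) ** 0.5
    r2 = r2_score(y_true, y_pred)
    print(f"{name} → MAE: {mae:.2f} | RMSE: {rmse:.2f} | R²: {r2:.4f}")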


evaluate_model(y_true, naive_preds, "Naive Model")
evaluate_model(y_true, mean_preds, "Mean Model")

In [None]:
# Visualization: compare model predictions against actual values
plt.figure(figsize=(15, 5))
plt.plot(
    test["datetime"][:168], y_true[:168], label="Actual", color="black", linewidth=2
)
plt.plot(test["datetime"][:168], naive_preds[:168], label="Naive", linestyle="--")
plt.plot(test["datetime"][:168], mean_preds[:168], label="Mean", linestyle=":")
plt.title("Energy Load Forecast: Naive vs Mean vs Actual (First 7 days of test)")
plt.ylabel("Load (MW)")
plt.xlabel("Time")
plt.legend()
plt.tight_layout()
plt.show()

### 📉 Result Analysis: Why is R² Negative for the Mean Model?

While the Mean Model appears simple and intuitive, its performance in time series tasks is often very poor.

- It predicts a constant value (the mean of the training set) for all test timestamps.
- This completely ignores time-related patterns such as seasonality, trends, or cycles.
- As a result, its predictions deviate significantly from the true values.

This is reflected in a **negative R²**. R² compares a model against a horizontal line at the mean of the *test* labels, whereas the Mean Model predicts the mean of the *training* labels. Whenever the two means differ, the Mean Model does worse than that reference line and its R² falls below zero.

By contrast, the **Naive Model**, which uses the value from the previous hour, exploits the strong hour-to-hour continuity of energy demand and achieves a much higher R².
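
The sign of R² for a constant predictor can be checked directly. For a constant prediction $c$, the residual sum of squares splits into the total sum of squares plus $n(\bar{y} - c)^2$, which gives $R^2 = -(\bar{y} - c)^2 / \mathrm{Var}(y)$, where $\bar{y}$ and $\mathrm{Var}(y)$ are the mean and population variance of the test targets. The sketch below verifies this on made-up numbers. `y_test` and `train_mean` are hypothetical values, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical test targets (mean = 23) and a training mean that differs
y_test = np.array([20.0, 22.0, 24.0, 26.0])
train_mean = 21.0

# Constant predictor using the training mean
const_preds = np.full_like(y_test, train_mean)
r2_const = r2_score(y_test, const_preds)

# Closed form: minus the squared mean shift over the test variance
r2_closed = -((y_test.mean() - train_mean) ** 2) / y_test.var()
assert np.isclose(r2_const, r2_closed)

# Predicting the test mean itself gives exactly R² = 0
r2_test_mean = r2_score(y_test, np.full_like(y_test, y_test.mean()))

print(f"R² with training mean: {r2_const:.2f}")
print(f"R² with test mean:     {r2_test_mean:.2f}")
```

The size of the negative value therefore measures how far the average load shifted between the training and test periods, relative to the variability of the test period.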

📌 **Conclusion**:  
The poor performance of the Mean Model highlights the importance of incorporating temporal dynamics in forecasting models.