### Establishing the "Persistence" Baseline
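In MLOps, a baseline is the minimum performance a model must beat to justify its complexity. For time-series problems, a common baseline is persistence forecasting: predict that the next value equals the most recently observed value.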

#### The Logic
If you tell a power grid manager, "I have a complex AI model," they will ask, "Is it better than just assuming the power 15 minutes from now will be the same as it is right now?"

If your model has a higher error than this simple assumption, your model is adding negative value.
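One way to make this comparison concrete is a skill score relative to the baseline. The sketch below is illustrative only: the function name `skill_vs_baseline` and the example MAE values are hypothetical and do not come from this notebook.

```python
def skill_vs_baseline(model_mae: float, baseline_mae: float) -> float:
    """Return 1 - model_mae / baseline_mae.

    Positive: the model beats persistence.
    Zero:     the model matches persistence.
    Negative: the model is worse than persistence.
    """
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return 1.0 - model_mae / baseline_mae


# Hypothetical numbers, chosen only to show the sign of the score.
better_model = skill_vs_baseline(model_mae=150.0, baseline_mae=200.0)
worse_model = skill_vs_baseline(model_mae=250.0, baseline_mae=200.0)
print(better_model, worse_model)
```

A negative score is the "negative value" case described above: the simple assumption would have served the grid manager better than the model.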

#### Calculating the Baseline (MAE)
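We calculate this by "shifting" the target column by one time-step and comparing it to the original. The mean absolute error (MAE) between the shifted and original columns is the baseline. This assumes the rows are sorted by time at a fixed 15-minute interval, so that "one row back" really means "15 minutes ago".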
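To see what the shift does, here is a minimal sketch on a four-row toy series. The values are made up for illustration and are not taken from the plant data.

```python
import pandas as pd

# Toy power readings at consecutive time-steps (illustrative values only).
power = pd.Series([100.0, 120.0, 90.0, 90.0], name="DC_POWER")

# Persistence forecast: each row is predicted by the previous row.
naive = power.shift(1)  # NaN, 100, 120, 90

# Absolute errors, skipping the first row, which has no previous value.
abs_errors = (power - naive).abs().dropna()  # 20, 30, 0

toy_mae = abs_errors.mean()  # (20 + 30 + 0) / 3
print(f"Toy persistence MAE: {toy_mae:.2f}")
```

The first row is dropped because there is no earlier observation to use as its forecast, which is the same reason the function in the next cell masks out the NaN.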

In [6]:
%load_ext autoreload
%autoreload 2


In [7]:
import sys
from pathlib import Path

# Ensure the notebook can find the 'src' directory
# (assumes the notebook lives one level below the project root)
sys.path.append(str(Path("..").resolve()))

from src.preprocessing.data_preprocessing import load_and_merge_data
from sklearn.metrics import mean_absolute_error

def calculate_persistence_baseline(df):
    """
    Calculates the Mean Absolute Error (MAE) for a 
    Persistence (Naive) Forecast using the provided dataframe.
    """
    # y_actual is the true power output
    y_actual = df['DC_POWER']
    
    # y_naive_pred is the previous row's power output. This equals
    # "15 minutes ago" only if rows are time-sorted at a fixed
    # 15-minute interval and belong to a single series.
    y_naive_pred = df['DC_POWER'].shift(1)
    
    # shift(1) leaves NaN in the first row; exclude it from the error
    mask = y_naive_pred.notna()
    
    mae = mean_absolute_error(y_actual[mask], y_naive_pred[mask])
    
    print("--- Persistence Baseline ---")
    print(f"Baseline MAE: {mae:.2f} kW")
    return mae

In [8]:
# Load once
df = load_and_merge_data("Plant_1_Generation_Data.csv", "Plant_1_Weather_Sensor_Data.csv")

# Pass the loaded df to the function
baseline_mae = calculate_persistence_baseline(df)

--- Persistence Baseline ---
Baseline MAE: 237.23 kW
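A plain `shift(1)` compares each row with the row directly above it. If the merged frame holds several inverters per timestamp, the previous row may belong to a different inverter rather than to the same one 15 minutes earlier. The layout returned by `load_and_merge_data` is not shown here, so this is an assumption worth checking, not a confirmed problem. The sketch below shifts within each group instead; the function name `persistence_mae_by_group`, the `SOURCE_KEY` column, and the toy values are all hypothetical.

```python
import pandas as pd


def persistence_mae_by_group(df, group_col="SOURCE_KEY",
                             time_col="DATE_TIME", target_col="DC_POWER"):
    """Persistence MAE where each row is compared with the previous
    row of the SAME group (e.g. the same inverter)."""
    ordered = df.sort_values([group_col, time_col])
    # Shift inside each group so values never leak across groups.
    naive = ordered.groupby(group_col)[target_col].shift(1)
    abs_err = (ordered[target_col] - naive).abs().dropna()
    return float(abs_err.mean())


# Toy frame: two hypothetical inverters, two timestamps each.
toy = pd.DataFrame({
    "DATE_TIME": pd.to_datetime([
        "2020-05-15 00:00", "2020-05-15 00:00",
        "2020-05-15 00:15", "2020-05-15 00:15",
    ]),
    "SOURCE_KEY": ["A", "B", "A", "B"],
    "DC_POWER": [10.0, 100.0, 12.0, 97.0],
})

grouped_mae = persistence_mae_by_group(toy)
plain_mae = float((toy["DC_POWER"] - toy["DC_POWER"].shift(1)).abs().dropna().mean())
print(f"Grouped MAE: {grouped_mae:.2f}, plain shift MAE: {plain_mae:.2f}")
```

On this toy frame the plain shift mixes the two inverters and reports a much larger error than the grouped version. If the real frame has one row per timestamp, both approaches give the same number.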


