In [4]:
import pandas as pd

df = pd.read_csv(r"C:\Users\Linds\Repos\East_River\data\processed\Cleaned_SCADA_Data.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Timestamp,OnLine_Load_MW,Load_Control_KW,Load_Control_MW,Estimated_Online_Load_Control_MW,Control_Threshold_MW
0,0,1/1/2021 0:00,514.37,0.0,0.0,514.37,626.2
1,1,1/1/2021 0:30,505.76,0.0,0.0,505.76,572.0
2,2,1/1/2021 1:00,504.8,0.0,0.0,504.8,572.0
3,3,1/1/2021 1:30,499.74,0.0,0.0,499.74,572.0
4,4,1/1/2021 2:00,496.19,0.0,0.0,496.19,572.0


In [5]:
import numpy as np

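# Gap between the estimated load and the control threshold.
# Note: this compares the two series with each other, not either one with actual load.
diff = df['Estimated_Online_Load_Control_MW'] - df['Control_Threshold_MW']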
mae = diff.abs().mean()
rmse = np.sqrt((diff**2).mean())

print(f"Average absolute error (MAE): {mae:.3f}")
print(f"Root mean squared error (RMSE): {rmse:.3f}")

Average absolute error (MAE): 103.948
Root mean squared error (RMSE): 126.804


A potential, reasonable, and transparent way to show “how much better” the forecast is versus today’s static threshold logic:

    • MAE_forecast / RMSE_forecast on the 24–72 hr model
    • MAE_threshold / RMSE_threshold between the current threshold and actual load
    • ΔMAE = MAE_threshold – MAE_forecast (and the same for RMSE)
    • % Improvement = ΔMAE / MAE_threshold × 100
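
The four quantities above can be sketched as one small helper. This is a minimal sketch under stated assumptions, not the notebook's own method: the function name `improvement_summary` and the toy arrays are hypothetical, and rows with any missing value are dropped so that both methods are scored on the same observations.

```python
import numpy as np
import pandas as pd


def improvement_summary(actual, forecast, threshold):
    """Compare forecast error with static-threshold error against actual load."""
    frame = pd.DataFrame(
        {"actual": actual, "forecast": forecast, "threshold": threshold}
    ).dropna()  # score both methods on the same complete rows

    err_f = frame["forecast"] - frame["actual"]
    err_t = frame["threshold"] - frame["actual"]

    metrics = {
        "MAE": lambda e: e.abs().mean(),
        "RMSE": lambda e: np.sqrt((e ** 2).mean()),
    }
    out = {}
    for name, fn in metrics.items():
        m_f, m_t = fn(err_f), fn(err_t)
        delta = m_t - m_f  # positive means the forecast is closer to actual
        out[name] = {
            "forecast": m_f,
            "threshold": m_t,
            "delta": delta,
            "pct_improvement": delta / m_t * 100 if m_t else float("nan"),
        }
    return out


# Toy numbers for illustration only (not taken from the SCADA data)
toy = improvement_summary(
    actual=[500.0, 510.0, 520.0, 530.0],
    forecast=[505.0, 505.0, 525.0, 525.0],
    threshold=[572.0, 572.0, 572.0, 572.0],
)
print(toy["MAE"])
```

With the notebook's DataFrame, the call would presumably take the actual-load, forecast, and threshold columns of `df` in that order.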



Todo:
    3. Consider additional KPIs if relevant—e.g. bias (mean error), peak‐load errors, or cost/risk metrics—so stakeholders see both average gains and tails.  
    4. Validate significance (e.g. bootstrap confidence intervals) if you need to prove the improvement isn’t due to chance.  



In [7]:
# Define columns (update `actual_col` if your actual-load column has a different name)
forecast_col = 'Estimated_Online_Load_Control_MW'
threshold_col = 'Control_Threshold_MW'
actual_col = 'OnLine_Load_MW'

# 1. Bias (mean error)
df['error'] = df[forecast_col] - df[actual_col]
bias = df['error'].mean()

# 2. Peak‐load error (e.g. average absolute error when actual load in top 5%)
peak_cut = df[actual_col].quantile(0.95)
peak_err = df.loc[df[actual_col] >= peak_cut, 'error'].abs().mean()

# 3. Simple cost metric (assume $100 per MW error; adjust as needed)
unit_cost = 100
df['cost'] = df['error'].abs() * unit_cost
mean_cost = df['cost'].mean()
peak_cost = df.loc[df[actual_col] >= peak_cut, 'cost'].mean()

print(f"Bias (mean error): {bias:.3f} MW")
print(f"Peak‐load MAE (top 5%): {peak_err:.3f} MW")
print(f"Average cost: ${mean_cost:.2f}")
print(f"Peak‐load cost (top 5%): ${peak_cost:.2f}")

# 4. Bootstrap CI for ΔMAE = MAE_threshold – MAE_forecast
n_boot = 1000
delta_mae_bs = []
# Raw NumPy arrays: unlike pandas, ndarray.mean() does not skip NaN,
# so any missing value in these columns propagates into the CI.
y = df[actual_col].values
y_hat = df[forecast_col].values
th = df[threshold_col].values
n = len(df)

for _ in range(n_boot):
    idx = np.random.choice(n, n, replace=True)  # resample rows with replacement
    mae_f = np.abs(y_hat[idx] - y[idx]).mean()
    mae_t = np.abs(th[idx] - y[idx]).mean()
    delta_mae_bs.append(mae_t - mae_f)

ci_low, ci_high = np.percentile(delta_mae_bs, [2.5, 97.5])
print(f"ΔMAE 95% CI: [{ci_low:.3f}, {ci_high:.3f}] MW")

Bias (mean error): 5.079 MW
Peak‐load MAE (top 5%): 22.053 MW
Average cost: $507.87
Peak‐load cost (top 5%): $2205.28
ΔMAE 95% CI: [nan, nan] MW
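
The `[nan, nan]` interval is most likely caused by missing values in one of the three columns: the pandas `.mean()` calls skip NaN by default, while `.mean()` on the NumPy arrays inside the bootstrap loop does not. Below is a minimal sketch of a NaN-safe version, assuming incomplete rows can simply be dropped. The helper name `bootstrap_delta_mae`, the seed, and the toy data are hypothetical. Resampling individual rows also treats half-hourly observations as independent, which likely understates the interval width for autocorrelated load data; a block bootstrap would be one alternative.

```python
import numpy as np


def bootstrap_delta_mae(y, y_hat, th, n_boot=1000, seed=0):
    """Percentile bootstrap CI for MAE_threshold - MAE_forecast, ignoring NaN rows."""
    y, y_hat, th = (np.asarray(a, dtype=float) for a in (y, y_hat, th))

    # Keep only rows where all three series are present
    keep = ~(np.isnan(y) | np.isnan(y_hat) | np.isnan(th))
    abs_f = np.abs(y_hat[keep] - y[keep])
    abs_t = np.abs(th[keep] - y[keep])

    rng = np.random.default_rng(seed)
    n = abs_f.size
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        deltas[b] = abs_t[idx].mean() - abs_f[idx].mean()
    return np.percentile(deltas, [2.5, 97.5])


# Toy data with one missing threshold value (illustrative only)
rng = np.random.default_rng(1)
y = 500 + 20 * rng.standard_normal(200)
y_hat = y + 5 * rng.standard_normal(200)
th = np.full(200, 572.0)
th[3] = np.nan

ci_low, ci_high = bootstrap_delta_mae(y, y_hat, th)
print(f"ΔMAE 95% CI: [{ci_low:.3f}, {ci_high:.3f}] MW")
```

In the notebook this would be called as `bootstrap_delta_mae(df[actual_col], df[forecast_col], df[threshold_col])`.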


In [8]:
# Cost analysis under different electricity price scenarios ($/MWh)
prices = {
    'Residential': 120.90,
    'Commercial': 105.80,
    'Industrial': 85.00,
    'Average': 108.60,
    'Wholesale': 63.59 * 1.076  # include 7.6% increase for 2025
}

interval_hours = 0.5  # half‐hourly data

for sector, price in prices.items():
    col = f"cost_{sector.lower()}"
    df[col] = df['error'].abs() * interval_hours * price

mean_costs = {sector: df[f"cost_{sector.lower()}"].mean() for sector in prices}

for sector, cost in mean_costs.items():
    print(f"Mean cost ({sector}): ${cost:,.2f}")

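# Note: mean_cost was computed as |error| × unit_cost with no interval_hours factor,
# so it is a per-MW, per-interval figure and not directly comparable with the $/MWh scenarios.
print(f"Baseline mean cost (@ ${unit_cost}/MWh): ${mean_cost:,.2f}")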

Mean cost (Residential): $307.01
Mean cost (Commercial): $268.66
Mean cost (Industrial): $215.85
Mean cost (Average): $275.77
Mean cost (Wholesale): $173.75
Baseline mean cost (@ $100/MWh): $507.87


In [9]:
# compute threshold‐based cost for each sector and then Δcost / % improvement
cost_stats = {}
for sector, price in prices.items():
    f_col = f"cost_{sector.lower()}"                      # forecast‐based cost
    # compute threshold‐based cost on the fly
    t_col = f"cost_thr_{sector.lower()}"
    df[t_col] = (df[threshold_col].subtract(df[actual_col])
                           .abs()
                           * interval_hours
                           * price)
    mean_f = df[f_col].mean()
    mean_t = df[t_col].mean()
    delta = mean_t - mean_f
    pct = (delta / mean_t) * 100 if mean_t else float("nan")
    cost_stats[sector] = (delta, pct)

for sector, (delta, pct) in cost_stats.items():
    print(f"{sector}: Δcost = ${delta:,.2f},  Improvement = {pct:.2f}%")

Residential: Δcost = $5,658.70,  Improvement = 94.85%
Commercial: Δcost = $4,951.95,  Improvement = 94.85%
Industrial: Δcost = $3,978.41,  Improvement = 94.85%
Average: Δcost = $5,083.00,  Improvement = 94.85%
Wholesale: Δcost = $3,202.52,  Improvement = 94.85%
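
The identical 94.85% across all sectors is expected rather than a coincidence: the price and the interval length multiply both mean costs and cancel in the ratio. Writing $p$ for the sector price and $h$ for `interval_hours` (symbols introduced here for illustration), with each MAE taken over the rows pandas does not skip as missing:

```latex
\[
\text{Improvement}
= \frac{p\,h\,\mathrm{MAE}_{\text{threshold}} - p\,h\,\mathrm{MAE}_{\text{forecast}}}
       {p\,h\,\mathrm{MAE}_{\text{threshold}}} \times 100
= \left(1 - \frac{\mathrm{MAE}_{\text{forecast}}}{\mathrm{MAE}_{\text{threshold}}}\right) \times 100
\]
```

The percentage therefore carries the same information as the % improvement in MAE; only Δcost changes with the price scenario.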


Here’s a quick rundown of how each KPI and cost metric was computed from your DataFrame and the price inputs:

1. Errors between forecast, threshold and actual  
    - You have three series in df:  
      • forecast_col = Estimated_Online_Load_Control_MW  
      • threshold_col = Control_Threshold_MW  
      • actual_col = OnLine_Load_MW  
    - diff = forecast – threshold (used only for comparing threshold vs forecast).  
    - error = forecast – actual (used for most of the error metrics).

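2. MAE and RMSE  
    - In [5] prints the MAE and RMSE of diff (forecast – threshold), so 103.948 and 126.804 measure the gap between the forecast and the threshold, not the error of either against actual load.  
    - The forecast-error definitions are MAE_forecast = mean(|forecast – actual|) and RMSE_forecast = sqrt(mean((forecast – actual)²)).  
    - The analogous MAE_threshold = mean(|threshold – actual|) and ΔMAE = MAE_threshold – MAE_forecast are computed inside the bootstrap loop.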

3. Bias and peak‐load error  
    - Bias = mean(forecast – actual)  
    - Peak‐load error = mean(|forecast – actual|) over the top 5% of actual loads (actual ≥ 95th percentile).

4. Simple cost ($100 per MW of error, per interval)  
    - cost = |forecast – actual| × unit_cost  
    - mean_cost = average of that over all rows  
    - peak_cost = same but restricted to the top 5% of loads

5. Bootstrap confidence interval on ΔMAE  
    - Re‐sample your index (n_boot times) with replacement  
    - For each sample compute MAE_forecast and MAE_threshold → record ΔMAE  
    - 95% CI = [2.5th, 97.5th] percentiles of that bootstrap ΔMAE distribution

6. Sector‐specific cost scenarios  
    - You have a dict of prices in $/MWh for Residential, Commercial, Industrial, Average and Wholesale.  
    - cost_<sector> = |forecast – actual| × interval_hours (0.5h) × price  
    - mean_costs per sector = their averages

7. Threshold‐vs‐forecast cost savings  
    - For each sector build cost_thr_<sector> = |threshold – actual| × 0.5h × price  
    - Δcost = mean(cost_thr) – mean(cost_forecast)  
    - Improvement % = Δcost / mean(cost_thr) × 100  

All of these derive directly from your raw load columns plus the per‐MWh prices you supplied, turning deviations into dollars and then aggregating with means (and percentiles for peaks or CIs).