### Feature Exploration

Explore the definition of the following features.
* Preceding Load: Previous `n` hours of actual load
* Temperature Changes: Previous `n` hours of temperature changes
* Similar Days Preceding Load
* Discomfort level


---
**How do we encode the belief that the space of functions lies in a neighborhood of the historical functions?**

Turn this subjective prior into an empirical one? Load is monotonic on certain
intervals, e.g. from overnight low to daily peak.
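
One way to make the monotonicity belief empirical is to measure how often it actually holds in the history. The sketch below is only illustrative: the function name `monotone_ramp_fraction`, the `low_before_hour` cutoff used to locate the overnight low, and the synthetic load curves are all assumptions, not part of the project code.

```python
import numpy as np
import pandas as pd

def monotone_ramp_fraction(load: pd.Series, low_before_hour: int = 12) -> float:
    """Fraction of days whose load never decreases from the morning low to the later peak.

    `low_before_hour` is an assumed cutoff for where the overnight low may fall.
    """
    flags = []
    for _, day in load.groupby(load.index.normalize()):
        values = day.to_numpy()
        low = int(np.argmin(values[:low_before_hour]))   # position of the overnight low
        peak = low + int(np.argmax(values[low:]))        # position of the peak at or after the low
        flags.append(bool(np.all(np.diff(values[low:peak + 1]) >= 0)))
    return float(np.mean(flags))

# Synthetic data: two smooth days and one day with a dip in the middle of the ramp.
hours = np.arange(24)
smooth = np.where(hours <= 18, 10 + 0.5 * np.abs(hours - 4), 17 - 0.5 * (hours - 18))
bumpy = smooth.copy()
bumpy[10] -= 2.0
index = pd.date_range("2015-02-01", periods=72, freq="h")
load = pd.Series(np.concatenate([smooth, bumpy, smooth]), index=index)

fraction = monotone_ramp_fraction(load)
print(fraction)
```

Applied to the real data, `ds.data[ds.actual]` would presumably take the place of the synthetic `load` series.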

Create a daily cubic spline model for every historical day using hourly actual
load as control points. Cluster days according to coefficients in this model. 
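
A minimal sketch of this idea, under several assumptions: it uses SciPy's `CubicSpline`, which interpolates through the hourly values (treating them as knots rather than as B-spline control points), and scikit-learn's `KMeans` for the clustering step. The helper name `daily_spline_coefficients`, the choice of two clusters, and the synthetic daily shapes are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from sklearn.cluster import KMeans

def daily_spline_coefficients(daily_load: np.ndarray) -> np.ndarray:
    """Fit one cubic spline per day and return its flattened coefficients.

    daily_load has shape (n_days, 24); the result has shape (n_days, 4 * 23).
    """
    hours = np.arange(daily_load.shape[1])
    rows = []
    for day in daily_load:
        spline = CubicSpline(hours, day)   # piecewise cubic through the hourly values
        rows.append(spline.c.ravel())      # spline.c has shape (4, n_hours - 1)
    return np.vstack(rows)

# Synthetic history: ten days of one daily shape, then ten days of another.
rng = np.random.default_rng(0)
hours = np.arange(24)
shape_a = 10 + 2 * np.sin(np.pi * (hours - 6) / 12)
shape_b = 10 + 2 * np.cos(np.pi * (hours - 6) / 12)
daily_load = np.vstack([shape_a + rng.normal(0, 0.05, 24) for _ in range(10)]
                       + [shape_b + rng.normal(0, 0.05, 24) for _ in range(10)])

coeffs = daily_spline_coefficients(daily_load)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coeffs)
print(labels)
```

The coefficients differ widely in scale across polynomial orders, so standardizing the columns before clustering may be worth trying on real data.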

Feature: Empirical probability that the current hour is the daily peak.
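
One possible reading of this feature: for each hour of the day, the fraction of historical days on which that hour carried the daily maximum load. The sketch below assumes an hourly series with a `DatetimeIndex`; `peak_hour_probability` and the synthetic data are illustrative names and values only.

```python
import numpy as np
import pandas as pd

def peak_hour_probability(load: pd.Series) -> pd.Series:
    """Empirical probability that each hour of day (0-23) is the daily peak."""
    peak_times = load.groupby(load.index.normalize()).idxmax()   # timestamp of each day's maximum
    peak_hours = pd.Series(
        np.asarray(pd.DatetimeIndex(peak_times.to_numpy()).hour, dtype=int)
    )
    prob = peak_hours.value_counts(normalize=True)
    return prob.reindex(range(24), fill_value=0.0)

# Synthetic history: four days, peaking at hours 18, 18, 18 and 8.
index = pd.date_range("2015-02-01", periods=96, freq="h")
values = np.full(96, 10.0)
for day, peak_hour in enumerate([18, 18, 18, 8]):
    values[24 * day + peak_hour] = 20.0
load = pd.Series(values, index=index)

prob = peak_hour_probability(load)
# Look up each row's hour of day to turn the table into a feature column.
peak_feature = pd.Series(prob.to_numpy()[load.index.hour.to_numpy()], index=load.index)
print(prob[prob > 0])
```

To avoid leaking future information, the probabilities would need to be estimated from the training period only.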

Feature: Last `n` hours of temperature changes
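
A sketch of how the temperature-change lags might be built, mirroring the lagged-load columns used later in this notebook. It assumes the `MSP` column holds hourly temperatures; the helper `add_temperature_changes` and the column naming are hypothetical. The sample values are the first five `MSP` readings shown in the data preview.

```python
import pandas as pd

def add_temperature_changes(X: pd.DataFrame, temp_col: str, n: int) -> pd.DataFrame:
    """Add the hour-over-hour temperature changes from the previous n hours as columns."""
    out = X.copy()
    delta = out[temp_col].diff()            # change relative to the previous hour
    for i in range(n, 0, -1):
        out[f"Temp Change {i} hours prior"] = delta.shift(i)
    # The first n + 1 rows lack a full history of changes, so drop them.
    return out.iloc[n + 1:]

index = pd.date_range("2015-02-01", periods=5, freq="h")
X = pd.DataFrame({"MSP": [23.0, 21.02, 19.04, 19.04, 17.06]}, index=index)

with_changes = add_temperature_changes(X, "MSP", n=2)
print(with_changes)
```

If the temperatures are forecasts that are already available for the target hour, the current hour's change (`delta` without a shift) could be included as well.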

---

In [1]:
%load_ext autoreload
%autoreload 2

from validation import DataSet

ds = DataSet('data/zone1.parquet', mtlf='LRZ1 MTLF (MWh)', actual='LRZ1 ActualLoad (MWh)')

ds.data.head()

| hour | MSP | DayOfYear | HourEnding | IsBusinessHour | LRZ1 MTLF (MWh) | LRZ1 ActualLoad (MWh) |
|---|---|---|---|---|---|---|
| 2015-02-01 00:00:00-05:00 | 23.0 | 32 | 1 | 0 | 11099 | 11337.89 |
| 2015-02-01 01:00:00-05:00 | 21.02 | 32 | 2 | 0 | 10829 | 11014.87 |
| 2015-02-01 02:00:00-05:00 | 19.04 | 32 | 3 | 0 | 10565 | 10795.37 |
| 2015-02-01 03:00:00-05:00 | 19.04 | 32 | 4 | 0 | 10468 | 10714.42 |
| 2015-02-01 04:00:00-05:00 | 17.06 | 32 | 5 | 0 | 10432 | 10700.09 |


### Feature: Preceding `n` Hours of Actual Load

In [71]:
feature_names = ds.features.copy()
X = ds.data[feature_names].copy()
y = ds.data[ds.actual].copy()

num_hours_prior = 7
def prior_load_colname(i: int) -> str:
    return f"Actual Load {i} hours prior"

for i in range(num_hours_prior, 0, -1):
    col_name = prior_load_colname(i)
    X[col_name] = y.shift(i)
    feature_names.append(col_name)

# Drop the first `num_hours_prior` rows, whose lag columns are NaN because
# no earlier load exists. Our train/validate/test split allows the training
# set time period to change while validate and test are each fixed to a year.
X = X[num_hours_prior:]
y = y[num_hours_prior:]
X.head()

| hour | MSP | DayOfYear | HourEnding | IsBusinessHour | Actual Load 7 hours prior | Actual Load 6 hours prior | Actual Load 5 hours prior | Actual Load 4 hours prior | Actual Load 3 hours prior | Actual Load 2 hours prior | Actual Load 1 hours prior |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-02-01 07:00:00-05:00 | 14.0 | 32 | 8 | 0 | 11337.89 | 11014.87 | 10795.37 | 10714.42 | 10700.09 | 10765.42 | 10977.36 |
| 2015-02-01 08:00:00-05:00 | 15.08 | 32 | 9 | 0 | 11014.87 | 10795.37 | 10714.42 | 10700.09 | 10765.42 | 10977.36 | 11320.16 |
| 2015-02-01 09:00:00-05:00 | 12.92 | 32 | 10 | 0 | 10795.37 | 10714.42 | 10700.09 | 10765.42 | 10977.36 | 11320.16 | 11744.05 |
| 2015-02-01 10:00:00-05:00 | 12.92 | 32 | 11 | 0 | 10714.42 | 10700.09 | 10765.42 | 10977.36 | 11320.16 | 11744.05 | 12071.1 |
| 2015-02-01 11:00:00-05:00 | 12.92 | 32 | 12 | 0 | 10700.09 | 10765.42 | 10977.36 | 11320.16 | 11744.05 | 12071.1 | 12328.69 |


In [72]:
from validation import Error, walkforward
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=7, weights='distance')

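def next_hour(actual, previous_hour, predicted_load):
    """Build the next hour's feature row from predictions instead of actual loads."""
    next_row = actual.copy()  # avoid shadowing the builtin `next`
    # HACK: assume the temps are forecasted...
    # Roll the lagged loads forward: lag i takes the previous row's lag i-1.
    for i in range(num_hours_prior, 1, -1):
        next_row[prior_load_colname(i)] = previous_hour[prior_load_colname(i-1)].iloc[0]

    # The most recent lag is the load we just predicted.
    next_row[prior_load_colname(1)] = predicted_load
    return next_row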

(predictions, errors) = walkforward(knn, X, y,
                                    ds.validation_start, ds.validation_end, next_hour)
overall_error = Error(ds.validation_data[ds.actual], np.concatenate(predictions))
print(overall_error)