## Modeling

**Explain your reasoning behind the Algorithm used**

Energy consumption is a continuous variable. The value recorded for one time period is not discrete and self-contained, as if it had grown from zero within that period; it is the average of the readings that fall between the cut-offs with the previous and next time periods.

Given this continuity, the previous target value is a good indicator of what the next target value will be, which makes the problem a suitable candidate for a recurrent neural network. However, as is good practice, simpler algorithms will be tried first.
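
One way to check how strongly consecutive values are related is the lag-1 autocorrelation. The sketch below uses a synthetic hourly series (a made-up daily cycle plus noise, not the project's data), so it only illustrates the check. The variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an hourly consumption series (NOT the real data):
# a smooth 24-hour cycle plus a little random noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2016-01-11 17:00", periods=240, freq="h")
daily_cycle = 500 + 300 * np.sin(2 * np.pi * np.arange(240) / 24)
consumption = pd.Series(daily_cycle + rng.normal(0, 30, size=240), index=idx)

# Correlation between each value and the value one step earlier.
lag1 = consumption.autocorr(lag=1)
print(f"lag-1 autocorrelation: {lag1:.2f}")
```

A value close to 1 on the real `TotalConsmp` column would support the claim that the previous value is informative.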

In [648]:
# Importing libraries

## data handling
import numpy as np, pandas as pd

## visualizations
import matplotlib.pyplot as plt, seaborn as sns

# Machine learning

## utilities
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from dask_ml.model_selection import GridSearchCV, RandomizedSearchCV
from dask.diagnostics import ProgressBar
## transformers
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
## estimators
import sklearn.linear_model as sklm
import sklearn.svm as svm
from sklearn.neural_network import MLPRegressor
import keras

# widening the cells
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

In [620]:
# importing the dataset, the data shall keep its sequential nature (not be shuffled)
data = pd.read_csv("data_post_preprocessing.csv", parse_dates=True, index_col='date') 

In [621]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 19735 entries, 2016-01-11 17:00:00 to 2016-05-27 18:00:00
Data columns (total 6 columns):
TotalConsmp    19735 non-null float64
Press_mm_hg    19735 non-null float64
Windspeed      19735 non-null float64
Visibility     19735 non-null float64
H              19735 non-null float64
RTo            19735 non-null float64
dtypes: float64(6)
memory usage: 1.1 MB


In [632]:
# we have 5 predictors, and a target variable
data.head(5)

date,TotalConsmp,Press_mm_hg,Windspeed,Visibility,H,RTo
2016-01-11 17:00:00,540.0,733.75,6.166667,53.416667,45.331865,12.107153
2016-01-11 18:00:00,1370.0,734.266667,5.416667,40.0,45.311706,12.466059
2016-01-11 19:00:00,1190.0,734.791667,6.0,40.0,47.636091,12.685903
2016-01-11 20:00:00,960.0,735.283333,6.0,40.0,46.973889,12.941007
2016-01-11 21:00:00,760.0,735.566667,6.0,40.0,46.111587,13.420729


Given the computational expense of hyperparameter tuning via Grid Search and the time constraint, the data will be downsampled to hourly resolution.

In [633]:
data = data.resample('H').agg({ 'TotalConsmp': 'sum', 
                                'Press_mm_hg': 'mean', 
                                'Windspeed': 'mean',
                                'Visibility': 'mean',
                                'H': 'mean',
                                'RTo': 'mean',})
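
To illustrate the per-column aggregation used in the cell above, here is a minimal sketch on a hypothetical frame of 10-minute readings (the values are made up). The consumption column is summed over each hour and the weather column is averaged.

```python
import pandas as pd

# Hypothetical 10-minute readings covering two full hours (made-up values).
idx = pd.date_range("2016-01-11 17:00", periods=12, freq="10min")
toy = pd.DataFrame(
    {
        "TotalConsmp": [10.0] * 6 + [20.0] * 6,
        "Windspeed": [6.0] * 6 + [5.0, 5.0, 6.0, 6.0, 5.0, 6.0],
    },
    index=idx,
)

# Sum the consumption within each hour, average the weather reading.
# Recent pandas spells the hourly alias "h"; older releases use "H".
hourly = toy.resample("h").agg({"TotalConsmp": "sum", "Windspeed": "mean"})
print(hourly)
```

Each hourly row then holds the total consumption of its six 10-minute readings and the mean wind speed over the same window.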

In [634]:
# Generating the training and testing data sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['TotalConsmp'], axis=1),
    data.TotalConsmp,
    test_size=0.2, random_state=None, shuffle=False)

shuffle was set to False to maintain the continuity of the time series data (random_state has no effect when shuffle is False, so it was left as None). As stated above, the current value is informed by the previous value: power/energy values in consecutive time periods change relative to their previous value, not from absolute zero.
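
The effect of `shuffle=False` can be checked on a small example. The sketch below uses a hypothetical ten-row hourly frame (column names `x` and `y` are made up) and confirms that the test set is simply the last 20% of the rows, in their original order.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-row hourly frame, for illustration only.
idx = pd.date_range("2016-01-11 17:00", periods=10, freq="h")
toy = pd.DataFrame({"x": np.arange(10.0), "y": np.arange(10.0) * 2}, index=idx)

# With shuffle=False the split is a plain cut: first 80% train, last 20% test.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["x"]], toy["y"], test_size=0.2, shuffle=False
)

# Every training timestamp precedes every test timestamp.
print(X_tr.index.max(), "<", X_te.index.min())
```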

In [635]:
X_train.head()

date,Press_mm_hg,Windspeed,Visibility,H,RTo
2016-01-11 17:00:00,733.75,6.166667,53.416667,45.331865,12.107153
2016-01-11 18:00:00,734.266667,5.416667,40.0,45.311706,12.466059
2016-01-11 19:00:00,734.791667,6.0,40.0,47.636091,12.685903
2016-01-11 20:00:00,735.283333,6.0,40.0,46.973889,12.941007
2016-01-11 21:00:00,735.566667,6.0,40.0,46.111587,13.420729


## Classic ML Regression Models

In [636]:
scaler = MinMaxScaler(feature_range=(0, 1)) # used because of its inverse_transform method 
pca    = PCA(n_components=3)
estimator = MLPRegressor(learning_rate='adaptive', max_iter=5000)

In [637]:
pipe = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('est', estimator)])

### GridSearch and CrossValidation

Grid Search is a standard machine learning tool for hyperparameter tuning. Because this project is a prototype, Grid Search is used here to compare candidate models with their default (out-of-the-box) hyperparameters and find the one that performs best.
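
A pipeline step can itself be treated as a hyperparameter, which is how one grid can compare several estimators. The sketch below shows the idea on synthetic data. It uses scikit-learn's own `GridSearchCV` rather than the `dask_ml` version imported above, and only two estimators, so it is an illustration under those assumptions and not the notebook's actual search.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic regression data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# The 'est' step is a placeholder that the grid replaces with each candidate.
pipe = Pipeline(steps=[("scaler", MinMaxScaler()), ("est", LinearRegression())])
param_grid = {"est": [LinearRegression(), Ridge()]}

search = GridSearchCV(pipe, param_grid=param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

# One row of results per candidate estimator.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(type(params["est"]).__name__, round(score, 4))
```

Scores are negated mean squared errors, so values closer to zero are better.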

In [638]:
param_grid = {
    'est': [
        sklm.LinearRegression(),
        sklm.Lasso(),
        sklm.Ridge(),
        svm.SVR(), 
        svm.NuSVR(),
        svm.LinearSVR(), 
        MLPRegressor(), 
        MLPRegressor(learning_rate='adaptive', max_iter=5000, hidden_layer_sizes=(50, 100, 2)) # params found via prev grid search
    ]
}

#     'est__activation': ['logistic', 'relu', 'tanh'],
#     'est__solver': ['sgd', 'adam', 'lbfgs'],
#     'est__hidden_layer_sizes': [(100,), (500,), (1000,), (10, 100), (50, 100, 2)]

In [644]:
# grid search for classic model comparison
gs_cmc = GridSearchCV(estimator = pipe,
                  scoring = 'neg_mean_squared_error',
                  param_grid = param_grid,
                  n_jobs= -1, 
                  cv = 5 )

In [645]:
with ProgressBar():
    gs_cmc.fit(X_train, y_train)

[########################################] | 100% Completed |  6min 12.0s

In [646]:
gs_cmc.cv_results_



{'params': [{'est': LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
            normalize=False)},
  {'est': Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)},
  {'est': Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)},
  {'est': SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
     gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
     tol=0.001, verbose=False)},
  {'est': NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='auto_deprecated',
      kernel='rbf', max_iter=-1, nu=0.5, shrinking=True, tol=0.001,
      verbose=False)},
  {'est': LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
        intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
        random_state=None, tol=0.0

In [647]:
print("Best score", gs_cmc.best_score_, "with ", gs_cmc.best_params_)

Best score -268180.49322994146 with  {'est': Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)}
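
Because the scorer is `neg_mean_squared_error`, the best score is a negated MSE. Taking the square root of its magnitude gives an RMSE in the same units as `TotalConsmp`, which is easier to compare against the consumption values shown earlier. A minimal sketch of that conversion, using the score printed above:

```python
import math

# Best score reported by the grid search above (negated mean squared error).
best_score = -268180.49322994146

# Flip the sign to recover the MSE, then take the square root for the RMSE.
rmse = math.sqrt(-best_score)
print(f"RMSE: {rmse:.2f}")
```

The result is roughly 518, in the same units as the target.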


## ______________________________________________________________________________________________

Classical machine learning models consider each sample independently and therefore do not perform well on this problem.
As previously stated, in time series data the current value is highly dependent on the previous value, and this previous value is not being fed back in as an input. Using the previous target value to estimate the next is precisely what recurrent neural networks (RNNs) do.

To test the hypothesis that knowing the previous target value yields better predictions, one could add a predictive feature that, for each sample/row, equals the target value of the previous row, using the pandas shift method (.shift(1) on the TotalConsmp column). However, this was not tried because of the Grid Search time constraint and because it would not be a definitive solution: it locks the model into single-step prediction and prevents multi-step prediction. This will be explained in the report; in short, it means being able to predict only the next period (hour, day) rather than several periods ahead.
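
For reference, the lag feature described above could be built as in the following sketch. It uses the first five hourly `TotalConsmp` values shown earlier. The column name `TotalConsmp_lag1` is hypothetical, and this step was not part of the experiments.

```python
import pandas as pd

# First five hourly target values from the data preview above.
idx = pd.date_range("2016-01-11 17:00", periods=5, freq="h")
toy = pd.DataFrame({"TotalConsmp": [540.0, 1370.0, 1190.0, 960.0, 760.0]}, index=idx)

# Each row receives the target value of the previous row as an extra predictor.
# 'TotalConsmp_lag1' is a made-up column name for this illustration.
toy["TotalConsmp_lag1"] = toy["TotalConsmp"].shift(1)

# The first row has no predecessor, so its lag is NaN and the row is dropped.
toy = toy.dropna()
print(toy)
```

At prediction time this feature needs the true value of the previous period, which is why it supports only one-step-ahead forecasts.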