# Capstone Project Title
___
**Author**: Evan Holder

### Overview <a class="anchor" id="Overview"></a>
___

### Business Problem <a class="anchor" id="Business-Problem"></a>
___

### Data Collection <a class="anchor" id="Data-Collection"></a>
___


### Data Cleaning <a class="anchor" id="Data-Cleaning"></a>
____

### Import Libraries and Functions  <a class="anchor" id="Import-Libraries-and-Funtions"></a>
___
Data manipulation, cleaning, massaging: pandas, numpy<br>
Modeling: sklearn, keras<br>
Plotting: matplotlib<br>
Custom functions: function.py

In [1]:
# Libraries for data cleaning, massaging:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import datetime as dt


# Modeling Libraries
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso

import tensorflow as tf
from tensorflow import keras
from keras import layers, models, regularizers
from tensorflow.keras.layers import TimeDistributed
from sklearn.preprocessing import MinMaxScaler

# Save Models
import pickle

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Manipulate directories 
import os

# Import custom functions
os.chdir('../scripts')
from functions import split_data, sMAPE, SMAPE, compute_metrics, r2,impute_immediate_mean
from functions import resample, plot_metric_range, compile_fit, ensemble_nn
os.chdir('../notebooks')

#from tensorflow.keras.preprocessing import timeseries_dataset_from_array
#from sklearn.model_selection import train_test_split, GridSearchCV
#from sklearn.metrics import r2_score


### Import Data <a class="anchor" id="Import-Data"></a>
___

In [2]:
# Read in data
df_lag = pd.read_csv('../data/clean/df_clean_lag.csv', index_col=0, parse_dates=True)

### Data Preparation <a class="anchor" id="Data-Preparation"></a>
___
In this project, we'll focus on three main algorithms types: Lasso Regression, XGBoost, and Neural Networks.  We'll need to prepare the data in slightly different ways for each of the these model types. Much of the preprocessing was already taken care of as part of the steps list above in [Data Cleaning](#Data-Cleaning). The remaining steps are model specific, and so are prepared below:<br><br>
**Lasso Regression**:<br>
* Encode the categorical features
* Remove mulitcolinearities

**Neural Networks**:<br>
* Encode the categorical features

**XGBoost**: All preprocessing steps were previously 
While not required, we'll scale the continuous features for neural networks. And finally for lasso regression, it would be wise to remove any multicolinearities in the data. In order to run each of these models, I'll copy the dataset and process

### Encode Catergorical Features

In [3]:
# Get Categorical columns
categorical = df_lag.select_dtypes(include='object')

# Instationate wind_dir_coder LabelEncoder, fit
wind_dir_coder = LabelEncoder()
wind_dir_coder.fit(df_lag['wind_madrid_lag'])

# Transform wind_direction cols
for col in categorical.filter(regex='wind').columns:
    df_lag[col] = wind_dir_coder.transform(df_lag[col])
    

# Stack condition columns into single col
stacked_conditions = categorical.filter(regex='condition').stack()

# Instantiate condition_coder LabelEncoder, fit on stacked conditions
condition_coder = LabelEncoder()
condition_coder.fit(stacked_conditions)

# Transform condition cols
for col in categorical.filter(regex='condition').columns:
    df_lag[col] = condition_coder.transform(df_lag[col])

In [4]:
# Get price components not to be used in modeling
price_cols = df_lag.filter(regex='price').columns.to_list()[1:]
price_cols.remove('price_day_ahead')


### Scaling Continuous Features
For neural networks, continuous features do not necessarily need to be scaled. However according to this [article](https://www.sciencedirect.com/science/article/pii/S030626191830196X#s0235), which uses neural networks to predict electrical prices, scaling your continuous features generally increases accuracy of deep learning models on the validation set.  We'll give it a go here and scale the the continuos features between [-1,1] for the neural networks we'll train later.

In [5]:
# Copy the dataframe for neural networks
df_nn = df_lag.drop(columns=price_cols).copy()
continuous = df_nn.select_dtypes(exclude='object').filter(regex='^(?!.*price).*').columns

# Get rid of negatives
time = dt.datetime(2021,3,24,22)
df_nn.loc[time, 'dew_point_bilbao_lag'] = impute_immediate_mean(df_nn['dew_point_bilbao_lag'], time)

# Rescale data [-1,1]
scaler = MinMaxScaler(feature_range=(-1, 1))
df_nn[continuous] = scaler.fit_transform(df_nn[continuous])

### Multicolinearity
One of the assumptions for regression is that featu
res do not contain multicolinearities.  In this section, we'll investigate the predictors and eliminate any multicolinearities in preparation for a lasso regression. We'll need to find out which features are correlated with each other, and remove some of them to rid our dataset of multicolinearities. The steps are outlined below:
1. Copy the dataframe, we'll modify this dataset for use in lasso regression
2. Get the correlations between predictors, sort them in descending order
3. Get the correlations between each individual predictor and the response variable
4. Get the features which have a correlation greater than 0.8, add the feature which correlates less with price_actual to the drop list
5. Drop the features in the drop list

In [6]:
# Copy dataset for lasso regression specific preparation
df_lr = df_lag.drop(columns=price_cols).copy()

# Create correlation matrix predictors to predictors
corr = df_lr.drop(columns='price_actual').corr().abs().stack().reset_index().sort_values(0, ascending=False)
corr.rename(columns={0:'cor'}, inplace=True)  # Rename correlation column
corr = corr.loc[corr['cor']!=1]  # remove correlations between same variables
corr.drop_duplicates(subset='cor', inplace=True) # remove duplicate correlations
corr.reset_index(drop=True, inplace=True) # Reset the index
corr.cor =corr.cor.apply(lambda x: round(x,3))  # Round

# Create correlation matrix predictors to response variable
corr_price = df_lr.corr()['price_actual'].reset_index().sort_values('price_actual', ascending=False)
corr_price = corr_price.loc[corr_price['price_actual']!=1] # remove correlations between same variables
corr_price.reset_index(drop=True, inplace=True)  # Reset the index


drop = []

# For each feature pair where corr > 0.8, add feature with lower corr to price_actual to drop list
for row in range(len(corr.loc[corr.cor>.8])):
    var1 = corr.loc[row,'level_0'] # Get var1 name
    var2 = corr.loc[row,'level_1'] # Get var2 name
    var1_corr = float(corr_price.loc[corr_price['index'] == var1, 'price_actual'])  # Get var1 corr
    var2_corr = float(corr_price.loc[corr_price['index'] == var2, 'price_actual'])  # Get var2 corr
    
    # Add the lower correlation to the drop list
    if var1_corr > var2_corr:
        drop.append(var2)
    else:
        drop.append(var1)
        
# Drop the features in the drop listi
df_lr.drop(columns=drop, inplace=True)

In [7]:
df_xg = df_lag.drop(columns=price_cols)

### Modeling `price_actual` <a class="anchor" id="Modeling-`price_actual`"></a>
___

Create results_actual dataframe to hold results

Add TSO (price_day_ahead) as benchmark prediction to beat

In [8]:
# Benchmark results
TSO_train = df_lag.loc[:'2019', 'price_day_ahead']
TSO_val = df_lag.loc['2020', 'price_day_ahead']

actual_train = df_lag.loc[:'2019', 'price_actual']
actual_val = df_lag.loc['2020', 'price_actual']

# Create dataframe
results_actual = pd.DataFrame(index=['Parameters','SMAPE_train', 'SMAPE_val', 'r2_train', 'r2_val'])

# Add the baseline TSO predictions
results_actual['TSO'] = ['None',
                                    round(sMAPE(actual_train, TSO_train), 3),
                                    round(sMAPE(actual_val, TSO_val), 3),
                                    round(r2(actual_train, TSO_train), 3), 
                                    round(r2(actual_val, TSO_val),3)]
results_actual

Unnamed: 0,TSO
Parameters,
SMAPE_train,16.03
SMAPE_val,16.922
r2_train,0.954
r2_val,0.971


In [10]:
X_train, y_train, X_val, y_val = split_data(df_lr, 2020, 'price_actual')
with open('../models/Lasso.pickle', 'rb') as file:
    lasso = pickle.load(file)
with open('../models/Lasso1.pickle', 'rb') as file:
    lasso1 = pickle.load(file)
with open('../models/Lasso2.pickle', 'rb') as file:
    lasso2 = pickle.load(file)
with open('../models/XGBoost.pickle', 'rb') as file:
    xg = pickle.load(file)
with open('../models/XGBoost1.pickle', 'rb') as file:
    xg1 = pickle.load(file)
with open('../models/XGBoost2.pickle', 'rb') as file:
    xg2 = pickle.load(file)
with open('../models/XGBoost3.pickle', 'rb') as file:
    xg3 = pickle.load(file)
nn1 = keras.models.load_model('../models/nn1', custom_objects={'SMAPE':SMAPE})
nn2 = keras.models.load_model('../models/nn2', custom_objects={'SMAPE':SMAPE})
nn3 = keras.models.load_model('../models/nn3', custom_objects={'SMAPE':SMAPE})
nn4 = keras.models.load_model('../models/nn4', custom_objects={'SMAPE':SMAPE})
nn5 = keras.models.load_model('../models/nn5', custom_objects={'SMAPE':SMAPE})
nn6 = keras.models.load_model('../models/nn6', custom_objects={'SMAPE':SMAPE})
    

results_actual['Lasso'] = compute_metrics(lasso, 'Vanilla', (X_train, y_train), (X_val, y_val))
results_actual['Lasso1'] = compute_metrics(lasso, {'num_features':5}, (X_train, y_train), (X_val, y_val))
X_train1, X_val1 = X_train.drop(columns='price_day_ahead'), X_val.drop(columns='price_day_ahead')
results_actual['Lasso2'] = compute_metrics(lasso2, {'price_day_ahead':False}, (X_train1, y_train), (X_val1, y_val))
X_train, y_train, X_val, y_val = split_data(df_xg, 2020, 'price_actual')
results_actual['XGBoost'] = compute_metrics(xg, 'Vanilla',(X_train, y_train), (X_val, y_val))
results_actual['XGBoost1'] = compute_metrics(xg1, {'max_depth':2},(X_train, y_train), (X_val, y_val))
X_train1, X_val1 = X_train.drop(columns='price_day_ahead'), X_val.drop(columns='price_day_ahead')
results_actual['XGBoost2'] = compute_metrics(xg2, {'max_depth':2,
                                                   'price_day_ahead':False},(X_train1, y_train), (X_val1, y_val))
results_actual['XGBoost3'] = compute_metrics(xg3,
                                             {'max_depth':16, 'price_day_ahead':False},
                                             (X_train1, y_train), 
                                             (X_val1, y_val))


X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')
results_actual['nn1'] = compute_metrics(nn1, '1-to-1', (X_train,y_train), (X_val, y_val))

X_train, y_train, X_val, y_val = split_data(df_nn.drop(columns='price_day_ahead'), 2020, 'price_actual')
results_actual['nn2'] = compute_metrics(nn2, '1-to-1, price_day_ahead:False', (X_train,y_train), (X_val, y_val))

X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')
X_train, y_train = resample((X_train, y_train), 24, 24, 24)
X_val, y_val = resample((X_val,y_val), 24, 24, 24)
results_actual['nn3'] = compute_metrics(nn3, '24-to-24', (X_train,y_train), (X_val, y_val))


X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')
X_train, y_train = resample((X_train, y_train), 24*7, 24, 24)
X_val, y_val = resample((X_val,y_val), 24*7, 24, 24)
results_actual['nn4'] = compute_metrics(nn4, 'LSTM, 7-day input', (X_train,y_train), (X_val,y_val))
results_actual['nn5'] = compute_metrics(nn5, 'LSTM, 7-day input', (X_train,y_train), (X_val,y_val))

In [11]:
X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')
X_train_lstm = X_train.filter(regex='lag')
X_train_dnn = X_train.drop(columns=X_train_lstm.columns)
X_val_lstm = X_val.filter(regex='lag')
X_val_dnn = X_val.drop(columns=X_val_lstm.columns)
X_train_dnn, y_train_dnn = resample((X_train_dnn, y_train), 24, 24, 24)
X_val_dnn, y_val_dnn = resample((X_val_dnn, y_val), 24, 24, 24)
X_train_lstm, y_train_lstm = resample((X_train_lstm, y_train), 24, 24, 24)
X_val_lstm, y_val_lstm = resample((X_val_lstm, y_val), 24, 24, 24)
results_actual['nn6'] = compute_metrics(nn6,
                                        'dnn-lstm, 1-day input',
                                        ([X_train_dnn, X_train_lstm],y_train),
                                        ([X_val_dnn, X_val_lstm],y_val))

In [12]:
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,"1-to-1, price_day_ahead:False",6.849,23.042,0.907,0.428


### Lasso Regression
I'll start with a simple Lasso Regression.  Lasso Regression is really just a a linear regression that introduces a penalty infront of each coefficient in the model. Lasso is well-suited for datasets with high multicolinearities since it automatically selects for one of the features in a a colinear pair. As part of the [preprocessing] for this model, I also remove features with high colinearity (>0.8). In fitting the below, I'll take the following steps:

* Split the data into training (2015-2019) and validation (2020)
* Fit a Vanilla lasso regression model with max_iter=10000 to make sure that the model converges.
* Compute the output and add it to the results table.

In [12]:
# Split the data
X_train, y_train, X_val, y_val = split_data(df_lr, 2020, 'price_actual')

# Instatiate and fit model on 
lasso = Lasso(max_iter=10000)
lasso.fit(X_train, y_train)

# Add results of vanilla lasso to dataframe
results_actual['Lasso'] = compute_metrics(lasso, 'Vanilla', (X_train, y_train), (X_val, y_val))
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,24-to-24,4.022,3.848,0.983,0.979


In [33]:
with open('../models/lasso_vanilla.pickle', 'rb') as file:
    lasso_test = pickle.load(file)

The vanilla model performed extremely well.  That's great, but it also isn't suprising given that we used the `price_day_ahead` as a predictor which has an r-squared value of 0.971 on the validation set.  The model outperformed the TSO predictions in SMAPE and even increased r-squared by a small margin. Increased r-squared must mean that some of the other features were important in our prediction of `price_actual`.  I plotted the coefficients for this vanilla model on the below barchart.

<img src="../images/lasso_feature_importance.png" style="width:700px;height:272px"/>

As expected, `price_day_ahead` dominates this model, though renewable generation and waste generation have an impact, negatively affecting the price (renewable and waste increase, results in price decrease).  Other than that, the other features have very little influence on the final price.  

**Recursive Feature Elimination**<br>
As part of the [Lasso](https://github.com/EvanHolder/capstone/blob/main/notebooks/LassoRegression.ipynb) notebook, I ran a recursive feature elimination to see how the model performs with varying amounts of features in the model.  I started with a single feature (`price_day_ahead`), trained a model, and computed its metrics. Then I iteratively added in the next most important feature, trained the new model, and computed its metrics.  This process was repeated until all features were added back into the training set and the metrics were plotted as below.

![RFE_LassoRegression](../images/RFE_LassoRegression.png)

As shown above, when trained on top five features, the model minimizes r-squared.  Below, I'll train the model on these top five features (`price_day_ahead`, `renewable_lag`,`waste_lag`,`oil_lag`,`humidities_seville_lag`).

In [31]:
# Important Features
train_cols = ['humidities_bilbao_lag', 'oil_lag', 'renewable_lag', 'waste_lag', 'price_day_ahead']

# Instatiate and fit model on 
lasso1 = Lasso(max_iter=10000)
lasso1.fit(X_train[train_cols], y_train)

# Add results of vanilla lasso to dataframe
results_actual['Lasso1'] = compute_metrics(lasso1, {'num_features':5}, (X_train[train_cols], y_train), (X_val[train_cols], y_val))

# Save model
with open('../models/Lasso1.pickle', 'wb') as f:
    pickle.dump(lasso1, f)
    
# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},4.134,8.354,0.954,0.971
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn_1-1,"{'Dense1': 59, 'Dense2': 239, 'Dense3': 1}",200.0,200.0,,
nn-24-24,"{'Dense1': 59, 'Dense2': 239, 'Dense3': 162, '...",3.871,3.738,0.981,0.978


Since Lasso1 includes only five predictors, it's not surprising that the model performance (SMAPE & r-squared) decreased from the vanilla model.  Performance decreased only marginally though.  Next, let's see just how well we can do without using `price_day_ahead`.

In [11]:
# Drop 'price_day_ahead'
X_train1, X_val1 = X_train.drop(columns='price_day_ahead'), X_val.drop(columns='price_day_ahead')

# Instatiate and fit model on 
lasso2 = Lasso(max_iter=10000)
lasso2.fit(X_train1, y_train)

# Add results of vanilla lasso to dataframe
results_actual['Lasso2'] = compute_metrics(lasso2, {'price_day_ahead':False}, (X_train1, y_train), (X_val1, y_val))

# Save Model
with open('../models/Lasso2.pickle', 'wb') as f:
    pickle.dump(lasso2, f)

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.367,5.056,0.971,0.969
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557


Lasso2 performed approximately half as well as the other two Lasso models (SMAPE, r-squared).  The difference in performance between these models indicates that we really need `price_day_ahead` to match the TSO's performance.

### XGBoost
XGBoost is the next choice in Machine Learning algorithms because of it's ability to learn non-linear decision boundaries.  So while we were unsuccessful in modeling without `price_day_ahead` with Lasso Regression, we'll give it another shot here with XGBoost.  XGBoost was chosen over Random Forest and other gradient boosted ensembles because it trains the fastest and generally performs better. The other great thing about XGBoost is it requires very little prepocessing. In fitting the below, I'll take the following steps:

* Split the data into training (2015-2019) and validation (2020)
* Fit a Vanilla XGBoostRegressor with random_state set to 17
* Compute the output and add it to the results table.

In [12]:
# Split the data
X_train, y_train, X_val, y_val = split_data(df_xg, 2020, 'price_actual')

# Instantiate and fit XGBRegressor
xg = XGBRegressor(random_state=17)
xg.fit(X_train, y_train)

# Compute sMAPE, r2 and add to the table
results_actual['XGBoost'] = compute_metrics(xg, 'Vanilla',(X_train, y_train), (X_val, y_val))

# Save Model
with open('../models/XGBoost.pickle', 'wb') as f:
    pickle.dump(xg, f)

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.367,5.056,0.971,0.969
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968


The vanilla model did beat the TSO performance (SMAPE , r-squared) but surprisingly did not outperform the vanilla Lasso model. The model does not appear too overfit on r2_val, but partly on SMAPE_val.  As part of the [XGBoost notebook](https://github.com/EvanHolder/capstone/blob/main/notebooks/XGBoost.ipynb) I tried to reduce this overfitting on by iteratively training XGBoost models and tuning the below parameters.

max_depth....................... [1,2,4,6,8,10,14,16,20],<br>
gamma............................. [n/10 for n in range(11)]<br>
min_child_weight.............. [1,2,4,8,16,32],<br>
subsample........................ [n/10 for n in range(0, 12, 2)],<br>
colsample_bytree............. [n/10 for n in range(0, 12, 2)],<br>
reg_alpha......................... [.001, .01, .1, .5, 1],<br>
reg_lambda...................... [.001, .01, .1, .5, 1]

Adjusting all of these parameters in their respective ranges, there really wasn't much improvement in either of the metrics sMAPE or r2. The one exception may be `max_depth` which did see minor improvement in SMAPE when reduced set to 2.

Below, I'll run another model ith `max_depth` set to 2 and add it to the results table.

Tuning these parameters did not change the metrics in a substantial way.  `Max_depth`, the most influential, made only a marginal difference SMAPE.  

In [13]:
# Split the data
X_train, y_train, X_val, y_val = split_data(df_xg, 2020, 'price_actual')

# Instantiate and fit XGBRegressor
xg1 = XGBRegressor(random_state=17, max_depth=2)
xg1.fit(X_train, y_train)

# Compute sMAPE, r2 and add to the table
results_actual['XGBoost1'] = compute_metrics(xg1, {'max_depth':2},(X_train, y_train), (X_val, y_val))

# Save Model
with open('../models/XGBoost1.pickle', 'wb') as f:
    pickle.dump(xg1, f)

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.367,5.056,0.971,0.969
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97


XGBoost1 does have a reduced SMAPE_val and increased r2_val, but does not outperform Lasso.  Let's next look at the feature importances from XGBoost1. I suspect that like Lasso, and given that a max tree depth of two is all we need for good performance, that `price_day_ahead` again dominates.

In [14]:
imp = pd.DataFrame({'importance':xg1.feature_importances_},
                   index=X_train.columns).sort_values(by='importance', ascending=False)
imp.head(10)

Unnamed: 0,importance
price_day_ahead,0.735356
biomass_lag,0.046353
renewable_lag,0.037915
load_forecast,0.035208
coal_lag,0.034028
waste_lag,0.020222
transmission_fs_lag,0.008342
dew_point_seville_lag,0.007706
solar_lag,0.006477
reservoir_lag,0.006204


As expected, the `price_day_ahead` has dominated.  Let's remove it and see how we do.

In [15]:
# Drop price_day_ahead
X_train1, X_val1 = X_train.drop(columns='price_day_ahead'), X_val.drop(columns='price_day_ahead')


# Instantiate and fit XGBRegressor
xg2 = XGBRegressor(random_state=17)
xg2.fit(X_train1, y_train)

# Compute sMAPE, r2 and add to the table
results_actual['XGBoost2'] = compute_metrics(xg2, {'max_depth':2,
                                                   'price_day_ahead':False},(X_train1, y_train), (X_val1, y_val))
# Save Model
with open('../models/XGBoost2.pickle', 'wb') as f:
    pickle.dump(xg2, f)

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.367,5.056,0.971,0.969
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427


Again without our most important feature, the model performance has essentially halved (SMAPE and r2).  I further tuned XGBoost trained without `price_day_ahead` but the peformance did not improve substantially. Increasing tree depth was the only parameter that seemed to increase performance.  Below is another list of the tuned parameters, see the [notebook](https://github.com/EvanHolder/capstone/blob/main/notebooks/XGBoost.ipynb) for the plots displaying each model, and how it performed with the below list of tuned parameters.

max_depth....................... [1,2,4,6,8,10,14,16,20],<br>
gamma............................. [n/10 for n in range(11)]<br>
min_child_weight.............. [1,2,4,8,16,32],<br>
subsample........................ [n/10 for n in range(0, 12, 2)],<br>
colsample_bytree............. [n/10 for n in range(0, 12, 2)],<br>
reg_alpha......................... [.001, .01, .1, .5, 1],<br>
reg_lambda...................... [.001, .01, .1, .5, 1]

Lastly, I'll fit an XGBoost model with a max_depth of 16 and add it to the table

In [16]:
# Instantiate and fit XGBRegressor
xg3 = XGBRegressor(random_state=17, max_depth=16)
xg3.fit(X_train1, y_train)

# Compute sMAPE, r2 and add to the table
results_actual['XGBoost3'] = compute_metrics(xg3,
                                             {'max_depth':16, 'price_day_ahead':False},
                                             (X_train1, y_train), 
                                             (X_val1, y_val))
# Save Model
with open('../models/XGBoost3.pickle', 'wb') as f:
    pickle.dump(xg3, f)

#Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.367,5.056,0.971,0.969
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403


### Neural Networks (1-to-1) <a class="anchor" id="Neural-Networks (1-to-1)"></a>
The last class of alorithms to try and fit is neural nets.  These may fit the model better since they have the ability to find non-linear patterns through the hidden layers of the network.  In addition, they more readily take in sequence data. The intuition is that while our features may not have a direct relationship with the actual price, they may nudge the price in a certain direction over the course of time. To start off, we'll set up a simple 1-to-1 (one input, one output) model and see how that performs. The steps to fit this model are:
* Split the data into training (2015-2019) and validation (2020)
* Set the input_shape
* Establish network architecture
* Compute the output and add it to the results table.

In [13]:
# Split Data
X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')

# Define input_shape
input_shape = (X_train.shape[1],)

# Instantiate model and build layers
nn = models.Sequential()
nn.add(layers.Dense(59, activation='relu', input_shape=input_shape))
nn.add(layers.Dense(239, activation='relu'))
nn.add(layers.Dense(162, activation='relu'))
nn.add(layers.Dense(1, activation='relu'))

# Compile and Fit
nn1 = compile_fit(nn, (X_train,y_train), (X_val, y_val), patience=10,
                  loss = tf.keras.metrics.mean_absolute_error)

# Compute metrics, add to table
results_actual['nn1'] = compute_metrics(nn1, '1-to-1', (X_train,y_train), (X_val, y_val))

# Save Model
nn1.save('../models/nn1')

# Preview
results_actual.T

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
INFO:tensorflow:Assets written to: ../models/nn1\assets


Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn-24-24,"{'Dense1': 59, 'Dense2': 239, 'Dense3': 162, '...",3.871,3.738,0.981,0.978
LSTM,"{'LSTM1': 60, 'LSTM2': 24, 'TimeDistributed': ...",8.805,10.646,0.787,0.804


The first neural network is our best yet outperforming both Lasso and XGBoost (SMAPE and r2). This is good news, and bodes well for the more complicated neural networks we'll set up soon. Before that, let's run another model without `price_day_ahead`

In [16]:
# Split Data
X_train, y_train, X_val, y_val = split_data(df_nn.drop(columns='price_day_ahead'), 2020, 'price_actual')

# Define input_shape
input_shape = (X_train.shape[1],)

# Instantiate model and build layers
nn = models.Sequential()
nn.add(layers.Dense(59, activation='relu', input_shape=input_shape))
nn.add(layers.Dense(239, activation='relu'))
nn.add(layers.Dense(162, activation='relu'))
nn.add(layers.Dense(1, activation='relu'))

# Compile and Fit
nn2 = compile_fit(nn, (X_train,y_train), (X_val, y_val), patience=10,
                  loss = tf.keras.metrics.mean_absolute_error)

# Compute metrics, add to table
results_actual['nn2'] = compute_metrics(nn2, '1-to-1, price_day_ahead:False', (X_train,y_train), (X_val, y_val))

# Save Model
nn2.save('../models/nn2')

# Preview
results_actual.T

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
INFO:tensorflow:Assets written to: ../models/nn2\assets


Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,"1-to-1, price_day_ahead:False",6.849,23.042,0.907,0.428


As expected performance decreased (SMAPE and r2).

### Neural Networks (24 to 24)
The next network will be a little more complicated by using sequences to predict another sequence. We'll reshape the input and output into 24 hour sequences, and 1825 batches. The network architecture change slightly too. I'll wrap the output Dense layer in a TimeDistributed wrapper.  This will transform my output into a vector of length 24, to match up with the output sequences of length 24. The steps to fit this model are the same as the 1-to-1 network with an extra step to process the input/output data.

In [14]:
results_actual.drop(columns='nn-24-24', inplace=True)

In [14]:
# Split Data
X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')

# Reorganize the training and testing data into batches
X_train, y_train = resample((X_train, y_train), 24, 24, 24)
X_val, y_val = resample((X_val,y_val), 24, 24, 24)

# Define input_shape
input_shape = (X_train.shape[1], X_train.shape[2])

# Instantiate model and build layers
nn = models.Sequential()
nn.add(layers.Dense(59, activation='relu', input_shape=input_shape))
nn.add(layers.Dense(239, activation='relu'))
nn.add(layers.Dense(162, activation='relu'))
nn.add(TimeDistributed(layers.Dense(1)))

# Compile and Fit
nn3 = compile_fit(nn, (X_train, y_train), (X_val, y_val))

# Compute metrics and add to table
results_actual['nn3'] = compute_metrics(nn3, '24-to-24', (X_train,y_train), (X_val, y_val))

# Save model
nn3.save('../models/nn3')

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,"1-to-1, price_day_ahead:False",6.849,23.042,0.907,0.428


Great, we have again improved (be it slightly) in SMAPE_val and r2_val.

### LSTM Neural Network
Long-Short-Term-Memory neural networks are a type of recurrent neural network that have gates that allow the network to "remember" and "forget" information from a specified input window.  In this way, the model is able to better estimate time-dependent output sequences. Since much of the dataset are time-series sequences (including price_actual) it is possible that an LSTM will be able to pick up on time-dependent relationships that our other alogorithms and networks could not. The process for setting up this network is the same as the previous.  Architecture willl be change to include two LSTM layers, and a repeat vector layer. 

In [15]:
# Split the data into train and validation
X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')

# Reorganize the training and testing data into batches
X_train, y_train = resample((X_train, y_train), 24*7, 24, 24)
X_val, y_val = resample((X_val,y_val), 24*7, 24, 24)

# Input Shape
input_shape = (X_train.shape[1], X_train.shape[2])

# Instantiate model and build layers
nn = models.Sequential()
nn.add(layers.LSTM(60, activation='tanh', input_shape=input_shape))
nn.add(layers.RepeatVector(y_train.shape[1]))
nn.add(layers.LSTM(24, activation='tanh', return_sequences=True))
nn.add(TimeDistributed(layers.Dense(1)))

# Compile Fit
nn4 = compile_fit(nn, (X_train, y_train), (X_val, y_val))

# Compute metrics, add to table
results_actual['nn4'] = compute_metrics(nn4, 'LSTM, 7-day input', (X_train,y_train), (X_val,y_val))

# Save Model
nn4.save('../models/nn4')

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,"1-to-1, price_day_ahead:False",6.849,23.042,0.907,0.428


While the model trained and did not overfit, it did not outpeform the previous models.  Just once, I'll try to the number of nodes in the first LSTM layer to see if that makes a difference.

In [16]:
# Input Shape
input_shape = (X_train.shape[1], X_train.shape[2])

# Instantiate model and build layers
nn = models.Sequential()
nn.add(layers.LSTM(83, activation='tanh', input_shape=input_shape))
nn.add(layers.RepeatVector(y_train.shape[1]))
nn.add(layers.LSTM(24, activation='tanh', return_sequences=True))
nn.add(TimeDistributed(layers.Dense(1)))

# Compile Fit
nn5 = compile_fit(nn, (X_train, y_train), (X_val, y_val))

# Compute metrics, add to table
results_actual['nn5'] = compute_metrics(nn5, 'LSTM, 7-day input', (X_train,y_train), (X_val,y_val))

# Save Model
nn5.save('../models/nn5')

# Preview
results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,"1-to-1, price_day_ahead:False",6.849,23.042,0.907,0.428


Unsurprisingly, the addition of a few extra nodes in the first layer did not change the performance of the LSTM.

### DNN-LSTM
The motivation for this model comes from this [article](https://www.sciencedirect.com/science/article/pii/S030626191830196X#s0220). The idea is that time-dependent features (values from the past) can be modeled by an LSTM, but if the data represetns a specific property associated with the day ahead (forecast data) then it cannot be modeled as a time sequence.  In this case, all the columns in the dataset with `_lag` suffix are past values, while all other columns represent future predictions.  With this distinction, I'll set up a combined DNN-LSTM model:

**DNN**
* Single best non-lstm neural network
* Trained on forecast columns only

**LSTM**
* Single best lstm neural network
* Trained on the `_lag` columns

**Ensemble**<br>
In order to combine these network, I created the function [`ensemble_nn`](https://github.com/EvanHolder/capstone/blob/main/scripts/functions.py) which takes in a list of models. For each model in the list, the function changes the model's layers untrainable.  Once complete, the function sets up another mini neural network.  The architecture is the concatenated output of each model, then hidden dense layer with 24 nodes, and finally a TimeDistributed dense layer as the final output.  Once the dnn and lstm are trained on their respective columns, we can call the `ensemble_nn` function to instantiate it.  All that's left to do is fit the resulting ensemble on the data and compute the results.

In the Neural Networks [notebook](https://github.com/EvanHolder/capstone/blob/main/notebooks/NeuralNets.ipynb), I played around with both the dnn and lstm architecture input windows. In the end, this tuning did not prove to impact either the dnn or lstm significantly.

In [30]:
# Split the data into train and validation
X_train, y_train, X_val, y_val = split_data(df_nn, 2020, 'price_actual')

# Get the x cols for lstm network, lagged cols
X_train_lstm = X_train.filter(regex='lag')
X_train_dnn = X_train.drop(columns=X_train_lstm.columns)

# Get the x cols for dnn network, forecast cols
X_val_lstm = X_val.filter(regex='lag')
X_val_dnn = X_val.drop(columns=X_val_lstm.columns)

# Reorganize the training and testing data into batches
X_train_dnn, y_train_dnn = resample((X_train_dnn, y_train), 24, 24, 24)
X_val_dnn, y_val_dnn = resample((X_val_dnn, y_val), 24, 24, 24)

# LSTM
X_train_lstm, y_train_lstm = resample((X_train_lstm, y_train), 24, 24, 24)
X_val_lstm, y_val_lstm = resample((X_val_lstm, y_val), 24, 24, 24)

# Instantiate, compiled and fit dnn
input_shape = (X_train_dnn.shape[1], X_train_dnn.shape[2])
nn = models.Sequential()
nn.add(layers.Dense(59, activation='relu', input_shape=input_shape))
nn.add(layers.Dense(239, activation='relu'))
nn.add(layers.Dense(162, activation='relu'))
nn.add(TimeDistributed(layers.Dense(1)))
dnn = compile_fit(nn, (X_train_dnn, y_train_dnn), (X_val_dnn, y_val_dnn))


# Instantiate, compiled and fit lstm
input_shape = (X_train_lstm.shape[1], X_train_lstm.shape[2])
nn = models.Sequential()
nn.add(layers.LSTM(83, activation='tanh', input_shape=input_shape))
nn.add(layers.RepeatVector(y_train_lstm.shape[1]))
nn.add(layers.LSTM(24, activation='tanh', return_sequences=True))
nn.add(TimeDistributed(layers.Dense(1)))
lstm = compile_fit(nn, (X_train_lstm, y_train_lstm), (X_val_lstm, y_val_lstm))

# Create ensemble to combine dnn and lstm, compile and fit
LSTM_DNN = ensemble_nn([dnn, lstm])
nn6 = compile_fit(LSTM_DNN, ([X_train_dnn, X_train_lstm], y_train_dnn), ([X_val_dnn, X_val_lstm], y_val_dnn))

# Compute metrics, add to table
results_actual['nn6'] = compute_metrics(nn6,
                                        'dnn-lstm, 1-day input',
                                        ([X_train_dnn, X_train_lstm],y_train),
                                        ([X_val_dnn, X_val_lstm],y_val))

# Save Model
nn6.save('../models/nn6')

results_actual.T

Unnamed: 0,Parameters,SMAPE_train,SMAPE_val,r2_train,r2_val
TSO,,16.03,16.922,0.954,0.971
Lasso,Vanilla,3.021,5.869,0.977,0.973
Lasso1,{'num_features': 5},3.021,5.869,0.977,0.973
Lasso2,{'price_day_ahead': False},11.811,32.664,0.676,0.557
XGBoost,Vanilla,1.248,6.668,0.996,0.968
XGBoost1,{'max_depth': 2},2.465,5.85,0.984,0.97
XGBoost2,"{'max_depth': 2, 'price_day_ahead': False}",4.331,27.241,0.953,0.427
XGBoost3,"{'max_depth': 16, 'price_day_ahead': False}",0.026,24.84,1.0,0.403
nn1,1-to-1,2.468,3.946,0.983,0.979
nn2,24-to-24,4.022,3.848,0.983,0.979


Looking at the table above 

## Results
___

## Conclusion
___

## Next Steps
___