In [14]:
##############################################
# Notebook: RNN for TIME SERIES PREDICTION   #
##############################################
# Author: Alejandro Benjamin Jimenez Pnata   #
# Version: 1.0                               #
# Git: alex200420                            #
# Python_version: 3.6                        #
##############################################

# RNN FOR TIME SERIES PREDICTION

<img src = 'https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2018/11/Sliding-Window-Approach-to-Modeling-Time-Series.png' ></img>

Process
====================

Data preparation
---------------------

Whether the time series has a single input/output or multiple inputs/outputs, 
the data needs some preparation before training.

### Keynotes

> It helps to think of the data as a sensor log: each record (taken at a certain time) provides "metrics", i.e. inputs to consider. In that sense, you can treat multiple time series as inputs, where each time series is one metric recorded by the sensor.
> 
> Remember to be consistent in the time-steps between records: the spacing between consecutive records should stay the same. If it does not, the data must be reworked so that the records are equally spaced in time.
>
> ## Remain consistent
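
One way to enforce equal spacing is to resample the records onto a fixed time grid. The block below is only a minimal sketch with pandas: the column names (`timestamp`, `metric_1`, `metric_2`), the 1-minute grid and the readings themselves are made-up assumptions, and linear interpolation is just one possible way to fill the gaps (forward-filling or dropping incomplete steps are alternatives).

```python
import pandas as pd

# Hypothetical sensor records with an irregular gap: there is no 00:02 record.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-01 00:01", "2020-01-01 00:03"]
        ),
        "metric_1": [201.7, 95.7, 120.0],
        "metric_2": [101.5, 33.2, 40.0],
    }
)

# Put the records on a fixed 1-minute grid; steps without data become NaN rows.
regular = raw.set_index("timestamp").resample("1min").mean()

# Fill the missing step by linear interpolation (an assumption, see above).
regular = regular.interpolate(method="linear")

# Replace the timestamps with a consecutive integer time-step column.
regular = regular.reset_index(drop=True)
regular.insert(0, "time_step", list(range(1, len(regular) + 1)))

print(regular)
```

After this step every row is exactly one time-step away from the previous one, which is the structure the windowing stage relies on.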

**Output**

After all the preprocessing, the output should be a dataframe/numpy array with a structure similar to the following (**with 2 metrics**): 

| Time-step | Metric-1| Metric-2 |
|-----------|---------|----------|
|   1       |  201.7  |  101.5   |
|   2       |  95.7   |  33.2    |

In [15]:
#<YOUR CODE HERE>##

Preparing input and output
---------------------

Once the data is preprocessed for each time-step, we need to
prepare the X input and the Y output, so as to predict the next value(s) of the time series.

The procedure is as follows:

### Procedure
> ### Remember to first SORT BY TIME_STEP
> For each time-step $i$, compute an array that contains the data from $${i - \text{windowsize}} : {i}$$ The last step, ${i}$, is the one used to obtain the output for that array of inputs. Each input therefore has $$\text{singleinputshape} = (\text{windowsize}, \text{numfeatures})$$ which gives, for ${m}$ samples of data, $$\text{inputshape} = (m, \text{windowsize}, \text{numfeatures})$$
> 
> If several different outputs need to be predicted at once, extract all of them in the same pass that builds the array described above.
>
> For better control over the train/test sets, it's recommended to label each row of the dataset (train or test) beforehand and, after the $X$ and $Y$ processing, use that label to split them into $X_{train}, X_{test}, Y_{train}, Y_{test}$
>
> Finally, values should be **NORMALIZED** before entering the model, so as to avoid exploding-gradient problems caused by a very large loss 
>
> #### Some code for reference is provided below

### Sample code


```python
from tqdm import tqdm # recommended to keep track of process-time per iteration
import sys
import warnings
import numpy as np
from sklearn.preprocessing import StandardScaler
warnings.filterwarnings("ignore") # silences the warning raised when the input dtype is converted to float

def X_to_LSTM_ready(X_unpr, window_size = 5):
    '''
    Takes as input a dataframe.values array and outputs
    the stacked windows for a given window_size,
    with shape (n_rows - window_size + 1, window_size, n_columns)
    param:: window_size default value: 5
    param:: X_unpr the array obtained from DataFrameObject.values
    '''
    l = []
    for ind in range(0,int(X_unpr.shape[0])- window_size + 1):
        l.append(X_unpr[ind:ind + window_size])
    X = np.array(l)
    return X

df['TrainorTest'] = #define TrainorTest according to index values for sklearn.train_test_split on df

#=============================== OBTAINING X AND Y ================================#

x_features= #as defined
reference_features = #as defined including "TrainorTest"
window_size = #as defined

x_ls = []
y_ls = []
for mat in tqdm(df["column_as_unique_identifier"].unique()):
    '''
    This FOR loop SHOULD ONLY be used when a certain column or identifier splits the dataframe into groups with largely different behaviors. For example, one client behaves differently from another client, or one product sells differently from another one. For a single, direct time series it is usually not necessary.
    '''
    
    tmp_df = df[df["column_as_unique_identifier"] == mat].sort_values('date_column', ascending = True)
    features = tmp_df[x_features + reference_features].copy()
    feat_values = features.values
    
    feats = X_to_LSTM_ready(feat_values, window_size)

    tmp_x = feats[:,:,:-len(reference_features)] # drops the reference columns (in the original use case: week and quantity sold)
    
    tmp_y = feats[:,:,-len(reference_features):][:,-1] #obtain last value from the obtained array
    tmp_y = tmp_y.reshape(len(tmp_y), len(reference_features))
    
    x_ls.append(tmp_x)
    y_ls.append(tmp_y)

X = np.concatenate(x_ls)
y = np.concatenate(y_ls)

#=================OBTAINING X_TRAIN, X_TEST, Y_TRAIN, Y_TEST============================#

cond_test = y[:, '#index for TrainorTest'] == 'Test'
cond_train = y[:,'#index for TrainorTest'] == 'Train'

X_train = X[cond_train,:,:]
X_test = X[cond_test,:,:]

y_train =  y[cond_train, '#indexes needed to output' ]
#y_train = y_train.reshape(len(y_train),1) #in case of single output

y_test =  y[cond_test, '#indexes needed to output']
#y_test = y_test.reshape(len(y_test),1) # in case of single output

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

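#=======================NORMALIZING INPUT================================#
## necessary to train a neural network and avoid exploding gradients caused by a very large loss
## one scaler per window position: fitted on the train set only, then reused on the test set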
scalers = {}

for i in range(X_train.shape[1]):
    scalers[i] = StandardScaler()
    X_train[:, i, :] = scalers[i].fit_transform(X_train[:, i, :]) 

for i in range(X_test.shape[1]):
    X_test[:, i, :] = scalers[i].transform(X_test[:, i, :])
    
    
## defining parameters for training
batch_size = # as defined (used 100 for 13000 inputs, the value of batch_size should increase if more data is provided so as to avoid under-fitting and training-low-speeds)

max_train = len(X_train)//batch_size*batch_size

X_train_ready = X_train[:max_train,:,:]
y_train_ready = y_train[:max_train,:]

max_test = len(X_test)//batch_size*batch_size

X_test_ready = X_test[:max_test,:,:]
y_test_ready = y_test[:max_test,:]

y_test_ready_out = y_test_ready[:,-1].reshape(len(y_test_ready),1)
y_train_ready_out = y_train_ready[:,-1].reshape(len(y_train_ready),1)

```
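
To make the shapes in the procedure concrete, here is a small self-contained sketch on toy data. It is not the template above: `make_windows`, the toy array, the position of the target column (last) and the 3/1 chronological split are all assumptions chosen for illustration, and the standardization is written with plain NumPy (mean and standard deviation taken from the train windows only) instead of `StandardScaler`.

```python
import numpy as np

def make_windows(values, window_size):
    """Stack every run of `window_size` consecutive rows into one 3-D array."""
    n_windows = values.shape[0] - window_size + 1
    return np.array([values[i:i + window_size] for i in range(n_windows)])

# Toy data, already sorted by time: 6 time-steps,
# 2 input features plus 1 target column (the last one).
data = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 0.0],
    [4.0, 40.0, 1.0],
    [5.0, 50.0, 1.0],
    [6.0, 60.0, 0.0],
])

window_size = 3
windows = make_windows(data, window_size)   # shape (4, 3, 3)

X = windows[:, :, :-1]    # inputs: (m, window_size, num_features) = (4, 3, 2)
y = windows[:, -1, -1:]   # target taken at the last step of each window: (4, 1)

# Chronological split: the first 3 windows train, the last one tests.
n_train = 3
X_train, X_test = X[:n_train].copy(), X[n_train:].copy()
y_train, y_test = y[:n_train], y[n_train:]

# Standardize each window position using statistics from the train set only,
# then apply the same statistics to the test set.
for t in range(window_size):
    mean = X_train[:, t, :].mean(axis=0)
    std = X_train[:, t, :].std(axis=0)
    X_train[:, t, :] = (X_train[:, t, :] - mean) / std
    X_test[:, t, :] = (X_test[:, t, :] - mean) / std

print(X.shape, y.shape, X_train.shape, X_test.shape)
```

With 6 rows and a window of 3 there are 6 - 3 + 1 = 4 samples, which is the `(m, windowsize, numfeatures)` layout an LSTM layer expects.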

In [16]:
 ##<YOUR CODE HERE>##

Model training and predict
---------------------

Now to the interesting part! Training the model.
* First read about LSTMs and how they work; an example is provided below for guidance.

* There are a lot of things to consider in this part, each of which is well documented. I would suggest reviewing topics such as: fine-tuning models, techniques to avoid overfitting, Keras callbacks to keep the best model, and the bias-variance balance.

### Keynotes
> ### Orthogonality
> One of the objectives with any neural network is to keep the concept of orthogonality in mind. What does this mean? Remember how on a remote control each button changes only one thing on the TV? You can change the channel, or you can change the volume, each independently of the other. Neural networks don't quite work that way: a single change can affect several things at once. So, in order to find the best model, always be careful about what you change and keep a record of the changes you make.
>
>
> ### Bias and Variance
>
> Bias is basically how far my ${training_{accuracy}}$ is from what I actually want, ${bayes_{accuracy}}$, and variance is how far my ${test_{accuracy}}$ is from my ${training_{accuracy}}$. There are different solutions for each problem (to keep orthogonality).
>
> **In general terms**, a bias problem can be addressed with a deeper or more complex network (or longer training), whereas a variance problem is addressed with regularization or a bigger dataset. Sometimes, though, the problem is that the test_set has a different nature from the training_set, so keeping train and test sets of the same nature is also important.
>
> **Identify if the problem is a classification or regression problem** and choose the loss function and **Last Layer OUTPUT function accordingly**
>
> Choose a loss function that relates to the objective of the problem.
>
>**Lastly,**
> ### Remember that with enough epochs a model can completely overfit the train_set. The goal is a model that generalizes, and the way to check that is to see that the variance also decreases!
>
> ### Always solve the bias problem before the variance problem, and iterate!
>
> Have Fun!!
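
The bias and variance definitions above reduce to two subtractions, so they are easy to track after every experiment. The helper below is only an illustrative sketch: `diagnose` is a hypothetical name, the accuracies in the usage lines are made-up numbers, and the rule "work on whichever gap is larger, bias first on ties" is a simplification of the advice in the keynotes.

```python
def diagnose(train_acc, test_acc, bayes_acc=1.0):
    """Return (bias, variance, focus) from accuracies given in [0, 1].

    bias     = bayes_acc - train_acc  (how far training is from the target)
    variance = train_acc - test_acc   (how far test is from training)
    focus    = which of the two gaps to work on first
    """
    bias = bayes_acc - train_acc
    variance = train_acc - test_acc
    focus = "bias" if bias >= variance else "variance"
    return bias, variance, focus

# Made-up accuracies for one experiment.
bias, variance, focus = diagnose(train_acc=0.80, test_acc=0.78, bayes_acc=0.95)
print(f"bias={bias:.2f} variance={variance:.2f} -> work on {focus} first")
```

Logging these two numbers next to each change made to the model gives the record of experiments that the orthogonality note asks for.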

## Sample code

```python

## This example is done with keras, as it's the simplest way to build an LSTM net

import keras
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
import numpy as np
import keras.backend as tf
from keras import optimizers
## model-related-parameters
dropout_rate = .15
batch_size = #defined in the previous example given that it was necessary for processing the data
## defining model structure
adam_opt = keras.optimizers.Adam(lr=1e-5) # newer Keras versions name this argument learning_rate

model = Sequential()
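# batch_input_shape = (batch_size, window_size, num_features); in this example window_size = 3 and num_features = 7
model.add(LSTM(60, batch_input_shape=(batch_size, 3, 7), stateful=True, return_sequences = True))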
model.add(LSTM(30, stateful=True, return_sequences = False))
model.add(Dense(40, activation = 'relu'))
model.add(Dropout(dropout_rate))
model.add(Dense(20, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid')) # choose an appropriate output function depending on whether it's a classification or regression problem
model.compile(loss= 'binary_crossentropy', optimizer= adam_opt, metrics = ['acc']) ## choose an appropriate loss function depending on the objective of the problem; remember that the metric 'acc' is NOT MEANINGFUL IN REGRESSION PROBLEMS, where RMSE, MSE or MAE are suggested instead
##------##
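# note: with stateful=True the LSTM state is carried over from one batch to the next, so shuffle=False is the usual choice when batches are meant to be consecutive
model.fit(X_train_ready, y_train_ready_out, validation_data = (X_test_ready, y_test_ready_out), epochs=50, batch_size=batch_size, verbose= True, shuffle=True)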

y_test_predicted = model.predict(X_test_ready, batch_size = batch_size)

##### IF THE EXAMPLE IS A BINARY-CLASSIFICATION PROBLEM YOU MAY CONSIDER THE FOLLOWING ######
## CALCULATING ROC-AUC FOR PREDICTED VALUES ####

# # roc curve and auc
# from sklearn.datasets import make_classification
# from sklearn.linear_model import LogisticRegression
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import roc_curve
# from sklearn.metrics import roc_auc_score
# from matplotlib import pyplot
# # generate 2 class dataset
# # calculate scores
# lr_auc = roc_auc_score(list(y_test_ready_out.reshape(len(y_test_ready_out),)), list(y_test_predicted.reshape(len(y_test_predicted),)))
# # summarize scores
# print('Logistic: ROC AUC=%.3f' % (lr_auc))
# # calculate roc curves
# lr_fpr, lr_tpr, _ = roc_curve(list(y_test_ready_out.reshape(len(y_test_ready_out),)), list(y_test_predicted.reshape(len(y_test_predicted),)))
# # plot the roc curve for the model
# pyplot.plot(lr_fpr, lr_tpr, marker='.', label='Logistic')
# # axis labels
# pyplot.xlabel('False Positive Rate')
# pyplot.ylabel('True Positive Rate')
# # show the legend
# pyplot.legend()
# # show the plot
# pyplot.show()
# model.compile(loss= 'mae', optimizer= 'adam', metrics = ['mse']) # regression alternative: 'acc' is not meaningful there
```
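
For regression problems, where accuracy is not meaningful, the suggested metrics (MSE, RMSE, MAE) can be computed directly from the predictions. The sketch below uses plain NumPy; `regression_metrics` is a hypothetical helper and the two small arrays are made-up values standing in for the true targets and the output of `model.predict`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for two arrays with the same number of elements."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    mse = float(np.mean(errors ** 2))
    return {
        "mse": mse,                            # mean squared error
        "rmse": float(np.sqrt(mse)),           # same units as the target
        "mae": float(np.mean(np.abs(errors))), # mean absolute error
    }

# Made-up targets and predictions, shaped (n_samples, 1) like the model output.
y_true = np.array([[3.0], [5.0], [2.0]])
y_pred = np.array([[2.0], [5.0], [4.0]])

scores = regression_metrics(y_true, y_pred)
print(scores)
```

RMSE and MAE are expressed in the units of the target, which makes them easier to relate to the objective of the problem than the raw loss value.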

In [17]:
#< YOUR CODE HERE >#