# Implementing an RNN for sequential data processing

The previous labs focused on image recognition. CNNs are well suited to that task because they capture spatial information. But what happens when the data is sequential? We need a model that can capture dependencies through time or space, and that is what RNNs are designed for.

In [1]:
## Setup the imports 
import numpy as np 
import matplotlib.pyplot as plt
import csv

## Data preparation

This time the dataset is a weather dataset. You have two CSV files (train and test data) with daily climate measurements over four years. Each data point has five features: the date, the mean temperature, the humidity, the wind speed, and the mean pressure.

You will select one of these features (other than the date) as the target to predict, and use the remaining features as the input data.

In [9]:
## First let's load the data (newline="" is what the csv module docs recommend)
with open("DailyDelhiClimateTrain.csv", newline="") as f:
    reader = csv.reader(f)
    data = list(reader)

field_names = data[0]

print(field_names)
print(data[1])

['date', 'meantemp', 'humidity', 'wind_speed', 'meanpressure']
['2013-01-01', '10.0', '84.5', '0.0', '1015.6666666666666']


Because the CSV reader returns every value as a string, we need to convert the data into numeric arrays that the model can work with. The following code does this.

In [11]:
## the dates are strings; since there is one row per day, we simply replace them with an increasing integer index (0, 1, 2, ...):
timestamp = np.arange(len(data[1:]))
print(timestamp)

## the other values are float values that have been read as strings :
meantemp = np.array([float(d[1]) for d in data[1:]])
print(meantemp)

humidity = np.array([float(d[2]) for d in data[1:]])
print(humidity)

windspeed = np.array([float(d[3]) for d in data[1:]])
print(windspeed)

meanpressure = np.array([float(d[4]) for d in data[1:]])
print(meanpressure)

[   0    1    2 ... 1459 1460 1461]
[10.          7.4         7.16666667 ... 14.0952381  15.05263158
 10.        ]
[ 84.5         92.          87.         ...  89.66666667  87.
 100.        ]
[0.         2.98       4.63333333 ... 6.26666667 7.325      0.        ]
[1015.66666667 1017.8        1018.66666667 ... 1017.9047619  1016.1
 1016.        ]


The training data is now loaded. Next we need a dataset so that the data can be accessed conveniently. Write the `ClimateDataset` class such that accessing `ds[i]` returns the input data and the target at timestamp `i`.

The dataset must distinguish between train and test data.

You can choose which feature is the target to predict.

IMPORTANT: you must be able to get a `sequence` of data, not just a single data point. This time you should therefore implement batch loading, so that sequences of arbitrary length can be extracted.

In [None]:
class ClimateDataset :
    def __init__(self, is_train) -> None:
        pass

    @property
    def data(self):
        pass

    @property
    def target(self):
        pass

    def __getitem__(self, index):
        return 0
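A minimal sketch of the dataset interface, assuming the four numeric columns have already been stacked into one `(T, 4)` array (for example with `np.column_stack([meantemp, humidity, windspeed, meanpressure])`) and that the train/test distinction is handled by building one instance from each CSV file instead of through the `is_train` flag. The class name `ClimateSequenceDataset`, the `target_col` argument and the `windows` helper are illustrative choices, not the API the lab requires.

```python
import numpy as np


class ClimateSequenceDataset:
    """Sketch: wraps a (T, F) array of daily measurements (hypothetical class).

    Column `target_col` is the value to predict; all other columns are inputs.
    """

    def __init__(self, features, target_col):
        features = np.asarray(features, dtype=float)
        self._target = features[:, target_col]
        # every column except the target becomes an input feature
        self._data = np.delete(features, target_col, axis=1)

    @property
    def data(self):
        return self._data

    @property
    def target(self):
        return self._target

    def __len__(self):
        return len(self._target)

    def __getitem__(self, index):
        # NumPy indexing handles both ds[i] (one time step)
        # and ds[i:j] (a sequence of j - i consecutive time steps)
        return self._data[index], self._target[index]

    def windows(self, seq_len, stride=1):
        """Yield (inputs, targets) for every complete window of length seq_len."""
        for start in range(0, len(self) - seq_len + 1, stride):
            yield self[start:start + seq_len]
```

With `target_col=0` on the stacked array above, `ds[10:40]` would return 30 consecutive days of humidity, wind speed and pressure together with the matching mean temperatures. Delegating to NumPy indexing means single steps and sequences share one code path.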

### Data preprocessing

This time we are not dealing with image pixels but with recorded real-world measurements. As you may have noticed, these values vary greatly in scale: some are small numbers (below 10 in the rows printed above), while the pressure is around 1000. We have seen in previous labs that neural networks are sensitive to the scale of their inputs, so we need to normalize every value to the range $[0, 1]$. As before, we use min-max scaling:

$$ x' = \frac{x - \min(x)}{\max(x)-\min(x)} $$

However, this time we have a `regression` problem and not a classification problem, so we also need to scale the `target`. This means you must keep track of $\min$ and $\max$ so that predictions can be mapped back to the original units with $x = x'\,(\max(x)-\min(x)) + \min(x)$.
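One way to apply the formula while keeping $\min$ and $\max$ around for later is a small scaler object. This is a hand-written sketch (the name `MinMaxScaling` is hypothetical and it is not scikit-learn's `MinMaxScaler`). It assumes the statistics are fitted on the training set only and then reused unchanged on the validation and test sets, so those values may fall slightly outside $[0, 1]$.

```python
import numpy as np


class MinMaxScaling:
    """Sketch of per-column min-max scaling that remembers min and max."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        # statistics come from the array passed here (the training set)
        self.min_ = x.min(axis=0)
        self.max_ = x.max(axis=0)
        return self

    def _span(self):
        span = self.max_ - self.min_
        # avoid dividing by zero for a constant column
        return np.where(span == 0, 1.0, span)

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.min_) / self._span()

    def inverse_transform(self, x_scaled):
        # x = x' * (max - min) + min
        return np.asarray(x_scaled, dtype=float) * self._span() + self.min_
```

Fitting a separate scaler on the target column makes `inverse_transform` the step that turns model outputs back into real units before they are compared with the true values.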

In [None]:
##TODO: Apply min-max preprocessing to all the features (and to the target).

### Train-test-val split

This time we also need a validation split.

In [12]:
## TODO : create a validation split for this task.
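Because the samples are ordered in time, a random shuffle would let the model train on days that come after the validation days. A common choice, sketched here as an assumption rather than the lab's prescribed method, is to hold out the last part of the training period. `chronological_split` and its `val_fraction` parameter are hypothetical names; the function works on NumPy arrays or plain lists.

```python
def chronological_split(x, y, val_fraction=0.2):
    """Split time-ordered data into train / validation without shuffling.

    The last `val_fraction` of the time steps is held out, so the
    validation set lies strictly after the training set in time.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    cut = len(x) - int(round(len(x) * val_fraction))
    return x[:cut], y[:cut], x[cut:], y[cut:]
```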

## Model implementation.

We will implement the LSTM (long short-term memory) module. The LSTM is a variant of the RNN that was introduced to mitigate the vanishing-gradient problem of standard RNNs.

The following diagram of the LSTM unit is taken from [Wikipedia](https://en.wikipedia.org/wiki/Long_short-term_memory):

<img src="LSTM_Cell.svg">


An LSTM unit is composed of a cell and three gates: an input gate, an output gate, and a forget gate. The unit performs the following computations:

- $f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$ (forget gate)
- $i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$ (input gate)
- $o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$ (output gate)
- $c'_t = \sigma_c(W_{c'} x_t + U_{c'} h_{t-1} + b_{c'})$ (candidate, or intermediate, cell value)
- $c_t = f_t \odot c_{t-1} + i_t \odot c'_t$ (cell state update)
- $h_t = o_t \odot \sigma_h(c_t)$ (hidden state update)

Here $c_t$ is the cell state at time $t$ and $h_t$ is the hidden state at time $t$, with $c_0 = 0$ and $h_0 = 0$.

If we denote by $d$ the input dimension and by $h$ the hidden dimension, then $W_\ast \in \mathbb{R}^{h\times d}$ and $U_\ast \in \mathbb{R}^{h\times h}$ are weight matrices and $b_\ast \in \mathbb{R}^{h}$ are bias vectors, for $\ast \in \{f, i, o, c'\}$.

Finally, $\sigma_g$ is the sigmoid activation function, $\sigma_c$ is the hyperbolic tangent, and $\sigma_h$ is either the hyperbolic tangent or the identity (your choice).

$\odot$ denotes the element-wise (Hadamard) product.
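The equations translate almost line by line into NumPy. The sketch below performs a single time step on unbatched 1-D vectors with $\sigma_h = \tanh$. Storing the weights in a dictionary keyed by `'f'`, `'i'`, `'o'`, `'c'` and the helper `init_lstm_params` are assumptions of this example, not the structure the lab requires.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm_params(d, h, rng):
    """Hypothetical helper: small random W (h, d), U (h, h) and zero b (h,) per gate."""
    return {g: (rng.normal(0.0, 0.1, (h, d)), rng.normal(0.0, 0.1, (h, h)), np.zeros(h))
            for g in "fioc"}


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of the LSTM equations; returns (h_t, c_t)."""
    def affine(name):
        W, U, b = params[name]
        return W @ x_t + U @ h_prev + b

    f_t = sigmoid(affine("f"))          # forget gate
    i_t = sigmoid(affine("i"))          # input gate
    o_t = sigmoid(affine("o"))          # output gate
    c_cand = np.tanh(affine("c"))       # candidate cell value c'_t
    c_t = f_t * c_prev + i_t * c_cand   # cell state update
    h_t = o_t * np.tanh(c_t)            # hidden state update
    return h_t, c_t
```

Unrolling over a sequence amounts to starting from zero vectors and feeding each returned `(h_t, c_t)` back in at the next step.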

The hyperbolic tangent is given by:

$$ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

TODO : compute the derivative of the tanh function (pen and paper).
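To check the pen-and-paper result, one route is the quotient rule, using $\frac{d}{dx}e^{x} = e^{x}$ and $\frac{d}{dx}e^{-x} = -e^{-x}$:

$$ \frac{d}{dx}\tanh(x) = \frac{(e^x + e^{-x})(e^x + e^{-x}) - (e^x - e^{-x})(e^x - e^{-x})}{(e^x + e^{-x})^2} = 1 - \frac{(e^x - e^{-x})^2}{(e^x + e^{-x})^2} = 1 - \tanh^2(x) $$

As with the sigmoid, whose derivative is $\sigma(x)(1-\sigma(x))$, the result is expressed through the function's own output, so a backward pass only needs the value cached during the forward pass.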

In [13]:
## TODO: implement sigmoid and tanh and their backward functions.
## You should already have sigmoid from a previous lab.
def sigmoid(x):
    return 0

def tanh(x):
    return 0

class SigmoidLayer : 
    """This implements the Sigmoid layer"""

class TanhLayer :
    """This implements the tanh layer"""

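A possible sketch of the two activations and their layers, assuming the layer API from the previous labs is `forward(x)`, `backward(gradient)` returning the gradient with respect to the input, and `__call__`. The forward pass caches its output because both derivatives can be written in terms of it. `np.tanh` computes the same function as the exponential formula but does not overflow for large $|x|$.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    # same as (e^x - e^-x) / (e^x + e^-x), without overflow for large |x|
    return np.tanh(x)


class SigmoidLayer:
    """Sketch of a sigmoid layer that caches its output for backward."""

    def forward(self, x):
        self.out = sigmoid(x)
        return self.out

    def backward(self, gradient):
        # d sigma / dx = sigma(x) * (1 - sigma(x))
        return gradient * self.out * (1.0 - self.out)

    def __call__(self, x):
        return self.forward(x)


class TanhLayer:
    """Sketch of a tanh layer that caches its output for backward."""

    def forward(self, x):
        self.out = tanh(x)
        return self.out

    def backward(self, gradient):
        # d tanh / dx = 1 - tanh(x)^2
        return gradient * (1.0 - self.out ** 2)

    def __call__(self, x):
        return self.forward(x)
```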
We will now implement the LSTM unit. It helps to split it into gates, so that the gradients can be propagated through each gate properly.

In [None]:
class Gate:
    """Implement the gates.
    Hint : have you noticed the gates perform the same computation, 
    only with a different set of weights?"""

class LSTMLayer :
    """Implements the LSTM layer. You can follow the previous lab's API.
    Use three gates and maintain a cell and a hidden state. 
    The output should be the hidden state at time t.
    """
    def __init__(self, input_dim, hidden_state_dim) -> None:
        pass

    def forward(self, x):
        return x
    
    def backward(self, gradient):
        return gradient
    
    def step(self, alpha):
        pass

    def __call__(self, x):
        return self.forward(x)
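Following the hint, a single `Gate` class can serve the forget, input and output gates (sigmoid) as well as the candidate value (tanh). With $z = Wx + Uh + b$ and $g = \sigma(z)$, backpropagation gives $\partial z = \partial g \odot g(1-g)$, $\partial W = \partial z\, x^\top$, $\partial U = \partial z\, h^\top$, $\partial b = \partial z$, and the gate returns $W^\top \partial z$ and $U^\top \partial z$ to its inputs. In this sketch, which assumes unbatched 1-D vectors and plain gradient descent, `forward` stores its inputs on a stack so that `backward` can be called once per time step in reverse order (backpropagation through time), accumulating weight gradients until `step` applies them. The `activation` argument, `zero_grad`, `clear_cache` and the 0.1 initialization scale are illustrative choices; `clear_cache` exists because forward calls without a matching backward (for example during validation) would otherwise stay on the stack.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class Gate:
    """Sketch: g_t = act(W x_t + U h_{t-1} + b), with gradients for BPTT."""

    def __init__(self, input_dim, hidden_dim, activation="sigmoid", rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.W = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.activation = activation
        self._cache = []
        self.zero_grad()

    def zero_grad(self):
        self.dW = np.zeros_like(self.W)
        self.dU = np.zeros_like(self.U)
        self.db = np.zeros_like(self.b)

    def clear_cache(self):
        self._cache.clear()

    def forward(self, x, h_prev):
        z = self.W @ x + self.U @ h_prev + self.b
        g = sigmoid(z) if self.activation == "sigmoid" else np.tanh(z)
        self._cache.append((x, h_prev, g))
        return g

    def backward(self, grad_g):
        # must be called in reverse time order: pops the latest forward inputs
        x, h_prev, g = self._cache.pop()
        # derivative of the activation, written in terms of its output g
        if self.activation == "sigmoid":
            grad_z = grad_g * g * (1.0 - g)
        else:
            grad_z = grad_g * (1.0 - g ** 2)
        self.dW += np.outer(grad_z, x)
        self.dU += np.outer(grad_z, h_prev)
        self.db += grad_z
        # gradients flowing back to x_t and to h_{t-1}
        return self.W.T @ grad_z, self.U.T @ grad_z

    def step(self, alpha):
        # plain gradient descent, then reset the accumulated gradients
        self.W -= alpha * self.dW
        self.U -= alpha * self.dU
        self.b -= alpha * self.db
        self.zero_grad()
```

Inside an LSTM layer, the gradient reaching $h_{t-1}$ is the sum of what the four gates return at time $t$, plus whatever arrives from the output at time $t-1$.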

We will now implement a full model. You need to use linear layers from the MLP lab as well as at least one LSTM unit.
Don't forget the activation for the linear layers.

Since we predict a single value, the output dimension should be 1.
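To see how the pieces fit together for sequential data, here is a forward-only sketch that runs an LSTM over a `(T, d)` sequence starting from $h_0 = c_0 = 0$ and maps every hidden state to one scalar with a linear layer, giving one prediction per time step. The class name `TinyLSTMRegressor` is hypothetical. It restates the LSTM step so the block runs on its own, omits the backward pass, and leaves the final layer without activation because that layer produces the regression output (a hidden Linear layer placed before it would still need one; a sigmoid output is another option since the targets are scaled to $[0, 1]$).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMRegressor:
    """Forward-only sketch: one LSTM layer followed by a linear read-out.

    Input: a (T, d) sequence. Output: a (T, 1) array, one prediction per step.
    """

    def __init__(self, input_dim, hidden_dim, rng):
        def init(shape):
            return rng.normal(0.0, 0.1, shape)

        # one (W, U, b) triple per gate: forget, input, output, candidate c'
        self.gates = {
            g: (init((hidden_dim, input_dim)), init((hidden_dim, hidden_dim)), np.zeros(hidden_dim))
            for g in "fioc"
        }
        self.W_out = init((1, hidden_dim))  # linear layer h_t -> scalar
        self.b_out = np.zeros(1)
        self.hidden_dim = hidden_dim

    def forward(self, x_seq):
        h = np.zeros(self.hidden_dim)  # h_0 = 0
        c = np.zeros(self.hidden_dim)  # c_0 = 0
        outputs = []
        for x_t in x_seq:
            pre = {g: W @ x_t + U @ h + b for g, (W, U, b) in self.gates.items()}
            f_t, i_t, o_t = sigmoid(pre["f"]), sigmoid(pre["i"]), sigmoid(pre["o"])
            c = f_t * c + i_t * np.tanh(pre["c"])
            h = o_t * np.tanh(c)
            # no activation on the output layer: it produces the regression value
            outputs.append(self.W_out @ h + self.b_out)
        return np.stack(outputs)

    def __call__(self, x_seq):
        return self.forward(x_seq)
```

Because each output depends only on the inputs up to its own time step, changing the last input leaves all earlier predictions unchanged.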

In [14]:
## TODO: implement an RNN model that has Linear layers and LSTM units.
class RNNModel :
    """Implements a full RNN model, with at least one LSTM unit.
    Take inspiration from the full MLP and the full CNN model.
    """
    def __init__(self) -> None:
        pass

    def forward(self, x):
        return x
    
    def backward(self, gradient):
        return gradient
    
    def step(self, alpha):
        pass

    def __call__(self, x):
        return self.forward(x)

## Training the model.

We will now train our model. This time the problem is a regression problem, because we are trying to predict a real value. We will therefore use the same loss function as in lab 1 (Perceptron to solve OR), the L2 loss:

$$ \mathcal{L}(\hat{y}, y) = (y-\hat{y})^2 $$

Its derivative with respect to the prediction is:

$$ \frac{\partial \mathcal{L}}{\partial \hat{y}} = -2(y-\hat{y}) = 2(\hat{y}-y) $$

In [None]:
##TODO : implement the L2-loss, you can reuse lab 1 implementation
def l2_loss(y_pred, y_true):
    return 0
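A sketch of the loss and its gradient for a whole sequence of predictions. Averaging over the $n$ predictions (rather than summing) is an assumption made here so that the loss scale does not depend on the sequence length. Note the sign of the gradient, $2(\hat{y} - y)/n$, which makes a gradient descent step move $\hat{y}$ toward $y$.

```python
import numpy as np


def l2_loss(y_pred, y_true):
    """Mean of (y - y_hat)^2 over all predictions (e.g. every time step)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def l2_loss_grad(y_pred, y_true):
    """Gradient of l2_loss with respect to y_pred: 2 (y_hat - y) / n."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return 2.0 * (y_pred - y_true) / y_pred.size
```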

In [None]:
# TODO : implement the training loop with the validation loop
## IMPORTANT: this time your RNN should deal with SEQUENTIAL data and output MULTIPLE values
def train(model, train_data, train_labels, validation_data, validation_labels, lr, num_epochs):
    best_model = None  ## should be returned at the end of the training
    best_model_validation_loss = None
    ### Loop over epochs
    for epoch in range(num_epochs):
        pass
        ### TODO: training loop, similar to the previous lab

        ### TODO: implement a validation loop that runs every few epochs.
        ### It should select the best model as the one that minimizes the validation loss.
    
    return best_model
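A sketch of the loop, assuming the model follows the `forward` / `backward` / `step` API of the skeletons above and that each element of `train_data` is one input sequence with the matching target sequence in `train_labels`. The extra parameters `loss_fn`, `loss_grad` and `val_every` are additions of this sketch. The best model is stored with `copy.deepcopy`, because keeping a plain reference would simply follow the latest weights.

```python
import copy

import numpy as np


def train(model, train_data, train_labels, validation_data, validation_labels,
          lr, num_epochs, loss_fn, loss_grad, val_every=1):
    """Sketch: gradient descent on one sequence at a time, keeping a
    snapshot of the model with the lowest validation loss."""
    best_model = None
    best_model_validation_loss = np.inf
    for epoch in range(num_epochs):
        # training: forward, backward and update for every sequence
        for x_seq, y_seq in zip(train_data, train_labels):
            y_pred = model.forward(x_seq)
            model.backward(loss_grad(y_pred, y_seq))
            model.step(lr)

        # validation every `val_every` epochs: forward passes only, no updates
        if (epoch + 1) % val_every == 0:
            val_loss = np.mean([loss_fn(model.forward(x), y)
                                for x, y in zip(validation_data, validation_labels)])
            if val_loss < best_model_validation_loss:
                best_model_validation_loss = val_loss
                # deepcopy, otherwise best_model would keep changing with model
                best_model = copy.deepcopy(model)
    return best_model
```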

## Evaluation

This time it is not a classification problem, so there is no accuracy or recall to compute. Instead, we will compute the Root Mean Squared Error (RMSE) of the model over the test set and plot the predictions against the true values.

The RMSE is given by:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$
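A minimal sketch of the metric. Computing it on de-normalized values (after undoing the min-max scaling of the target) gives an error in the target's own units, for example degrees Celsius for the mean temperature. The name `rmse` is illustrative; `ravel` lets a `(T, 1)` prediction be compared with a `(T,)` target.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root mean squared error between predictions and true values."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    y_true = np.asarray(y_true, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```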



In [None]:
## TODO: Compute the RMSE of the model over the test set

In [None]:
# TODO : Show the predicted values and the test values on the same graph

BONUS : Repeat the experiment for each possible feature of the dataset, changing the target each time. Plot the results every time.