# <strong>Recurrent Neural Networks Using Pytorch</strong>

### In this notebook you will learn the basics of a Recurrent Neural Network using the python library Pytorch.
---
### <strong>Table of Contents</strong>
1. [Introduction to Recurrent Neural Networks](#intro)
2. [Time Series Data](#time)
3. [Using Pytorch](#pytorch)
4. [Code](#code)
---
### By the end of this notebook, you should be able to implement a basic RNN using Pytorch with the provided data set.


---
## <a name="intro"></a> <strong>Introduction</strong>
---
*<strong>Recurrent neural networks</strong>*, or RNNs, are widely used in a variety of mediums. RNNs leverage sequential data to make predictions. **Sequential memory** makes it easier for the neural network to recognize patterns and replicate the input. In order to achieve learning through sequential memory, a **feedforward neural network** with looping mechanisms is implemented. 

As the image below outlines, there are *three* layers: **input, hidden and output**. There are loops that pass previous information forward, allowing the model to *sequentially* store and learn the data. The complexity of a hidden state is based on how much “historic” information is being stored, it is a representation of all previous steps. When training a model, once there is a prediction from a given output, a **loss function** is used to determine the error between the predicted output and real output. The model is trained through back propagation. The weight of each node in the neural network is adjusted with their corresponding gradient that is calculated during **back propagation**. 
<br>
<br>
<p align="center">
  <img src="images/rnnImg.png">
</p>
<br>
<br>

The advantage of using sequential data to successfully predict certain outcomes is especially relevant when analyzing **time series data**. 

---
## <a name="time"></a><strong>Time Series Data</strong>
---
Prior to training a model, it is important to understand the type of data you are working with. There are many different types of data, this notebook incorporates time series data. In essence, time series data is a collection of chronologically collected observations made over a period of time- sometimes during specific intervals. Time series data can be grouped as either *<strong>metrics</strong>* or *<strong>events</strong>*. 

* **Metrics**: measurements taken at regular intervals.

* **Events**: measurements taken at irregular intervals. 

Distinguishing if the data is comprised of metrics or events is critical. Events are not condusive for creating predictive models. The irregular intervals between each data point prevents sequential logic from creating patterns on past behavior. In contrast, the characteristic regularity between each metric allows machine learing models to learn from previous data and construct possible outcomes for the future. Creating an RNN using time series data, specifically metrics, is a great way to take advantage of the sequential learning pattern they leverage. 

Furthermore, time series data can also be categorized as *<strong>linear</strong>* or *<strong>non-linear</strong>*. Based on the mathematical relationship created by the model, the data is classified as one or the other.

Popular examples of time series data include weather, stock, and health care data. In this notebook, we will be using stock data to create an RNN model to predict the value of the given stock. 



---
## <a name="LSTM"></a><strong>Long-Short Term Memory</strong>
---
LSTMs, or long short term memory, is a type of RNN used to keep track of long term dependencies. [LSTMs](https://developer.ibm.com/tutorials/iot-deep-learning-anomaly-detection-1/) are necessary when processing tiem series data because they hold memory, unlike traditional RNNs. This feature allows for patterns to be identified and learned by the model. The architecture of long short term memory is dependent on $tanh$ and $sigmoid$ functions implemented in the network. The $tahn$ function ensures that the values in the network remain between -1 and 1 while the $sigmoid$ function regulates if data should be remembered or forgotten. Furthermore, an LSTM has an internal state variable that is modified based on weights and biases through operation gates. Traditionally, an LSTM is comprised of three operation gates: the forget gate, input gate, and output gate. 

The mathematical representations of each gate are as follows:

<strong>Forget Gate</strong>: $$f_t = \sigma(w_f*[h_{t-1},x_t] + b_f)$$

<strong>Input Gate</strong>: $$i_t = \sigma(w_f*[h_{t-1},x_t] + b_i)$$

<strong>Output Gate</strong>: $$O_t = \sigma(w_f*[h_{t-1},x_t] + b_o)$$

Where:  
* $w_f$ = weight matrix between forget and input gate
* $h_{t-1}$ = previous hidden state
* $x_t$ = input
* $b_f$ = connection bias at forget gate 
* $b_i$ = connection bias at input gate 
* $b_o$ = connection bias at output gate 


Each gate modifies the input a different way. The forget gate determines what data is relevant to keep and what information can be "forgotten". The input gate analyzes what information needs to be added to the current step, and the output gate finalizes the proceeding hidden state. Each of these gates allows for sequential data to be efficiently stored and analyzed, allowing for an accurate predictive model to be developed. 

---
## <a name="pytorch"></a><strong>Using Pytorch</strong>
---
Pytorch is a python library that uses the specialized data structure Tensors to encode model parameters and inputs. The following is a brief tutorial on imports that we will be using to evaluate the stock data. 


In order to use Pytorch, we must first import the library into our workspace. To do this type the following code...

In [33]:
import torch

In order to be able to work with data and create a neural network, we can use the Pytorch class nn and the primatives DataLoader and datasets. Dataset is meant to wrap an iterable around the dataset while DataLoader is meant to load and store the desired data. The matplotlib import allows us to change, create and plot a figure in a plotting area. This is useful for the model we are trying to create in this exercise. 

In [34]:
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

Now that we know what imports to use, we are ready to begin creating our model for our stock data set!

---
## <a name="code"></a><strong>Code</strong>
---

In [35]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

In [37]:
stock_data = pd.read_csv(body)
stock_data.head()

#get how many elements are in data
stock_data.shape
print(stock_data.shape)
stock_data.dropna(subset=['High','Low','Open','Close'], axis = 0, inplace =True)
stock_data = stock_data.sort_values(by="Date")
#stock_data = stock_data[['High','Low','Open','Close']]
stock_data = stock_data['Close']
stock_data = stock_data.values.astype(float)
print(stock_data)
#stock_data = stock_data.drop(columns=['Date'])


(9400, 6)
[ 28.75  27.25  25.25 ... 164.94 172.77 168.34]


In [38]:
#use 30 as test number initially
test_close_size = 30
train_close = stock_data[:-test_close_size]
test_close = stock_data[-test_close_size:]
print(train_close.shape)
print(len(test_close))


(9370,)
30


In [39]:
#train data
from sklearn.preprocessing import MinMaxScaler

#normalize data
scaler = MinMaxScaler(feature_range=(-1, 1))
train_close_normalized = scaler.fit_transform(train_close .reshape(-1, 1))
train_close_normalized = torch.FloatTensor(train_close_normalized).view(-1)

print(train_close_normalized.shape)


train_window = 7


torch.Size([9370])


In [40]:
def create_inout_sequences(input, tw):
    inout_seq = []
    L = len(input)
    print('Length = ',L)
    for i in range(L-tw):
        train_seq = input[i:i+tw]
        train_label = input[i+tw:i+tw+1] 
        inout_seq.append((train_seq ,train_label))
        
    return inout_seq

train_inout_seq = create_inout_sequences(train_close_normalized, train_window)

print(train_inout_seq[:4])


Length =  9370
[(tensor([-0.9486, -0.9530, -0.9588, -0.9570, -0.9548, -0.9501, -0.9461]), tensor([-0.9425])), (tensor([-0.9530, -0.9588, -0.9570, -0.9548, -0.9501, -0.9461, -0.9425]), tensor([-0.9378])), (tensor([-0.9588, -0.9570, -0.9548, -0.9501, -0.9461, -0.9425, -0.9378]), tensor([-0.9291])), (tensor([-0.9570, -0.9548, -0.9501, -0.9461, -0.9425, -0.9378, -0.9291]), tensor([-0.9277]))]


In [41]:
class LSTM(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size

        self.lstm = nn.LSTM(input_size, hidden_layer_size)

        self.linear = nn.Linear(hidden_layer_size, output_size)

        self.hidden_cell = (torch.zeros(1,1,self.hidden_layer_size),
                            torch.zeros(1,1,self.hidden_layer_size))

    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]


In [42]:
model = LSTM()
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [None]:
epochs = 25
for i in range(epochs):
    for seq, labels in train_inout_seq:
        optimizer.zero_grad()
        model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                        torch.zeros(1, 1, model.hidden_layer_size))

        y_pred = model(seq)

        single_loss = loss_function(y_pred, labels)
        single_loss.backward()
        optimizer.step()

    if true:
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

print(f'epoch: {i:3} loss: {single_loss.item():10.10f}')

In [45]:
fut_pred = 12

test_inputs = train_close_normalized[-train_window:].tolist()
print(test_inputs)

[-0.567732572555542, -0.5760671496391296, -0.5600347518920898, -0.5701345801353455, -0.5823469758033752, -0.5804659128189087, -0.5609607696533203]


In [46]:
model.eval()

for i in range(fut_pred):
    seq = torch.FloatTensor(test_inputs[-train_window:])
    with torch.no_grad():
        model.hidden = (torch.zeros(1, 1, model.hidden_layer_size),
                        torch.zeros(1, 1, model.hidden_layer_size))
        test_inputs.append(model(seq).item())

In [48]:
import numpy as np
actual_predictions = scaler.inverse_transform(np.array(test_inputs[train_window:] ).reshape(-1, 1))
print(actual_predictions)

[[162.58053717]
 [162.60758022]
 [162.63070996]
 [162.64854643]
 [162.66181051]
 [162.67167617]
 [162.6789261 ]
 [162.68432235]
 [162.68835925]
 [162.69126334]
 [162.69348775]
 [162.69511486]]


### <strong>Want to Learn More?</strong>

Running deep learning programs usually needs a high performance platform. **PowerAI** speeds up deep learning and AI. Built on IBM’s Power Systems, **PowerAI** is a scalable software platform that accelerates deep learning and AI with blazing performance for individual users or enterprises. The **PowerAI** platform supports popular machine learning libraries and dependencies including TensorFlow, Caffe, Torch, and Theano. You can use [PowerAI on IMB Cloud](https://cocl.us/ML0120EN_PAI).

Also, you can use **Watson Studio** to run these notebooks faster with bigger datasets. **Watson Studio** is IBM’s leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, **Watson Studio** enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of **Watson Studio** users today with a free account at [Watson Studio](https://cocl.us/ML0120EN_DSX). This is the end of this lesson. Thank you for reading this notebook, and good luck!

### <strong>References</strong>

* https://pytorch.org/tutorials/beginner/basics/intro.html
* https://www.youtube.com/watch?v=LHXXI4-IEns
* https://www.influxdata.com/what-is-time-series-data/