# Introduction

<br></br>
Take me to the [code and Jupyter Notebook]() for Stock Market Prediction!

<br></br>
This article explores a Machine Learning algorithm called the Recurrent Neural Network (RNN), a common Deep Learning technique used for pattern recognition in sequential data. RNNs take into account how data changes over time, so they are typically used for time-series data (stock prices, sensor readings, etc). RNNs can also be used for video analysis.


<br></br>
You are provided with a dataset 




# Recurrent Neural Networks

As we try to model Machine Learning to behave like brains, weights represent Long Term Memory in the Temporal Lobe.

<br></br>
Recognition of patterns and images is done by the Occipital Lobe which is like CNNs.

<br></br>
RNNs are like Short Term Memory: they remember recent inputs and can create context, like the Frontal Lobe.

<br></br>
The Parietal Lobe is responsible for spatial recognition, like Boltzmann Machines.

<br></br>
RNNs connect Neurons to themselves through time, creating a feedback loop and a Short Term Memory-like awareness.

<img src="attachment:37%20-%20Brain%20Diagram.png" width="400">

The following diagram shows the old-school way to represent RNNs: a Feedback Loop (Temporal Loop) structure that connects Hidden Layers to themselves AND to the Output Layer, which gives them a Short Term Memory. Each layer represents several Nodes.

Compact Form | Expanded Form
- | -
<img src="attachment:38%20-%20Old%20RNN%20Representation.png" width="100"> | <img src="attachment:39%20-%20Expanded%20%20RNN%20Representation.png" width="400">
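The feedback loop in the diagrams above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used later in this article: the names `rnn_forward`, `W_xh`, `W_hh` and `b_h`, the layer sizes and the random weights are all assumptions made for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])          # hidden state starts at zero
    states = []
    for x_t in inputs:                   # one pass per timestep
        # The temporal loop: the previous hidden state h is fed back in
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 3, 4, 5        # illustrative sizes (assumed)
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # hidden -> hidden
b_h = np.zeros(n_hidden)

sequence = rng.normal(size=(n_steps, n_in))
hidden_states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(hidden_states.shape)               # (5, 4): one hidden vector per timestep
```

The same weights are reused at every timestep; only the hidden state `h` changes, which is what gives the network its short term memory.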


A more modern representation shows the following RNN types and use examples: 
1. One-To-Many: Computer description of an image. A CNN is used to classify the image and then an RNN is used to make sense of it and generate context.

2. Many-To-One: Sentiment Analysis of text (gauge the positivity or negativity of text).

3. Many-To-Many: Google Translate for a language whose vocabulary changes based on the gender of the subject. Also subtitling of a movie.


<img src="attachment:40%20-%20RNN%20Examples.png" width="600">
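In terms of array shapes, the difference between Many-To-One and Many-To-Many comes down to which hidden states are read out. The sketch below is a simplified assumption: `hidden_states` is random stand-in data rather than the output of a trained RNN, and `W_hy` is a hypothetical output weight matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_hidden, n_out = 6, 4, 2       # illustrative sizes (assumed)

# Stand-in for the hidden states an RNN produced over 6 timesteps
hidden_states = rng.normal(size=(n_steps, n_hidden))
W_hy = rng.normal(size=(n_out, n_hidden))   # hidden -> output weights

# Many-To-One: read only the final hidden state (e.g. one sentiment score)
many_to_one = W_hy @ hidden_states[-1]

# Many-To-Many: read an output at every timestep (e.g. one label per word)
many_to_many = hidden_states @ W_hy.T

print(many_to_one.shape)    # (2,)
print(many_to_many.shape)   # (6, 2)
```

In Keras this choice corresponds to the `return_sequences` argument used in the code at the end of this article.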


Check out Andrej Karpathy's Blog (Director of AI at Tesla) on [Github](http://karpathy.github.io/) and [Medium](https://medium.com/@karpathy/).

<br></br>
Here is the movie [Sunspring by Benjamin the LSTM Recurrent Neural Network AI](https://www.youtube.com/watch?v=LY7x2Ihqjmc)


# RNN Exploding/Vanishing Gradient Problem

The gradient is used to update the weights in an RNN by looking back a certain number of user-defined steps.

<br></br>
The lower the gradient, the harder it is to update the weights (vanishing gradient) of nodes further back in time, especially because earlier layers are used as inputs for later layers.

<br></br>
So older Neurons train much more slowly than more recent Neurons. It's like a domino effect.

![41%20-%20RNN%20Vanish%20Gradient.png](attachment:41%20-%20RNN%20Vanish%20Gradient.png)
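The domino effect can be seen with simple arithmetic. The sketch below makes a strong simplifying assumption: the recurrent weight is a single scalar `w_rec` and the activation function's derivative is ignored, so the gradient reaching a node `steps` timesteps back is scaled by `w_rec` raised to that power.

```python
def gradient_factor(w_rec, steps):
    """Scale a gradient picks up after passing through w_rec `steps` times."""
    return abs(w_rec) ** steps

for w_rec in (0.5, 1.0, 1.5):
    # Look 1, 10 and 50 timesteps back in time
    factors = [gradient_factor(w_rec, t) for t in (1, 10, 50)]
    print(w_rec, factors)
```

A weight below 1 shrinks the factor towards zero (vanishing), a weight above 1 blows it up (exploding), and a weight of exactly 1 leaves it unchanged.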


# Exploding Gradient Solutions

**1. Truncated Backpropagation**
<br></br>
Stop backprop after a certain point (not optimal, because not all the weights are updated). Still better than doing nothing, which can produce an irrelevant network.

**2. Penalties**
<br></br>
The gradient can be penalized and artificially reduced.

**3. Gradient Clipping**
<br></br>
Set a maximum limit for the gradient, which stops it from growing any further.
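Gradient clipping (item 3 above) is commonly done by rescaling the gradient whenever its norm exceeds a threshold. The following is a minimal sketch of that idea; the function name `clip_by_norm` and the example values are assumptions made for illustration.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        # Keep the direction, shrink the length down to max_norm
        return grad * (max_norm / norm)
    return grad

exploding = np.array([30.0, -40.0])      # L2 norm is 50
clipped = clip_by_norm(exploding, max_norm=5.0)
print(clipped, np.linalg.norm(clipped))

small = np.array([0.3, -0.4])            # L2 norm is 0.5, below the limit
print(clip_by_norm(small, max_norm=5.0)) # returned unchanged
```

Keras optimizers expose the same idea through their `clipnorm` and `clipvalue` arguments, so in practice you rarely need to write this by hand.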


# Vanishing Gradient Solutions

**1. Weight Initialization**
<br></br>
You can be smart about how you initialize weights to minimize the vanishing gradient problem.

**2. Echo State Network**
<br></br>
Designed to solve the vanishing gradient problem. It's a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned.

**3. Long Short-Term Memory Networks (LSTM)**
<br></br>
The most popular RNN structure for tackling this problem.
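The Echo State Network idea (item 2 above) can be sketched as a fixed, random, sparse hidden layer whose weights are never trained; only a linear readout is fitted. Everything specific below is an assumption for illustration: the helper `make_reservoir`, the 500 units, the rescaling of the weights to a spectral radius of 0.9, and the sine-wave toy signal.

```python
import numpy as np

def make_reservoir(n_units, connectivity, spectral_radius, rng):
    """Random sparse recurrent weights, rescaled to a chosen spectral radius."""
    W = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    mask = rng.random((n_units, n_units)) < connectivity   # keep ~1% of links
    W = W * mask
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / radius)

rng = np.random.default_rng(0)
n_units = 500
W_res = make_reservoir(n_units, connectivity=0.01, spectral_radius=0.9, rng=rng)
W_in = rng.uniform(-0.5, 0.5, size=n_units)     # fixed random input weights

signal = np.sin(np.linspace(0, 20, 300))        # toy time series
x = np.zeros(n_units)
states = []
for u in signal[:-1]:
    x = np.tanh(W_in * u + W_res @ x)           # hidden weights stay fixed
    states.append(x)
states = np.array(states)

# Only the linear readout is trained: predict the next value from the state
W_out, *_ = np.linalg.lstsq(states, signal[1:], rcond=None)
print(states.shape, W_out.shape)                # (299, 500) (500,)
```

Because no gradient is ever pushed back through the recurrent weights, the vanishing gradient problem does not arise during training.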


# LSTM

**History**
<br>
When the recurrent weight 'W_rec' of an RNN is less than 1 we get a Vanishing Gradient; when 'W_rec' is more than 1 we get an Exploding Gradient. So why not set W_rec = 1?

![42%20-%20LSTM.png](attachment:42%20-%20LSTM.png)

Circles represent Layers (Vectors).
<br></br>
**'C'** represents Memory Cells Layers
<br></br>
**'h'** represents Output Layers (Hidden States)
<br></br>
**'X'** represents Input Layers
<br></br>
**Lines** represent values being transferred.
<br></br>
**Concatenated** lines represent pipelines running in parallel.
<br></br>
**Forks** are when Data is copied.
<br></br>
**Pointwise Element-by-Element Operation (X)** represents valves (from left-to-right: Forget Valve, Memory Valve, Output Valve). Valves can be open, closed or partially open as decided by an Activation Function.
<br></br>
**Pointwise Element-by-Element Operation (+)** represents a Tee pipe joint, allowing stuff through if the corresponding valve is activated.
<br></br>
**Pointwise Element-by-Element Operation (Tanh)** Hyperbolic tangent function (outputs values between -1 and 1)
<br></br>
**Sigma Layer Operation** Sigmoid Activation Function (values from 0 to 1)
<br></br>

![43%20-%20LSTM%20Cell.png](attachment:43%20-%20LSTM%20Cell.png)
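The valve analogy maps directly onto the two activation functions. As a small sketch (the input values are made up for illustration), a Sigmoid output multiplied element-by-element with a flow of values acts as a valve that is closed, half open or open:

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, 0.0, 6.0])
valve = sigmoid(z)          # nearly closed, half open, nearly open
signal = np.tanh(z)         # values squashed into the range (-1, 1)

flow = np.array([2.0, 2.0, 2.0])
print(valve * flow)         # pointwise (X): how much of each value gets through
print(signal)
```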



# LSTM Step 1

The New Value (X_t) and the value from the Previous Node (h_t-1) together decide if the Forget Valve should be opened or closed (Sigmoid).

<img src="attachment:44%20-%20LSTM%20Step%201.png" width="600">
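Step 1 can be sketched as a single Sigmoid layer applied to the previous output and the new value. The names `forget_gate`, `W_f` and `b_f`, the layer sizes and the random weights are assumptions for this illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, b_f):
    """Return one valve opening per memory cell, each between 0 and 1."""
    combined = np.concatenate([h_prev, x_t])   # previous output + new value
    return sigmoid(W_f @ combined + b_f)

rng = np.random.default_rng(0)
n_in, n_cells = 3, 4                           # illustrative sizes (assumed)
W_f = rng.normal(size=(n_cells, n_cells + n_in))
b_f = np.zeros(n_cells)

f_t = forget_gate(rng.normal(size=n_in), np.zeros(n_cells), W_f, b_f)
print(f_t)   # four values: 0 means forget completely, 1 means keep everything
```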


# LSTM Step 2

The New Value (X_t) and the value from the Previous Node (h_t-1) together decide if the Memory Valve should be opened or closed (Sigmoid), while a Tanh layer produces the candidate values (from -1 to 1) that may be let through.

<img src="attachment:45%20-%20LSTM%20Step%202.png" width="600">
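Step 2 uses two layers side by side: a Sigmoid for the Memory Valve and a Tanh for the candidate values. The sketch below uses assumed names (`memory_gate`, `W_i`, `W_c` and the biases) and random weights purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def memory_gate(x_t, h_prev, W_i, b_i, W_c, b_c):
    """Return the Memory Valve openings and the candidate memory values."""
    combined = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ combined + b_i)          # how far each valve opens
    c_candidate = np.tanh(W_c @ combined + b_c)  # new values, between -1 and 1
    return i_t, c_candidate

rng = np.random.default_rng(0)
n_in, n_cells = 3, 4                             # illustrative sizes (assumed)
W_i = rng.normal(size=(n_cells, n_cells + n_in))
W_c = rng.normal(size=(n_cells, n_cells + n_in))
b_i = np.zeros(n_cells)
b_c = np.zeros(n_cells)

i_t, c_candidate = memory_gate(rng.normal(size=n_in), np.zeros(n_cells),
                               W_i, b_i, W_c, b_c)
print(i_t)
print(c_candidate)
```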


# LSTM Step 3

Decide the extent to which a Memory Cell (C_t) should be updated from the previous Memory Cell (C_t-1). The Forget and Memory Valves are used to decide this. You can update Memory completely, not at all or only partially.

<img src="attachment:46%20-%20LSTM%20Step%203.png" width="600">
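Step 3 is one line of pointwise arithmetic. The hand-picked valve values below are made up so that each of the three cases (keep, replace, blend) is visible; the variable names are assumptions for the example.

```python
import numpy as np

c_prev = np.array([2.0, 2.0, 2.0])          # previous Memory Cell (C_t-1)
c_candidate = np.array([-1.0, -1.0, -1.0])  # candidate values from the Tanh layer

f_t = np.array([1.0, 0.0, 0.5])             # Forget Valve: keep, drop, half
i_t = np.array([0.0, 1.0, 0.5])             # Memory Valve: block, admit, half

# Pointwise (X) on each pipeline, then the (+) Tee joint merges them
c_t = f_t * c_prev + i_t * c_candidate
print(c_t)
# cell 0: memory kept as-is     -> 2.0
# cell 1: memory fully replaced -> -1.0
# cell 2: partial update        -> 0.5
```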


# LSTM Step 4

The New Value (X_t) and the value from the Previous Node (h_t-1) decide which part of the Memory Pipeline (and to what extent) will be used as an Output (h_t).

<img src="attachment:47%20-%20LSTM%20Step%204.png" width="600">
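Step 4 can be sketched as an Output Valve applied to a Tanh-squashed copy of the Memory Cell. The names `output_step`, `W_o` and `b_o`, the sizes and the example cell values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def output_step(x_t, h_prev, c_t, W_o, b_o):
    """Return the new output h_t read from the Memory Cell c_t."""
    combined = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ combined + b_o)   # Output Valve openings
    return o_t * np.tanh(c_t)             # let part of the memory through

rng = np.random.default_rng(0)
n_in, n_cells = 3, 4                      # illustrative sizes (assumed)
W_o = rng.normal(size=(n_cells, n_cells + n_in))
b_o = np.zeros(n_cells)
c_t = np.array([2.0, -1.0, 0.5, 0.0])     # made-up Memory Cell values

h_t = output_step(rng.normal(size=n_in), np.zeros(n_cells), c_t, W_o, b_o)
print(h_t)   # each value lies strictly between -1 and 1
```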


# LSTM Variation 1 (Add Peepholes)

Sigmoid Layer Activation Functions now have additional information about the current state of the Memory Cell. So Valve decisions are made, taking into account Memory Cell State.

<img src="attachment:48%20-%20LSTM%20Var%201.png" width="600">
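As a sketch of the peephole idea, the Forget Valve below receives the previous Memory Cell state as an extra input. The element-wise peephole weights `p_f`, the function name and the sizes are assumptions for this example; the other valves can be extended in the same way.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def peephole_forget_gate(x_t, h_prev, c_prev, W_f, p_f, b_f):
    """Forget Valve that can also 'peep' at the Memory Cell state."""
    combined = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ combined + p_f * c_prev + b_f)

rng = np.random.default_rng(0)
n_in, n_cells = 3, 4                      # illustrative sizes (assumed)
W_f = rng.normal(size=(n_cells, n_cells + n_in))
p_f = np.ones(n_cells)                    # peephole weights, one per cell
b_f = np.zeros(n_cells)
x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_cells)

# Same inputs, different memory contents -> different valve decisions
f_empty = peephole_forget_gate(x_t, h_prev, np.zeros(n_cells), W_f, p_f, b_f)
f_full = peephole_forget_gate(x_t, h_prev, 2.0 * np.ones(n_cells), W_f, p_f, b_f)
print(f_empty)
print(f_full)
```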


# LSTM Variation 2 (Connect Forget & Memory Valves)

Forget and Memory Valves can make a combined decision. They're connected with a '-1' multiplier so one opens when the other closes.

<img src="attachment:49%20-%20LSTM%20Var%202.png" width="600">
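With the valves coupled, the Memory Valve opening is simply one minus the Forget Valve opening. The numbers below are made up so the arithmetic is easy to follow; the variable names are assumptions for the example.

```python
import numpy as np

c_prev = np.array([4.0, 4.0, 4.0])        # previous Memory Cell
c_candidate = np.array([0.5, 0.5, 0.5])   # candidate values

f_t = np.array([1.0, 0.0, 0.25])          # Forget Valve openings
i_t = 1.0 - f_t                           # Memory Valve is tied to it

c_t = f_t * c_prev + i_t * c_candidate
print(c_t)
# cell 0: nothing forgotten, nothing added -> 4.0
# cell 1: everything forgotten, candidate takes over -> 0.5
# cell 2: 25% old memory + 75% candidate -> 1.375
```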


# LSTM Variation 3 (GRU: Gated Recurrent Units)

The Memory Pipeline is replaced by the Hidden Pipeline. Simpler but less flexible in terms of how many things are being monitored and controlled.

<img src="attachment:50%20-%20LSTM%20Var%203.png" width="600">
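A single GRU step can be sketched as follows, using the commonly published update and reset gate formulation. Biases are left out for brevity, and the names `gru_step`, `W_z`, `W_r`, `W_h` as well as the sizes and random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: the hidden state doubles as the memory pipeline."""
    combined = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ combined)             # update gate
    r_t = sigmoid(W_r @ combined)             # reset gate
    h_candidate = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # Coupled valves: keep (1 - z) of the old state, take z of the candidate
    return (1.0 - z_t) * h_prev + z_t * h_candidate

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                         # illustrative sizes (assumed)
W_z = rng.normal(size=(n_hidden, n_hidden + n_in))
W_r = rng.normal(size=(n_hidden, n_hidden + n_in))
W_h = rng.normal(size=(n_hidden, n_hidden + n_in))

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):        # run over a 5-step sequence
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h)
```

There is no separate Memory Cell and no Output Valve, which is why a GRU has fewer weights than an LSTM of the same size.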




# Code

<br></br>
Download the code and run it with 'Jupyter Notebook' or copy the code into the 'Spyder' IDE found in the [Anaconda Distribution](https://www.anaconda.com/download/). 'Spyder' is similar to MATLAB: it allows you to step through the code and examine the 'Variable Explorer' to see exactly how the data is parsed and analyzed. Jupyter Notebook also offers a [Jupyter Variable Explorer Extension](http://volderette.de/jupyter-notebook-variable-explorer/) which is quite useful for keeping track of variables.


<br></br>
```shell
$ git clone 
$ cd 
```

<br></br>
<br></br>
<br></br>
<br></br>


In [None]:
# Recurrent Neural Network

# Part 1 - Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the training set
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')
# '.iloc[:, 1:2]' selects the 2nd Column (Opening Price)
# The slice '1:2' is used because the upper bound is ignored; it also keeps the result 2D
# '.values' converts it to a NumPy array of shape (rows, 1), not a 1D vector
training_set = dataset_train.iloc[:, 1:2].values

# Feature Scaling
# Use Normalization (versus Standardization) for RNNs with Sigmoid Activation Functions
# 'MinMaxScaler' is the scikit-learn class used for Normalization
from sklearn.preprocessing import MinMaxScaler
# 'feature_range = (0,1)' makes sure that training data is scaled to have values between 0 and 1
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)

# Creating a data structure with 60 timesteps (look back 60 days) and 1 output
# This tells the RNN what to remember (Number of timesteps) when predicting the next Stock Price
# The wrong number of timesteps can lead to Overfitting or bogus results
# 'X_train' Input with 60 previous days' stock prices
X_train = []
# 'y_train' Output with next day's stock price
y_train = []
for i in range(60, len(training_set_scaled)):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshaping (add more dimensions)
# This lets you add more indicators that may potentially have correlation with Stock Prices
# Keras RNNs expect an input shape of (Batch Size, Timesteps, input_dim)
# '.shape[0]' is the number of Rows (samples, which Keras splits into batches)
# '.shape[1]' is the number of Columns (timesteps)
# 'input_dim' is the number of factors that may affect stock prices (here 1: the Opening Price)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
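To see what the windowing and reshaping steps produce, here is the same logic applied to a tiny made-up series with 3 timesteps instead of 60. The helper name `make_windows` and the toy data are assumptions for illustration only.

```python
import numpy as np

def make_windows(series, timesteps):
    """Split a 1D series into (samples, timesteps, 1) inputs and next-value targets."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])   # the previous `timesteps` values
        y.append(series[i])                 # the value to predict
    X, y = np.array(X), np.array(y)
    # Add the trailing feature dimension that Keras RNN layers expect
    return X.reshape(X.shape[0], X.shape[1], 1), y

series = np.arange(10, dtype=float)         # toy data: 0, 1, ..., 9
X, y = make_windows(series, timesteps=3)
print(X.shape, y.shape)                     # (7, 3, 1) (7,)
print(X[0, :, 0], y[0])                     # first window 0, 1, 2 and its target 3
```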


In [None]:
# Part 2 - Building the RNN
# Building a robust stacked LSTM with dropout regularization

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

# Initialising the RNN
# Regression is when you predict a continuous value
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation
# 'units' is the number of LSTM Memory Cells (Neurons) for higher dimensionality
# 'return_sequences = True' because we will add more stacked LSTM Layers
# 'input_shape' is the (timesteps, input_dim) of X_train, without the number of samples
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
# During training, about 20% of the Neurons (roughly 10 out of 50) are randomly ignored on each update to prevent Overfitting
regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
# No need to specify input_shape for the second Layer, it knows that we have 50 Neurons from the previous layer
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
# This is the last LSTM Layer. 'return_sequences = False' by default so we leave it out.
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
# 'units = 1' because Output layer has one dimension
regressor.add(Dense(units = 1))

# Compiling the RNN
# Keras documentation recommends 'RMSprop' as a good optimizer for RNNs
# Trial and error suggests that 'adam' optimizer is a good choice
# loss = 'mean_squared_error' which is good for Regression vs. 'Binary Cross Entropy' previously used for Classification
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set
# 'X_train' Independent variables
# 'y_train' Output Truths that the predictions made from X_train are compared to.
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)


In [None]:
# Part 3 - Making the predictions and visualising the results

# Getting the real stock price of 2017
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# Getting the predicted stock price of 2017
# We need 60 previous inputs for each day of the Test_set in 2017
# Combine 'dataset_train' and 'dataset_test'
# 'axis = 0' for Vertical Concatenation to add rows to the bottom
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
# Extract Stock Prices for Test time period, plus 60 days previous
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
# 'reshape(-1, 1)' turns the 1D array into a single column, the 2D shape the scaler expects
inputs = inputs.reshape(-1,1)
# Inputs need to be scaled to match the model trained on Scaled Feature
inputs = sc.transform(inputs)
# The following is pasted from above and modified for Testing, remove all the 'y' parts
X_test = []
# 'len(inputs)' is 60 plus the number of test days (80 for a 20-day test set)
for i in range(60, len(inputs)):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
# We need a 3D input so add another dimension
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Predict the Stock Price
predicted_stock_price = regressor.predict(X_test)
# We need to inverse the scaling of our prediction to get a Dollar amount
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
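Besides the plot, one common way to put a number on the prediction quality is the Root Mean Squared Error (RMSE). This is an optional addition rather than part of the original code; the helper name `rmse` is an assumption and the prices below are made up so the arithmetic can be checked by hand.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between two arrays of the same length."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up prices shaped like real_stock_price / predicted_stock_price: (n, 1)
actual = np.array([[100.0], [102.0], [104.0]])
predicted = np.array([[101.0], [101.0], [106.0]])

# Errors are -1, 1, -2 -> squares 1, 1, 4 -> mean 2 -> square root of 2
print(rmse(actual, predicted))
```

With the variables from the cell above, the call would be `rmse(real_stock_price, predicted_stock_price)`, which gives the typical error in Dollars.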
