Predict the Open price

# Part 1 - Data Preprocessing

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### 1. Importing the training set

In [2]:
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')

# Select the Open price (column 1) as a 2D NumPy array; the 1:2 slice keeps the column axis
training_set = dataset_train.iloc[:, 1:2].values
training_set

array([[325.25],
       [331.27],
       [329.83],
       ...,
       [793.7 ],
       [783.33],
       [782.75]])
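
The `1:2` slice is what keeps the result two-dimensional, which matters because the scaler in the next step expects a 2D array. A minimal NumPy sketch of the same slicing behaviour, using a small made-up array (`prices` is hypothetical and only borrows the first three Open values shown above):

```python
import numpy as np

# Hypothetical 3-row table: column 0 is a placeholder, column 1 plays the role of "Open".
prices = np.array([[0.0, 325.25],
                   [1.0, 331.27],
                   [2.0, 329.83]])

column_2d = prices[:, 1:2]  # a slice keeps the column axis  -> shape (3, 1)
column_1d = prices[:, 1]    # an integer index drops it      -> shape (3,)

print(column_2d.shape, column_1d.shape)
```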

### 2. Feature Scaling

In [4]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
training_set_scaled

array([[0.08581368],
       [0.09701243],
       [0.09433366],
       ...,
       [0.95725128],
       [0.93796041],
       [0.93688146]])
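
For reference, min-max scaling to the range (0, 1) can be reproduced by hand. The sketch below is an illustration, not the notebook's code: it takes the minimum and maximum from a tiny three-value sample, so its scaled values differ from the output above, which uses the minimum and maximum of the whole training set.

```python
import numpy as np

# First three Open prices from the training set output above.
prices = np.array([[325.25], [331.27], [329.83]])
lo, hi = prices.min(), prices.max()   # taken from this sample only

scaled = (prices - lo) / (hi - lo)    # forward: x' = (x - min) / (max - min)
restored = scaled * (hi - lo) + lo    # inverse: x  = x' * (max - min) + min

print(scaled.ravel())
print(restored.ravel())
```

The inverse step is what `sc.inverse_transform` performs when scaled predictions need to be converted back to prices.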

### 3. Creating a data structure with 60 timesteps and 1 output

60 timesteps:
- To predict the price at a given time, the RNN looks at the 60 stock prices before it
- Only past information is used
- The value 60 was chosen after experimenting with several values

In [5]:
X_train = []
y_train = []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0]) # the 60 prices before index i (indices i-60 to i-1)
    y_train.append(training_set_scaled[i, 0]) # target: the price at index i
X_train, y_train = np.array(X_train), np.array(y_train)
X_train.shape[1]

60

In [6]:
y_train

array([0.08627874, 0.08471612, 0.07454052, ..., 0.95725128, 0.93796041,
       0.93688146])
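
The windowing loop is easier to check on a toy series. The sketch below wraps the same idea in a helper, `make_windows`, which is a hypothetical name introduced here for illustration and is not part of the notebook:

```python
import numpy as np

def make_windows(series, timesteps):
    """Return (X, y): X[k] holds `timesteps` consecutive values, y[k] is the value right after them."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])  # values at indices i-timesteps .. i-1
        y.append(series[i])                # value at index i
    return np.array(X), np.array(y)

toy = np.arange(10, dtype=float)           # 0.0, 1.0, ..., 9.0
X_toy, y_toy = make_windows(toy, timesteps=3)

print(X_toy.shape, y_toy.shape)            # 10 - 3 = 7 windows of length 3
print(X_toy[0], y_toy[0])
```

By the same arithmetic, `range(60, 1258)` yields 1258 - 60 = 1198 training samples.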

### 4. Reshape

keras.layers.RNN
input shape: (batch_size, timesteps, input_dim)
- batch_size: the number of observations, i.e. the windows built from the 2012-2016 stock prices
- timesteps: 60
- input_dim: the number of indicators (predictors) per timestep. You can add more, such as the close price or the stock prices of correlated companies
    - our case: 1

In [7]:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train

array([[[0.08581368],
        [0.09701243],
        [0.09433366],
        ...,
        [0.07846566],
        [0.08034452],
        [0.08497656]],

       [[0.09701243],
        [0.09433366],
        [0.09156187],
        ...,
        [0.08034452],
        [0.08497656],
        [0.08627874]],

       [[0.09433366],
        [0.09156187],
        [0.07984225],
        ...,
        [0.08497656],
        [0.08627874],
        [0.08471612]],

       ...,

       [[0.92106928],
        [0.92438053],
        [0.93048218],
        ...,
        [0.95475854],
        [0.95204256],
        [0.95163331]],

       [[0.92438053],
        [0.93048218],
        [0.9299055 ],
        ...,
        [0.95204256],
        [0.95163331],
        [0.95725128]],

       [[0.93048218],
        [0.9299055 ],
        [0.93113327],
        ...,
        [0.95163331],
        [0.95725128],
        [0.93796041]]])
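
To make the role of the last axis concrete, here is a small sketch with placeholder arrays. `windows` and `other_windows` are hypothetical stand-ins (zeros and ones), and the second indicator is only an illustration of how input_dim would grow; the notebook itself uses a single indicator.

```python
import numpy as np

windows = np.zeros((1198, 60))   # stand-in for a 2D array of (samples, timesteps)
one_feature = windows.reshape(windows.shape[0], windows.shape[1], 1)

# Hypothetical second indicator (e.g. a close price), windowed the same way.
other_windows = np.ones((1198, 60))
two_features = np.stack([windows, other_windows], axis=-1)

print(one_feature.shape)         # input_dim = 1
print(two_features.shape)        # input_dim = 2
```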

# Part 2 - Building the RNN 

### 1. Importing the Keras libraries and packages

In [10]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

Using TensorFlow backend.


### 2. Initialising the RNN

In [11]:
regressor = Sequential()

### 3. Adding the first LSTM layer and some Dropout regularisation

Parameters:
    1. units: the number of LSTM memory units (neurons) in this layer.
        - To get high dimensionality, we use a large number of neurons (50).
    2. return_sequences:
        - True: the layer returns its output at every timestep. Set it to True when another LSTM layer follows, as in this stacked LSTM.
        - False: the layer returns only its last output. This is the default value; use it for the last LSTM layer.
    3. input_shape: the input is 3D
        - observations: taken into account automatically, so not specified
        - timesteps
        - indicators
        - only needed in the first layer

In [12]:
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
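
The effect of return_sequences on the output shape can be traced without running Keras. The helper below, `lstm_output_shape`, is a hypothetical function written for this illustration; it only mirrors the shape rule and does not compute anything.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Output shape of an LSTM layer for an input of shape (batch, timesteps, features)."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        return (batch, timesteps, units)   # one output vector per timestep
    return (batch, units)                  # only the output at the last timestep

input_shape = (None, 60, 1)                # None stands for the batch dimension
seq_shape = lstm_output_shape(input_shape, 50, return_sequences=True)
last_shape = lstm_output_shape(seq_shape, 50, return_sequences=False)

print(seq_shape)    # still 3D, so another LSTM layer can consume it
print(last_shape)   # 2D, ready for a Dense layer
```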

### 4. Adding the second LSTM layer and some Dropout regularisation

In [13]:
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
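
As a rough illustration of what `Dropout(0.2)` does during training, the NumPy sketch below zeroes about 20% of the values and rescales the rest so the expected activation is unchanged. `inverted_dropout` is a hypothetical helper, not the Keras implementation; at prediction time Keras leaves the values untouched.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 50))          # placeholder activations
dropped = inverted_dropout(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())               # fraction zeroed, close to 0.2
print(dropped.mean())                      # close to 1.0
```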

### 5. Adding the third LSTM layer and some Dropout regularisation

In [14]:
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

### 6. Adding the fourth (last) LSTM layer and some Dropout regularisation

In [15]:
regressor.add(LSTM(units=50, return_sequences=False)) 
regressor.add(Dropout(0.2))

### 7. Adding the output layer
units: dimension of the output layer (1, the predicted stock price)

In [16]:
regressor.add(Dense(units = 1))
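
With the architecture complete, the number of trainable parameters can be worked out by hand. This sketch assumes standard LSTM layers with biases (four gates, each with input weights, recurrent weights, and a bias vector); `lstm_params` and `dense_params` are hypothetical helper names. The result can be compared against `regressor.summary()`.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

layer_params = [
    lstm_params(1, 50),    # first LSTM sees 1 indicator per timestep
    lstm_params(50, 50),   # later LSTMs see the 50 outputs of the previous layer
    lstm_params(50, 50),
    lstm_params(50, 50),
    dense_params(50, 1),   # output layer
]
total_params = sum(layer_params)

print(layer_params)
print(total_params)
```

Dropout layers add no parameters, so they do not appear in the list.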

### 8. Compiling the RNN
The two most relevant optimizers for RNNs:
- RMSprop: recommended in the Keras documentation as usually a good choice for RNNs
- Adam: generally a safe choice that makes relevant updates to the weights; used here

In [17]:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
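
Because this is a regression problem, the loss is the mean of the squared differences between predicted and actual (scaled) prices. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up scaled targets and predictions, for illustration only.
y_true = [0.10, 0.20, 0.30]
y_pred = [0.10, 0.25, 0.20]
mse = mean_squared_error(y_true, y_pred)

print(mse)
```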

### 9. Fitting the RNN to the Training set
With 25 and 50 epochs the loss had not converged.
With 100 epochs, some convergence was observed.

In [None]:
regressor.fit(X_train, y_train, epochs = 100)
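
Since no batch_size is passed to `fit`, Keras falls back to its default of 32. Under that assumption, the amount of work per epoch can be estimated as follows:

```python
import math

n_samples = 1258 - 60   # windows produced by the loop in Part 1
batch_size = 32         # Keras default when batch_size is not passed to fit()
epochs = 100

steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch is smaller
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```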