Predict the Open price

# Part 1 - Data Preprocessing

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### 1. Import the training set

In [27]:
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')

# Create a NumPy array of the Open price (column 1); the 1:2 slice keeps the array 2-D
training_set = dataset_train.iloc[:, 1:2].values
training_set

array([[325.25],
       [331.27],
       [329.83],
       ...,
       [793.7 ],
       [783.33],
       [782.75]])

### 2. Feature Scaling

In [3]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
training_set_scaled

array([[0.08581368],
       [0.09701243],
       [0.09433366],
       ...,
       [0.95725128],
       [0.93796041],
       [0.93688146]])
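As a minimal sketch of what `MinMaxScaler(feature_range=(0, 1))` computes: each value is mapped to `(x - min) / (max - min)`, using the minimum and maximum of the data the scaler was fitted on. The NumPy version below uses a few Open prices from the output above as sample values. The helper names `min_max_scale` and `min_max_unscale` are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

def min_max_scale(x, x_min, x_max):
    """Map x linearly so that x_min -> 0 and x_max -> 1."""
    return (x - x_min) / (x_max - x_min)

def min_max_unscale(x_scaled, x_min, x_max):
    """Inverse of min_max_scale: map [0, 1] back to the original range."""
    return x_scaled * (x_max - x_min) + x_min

# Sample values only (a handful of Open prices, not the full training set)
sample = np.array([[325.25], [331.27], [329.83], [793.70], [783.33], [782.75]])
x_min, x_max = sample.min(), sample.max()

scaled = min_max_scale(sample, x_min, x_max)
restored = min_max_unscale(scaled, x_min, x_max)
print(scaled.ravel())
print(restored.ravel())
```

The scaled values here differ from the notebook output because the minimum and maximum of this small sample are not those of the full training set.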

### 3. Creating a data structure with 60 timesteps and 1 output

60 timesteps:
- At each time t, the RNN looks at the 60 previous stock prices (the past information) to predict the price at time t.
- The value 60 was found by experimenting with several window sizes.

In [4]:
X_train = []
y_train = []
for i in range(60, 1258):  # 1258 = number of rows in the training set
    X_train.append(training_set_scaled[i-60:i])  # the 60 prices before day i (indices i-60 .. i-1)
    y_train.append(training_set_scaled[i, 0])  # target: the price on day i
X_train, y_train = np.array(X_train), np.array(y_train)
X_train.shape[1]

60

In [5]:
y_train

array([0.08627874, 0.08471612, 0.07454052, ..., 0.95725128, 0.93796041,
       0.93688146])
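The windowing step is easier to follow on a small series. The sketch below applies the same loop to ten made-up values with a window of 3 instead of 60; `make_windows` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def make_windows(series, window):
    """Return (X, y): X[k] holds `window` consecutive values, y[k] the value right after them."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # values at indices i-window .. i-1
        y.append(series[i])             # the value to predict, at index i
    return np.array(X), np.array(y)

toy_series = np.arange(10, dtype=float)  # made-up data: 0.0, 1.0, ..., 9.0
X_toy, y_toy = make_windows(toy_series, window=3)

print(X_toy.shape, y_toy.shape)  # (7, 3) (7,)
print(X_toy[0], y_toy[0])        # [0. 1. 2.] 3.0
```

A series of length n yields n - window samples, which is why the training loop above produces 1258 - 60 = 1198 windows.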

### 4. Reshape

keras.layers.RNN
input shape: (batch_size, timesteps, input_dim)
- batch_size: the number of observations, i.e. the 60-day windows built from the 2012-2016 stock prices (1258 - 60 = 1198 here)
- timesteps: 60
- input_dim: the number of indicators (predictors) per timestep. You can also add new predictors, such as the Close price or the stock prices of other correlated companies.
    - our case: 1

In [6]:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train

array([[[0.08581368],
        [0.09701243],
        [0.09433366],
        ...,
        [0.07846566],
        [0.08034452],
        [0.08497656]],

       [[0.09701243],
        [0.09433366],
        [0.09156187],
        ...,
        [0.08034452],
        [0.08497656],
        [0.08627874]],

       [[0.09433366],
        [0.09156187],
        [0.07984225],
        ...,
        [0.08497656],
        [0.08627874],
        [0.08471612]],

       ...,

       [[0.92106928],
        [0.92438053],
        [0.93048218],
        ...,
        [0.95475854],
        [0.95204256],
        [0.95163331]],

       [[0.92438053],
        [0.93048218],
        [0.9299055 ],
        ...,
        [0.95204256],
        [0.95163331],
        [0.95725128]],

       [[0.93048218],
        [0.9299055 ],
        [0.93113327],
        ...,
        [0.95163331],
        [0.95725128],
        [0.93796041]]])
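A minimal sketch of the reshape on a placeholder array of zeros, assuming the windows start out as a 2-D array of shape (observations, timesteps). In this notebook the windows are sliced from a 2-D array, so `X_train` may already carry the trailing axis; in that case the reshape changes nothing but still makes the expected shape explicit.

```python
import numpy as np

# Placeholder data: 1198 windows of 60 timesteps, one value per timestep
windows = np.zeros((1198, 60))

# Add the trailing "indicators" axis expected by recurrent layers
reshaped = np.reshape(windows, (windows.shape[0], windows.shape[1], 1))

# Equivalent alternative: append a new axis by indexing
expanded = windows[..., np.newaxis]

print(reshaped.shape)  # (1198, 60, 1)
print(expanded.shape)  # (1198, 60, 1)
```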

# Part 2 - Building the RNN 

### 1. Importing the Keras libraries and packages

In [16]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

### 2. Initialising the RNN

In [17]:
regressor = Sequential()

### 3. Adding the first LSTM layer and some Dropout regularisation

Parameters:
    1. units: the number of LSTM memory units (neurons) in this LSTM layer.
        - A large number of units (50) gives the layer high dimensionality.
    2. return_sequences:
        - True: set it to True when another LSTM layer follows, as in this stacked LSTM with several layers.
        - False: set it to False when no further LSTM layer follows. Default value.
    3. input_shape: the input is 3D
        - observations: taken into account automatically
        - time steps
        - indicators
        - only needed in the first layer

In [18]:
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

### 4. Adding the second LSTM layer and some Dropout regularisation

In [19]:
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

### 5. Adding the third LSTM layer and some Dropout regularisation

In [20]:
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

### 6. Adding the fourth (last) LSTM layer and some Dropout regularisation

In [21]:
regressor.add(LSTM(units=50, return_sequences=False)) 
regressor.add(Dropout(0.2))

### 7. Adding the output layer
units: the dimension of the output layer (1, the predicted stock price)

In [22]:
regressor.add(Dense(units = 1))
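To get a feel for the size of the stacked network, the sketch below counts trainable parameters by hand. It assumes the standard LSTM formulation with default settings, where each of the four gates has an input kernel, a recurrent kernel and a bias, giving `4 * (units * (input_dim + units) + units)` weights per layer. `lstm_params` and `dense_params` are hypothetical helper names. Dropout layers add no parameters. The totals can be compared against `regressor.summary()`.

```python
def lstm_params(units, input_dim):
    """Weights of one LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights of a fully connected layer: kernel + bias."""
    return units * input_dim + units

layer_params = [
    lstm_params(50, 1),    # first LSTM: 1 indicator per timestep
    lstm_params(50, 50),   # second LSTM: receives 50 features from the previous layer
    lstm_params(50, 50),   # third LSTM
    lstm_params(50, 50),   # fourth LSTM
    dense_params(1, 50),   # output layer
]
total_params = sum(layer_params)

print(layer_params)  # [10400, 20200, 20200, 20200, 51]
print(total_params)  # 71051
```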

### 8. Compiling the RNN
The two most relevant optimizers for an RNN:
- RMSprop: recommended in the Keras documentation as usually a good choice for RNNs
- Adam: adapts the learning rate of each weight using running averages of the gradient and of its square

In [23]:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
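The `mean_squared_error` loss averages the squared differences between the targets and the predictions. A minimal NumPy sketch on made-up scaled prices (the function below is written out for illustration and is not the Keras implementation):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

y_true = [0.10, 0.20, 0.30]  # made-up scaled prices
y_pred = [0.10, 0.25, 0.20]  # made-up predictions

mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Because the targets were scaled to the range 0 to 1, the loss reported during training is in squared scaled units, not in squared prices.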

### 9. Fitting the RNN to the Training set
Epochs tried:
- 25, 50: the loss did not converge
- 100: some convergence was observed

In [24]:
regressor.fit(X_train, y_train, epochs = 100)

Epoch 1/100
Epoch 2/100
...
Epoch 99/100
Epoch 100/100


<keras.callbacks.History at 0x7f104808fc18>

The loss decreases progressively. By the final epoch it is around 0.0015.

# Part 3 - Making the predictions and visualising the results

### 1. Get the real stock price of 2017

In [25]:
dataset_test = pd.read_csv("Google_Stock_Price_Test.csv")
real_stock_price = dataset_test.iloc[:, 1:2].values
real_stock_price

array([[778.81],
       [788.36],
       [786.08],
       [795.26],
       [806.4 ],
       [807.86],
       [805.  ],
       [807.14],
       [807.48],
       [807.08],
       [805.81],
       [805.12],
       [806.91],
       [807.25],
       [822.3 ],
       [829.62],
       [837.81],
       [834.71],
       [814.66],
       [796.86]])

### 2. Get the predicted stock price of 2017

In [28]:
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)  # axis=0: concatenate vertically (row-wise)
dataset_total

0       325.25
1       331.27
2       329.83
3       328.34
4       322.04
5       313.70
6       310.59
7       314.43
8       311.96
9       314.81
10      312.14
11      319.30
12      294.16
13      291.91
14      292.07
15      287.68
16      284.92
17      284.32
18      287.95
19      290.41
20      291.38
21      291.34
22      294.23
23      296.39
24      302.44
25      303.18
26      304.87
27      302.81
28      304.11
29      304.63
         ...  
1248    800.40
1249    790.22
1250    796.76
1251    795.84
1252    792.36
1253    790.90
1254    790.68
1255    793.70
1256    783.33
1257    782.75
0       778.81
1       788.36
2       786.08
3       795.26
4       806.40
5       807.86
6       805.00
7       807.14
8       807.48
9       807.08
10      805.81
11      805.12
12      806.91
13      807.25
14      822.30
15      829.62
16      837.81
17      834.71
18      814.66
19      796.86
Name: Open, Length: 1278, dtype: float64
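The prediction for the first test day needs the 60 prices before it, all of which come from the training set, so the model input has to start 60 rows before the test data begins. A toy sketch of that slicing arithmetic, with a window of 4 instead of 60 and made-up prices (NumPy arrays stand in for the pandas Series):

```python
import numpy as np

window = 4
train = np.arange(10.0)              # made-up training prices 0.0 .. 9.0
test = np.array([10.0, 11.0, 12.0])  # made-up test prices
total = np.concatenate([train, test])

# Keep the last `window` training values plus every test value
inputs = total[len(total) - len(test) - window:]

# One window per test day; each holds the `window` values preceding that day
X = np.array([inputs[i - window:i] for i in range(window, len(inputs))])

print(inputs)   # [ 6.  7.  8.  9. 10. 11. 12.]
print(X.shape)  # (3, 4)
```

With a window of 60 and 20 test days, the same slice yields 80 values, which is where the `range(60, 80)` in the next cell comes from.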

In [32]:
# Last 60 training prices plus the 20 test prices: 80 values in total
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)  # reuse the scaler fitted on the training set (do not refit it)
X_test = []
for i in range(60, 80):  # the test set has only 20 trading days
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)  # scale back to prices
predicted_stock_price

array([[789.21643],
       [786.3139 ],
       [786.3156 ],
       [787.7035 ],
       [791.243  ],
       [797.49445],
       [803.4593 ],
       [806.21185],
       [806.7048 ],
       [806.2125 ],
       [805.5395 ],
       [804.89   ],
       [804.4417 ],
       [804.8159 ],
       [805.74634],
       [810.37683],
       [817.5916 ],
       [825.5772 ],
       [830.2358 ],
       [826.3746 ]], dtype=float32)
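One way to summarise how close the predictions are to the real prices is the root mean squared error (RMSE). This metric is a suggestion and is not computed in the notebook itself. The sketch below copies the two arrays from the outputs above so that it runs on its own.

```python
import numpy as np

# Real Open prices, copied from the real_stock_price output
real = np.array([
    778.81, 788.36, 786.08, 795.26, 806.40, 807.86, 805.00, 807.14, 807.48, 807.08,
    805.81, 805.12, 806.91, 807.25, 822.30, 829.62, 837.81, 834.71, 814.66, 796.86,
])

# Predicted prices, copied from the predicted_stock_price output
predicted = np.array([
    789.21643, 786.3139, 786.3156, 787.7035, 791.243, 797.49445, 803.4593,
    806.21185, 806.7048, 806.2125, 805.5395, 804.89, 804.4417, 804.8159,
    805.74634, 810.37683, 817.5916, 825.5772, 830.2358, 826.3746,
])

rmse = np.sqrt(np.mean((real - predicted) ** 2))  # error in price units
relative_rmse = rmse / real.mean()                # error relative to the average price

print(f"RMSE: {rmse:.2f}")
print(f"Relative RMSE: {relative_rmse:.2%}")
```

In the notebook the same calculation can be applied directly to `real_stock_price` and `predicted_stock_price`, since both have shape (20, 1).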