Use an RNN to predict Microsoft's stock prices and volumes for the following 2 weeks (11/08/2021 to 11/19/2021). Try all the techniques we learned this week and compare the results across the different models you come up with.

The following is MSFT's historical daily price data for the past 5 years, up to 11/5/2021. Assume you have no data after this date.

Besides the date, the table has 6 columns. Please ignore the "Adj Close" column; you will predict the other 5 columns. Here is an example of the most recent 3 days in your available data.

Date	Open	High	Low	Close*	Adj Close**	Volume
Nov 05, 2021	338.51	338.79	334.42	336.06	336.06	22,564,000
Nov 04, 2021	332.89	336.54	329.51	336.44	336.44	23,992,200
Nov 03, 2021	333.90	334.90	330.65	334.00	334.00	21,500,100
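
The forecast window named in the task, 11/08/2021 to 11/19/2021, spans a weekend, so forecasts labelled by consecutive calendar days would land on non-trading days. A minimal sketch of listing the target days, assuming every weekday in the window is a trading day (exchange holidays are not checked); the helper name `weekdays_between` is hypothetical:

```python
from datetime import date, timedelta

def weekdays_between(start, end):
    """Return every Monday-to-Friday date from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            days.append(current)
        current += timedelta(days=1)
    return days

forecast_days = weekdays_between(date(2021, 11, 8), date(2021, 11, 19))
print(len(forecast_days))
print(forecast_days[0], forecast_days[-1])
```

Under that assumption the window holds ten forecast days, which matches the ten steps produced by the recursive forecast later in the notebook.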

In [2]:
from google.colab import files

uploaded = files.upload()

Saving MSFT-1.csv to MSFT-1.csv


In [3]:
import pandas as pd
import io
 
df = pd.read_csv(io.BytesIO(uploaded['MSFT-1.csv']))
print(df)

            Date        Open        High         Low       Close   Adj Close  \
0      11/7/2016   59.779999   60.520000   59.779999   60.419998   55.902321   
1      11/8/2016   60.549999   60.779999   60.150002   60.470001   55.948589   
2      11/9/2016   60.000000   60.590000   59.200001   60.169998   55.671009   
3     11/10/2016   60.480000   60.490002   57.630001   58.700001   54.310928   
4     11/11/2016   58.230000   59.119999   58.009998   59.020000   54.607002   
...          ...         ...         ...         ...         ...         ...   
1254   11/1/2021  331.359985  331.489990  326.369995  329.369995  329.369995   
1255   11/2/2021  330.309998  333.450012  330.000000  333.130005  333.130005   
1256   11/3/2021  333.899994  334.899994  330.649994  334.000000  334.000000   
1257   11/4/2021  332.890015  336.540009  329.510010  336.440002  336.440002   
1258   11/5/2021  338.510010  338.790009  334.420013  336.059998  336.059998   

        Volume  
0     31664800  
1    

In [4]:
df.head()

         Date       Open       High        Low      Close  Adj Close    Volume
0   11/7/2016  59.779999  60.520000  59.779999  60.419998  55.902321  31664800
1   11/8/2016  60.549999  60.779999  60.150002  60.470001  55.948589  22935400
2   11/9/2016  60.000000  60.590000  59.200001  60.169998  55.671009  49632500
3  11/10/2016  60.480000  60.490002  57.630001  58.700001  54.310928  57822400
4  11/11/2016  58.230000  59.119999  58.009998  59.020000  54.607002  38767800


In [5]:
close = df["Close"]

In [6]:
close.head()

0    60.419998
1    60.470001
2    60.169998
3    58.700001
4    59.020000
Name: Close, dtype: float64

In [7]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

In [8]:
scaler = MinMaxScaler(feature_range=(0,1))
close = scaler.fit_transform(np.array(close).reshape(-1,1))

In [9]:
close[:5]

array([[0.00826387],
       [0.00844353],
       [0.00736562],
       [0.00208394],
       [0.00323369]])
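
The scaled values above follow from min-max scaling: with `feature_range=(0,1)`, each value x is mapped to (x - min) / (max - min), using the minimum and maximum seen during fitting. A small sketch of the forward and inverse transforms with plain NumPy; the prices here are made-up toy values, not MSFT data:

```python
import numpy as np

prices = np.array([60.0, 75.0, 90.0, 120.0])  # toy values, not MSFT data

lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)      # forward transform into [0, 1]
restored = scaled * (hi - lo) + lo      # inverse transform back to prices

# A value above the fitted maximum maps to a scaled value above 1.
above_max = (126.0 - lo) / (hi - lo)
print(scaled)
print(above_max)
```

This is why a network output slightly above 1 inverse-transforms to a price above the historical maximum of the column.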

In [10]:
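def splitData(data, time_step=1):
    """Build (window, next value) pairs from a scaled column of shape (n, 1)."""
    X_data, Y_data = [], []
    # The extra -1 stops one window early, so the last available
    # (window, target) pair is left out of the training set.
    for i in range(len(data)-time_step-1):
        a = data[i:(i+time_step), 0]   # time_step consecutive inputs
        X_data.append(a)
        Y_data.append(data[i + time_step, 0])   # the value right after the window
    return np.array(X_data), np.array(Y_data)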

In [11]:
X, y = splitData(close, 50)

In [12]:
X.shape, y.shape

((1208, 50), (1208,))
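
To make the windowing concrete, here is a toy sketch of the same idea on a ten-element series. The helper `make_windows` is hypothetical and differs from `splitData` in one respect: it loops over `range(len(series) - time_step)`, so it also keeps the final window.

```python
import numpy as np

def make_windows(series, time_step):
    """Pair each run of `time_step` values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - time_step):
        X.append(series[i:i + time_step])
        y.append(series[i + time_step])
    return np.array(X), np.array(y)

toy = np.arange(10, dtype=float)   # 0.0, 1.0, ..., 9.0
X_toy, y_toy = make_windows(toy, 3)
print(X_toy.shape, y_toy.shape)
print(X_toy[0], y_toy[0])
print(X_toy[-1], y_toy[-1])
```

With 1259 rows and `time_step=50`, this variant would give 1209 pairs, one more than the 1208 shown above.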

In [13]:
from tensorflow import keras

In [14]:
model = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[50, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),
    keras.layers.Dense(1)
])
model.compile(loss='mean_squared_error',optimizer='adam', metrics=['mean_squared_logarithmic_error'])

In [15]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 50, 100)           40800     
                                                                 
 lstm_1 (LSTM)               (None, 50, 100)           80400     
                                                                 
 lstm_2 (LSTM)               (None, 100)               80400     
                                                                 
 dense (Dense)               (None, 1)                 101       
                                                                 
Total params: 201,701
Trainable params: 201,701
Non-trainable params: 0
_________________________________________________________________
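
The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, so it holds 4 * (units * input_dim + units * units + units) weights. A short sketch of that arithmetic; the helper names are hypothetical:

```python
def lstm_params(units, input_dim):
    """Weights in a standard LSTM layer: four gates, each with an
    input kernel, a recurrent kernel, and a bias vector."""
    return 4 * (units * input_dim + units * units + units)

def dense_params(units, input_dim):
    """Weights in a fully connected layer: kernel plus bias."""
    return units * input_dim + units

layers = [
    lstm_params(100, 1),     # first LSTM sees 1 feature per time step
    lstm_params(100, 100),   # second LSTM sees the 100 outputs of the first
    lstm_params(100, 100),   # third LSTM
    dense_params(1, 100),    # final Dense(1)
]
print(layers)
print(sum(layers))
```

The per-layer values and the total agree with the 40,800, 80,400, 80,400, 101 and 201,701 reported by `model.summary()`.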


In [16]:
model.fit(X,y,epochs=50,batch_size=64,verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f527007b210>
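
The fit above uses every window for training, so there is no held-out data for comparing models. One possible approach, sketched here under assumptions, is to hold out the most recent windows and score each model against a persistence baseline (tomorrow equals today). The helper names and the 20% validation fraction are hypothetical, and the series is synthetic:

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.2):
    """Hold out the most recent windows; time series are not shuffled."""
    cut = len(X) - int(len(X) * val_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(y_true, y_pred):
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic random walk standing in for a scaled price column.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=300))
X_toy = np.array([series[i:i + 50] for i in range(len(series) - 50)])
y_toy = series[50:]

X_tr, y_tr, X_val, y_val = chronological_split(X_toy, y_toy)

# Persistence baseline: predict the last value of each window.
baseline_pred = X_val[:, -1]
print(len(X_tr), len(X_val))
print(rmse(y_val, baseline_pred))
```

A model that cannot beat the baseline RMSE on the held-out windows is unlikely to be adding information beyond the last observed value.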

In [17]:
df.tail()

           Date        Open        High         Low       Close   Adj Close    Volume
1254  11/1/2021  331.359985  331.489990  326.369995  329.369995  329.369995  27073200
1255  11/2/2021  330.309998  333.450012  330.000000  333.130005  333.130005  26487100
1256  11/3/2021  333.899994  334.899994  330.649994  334.000000  334.000000  21500100
1257  11/4/2021  332.890015  336.540009  329.510010  336.440002  336.440002  23992200
1258  11/5/2021  338.510010  338.790009  334.420013  336.059998  336.059998  22564000


In [18]:
def prediction(data, n_steps=50, n_days=10):
    """Forecast n_days values recursively, feeding each prediction back in.

    Uses the global `model`; `data` holds the last n_steps scaled values.
    """
    window = data.reshape(-1).tolist()
    output = []
    for _ in range(n_days):
        # Shape the current window as (batch, time steps, features).
        x = np.array(window).reshape((1, n_steps, 1))
        res = model.predict(x, verbose=0)
        output.extend(res.tolist())
        # Slide the window: drop the oldest value, append the new forecast.
        window = window[1:] + res[0].tolist()
    return output
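
The recursive scheme in `prediction` can also be written so that the predictor is passed in rather than read from a global. The following is a sketch under that assumption; `recursive_forecast` and `mean_predictor` are hypothetical names, and the stand-in predictor simply returns the mean of the window so the example runs without a trained network:

```python
import numpy as np

def recursive_forecast(predict_fn, last_window, n_days=10):
    """Roll a fixed-length window forward, feeding each forecast back in.

    predict_fn is any callable mapping an array of shape (1, n_steps, 1)
    to an array of shape (1, 1).
    """
    window = np.asarray(last_window, dtype=float).reshape(-1)
    n_steps = len(window)
    forecasts = []
    for _ in range(n_days):
        next_value = float(predict_fn(window.reshape(1, n_steps, 1))[0, 0])
        forecasts.append(next_value)
        # Drop the oldest value and append the new forecast.
        window = np.append(window[1:], next_value)
    return forecasts

# Stand-in predictor (hypothetical): the mean of the current window.
def mean_predictor(batch):
    return batch.mean(axis=1)

print(recursive_forecast(mean_predictor, [1.0, 2.0, 3.0], n_days=2))
```

With a Keras network, something like `lambda batch: model1.predict(batch, verbose=0)` could be passed as `predict_fn`, which makes it explicit which network produces each forecast. Note that errors compound in this scheme, since every step after the first is predicted from earlier predictions.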

In [19]:
close_output = prediction(close[-50:], 50)

In [20]:
close_output

[[1.0181491374969482],
 [1.024025559425354],
 [1.0296193361282349],
 [1.0349591970443726],
 [1.0399352312088013],
 [1.0444610118865967],
 [1.048500895500183],
 [1.0520656108856201],
 [1.0551891326904297],
 [1.0579276084899902]]

In [21]:
close_output=scaler.inverse_transform(close_output)
close_output

array([[341.49127   ],
       [343.12679577],
       [344.68365572],
       [346.16984583],
       [347.55477567],
       [348.81439096],
       [349.93877138],
       [350.93090298],
       [351.80024158],
       [352.56241417]])

In [22]:
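# Label each forecast with a weekday in the target window
# (11/13 and 11/14 fall on a weekend).
forecast_dates = pd.bdate_range('2021-11-08', periods=10)
print('Output for Close values:')
for d, v in zip(forecast_dates, close_output):
    print(f'{d.month}/{d.day}/{d.year} {v[0]}')

Output for Close values:
11/8/2021 341.49127000259807
11/9/2021 343.1267957713412
11/10/2021 344.6836557200683
11/11/2021 346.16984582626736
11/12/2021 347.5547756698393
11/15/2021 348.81439096166065
11/16/2021 349.93877138111367
11/17/2021 350.9309029778826
11/18/2021 351.8002415759678
11/19/2021 352.5624141687169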


In [23]:
open_prices = df["Open"]   # named open_prices so the built-in open() is not shadowed
scaler = MinMaxScaler(feature_range=(0,1))
open_prices = scaler.fit_transform(np.array(open_prices).reshape(-1,1))
X, y = splitData(open_prices, 50)
model1 = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[50, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),
    keras.layers.Dense(1)
])
model1.compile(loss='mean_squared_error',optimizer='adam', metrics=['mean_squared_logarithmic_error'])
model1.summary()
model1.fit(X,y,epochs=50,batch_size=64,verbose=0)
# Note: prediction() calls the global `model` (the Close network from In [14]), not model1.
open_output = prediction(open_prices[-50:], 50)
open_output=scaler.inverse_transform(open_output)
# Label each forecast with a weekday in the target window.
forecast_dates = pd.bdate_range('2021-11-08', periods=10)
print('Output for Open values:')
for d, v in zip(forecast_dates, open_output):
    print(f'{d.month}/{d.day}/{d.year} {v[0]}')

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_3 (LSTM)               (None, 50, 100)           40800     
                                                                 
 lstm_4 (LSTM)               (None, 50, 100)           80400     
                                                                 
 lstm_5 (LSTM)               (None, 100)               80400     
                                                                 
 dense_1 (Dense)             (None, 1)                 101       
                                                                 
Total params: 201,701
Trainable params: 201,701
Non-trainable params: 0
_________________________________________________________________
Output for Open values:
11/8/2021 339.7998461094046
11/9/2021 342.00690791758063
11/10/2021 343.9593370195663
11/11/2021 345.70427772036555
11/12/2021 347.2704309115434
11/15/2021 348.6714620932746
[output truncated]

In [24]:
high = df["High"]
scaler = MinMaxScaler(feature_range=(0,1))
high = scaler.fit_transform(np.array(high).reshape(-1,1))
X, y = splitData(high, 50)
model2 = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[50, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),
    keras.layers.Dense(1)
])
model2.compile(loss='mean_squared_error',optimizer='adam', metrics=['mean_squared_logarithmic_error'])
model2.summary()
model2.fit(X,y,epochs=50,batch_size=64,verbose=0)
# Note: prediction() calls the global `model` (the Close network from In [14]), not model2.
high_output = prediction(high[-50:], 50)
high_output=scaler.inverse_transform(high_output)
# Label each forecast with a weekday in the target window.
forecast_dates = pd.bdate_range('2021-11-08', periods=10)
print('Output for High values:')
for d, v in zip(forecast_dates, high_output):
    print(f'{d.month}/{d.day}/{d.year} {v[0]}')

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_6 (LSTM)               (None, 50, 100)           40800     
                                                                 
 lstm_7 (LSTM)               (None, 50, 100)           80400     
                                                                 
 lstm_8 (LSTM)               (None, 100)               80400     
                                                                 
 dense_2 (Dense)             (None, 1)                 101       
                                                                 
Total params: 201,701
Trainable params: 201,701
Non-trainable params: 0
_________________________________________________________________
Output for High values:
11/8/2021 342.5783244183973
11/9/2021 344.2819710052507
11/10/2021 345.91686219973934
11/11/2021 347.4705606782201
11/12/2021 348.91315684469356
11/15/2021 350.22581132152317
[output truncated]

In [25]:
low = df["Low"]
scaler = MinMaxScaler(feature_range=(0,1))
low = scaler.fit_transform(np.array(low).reshape(-1,1))
X, y = splitData(low, 50)
model3 = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[50, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),
    keras.layers.Dense(1)
])
model3.compile(loss='mean_squared_error',optimizer='adam', metrics=['mean_squared_logarithmic_error'])
model3.summary()
model3.fit(X,y,epochs=50,batch_size=64,verbose=0)
# Note: prediction() calls the global `model` (the Close network from In [14]), not model3.
low_output = prediction(low[-50:], 50)
low_output=scaler.inverse_transform(low_output)
# Label each forecast with a weekday in the target window.
forecast_dates = pd.bdate_range('2021-11-08', periods=10)
print('Output for Low values:')
for d, v in zip(forecast_dates, low_output):
    print(f'{d.month}/{d.day}/{d.year} {v[0]}')

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_9 (LSTM)               (None, 50, 100)           40800     
                                                                 
 lstm_10 (LSTM)              (None, 50, 100)           80400     
                                                                 
 lstm_11 (LSTM)              (None, 100)               80400     
                                                                 
 dense_3 (Dense)             (None, 1)                 101       
                                                                 
Total params: 201,701
Trainable params: 201,701
Non-trainable params: 0
_________________________________________________________________
Output for Low values:
11/8/2021 336.78792150441717
11/9/2021 338.6583158240225
11/10/2021 340.37161605047555
11/11/2021 341.9527325825643
11/12/2021 343.4018636462738
11/15/2021 344.71653141679093
[output truncated]

In [26]:
volume = df["Volume"]
scaler = MinMaxScaler(feature_range=(0,1))
volume = scaler.fit_transform(np.array(volume).reshape(-1,1))
X, y = splitData(volume, 50)
model4 = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[50, 1]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(100),
    keras.layers.Dense(1)
])
model4.compile(loss='mean_squared_error',optimizer='adam', metrics=['mean_squared_logarithmic_error'])
model4.summary()
model4.fit(X,y,epochs=50,batch_size=64,verbose=0)
# Note: prediction() calls the global `model` (the Close network from In [14]), not model4.
volume_output = prediction(volume[-50:], 50)
volume_output=scaler.inverse_transform(volume_output)
# Label each forecast with a weekday in the target window.
forecast_dates = pd.bdate_range('2021-11-08', periods=10)
print('Output for Volume values:')
for d, v in zip(forecast_dates, volume_output):
    print(f'{d.month}/{d.day}/{d.year} {v[0]}')

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_12 (LSTM)              (None, 50, 100)           40800     
                                                                 
 lstm_13 (LSTM)              (None, 50, 100)           80400     
                                                                 
 lstm_14 (LSTM)              (None, 100)               80400     
                                                                 
 dense_4 (Dense)             (None, 1)                 101       
                                                                 
Total params: 201,701
Trainable params: 201,701
Non-trainable params: 0
_________________________________________________________________
Output for Volume values:
11/8/2021 28862004.921412468
11/9/2021 27914529.89527434
11/10/2021 27247825.166000426
11/11/2021 26844872.33619243
11/12/2021 26640801.625093818
11/15/2021 26563246.55584842
[output truncated]
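
Because each column is forecast independently, nothing forces the results to be mutually consistent: on any trading day, High should be at least as large as Open and Close, and Low at most as small. A sketch of that sanity check, using the first five forecast rows printed above rounded to two decimals; the helper name `ohlc_consistent` is hypothetical:

```python
# First five forecast rows from the outputs above, rounded to two decimals.
rows = [
    # (open, high, low, close)
    (339.80, 342.58, 336.79, 341.49),
    (342.01, 344.28, 338.66, 343.13),
    (343.96, 345.92, 340.37, 344.68),
    (345.70, 347.47, 341.95, 346.17),
    (347.27, 348.91, 343.40, 347.55),
]

def ohlc_consistent(open_, high, low, close):
    """True when High is the largest and Low the smallest of the four."""
    return high >= max(open_, close) and low <= min(open_, close)

checks = [ohlc_consistent(*row) for row in rows]
print(checks)
```

The five rows shown pass the check. A multivariate model that predicts all five columns jointly would be another way to keep them coherent, though that is not what this notebook does.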