<a href="https://colab.research.google.com/github/JoshABurk/RNN-Stock-Exchange-Project/blob/main/Semester_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **Semester Project**

Due: December 15, 2022

---

---
# **Objective**

The objective of this project is to build a classifier that can predict whether the GME stock price will rise or fall on a day-to-day basis.

# **Explanation**

In trading, a candlestick is a unit that summarizes a stock's price data over a specific period of time. Traders use candlesticks to identify patterns and predict upward or downward trends. For this project, each data point is a candlestick representing 1 day of the GME stock. A candlestick whose closing price is higher than its opening price means the stock rose in value, and vice versa. Training an RNN on the sequence of candlesticks produces a model that aims to predict the properties of the next candlestick in the sequence.
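
As a small illustration of the rule above, the sketch below labels a single candlestick from its opening and closing prices. The function name `candle_direction` is hypothetical and is not part of the notebook; the prices are illustrative values.

```python
def candle_direction(open_price, close_price):
    """Return 'up' if the close is above the open, 'down' if below, else 'flat'."""
    if close_price > open_price:
        return "up"
    if close_price < open_price:
        return "down"
    return "flat"

# Illustrative open/close pairs
up_example = candle_direction(42.59, 65.01)
down_example = candle_direction(265.00, 193.60)
print(up_example, down_example)
```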

---

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

import pandas as pd
import numpy as np
import math

import sklearn
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, SimpleRNN

import plotly
import plotly.express as px
import plotly.graph_objects as go

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


---
# **Loading Data**

The GME_stock dataset was obtained from Kaggle.

URL: https://www.kaggle.com/datasets/hananxx/gamestop-historical-stock-prices

---

In [None]:
df = pd.read_csv('/content/gdrive/My Drive/GME_stock.csv')## load in the data
print(df)

            date  open_price  high_price   low_price  close_price  \
0     2021-01-28  265.000000  483.000000  112.250000   193.600006   
1     2021-01-27  354.829987  380.000000  249.000000   347.510010   
2     2021-01-26   88.559998  150.000000   80.199997   147.979996   
3     2021-01-25   96.730003  159.179993   61.130001    76.790001   
4     2021-01-22   42.590000   76.760002   42.320000    65.010002   
...          ...         ...         ...         ...          ...   
4768  2002-02-20    9.600000    9.875000    9.525000     9.875000   
4769  2002-02-19    9.900000    9.900000    9.375000     9.550000   
4770  2002-02-15   10.000000   10.025000    9.850000     9.950000   
4771  2002-02-14   10.175000   10.195000    9.925000    10.000000   
4772  2002-02-13    9.625000   10.060000    9.525000    10.050000   

           volume  adjclose_price  
0      58815800.0      193.600006  
1      93396700.0      347.510010  
2     178588000.0      147.979996  
3     177874000.0       76.

---
# **Preparing the Data**

Of the original 7 columns provided by the dataset, only open_price and close_price are needed to determine whether a candlestick was positive or negative. I then build a new data frame to hold the sequence of outcomes, binned into four classes from big loss to big gain, to be used for pattern recognition through a recurrent neural network.

In [None]:
DataDf = pd.read_csv('/content/gdrive/My Drive/GME_stock.csv')
DataDf = DataDf.drop(columns=["date", "high_price", "low_price", "adjclose_price"]) ## remove the columns that are not needed
## create a new column called % loss/gain holding the fractional change from open to close (0.1 = 10% gain)
DataDf['% loss/gain'] = (DataDf['close_price'] - DataDf['open_price']) / DataDf['open_price']

df = pd.DataFrame() ## build a new DF
## the first column in the DF is a sequence from 0 to len(DataDf) - 1
## note: the CSV lists the newest day first, so sequence 0 is the most recent candlestick
df['sequence'] = list(range(len(DataDf)))

data = []
for i in DataDf['% loss/gain']: ## 0 = big loss, 1 = small loss, 2 = small gain, 3 = big gain
  if i < -.1: ## lost more than 10% of stock value
    data.append(0)
  elif i < 0: ## lost up to 10% of stock value
    data.append(1)
  elif i < .1: ## unchanged or gained less than 10% of stock value
    data.append(2)
  else: ## gained 10% or more of stock value
    data.append(3)
df['result'] = data

print(df)

      sequence  result
0            0       0
1            1       1
2            2       3
3            3       0
4            4       3
...        ...     ...
4768      4768       2
4769      4769       1
4770      4770       1
4771      4771       1
4772      4772       2

[4773 rows x 2 columns]
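
The same four-way binning can also be written without an explicit loop. The sketch below is one possible vectorized version using `np.digitize`, not the notebook's own method; the five open/close pairs are the first five rows of the dataset printed earlier, and the variable names are chosen here for illustration.

```python
import numpy as np

# First five rows of the dataset (open_price, close_price), newest first
open_price = np.array([265.000000, 354.829987, 88.559998, 96.730003, 42.590000])
close_price = np.array([193.600006, 347.510010, 147.979996, 76.790001, 65.010002])

# Fractional change from open to close
change = (close_price - open_price) / open_price

# Bin edges -0.1, 0, 0.1 give: 0 = big loss, 1 = small loss, 2 = small gain, 3 = big gain
labels = np.digitize(change, bins=[-0.1, 0.0, 0.1])
print(labels)  # [0 1 3 0 3]
```

With the default `right=False`, a value equal to a bin edge falls into the higher class, which matches the `<` comparisons used in the loop.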


In [None]:
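def prep_data(datain, time_step):
  ## targets: every time_step-th value, starting at index time_step
  y_indices = np.arange(start=time_step, stop=len(datain), step=time_step)
  y_tmp = datain[y_indices]

  ## inputs: the time_step values that precede each target, as non-overlapping windows
  rows_X = len(y_tmp)
  X_tmp = datain[range(time_step*rows_X)]
  X_tmp = np.reshape(X_tmp, (rows_X, time_step, 1)) ## shape: (samples, time steps, features)
  return X_tmp, y_tmp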

X = df[['result']]
scaler = MinMaxScaler() ## normalize the data to fall within 0 to 1
X = scaler.fit_transform(X)

## split the data into 80% training and 20% test; don't shuffle, because the order of the sequence matters
train_data, test_data = train_test_split(X, test_size=0.2, shuffle=False)
time_step = 9 ## the number of previous candlesticks to take into account when predicting the next in the sequence
X_train, y_train = prep_data(train_data, time_step)
X_test, y_test = prep_data(test_data, time_step)
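
To make the windowing concrete, the sketch below applies the same slicing logic to a toy series of ten values with a window of 3. The helper name `make_windows` is hypothetical; it mirrors `prep_data` but is not the notebook's function.

```python
import numpy as np

def make_windows(series, time_step):
    """Split a 1-D series into non-overlapping windows and the value that follows each."""
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    # Targets sit at indices time_step, 2*time_step, ...
    y_idx = np.arange(time_step, len(series), time_step)
    y = series[y_idx]
    # Each input window holds the time_step values before its target
    X = series[: time_step * len(y)].reshape(len(y), time_step, 1)
    return X, y

toy = np.arange(10)  # 0, 1, ..., 9
X_toy, y_toy = make_windows(toy, time_step=3)
print(X_toy[:, :, 0])  # windows [0 1 2], [3 4 5], [6 7 8]
print(y_toy.ravel())   # targets 3, 6, 9
```

Because the windows do not overlap, a series of length n yields only about n / time_step samples.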

---
# **Building the RNN**

I build the RNN by specifying the input, hidden, and output layers. I then compile it and fit it to the training data, running only 10 epochs to save time.
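
For intuition about what a recurrent layer with a single unit computes, here is a minimal NumPy sketch of the standard simple-RNN update, h_t = tanh(w_x * x_t + w_h * h_(t-1) + b), applied over one window of 9 scaled values. The weights and the window are hypothetical values chosen for illustration; a trained model learns its own weights.

```python
import numpy as np

def simple_rnn_forward(x_seq, w_x, w_h, b):
    """Return the final hidden state of a one-unit tanh recurrent layer."""
    h = 0.0  # the hidden state starts at zero
    for x_t in x_seq:
        # Combine the current input with the previous hidden state
        h = np.tanh(w_x * x_t + w_h * h + b)
    return h

# Hypothetical window of 9 scaled class labels and hypothetical weights
window = [0.0, 1/3, 1.0, 0.0, 1.0, 2/3, 1/3, 1/3, 2/3]
h_final = simple_rnn_forward(window, w_x=0.5, w_h=-0.3, b=0.1)
print(h_final)
```

The final hidden state always lies strictly between -1 and 1 because of the tanh activation; the dense layers that follow map it to the predicted value.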

---

In [None]:
## builds the rnn to have an input, hidden layer with tanh activation function, another tanh hidden layer, and a linear output
rnn = Sequential(name="RNN")
rnn.add(Input(shape=(time_step, 1), name='Input-Layer'))
rnn.add(SimpleRNN(units=1, activation='tanh', name='Hidden-Recurrent-Layer'))
rnn.add(Dense(units=1, activation='tanh', name='Hidden-Layer'))
rnn.add(Dense(units=1, activation = 'linear', name='Output-Layer'))

rnn.compile(optimizer='adam', loss='mean_squared_error', metrics = ['MeanSquaredError', 'MeanAbsoluteError'], run_eagerly=None, steps_per_execution=None)

rnn.fit(X_train,y_train,batch_size=1, epochs=10)

pred_train = rnn.predict(X_train)
pred_test = rnn.predict(X_test)


print('----------- Evaluation on Training Data ----------')
print("MSE: ", mean_squared_error(y_train, pred_train))

print('----------- Evaluation on Test Data ----------')
print("MSE: ", mean_squared_error(y_test, pred_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
----------- Evaluation on Training Data ----------
MSE:  0.03060752095960936
----------- Evaluation on Test Data ----------
MSE:  0.02866125911090333
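
The objective describes a classifier, while the model is trained as a regressor and scored with MSE on scaled labels. One possible way to read the predictions as classes is sketched below: it assumes the scaler mapped the labels 0 to 3 onto the range 0 to 1, rounds each prediction to the nearest class, and computes accuracy. The function names and demo arrays are hypothetical and are not outputs of the notebook.

```python
import numpy as np

def to_class_labels(scaled_values, n_classes=4):
    """Map values scaled to [0, 1] back to integer classes 0..n_classes-1."""
    labels = np.rint(np.asarray(scaled_values, dtype=float) * (n_classes - 1))
    # Predictions outside [0, 1] are clipped to the nearest valid class
    return np.clip(labels, 0, n_classes - 1).astype(int).ravel()

def class_accuracy(y_true_scaled, y_pred_scaled):
    """Fraction of samples whose rounded prediction matches the true class."""
    return float(np.mean(to_class_labels(y_true_scaled) == to_class_labels(y_pred_scaled)))

# Hypothetical demo values, not results from the model
y_true_demo = np.array([[0.0], [1/3], [2/3], [1.0]])
y_pred_demo = np.array([[0.1], [0.4], [0.45], [1.2]])
pred_classes = to_class_labels(y_pred_demo)
acc = class_accuracy(y_true_demo, y_pred_demo)
print(pred_classes, acc)  # [0 1 1 3] 0.75
```

In the notebook, the same functions could be applied to `y_test` and `pred_test`.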
