# Predicting Stock Market Using Deep Learning

We gonna make stock market predictions based on past data

In order to achieve a great result, we will use the best available technologies and methods through Deep Learning models

## Steps to achieve the prediction

1. Scrapping the Stock Market data and Technical indicator data
2. Performing Data Processing on Time Series data
3. Creating and Training a LSTM Sequential Model on Tensorflow

In [2]:
# How to see the GPU informations
!nvidia-smi

Sat Oct 15 19:53:25 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
# yfinance offers a threaded and Pythonic way to download market data from Yahoo!Ⓡ finance
!pip install yfinance

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting yfinance
  Downloading yfinance-0.1.77-py2.py3-none-any.whl (28 kB)
Collecting requests>=2.26
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
[K     |████████████████████████████████| 62 kB 1.5 MB/s 
Installing collected packages: requests, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
Successfully installed requests-2.28.1 yfinance-0.1.77


In [4]:
# Imports
import yfinance as yf
import pandas as pd
import numpy as np
import tensorflow as tf

In [65]:
# Download the stock market data based on a ticker
"""
Ticker = TSLA(Tesla), Start = When start the stock Market, Interval = The frequency to fetch data
""" 
data = yf.download("TSLA" , start = "2018-01-01" , interval = '1d')

[*********************100%***********************]  1 of 1 completed


In [66]:
# How much data we have?
data.shape

(1206, 6)

In [67]:
# Show the first 3 lines
data.head(3)

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2018-01-02 00:00:00-05:00,20.799999,21.474001,20.733334,21.368668,21.368668,65283000
2018-01-03 00:00:00-05:00,21.4,21.683332,21.036667,21.15,21.15,67822500
2018-01-04 00:00:00-05:00,20.858,21.236668,20.378668,20.974667,20.974667,149194500


## Understanding Trends with in the Data

In [68]:
# Sort the data points based on indexes just for confirmation
data.sort_index(inplace = True)

In [69]:
# Remove any duplicated index
data = data.loc[~data.index.duplicated(keep='first')]

In [70]:
# Show the last 3 lines
data.tail(3)

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2022-10-12 00:00:00-04:00,215.330002,219.300003,211.509995,217.240005,217.240005,66860700
2022-10-13 00:00:00-04:00,208.300003,222.990005,206.220001,221.720001,221.720001,91483000
2022-10-14 00:00:00-04:00,224.009995,226.259995,204.160004,204.990005,204.990005,93898700


In [71]:
# Check missing values in the data
data.isnull().sum()

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [72]:
# Show some statistics about the dataset
data.describe()

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume
count,1206.0,1206.0,1206.0,1206.0,1206.0,1206.0
mean,129.645696,132.635773,126.380986,129.573764,129.573764,133509700.0
std,119.904423,122.615973,116.822271,119.724988,119.724988,91548460.0
min,12.073333,12.445333,11.799333,11.931333,11.931333,29401800.0
25%,20.861334,21.237501,20.388168,20.866333,20.866333,75762750.0
50%,57.861334,59.409,56.265999,58.198,58.198,102571000.0
75%,237.822498,243.611668,233.620831,238.077503,238.077503,155307800.0
max,411.470001,414.496674,405.666656,409.970001,409.970001,914082000.0


### Lets Plot data with Plotly

In [73]:
import plotly.graph_objects as go
# Check the trend in Closing Values

fig = go.Figure()

fig.add_trace(go.Scatter(x = data.index, y = data['Close'], mode = 'lines'))
fig.update_layout(height = 500, width = 900,
                  xaxis_title = 'Date', yaxis_title = 'Close')
fig.show()

In [74]:
# Check the trend in Volume
fig.add_trace(go.Scatter(x = data.index, y = data['Volume'], mode = 'lines'))
fig.update_layout(height = 500, width = 900,
                  xaxis_title = 'Date', yaxis_title = 'Volume')
fig.show()

## Data Preparation

In [75]:
from sklearn.preprocessing import MinMaxScaler
import pickle # See and retrieve any python objects
from tqdm.notebook import tnrange # Grafic progress bar to see the loading process

In [76]:
# Filter only required data
data = data[['Close', 'Volume']]
data.head(3)

Unnamed: 0_level_0,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1
2018-01-02 00:00:00-05:00,21.368668,65283000
2018-01-03 00:00:00-05:00,21.15,67822500
2018-01-04 00:00:00-05:00,20.974667,149194500


In [85]:
 # Confirm the testing Set Length
test_length = data[(data.index >= '2021-06-01')].shape[0]
print('Dataset shape is: {}'.format(data.shape[0]))
print('Test data shape is: {}'.format(test_length))

Dataset shape is: 1206
Test data shape is: 348


In [86]:
def CreateFeatures_and_Targets(data, feature_length):
    X = []
    Y = []

    for i in tnrange(len(data) - feature_length): 
        X.append(data.iloc[i : i + feature_length,:].values)
        Y.append(data["Close"].values[i+feature_length])

    X = np.array(X)
    Y = np.array(Y)
    return X , Y

In [87]:
X , Y = CreateFeatures_and_Targets(data , 32)

  0%|          | 0/1174 [00:00<?, ?it/s]

In [88]:
# Print the Shape
X.shape, Y.shape

((1174, 32, 2), (1174,))

In [89]:
# Train Test Split
Xtrain , Xtest , Ytrain , Ytest = X[:-test_length] , X[-test_length:] , Y[:-test_length] , Y[-test_length:]

In [90]:
# Check Train Shape
Xtrain.shape, Ytrain.shape

((826, 32, 2), (826,))

In [91]:
# Check Test Shape
Xtest.shape, Ytest.shape

((348, 32, 2), (348,))

Create a Scaler to scale Vectors

In [92]:
class MultiDimesionalScaler():
  def __init__(self):
    self.scalers = []

  def fit_transform(self, X):
    total_dims = X.shape[2]
    for i in range(total_dims):
      Scaler = MinMaxScaler()
      X[:, :, i] = Scaler.fit_transform(X[:, :, i])
      self.scalers.append(Scaler)
    return X
    
  def transform(self, X):
    for i in range(X.shape[2]):
      X[:, :, i] = self.scalers[i].transform(X[:, :, i])
    return X

In [93]:
# Scale the features based on created MultiDimesionalScaler class
Feature_Scaler = MultiDimesionalScaler()
Xtrain = Feature_Scaler.fit_transform(Xtrain)
Xtest = Feature_Scaler.transform(Xtest)

In [94]:
# Scale the target based on MinMaxScale function
Target_Scaler = MinMaxScaler()
Ytrain = Target_Scaler.fit_transform(Ytrain.reshape(-1,1))
Ytest = Target_Scaler.transform(Ytest.reshape(-1,1))

In [95]:
# Python pickle module is used for serializing and de-serializing python object structures.
def save_object(obj, name : str):
  pickle_out = open(f'{name}.pck', 'wb')
  pickle.dump(obj, pickle_out)
  pickle_out.close()

def load_object(name: str):
  pickle_in = open(f'{name}.pck', 'rb')
  data = pickle.load(pickle_in)
  return data  

In [96]:
# Save your objects for future purposes
save_object(Feature_Scaler, "Feature_Scaler")
save_object(Target_Scaler, "Target_Scaler")

## Model Buiding

In [97]:
# Define Callbacks
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

"""
ModelCheckpoint: Callback to save the Keras model or model weights at some frequency
ReduceLROnPlateau: Reduce learning rate when a metric has stopped improving
"""

save_best = ModelCheckpoint(
    "best_weights.h5",
     monitor='val_loss',
     save_best_only=True,
     save_weights_only=True
     )

reduce_lr = ReduceLROnPlateau(    
    monitor='val_loss',
    factor=0.25,
    patience=5,
    min_lr=0.00001,
    verbose = 1
    )       

In [110]:
# Define a Sequential Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(
    512,
    return_sequences=True,
    recurrent_dropout=0.1,
    input_shape=(32,2)))
    )
model.add(LSTM(256, recurrent_dropout=0.1))
model.add(Dropout(0.3))
model.add(Dense(64, activation='elu'))
model.add(Dropout(0.3))
model.add(Dense(32, activation='elu'))
model.add(Dense(1, activation='linear'))#Final Layer



In [111]:
# Define the Optimizer and Compile
optimizer = tf.keras.optimizers.SGD(learning_rate = 0.002)
model.compile(loss='mse', optimizer=optimizer)

In [112]:
# Fit the data into the Sequantial Model
history = model.fit(Xtrain, Ytrain,
                    epochs=10,
                    batch_size=1,
                    verbose=1,
                    shuffle=False,
                    validation_data=(Xtest, Ytest),
                    callbacks=[reduce_lr, save_best])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [113]:
# Checking the model Structure 
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional_5 (Bidirectio  (1, 32, 1024)            2109440   
 nal)                                                            
                                                                 
 lstm_11 (LSTM)              (1, 256)                  1311744   
                                                                 
 dropout_10 (Dropout)        (1, 256)                  0         
                                                                 
 dense_15 (Dense)            (1, 64)                   16448     
                                                                 
 dropout_11 (Dropout)        (1, 64)                   0         
                                                                 
 dense_16 (Dense)            (1, 32)                   2080      
                                                      

In [114]:
# Load the best weights
model.load_weights("best_weights.h5")

In [115]:
Predictions = model.predict(Xtest)



In [116]:
Predictions = Target_Scaler.inverse_transform(Predictions)
Actual = Target_Scaler.inverse_transform(Ytest)

In [117]:
Predictions.shape

(348, 1)

In [118]:
Predictions = np.squeeze(Predictions, axis = 1)
Actual = np.squeeze(Actual, axis=1)

In [119]:
# Check the performance with Root Mean Squared Error - RMSE
rmse = np.sqrt(np.mean(Predictions - Ytest)**2)
rmse

244.57095418882324

In [120]:
# Check the Predictions vs Actual
fig = go.Figure()

fig.add_trace(go.Scatter(x = data.index[-test_length:], y = Actual, mode = 'lines', name = 'Actual'))
fig.add_trace(go.Scatter(x = data.index[-test_length:], y = Predictions, mode = 'lines', name = 'Predicted'))
fig.show()

In [121]:
# Apply in the whole dataset
# Concatenating Features
Total_features = np.concatenate((Xtrain, Xtest), axis = 0)

In [122]:
# Concatenating Targets
Total_Targets = np.concatenate((Ytrain, Ytest), axis = 0)

In [123]:
# Trainning the model
Predictions = model.predict(Total_features)



In [124]:
# Inverting the Scaller
Predictions = Target_Scaler.inverse_transform(Predictions)
Actual = Target_Scaler.inverse_transform(Total_Targets)

In [125]:
# Convert an array to a vector
Predictions = np.squeeze(Predictions, axis = 1)
Actual = np.squeeze(Actual, axis = 1)

In [126]:
 # Check the trend in Volume tradded
 fig = go.Figure()

 fig.add_trace(go.Scatter(x = data.index, y = Actual, mode = 'lines', name = 'Actual'))
 fig.add_trace(go.Scatter(x = data.index, y = Predictions, mode = 'lines', name = 'Predicted'))
 fig.show()

### Real Time Prediction

In [153]:
# Function to make the real time prediction
def PredictStockPrice(Model, DataFrame, PreviousDate, feature_length = 32):
  idx_location = DataFrame.index.get_loc(PreviousDate)
  Features = DataFrame.iloc[idx_location - feature_length : idx_location,:].values
  Features = np.expand_dims(Features, axis = 0)
  Features = Feature_Scaler.transform(Features)
  Prediction = Model.predict(Features)
  Prediction = Target_Scaler.inverse_transform(Prediction)
  return Prediction[0][0]

real_time_pred = PredictStockPrice(model, data, '2022-10-12')  



In [154]:
# Get the actual stock from Yahoo Finance API
actual_test = yf.download("TSLA" , start = "2022-10-13", end = "2022-10-14")

[*********************100%***********************]  1 of 1 completed


In [156]:
print('The Actual stock related 2022-10-13 is: {}'.format(actual_test['Close'][0]))
print('Our prediction related 2022-10-13 is: {}'.format(real_time_pred))

The Actual stock related 2022-10-13 is: 221.72000122070312
Our prediction related 2022-10-13 is: 230.13604736328125
