# Stock Market Prediction using Machine Learning (Google Stock Data)

## Overview
This Python script trains a stacked LSTM (long short-term memory) network on historical Google closing prices and uses it to predict each day's closing price from the previous 60 closing prices. The sections below walk through each step.

This script follows a structured approach:
- Importing necessary libraries
- Loading Google stock data 
- Data preprocessing and feature engineering
- Exploratory Data Analysis (EDA)
- Training an LSTM model
- Making predictions and evaluating performance

## Importing Libraries

The script begins by importing essential libraries:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Input
```

## Loading data

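```python
# Training data: daily Google prices (Date, Open, High, Low, Close, Volume).
# The path is specific to the author's machine; point it at your own copy of the file.
data = pd.read_csv("C:/Users/dell/OneDrive/Desktop/ML_Project/Stock_Market/datasetsandcodefilesstockmarketprediction/Google_train_data.csv")
```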

```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1258 non-null   object 
 1   Open    1258 non-null   float64
 2   High    1258 non-null   float64
 3   Low     1258 non-null   float64
 4   Close   1258 non-null   object 
 5   Volume  1258 non-null   object 
dtypes: float64(3), object(3)
memory usage: 59.1+ KB
```


```python
data.head()
```

| | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 0 | 1/3/2012 | 325.25 | 332.83 | 324.97 | 663.59 | 7380500 |
| 1 | 1/4/2012 | 331.27 | 333.87 | 329.08 | 666.45 | 5749400 |
| 2 | 1/5/2012 | 329.83 | 330.75 | 326.89 | 657.21 | 6590300 |
| 3 | 1/6/2012 | 328.34 | 328.77 | 323.68 | 648.24 | 5405900 |
| 4 | 1/9/2012 | 322.04 | 322.29 | 309.46 | 620.76 | 11688800 |


```python
data.tail()
```

| | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 1253 | 12/23/2016 | 790.9 | 792.74 | 787.28 | 789.91 | 623400 |
| 1254 | 12/27/2016 | 790.68 | 797.86 | 787.66 | 791.55 | 789100 |
| 1255 | 12/28/2016 | 793.7 | 794.23 | 783.2 | 785.05 | 1153800 |
| 1256 | 12/29/2016 | 783.33 | 785.93 | 778.92 | 782.79 | 744300 |
| 1257 | 12/30/2016 | 782.75 | 782.78 | 770.41 | 771.82 | 1770000 |


## Data Preprocessing
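- Convert the `Close` column from `object` (string) to `float`; values that cannot be parsed become `NaN`
- Drop the rows with missing values (here, the unparseable `Close` entries)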

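```python
# Strings that cannot be parsed as numbers become NaN...
data['Close'] = pd.to_numeric(data.Close, errors='coerce')
# ...and those rows are then dropped
data = data.dropna()
# Keep the Close column (position 4) as a 2-D array of shape (n_rows, 1), as MinMaxScaler expects
trainData = data.iloc[:, 4:5].values
```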

```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 1149 entries, 0 to 1257
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1149 non-null   object 
 1   Open    1149 non-null   float64
 2   High    1149 non-null   float64
 3   Low     1149 non-null   float64
 4   Close   1149 non-null   float64
 5   Volume  1149 non-null   object 
dtypes: float64(4), object(2)
memory usage: 62.8+ KB
```
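Comparing the two `data.info()` outputs, 109 rows (1258 - 1149) were dropped because their `Close` value could not be converted. One plausible cause, assumed here because the raw CSV is not shown, is a thousands separator: the early rows show `Close` at roughly twice `Open` (663.59 vs 325.25 on 1/3/2012), which suggests the column is not adjusted for Google's 2014 stock split, so some unadjusted closes may have exceeded 1,000 and been stored as strings such as `"1,202.69"`. `pd.to_numeric` cannot parse such a string, so `errors='coerce'` turns it into `NaN` and the row disappears, leaving gaps in the series. The sketch below uses a hypothetical three-row CSV to show the effect and the `thousands` parameter of `pd.read_csv`, which parses these strings instead.

```python
import io

import pandas as pd

# Hypothetical excerpt: the third Close value uses a thousands separator
raw = 'Date,Close\n1/3/2012,663.59\n1/4/2012,666.45\n3/3/2014,"1,202.69"\n'

# Approach used in the notebook: the column is read as strings, then coerced
coerced = pd.read_csv(io.StringIO(raw))
coerced["Close"] = pd.to_numeric(coerced.Close, errors="coerce")
print(coerced["Close"].isna().sum())  # 1 value became NaN and would be dropped

# Alternative: let read_csv strip the separator while parsing
parsed = pd.read_csv(io.StringIO(raw), thousands=",")
print(parsed["Close"].dtype)  # float64, no rows lost
```

If that assumption holds for the real file, loading it with `thousands=','` would keep all 1258 rows, and the shapes reported further down (1149 and 1089) would change accordingly.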


```python
data.head()
```

| | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 0 | 1/3/2012 | 325.25 | 332.83 | 324.97 | 663.59 | 7380500 |
| 1 | 1/4/2012 | 331.27 | 333.87 | 329.08 | 666.45 | 5749400 |
| 2 | 1/5/2012 | 329.83 | 330.75 | 326.89 | 657.21 | 6590300 |
| 3 | 1/6/2012 | 328.34 | 328.77 | 323.68 | 648.24 | 5405900 |
| 4 | 1/9/2012 | 322.04 | 322.29 | 309.46 | 620.76 | 11688800 |


```python
# Scale Close prices to [0, 1]; the fitted min/max are reused later for the test data
sc = MinMaxScaler(feature_range=(0, 1))
trainData = sc.fit_transform(trainData)
trainData.shape
```

```
(1149, 1)
```
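With `feature_range=(0, 1)`, `MinMaxScaler` maps each value x to (x - min) / (max - min), where min and max are learned from the training column by `fit_transform`; `inverse_transform` applies the reverse mapping, which is how the predictions are converted back to prices at the end. A NumPy-only sketch of the same arithmetic on the first five training closes from `data.head()` (the helpers `transform` and `inverse_transform` here are hypothetical stand-ins, not the scikit-learn methods):

```python
import numpy as np

# First five Close values from data.head(), as a column vector like trainData
prices = np.array([[663.59], [666.45], [657.21], [648.24], [620.76]])

# "fit": learn the per-column minimum and maximum from the training data only
col_min, col_max = prices.min(axis=0), prices.max(axis=0)

def transform(x):
    """Map x linearly so that col_min -> 0 and col_max -> 1."""
    return (x - col_min) / (col_max - col_min)

def inverse_transform(z):
    """Undo transform(): map scaled values back to the price scale."""
    return z * (col_max - col_min) + col_min

scaled = transform(prices)
restored = inverse_transform(scaled)
print(scaled.ravel().round(3))
```

Because the minimum and maximum come from the training data only, test prices outside the training range map below 0 or above 1; `MinMaxScaler` does not clip them by default.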

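```python
timestep = 60  # number of past closing prices used for each prediction
X_train = []
y_train = []

# Each sample is a window of 60 consecutive scaled closes; its target is the next close
for i in range(timestep, len(trainData)):  # len(trainData) == 1149 here
    X_train.append(trainData[i - timestep:i, 0])
    y_train.append(trainData[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)
```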
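The loop turns one series of n values into n - timestep overlapping samples: window i covers positions i - timestep through i - 1, and its target is position i. The sketch below applies the same logic to a toy series of 10 values with a timestep of 3 (the helper `make_windows` is hypothetical) and cross-checks it against NumPy's `sliding_window_view`. With the notebook's 1149 rows and a timestep of 60, the same rule gives 1149 - 60 = 1089 samples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, timestep):
    """Hypothetical helper: same windowing as the training loop."""
    X, y = [], []
    for i in range(timestep, len(series)):
        X.append(series[i - timestep:i])  # the previous `timestep` values
        y.append(series[i])               # the value to predict
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)  # toy stand-in for the scaled Close series
X, y = make_windows(series, timestep=3)
print(X.shape, y.shape)  # (7, 3) (7,)
print(X[0], y[0])        # [0. 1. 2.] 3.0

# Vectorised equivalent: every length-3 window except the last (it has no target)
X_vec = sliding_window_view(series, 3)[:-1]
y_vec = series[3:]
```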

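```python
# LSTM layers expect 3-D input of shape (samples, timesteps, features);
# this adds the features axis (one feature: Close). 1149 rows - 60 timesteps = 1089 samples.
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train.shape
```

```
(1089, 60, 1)
```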

## Define LSTM model

```python
model = Sequential()

# Input: windows of 60 timesteps with 1 feature each
model.add(Input(shape=(X_train.shape[1], 1)))

# Four stacked LSTM layers of 100 units, each followed by 20% dropout.
# return_sequences=True passes the whole sequence on to the next LSTM layer;
# the last LSTM layer returns only its final output.
model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=100, return_sequences=False))
model.add(Dropout(0.2))

# Single output: the next scaled closing price
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
```
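As a sanity check on the architecture, each Keras `LSTM` layer has four gates, and each gate has an input kernel, a recurrent kernel and a bias, giving 4 × (units × (input_dim + units) + units) trainable parameters; a `Dense` layer has units × input_dim + units, and `Dropout` has none. The arithmetic below (helper names are hypothetical) should agree with the total that `model.summary()` reports for this model.

```python
def lstm_params(units, input_dim):
    """4 gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weight matrix plus bias vector."""
    return units * input_dim + units

# First LSTM sees 1 feature; the next three see the 100 outputs of the layer before;
# the Dense layer maps 100 values to 1. Dropout layers add no parameters.
layers = [lstm_params(100, 1)] + [lstm_params(100, 100)] * 3 + [dense_params(1, 100)]
total = sum(layers)
print(layers, total)  # [40800, 80400, 80400, 80400, 101] 282101
```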

```python
hist = model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=2)
```

```
Epoch 1/20
35/35 - 8s - 220ms/step - loss: 0.0373
Epoch 2/20
35/35 - 3s - 88ms/step - loss: 0.0115
Epoch 3/20
35/35 - 3s - 91ms/step - loss: 0.0109
Epoch 4/20
35/35 - 3s - 88ms/step - loss: 0.0094
Epoch 5/20
35/35 - 3s - 95ms/step - loss: 0.0083
Epoch 6/20
35/35 - 3s - 91ms/step - loss: 0.0072
Epoch 7/20
35/35 - 3s - 96ms/step - loss: 0.0068
Epoch 8/20
```

```python
plt.plot(hist.history['loss'])
plt.title('Training model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper right')
plt.show()
```

## Loading and preparing test data

```python
# Test data, read and cleaned the same way as the training data
testData = pd.read_csv("C:/Users/dell/OneDrive/Desktop/ML_Project/Stock_Market/datasetsandcodefilesstockmarketprediction/Google_test_data.csv")
testData['Close'] = pd.to_numeric(testData.Close, errors='coerce')
testData = testData.dropna()
testData = testData.iloc[:, 4:5]  # keep only the Close column

timestep = 60
# Actual prices to compare against; the first 60 days only serve as input history
y_test = testData.iloc[timestep:, 0:].values

# Scale with the scaler fitted on the training data (transform, not fit_transform)
inputClosing = testData.iloc[:, 0:].values
inputClosing_scaled = sc.transform(inputClosing)

# Build 60-day windows exactly as for the training set
X_test = []
for i in range(timestep, len(testData)):
    X_test.append(inputClosing_scaled[i - timestep:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
X_test.shape
```
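Because the windows are built from the test file alone, the first 60 test days only serve as history: `X_test` and `y_test` both have `len(testData) - 60` rows and line up index by index. A common alternative, not used in this notebook and only valid if the test period directly follows the training period, is to prepend the last `timestep` scaled training values so that every test day gets a prediction. The toy sketch below uses hypothetical values and a timestep of 3 to show both alignments.

```python
import numpy as np

timestep = 3
train_scaled = np.array([0.10, 0.20, 0.30, 0.40, 0.50])       # hypothetical scaled training closes
test_scaled = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80])  # hypothetical scaled test closes

# Notebook's approach: windows come from the test series alone
X_only_test = np.array([test_scaled[i - timestep:i] for i in range(timestep, len(test_scaled))])
y_only_test = test_scaled[timestep:]  # window k predicts test day k + timestep

# Alternative: prepend training history so window k predicts test day k
history = np.concatenate([train_scaled[-timestep:], test_scaled])
X_full = np.array([history[i - timestep:i] for i in range(timestep, len(history))])

print(len(X_only_test), len(y_only_test), len(X_full))
```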

## Making Predictions & Evaluating Performance

```python
y_pred = model.predict(X_test)
y_pred
```

```python
# Map the scaled predictions back to the original price scale using the training min/max
predicted_price = sc.inverse_transform(y_pred)
```
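The notebook compares predictions with actual prices visually. A numeric score can be sketched with standard regression metrics: root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The helper `regression_metrics` below is hypothetical and runs on made-up values; with the notebook's arrays it would be called as `regression_metrics(y_test, predicted_price)`. Both of those arrays have shape (n, 1), so the helper flattens its inputs first.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Hypothetical helper: RMSE, MAE and MAPE for two equally long price arrays."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    err = predicted - actual
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "mape_pct": float(np.mean(np.abs(err / actual)) * 100),
    }

# Made-up values standing in for y_test and predicted_price, shape (n, 1)
actual = np.array([[780.0], [790.0], [800.0]])
predicted = np.array([[782.0], [786.0], [801.0]])
metrics = regression_metrics(actual, predicted)
print(metrics)
```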

## Plotting predicted vs. actual stock prices

```python
plt.plot(y_test, color='red', label='Actual Stock Price')
plt.plot(predicted_price, color='green', label='Predicted Stock Price')
plt.title('Google stock price prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
```
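Day-ahead price curves tend to track the actual series closely even for very simple forecasters, so a persistence baseline (predict each day's close as the previous day's close) is a useful reference point when judging this plot. The sketch below uses hypothetical closes and a timestep of 2; in the notebook, the matching baseline for `y_test` would be `testData.values[timestep - 1:-1]`, which has the same length and alignment.

```python
import numpy as np

closes = np.array([780.0, 790.0, 800.0, 795.0, 805.0])  # hypothetical test closes
timestep = 2

actual = closes[timestep:]             # targets, like y_test
persistence = closes[timestep - 1:-1]  # previous day's close for each target

rmse = float(np.sqrt(np.mean((persistence - actual) ** 2)))
print(persistence, rmse)
```

A model whose error is not clearly below this baseline's is mostly echoing the last observed price.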