# Research on Stock Market Prediction

This notebook contains the code for a comparative analysis of machine learning and deep learning algorithms for stock price prediction.

Root Mean Square Error (RMSE) is used as the evaluation metric in this research.
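
As a minimal sketch of the metric, RMSE is the square root of the mean squared difference between actual and predicted values. The example below uses plain Python and made-up numbers, not values from the dataset:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only, not taken from TSLA.csv
example_actual = [10.0, 12.0, 14.0]
example_predicted = [11.0, 12.0, 12.0]
example_rmse = rmse(example_actual, example_predicted)
print(example_rmse)
```

Lower values mean predictions are closer to the actual prices, and the result is in the same units as the values being compared.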

The dataset I used in this research contains the stock price and volume information that I downloaded from Yahoo Finance (https://finance.yahoo.com).

URL : https://finance.yahoo.com/quote/TSLA/history?period1=1277769600&period2=1677542400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true

The dataset contains daily TSLA data from Jun 29, 2010 to Mar 01, 2023.
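
The `period1` and `period2` query parameters in the URL above look like Unix timestamps in seconds. That is an assumption based on their magnitude, not on Yahoo Finance documentation. Under that assumption, a short sketch for decoding them as UTC dates:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

url = (
    "https://finance.yahoo.com/quote/TSLA/history"
    "?period1=1277769600&period2=1677542400&interval=1d"
    "&filter=history&frequency=1d&includeAdjustedClose=true"
)

# parse_qs returns a dict that maps each parameter name to a list of values
query = parse_qs(urlparse(url).query)

def to_utc_date(epoch_seconds):
    """Interpret a string of epoch seconds as a calendar date in UTC."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc).date()

start_date = to_utc_date(query["period1"][0])
end_date = to_utc_date(query["period2"][0])
print(start_date, end_date)
```

The decoded end bound can differ by a day from the date shown in the browser, depending on the time zone and on how the site treats the end of the range.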

The performance of the following algorithms is analyzed:
1. Linear Regression
2. Decision Tree Regressor
3. Random Forest Regressor
4. K Neighbors Regressor
5. Support Vector Regressor
6. AdaBoost Regressor
7. Bagging Regressor
8. Gradient Boosting Regressor
9. Extreme Gradient Boosting (XGBoost) Regressor
10. Light Gradient Boosting Machine (LGBM) Regressor
11. Multi Layer Perceptron (MLP) Regressor
12. Long Short Term Memory (LSTM) Network
13. Convolutional Neural Network (CNN)

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler

from sklearn.metrics import mean_squared_error

In [2]:
# Load the data into a pandas dataframe
df = pd.read_csv('TSLA.csv')

In [3]:
# Count the number of null values in each column
null_counts = df.isnull().sum()

In [4]:
# Print the results
print(null_counts)

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64


In [5]:
# Split the dataset chronologically into train (first 80%) and test (last 20%) sets
# .copy() makes each split an independent DataFrame, which avoids pandas' SettingWithCopyWarning when scaling
train_size = int(len(df) * 0.8)
train_df, test_df = df[:train_size].copy(), df[train_size:].copy()

In [6]:
# Preprocess the data: fit the scaler on the training set only, then apply it to the test set
scaled_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
scaler = MinMaxScaler()
train_df[scaled_cols] = scaler.fit_transform(train_df[scaled_cols])
test_df[scaled_cols] = scaler.transform(test_df[scaled_cols])
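
To illustrate the scaling step: `MinMaxScaler` learns each column's minimum and maximum from the training rows and reuses them for the test rows. The sketch below reproduces that arithmetic in plain Python with made-up prices. The helper names `fit_min_max` and `transform` are hypothetical and are not part of scikit-learn.

```python
def fit_min_max(values):
    """Learn the minimum and maximum from the training values only."""
    return min(values), max(values)

def transform(values, lo, hi):
    """Apply min-max scaling with a previously learned range."""
    return [(v - lo) / (hi - lo) for v in values]

# Made-up prices: the test period trades above the training maximum
train_close = [5.0, 10.0, 25.0]
test_close = [30.0, 45.0]

lo, hi = fit_min_max(train_close)
train_scaled = transform(train_close, lo, hi)
test_scaled = transform(test_close, lo, hi)
print(train_scaled, test_scaled)
```

Test values above the training maximum map to values greater than 1, so scaled test prices are not guaranteed to stay within [0, 1].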


## Perform grid search cross-validation for each model
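
Each section below follows the same pattern: `GridSearchCV` tries every combination in the parameter grid, scores each one with 5-fold cross-validation, and keeps the combination with the highest score. Because scikit-learn maximises scores, RMSE is requested as `neg_root_mean_squared_error`. The sketch below mimics that enumeration in plain Python. `toy_cv_score` is a hypothetical stand-in for the cross-validated score and is not something scikit-learn provides.

```python
from itertools import product

# Small illustrative grid (a subset of the decision tree grid)
param_grid = {
    "splitter": ["best", "random"],
    "max_depth": [None, 5, 10, 20],
}
cv_folds = 5

# Every combination of values is one candidate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * cv_folds  # one cross-validation fit per candidate per fold

def toy_cv_score(params):
    """Hypothetical stand-in for the mean cross-validated negative RMSE."""
    depth = params["max_depth"] if params["max_depth"] is not None else 30
    penalty = 0.0 if params["splitter"] == "best" else 1.0
    toy_rmse = (abs(depth - 10) + penalty) / 100
    return -toy_rmse  # negated so that higher is better

best_params = max(candidates, key=toy_cv_score)
print(f"{len(candidates)} candidates, {total_fits} fits, best: {best_params}")
```

The number of fits grows multiplicatively with every parameter added to a grid, which is why the larger grids in this notebook take much longer to search.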

### 1. Linear Regression

In [7]:
# Importing LinearRegression class from linear_model module
from sklearn.linear_model import LinearRegression

# Instantiating Linear Regressor
lr_model = LinearRegression()

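# Creating the hyperparameter grid
# Note: 'normalize' was deprecated in scikit-learn 1.0 and removed in 1.2, so this grid requires an older version
lr_params = {
    'normalize': [True, False]
}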

# Performing grid search using cross-validation
lr_grid = GridSearchCV(lr_model, lr_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
lr_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {lr_model.__class__.__name__}: {lr_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
lr_y_pred = lr_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
lr_rmse = np.sqrt(mean_squared_error(test_df['Close'], lr_y_pred))

# Print the RMSE score
print(f"RMSE for {lr_model.__class__.__name__}: {lr_rmse}")

Best parameters for LinearRegression: {'normalize': True}

RMSE for LinearRegression: 0.033065415695596054


### 2. Decision Tree Regressor

In [8]:
# Importing DecisionTreeRegressor class from tree module
from sklearn.tree import DecisionTreeRegressor

# Instantiating Decision Tree Regressor
dt_model = DecisionTreeRegressor()

# Creating the hyperparameter grid
dt_params = {
    'splitter': ['best', 'random'],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # Note: 'auto' (all features, same as None for regressors) was removed in scikit-learn 1.3
    'max_features': ['auto', 'sqrt', 'log2', None]
}

# Performing grid search using cross-validation
dt_grid = GridSearchCV(dt_model, dt_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
dt_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {dt_model.__class__.__name__}: {dt_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
dt_y_pred = dt_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
dt_rmse = np.sqrt(mean_squared_error(test_df['Close'], dt_y_pred))

# Print the RMSE score
print(f"RMSE for {dt_model.__class__.__name__}: {dt_rmse}")

Best parameters for DecisionTreeRegressor: {'max_depth': 10, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 5, 'splitter': 'best'}

RMSE for DecisionTreeRegressor: 1.4088338987803934


### 3. Random Forest Regressor

In [9]:
# Importing RandomForestRegressor class from ensemble module
from sklearn.ensemble import RandomForestRegressor

# Instantiating Random Forest Regressor
rf_model = RandomForestRegressor()

# Creating the hyperparameter grid
rf_params = {
    'n_estimators': [100, 300, 500],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}

# Performing grid search using cross-validation
rf_grid = GridSearchCV(rf_model, rf_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
rf_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {rf_model.__class__.__name__}: {rf_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
rf_y_pred = rf_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
rf_rmse = np.sqrt(mean_squared_error(test_df['Close'], rf_y_pred))

# Print the RMSE score
print(f"RMSE for {rf_model.__class__.__name__}: {rf_rmse}")

Best parameters for RandomForestRegressor: {'max_depth': 15, 'max_features': 'log2', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}

RMSE for RandomForestRegressor: 1.3749296840169858


### 4. K Neighbors Regressor

In [10]:
# Importing KNeighborsRegressor class from neighbors module
from sklearn.neighbors import KNeighborsRegressor

# Instantiating K Neighbors Regressor
knn_model = KNeighborsRegressor()

# Creating the hyperparameter grid
knn_params = {
    'n_neighbors': [3, 5, 7, 9, 11],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'leaf_size': [10, 20, 30, 40, 50],
    'p': [1, 2]
}

# Performing grid search using cross-validation
knn_grid = GridSearchCV(knn_model, knn_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
knn_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {knn_model.__class__.__name__}: {knn_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
knn_y_pred = knn_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
knn_rmse = np.sqrt(mean_squared_error(test_df['Close'], knn_y_pred))

# Print the RMSE score
print(f"RMSE for {knn_model.__class__.__name__}: {knn_rmse}")

Best parameters for KNeighborsRegressor: {'algorithm': 'brute', 'leaf_size': 10, 'n_neighbors': 3, 'p': 2, 'weights': 'distance'}

RMSE for KNeighborsRegressor: 1.3626707270422802
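
For intuition about two of the tuned parameters: `p` selects the Minkowski distance (1 is Manhattan, 2 is Euclidean), and `weights='distance'` lets closer neighbours count more. The sketch below is a simplified plain-Python version of distance-weighted k-nearest-neighbour regression on made-up data. It is not scikit-learn's implementation, and the function names are hypothetical.

```python
def minkowski(a, b, p):
    """Minkowski distance between two points: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Predict by averaging the k nearest targets, weighted by inverse distance."""
    neighbours = sorted((minkowski(x, query, p), y) for x, y in zip(train_X, train_y))[:k]
    for distance, target in neighbours:
        if distance == 0:
            return target  # simplification: an exact match decides the prediction
    weights = [1 / distance for distance, _ in neighbours]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbours))
    return weighted_sum / sum(weights)

# Made-up one-dimensional example
train_X = [(0.0,), (2.0,), (10.0,)]
train_y = [1.0, 3.0, 100.0]
prediction = knn_predict(train_X, train_y, query=(1.0,), k=2)
print(prediction)
```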


### 5. Support Vector Regressor

In [11]:
# Importing SVR class from svm module
from sklearn.svm import SVR

# Instantiating Support Vector Regressor
svr_model = SVR()

# Creating the hyperparameter grid
svr_params = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto']
}

# Performing grid search using cross-validation
svr_grid = GridSearchCV(svr_model, svr_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
svr_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {svr_model.__class__.__name__}: {svr_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
svr_y_pred = svr_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
svr_rmse = np.sqrt(mean_squared_error(test_df['Close'], svr_y_pred))

# Print the RMSE score
print(f"RMSE for {svr_model.__class__.__name__}: {svr_rmse}")

Best parameters for SVR: {'C': 10, 'gamma': 'auto', 'kernel': 'poly'}

RMSE for SVR: 10.897913972812146


### 6. AdaBoost Regressor

In [12]:
# Importing AdaBoostRegressor class from ensemble module
from sklearn.ensemble import AdaBoostRegressor

# Instantiating Ada Boost Regressor
ada_model = AdaBoostRegressor()

# Creating the hyperparameter grid
ada_params = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'loss': ['linear', 'square', 'exponential']
}

# Performing grid search using cross-validation
ada_grid = GridSearchCV(ada_model, ada_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
ada_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {ada_model.__class__.__name__}: {ada_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
ada_y_pred = ada_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
ada_rmse = np.sqrt(mean_squared_error(test_df['Close'], ada_y_pred))

# Print the RMSE score
print(f"RMSE for {ada_model.__class__.__name__}: {ada_rmse}")

Best parameters for AdaBoostRegressor: {'learning_rate': 1, 'loss': 'square', 'n_estimators': 100}

RMSE for AdaBoostRegressor: 1.3732635180712136


### 7. Bagging Regressor

In [13]:
# Importing BaggingRegressor class from ensemble module
from sklearn.ensemble import BaggingRegressor

# Instantiating Bagging Regressor
bag_model = BaggingRegressor()

# Creating the hyperparameter grid
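bag_params = {
    'n_estimators': [10, 50, 100],
    # Floats are fractions of the data; 1.0 means all of it (an integer 1 would mean a single sample or feature)
    'max_samples': [0.5, 0.8, 1.0],
    'max_features': [0.5, 0.8, 1.0],
    'bootstrap': [True, False],
    'bootstrap_features': [True, False]
}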

# Performing grid search using cross-validation
bag_grid = GridSearchCV(bag_model, bag_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
bag_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {bag_model.__class__.__name__}: {bag_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
bag_y_pred = bag_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
bag_rmse = np.sqrt(mean_squared_error(test_df['Close'], bag_y_pred))

# Print the RMSE score
print(f"RMSE for {bag_model.__class__.__name__}: {bag_rmse}")

Best parameters for BaggingRegressor: {'bootstrap': True, 'bootstrap_features': False, 'max_features': 0.8, 'max_samples': 0.5, 'n_estimators': 10}

RMSE for BaggingRegressor: 1.361275825707145
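
In `BaggingRegressor`, `max_samples` and `max_features` are read differently depending on their type: an integer is an absolute count, while a float is a fraction of the available samples or features. This matters for a grid value written as `1` versus `1.0`. The sketch below illustrates the rule with a hypothetical helper; the truncation of fractional counts is an assumption about the rounding, not a quote of the library code.

```python
def resolve_count(value, n_total):
    """Hypothetical helper: turn an int-or-float setting into an absolute count."""
    if isinstance(value, int):
        return value  # an int is taken as an absolute number
    return max(1, int(value * n_total))  # a float is taken as a fraction (truncation assumed)

n_train_rows = 1000  # made-up training set size

half = resolve_count(0.5, n_train_rows)
everything = resolve_count(1.0, n_train_rows)
single = resolve_count(1, n_train_rows)
print(half, everything, single)
```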


### 8. Gradient Boosting Regressor

In [14]:
# Importing GradientBoostingRegressor class from ensemble module
from sklearn.ensemble import GradientBoostingRegressor

# Instantiating Gradient Boosting Regressor
gbr_model = GradientBoostingRegressor()

# Creating the hyperparameter grid
gbr_params = {
    'learning_rate': [0.01, 0.1, 1],
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'subsample': [0.5, 0.8, 1],
    'max_features': ['auto', 'sqrt', 'log2']  # Note: 'auto' (all features, same as None) was removed in scikit-learn 1.3
}

# Performing grid search using cross-validation
gbr_grid = GridSearchCV(gbr_model, gbr_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
gbr_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {gbr_model.__class__.__name__}: {gbr_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
gbr_y_pred = gbr_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
gbr_rmse = np.sqrt(mean_squared_error(test_df['Close'], gbr_y_pred))

# Print the RMSE score
print(f"RMSE for {gbr_model.__class__.__name__}: {gbr_rmse}")

Best parameters for GradientBoostingRegressor: {'learning_rate': 0.1, 'max_depth': 7, 'max_features': 'auto', 'min_samples_leaf': 4, 'min_samples_split': 10, 'n_estimators': 100, 'subsample': 0.8}

RMSE for GradientBoostingRegressor: 1.3608467121287828


### 9. Extreme Gradient Boosting (XGBoost) Regressor

In [15]:
# Importing XGBRegressor class from xgboost module
from xgboost import XGBRegressor

# Instantiating XG Boost Regressor
xgb_model = XGBRegressor(objective='reg:squarederror')

# Creating the hyperparameter grid
xgb_params = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 1],
    'subsample': [0.5, 0.8, 1],
    'colsample_bytree': [0.5, 0.8, 1],
    'gamma': [0, 1, 5]
}

# Performing grid search using cross-validation
xgb_grid = GridSearchCV(xgb_model, xgb_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
xgb_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {xgb_model.__class__.__name__}: {xgb_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
xgb_y_pred = xgb_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
xgb_rmse = np.sqrt(mean_squared_error(test_df['Close'], xgb_y_pred))

# Print the RMSE score
print(f"RMSE for {xgb_model.__class__.__name__}: {xgb_rmse}")

Best parameters for XGBRegressor: {'colsample_bytree': 0.8, 'gamma': 0, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200, 'subsample': 0.8}

RMSE for XGBRegressor: 1.3803745310066542


### 10. Light Gradient Boosting Machine (LGBM) Regressor

In [17]:
# Importing LGBMRegressor class from lightgbm module
from lightgbm import LGBMRegressor

# Instantiating LGBM Regressor
lgb_model = LGBMRegressor()

# Creating the hyperparameter grid
lgb_params = {
    'boosting_type': ['gbdt', 'dart', 'goss'],
    'num_leaves': [10, 20, 30],
    'learning_rate': [0.01, 0.1, 0.5],
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'min_child_samples': [10, 20, 30],
    'subsample': [0.5, 0.8, 1],
    'colsample_bytree': [0.5, 0.8, 1],
    'reg_alpha': [0, 0.1, 1],
    'reg_lambda': [0, 0.1, 1]
}

# Performing grid search using cross-validation
lgb_grid = GridSearchCV(lgb_model, lgb_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
lgb_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {lgb_model.__class__.__name__}: {lgb_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
lgb_y_pred = lgb_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
lgb_rmse = np.sqrt(mean_squared_error(test_df['Close'], lgb_y_pred))

# Print the RMSE score
print(f"RMSE for {lgb_model.__class__.__name__}: {lgb_rmse}")


Best parameters for LGBMRegressor: {'boosting_type': 'goss', 'colsample_bytree': 0.5, 'learning_rate': 0.5, 'max_depth': 7, 'min_child_samples': 10, 'n_estimators': 200, 'num_leaves': 10, 'reg_alpha': 0, 'reg_lambda': 0, 'subsample': 0.5}

RMSE for LGBMRegressor: 1.3971426365006197


## Neural Networks

### 11. Multi Layer Perceptron (MLP) Regressor

In [18]:
# Importing MLPRegressor class from neural_network module
from sklearn.neural_network import MLPRegressor

# Instantiating MLP Regressor
mlp_model = MLPRegressor(max_iter=1000)

# Creating the hyperparameter grid
mlp_params = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 100)],
    'activation': ['logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'adam'],
    'alpha': [0.0001, 0.001, 0.01]
}

# Performing grid search using cross-validation
mlp_grid = GridSearchCV(mlp_model, mlp_params, scoring='neg_root_mean_squared_error', cv=5, n_jobs=-1)

# Fitting the GridSearchCV object to the training data
mlp_grid.fit(train_df[['Open', 'High', 'Low', 'Volume']], train_df['Close'])

# Print the best hyperparameters
print(f"Best parameters for {mlp_model.__class__.__name__}: {mlp_grid.best_params_}\n")

# Predicting the target variable values for the test dataset
mlp_y_pred = mlp_grid.predict(test_df[['Open', 'High', 'Low', 'Volume']])

# Calculating the RMSE score between predicted and actual values
mlp_rmse = np.sqrt(mean_squared_error(test_df['Close'], mlp_y_pred))

# Print the RMSE score
print(f"RMSE for {mlp_model.__class__.__name__}: {mlp_rmse}")


Best parameters for MLPRegressor: {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (100,), 'solver': 'adam'}

RMSE for MLPRegressor: 0.4668113910359436


### 12. Long Short Term Memory (LSTM) Network

LSTM is a type of Recurrent Neural Network (RNN).

In [19]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam
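# Note: this wrapper module is deprecated and has been removed from newer TensorFlow releases; the SciKeras package offers a similar KerasRegressor
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor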
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

In [20]:
# Load the dataset
df = pd.read_csv('TSLA.csv')

In [21]:
# Extract the relevant features and target variable
X = df[['Open', 'High', 'Low', 'Volume']].values
y = df['Close'].values

In [22]:
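# Scale the data, using separate scalers for the features and the target
x_scaler = MinMaxScaler()
X_scaled = x_scaler.fit_transform(X)
# `scaler` is kept as the target scaler because it is used later to invert the predictions
scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(y.reshape(-1, 1))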


In [23]:
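# Split the data into training and testing sets
# Note: train_test_split shuffles rows by default, so this is a random split rather than a chronological one
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2, random_state=42)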

In [24]:
# Define the hyperparameters to tune
epochs = [50, 100]
batch_sizes = [16, 32]
num_neurons = [64, 128]

In [25]:
# Define a function to build the LSTM network
def build_model(num_neurons):
    model = Sequential()
    model.add(LSTM(num_neurons, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(num_neurons, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(num_neurons))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
    return model


In [26]:
# Reshape the input data
timesteps = 1
X_train = X_train.reshape((X_train.shape[0], timesteps, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], timesteps, X_test.shape[1]))
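
The reshape above turns each row of four features into a sequence of length one, because Keras LSTM layers expect input of shape (samples, timesteps, features). The sketch below shows the same restructuring with nested lists instead of NumPy arrays. The helper names and the numbers are made up for illustration.

```python
def add_timestep_axis(rows):
    """Wrap each feature row in a one-element list: (samples, features) -> (samples, 1, features)."""
    return [[list(row)] for row in rows]

def nested_shape(nested):
    """Return the shape of a regular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

# Three made-up samples with four scaled features each (Open, High, Low, Volume)
rows = [
    [0.10, 0.12, 0.09, 0.50],
    [0.11, 0.13, 0.10, 0.40],
    [0.12, 0.15, 0.11, 0.70],
]
sequences = add_timestep_axis(rows)
print(nested_shape(rows), nested_shape(sequences))
```

With `timesteps = 1`, each sample holds a single day's features, so the recurrent layers do not see earlier days within a sample.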


In [27]:
# Use GridSearchCV to perform a grid search over the hyperparameters
model = KerasRegressor(build_fn=build_model, verbose=0)

param_grid = dict(num_neurons=num_neurons, batch_size=batch_sizes, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X_train, y_train)

# Print the best hyperparameters
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))



Best: -0.000094 using {'batch_size': 16, 'epochs': 100, 'num_neurons': 64}


In [28]:
# Train the final LSTM model with the best hyperparameters
best_params = grid_result.best_params_
best_model = build_model(best_params['num_neurons'])
callbacks = [EarlyStopping(patience=3), ModelCheckpoint('best_model.h5', save_best_only=True)]
history = best_model.fit(X_train, y_train, epochs=best_params['epochs'], batch_size=best_params['batch_size'],
                         validation_data=(X_test, y_test), callbacks=callbacks)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
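
The log stops after 6 of the 100 epochs, which is what `EarlyStopping(patience=3)` would do if the validation loss stopped improving. The sketch below is a simplified version of patience-based stopping with made-up loss values. The real callback has more options, such as `min_delta` and `restore_best_weights`, and the function name here is hypothetical.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # patience never ran out

# Made-up validation losses: the best value is reached at epoch 3
made_up_losses = [0.5, 0.3, 0.2, 0.25, 0.22, 0.21, 0.1]
stopped_at = stopping_epoch(made_up_losses, patience=3)
print(stopped_at)
```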


In [29]:
# Load the best model
best_model.load_weights('best_model.h5')

In [30]:
# Make predictions on the testing set
y_pred = best_model.predict(X_test)



In [31]:
# Rescale the predictions and actual values
y_pred_rescaled = scaler.inverse_transform(y_pred)
y_test_rescaled = scaler.inverse_transform(y_test)

In [32]:
# Calculate the root mean squared error
rmse = np.sqrt(mean_squared_error(y_test_rescaled, y_pred_rescaled))
print('RMSE:', rmse)

RMSE: 4.244816510145789
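
Because the predictions are rescaled before the error is computed, this RMSE is in the original price units, whereas the RMSE values for models 1 to 11 are computed on min-max-scaled prices. For a min-max scaler, the two are related by the range learned at fit time. The sketch below checks that relationship with made-up numbers:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up price range and values
price_min, price_max = 20.0, 120.0
actual = [30.0, 70.0, 110.0]
predicted = [35.0, 60.0, 110.0]

def scale(values):
    """Min-max scale with the fixed range above."""
    return [(v - price_min) / (price_max - price_min) for v in values]

rmse_scaled = rmse(scale(actual), scale(predicted))
rmse_original = rmse(actual, predicted)

# Multiplying the scaled RMSE by the range recovers the RMSE in price units
rmse_recovered = rmse_scaled * (price_max - price_min)
print(rmse_scaled, rmse_original, rmse_recovered)
```

RMSE values are therefore only directly comparable when they are computed on the same scale.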


### 13. Convolutional Neural Network (CNN)

In [33]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv1D, MaxPooling1D
from keras.optimizers import Adam
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

In [34]:
# Load the data
df = pd.read_csv('TSLA.csv')

In [35]:
# Set the target variable and features
target_var = 'Close'
feature_cols = ['Open', 'High', 'Low', 'Volume']

In [36]:
# Split the data into training and testing sets
train_size = int(len(df) * 0.8)
train_data, test_data = df.iloc[0:train_size], df.iloc[train_size:len(df)]

In [37]:
# Scale the data using MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_data = scaler.fit_transform(train_data[feature_cols])
scaled_test_data = scaler.transform(test_data[feature_cols])

In [38]:
# Define the CNN model
def create_model(learning_rate=0.001, filters=32, kernel_size=3, pool_size=2, dropout_rate=0.25):
    model = Sequential()
    model.add(Conv1D(filters=filters, kernel_size=kernel_size, activation='relu', input_shape=(len(feature_cols), 1)))
    model.add(MaxPooling1D(pool_size=pool_size))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='linear'))

    optimizer = Adam(learning_rate=learning_rate)  # `lr` is a deprecated alias of `learning_rate`
    model.compile(loss='mse', optimizer=optimizer)

    return model

In [39]:
# Wrap the Keras model in a scikit-learn regressor
model = KerasRegressor(build_fn=create_model, verbose=0)


In [40]:
# Define the hyperparameters to tune
params = {'learning_rate': [0.001, 0.01],
          'filters': [32, 64],
          'kernel_size': [3, 5],
          'pool_size': [2, 4],
          'dropout_rate': [0.25, 0.5]}

In [41]:
# Use GridSearchCV to find the best hyperparameters
grid = GridSearchCV(estimator=model, param_grid=params, scoring='neg_mean_squared_error', n_jobs=-1)
grid_result = grid.fit(np.expand_dims(scaled_train_data, axis=2), train_data[target_var])

120 fits failed out of a total of 160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
2 fits failed with the following error:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\94776\AppData\Roaming\Python\Python39\site-packages\keras\wrappers\scikit_learn.py", line 164, in fit
    self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
  File "C:\Users\94776\AppData\Local\Temp\ipykernel_25000\754990038.py", line 5, in create_model
  File "C:\Users\94776\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\trackable\base.py", line 205, in _method_
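
The failure count reported above is consistent with a simple shape calculation. It assumes the Keras defaults of `padding='valid'` and stride 1 for `Conv1D`, and `padding='valid'` for `MaxPooling1D`. The input has only 4 steps (one per feature column), so a kernel of size 5 does not fit, and a kernel of size 3 leaves 2 steps, which is too short for a pool of size 4. This is an interpretation of the log, not something the truncated traceback states directly. A sketch that counts the affected fits:

```python
from itertools import product

input_length = 4  # one step per feature column
cv_folds = 5      # GridSearchCV default when cv is not given

grid = {
    "learning_rate": [0.001, 0.01],
    "filters": [32, 64],
    "kernel_size": [3, 5],
    "pool_size": [2, 4],
    "dropout_rate": [0.25, 0.5],
}

def is_buildable(kernel_size, pool_size):
    """Check that the convolution and pooling windows fit (valid padding, stride 1 assumed)."""
    conv_length = input_length - kernel_size + 1
    return conv_length >= 1 and pool_size <= conv_length

combos = [dict(zip(grid, values)) for values in product(*grid.values())]
failing = [c for c in combos if not is_buildable(c["kernel_size"], c["pool_size"])]

total_fits = len(combos) * cv_folds
failed_fits = len(failing) * cv_folds
print(f"{failed_fits} of {total_fits} fits cannot build the model")
```

Under these assumptions, only the combinations with `kernel_size=3` and `pool_size=2` can be built, so the search effectively compares 8 of the 32 candidates.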

In [42]:
# Print the best hyperparameters
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))


Best: -8.313666 using {'dropout_rate': 0.25, 'filters': 64, 'kernel_size': 3, 'learning_rate': 0.01, 'pool_size': 2}


In [43]:
# Predict on the test set using the best model
best_model = grid.best_estimator_.model
y_pred = best_model.predict(np.expand_dims(scaled_test_data, axis=2))
y_test = test_data[target_var]



In [44]:
# Calculate the RMSE score
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('RMSE:', rmse)

RMSE: 11.539836541207455
