# LSTM Stock Predictor Using Closing Prices
In this notebook, you will build and train a custom LSTM RNN that uses a 10-day window of Carnival stock closing prices to predict the 11th-day closing price.

We are:

1. Preparing the data for training and testing
2. Building and training a custom LSTM RNN
3. Evaluating the performance of the model

# Data Preparation
In this section, you will need to prepare the training and testing data for the model. The model will use a rolling 10-day window to predict the 11th-day closing price.

You will need to:

1. Use the window_data function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing.
3. Apply the MinMaxScaler to the X and y values.
4. Reshape the X_train and X_test data for the model.

Note: The required input format for the LSTM is: reshape((X_train.shape[0], X_train.shape[1], 1))

In [3]:
import numpy as np
import pandas as pd
import hvplot.pandas
import yfinance as yf

In [4]:
# Setting the random seeds for reproducibility
# Note: It is good practice to comment this out and run multiple experiments to evaluate your model
# NumPy and TensorFlow use separate random generators, so each one is seeded; the values 1 and 2 are arbitrary
from numpy.random import seed
seed(1)
from tensorflow import random
random.set_seed(2)
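
NumPy and TensorFlow keep separate random generators, which is why the cell above seeds each of them. The sketch below uses NumPy only (TensorFlow is left out so it runs anywhere) and simply shows that re-seeding NumPy's legacy global generator, the one `numpy.random.seed` controls, replays exactly the same draws. The helper name `draws_after_seed` is hypothetical.

```python
import numpy as np

def draws_after_seed(seed_value, n=3):
    """Seed NumPy's legacy global generator and return n uniform draws."""
    np.random.seed(seed_value)
    return np.random.rand(n)

first = draws_after_seed(1)
second = draws_after_seed(1)

# Same seed -> identical sequence. tf.random.set_seed plays the analogous
# role for TensorFlow's own random ops.
print(np.array_equal(first, second))  # True
```

Commenting the seeds out, as the note in the cell suggests, lets each run draw different initial weights, which is how you see how much the results vary between runs.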

In [None]:
# Loading the Volatility Index (VIX) data
# (infer_datetime_format is deprecated in pandas 2.x; parse_dates=True is enough)
df = pd.read_csv('stock_data/vix.csv', index_col="Date", parse_dates=True)
df = df.drop(columns="Open")
df = df.rename(columns={'Close': 'vix_value'})
df['vix_value']=df['vix_value'].astype(int) 
df.index = df.index.normalize()
df.head()
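
The loading cell relies on `read_csv` to turn the `Date` column into a `DatetimeIndex`, and `normalize()` sets every timestamp to midnight so dates from two different files line up exactly when they are joined. The sketch below repeats those steps on a small in-memory CSV; the rows and the 16:15 timestamps are made up for illustration and are not real VIX quotes, and the three-column layout is an assumption about what `stock_data/vix.csv` contains.

```python
import io
import pandas as pd

# Made-up rows in the layout assumed for stock_data/vix.csv (Date, Open, Close)
csv_text = """Date,Open,Close
2020-03-02 16:15:00,40.1,33.4
2020-03-03 16:15:00,33.6,36.8
2020-03-04 16:15:00,35.2,31.9
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = df.drop(columns="Open").rename(columns={"Close": "vix_value"})

# normalize() keeps the calendar date and sets the time of day to midnight,
# so these rows can match the Carnival rows on date alone when joining
df.index = df.index.normalize()
print(df)
```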

In [None]:
# Slicing the historical Volatility index data to Covid dates
df_sliced = df.loc['2020-03-02':'2021-12-30']
df_sliced.tail()

In [None]:
# Loading the historical closing prices for Carnival
df2 = pd.read_csv('stock_data/ccl.csv', index_col="Date", parse_dates=True)['Close']
df2 = df2.sort_index()
df2.index = df2.index.normalize()
df2.tail()

In [None]:
# Slicing the historical closing prices for Carnival Cruises to Covid dates
df2_sliced = df2['2020-03-02':'2020-12-30']
df2_sliced.head()

In [None]:
# Joining the data into a single DataFrame
df_sliced = df_sliced.join(df2_sliced, how="inner")
df_sliced.tail()
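
An inner join keeps only the dates present in both the VIX frame and the Carnival `Close` series, so the narrower of the two sliced date ranges determines which rows survive. Because `df2` was reduced to a single column, it is a Series, and `DataFrame.join` uses the Series name as the new column name. A toy sketch with made-up values:

```python
import pandas as pd

# Hypothetical data: VIX covers four days, the Close series only three
vix = pd.DataFrame(
    {"vix_value": [33, 36, 31, 39]},
    index=pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"]),
)
close = pd.Series(
    [42.5, 40.1, 38.7],
    index=pd.to_datetime(["2020-03-03", "2020-03-04", "2020-03-05"]),
    name="Close",  # join() needs a named Series; the name becomes the column
)

joined = vix.join(close, how="inner")
print(joined)
# Only 2020-03-03 .. 2020-03-05 remain; column 0 is vix_value, column 1 is Close
```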

In [None]:
df_sliced.head()

In [None]:
# This function accepts the column numbers for the features (X) and the target (y)
# It chunks the data into rolling windows: the values at t-window ... t-1 are used to predict the value at t
# It returns NumPy arrays of X and y
def window_data(df, window, feature_col_number, target_col_number):
    X = []
    y = []
    # len(df) - window windows fit in the data, so the last row can also serve as a target
    for i in range(len(df) - window):
        features = df.iloc[i:(i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)
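
On a toy frame the windowing is easy to check by hand. The sketch below declares a version of the function that loops over `range(len(df) - window)`, so the final row can serve as a target, and applies it to ten made-up closing prices with a window of 3. The `toy` frame and its values are purely illustrative.

```python
import numpy as np
import pandas as pd

def window_data(df, window, feature_col_number, target_col_number):
    """Return (X, y): each X row holds `window` consecutive feature values,
    and the matching y is the target value on the following row."""
    X, y = [], []
    for i in range(len(df) - window):
        X.append(df.iloc[i:(i + window), feature_col_number])
        y.append(df.iloc[i + window, target_col_number])
    return np.array(X), np.array(y).reshape(-1, 1)

# Made-up prices 10.0, 11.0, ..., 19.0 in column 1; column 0 is a placeholder
toy = pd.DataFrame({"vix_value": np.zeros(10), "Close": np.arange(10.0, 20.0)})
X, y = window_data(toy, window=3, feature_col_number=1, target_col_number=1)

print(X.shape, y.shape)  # (7, 3) (7, 1)
print(X[0], y[0])        # [10. 11. 12.] [13.]
print(X[-1], y[-1])      # [16. 17. 18.] [19.]
```

With N rows and a window of w there are N - w samples. The last target is the last row, which is what lets the results DataFrame at the end of the notebook reuse the final index labels of df_sliced.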

In [None]:
# Predicting the next closing price from a window of the previous window_size closing prices
# The assignment describes a 10-day window; experiment with window sizes from 1 to 10 and compare model performance
window_size = 5

# Column index 0 is the 'vix_value' column
# Column index 1 is the `Close` column
feature_column = 1
target_column = 1
X, y = window_data(df_sliced, window_size, feature_column, target_column)

In [None]:
# Using 70% of the data for training and the remainder for testing (kept in date order, not shuffled)
split = int(0.7 * len(X))

X_train = X[: split]
X_test = X[split:]

y_train = y[: split]
y_test = y[split:]
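
Because the samples are built in date order, slicing at `split` places every training target before every test target; a shuffled split would let the model train on days that come after the days it is tested on. A tiny check on made-up, chronologically ordered targets:

```python
import numpy as np

# Hypothetical targets in chronological order (e.g. 20 trading days)
y = np.arange(20).reshape(-1, 1)

split = int(0.7 * len(y))   # first 70% of the samples for training
y_train, y_test = y[:split], y[split:]

print(len(y_train) + len(y_test) == len(y))  # True: nothing dropped or duplicated
print(y_train.max() < y_test.min())          # True: no test day precedes a training day
```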

In [None]:
from sklearn.preprocessing import MinMaxScaler
# Using the MinMaxScaler to scale data between 0 and 1
# Creating a MinMaxScaler object
scaler = MinMaxScaler()
# Fitting the MinMaxScaler object on the training features only,
# so test-period prices do not leak into the scaling statistics
scaler.fit(X_train)

# Scaling the features training and testing sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Refitting the same MinMaxScaler object on the training target data y
# (it now holds the target's min and max, which inverse_transform relies on later)
scaler.fit(y_train)

# Scaling the target training and testing sets
y_train_scaled = scaler.transform(y_train)
y_test_scaled = scaler.transform(y_test)
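
With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column as (x - min) / (max - min), using the minimum and maximum seen during `fit`. When the scaler is fitted on training rows only, test values outside the training range map outside [0, 1], which is expected. The NumPy-only sketch below reproduces that arithmetic on made-up prices; the helpers `fit_min_max` and `transform` are hypothetical stand-ins, and unlike scikit-learn they do not guard against zero-range columns.

```python
import numpy as np

def fit_min_max(train):
    """Per-column minimum and range, computed from training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    return col_min, col_range

def transform(values, col_min, col_range):
    """Min-max scaling based on the fitted training statistics."""
    return (values - col_min) / col_range

# Hypothetical prices: training period spans 20-30, test period dips to 15
train = np.array([[20.0], [25.0], [30.0]])
test = np.array([[15.0], [22.5]])

col_min, col_range = fit_min_max(train)
print(transform(train, col_min, col_range).ravel())  # values 0.0, 0.5, 1.0
print(transform(test, col_min, col_range).ravel())   # values -0.5, 0.25
```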

In [None]:
# Reshaping the features for the model
X_train_scaled = X_train_scaled.reshape((X_train_scaled.shape[0], X_train_scaled.shape[1], 1))
X_test_scaled = X_test_scaled.reshape((X_test_scaled.shape[0], X_test_scaled.shape[1], 1))

# Printing some sample data after reshaping the datasets
print (f"X_train sample values:\n{X_train_scaled[:3]} \n")
print (f"X_test sample values:\n{X_test_scaled[:3]}")
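
Keras LSTM layers expect a 3-D input of shape (samples, time steps, features). With a single feature per time step (the closing price), the reshape simply appends a trailing axis of length 1, and `X[..., np.newaxis]` produces the same array. A quick shape check on dummy data:

```python
import numpy as np

X_scaled = np.random.rand(8, 5)  # dummy: 8 windows of 5 time steps each

X_3d = X_scaled.reshape((X_scaled.shape[0], X_scaled.shape[1], 1))
X_alt = X_scaled[..., np.newaxis]   # equivalent way to add the feature axis

print(X_3d.shape)                   # (8, 5, 1)
print(np.array_equal(X_3d, X_alt))  # True
```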

# Build and Train the LSTM RNN
In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

You will need to:

1. Define the model architecture.
2. Compile the model.
3. Fit the model to the training data.

### Hints

Use the same model architecture and random seed in both notebooks. This is necessary to accurately compare the performance of the VIX model with the closing-price model.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [None]:
# Building the LSTM model
# Set return_sequences=True on every LSTM layer that feeds another LSTM layer,
# but not on the final LSTM layer, which should return only its last output
# Note: The dropouts help prevent overfitting
# Note: input_shape is (number of time steps, number of features per time step)
# Note: The full training input has shape (samples, time steps, features); input_shape omits the samples dimension

# Defining the LSTM RNN model.
model = Sequential()

# Initial model setup
number_units = 30
dropout_fraction = 0.2

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train_scaled.shape[1], 1)))
model.add(Dropout(dropout_fraction))

# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))

# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))

# Output layer
model.add(Dense(1))

In [None]:
# Compiling the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [None]:
# Summarizing the model
model.summary()
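
As a sanity check on the summary, a Keras `LSTM` layer with the default `use_bias=True` has 4 × (input_dim × units + units × units + units) trainable parameters: four gates, each with an input kernel, a recurrent kernel and a bias. A `Dense` layer has input_dim × units + units, and `Dropout` has none. The sketch applies those formulas to the three-layer, 30-unit architecture above; the helper names are hypothetical, and the total should match what `model.summary()` reports, assuming default layer settings.

```python
def lstm_params(units, input_dim):
    """Trainable weights in a Keras LSTM layer with use_bias=True."""
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """Kernel plus bias for a Dense layer."""
    return input_dim * units + units

number_units = 30
counts = [
    lstm_params(number_units, 1),             # layer 1: one feature per time step
    lstm_params(number_units, number_units),  # layer 2: fed by layer 1's 30 outputs
    lstm_params(number_units, number_units),  # layer 3
    dense_params(number_units, 1),            # output layer
]
print(counts, sum(counts))  # [3840, 7320, 7320, 31] 18511
```

The window length does not appear in the count: an LSTM reuses the same weights at every time step, so changing window_size alters the input shape but not the number of parameters.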

In [None]:
# Training the model
# Using at least 10 epochs
# Do not shuffle the data (keeps the samples in chronological order)
# Experimenting with the batch size, but a smaller batch size is recommended
model.fit(X_train_scaled, y_train_scaled, epochs=10, shuffle=False, batch_size=10, verbose=1)

# Model Performance
In this section, you will evaluate the model using the test data.

You will need to:

1. Evaluate the model using the X_test and y_test data.
2. Use the X_test data to make predictions
3. Create a DataFrame of Real (y_test) vs predicted values.
4. Plot the Real vs predicted values as a line chart

### Hints
Remember to apply the inverse_transform function to the predicted and y_test values to recover the actual closing prices.
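
For min-max scaling, `inverse_transform` computes x = x_scaled × (max - min) + min, using the statistics from the most recent `fit`. Since the single scaler object is fitted on the target data last, its stored minimum and range belong to the closing-price target, which is what both the predictions and y_test need. A NumPy sketch with made-up statistics and model outputs:

```python
import numpy as np

# Hypothetical target statistics stored when the scaler was fitted on y
y_min, y_max = 8.0, 28.0

predicted_scaled = np.array([[0.25], [0.75]])   # made-up model outputs
predicted_prices = predicted_scaled * (y_max - y_min) + y_min
print(predicted_prices.ravel())  # [13. 23.]

# Round trip: scaling the recovered prices again gives back the model outputs
rescaled = (predicted_prices - y_min) / (y_max - y_min)
print(np.allclose(rescaled, predicted_scaled))  # True
```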

In [None]:
# Evaluating the model
model.evaluate(X_test_scaled, y_test_scaled)
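
`model.evaluate` returns the loss in scaled units, here the mean squared error because the model was compiled with `loss="mean_squared_error"`. Min-max scaling is an affine map, so an error in scaled units converts to price units by multiplying by the target's (max - min); the shift cancels. For MSE that factor enters squared, so taking the square root first gives an RMSE in price units. The loss value and price range below are made up; with the fitted scaler, the range is available as `scaler.data_range_`.

```python
import math

scaled_mse = 0.0025          # hypothetical value returned by model.evaluate
y_min, y_max = 8.0, 28.0     # hypothetical target range seen by the scaler

scaled_rmse = math.sqrt(scaled_mse)          # about 0.05 in scaled units
price_rmse = scaled_rmse * (y_max - y_min)   # about 1.0 in price units
print(round(price_rmse, 6))  # 1.0
```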

In [None]:
# Making some predictions
predicted = model.predict(X_test_scaled)

In [None]:
# Recovering the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_test_scaled.reshape(-1, 1))

In [None]:
# Creating a DataFrame of Real and Predicted values
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
}, index = df_sliced.index[-len(real_prices): ]) 
stocks.head()

In [None]:
# Plotting the real vs predicted values as a line chart
stocks.plot(title="Real Vs. Predicted Prices")