# LSTM Stock Predictor Using Closing Prices

In this notebook, you will build and train a custom LSTM RNN that uses a 10 day window of Bitcoin closing prices to predict the 11th day closing price. 

You will need to:

1. Prepare the data for training and testing
2. Build and train a custom LSTM RNN
3. Evaluate the performance of the model

## Data Preparation

In this section, you will need to prepare the training and testing data for the model. The model will use a rolling 10 day window to predict the 11th day closing price.

You will need to:
1. Use the `window_data` function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing
3. Apply the MinMaxScaler to the X and y values
4. Reshape the X_train and X_test data for the model. Note: The required input format for the LSTM is:

```python
reshape((X_train.shape[0], X_train.shape[1], 1))
```

In [5]:
!pip3 install hvplot




In [6]:
import numpy as np
import pandas as pd
import hvplot.pandas
import bokeh
%matplotlib inline
%matplotlib notebook

In [7]:
# Set the random seed for reproducibility
# Note: This is for the homework solution, but it is good practice to comment this out and run multiple experiments to evaluate your model
from numpy.random import seed
seed(1)
from tensorflow import random
random.set_seed(2)

In [8]:
# Load the fear and greed sentiment data for Bitcoin
df = pd.read_csv('btc_sentiment (1).csv')
#df = df.drop(columns="fng_classification")
df.head()

Unnamed: 0,date,fng_value,fng_classification
0,29-07-2019,19,Extreme Fear
1,28-07-2019,16,Extreme Fear
2,27-07-2019,47,Neutral
3,26-07-2019,24,Extreme Fear
4,25-07-2019,42,Fear


In [9]:
# Load the historical closing prices for Bitcoin
df2 = pd.read_csv('btc_historic (1).csv')
df2 = df2.sort_index()
df2.tail()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
569,2019-07-25,9772.139648,10184.429688,9744.700195,9882.429688,9882.429688,403576364
570,2019-07-26,9882.429688,9890.049805,9668.519531,9847.450195,9847.450195,312717110
571,2019-07-27,9847.450195,10202.950195,9310.469727,9478.320313,9478.320313,512612117
572,2019-07-28,9478.320313,9591.519531,9135.639648,9531.769531,9531.769531,267243770
573,2019-07-29,9531.730469,9717.69043,9386.900391,9529.889648,9529.889648,277409600


In [10]:
# Join the data into a single DataFrame
df = df.join(df2, how="inner")
df.tail()

Unnamed: 0,date,fng_value,fng_classification,Date,Open,High,Low,Close,Adj Close,Volume
538,05-02-2018,11,Extreme Fear,2019-06-24,10855.990234,11100.919922,10555.709961,11035.740234,11035.740234,565444720
539,04-02-2018,24,Fear,2019-06-25,11035.740234,11778.219727,10992.370117,11740.339844,11740.339844,953962631
540,03-02-2018,40,Fear,2019-06-26,11740.339844,13826.759766,11679.099609,12913.280273,12913.280273,2685872365
541,02-02-2018,15,Extreme Fear,2019-06-27,12913.280273,13314.049805,10335.339844,11154.089844,11154.089844,2345027203
542,01-02-2018,30,Fear,2019-06-28,11154.089844,12433.0,10772.75,12355.05957,12355.05957,1408438810


In [11]:
# This function accepts the column number for the features (X) and the target (y)
# It chunks the data up with a rolling window of Xt-n to predict Xt
# It returns a numpy array of X any y
def window_data(df, window, feature_col_number, target_col_number):
    X = []
    y = []
    for i in range(len(df) - window - 1):
        features = df.iloc[i:(i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)

In [12]:
# Predict Closing Prices using a 10 day window of previous closing prices
# Then, experiment with window sizes anywhere from 1 to 10 and see how the model performance changes
window_size = 10

# Column index 0 is the 'fng_value' column
# Column index 1 is the `Close` column
feature_column = 1
target_column = 1
X, y = window_data(df, window_size, feature_column, target_column)

In [13]:
# Use 70% of the data for training and the remaineder for testing
split = int(0.7 * len(X))
X_train = X[: split]
X_test = X[split:]
y_train = y[: split]
y_test = y[split:]

### Scaling Data with `MinMaxScaler`

In [14]:
# Use the MinMaxScaler to scale data between 0 and 1.
from sklearn.preprocessing import MinMaxScaler

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit the MinMaxScaler object with the training feature data X_train
scaler.fit(X_train)

# Scale the features training and testing sets
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Fit the MinMaxScaler object with the training target data y_train
scaler.fit(y_train)

# Scale the target training and testing sets
y_train = scaler.transform(y_train)
y_test = scaler.transform(y_test)

In [15]:
# Reshape the features for the model
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
print (f"X_train sample values:\n{X_train[:5]} \n")
print (f"X_test sample values:\n{X_test[:5]}")

X_train sample values:
[[[0.11627907]
  [0.08139535]
  [0.44186047]
  [0.1744186 ]
  [0.38372093]
  [0.12790698]
  [0.36046512]
  [0.38372093]
  [0.38372093]
  [0.29069767]]

 [[0.08139535]
  [0.44186047]
  [0.1744186 ]
  [0.38372093]
  [0.12790698]
  [0.36046512]
  [0.38372093]
  [0.38372093]
  [0.29069767]
  [0.38372093]]

 [[0.44186047]
  [0.1744186 ]
  [0.38372093]
  [0.12790698]
  [0.36046512]
  [0.38372093]
  [0.38372093]
  [0.29069767]
  [0.38372093]
  [0.36046512]]

 [[0.1744186 ]
  [0.38372093]
  [0.12790698]
  [0.36046512]
  [0.38372093]
  [0.38372093]
  [0.29069767]
  [0.38372093]
  [0.36046512]
  [0.11627907]]

 [[0.38372093]
  [0.12790698]
  [0.36046512]
  [0.38372093]
  [0.38372093]
  [0.29069767]
  [0.38372093]
  [0.36046512]
  [0.11627907]
  [0.29069767]]] 

X_test sample values:
[[[0.46511628]
  [0.40697674]
  [0.43023256]
  [0.39534884]
  [0.44186047]
  [0.40697674]
  [0.38372093]
  [0.34883721]
  [0.31395349]
  [0.26744186]]

 [[0.40697674]
  [0.43023256]
  [0.395348

## Build and Train the LSTM RNN

In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

You will need to:
1. Define the model architecture
2. Compile the model
3. Fit the model to the training data

### Hints:
You will want to use the same model architecture and random seed for both notebooks. This is necessary to accurately compare the performance of the FNG model vs the closing price model.

In [16]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [17]:
# Define the LSTM RNN model.
model = Sequential()

number_units = 5
dropout_fraction = 0.7

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train.shape[1], 1))
    )
model.add(Dropout(dropout_fraction))
# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))
# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))
# Output layer
model.add(Dense(1))

In [18]:
# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [19]:
# Summarize the model
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 10, 5)             140       
                                                                 
 dropout (Dropout)           (None, 10, 5)             0         
                                                                 
 lstm_1 (LSTM)               (None, 10, 5)             220       
                                                                 
 dropout_1 (Dropout)         (None, 10, 5)             0         
                                                                 
 lstm_2 (LSTM)               (None, 5)                 220       
                                                                 
 dropout_2 (Dropout)         (None, 5)                 0         
                                                                 
 dense (Dense)               (None, 1)                 6

In [20]:
# Train the model
# Use at least 10 epochs
# Do not shuffle the data
# Experiement with the batch size, but a smaller batch size is recommended
# Train the model
model.fit(X_train, y_train, epochs=10, shuffle=False, batch_size=1, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f06732654d0>

## Model Performance

In this section, you will evaluate the model using the test data. 

You will need to:
1. Evaluate the model using the `X_test` and `y_test` data.
2. Use the X_test data to make predictions
3. Create a DataFrame of Real (y_test) vs predicted values. 
4. Plot the Real vs predicted values as a line chart

### Hints
Remember to apply the `inverse_transform` function to the predicted and y_test values to recover the actual closing prices.

In [21]:
# Evaluate the model
model.evaluate(X_test, y_test)



0.02418304607272148

In [22]:
# Make some predictions
predicted_prices = model.predict(X_test)
# Recover the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted_prices)
real_prices = scaler.inverse_transform(y_test.reshape(-1, 1))

In [23]:
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
}, index = df.index[-len(real_prices): ]) 
stocks.head()

Unnamed: 0,Real,Predicted
383,29.0,34.005569
384,29.0,33.658531
385,33.0,33.296658
386,29.0,32.908203
387,37.0,32.553223


In [None]:
# Plot the real vs predicted values as a line chart
import matplotlib.pyplot as plt

stocks.plot()

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x7fda1a72d8d0>

In [None]:
stocks.to_csv("stocks.csv")