# LSTM Stock Predictor Using Fear and Greed Index

In this notebook, you will build and train a custom LSTM RNN that uses a 10-day window of Bitcoin fear and greed index values to predict the closing price on the 11th day.

You will need to:

1. Prepare the data for training and testing
2. Build and train a custom LSTM RNN
3. Evaluate the performance of the model

## Data Preparation

In this section, you will prepare the training and testing data for the model. The model uses a rolling 10-day window to predict the closing price on the 11th day.

You will need to:
1. Use the `window_data` function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing.
3. Apply the `MinMaxScaler` to the X and y values.
4. Reshape the `X_train` and `X_test` data for the model. Note: the required input format for the LSTM is:

```python
reshape((X_train.shape[0], X_train.shape[1], 1))
```
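
As a minimal illustration of that reshape, the sketch below uses a small made-up array (the name `X_demo` and its values are assumptions, not notebook data) to show how a 2D array of windows becomes the 3D `(samples, timesteps, features)` layout.

```python
import numpy as np

# Hypothetical stand-in for X_train: 4 samples, each a 3-day window
X_demo = np.arange(12, dtype=float).reshape(4, 3)

# Add a trailing axis of size 1: one feature per time step
X_demo_3d = X_demo.reshape((X_demo.shape[0], X_demo.shape[1], 1))

print(X_demo.shape)     # (4, 3)
print(X_demo_3d.shape)  # (4, 3, 1)
```

The values are unchanged; only the shape gains a final axis, so `X_demo_3d[i, t, 0]` equals `X_demo[i, t]`.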

In [35]:
import numpy as np
import pandas as pd
import hvplot.pandas

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

%matplotlib inline

In [36]:
# Set the random seed for reproducibility
# Note: This is for the homework solution, but it is good practice to comment this out and run multiple experiments to evaluate your model
from numpy.random import seed
seed(1)
from tensorflow import random
random.set_seed(2)

In [37]:
# Load the fear and greed sentiment data for Bitcoin
df = pd.read_csv('btc_sentiment.csv', index_col="date", infer_datetime_format=True, parse_dates=True)
df = df.drop(columns="fng_classification")

df.head()

date,fng_value
2019-07-29,19
2019-07-28,16
2019-07-27,47
2019-07-26,24
2019-07-25,42


In [38]:
# Load the historical closing prices for Bitcoin
df2 = pd.read_csv('btc_historic.csv', index_col="Date", infer_datetime_format=True, parse_dates=True)['Close']
df2 = df2.sort_index(ascending=False)
df2.head()

Date
2019-07-29    9529.889648
2019-07-28    9531.769531
2019-07-27    9478.320313
2019-07-26    9847.450195
2019-07-25    9882.429688
Name: Close, dtype: float64

In [39]:
# Join the data into a single DataFrame
df = df.join(df2, how="inner")
df.tail()

Unnamed: 0,fng_value,Close
2019-07-25,42,9882.429688
2019-07-26,24,9847.450195
2019-07-27,47,9478.320313
2019-07-28,16,9531.769531
2019-07-29,19,9529.889648


In [40]:
df.head()

Unnamed: 0,fng_value,Close
2018-02-01,30,9114.719727
2018-02-02,15,8870.820313
2018-02-03,40,9251.269531
2018-02-04,24,8218.049805
2018-02-05,11,6937.080078


In [41]:
# This function accepts the column numbers for the features (X) and the target (y)
# It chunks the data up with a rolling window of Xt-n to predict Xt
# It returns numpy arrays of X and y
def window_data(df, window, feature_col_number, target_col_number):
    X = []
    y = []
    for i in range(len(df) - window - 1):
        features = df.iloc[i : (i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)
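
To see what `window_data` produces, the sketch below runs the same function on a tiny made-up DataFrame (the six rows and the name `toy` are assumptions for illustration only). With six rows and a window of 3, the loop bound `len(df) - window - 1` yields two samples, so the last possible window is not used.

```python
import numpy as np
import pandas as pd

def window_data(df, window, feature_col_number, target_col_number):
    # Same logic as the notebook's function, repeated so this sketch runs on its own
    X = []
    y = []
    for i in range(len(df) - window - 1):
        features = df.iloc[i : (i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)

# Hypothetical data: column 0 plays the role of fng_value, column 1 of Close
toy = pd.DataFrame({
    "fng_value": [10, 20, 30, 40, 50, 60],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

X_toy, y_toy = window_data(toy, 3, 0, 1)
print(X_toy.shape)  # (2, 3)
print(y_toy.shape)  # (2, 1)
```

Each row of `X_toy` holds three consecutive feature values, and the matching row of `y_toy` holds the target from the day immediately after that window.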

In [42]:
# Predict closing prices from a rolling window of fear and greed index values; the target is the closing price on the day after the window
# Try a window size anywhere from 1 to 10 and see how the model performance changes
window_size = 3

# Column index 0 is the `fng_value` column; column index 1 is the `Close` column

feature_column = 0
target_column = 1
X, y = window_data(df, window_size, feature_column, target_column)
print(X[0:2])
X.shape

[[30 15 40]
 [15 40 24]]


(539, 3)

In [43]:
# Use 70% of the data for training and the remainder for testing
# YOUR CODE HERE!
split = int(0.7 * len(X))
X_train = X[: split - 1]
X_test = X[split:]
y_train = y[: split - 1]
y_test = y[split:]

print(len(y_train)/(len(y_train)+len(y_test)))

0.6988847583643123
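
The printed ratio falls slightly below 0.70 because slicing the training set with `[: split - 1]` leaves the sample at index `split - 1` out of both sets. The sketch below shows a chronological split that keeps every sample; the `_demo` arrays are made up for illustration.

```python
import numpy as np

# Hypothetical data: 10 samples in time order
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10).reshape(-1, 1)

split = int(0.7 * len(X_demo))

# Earlier rows train the model, later rows test it; no row is skipped
X_train_demo, X_test_demo = X_demo[:split], X_demo[split:]
y_train_demo, y_test_demo = y_demo[:split], y_demo[split:]

print(len(X_train_demo), len(X_test_demo))  # 7 3
```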


In [44]:
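# Use MinMaxScaler to scale the data between 0 and 1.
# YOUR CODE HERE!
scaler = MinMaxScaler()
# Scale the features (note: fitting on all of X includes the test rows)
scaler.fit(X)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Refit the same scaler on the targets; from here on, `scaler` holds the
# price scaling and is reused later for `inverse_transform`
scaler.fit(y)
y_train = scaler.transform(y_train)
y_test = scaler.transform(y_test)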
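
A common alternative is to fit the scaling on the training rows only, so that no test information reaches the model. The sketch below is a hand-rolled NumPy version of the `(x - min) / (max - min)` formula that `MinMaxScaler` applies by default; the helpers `fit_min_max` and `apply_min_max` and the demo values are hypothetical, not part of the notebook.

```python
import numpy as np

def fit_min_max(a):
    # Per-column minimum and range, as used by (x - min) / (max - min) scaling
    col_min = a.min(axis=0)
    col_range = a.max(axis=0) - col_min
    return col_min, col_range

def apply_min_max(a, col_min, col_range):
    # Assumes no column is constant (col_range > 0)
    return (a - col_min) / col_range

# Hypothetical prices: statistics come from the training rows only
y_train_demo = np.array([[100.0], [200.0], [300.0]])
y_test_demo = np.array([[250.0], [400.0]])

y_min, y_range = fit_min_max(y_train_demo)
y_train_scaled = apply_min_max(y_train_demo, y_min, y_range)
y_test_scaled = apply_min_max(y_test_demo, y_min, y_range)

print(y_train_scaled.ravel().tolist())  # [0.0, 0.5, 1.0]
print(y_test_scaled.ravel().tolist())   # [0.75, 1.5]
```

With this approach, test values can land outside the 0 to 1 range, as the 1.5 above shows.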

In [45]:
# Reshape the features for the model
# YOUR CODE HERE!
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
X_test[:2]

array([[[0.40229885],
        [0.40229885],
        [0.37931034]],

       [[0.40229885],
        [0.37931034],
        [0.34482759]]])

---

## Build and Train the LSTM RNN

In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

You will need to:
1. Define the model architecture
2. Compile the model
3. Fit the model to the training data

### Hints
You will want to use the same model architecture and random seed for both notebooks. This is necessary to accurately compare the performance of the FNG model vs the closing price model. 

In [46]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [47]:
# Build the LSTM model.
# Set `return_sequences=True` on every LSTM layer that feeds another LSTM layer;
# the final LSTM layer does not need it.
# YOUR CODE HERE!
model = Sequential()

number_units = 7
dropout_fraction = .2

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train.shape[1], 1))
    )
model.add(Dropout(dropout_fraction))
# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))
# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))
# Output layer
model.add(Dense(1))

In [48]:
# Compile the model
# YOUR CODE HERE!
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_percentage_error"])

In [49]:
# Summarize the model
# YOUR CODE HERE!
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 3, 7)              252       
_________________________________________________________________
dropout_3 (Dropout)          (None, 3, 7)              0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 3, 7)              420       
_________________________________________________________________
dropout_4 (Dropout)          (None, 3, 7)              0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 7)                 420       
_________________________________________________________________
dropout_5 (Dropout)          (None, 7)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                
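
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias); the helper names `lstm_params` and `dense_params` are hypothetical. The dense-layer count is derived from the formula, not read from the output above.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input per unit, plus one bias per unit
    return units * input_dim + units

print(lstm_params(7, 1))   # 252 (first LSTM layer: 1 input feature)
print(lstm_params(7, 7))   # 420 (later LSTM layers: 7 inputs from the previous layer)
print(dense_params(1, 7))  # 8
```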

In [50]:
# Train the model
# Use at least 10 epochs
# Do not shuffle the data
# Experiment with the batch size, but a smaller batch size is recommended
# YOUR CODE HERE!
model.fit(X_train, y_train, epochs=20, shuffle=False, batch_size=1, verbose=True)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x28550c905c8>

---

## Model Performance

In this section, you will evaluate the model using the test data. 

You will need to:
1. Evaluate the model using the `X_test` and `y_test` data.
2. Use the `X_test` data to make predictions.
3. Create a DataFrame of Real (`y_test`) vs predicted values.
4. Plot the Real vs predicted values as a line chart.

### Hints
Remember to apply the `inverse_transform` function to the predicted and y_test values to recover the actual closing prices.
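
To make the hint concrete, the sketch below reverses min-max scaling by hand with NumPy. The prices are made up, and the arithmetic is a stand-in for what `inverse_transform` does with the minimum and range learned from the target values.

```python
import numpy as np

# Hypothetical closing prices
prices = np.array([[3500.0], [4000.0], [5000.0]])

# Forward scaling to the 0 to 1 range
p_min = prices.min(axis=0)
p_range = prices.max(axis=0) - p_min
scaled = (prices - p_min) / p_range

# Inverse scaling: multiply by the range, then add the minimum back
recovered = scaled * p_range + p_min

print(np.allclose(recovered, prices))  # True
```

The inverse only recovers prices if it uses the statistics fitted on the targets, which is why the notebook's `scaler` must be the one last fitted on `y`.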

In [51]:
# Evaluate the model
# YOUR CODE HERE!
model.evaluate(X_test, y_test, verbose=True)



[0.09389527142047882, 102.67762756347656]
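
The two numbers returned by `evaluate` are the loss (mean squared error) followed by the compiled metric (mean absolute percentage error), both computed on the scaled values. The sketch below computes the same two quantities by hand on made-up arrays; Keras additionally guards the MAPE denominator against values near zero, which this simplified version omits.

```python
import numpy as np

# Hypothetical scaled targets and predictions
y_true = np.array([0.2, 0.4, 0.5])
y_pred = np.array([0.3, 0.3, 0.6])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)

# Mean absolute percentage error: average relative error, in percent
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))

print(round(float(mse), 4))   # 0.01
print(round(float(mape), 2))  # 31.67
```

Because MAPE divides by the true value, scaled targets close to zero can push it very high even when the squared error is small.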

In [52]:
# Make some predictions
# YOUR CODE HERE!
predicted = model.predict(X_test)
predicted.shape

(162, 1)

In [53]:
# Recover the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_test.reshape(-1, 1))

In [54]:
# Create a DataFrame of Real and Predicted values
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
})
stocks.head()

Unnamed: 0,Real,Predicted
0,3670.919922,5374.427246
1,3912.570068,5311.458008
2,3924.23999,5409.569824
3,3974.050049,5711.539551
4,3937.040039,6007.106445


In [55]:
# Plot the real vs predicted values as a line chart
# YOUR CODE HERE!
stocks.hvplot(y=['Real', 'Predicted'])

In [56]:
windows_l=[]
windows_a=[]
for win in range(1,11):
    print(f'window {win}')
    nodes_l=[]
    nodes_a=[]
    for node in range(1,32,3):
        epo_l=[]
        epo_a=[]
        for epo in range(10,22,2):
            print(f'node {node}')
            window_size = win

            # Note: column 1 (`Close`) serves as both the feature and the target in this search
            feature_column = 1
            target_column = 1
            X, y = window_data(df, window_size, feature_column, target_column)
            split = int(0.7 * len(X))
            X_train = X[: split - 1]
            X_test = X[split:]
            y_train = y[: split - 1]
            y_test = y[split:]
            scaler = MinMaxScaler()
            scaler.fit(X)
            X_train = scaler.transform(X_train)
            X_test = scaler.transform(X_test)
            scaler.fit(y)
            y_train = scaler.transform(y_train)
            y_test = scaler.transform(y_test)
            X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
            X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
            model = Sequential()

            number_units = node
            dropout_fraction = .2

            # Layer 1
            model.add(LSTM(
                units=number_units,
                return_sequences=True,
                input_shape=(X_train.shape[1], 1))
                )
            model.add(Dropout(dropout_fraction))
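            # Layer 2
            # With Layer 3 disabled below, this layer still returns the full sequence,
            # so the Dense layer emits one prediction per time step
            model.add(LSTM(units=number_units, return_sequences=True))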
            model.add(Dropout(dropout_fraction))
            # Layer 3
            # model.add(LSTM(units=number_units))
            # model.add(Dropout(dropout_fraction))
            # Output layer
            model.add(Dense(1))
            model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_percentage_error"])
            model.fit(X_train, y_train, epochs=epo, shuffle=False, batch_size=1, verbose=False)
            # `scores` is [loss, MAPE]; the name `eval` would shadow the Python built-in
            scores = model.evaluate(X_test, y_test, verbose=False)
            epo_l.append(scores[0])
            epo_a.append(scores[1])
        nodes_l.append(epo_l)
        nodes_a.append(epo_a)
        print('node done')
    print('window done')
    windows_l.append(nodes_l)
    windows_a.append(nodes_a)
print('done')

window 1
node 1
node 1
node 1
node 1
node 1
node 1
node done
node 4
node 4
node 4
node 4
node 4
node 4
node done
node 7
node 7
node 7
node 7
node 7
node 7
node done
node 10
node 10
node 10
node 10
node 10
node 10
node done
node 13
node 13
node 13
node 13
node 13
node 13
node done
node 16
node 16
node 16
node 16
node 16
node 16
node done
node 19
node 19
node 19
node 19
node 19
node 19
node done
node 22
node 22
node 22
node 22
node 22
node 22
node done
node 25
node 25
node 25
node 25
node 25
node 25
node done
node 28
node 28
node 28
node 28
node 28
node 28
node done
node 31
node 31
node 31
node 31
node 31
node 31
node done
window done
window 2
node 1
node 1
node 1
node 1
node 1
node 1
node done
node 4
node 4
node 4
node 4
node 4
node 4
node done
node 7
node 7
node 7
node 7
node 7
node 7
node done
node 10
node 10
node 10
node 10
node 10
node 10
node done
node 13
node 13
node 13
node 13
node 13
node 13
node done
node 16
node 16
node 16
node 16
node 16
node 16
node done
node 19
node 19
node
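
The size of this search can be worked out from the three loop ranges. The sketch below enumerates the same grid with `itertools.product`; the variable names are assumptions for illustration, and no models are trained.

```python
from itertools import product

window_sizes = range(1, 11)      # window sizes 1 through 10
unit_counts = range(1, 32, 3)    # LSTM units per layer
epoch_counts = range(10, 22, 2)  # training epochs

# Every (window, units, epochs) combination visited by the nested loops
grid = list(product(window_sizes, unit_counts, epoch_counts))

print(list(unit_counts))   # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
print(list(epoch_counts))  # [10, 12, 14, 16, 18, 20]
print(len(grid))           # 660
```

Each combination builds and trains a fresh model, so the full search fits 660 models.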

In [57]:
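# Slice the results at epoch index 4, which is 18 epochs in range(10, 22, 2);
# the `ep12` names are kept because later cells reference them
ep12=[]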

ep12l=[]

for win in range(10):
    ep12nl=[]
    ep12n=[]
    for nod in range(11):
        ep12nl.append(windows_l[win][nod][4])
        ep12n.append(windows_a[win][nod][4])

    ep12.append(ep12n)
    ep12l.append(ep12nl)

dfep12l = pd.DataFrame(ep12l, columns = ['nod1','nod4','nod7','nod10','nod13','nod16','nod19','nod22','nod25','nod28','nod31'])
dfep12l.index = ['win1','win2','win3','win4','win5','win6','win7','win8','win9','win10']

dfep12 = pd.DataFrame(ep12, columns = ['nod1','nod4','nod7','nod10','nod13','nod16','nod19','nod22','nod25','nod28','nod31'])
dfep12.index =['win1','win2','win3','win4','win5','win6','win7','win8','win9','win10']

dfep12l.head()

Unnamed: 0,nod1,nod4,nod7,nod10,nod13,nod16,nod19,nod22,nod25,nod28,nod31
win1,0.031587,0.009877,0.00822,0.009578,0.007997,0.010819,0.017553,0.015656,0.018377,0.020025,0.015462
win2,0.035456,0.013601,0.016287,0.016713,0.016854,0.02042,0.022405,0.020637,0.022344,0.024934,0.024456
win3,0.038258,0.017235,0.016231,0.019685,0.02372,0.021679,0.025248,0.026409,0.02371,0.025964,0.027181
win4,0.03116,0.019602,0.020786,0.022843,0.023672,0.024315,0.025409,0.027528,0.028777,0.027296,0.026633
win5,0.042963,0.024416,0.023524,0.026624,0.024092,0.028687,0.029584,0.030673,0.030187,0.032977,0.033474
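
The nested loops that pick one epoch setting out of the results can also be written as a single NumPy slice. The sketch below uses a small made-up results structure (2 windows, 3 unit counts, 6 epoch settings, filled with placeholder integers, not model scores) and checks that the slice matches the loop.

```python
import numpy as np

n_windows, n_units, n_epochs = 2, 3, 6

# Hypothetical nested list shaped like windows_l: [window][units][epochs]
results = np.arange(n_windows * n_units * n_epochs).reshape(
    n_windows, n_units, n_epochs
).tolist()

# Slice: every window, every unit count, epoch index 4
epoch_slice = np.array(results)[:, :, 4]

# Equivalent explicit loops, as in the notebook cell
loop_slice = [[results[w][n][4] for n in range(n_units)] for w in range(n_windows)]

print(epoch_slice.shape)                     # (2, 3)
print(epoch_slice.tolist() == loop_slice)    # True
```

The same pattern covers the other views: fixing the first axis selects one window size, and fixing the second selects one unit count.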


In [58]:
import hvplot.pandas
dfep12l.hvplot()

In [59]:
dfep12.hvplot()

In [60]:
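# Slice the results at window index 2, which is window size 3;
# the `win4` names are kept because later cells reference them
win4=[]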

win4l=[]

for nod in range(11):
    win4nl=[]
    win4n=[]
    for ep in range(6):
        win4nl.append(windows_l[2][nod][ep])
        win4n.append(windows_a[2][nod][ep])

    win4.append(win4n)

    win4l.append(win4nl)

dfwin4l = pd.DataFrame(win4l, columns = ['10 epoch', '12 epoch','14 epoch','16 epoch','18 epoch','20 epoch']) 
dfwin4l.index = ['nod1','nod4','nod7','nod10','nod13','nod16','nod19','nod22','nod25','nod28','nod31']

dfwin4 = pd.DataFrame(win4, columns = ['10 epoch', '12 epoch','14 epoch','16 epoch','18 epoch','20 epoch'])
dfwin4.index =['nod1','nod4','nod7','nod10','nod13','nod16','nod19','nod22','nod25','nod28','nod31']

dfwin4l.head()

Unnamed: 0,10 epoch,12 epoch,14 epoch,16 epoch,18 epoch,20 epoch
nod1,0.039201,0.043403,0.037366,0.042008,0.038258,0.024419
nod4,0.025939,0.023224,0.020354,0.021596,0.017235,0.023236
nod7,0.026413,0.020262,0.020097,0.020024,0.016231,0.018392
nod10,0.025276,0.019286,0.018847,0.021947,0.019685,0.019359
nod13,0.024319,0.026055,0.020863,0.023653,0.02372,0.019494


In [61]:
dfwin4.hvplot()

In [62]:
dfwin4l.hvplot()

In [63]:
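# Slice the results at unit index 2, which is 7 LSTM units;
# the `nod3` names are kept because later cells reference them
nod3=[]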

nod3l=[]

for win in range(10):
    nod3nl=[]
    nod3n=[]
    for ep in range(6):
        nod3nl.append(windows_l[win][2][ep])
        nod3n.append(windows_a[win][2][ep])

    nod3.append(nod3n)
    nod3l.append(nod3nl)

dfnod3l = pd.DataFrame(nod3l, columns = ['10 epoch', '12 epoch','14 epoch','16 epoch','18 epoch','20 epoch']) 
dfnod3l.index = ['win1','win2','win3','win4','win5','win6','win7','win8','win9','win10']

dfnod3 = pd.DataFrame(nod3, columns = ['10 epoch', '12 epoch','14 epoch','16 epoch','18 epoch','20 epoch']) 
dfnod3.index =['win1','win2','win3','win4','win5','win6','win7','win8','win9','win10']

dfnod3l.head()

Unnamed: 0,10 epoch,12 epoch,14 epoch,16 epoch,18 epoch,20 epoch
win1,0.014556,0.008981,0.009609,0.007897,0.00822,0.009678
win2,0.020726,0.014824,0.015733,0.011628,0.016287,0.014725
win3,0.026413,0.020262,0.020097,0.020024,0.016231,0.018392
win4,0.02621,0.026723,0.0225,0.021959,0.020786,0.023384
win5,0.03689,0.030993,0.025204,0.024718,0.023524,0.022902


In [64]:
dfnod3l.hvplot()

In [65]:
dfnod3.hvplot()