# Pokemon Price Predictor Using Closing Prices Dataset

In this notebook, we will build and train a custom LSTM RNN that uses a 4 year window of Pokemon Boxset closing prices to predict the next month's closing price. 

1. Prepare the data for training and testing
2. Build and train a custom LSTM RNN
3. Evaluate the performance of the model

## Data Preparation

In this section, you will need to prepare the training and testing data for the model. The model will use a rolling 10 day window to predict the 11th day closing price.

You will need to:
1. Use the `window_data` function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing
3. Apply the MinMaxScaler to the X and y values
4. Reshape the X_train and X_test data for the model. Note: The required input format for the LSTM is:

```python
reshape((X_train.shape[0], X_train.shape[1], 1))
```

In [9]:
import numpy as np
import pandas as pd
import hvplot.pandas

In [10]:
# Set the random seed for reproducibility
# Note: This is for the homework solution, but it is good practice to comment this out and run multiple experiments to evaluate your model
from numpy.random import seed
seed(1)
from tensorflow import random
random.set_seed(2)

In [11]:
# Load the historical closing prices for Pokemon Dataset
df = pd.read_csv('pokemon_cardset.csv', index_col="Date", infer_datetime_format=True, parse_dates=True)
df = df.sort_index()
df.head()

Unnamed: 0_level_0,Base Set 1st Ed.,Base Set 1st Ed. Holos,Base Set Shadowless,Base Set Shadowless Holos,Base Set Unlimited,Base Set Unlimited Holos,Jungle 1st Edition,Jungle 1st Ed. Holo,Jungle Unlimited,Jungle Unlimited Holos,...,EX Legend Maker EX Holos Only,EX Holon Phantoms Holos Exc. Reverses,EX Holon Phantoms EX Holos Only,EX Crystal Guardians Holos Exc. Reverses,EX Crystal Guardians EX Holos Only,EX Dragon Frontiers Holos Exc. Reverses,EX Dragon Frontiers EX Holos Only,EX Power Keepers Holos Exc. Reverses,EX Power Keepers EX Holos Only,EX Set Gold Star Holos
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2016-10-01,3788.38,2858.07,369.87,369.87,459.94,454.94,510.21,262.05,51.0,41.0,...,,,,,,,,,,298.45
2016-11-01,6636.07,5558.35,951.71,933.2,544.3,529.29,881.14,479.17,76.45,66.45,...,,,,,,,,,,1658.19
2016-12-01,7437.45,6295.23,1466.5,1397.01,486.49,415.98,1009.54,534.77,98.95,81.95,...,,,,,,,,,,1786.44
2017-01-01,6565.65,5351.51,1487.78,1416.03,540.26,468.75,1062.67,600.18,100.59,83.59,...,,,,,,,,,,1967.91
2017-02-01,6788.13,5605.4,1548.22,1471.97,578.97,510.95,990.79,514.48,140.13,94.09,...,,,,,,,,,,2369.52


In [12]:
# This function accepts the column number for the features (X) and the target (y)
# It chunks the data up with a rolling window of Xt-n to predict Xt
# It returns a numpy array of X any y
def window_data(df, window, feature_col_number, target_col_number):
    X = []
    y = []
    for i in range(len(df) - window - 1):
        features = df.iloc[i:(i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)

In [13]:
# Predict Closing Prices using a 30 month window of previous closing prices
window_size = 30

# Column index 0 is the 'Base Set 1st Ed.' column
# Column index 1 is the `Base Set 1st Ed. Holos` column
feature_column = 0
target_column = 0
X, y = window_data(df, window_size, feature_column, target_column)

In [14]:
# Use 70% of the data for training and the remaindder for testing
split = int(0.7 * len(X))
X_train = X[: split - 1]
X_test = X[split:]
y_train = y[: split -1]
y_test = y[split:]

In [15]:
from sklearn.preprocessing import MinMaxScaler
# Use the MinMaxScaler to scale data between 0 and 0.
scaler = MinMaxScaler()
scaler.fit(X)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

scaler.fit(y)
y_train = scaler.transform(y_train)
y_test = scaler.transform(y_test)

In [16]:
# Reshape the features for the model
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

---

## Build and Train the LSTM RNN

In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

You will need to:
1. Define the model architecture
2. Compile the model
3. Fit the model to the training data

In [17]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [13]:
# Build the LSTM model. 
# The return sequences need to be set to True if you are adding additional LSTM layers, but 
# You don't have to do this for the final layer. 
# Note: The dropouts help prevent overfitting
# Note: The input shape is the number of time steps and the number of indicators
# Note: Batching inputs has a different input shape of Samples/TimeSteps/Features

model = Sequential()

number_units = 30
dropout_fraction = 0.20

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train.shape[1], 1))
    )
model.add(Dropout(dropout_fraction))

# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))

# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))

# Output layer
model.add(Dense(1))

In [14]:
# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [15]:
# Summarize the model
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 10, 30)            3840      
_________________________________________________________________
dropout (Dropout)            (None, 10, 30)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 10, 30)            7320      
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 30)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 30)                7320      
_________________________________________________________________
dropout_2 (Dropout)          (None, 30)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 3

In [16]:
# Train the model
# Use at least 10 epochs
# Do not shuffle the data
# Experiement with the batch size, but a smaller batch size is recommended
epochs = 10
batch_size = 5
model.fit(X_train, y_train, epochs=epochs, shuffle=False, batch_size=batch_size, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x250e1c42c70>

---

## Model Performance

In this section, you will evaluate the model using the test data. 

You will need to:
1. Evaluate the model using the `X_test` and `y_test` data.
2. Use the X_test data to make predictions
3. Create a DataFrame of Real (y_test) vs predicted values. 
4. Plot the Real vs predicted values as a line chart

### Hints
Remember to apply the `inverse_transform` function to the predicted and y_test values to recover the actual closing prices.

In [17]:
# Evaluate the model
model.evaluate(X_test, y_test)



0.08102104812860489

In [18]:
# Make some predictions
predicted = model.predict(X_test)

In [19]:
# Recover the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_test.reshape(-1, 1))

In [20]:
# Create a DataFrame of Real and Predicted values
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
}, index = df.index[-len(real_prices): ]) 
stocks.head()

Unnamed: 0,Real,Predicted
2019-02-20,3924.23999,5690.743652
2019-02-21,3974.050049,5737.081055
2019-02-22,3937.040039,5787.260254
2019-02-23,3983.530029,5857.757812
2019-02-24,4149.089844,5910.138184


In [21]:
# Plot the real vs predicted values as a line chart
stocks.hvplot.line(xlabel="Date",
                  ylabel="Price",
                  title="LSTM FNG Model: Real vs Predicted Values")