# Pokemon Price Predictor Using Closing Prices Dataset

In this notebook, we will build and train a custom LSTM RNN that uses a rolling window over roughly four years of monthly Pokemon card set closing prices to predict the next month's closing price.

1. Prepare the data for training and testing
2. Build and train a custom LSTM RNN
3. Evaluate the performance of the model

## Data Preparation

In this section, you will prepare the training and testing data for the model. The model uses a rolling 2-month window of closing prices to predict the closing price of the following month.

1. Use the `window_data` function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing
3. Apply the MinMaxScaler to the X and y values
4. Reshape the X_train and X_test data for the model. 

In [26]:
import numpy as np
import pandas as pd
import hvplot.pandas

In [27]:
# Set the random seed for reproducibility
from numpy.random import seed
seed(1)
from tensorflow import random
random.set_seed(2)

In [28]:
# Load the historical closing prices for Pokemon Dataset
df = pd.read_csv('pokemon_cardset.csv', index_col="Month", parse_dates=True)
df = df.sort_index()
df.head()

Month,Base Set 1st Ed.,Base Set 1st Ed. Holos,Base Set Shadowless,Base Set Shadowless Holos,Base Set Unlimited,Base Set Unlimited Holos,Jungle 1st Edition,Jungle 1st Ed. Holos,Jungle Unlimited,Jungle Unlimited Holos,...,EX Legend Maker EX Holos Only,EX Holon Phantoms Holos Exc. Reverses,EX Holon Phantoms EX Holos Only,EX Crystal Guardians Holos Exc. Reverses,EX Crystal Guardians EX Holos Only,EX Dragon Frontiers Holos Exc. Reverses,EX Dragon Frontiers EX Holos Only,EX Power Keepers Holos Exc. Reverses,EX Power Keepers EX Holos Only,EX Set Gold Star Holos
2016-10-01,3788.38,2858.07,369.87,369.87,459.94,454.94,510.21,262.05,51.0,41.0,...,,30.99,,,,,,,,298.45
2016-11-01,6636.07,5558.35,951.71,933.2,544.3,529.29,881.14,479.17,76.45,66.45,...,,40.98,,,,18.5,,11.5,11.5,1658.19
2016-12-01,7437.45,6295.23,1466.5,1397.01,486.49,415.98,1009.54,534.77,98.95,81.95,...,,50.47,,38.93,,37.5,19.0,13.01,13.01,1786.44
2017-01-01,6565.65,5351.51,1487.78,1416.03,540.26,468.75,1062.67,600.18,100.59,83.59,...,,50.47,,97.43,34.5,,,,,1967.91
2017-02-01,6788.13,5605.4,1548.22,1471.97,578.97,510.95,990.79,514.48,140.13,94.09,...,,50.47,,,,,,,,2369.52


#### Data Cleaning

Before continuing, check whether the DataFrame contains any `null` or missing values. If it does, fill each missing value with the previous price in the series.
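
As a minimal sketch of this cleaning step, the toy DataFrame below (hypothetical column names and prices, not taken from `pokemon_cardset.csv`) counts missing cells and then forward-fills them. It also shows a limitation worth keeping in mind: a missing value with no earlier price in its column has nothing to be filled from, so it stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, not taken from pokemon_cardset.csv
toy = pd.DataFrame(
    {
        "Set A": [np.nan, 10.0, np.nan, 12.0],
        "Set B": [5.0, np.nan, np.nan, 8.0],
    },
    index=pd.to_datetime(["2016-10-01", "2016-11-01", "2016-12-01", "2017-01-01"]),
)

missing_before = int(toy.isnull().sum().sum())    # total number of NaN cells
filled = toy.ffill()                              # carry the last known price forward
missing_after = int(filled.isnull().sum().sum())  # leading NaNs cannot be filled

print(missing_before, missing_after)
print(filled)
```

Columns that start with missing values therefore still show `NaN` in their first rows after the fill.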

In [29]:
# Count the missing values across the whole DataFrame
df.isnull().sum().sum()

241

In [30]:
# Fill missing values with the previous value in each column (forward fill)
df = df.ffill()
df.head()

Month,Base Set 1st Ed.,Base Set 1st Ed. Holos,Base Set Shadowless,Base Set Shadowless Holos,Base Set Unlimited,Base Set Unlimited Holos,Jungle 1st Edition,Jungle 1st Ed. Holos,Jungle Unlimited,Jungle Unlimited Holos,...,EX Legend Maker EX Holos Only,EX Holon Phantoms Holos Exc. Reverses,EX Holon Phantoms EX Holos Only,EX Crystal Guardians Holos Exc. Reverses,EX Crystal Guardians EX Holos Only,EX Dragon Frontiers Holos Exc. Reverses,EX Dragon Frontiers EX Holos Only,EX Power Keepers Holos Exc. Reverses,EX Power Keepers EX Holos Only,EX Set Gold Star Holos
2016-10-01,3788.38,2858.07,369.87,369.87,459.94,454.94,510.21,262.05,51.0,41.0,...,,30.99,,,,,,,,298.45
2016-11-01,6636.07,5558.35,951.71,933.2,544.3,529.29,881.14,479.17,76.45,66.45,...,,40.98,,,,18.5,,11.5,11.5,1658.19
2016-12-01,7437.45,6295.23,1466.5,1397.01,486.49,415.98,1009.54,534.77,98.95,81.95,...,,50.47,,38.93,,37.5,19.0,13.01,13.01,1786.44
2017-01-01,6565.65,5351.51,1487.78,1416.03,540.26,468.75,1062.67,600.18,100.59,83.59,...,,50.47,,97.43,34.5,37.5,19.0,13.01,13.01,1967.91
2017-02-01,6788.13,5605.4,1548.22,1471.97,578.97,510.95,990.79,514.48,140.13,94.09,...,,50.47,,97.43,34.5,37.5,19.0,13.01,13.01,2369.52


#### Create the Features `X` and Target `y` Data

Use the `window_data()` function below to create the feature set `X` and the target vector `y`. Use a card set's closing-price column as both the feature and the target column; this allows the model to predict that card set's closing prices.

In [31]:
# This function accepts the column numbers for the features (X) and the target (y).
# It chunks the data with a rolling window of the previous `window` values to predict the value that follows them.
# It returns NumPy arrays for X and y.
def window_data(df, window, feature_col_number, target_col_number):
    X = []
    y = []
    for i in range(len(df) - window - 1):
        features = df.iloc[i:(i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)
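
To make the windowing concrete, here is a small sketch on a toy series. `make_windows` is a hypothetical helper written for this illustration, not part of the notebook; it works on a plain list rather than a DataFrame column and also returns the row position of each target, which helps when lining predictions up with dates later. Note one difference: the notebook's `window_data` loops over `range(len(df) - window - 1)` and so stops one row earlier than this sketch, which uses every available target.

```python
import numpy as np

def make_windows(values, window):
    """Hypothetical helper: pair each run of `window` values with the value that follows it."""
    values = np.asarray(values, dtype=float)
    X, y, target_pos = [], [], []
    # range(len - window) uses every available target, including the last value
    for i in range(len(values) - window):
        X.append(values[i:i + window])   # the `window` values before the target
        y.append(values[i + window])     # the value to predict
        target_pos.append(i + window)    # row position of the target in the series
    return np.array(X), np.array(y).reshape(-1, 1), target_pos

prices = [10.0, 11.0, 13.0, 12.0, 15.0]  # toy series, not from the dataset
X_toy, y_toy, pos = make_windows(prices, window=2)

print(X_toy)          # three windows of two values each
print(y_toy.ravel())  # the value following each window
print(pos)            # row positions of those targets
```

With a window of 2, the first target sits at row position 2, so the first two rows of the series never appear as targets.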

In [32]:
# Predict Closing Prices using a 2 month window of previous closing prices
window_size = 2

# Column indices are zero-based: 0 is `Base Set 1st Ed.`, 1 is `Base Set 1st Ed. Holos`, and so on.
# Here, column 19 serves as both the feature and the target.
feature_column = 19
target_column = 19
X, y = window_data(df, window_size, feature_column, target_column)

In [33]:
X

array([[ 205.53,  244.17],
       [ 244.17,  317.68],
       [ 317.68,  313.99],
       [ 313.99,  362.  ],
       [ 362.  ,  398.63],
       [ 398.63,  466.7 ],
       [ 466.7 ,  549.  ],
       [ 549.  ,  606.8 ],
       [ 606.8 ,  579.48],
       [ 579.48,  580.9 ],
       [ 580.9 ,  536.5 ],
       [ 536.5 ,  575.37],
       [ 575.37,  697.85],
       [ 697.85,  669.38],
       [ 669.38,  695.41],
       [ 695.41,  645.15],
       [ 645.15,  731.23],
       [ 731.23,  768.16],
       [ 768.16,  624.42],
       [ 624.42,  649.72],
       [ 649.72,  729.58],
       [ 729.58,  738.22],
       [ 738.22,  721.63],
       [ 721.63,  745.15],
       [ 745.15,  771.71],
       [ 771.71,  742.1 ],
       [ 742.1 ,  690.14],
       [ 690.14,  587.54],
       [ 587.54,  656.1 ],
       [ 656.1 ,  628.83],
       [ 628.83,  643.68],
       [ 643.68,  712.91],
       [ 712.91,  708.22],
       [ 708.22,  810.47],
       [ 810.47,  872.52],
       [ 872.52,  868.3 ],
       [ 868.3 ,  897.99],
 

In [34]:
y

array([[ 317.68],
       [ 313.99],
       [ 362.  ],
       [ 398.63],
       [ 466.7 ],
       [ 549.  ],
       [ 606.8 ],
       [ 579.48],
       [ 580.9 ],
       [ 536.5 ],
       [ 575.37],
       [ 697.85],
       [ 669.38],
       [ 695.41],
       [ 645.15],
       [ 731.23],
       [ 768.16],
       [ 624.42],
       [ 649.72],
       [ 729.58],
       [ 738.22],
       [ 721.63],
       [ 745.15],
       [ 771.71],
       [ 742.1 ],
       [ 690.14],
       [ 587.54],
       [ 656.1 ],
       [ 628.83],
       [ 643.68],
       [ 712.91],
       [ 708.22],
       [ 810.47],
       [ 872.52],
       [ 868.3 ],
       [ 897.99],
       [ 946.55],
       [1205.18],
       [1323.33],
       [1301.2 ],
       [2562.22],
       [3399.21],
       [3808.82],
       [3848.84],
       [3735.05],
       [3707.94],
       [5013.06],
       [5576.47],
       [4653.9 ]])

In [35]:
# Use 70% of the data for training and the remainder for testing
split = int(0.7 * len(X))
X_train = X[: split - 1]
X_test = X[split:]
y_train = y[: split -1]
y_test = y[split:]
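
The slicing above can be checked with simple arithmetic. The sketch below uses stand-in sample ids instead of real prices and assumes 49 samples, the number of rows in the `y` output shown earlier. It suggests that slicing the training set with `split - 1` leaves one sample in neither set.

```python
import numpy as np

n_samples = 49                       # number of windows in the y output above
sample_ids = np.arange(n_samples)    # stand-in ids, not real prices

split = int(0.7 * n_samples)
train_ids = sample_ids[: split - 1]  # same slicing as the notebook
test_ids = sample_ids[split:]

# Samples that ended up in neither the training nor the testing set
unused = sorted(set(sample_ids.tolist()) - set(train_ids.tolist()) - set(test_ids.tolist()))
print(split, len(train_ids), len(test_ids), unused)
```

If that gap is not intended, slicing the training data with `[:split]` would keep every sample in exactly one of the two sets.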

In [36]:
from sklearn.preprocessing import MinMaxScaler
# Use the MinMaxScaler to scale data between 0 and 1.
scaler = MinMaxScaler()
scaler.fit(X)
X_full = scaler.transform(X)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Refit the same scaler on the targets so that inverse_transform later recovers prices
scaler.fit(y)
y_full = scaler.transform(y)
y_train = scaler.transform(y_train)
y_test = scaler.transform(y_test)
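
The cell above fits the scaler on the full `X` and `y`, so the test period influences the scaling. A common alternative, sketched below as an assumption rather than as the notebook's method, is to compute the minimum and range from the training rows only. The helper names are hypothetical, and plain NumPy stands in for `MinMaxScaler` by applying the same `(value - min) / (max - min)` formula that the default `(0, 1)` range uses.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and range computed from the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    return lo, span

def scale(values, lo, span):
    return (values - lo) / span

def unscale(scaled, lo, span):
    return scaled * span + lo

# Toy target column, not real prices
y_toy = np.array([[100.0], [150.0], [200.0], [400.0]])
y_train_toy, y_test_toy = y_toy[:3], y_toy[3:]

lo, span = fit_min_max(y_train_toy)
y_train_scaled = scale(y_train_toy, lo, span)
y_test_scaled = scale(y_test_toy, lo, span)    # may fall outside [0, 1]
recovered = unscale(y_test_scaled, lo, span)   # back to the original units

print(y_train_scaled.ravel(), y_test_scaled.ravel(), recovered.ravel())
```

The trade-off is that test values above the training maximum scale to values greater than 1, which is expected when prices trend upward.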

In [37]:
# Reshape the features for the model
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
X_full = X_full.reshape((X_full.shape[0], X_full.shape[1], 1))
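
As a quick illustration of this reshape, the sketch below uses a toy array (made-up values) with three samples and two time steps. The reshape adds a trailing axis of size 1 so that each time step carries a single feature, giving the `(samples, time steps, features)` layout.

```python
import numpy as np

# 3 samples, 2 time steps each (toy values)
X_2d = np.array([[0.1, 0.2],
                 [0.2, 0.3],
                 [0.3, 0.4]])

# Add a trailing feature axis: (samples, time steps) -> (samples, time steps, 1)
X_3d = X_2d.reshape((X_2d.shape[0], X_2d.shape[1], 1))

print(X_2d.shape, "->", X_3d.shape)
```

The values themselves are unchanged; only the array's shape differs.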

---

## Build and Train the LSTM RNN

In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

You will need to:
1. Define the model architecture
2. Compile the model
3. Fit the model to the training data

In [38]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [39]:
# Build the LSTM model.
# Set return_sequences=True on every LSTM layer that feeds another LSTM layer;
# the final LSTM layer does not need it.
# Note: The dropout layers help prevent overfitting.
# Note: input_shape is (number of time steps, number of features).
# Note: The batched input itself has shape (samples, time steps, features).

model = Sequential()

number_units = 50
dropout_fraction = 0.20

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train.shape[1], 1),
))
model.add(Dropout(dropout_fraction))

# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))

# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))

# Output layer
model.add(Dense(1))

In [40]:
# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [41]:
# Summarize the model
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 2, 50)             10400     
_________________________________________________________________
dropout_3 (Dropout)          (None, 2, 50)             0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 2, 50)             20200     
_________________________________________________________________
dropout_4 (Dropout)          (None, 2, 50)             0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 50)                20200     
_________________________________________________________________
dropout_5 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                
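
The LSTM parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout with bias terms: four gates, each with input weights, recurrent weights, and a bias vector. The function names are hypothetical helpers for this illustration.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input for each unit, plus one bias per unit
    return input_dim * units + units

first_lstm = lstm_params(units=50, input_dim=1)    # fed by 1 feature per time step
second_lstm = lstm_params(units=50, input_dim=50)  # fed by the 50 outputs of the previous layer
third_lstm = lstm_params(units=50, input_dim=50)
output_layer = dense_params(units=1, input_dim=50)

print(first_lstm, second_lstm, third_lstm, output_layer)
```

The first three values agree with the `Param #` column for `lstm_3`, `lstm_4`, and `lstm_5`; dropout layers add no trainable parameters.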

In [42]:
# Train the model
# Use at least 10 epochs
# Do not shuffle the data
# Experiment with the batch size, but a smaller batch size is recommended
epochs = 100
batch_size = 5
model.fit(X_train, y_train, epochs=epochs, shuffle=False, batch_size=batch_size, verbose=1)

Epoch 1/100
...
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x233b9618a00>

---

## Model Performance

In this section, you will evaluate the model using the test data. 

You will need to:
1. Evaluate the model using the `X_test` and `y_test` data.
2. Use the `X_test` data to make predictions.
3. Create a DataFrame of real (`y_test`) vs. predicted values.
4. Plot the real vs. predicted values as a line chart.

### Hints
Remember to apply the `inverse_transform` function to the predicted and y_test values to recover the actual closing prices.

In [43]:
# Evaluate the model
model.evaluate(X_test, y_test)



0.06633897870779037
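
Because the model was compiled with `mean_squared_error` and trained on scaled targets, the value above is a mean squared error in scaled units. As a rough sketch, it can be converted to a root mean squared error in price units, assuming the scaler was fit on `y` with the default `(0, 1)` range and reading the minimum and maximum from the `y` array printed earlier.

```python
import math

mse_scaled = 0.06633897870779037   # value returned by model.evaluate above
y_min, y_max = 313.99, 5576.47     # smallest and largest targets in the y array above

rmse_scaled = math.sqrt(mse_scaled)
# Scaling divides each error by (max - min), so multiply to undo it
rmse_price = rmse_scaled * (y_max - y_min)

print(round(rmse_scaled, 4), round(rmse_price, 2))
```

Under these assumptions the test error is on the order of 1,355 price units. That is plausible given that the test period contains the sharp price increase at the end of the series.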

In [44]:
# Make predictions for the full series (X_full), covering both the training and testing periods
predicted = model.predict(X_full)

In [45]:
# Recover the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_full.reshape(-1, 1))

In [46]:
# Create a DataFrame of Real and Predicted values
pokemon_prices = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
}, index=df.index[-len(real_prices):])
pokemon_prices.head()

Month,Real,Predicted
2017-01-01,317.68,345.064728
2017-02-01,313.99,381.60141
2017-03-01,362.0,422.895111
2017-04-01,398.63,430.140167
2017-05-01,466.7,464.609558


In [47]:
len(pokemon_prices)

49

In [48]:
pokemon_prices

Month,Real,Predicted
2017-01-01,317.68,345.064728
2017-02-01,313.99,381.60141
2017-03-01,362.0,422.895111
2017-04-01,398.63,430.140167
2017-05-01,466.7,464.609558
2017-06-01,549.0,498.609772
2017-07-01,606.8,553.029236
2017-08-01,579.48,610.51593
2017-09-01,580.9,637.774231
2017-10-01,536.5,622.701904


In [49]:
cardset_model = "EX Set Gold Star Holos"
cardset_title = f"LSTM Cardset Model {cardset_model}: Real vs Predicted Values"
cardset_title

'LSTM Cardset Model EX Set Gold Star Holos: Real vs Predicted Values'

In [50]:
# Plot the real vs predicted values as a line chart
pokemon_prices.hvplot.line(xlabel="Month",
                           ylabel="Price",
                           title=cardset_title)