# Data Continuation

## Modeling Preparation

We'll start by importing most of the same libraries and
helper modules as last time.

In [96]:
import gc
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import cv2

#statsmodels
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf, adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

from sklearn.model_selection import train_test_split

from nltk.corpus import stopwords

from _code.card_selection import (
    card_sampler,
    plot_card_trends,
    card_imager,
    synthesize_names,
)

from _code.cleaner import preprocess

from _code.viz import showImagesHorizontally, word_plot

from _code import card_selection as FUNC_TESTING

from IPython.display import Image

## Additional Data Engineering

The card information is now in a form that's much easier
to work with. A couple more steps will put each card's
abilities into a format that we can use more directly.

We'll start by bringing in the card data we made in the
last step.

In [46]:
cards = pd.read_parquet('./data/simplified_cards.parquet')

Each card's abilities are all stored together in a single
oracle text field, one per line, even though they are
distinct abilities on the card. We need to split them out
here.

In [79]:
processed_abilities = preprocess(cards['oracle_text'])
fully_processed_abilities = [abilities.split('\n') for abilities in processed_abilities]

In [80]:
cards['abilities_list'] = fully_processed_abilities
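
To make the split concrete, here is a minimal sketch on
made-up data. The `toy_preprocess` function is a
hypothetical stand-in, since the real `preprocess` lives in
`_code.cleaner` and does more (the output above suggests
stop-word removal and lemmatizing). The only thing the
sketch relies on is that newlines survive preprocessing, so
that `split('\n')` yields one string per ability.

```python
import pandas as pd

# Toy stand-in for the card data; names and oracle text are made up.
toy = pd.DataFrame({
    "name": ["Toy Flyer", "Toy Vanilla"],
    "oracle_text": ["Flying\nWhen CARDNAME dies, draw a card.", "Trample"],
})


def toy_preprocess(texts):
    """Hypothetical stand-in for `preprocess`.

    Lowercases the text and drops punctuation, but keeps whitespace
    (including newlines) plus the '+' and '/' used in stat changes.
    """
    return texts.str.lower().str.replace(r"[^\w\s+/]", "", regex=True)


# One list per card, one string per ability line.
toy["abilities_list"] = [
    text.split("\n") for text in toy_preprocess(toy["oracle_text"])
]

print(toy["abilities_list"].tolist())
```
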

In [83]:
cards.head()[['name','oracle_text','abilities_list','median_foil','median_normal']]

| | name | oracle_text | abilities_list | median_foil | median_normal |
|---|---|---|---|---|---|
| 0 | Fury Sliver | All Sliver creatures have double strike. | [sliver creature double strike] | 3.95 | 0.38 |
| 1 | Kor Outfitter | When CARDNAME enters the battlefield, you may ... | [cardname enters battlefield may attach target... | 7.78 | 0.24 |
| 2 | Siren Lookout | Flying\nWhen CARDNAME enters the battlefield, ... | [fly , cardname enters battlefield explores ] | 0.23 | 0.06 |
| 3 | Web | Enchant creature (Target a creature as you cas... | [enchant creature , enchant creature get +0/... | | 0.64 |
| 4 | Venerable Knight | When CARDNAME dies, put a +1/+1 counter on tar... | [cardname die put +1/+1 counter target knight ... | 0.28 | 0.095 |


The number of abilities a card has might also matter to
our model, so we'll add that as a feature next.

In [84]:
cards['n_abilities'] = cards['abilities_list'].map(len)
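
One caveat with `map(len)`: it counts every list entry,
including empty ones. In Python, `''.split('\n')` returns
`['']`, so a card with no text would get a count of 1
rather than 0, depending on what `preprocess` returns for
it. The sketch below, on made-up lists, shows one possible
way to count only non-blank entries. The helper name
`count_abilities` is hypothetical, and whether blank
entries actually occur in this data set is an assumption
worth checking.

```python
import pandas as pd

# Made-up ability lists; the third one has hypothetical blank fragments.
abilities = pd.Series([
    ["fly ", "cardname enters battlefield explores "],
    ["sliver creature double strike"],
    ["enchant creature ", "  ", ""],
])


def count_abilities(fragments):
    """Count list entries that contain something other than whitespace."""
    return sum(1 for fragment in fragments if fragment.strip())


raw_counts = abilities.map(len)                # counts blank entries too
clean_counts = abilities.map(count_abilities)  # ignores blank entries

print(raw_counts.tolist(), clean_counts.tolist())
```
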

In [95]:
cards.head()

Unnamed: 0,id,oracle_id,tcgplayer_id,name,released_at,image_uris,mana_cost,cmc,type_line,oracle_text,...,promo_types,loyalty,produced_mana,variation_of,prices_normal,prices_foil,median_normal,median_foil,abilities_list,n_abilities
0,0000579f-7b35-4ed3-b44c-db2a538066fe,44623693-51d6-49ad-8cd7-140505caf02f,14240.0,Fury Sliver,2006-10-06,{'art_crop': 'https://cards.scryfall.io/art_cr...,{5}{R},6.0,Creature — Sliver,All Sliver creatures have double strike.,...,,,,,"{'2023-01-27': 0.37, '2023-01-28': 0.37, '2023...","{'2023-01-27': 3.95, '2023-01-28': 3.95, '2023...",0.38,3.95,[sliver creature double strike],1
1,00006596-1166-4a79-8443-ca9f82e6db4e,8ae3562f-28b7-4462-96ed-be0cf7052ccc,33347.0,Kor Outfitter,2009-10-02,{'art_crop': 'https://cards.scryfall.io/art_cr...,{W}{W},2.0,Creature — Kor Soldier,"When CARDNAME enters the battlefield, you may ...",...,,,,,"{'2023-01-27': 0.11, '2023-01-28': 0.11, '2023...","{'2023-01-27': 7.5, '2023-01-28': 7.5, '2023-0...",0.24,7.78,[cardname enters battlefield may attach target...,1
2,0000cd57-91fe-411f-b798-646e965eec37,9f0d82ae-38bf-45d8-8cda-982b6ead1d72,145764.0,Siren Lookout,2017-09-29,{'art_crop': 'https://cards.scryfall.io/art_cr...,{2}{U},3.0,Creature — Siren Pirate,"Flying\nWhen CARDNAME enters the battlefield, ...",...,,,,,"{'2023-01-27': 0.04, '2023-01-28': 0.04, '2023...","{'2023-01-27': 0.26, '2023-01-28': 0.26, '2023...",0.06,0.23,"[fly , cardname enters battlefield explores ]",2
3,00012bd8-ed68-4978-a22d-f450c8a6e048,5aa12aff-db3c-4be5-822b-3afdf536b33e,1623.0,Web,1994-04-01,{'art_crop': 'https://cards.scryfall.io/art_cr...,{G},1.0,Enchantment — Aura,Enchant creature (Target a creature as you cas...,...,,,,,"{'2023-01-27': 0.65, '2023-01-28': 0.65, '2023...",,0.64,,"[enchant creature , enchant creature get +0/...",2
4,0001f1ef-b957-4a55-b47f-14839cdbab6f,ef027846-be81-4959-a6b5-56bd01b1e68a,198861.0,Venerable Knight,2019-10-04,{'art_crop': 'https://cards.scryfall.io/art_cr...,{W},1.0,Creature — Human Knight,"When CARDNAME dies, put a +1/+1 counter on tar...",...,,,,,"{'2023-01-27': 0.09, '2023-01-28': 0.09, '2023...","{'2023-01-27': 0.29, '2023-01-28': 0.29, '2023...",0.095,0.28,[cardname die put +1/+1 counter target knight ...,1


Next, we'll split the card data into training and test
sets. First, we need to keep set proportions the same
across both splits, since the set column stands in for a
lot of different features to some degree. A stratified
split can't handle a set that contains only one card,
like some promo releases.

In [115]:
# Count cards per set and flag the sets that contain a single card
sets = cards['set'].value_counts()
promo_sets_list = sets[sets <= 1].index
unique_set_cards = cards[cards['set'].isin(promo_sets_list)][['name','set']]
display(unique_set_cards, len(unique_set_cards))

| | name | set |
|---|---|---|
| 1887 | Rabbit Battery | pl23 |
| 3409 | Helm of Kaldra | p5dn |
| 5663 | Underworld Dreams | p2hg |
| 6009 | Flooded Strand | pnat |
| 6739 | Wasteland | mpr |
| 7470 | Rukh Egg | p8ed |
| 10066 | Silent Specter | pons |
| 10729 | Powder Keg | p04 |
| 13959 | False Prophet | puds |
| 14279 | Jace Beleren | pbook |


34

We have 34 cards that are the only printing in their set.
These are likely promotional cards from before promos
were grouped under a shared promotional umbrella, so
we'll drop them from the data set.
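
To see why single-card sets have to go before a stratified
split, here is a small sketch on made-up data. scikit-learn's
`train_test_split` raises a `ValueError` when any class
passed to `stratify` has fewer than two members. The toy
frame and its set codes (`aaa`, `bbb`, `promo`) are
hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: two 4-card sets plus one single-card "promo" set.
toy = pd.DataFrame({
    "name": [f"card_{i}" for i in range(9)],
    "set": ["aaa"] * 4 + ["bbb"] * 4 + ["promo"],
})

# Stratifying with a one-member class fails.
try:
    train_test_split(toy, stratify=toy["set"], random_state=13)
    singleton_error = None
except ValueError as err:
    singleton_error = str(err)

# Dropping the single-card set lets the same split go through.
counts = toy["set"].value_counts()
kept = toy[toy["set"].isin(counts[counts >= 2].index)]
train, test = train_test_split(kept, stratify=kept["set"], random_state=13)

print(singleton_error)
print(len(train), len(test))
```
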

In [117]:
reduced_cards = cards[~cards['set'].isin(promo_sets_list)]
display(reduced_cards.head(),reduced_cards.shape)

Unnamed: 0,id,oracle_id,tcgplayer_id,name,released_at,image_uris,mana_cost,cmc,type_line,oracle_text,...,promo_types,loyalty,produced_mana,variation_of,prices_normal,prices_foil,median_normal,median_foil,abilities_list,n_abilities
0,0000579f-7b35-4ed3-b44c-db2a538066fe,44623693-51d6-49ad-8cd7-140505caf02f,14240.0,Fury Sliver,2006-10-06,{'art_crop': 'https://cards.scryfall.io/art_cr...,{5}{R},6.0,Creature — Sliver,All Sliver creatures have double strike.,...,,,,,"{'2023-01-27': 0.37, '2023-01-28': 0.37, '2023...","{'2023-01-27': 3.95, '2023-01-28': 3.95, '2023...",0.38,3.95,[sliver creature double strike],1
1,00006596-1166-4a79-8443-ca9f82e6db4e,8ae3562f-28b7-4462-96ed-be0cf7052ccc,33347.0,Kor Outfitter,2009-10-02,{'art_crop': 'https://cards.scryfall.io/art_cr...,{W}{W},2.0,Creature — Kor Soldier,"When CARDNAME enters the battlefield, you may ...",...,,,,,"{'2023-01-27': 0.11, '2023-01-28': 0.11, '2023...","{'2023-01-27': 7.5, '2023-01-28': 7.5, '2023-0...",0.24,7.78,[cardname enters battlefield may attach target...,1
2,0000cd57-91fe-411f-b798-646e965eec37,9f0d82ae-38bf-45d8-8cda-982b6ead1d72,145764.0,Siren Lookout,2017-09-29,{'art_crop': 'https://cards.scryfall.io/art_cr...,{2}{U},3.0,Creature — Siren Pirate,"Flying\nWhen CARDNAME enters the battlefield, ...",...,,,,,"{'2023-01-27': 0.04, '2023-01-28': 0.04, '2023...","{'2023-01-27': 0.26, '2023-01-28': 0.26, '2023...",0.06,0.23,"[fly , cardname enters battlefield explores ]",2
3,00012bd8-ed68-4978-a22d-f450c8a6e048,5aa12aff-db3c-4be5-822b-3afdf536b33e,1623.0,Web,1994-04-01,{'art_crop': 'https://cards.scryfall.io/art_cr...,{G},1.0,Enchantment — Aura,Enchant creature (Target a creature as you cas...,...,,,,,"{'2023-01-27': 0.65, '2023-01-28': 0.65, '2023...",,0.64,,"[enchant creature , enchant creature get +0/...",2
4,0001f1ef-b957-4a55-b47f-14839cdbab6f,ef027846-be81-4959-a6b5-56bd01b1e68a,198861.0,Venerable Knight,2019-10-04,{'art_crop': 'https://cards.scryfall.io/art_cr...,{W},1.0,Creature — Human Knight,"When CARDNAME dies, put a +1/+1 counter on tar...",...,,,,,"{'2023-01-27': 0.09, '2023-01-28': 0.09, '2023...","{'2023-01-27': 0.29, '2023-01-28': 0.29, '2023...",0.095,0.28,[cardname die put +1/+1 counter target knight ...,1


(63403, 40)

Clocking in at 63,403 cards, we're ready to do the
actual data split.

In [118]:
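# Features: everything except the raw price histories and the median-price targets
X = reduced_cards.drop(columns=['prices_normal','prices_foil','median_normal','median_foil'])
# Targets: median normal and foil prices
y = reduced_cards[['median_normal','median_foil']]

# Stratify on set so train and test keep the same set proportions
# (test_size is left at scikit-learn's default of 0.25)
X_train, X_test, y_train, y_test = \
    train_test_split(
        X,y,stratify=X['set'],
        random_state=13
    )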

...and just to verify that the two splits have roughly
the same set distribution:

In [122]:
X_train['set'].value_counts(normalize=True)[:5],\
    X_test['set'].value_counts(normalize=True)[:5]

(set
 mb1      0.026666
 plist    0.016676
 clb      0.014195
 sld      0.013522
 j22      0.013165
 Name: proportion, dtype: float64,
 set
 mb1      0.026686
 plist    0.016655
 clb      0.014195
 sld      0.013564
 j22      0.013185
 Name: proportion, dtype: float64)
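
Eyeballing the top five sets works, but the check can also
cover every set at once. The sketch below is one possible
approach, not part of the original notebook: the helper
`max_proportion_gap` is hypothetical and returns the
largest absolute difference in set share between two
splits.

```python
import pandas as pd


def max_proportion_gap(train_labels, test_labels):
    """Largest absolute difference in label share between two splits."""
    train_share = train_labels.value_counts(normalize=True)
    test_share = test_labels.value_counts(normalize=True)
    # Align on the union of labels; a label missing from one side counts as 0.
    gap = train_share.sub(test_share, fill_value=0).abs()
    return gap.max()


# Made-up labels: identical proportions in both splits.
balanced_gap = max_proportion_gap(
    pd.Series(["mb1"] * 6 + ["clb"] * 3),
    pd.Series(["mb1"] * 2 + ["clb"] * 1),
)

# Made-up labels: 75/25 in one split, 50/50 in the other.
skewed_gap = max_proportion_gap(
    pd.Series(["mb1"] * 3 + ["clb"] * 1),
    pd.Series(["mb1", "clb"]),
)

print(balanced_gap, skewed_gap)
```

On the real splits this would be called as
`max_proportion_gap(X_train['set'], X_test['set'])`; a
value near zero means every set is represented about
equally in both.
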

## =======================================================

## EXPLORATORY OPTIONS  

## =======================================================