# MTG Price Predictor

## About/Goals: 

The idea of this project is to create an ML model that takes a card's data and returns what the value of the card should be, based on previous cards it has analysed. This requires NLP processing of the text box, and uses TensorFlow to build the model.

Cards are evaluated purely on what a person looking at the card for the first time can see: year, text, color, etc. Nothing about format legalities, special flags, or anything else of the sort is considered. Also, price is based on the standard, nonfoil variant only.

## Data Importation and Cleaning

#### All imports cell

In [144]:
import os
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow import keras
from tensorflow.keras.layers import TextVectorization

In [145]:
def save(file):
    file.to_pickle('data/updated_data.pkl')

def load():
    df = pd.read_pickle('data/updated_data.pkl')
    return df # band-aid work-around

In [146]:
df = pd.read_json("data/default_cards_08_05_2025.json")

In [147]:
df.columns

Index(['object', 'id', 'oracle_id', 'multiverse_ids', 'mtgo_id', 'arena_id',
       'tcgplayer_id', 'cardmarket_id', 'name', 'lang', 'released_at', 'uri',
       'scryfall_uri', 'layout', 'highres_image', 'image_status', 'image_uris',
       'mana_cost', 'cmc', 'type_line', 'oracle_text', 'colors',
       'color_identity', 'keywords', 'produced_mana', 'legalities', 'games',
       'reserved', 'game_changer', 'foil', 'nonfoil', 'finishes', 'oversized',
       'promo', 'reprint', 'variation', 'set_id', 'set', 'set_name',
       'set_type', 'set_uri', 'set_search_uri', 'scryfall_set_uri',
       'rulings_uri', 'prints_search_uri', 'collector_number', 'digital',
       'rarity', 'card_back_id', 'artist', 'artist_ids', 'illustration_id',
       'border_color', 'frame', 'full_art', 'textless', 'booster',
       'story_spotlight', 'prices', 'related_uris', 'purchase_uris',
       'mtgo_foil_id', 'power', 'toughness', 'flavor_text', 'edhrec_rank',
       'penny_rank', 'all_parts', 'promo_types

Many of these columns can be dropped; only the ones useful for prediction are needed.

### Column Cleaning

In [148]:
len(df)

108955

In [149]:
df.value_counts("variation")
df.value_counts("oversized")

oversized
False    108229
True        726
Name: count, dtype: int64

In [150]:
df = df[(df["variation"]==False) & (df["reprint"]==False) & (df["oversized"]==False) & (df["promo"]==False) & (df["full_art"]==False) & (df["textless"] == False) & (df["content_warning"]!=True)]

In [151]:
len(df) # Dropped about 67.6k entries (108,955 -> 41,387)

41387

In [152]:
doubles = df["card_faces"].dropna()
doubles.iloc[0] ## this is gonna be tough to work with

[{'object': 'card_face',
  'name': "Obyra's Attendants",
  'mana_cost': '{4}{U}',
  'type_line': 'Creature — Faerie Wizard',
  'oracle_text': 'Flying',
  'power': '3',
  'toughness': '4',
  'flavor_text': "Obyra's devoted servants shrieked as their sleeping mistress slashed at them, unseeing.",
  'artist': 'Andreas Zafiratos',
  'artist_id': 'e2f13a9a-57c5-40de-81d4-3b0723899cdf',
  'illustration_id': 'd1ea5321-62e2-4894-a79f-03b792daf2c8'},
 {'object': 'card_face',
  'name': 'Desperate Parry',
  'mana_cost': '{1}{U}',
  'type_line': 'Instant — Adventure',
  'oracle_text': 'Target creature gets -4/-0 until end of turn. (Then exile this card. You may cast the creature later from exile.)',
  'artist': 'Andreas Zafiratos',
  'artist_id': 'e2f13a9a-57c5-40de-81d4-3b0723899cdf'}]

For the first iteration of the model, multi-faced cards are removed.

In [153]:
df = df[df["card_faces"].isna()]

In [154]:
def is_legal(x):
    # True if the card is legal in at least one format
    return 'legal' in x.values()

In [155]:
df["playable"] = df["legalities"].apply(is_legal)

In [156]:
df = df[df["playable"] == True] # remove unplayable cards

In [157]:
unneeded = ['object', 'id', 'oracle_id', 'multiverse_ids', 'mtgo_id', 'arena_id',
       'tcgplayer_id', 'cardmarket_id', #'name',
         'lang', 'uri',
       'scryfall_uri', 'layout', 'highres_image', 'image_status', 'image_uris', 'legalities', 'games',
       'reserved', 'game_changer', 'finishes', 'oversized',
       'promo', 'reprint', 'variation', 'set_id', 'set', 'set_name',
       'set_type', 'set_uri', 'set_search_uri', 'scryfall_set_uri',
       'rulings_uri', 'prints_search_uri', 'collector_number', 'digital', 'card_back_id', 'artist', 'artist_ids', 'illustration_id',
       'border_color', 'frame', 'full_art', 'textless', 'booster',
       'story_spotlight', 'related_uris', 'purchase_uris',
       'mtgo_foil_id', 'flavor_text', 'edhrec_rank',
       'penny_rank', 'all_parts', 'promo_types', 'security_stamp', 'preview', 'watermark', 'frame_effects', 'loyalty',
       'printed_name', 'tcgplayer_etched_id', 'flavor_name',
       'attraction_lights', 'color_indicator', 'printed_type_line',
       'printed_text', 'variation_of', 'life_modifier', 'hand_modifier',
       'content_warning', 'defense', 'card_faces', 'foil', 'nonfoil'
       , 'playable', 'color_identity']
df = df.drop(unneeded, axis=1)

Now that I've cleaned out the cards and columns that aren't needed, I need to figure out the best way to transform this data into something that the ML model can actually use.

For instance, the "type_line" column will have to be split up into supertypes, main types, and subtypes, probably using categorical encoding.
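
As a rough sketch of that split (an illustration only: `split_type_line`, `SUPERTYPES`, and `CARD_TYPES` are names made up for this example, and the supertype list is a partial one chosen as an assumption):

```python
# Hypothetical helper: split a Scryfall-style type line into its parts.
SUPERTYPES = {"Basic", "Legendary", "Snow", "World"}  # partial list, assumed
CARD_TYPES = {"Artifact", "Battle", "Creature", "Enchantment",
              "Instant", "Land", "Planeswalker", "Sorcery"}

def split_type_line(type_line):
    # Left of the dash: supertypes and main types. Right of it: subtypes.
    left, _, right = type_line.partition("—")
    words = left.split()
    return {
        "supertypes": [w for w in words if w in SUPERTYPES],
        "main_types": [w for w in words if w in CARD_TYPES],
        "subtypes": right.split(),
    }

parts = split_type_line("Legendary Creature — Human Wizard")
print(parts)
```

Each of the three lists can then be encoded on its own, which keeps the number of categories far smaller than treating every distinct type line as one label.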

### Type Labeling + Encoding

In [158]:
df["type_line"].describe() # has 3120 unique types currently

count       35098
unique       3120
top       Instant
freq         3592
Name: type_line, dtype: object

In [159]:
"""
df["creature_type"] = df["type_line"].apply(lambda x: x[10:] if "Creature" in x else "NaN")
df["planeswalker_type"] = df["type_line"].apply(lambda x: x[24:] if "Planeswalker" in x else "NaN")
df["kindred_type"] = df["type_line"].apply(lambda x: x.split()[-1] if "Kindred" in x or "Tribal" in x else "NaN")"""
# Found a better way!


In [160]:
def filter_subtype(x):
    if "—" in x:
        ind = x.index("—")
        types = x[ind+1:].split()
        return types
    else:
        return []

In [161]:
def filter_maintype(x):
    # Collect every main card type that appears in the type line
    types = ["Artifact", "Land", "Battle", "Creature", "Enchantment", "Planeswalker", "Instant", "Sorcery"]
    return [t for t in types if t in x]

In [162]:
df = df[~df['type_line'].str.contains('Basic')] # remove basic lands

In [163]:
df["legendary"] = df["type_line"].apply(lambda x: 1 if "Legendary" in x else 0)
df["subtype"] = df["type_line"].apply(filter_subtype)
df["main_type"]=df["type_line"].apply(filter_maintype)

In [164]:
df["price"] = df["prices"].str["usd"].astype(float) 
df = df.drop("prices", axis=1)

In [165]:
df = df.dropna(subset=["price"])
df = df[df["main_type"].map(len)>0]  # had to filter out bad cards with other types such as conspiracies and stickers

In [166]:
le = LabelEncoder()
le.fit(["Artifact", "Land", "Battle", "Creature", "Enchantment", "Planeswalker", "Instant", "Sorcery"])

In [167]:
df["main_type"] = df["main_type"].apply(lambda x: le.transform(x)) 

In [168]:
le2 = LabelEncoder()

In [169]:
df["subtype"].values

array([list(['Sliver']), list(['Kor', 'Soldier']),
       list(['Siren', 'Pirate']), ..., list([]),
       list(['Faerie', 'Rogue']), list(['Vampire', 'Soldier'])],
      dtype=object)

In [170]:
subtypes = []
for unique in df["subtype"].values:
    if unique != []:
        for val in unique:
            if val not in subtypes:
                subtypes.append(val)

le2.fit(subtypes)


In [171]:
df["subtype"] = df["subtype"].apply(lambda x: le2.transform(x)) # slow, probably better way to do this 
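
One possible faster route, sketched here as an assumption rather than something measured on this dataset, is to build a plain dict once and look labels up in it instead of calling `transform` for every row. `build_vocab` and `encode_rows` are hypothetical names for this example.

```python
# Sketch: encode lists of labels with a dict lookup.
def build_vocab(rows):
    # Sorted order mirrors LabelEncoder, which numbers classes alphabetically.
    labels = sorted({label for row in rows for label in row})
    return {label: idx for idx, label in enumerate(labels)}

def encode_rows(rows, vocab):
    # One dict lookup per label; empty lists stay empty.
    return [[vocab[label] for label in row] for row in rows]

rows = [["Sliver"], ["Kor", "Soldier"], [], ["Vampire", "Soldier"]]
vocab = build_vocab(rows)
encoded = encode_rows(rows, vocab)
print(encoded)  # [[1], [0, 2], [], [3, 2]]
```

On a DataFrame column the same lookup could be applied with `df["subtype"].apply(lambda row: [vocab[v] for v in row])`.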

In [172]:
df = df.drop("type_line", axis=1)

### Date -> Year

In [173]:
df["year"] = df["released_at"].apply(lambda x: x.year)
df = df.drop("released_at", axis=1)

### Mana Cost Breakdown

- Number of Pips
- Is X spell?

In [174]:
df["is_x"] = df["mana_cost"].apply(lambda x: 1 if r"{X}" in x else 0)
#df.loc[df["is_x"] == 1]

In [175]:
#print("{1}{W/R}{G}{G}".replace("{", "").replace("}", " ").split())  ->  df

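def pip_counter(x):
    # Split "{1}{W/R}{G}{G}" into ["1", "W/R", "G", "G"]
    symbols = x.replace("{", "").replace("}", " ").split()
    count = 0
    for symbol in symbols:
        # generic (numeric) costs and X do not count as pips
        if not symbol.isdigit() and symbol != "X":
            count += 1

    return count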

In [176]:
df["pip_count"] = df["mana_cost"].apply(pip_counter)
df = df.drop("mana_cost", axis=1)
#df.sort_values(by="pip_count", ascending=False)
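
As a worked example of the pip rule, here is an equivalent regex-based version (a sketch; `count_pips` is a hypothetical name and not the function used in the notebook). Hybrid symbols such as `{W/R}` count as one pip, while numeric costs and `{X}` count as zero.

```python
import re

def count_pips(mana_cost):
    # Pull out the contents of each {...} symbol.
    symbols = re.findall(r"\{([^}]+)\}", mana_cost)
    # Generic numeric costs and X are not counted as pips.
    return sum(1 for s in symbols if not s.isdigit() and s != "X")

print(count_pips("{1}{W/R}{G}{G}"))  # 3
print(count_pips("{X}{R}{R}"))       # 2
print(count_pips("{2}"))             # 0
```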

### Oracle Text Breakdown -> NLP ? Or can try a parsing method to turn text into columns 

- activated ability?
- etb effect?

In [177]:
df["oracle_text"] = df["oracle_text"].apply(lambda x: x.lower().replace("\n", ". "))

#### Main Phrases

Using a count vectorizer to find the most common phrases (4- to 7-word n-grams).

In [178]:
vectorizer = CountVectorizer(ngram_range=(4, 7), lowercase=True, stop_words=None)

In [179]:
X = vectorizer.fit_transform(df["oracle_text"])

In [180]:
sum_words = X.sum(axis=0)
word_freq = [(word, sum_words[0, idx]) for word, idx in vectorizer.vocabulary_.items()]
word_freq = sorted(word_freq, key=lambda x: x[1], reverse=True)

In [181]:
common_phrases = pd.DataFrame(word_freq, columns=['phrase', 'count'])
common_phrases.iloc[:10]

Unnamed: 0,phrase,count
0,until end of turn,5990
1,at the beginning of,3261
2,when this creature enters,3049
3,gets until end of,2131
4,gets until end of turn,2130
5,the beginning of your,1869
6,at the beginning of your,1814
7,card from your graveyard,1593
8,creature gets until end,1570
9,creature gets until end of,1570


In [182]:
PHRASES = { # starter phrases
    r"when.*enters": "etb",
    r"until end of turn": "eot",
    r"beginning of .* upkeep": "b_o_u",
    r"search .* library": "tutor",
    r"without paying": "free",
    r"whenever .* attacks": "a_t",
    r"deals combat damage": "c_d_t",
    r"look at the top": "s_s_t",
    r"return.*graveyard.*battlefield": "reanimate",
    r"when.*(this|a).*dies": "o_d_t",
    r"when.*(this|a).*leaves.": "l_b_t",
}

In [183]:
def canonicalize_text(text):
    # Replace every known phrase pattern with its short token
    for pattern, token in PHRASES.items():
        text = re.sub(pattern, token, text)
    return text

In [184]:
df["oracle_text"] = df["oracle_text"].apply(canonicalize_text)
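
One thing to watch with patterns like `when.*enters`: `.*` is greedy, and since newlines were replaced with ". " earlier, it can match across sentence boundaries. The sketch below uses a made-up card text to show the difference; the bounded pattern is only one possible alternative, and whether this affects the real data was not checked here.

```python
import re

# Made-up oracle text with two separate "enters" triggers.
text = ("when this creature enters, draw a card. flying. "
        "whenever another creature enters, you gain 1 life.")

# Greedy: matches from the first "when" to the last "enters".
greedy = re.sub(r"when.*enters", "etb", text)

# Bounded: lazy match that is not allowed to cross a period.
bounded = re.sub(r"when[^.]*?enters", "etb", text)

print(greedy)   # etb, you gain 1 life.
print(bounded)  # etb, draw a card. flying. etb, you gain 1 life.
```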

In [258]:
text_vectorizer = TextVectorization(
    max_tokens=5000, # increased vocab size
    output_mode='int',
    ngrams = (2,6), # a tuple selects exactly these n-gram sizes (2 and 6), not a range
    encoding='utf-8'
)

In [259]:
text_vectorizer.adapt(df["oracle_text"])

### Rarity Encoding

In [187]:
rare_encoder = LabelEncoder()

In [188]:
df["rarity"].value_counts()

rarity
rare        11445
common       9878
uncommon     8967
mythic       1987
special         2
Name: count, dtype: int64

In [189]:
df["rarity"] = df["rarity"].apply(lambda x: "common" if x == "special" else x) # the two special cards have common rarity on scryfall

In [190]:
df["rarity"] = rare_encoder.fit_transform(df["rarity"])

### Keyword Encoding

In [191]:
keywrd_encoder = LabelEncoder()

In [192]:
keywords = [] # taking code from earlier
for unique in df["keywords"].values:
    if unique != []:
        for val in unique:
            if val not in keywords:
                keywords.append(val)

keywrd_encoder.fit(keywords)


In [193]:
len(keywords)

604

In [194]:
df["keywords"] = df["keywords"].apply(lambda x: keywrd_encoder.transform(x))

### Color Identity Encoding (one more time!)

Definitely could've written a for loop over each column I wanted to encode, but oh well, it's a little late for that.

In [195]:
df

Unnamed: 0,name,cmc,oracle_text,colors,keywords,produced_mana,rarity,power,toughness,legendary,subtype,main_type,price,year,is_x,pip_count
1,Fury Sliver,6.0,all sliver creatures have double strike.,[R],[],,3,3,3,0,[310],[2],0.30,2006,0,1
2,Kor Outfitter,2.0,"etb, you may attach target equipment you contr...",[W],[],,0,2,2,0,"[183, 315]",[2],0.14,2009,0,2
4,Siren Lookout,3.0,"flying. etb, it explores. (reveal the top card...",[U],"[230, 198]",,0,1,2,0,"[305, 259]",[2],0.08,2017,0,1
7,Surge of Brilliance,2.0,paradox — draw a card for each spell you've ca...,[U],"[390, 238]",,3,,,0,[],[4],0.20,2023,0,1
9,Venerable Knight,1.0,"o_d_t, put a +1/+1 counter on target knight yo...",[W],[],,3,2,1,0,"[157, 181]",[2],0.14,2019,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
108944,Morkrut Banshee,5.0,"morbid — etb, if a creature died this turn, ta...",[B],[366],,3,4,4,0,[325],[2],0.09,2011,0,2
108947,Deeproot Historian,4.0,merfolk and druid cards in your graveyard have...,[G],[],,2,3,3,0,"[204, 97]",[2],0.15,2023,0,1
108949,Aggressive Biomancy,2.0,create x tokens that are copies of target crea...,"[G, U]",[215],,2,,,0,[],[7],0.19,2024,1,2
108952,Faerie Bladecrafter,3.0,"flying. o_d_t, each opponent loses x life and ...",[B],[230],,2,2,2,0,"[114, 280]",[2],1.29,2023,0,1


In [196]:
c_i = LabelEncoder()

In [197]:
c_i.fit(["W", "G", "R", "B", "U"])

In [198]:
c_i.transform(["W", "U"])

array([4, 3])

In [199]:
df["colors"] = df["colors"].apply(lambda x: c_i.transform(x))

### Produced Mana 

Changing this to a binary value: 1 if the card produces mana, 0 if it doesn't.

In [200]:
df["produced_mana"] = df["produced_mana"].replace(pd.NA, 0)

In [201]:
df["produced_mana"] = df["produced_mana"].apply(lambda x: 1 if x != 0 else x)

### Final Step: Replace Power/Toughness NaN with -1

In [202]:
df["power"] = pd.to_numeric(df["power"], errors="coerce").fillna(-1)
df["toughness"] = pd.to_numeric(df["toughness"], errors="coerce").fillna(-1)

### Final Data Filtering

In [203]:
df = df.loc[df["price"] <= 500]

In [204]:
df = df.loc[df["year"] >= 2000]

In [205]:
df.sort_values(by="price", ascending=False)

Unnamed: 0,name,cmc,oracle_text,colors,keywords,produced_mana,rarity,power,toughness,legendary,subtype,main_type,price,year,is_x,pip_count
45968,Mox Opal,0.0,metalcraft — {t}: add one mana of any color. a...,[],[356],1,1,-1.0,-1.0,1,[],[0],153.41,2010,0,0
11050,"Liliana, Dreadhorde General",6.0,"o_d_t, draw a card.. +1: create a 2/2 black zo...",[0],[],0,1,-1.0,-1.0,1,[194],[6],133.33,2019,0,2
45216,Chrome Mox,0.0,"imprint — etb, you may exile a nonartifact, no...",[],[293],1,2,-1.0,-1.0,0,[],[0],96.92,2003,0,0
108758,The Great Henge,9.0,"this spell costs {x} less to cast, where x is ...",[1],[],1,1,-1.0,-1.0,1,[],[0],92.30,2019,0,2
93570,The One Ring,4.0,"indestructible. etb, if you cast it, you gain ...",[],[296],0,1,-1.0,-1.0,1,[],[0],79.86,2023,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76273,Constrictor Sage,5.0,"etb, tap target creature an opponent controls ...",[3],[],0,3,4.0,4.0,0,"[314, 378]",[2],0.01,2025,0,1
13449,Flummoxed Cyclops,4.0,reach. whenever two or more creatures your opp...,[2],[435],0,0,4.0,4.0,0,[73],[2],0.01,2020,0,1
105586,Gruul Scrapper,4.0,"etb, if {r} was spent to cast it, it gains has...",[1],[],0,0,3.0,2.0,0,"[157, 38]",[2],0.01,2006,0,1
57252,Shepherding Spirits,6.0,"flying. plainscycling {2} ({2}, discard this c...",[4],"[230, 402, 318, 570, 129]",0,0,4.0,5.0,0,[325],[2],0.01,2024,0,2


In [303]:
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,name,cmc,oracle_text,colors,keywords,produced_mana,rarity,power,toughness,legendary,subtype,main_type,price,year,is_x,pip_count
0,Fury Sliver,6.0,all sliver creatures have double strike.,[2],[],0,3,3.0,3.0,0,[310],[2],0.30,2006,0,1
1,Kor Outfitter,2.0,"etb, you may attach target equipment you contr...",[4],[],0,0,2.0,2.0,0,"[183, 315]",[2],0.14,2009,0,2
2,Siren Lookout,3.0,"flying. etb, it explores. (reveal the top card...",[3],"[230, 198]",0,0,1.0,2.0,0,"[305, 259]",[2],0.08,2017,0,1
3,Surge of Brilliance,2.0,paradox — draw a card for each spell you've ca...,[3],"[390, 238]",0,3,-1.0,-1.0,0,[],[4],0.20,2023,0,1
4,Venerable Knight,1.0,"o_d_t, put a +1/+1 counter on target knight yo...",[4],[],0,3,2.0,1.0,0,"[157, 181]",[2],0.14,2019,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28071,Morkrut Banshee,5.0,"morbid — etb, if a creature died this turn, ta...",[0],[366],0,3,4.0,4.0,0,[325],[2],0.09,2011,0,2
28072,Deeproot Historian,4.0,merfolk and druid cards in your graveyard have...,[1],[],0,2,3.0,3.0,0,"[204, 97]",[2],0.15,2023,0,1
28073,Aggressive Biomancy,2.0,create x tokens that are copies of target crea...,"[1, 3]",[215],0,2,-1.0,-1.0,0,[],[7],0.19,2024,1,2
28074,Faerie Bladecrafter,3.0,"flying. o_d_t, each opponent loses x life and ...",[0],[230],0,2,2.0,2.0,0,"[114, 280]",[2],1.29,2023,0,1


## Model Building

### Numerical Data

In [206]:
numerical = df.drop(["name", "oracle_text", "colors", "keywords", "subtype", "main_type", "price"], axis=1)

In [207]:
numerical.columns

Index(['cmc', 'produced_mana', 'rarity', 'power', 'toughness', 'legendary',
       'year', 'is_x', 'pip_count'],
      dtype='object')

In [208]:
scaler = StandardScaler()
scaled_data = scaler.fit_transform(numerical)

In [209]:
number_inputs = keras.Input(shape=(9,), name="numerical")
normalized = keras.layers.Normalization()(number_inputs)

### Array Data

The code below pads each array input to a fixed length, tells Keras the shape of that input, and creates an embedding layer to vectorize it.

`input_dim` is the number of unique ids for that feature.
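
One caveat worth sketching: `pad_sequences` fills with 0, and `LabelEncoder` also uses 0 as a real class id (the `colors` encoder maps "B" to 0), so a padded slot and a real value can look identical. A common workaround, shown here in plain Python under that assumption, is to shift real ids up by one and reserve 0 for padding. `shift_and_pad` and `PAD_ID` are hypothetical names.

```python
PAD_ID = 0  # reserved for padding in this sketch

def shift_and_pad(ids, length):
    # Shift real ids up by one so that 0 is free to mean "no value",
    # and keep at most the last `length` entries.
    shifted = [i + 1 for i in ids][-length:]
    # Left-pad, which is also the default side for Keras pad_sequences.
    return [PAD_ID] * (length - len(shifted)) + shifted

print(shift_and_pad([0, 3], 5))  # [0, 0, 0, 1, 4]
print(shift_and_pad([], 5))      # [0, 0, 0, 0, 0]
```

With this scheme the matching `Embedding` would need `input_dim` one larger than the number of unique ids, and passing `mask_zero=True` would let the pooling layer skip the padded positions.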

In [210]:
colors_padded = keras.preprocessing.sequence.pad_sequences(df['colors'])
colors_input = keras.Input(shape=(5,), dtype="int32", name="colors")
colors_embed = keras.layers.Embedding(input_dim=5, output_dim=4)(colors_input)
colors_pooled = keras.layers.GlobalAveragePooling1D()(colors_embed)

keywords_padded = keras.preprocessing.sequence.pad_sequences(df['keywords'])
keywords_input = keras.Input(shape=(10,), dtype="int32", name="keywords")
keywords_embed = keras.layers.Embedding(input_dim=604, output_dim=8)(keywords_input)
keywords_pooled = keras.layers.GlobalAveragePooling1D()(keywords_embed)

subtypes_padded = keras.preprocessing.sequence.pad_sequences(df['subtype'])
subtypes_input = keras.Input(shape=(4,), dtype="int32", name="subtypes")
subtypes_embed = keras.layers.Embedding(input_dim=393, output_dim=8)(subtypes_input)
subtypes_pooled = keras.layers.GlobalAveragePooling1D()(subtypes_embed)

main_type_padded = keras.preprocessing.sequence.pad_sequences(df['main_type'])
main_type_input = keras.Input(shape=(2,), dtype="int32", name="main_type")
main_type_embed = keras.layers.Embedding(input_dim=8, output_dim=4)(main_type_input)
main_type_pooled = keras.layers.GlobalAveragePooling1D()(main_type_embed)


### Tokenized Data

In [269]:
text_input = keras.Input(shape=(), dtype=tf.string, name="oracle_text")
oracle_vector = text_vectorizer(text_input)
oracle_embed = keras.layers.Embedding(input_dim=5000, output_dim=64)(oracle_vector)
oracle_pooled = keras.layers.GlobalAveragePooling1D()(oracle_embed)

In [270]:
#oracle_array = text_vectorizer(df["oracle_text"].values).numpy()

### Model Compiling

In [271]:
all_inputs = [
    number_inputs,
    text_input,
    colors_input,
    keywords_input,
    subtypes_input,
    main_type_input
]

all_features = keras.layers.concatenate([
    normalized,
    oracle_pooled,
    colors_pooled,
    keywords_pooled,
    subtypes_pooled,
    main_type_pooled
])

In [272]:
x = keras.layers.Dense(128, activation="relu")(all_features)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu")(x)
output = keras.layers.Dense(1, name="price")(x)

In [273]:
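# Note: this cell stacks a second BatchNormalization + Dense(32) block on top of the previous cell's `x` and redefines `output`
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu")(x)
output = keras.layers.Dense(1, name="price")(x)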

In [274]:
model = keras.Model(inputs=all_inputs, outputs=output)
model.compile(optimizer="adam", loss=keras.losses.MeanSquaredError(), metrics=[keras.metrics.MeanAbsoluteError()])

In [275]:
# Stops on training MAE; monitor "val_mean_absolute_error" to stop on validation MAE instead
early_stopping_cb = keras.callbacks.EarlyStopping(patience=25, start_from_epoch=100, monitor="mean_absolute_error", mode='min')
#model_checkpoint_cb = keras.callbacks.ModelCheckpoint(f"models\\best_model", save_best_only=True)
run_index = 1 # increment every time you train the model
run_logdir = os.path.join(os.curdir, "logs", "run_{:03d}".format(run_index))
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
callbacks = [early_stopping_cb, 
             #model_checkpoint_cb,
               tensorboard_cb]

In [276]:
model.fit(
    {
        "numerical": scaled_data,
        "oracle_text": df["oracle_text"].values,
        "keywords": keywords_padded,
        "colors": colors_padded,
        "subtypes": subtypes_padded,
        "main_type": main_type_padded
    },
    y=df["price"],
    epochs=200, # long runtime
    batch_size=32,
    validation_split=0.3,
    callbacks=callbacks
)

Epoch 1/200
615/615 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 15.8369 - mean_absolute_error: 1.4765 - val_loss: 15.2136 - val_mean_absolute_error: 1.2526
Epoch 2/200
615/615 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 13.9673 - mean_absolute_error: 1.3678 - val_loss: 13.4691 - val_mean_absolute_error: 1.3660
Epoch 3/200
615/615 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 12.3213 - mean_absolute_error: 1.2603 - val_loss: 12.8788 - val_mean_absolute_error: 1.1394
Epoch 4/200
615/615 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 10.5052 - mean_absolute_error: 1.1641 - val_loss: 13.1343 - val_mean_absolute_error: 1.1621
Epoch 5/200
615/615 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 9.6859 - mean_absolute_error: 1.1404 - val_loss: 11.7872 - val_mean_absolute_error: 1.3163
Epoch 6/200
615/615 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x27bb984ba70>

#### Initial Results:

MAE of 0.36 USD (about 36 cents), MSE loss of 0.81. Will try to optimize!
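
For reference, the model is compiled with mean squared error as the loss and mean absolute error as the metric, both measured against the USD price. The toy calculation below (made-up prices, not taken from the dataset) shows how a single expensive card can dominate the squared loss while the absolute error stays much smaller.

```python
def mean_absolute_error(y_true, y_pred):
    # Average of |actual - predicted|
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # Average of (actual - predicted)^2
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up prices in USD: three bulk cards and one expensive card.
actual = [0.10, 0.25, 0.50, 20.00]
predicted = [0.20, 0.25, 0.40, 16.00]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(round(mae, 3), round(mse, 3))
```

Here the one 4-dollar miss accounts for nearly all of the MSE, which is one reason the two numbers can tell different stories on a price distribution like this.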

In [277]:
model.save("models\\best_model.keras")

## Model Improvement

In [278]:
text_vectorizer.save_assets("models\\text_vectorizer")

In [279]:
model2 = keras.models.load_model(
    "models\\best_model.keras",
    custom_objects={
        "TextVectorization": TextVectorization,
    }
)

In [308]:
df.loc[df["name"].str.contains("Liliana")]

Unnamed: 0,name,cmc,oracle_text,colors,keywords,produced_mana,rarity,power,toughness,legendary,subtype,main_type,price,year,is_x,pip_count
1515,Liliana's Caress,2.0,"whenever an opponent discards a card, that pla...",[0],[],0,3,-1.0,-1.0,0,[],[3],6.53,2010,0,1
1726,Liliana's Defeat,1.0,destroy target black creature or black planesw...,[0],[],0,3,-1.0,-1.0,0,[],[7],0.09,2017,0,1
2509,Liliana Vess,5.0,+1: target player discards a card.. −2: tutor ...,[0],[],0,2,-1.0,-1.0,1,[194],[6],11.57,2007,0,2
2788,Liliana's Steward,1.0,"{t}, sacrifice this creature: target opponent ...",[0],[],0,0,1.0,2.0,0,[391],[2],0.03,2020,0,1
2826,"Liliana, Dreadhorde General",6.0,"o_d_t, draw a card.. +1: create a 2/2 black zo...",[0],[],0,1,-1.0,-1.0,1,[194],[6],133.33,2019,0,2
3201,Liliana's Shade,4.0,"etb, you may tutor for a swamp card, reveal it...",[0],[],0,0,1.0,1.0,0,[299],[2],0.07,2012,0,2
3519,Oath of Liliana,3.0,"etb, each opponent sacrifices a creature of th...",[0],[],0,2,-1.0,-1.0,1,[],[3],0.34,2016,0,1
4450,Liliana's Influence,6.0,put a -1/-1 counter on each creature you don't...,[0],[],0,2,-1.0,-1.0,0,[],[7],0.3,2017,0,2
4877,Liliana of the Dark Realms,4.0,"+1: tutor for a swamp card, reveal it, put it ...",[0],[],1,1,-1.0,-1.0,1,[194],[6],6.65,2012,0,2
5575,Liliana's Specter,3.0,"flying. etb, each opponent discards a card.",[0],[230],0,0,2.0,1.0,0,[319],[2],0.2,2010,0,2


In [309]:
df.iloc[2509]

name                                                  Liliana Vess
cmc                                                            5.0
oracle_text      +1: target player discards a card.. −2: tutor ...
colors                                                         [0]
keywords                                                        []
produced_mana                                                    0
rarity                                                           2
power                                                         -1.0
toughness                                                     -1.0
legendary                                                        1
subtype                                                      [194]
main_type                                                      [6]
price                                                        11.57
year                                                          2007
is_x                                                          

In [310]:
model2.predict({
    "numerical": scaled_data[2509:2510],
    "oracle_text": np.array([df["oracle_text"].values[2509]], dtype=object),
    "keywords": np.array([keywords_padded[2509]]),
    "colors": np.array([colors_padded[2509]]),
    "subtypes": np.array([subtypes_padded[2509]]),
    "main_type": np.array([main_type_padded[2509]])
})

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step


array([[14.1174755]], dtype=float32)
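
The single-card prediction above repeats the same index for every input. A small helper could do the slicing once; this is only a sketch, `single_row_inputs` is a hypothetical name, and it assumes every feature is stored as an array-like that supports slicing.

```python
def single_row_inputs(features, i):
    # values[i:i + 1] keeps a leading batch dimension of size 1.
    return {name: values[i:i + 1] for name, values in features.items()}

# Toy stand-ins for the real arrays (hypothetical values).
features = {
    "numerical": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "colors": [[0, 0, 1], [0, 2, 3], [0, 0, 0]],
}
row = single_row_inputs(features, 1)
print(row)  # {'numerical': [[0.3, 0.4]], 'colors': [[0, 2, 3]]}
```

With the notebook's NumPy arrays gathered in one dict, the call would look like `model2.predict(single_row_inputs(all_arrays, 2509))`, where `all_arrays` is an assumed name for that dict.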