# Magic: the Gathering Recommender System
___

##### Problem Statement:  
I will use data on Magic: the Gathering cards to build a content-based recommender system that suggests similar cards in order to improve card selection during the deck building process.

##### Outline:  
1. Gathering Data  
    a. The data can be gathered from Scryfall's bulk data section, which provides every card in a single JSON file
2. Cleaning Data  
    a. There is a lot of unnecessary data that I can drop  
    b. Extract the nested json objects
3. EDA
4. Recommender System  
    a. Content-Based Recommender
    b. Cosine similarity
5. Stretch Goals  
    a. Keep a running tally and rating system for a user-based collaborative recommender
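
To make step 4 concrete, here is a minimal sketch of cosine similarity over bag-of-words vectors, using only the standard library. The card names and rules text are made up for illustration, and the helper names (`tokenize`, `cosine_similarity`, `recommend`) are hypothetical. The real recommender may weight or vectorize features differently.

```python
import math
from collections import Counter


def tokenize(text: str) -> Counter:
    """Turn rules text into a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy rules text, invented for this example (not real card data)
cards = {
    "Card A": "flying when this creature enters draw a card",
    "Card B": "flying when this creature dies draw a card",
    "Card C": "trample haste",
}
vectors = {name: tokenize(text) for name, text in cards.items()}


def recommend(name: str, k: int = 2) -> list:
    """Return the k most similar other cards as (name, score) pairs."""
    scores = [
        (other, cosine_similarity(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(recommend("Card A"))
```

Cards that share most of their wording score close to 1, while cards with no words in common score 0, which is the behavior a content-based recommender relies on.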

##### Risks and Assumptions:  
One risk is that the data arrives as nested JSON objects, which will need to be flattened into a tabular format before I can use them.
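
As a rough illustration of how nested objects can be flattened, the sketch below applies `pd.json_normalize` to two hypothetical records shaped loosely like card objects. The records and their values are invented for the example. Only the idea of a nested `legalities` object comes from the data.

```python
import pandas as pd

# Two hypothetical records with a nested object (values are made up)
records = [
    {"name": "Card A", "cmc": 2.0,
     "legalities": {"standard": "legal", "modern": "legal"}},
    {"name": "Card B", "cmc": 5.0,
     "legalities": {"standard": "not_legal", "modern": "legal"}},
]

# Each nested key becomes its own column, prefixed with the parent key
flat = pd.json_normalize(records, sep="_")
print(flat.columns.tolist())
```

Flattening this way gives one scalar column per nested key, which is easier to filter and encode than a column of dictionaries.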

##### Data Sources:  
[Scryfall Bulk Data](https://scryfall.com/docs/api/bulk-data)  
[Scryfall Default Cards](https://archive.scryfall.com/json/scryfall-default-cards.json)

## 01 - Cleaning
___

### Imports

In [1]:
import pandas as pd

In [2]:
df = pd.read_json('../Data/scryfall-default-cards.json')

In [3]:
df.head()

Unnamed: 0,object,id,oracle_id,multiverse_ids,tcgplayer_id,name,lang,released_at,uri,scryfall_uri,...,all_parts,mtgo_id,variation_of,color_indicator,printed_name,printed_type_line,printed_text,mtgo_foil_id,life_modifier,hand_modifier
0,card,dbcdbf7a-9294-47ad-9f93-c16b78c7463a,cd6250ae-9079-4a62-8a70-0d94fbac21bc,[],200607.0,Earthshaker Giant,en,2019-11-15,https://api.scryfall.com/cards/dbcdbf7a-9294-4...,https://scryfall.com/card/gn2/5/earthshaker-gi...,...,,,,,,,,,,
1,card,acb3ce9b-ee4f-410a-8db3-e87aeb0a4444,ab0dfae5-b9d4-417b-8a0d-2525ae3a73b9,[],200606.0,Fiendish Duo,en,2019-11-15,https://api.scryfall.com/cards/acb3ce9b-ee4f-4...,https://scryfall.com/card/gn2/4/fiendish-duo?u...,...,,,,,,,,,,
2,card,17b2ed72-d0f0-4d8d-bb5e-dce08d157466,7264d3b3-fd46-4ea3-a85d-f0b068c331ad,[],200605.0,Calculating Lich,en,2019-11-15,https://api.scryfall.com/cards/17b2ed72-d0f0-4...,https://scryfall.com/card/gn2/3/calculating-li...,...,,,,,,,,,,
3,card,0faa9eea-fbf1-41f7-9def-1ec3d5134a53,e6284fb3-7cf8-4730-b156-10085b70b0e8,[],200604.0,Sphinx of Enlightenment,en,2019-11-15,https://api.scryfall.com/cards/0faa9eea-fbf1-4...,https://scryfall.com/card/gn2/2/sphinx-of-enli...,...,,,,,,,,,,
4,card,ecbeac44-9392-4522-8ff5-87079386bd0a,43296f8b-58d9-446e-a538-1c4921552c41,[],200603.0,Highcliff Felidar,en,2019-11-15,https://api.scryfall.com/cards/ecbeac44-9392-4...,https://scryfall.com/card/gn2/1/highcliff-feli...,...,,,,,,,,,,


In [4]:
df.shape

(48536, 71)

In [5]:
df.columns

Index(['object', 'id', 'oracle_id', 'multiverse_ids', 'tcgplayer_id', 'name',
       'lang', 'released_at', 'uri', 'scryfall_uri', 'layout', 'highres_image',
       'image_uris', 'mana_cost', 'cmc', 'type_line', 'oracle_text', 'power',
       'toughness', 'colors', 'color_identity', 'legalities', 'games',
       'reserved', 'foil', 'nonfoil', 'oversized', 'promo', 'reprint',
       'variation', 'set', 'set_name', 'set_type', 'set_uri', 'set_search_uri',
       'scryfall_set_uri', 'rulings_uri', 'prints_search_uri',
       'collector_number', 'digital', 'rarity', 'flavor_text', 'card_back_id',
       'artist', 'artist_ids', 'illustration_id', 'border_color', 'frame',
       'full_art', 'textless', 'booster', 'story_spotlight', 'related_uris',
       'watermark', 'frame_effects', 'card_faces', 'promo_types',
       'edhrec_rank', 'loyalty', 'preview', 'arena_id', 'all_parts', 'mtgo_id',
       'variation_of', 'color_indicator', 'printed_name', 'printed_type_line',
       'printed_text', 

In [6]:
df['object'].value_counts()

card    48536
Name: object, dtype: int64

___
### Drop unneeded columns

In [7]:
unneeded = ['id', 'oracle_id', 'multiverse_ids', 'tcgplayer_id', 'uri', 'scryfall_uri', 'image_uris', 
            'highres_image', 'games', 'set_uri', 'set_search_uri',  'scryfall_set_uri', 'rulings_uri', 
            'prints_search_uri', 'collector_number', 'card_back_id', 'artist_ids', 'illustration_id', 
            'story_spotlight', 'related_uris', 'preview', 'arena_id', 'all_parts', 'mtgo_id', 'variation_of',
            'color_indicator', 'mtgo_foil_id', 'life_modifier', 'hand_modifier', 'promo_types', 'frame_effects',
            'watermark', 'printed_name', 'printed_type_line', 'printed_text']
df = df.drop(columns=unneeded)

In [8]:
df.head()

Unnamed: 0,object,name,lang,released_at,layout,mana_cost,cmc,type_line,oracle_text,power,...,flavor_text,artist,border_color,frame,full_art,textless,booster,card_faces,edhrec_rank,loyalty
0,card,Earthshaker Giant,en,2019-11-15,normal,{4}{G}{G},6.0,Creature — Giant Druid,Trample\nWhen Earthshaker Giant enters the bat...,6,...,"""Come, my wild children. Let's give the interl...",Milivoj Ćeran,black,2015,False,False,False,,,
1,card,Fiendish Duo,en,2019-11-15,normal,{4}{R}{R},6.0,Creature — Devil,First strike\nIf a source would deal damage to...,5,...,"Half the size, double the mayhem.",Lucas Graciano,black,2015,False,False,False,,,
2,card,Calculating Lich,en,2019-11-15,normal,{4}{B}{B},6.0,Creature — Zombie Wizard,Menace\nWhenever a creature attacks one of you...,5,...,"""We share a common enemy. Does that not make u...",Antonio José Manzanedo,black,2015,False,False,False,,,
3,card,Sphinx of Enlightenment,en,2019-11-15,normal,{4}{U}{U},6.0,Creature — Sphinx,Flying\nWhen Sphinx of Enlightenment enters th...,5,...,"""I would be a fool if I taught you everything ...",Johan Grenier,black,2015,False,False,False,,,
4,card,Highcliff Felidar,en,2019-11-15,normal,{5}{W}{W},7.0,Creature — Cat Beast,Vigilance\nWhen Highcliff Felidar enters the b...,5,...,"Once the felidar has marked you as prey, there...",Kimonas Theodossiou,black,2015,False,False,False,,,


In [9]:
df.columns

Index(['object', 'name', 'lang', 'released_at', 'layout', 'mana_cost', 'cmc',
       'type_line', 'oracle_text', 'power', 'toughness', 'colors',
       'color_identity', 'legalities', 'reserved', 'foil', 'nonfoil',
       'oversized', 'promo', 'reprint', 'variation', 'set', 'set_name',
       'set_type', 'digital', 'rarity', 'flavor_text', 'artist',
       'border_color', 'frame', 'full_art', 'textless', 'booster',
       'card_faces', 'edhrec_rank', 'loyalty'],
      dtype='object')

I also want to drop digital-only cards so that the recommender looks only at physical cards.

In [10]:
df = df.drop(df[df['digital'] == True].index).reset_index(drop=True)

Also drop oversized cards

In [11]:
df = df.drop(df[df['oversized'] == True].index).reset_index(drop=True)

In [12]:
df = df.drop(columns=['oversized', 'digital'])

___
### Check for nulls

In [13]:
df.isnull().sum()

object                0
name                  0
lang                  0
released_at           0
layout                0
mana_cost           261
cmc                   0
type_line             0
oracle_text         502
power             23776
toughness         23776
colors              261
color_identity        0
legalities            0
reserved              0
foil                  0
nonfoil               0
promo                 0
reprint               0
variation             0
set                   0
set_name              0
set_type              0
rarity                0
flavor_text       20199
artist                0
border_color          0
frame                 0
full_art              0
textless              0
booster               0
card_faces        44027
edhrec_rank        4694
loyalty           44020
dtype: int64

In [14]:
df.shape

(44529, 34)

Art Series cards appear only in the Modern Horizons set and are not actual cards, so I drop them from the dataset.

In [15]:
df = df.drop(df[df['layout'] == 'art_series'].index).reset_index(drop=True)

In [16]:
# Fill null edhrec_rank values with 0, meaning no decks on EDHREC play the card
df['edhrec_rank'] = df['edhrec_rank'].fillna(0)

### Drop tokens

In [17]:
df['layout'].value_counts()

normal                42796
token                  1048
transform               174
split                   137
adventure                74
emblem                   69
leveler                  39
double_faced_token       33
flip                     30
saga                     25
host                     19
meld                     18
augment                  13
Name: layout, dtype: int64

In [18]:
# Layouts that are not playable cards (tokens, emblems, schemes, planes, vanguards)
non_card_layouts = ['double_faced_token', 'token', 'scheme', 'planar', 'vanguard', 'emblem']
non_cards_index = df[df['layout'].isin(non_card_layouts)].index

In [19]:
df = df.drop(non_cards_index).reset_index(drop=True)
df = df.drop(df[df['set_type'] == 'token'].index).reset_index(drop=True)
df.shape

(43267, 34)

In [20]:
df.isnull().sum()

object                0
name                  0
lang                  0
released_at           0
layout                0
mana_cost           174
cmc                   0
type_line             0
oracle_text         415
power             23448
toughness         23448
colors              174
color_identity        0
legalities            0
reserved              0
foil                  0
nonfoil               0
promo                 0
reprint               0
variation             0
set                   0
set_name              0
set_type              0
rarity                0
flavor_text       18964
artist                0
border_color          0
frame                 0
full_art              0
textless              0
booster               0
card_faces        42852
edhrec_rank           0
loyalty           42758
dtype: int64

In [21]:
df[df['colors'].isnull()]['layout'].value_counts()

transform    174
Name: layout, dtype: int64

In [22]:
df[df['mana_cost'].isnull()]['layout'].value_counts()

transform    174
Name: layout, dtype: int64

In [23]:
df[df['oracle_text'].isnull()]['layout'].value_counts()

transform    174
split        137
adventure     74
flip          30
Name: layout, dtype: int64

In [24]:
df[df['name'] == 'Delver of Secrets // Insectile Aberration']['oracle_text']

6147     NaN
18973    NaN
Name: oracle_text, dtype: object

In [25]:
df['card_faces'].values.tolist()

[nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 [{'object': 'card_face',
   'name': 'Nightmare Moon',
   'mana_cost': '{4}{B}{B}',
   'type_line': 'Legendary Creature — Alicorn',
   'oracle_text': "Flying\nAs long as it's nighttime, Nightmare Moon gets +2/+2 and has menace.\n{6}: Transform Nightmare Moon. Anypony may activate this ability or help pay the cost. When they do, they become your friend.",
   'colors': ['B'],
   'power': '6',
   'toughness': '6',
   'flavor_text': '"The night. . . will last. . .forever!"',
   'watermark': 'mlpwaningmoon',
   'artist': 'John Thacker',
   'artist_id': '38ee615a-e59f-4e2e-b894-2b74c6e75541',
   'illustration_id': '587b0f05-3512-4e2d-9569-2f5f70bc0c92',
   'image_uris': {'small': 'https://img.scryfall.com/cards/small/front/5/6/5646ea19-0025-4f88-ad22-36968a1d3b89.jpg?1570186048',
    'normal': 'https://img.scryfall.com/cards/normal/front/5/6/5646ea19-0025-4f88-ad22-36968a1d3b89.jpg?1570186048',
    'large': 'https://img.scryfall.com/cards/large/front

Now I'd like to take the dual cards (transform, split, adventure, and flip layouts), break each one out into its individual faces, and then remove the originals from the dataset.
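
A minimal sketch of that plan on toy data, assuming each `card_faces` entry is a list of dictionaries as in the output above. The toy frame, its values, and the choice of which parent columns to carry over (`layout`, `rarity`) are assumptions made for illustration, not the final approach.

```python
import pandas as pd

# Toy stand-in for df: one single-faced card and one two-faced card
toy = pd.DataFrame({
    "name": ["Card A", "Front // Back"],
    "layout": ["normal", "transform"],
    "rarity": ["common", "rare"],
    "oracle_text": ["Flying", None],
    "card_faces": [
        None,
        [
            {"name": "Front", "oracle_text": "Transforms at night."},
            {"name": "Back", "oracle_text": "Menace"},
        ],
    ],
})

has_faces = toy["card_faces"].notnull()

# One row per face, keeping the parent columns the faces do not carry themselves
faces = (
    toy.loc[has_faces, ["layout", "rarity", "card_faces"]]
    .explode("card_faces")
    .reset_index(drop=True)
)

# Expand each face dictionary into columns and attach them to the parent columns
face_fields = pd.DataFrame(faces["card_faces"].tolist())
faces = pd.concat([faces.drop(columns="card_faces"), face_fields], axis=1)

# Replace the combined rows with their individual faces
singles = toy.loc[~has_faces].drop(columns="card_faces")
result = pd.concat([singles, faces], ignore_index=True)
print(result)
```

Carrying the parent's shared columns onto each face keeps information such as rarity that the face dictionaries themselves do not include.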

In [26]:
dual_cards = df[df['card_faces'].notnull()].reset_index(drop=True).copy()
dual_cards

Unnamed: 0,object,name,lang,released_at,layout,mana_cost,cmc,type_line,oracle_text,power,...,flavor_text,artist,border_color,frame,full_art,textless,booster,card_faces,edhrec_rank,loyalty
0,card,Nightmare Moon // Princess Luna,en,2019-10-22,transform,,6.0,Legendary Creature — Alicorn // Legendary Crea...,,,...,,John Thacker,silver,2015,False,False,False,"[{'object': 'card_face', 'name': 'Nightmare Mo...",0.0,
1,card,Lovestruck Beast // Heart's Desire,en,2019-10-04,adventure,{2}{G} // {G},3.0,Creature — Beast Noble // Sorcery — Adventure,,5,...,"His mind chose solitude, but his heart disagreed.",Kev Walker,black,2015,False,False,False,"[{'object': 'card_face', 'name': 'Lovestruck B...",13909.0,
2,card,Lovestruck Beast // Heart's Desire,en,2019-10-04,adventure,{2}{G} // {G},3.0,Creature — Beast Noble // Sorcery — Adventure,,5,...,"His mind chose solitude, but his heart disagreed.",Kev Walker,black,2015,False,False,False,"[{'object': 'card_face', 'name': 'Lovestruck B...",13909.0,
3,card,Bonecrusher Giant // Stomp,en,2019-10-04,adventure,{2}{R} // {1}{R},3.0,Creature — Giant // Instant — Adventure,,4,...,Not every tale ends in glory.,Victor Adame Minguez,black,2015,False,False,False,"[{'object': 'card_face', 'name': 'Bonecrusher ...",11665.0,
4,card,Bonecrusher Giant // Stomp,en,2019-10-04,adventure,{2}{R} // {1}{R},3.0,Creature — Giant // Instant — Adventure,,4,...,Not every tale ends in glory.,Victor Adame Minguez,black,2015,False,False,False,"[{'object': 'card_face', 'name': 'Bonecrusher ...",11665.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
410,card,Wax // Wane,en,2000-10-02,split,{G} // {W},2.0,Instant // Instant,,,...,,Ben Thompson,black,1997,False,False,True,"[{'object': 'card_face', 'name': 'Wax', 'mana_...",0.0,
411,card,Assault // Battery,en,2000-10-02,split,{R} // {3}{G},5.0,Sorcery // Sorcery,,,...,,Ben Thompson,black,1997,False,False,True,"[{'object': 'card_face', 'name': 'Assault', 'm...",0.0,
412,card,Pain // Suffering,en,2000-10-02,split,{B} // {3}{R},5.0,Sorcery // Sorcery,,,...,,David Martin,black,1997,False,False,True,"[{'object': 'card_face', 'name': 'Pain', 'mana...",19277.0,
413,card,Spite // Malice,en,2000-10-02,split,{3}{U} // {3}{B},8.0,Instant // Instant,,,...,,David Martin,black,1997,False,False,True,"[{'object': 'card_face', 'name': 'Spite', 'man...",18936.0,


In [27]:
pd.DataFrame(dual_cards['card_faces'][1])

Unnamed: 0,object,name,mana_cost,type_line,oracle_text,power,toughness,flavor_text,artist,artist_id,illustration_id
0,card_face,Lovestruck Beast,{2}{G},Creature — Beast Noble,Lovestruck Beast can't attack unless you contr...,5.0,5.0,"His mind chose solitude, but his heart disagreed.",Kev Walker,f366a0ee-a0cd-466d-ba6a-90058c7a31a6,5313c8d4-5dc9-484d-9b1f-5349de020e4e
1,card_face,Heart's Desire,{G},Sorcery — Adventure,Create a 1/1 white Human creature token. (Then...,,,,Kev Walker,f366a0ee-a0cd-466d-ba6a-90058c7a31a6,


In [40]:
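# Build one DataFrame holding every individual face. pd.concat returns a new
# object, so its result must be assigned; concatenating once is also faster
# than growing a DataFrame inside a loop.
dual_df = pd.concat([pd.DataFrame(card) for card in dual_cards['card_faces']],
                    ignore_index=True)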

In [41]:
dual_df