# Loading Card Data for English and Japanese

### For this project I'll be using the AllPrintings.json file provided by mtgjson.com. It contains data for most MTG cards printed in each core and expansion set.
### The JSON is massive but loads easily into pandas. Here I set up a dataframe that contains info on each card, with duplicate names removed (keeping the last occurrence).

In [1]:
import pandas as pd

mtg_df = pd.read_json('mtg_data/AllPrintings.json', encoding='utf-8')
corelist_2d = mtg_df.loc['cards'].tolist()
cards = [c for card_set in corelist_2d for c in card_set]

cards_df = pd.DataFrame(cards)[['colorIdentity', 'convertedManaCost', 'name', 
                                'originalText', 'types', 'foreignData']]
cards_df = cards_df.drop_duplicates(subset=['name'], keep='last')
cards_df = cards_df.reset_index(drop=True)

cards_df.head()

,colorIdentity,convertedManaCost,name,originalText,types,foreignData
0,[W],4.0,Abuna's Chant,Choose one You gain 5 life; or prevent the nex...,[Instant],"[{'language': 'German', 'multiverseId': 80877,..."
1,[U],4.0,Advanced Hoverguard,Flying\n{U}: Advanced Hoverguard can't be the ...,[Creature],[{'flavorText': '„Sie sind wie ihre Vedalken-G...
2,[],5.0,Anodet Lurker,When Anodet Lurker is put into a graveyard fro...,"[Artifact, Creature]",[{'flavorText': 'Diese Maschinen basteln sich ...
3,[],6.0,Arachnoid,Arachnoid may block as though it had flying.,"[Artifact, Creature]",[{'flavorText': 'Er folgt der grünen Sonne übe...
4,[W],3.0,Armed Response,Armed Response deals damage to target attackin...,[Instant],[{'flavorText': 'Raksha beobachtete die Goblin...
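The flatten-and-deduplicate step can be sketched on toy data without pandas. This is only an illustration: `flatten_and_dedupe` and the `printing` field are hypothetical names, not part of AllPrintings.json, and the sketch assumes each card is a dictionary with a `name` key. It is meant to mirror what `drop_duplicates(subset=['name'], keep='last')` does: when a name appears more than once, the last occurrence wins.

```python
def flatten_and_dedupe(card_sets):
    """Flatten a list of per-set card lists and keep the last card seen per name."""
    latest = {}
    for card_list in card_sets:
        for card in card_list:
            # Remove any earlier entry first so the surviving card sits at the
            # position of its last occurrence, like keep='last' in pandas.
            latest.pop(card["name"], None)
            latest[card["name"]] = card
    return list(latest.values())


# Toy stand-in for the per-set lists (hypothetical data)
toy_sets = [
    [{"name": "Arachnoid", "printing": "first"},
     {"name": "Armed Response", "printing": "first"}],
    [{"name": "Arachnoid", "printing": "reprint"}],
]

unique_cards = flatten_and_dedupe(toy_sets)
for card in unique_cards:
    print(card["name"], card["printing"])
```

Because the last occurrence is kept, which printing survives depends on the order of the sets in the file.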


### In addition to English, I also want to work with Japanese data. I chose Japanese only because it's the only second language I know well enough to work with; the JSON has data for other languages as well.
### To get the Japanese data for each card, I'll have to further process the list of dictionaries in the foreignData column of the dataframe. Cards with no Japanese entry are dropped.

In [2]:
# separate japanese data from the rest of the foreign language data

def get_Japanese(lyst):
    for data in lyst:
        if data['language'] == 'Japanese':
            return data
    return None

ja_data = cards_df['foreignData'].apply(get_Japanese)
cards_df = cards_df.assign(ja_data = ja_data.values)
cards_df = cards_df[cards_df['ja_data'].notna()].reset_index(drop=True)

cards_df.loc[0:2, :]

,colorIdentity,convertedManaCost,name,originalText,types,foreignData,ja_data
0,[W],4.0,Abuna's Chant,Choose one You gain 5 life; or prevent the nex...,[Instant],"[{'language': 'German', 'multiverseId': 80877,...","{'language': 'Japanese', 'multiverseId': 80547..."
1,[U],4.0,Advanced Hoverguard,Flying\n{U}: Advanced Hoverguard can't be the ...,[Creature],[{'flavorText': '„Sie sind wie ihre Vedalken-G...,{'flavorText': 'まるでヴィダルケンの首領みたいね。触れず、遠くて、どこにでも...
2,[],5.0,Anodet Lurker,When Anodet Lurker is put into a graveyard fro...,"[Artifact, Creature]",[{'flavorText': 'Diese Maschinen basteln sich ...,{'flavorText': 'この機械は屑鉄から恐ろしい形相を組み立て、餌を捜し求めるもの...
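Since the JSON carries other languages too, the same lookup can be parameterized by language. The sketch below is a hypothetical generalization (`get_language` is not in the notebook). It assumes each entry is a dictionary that may or may not have a `language` key, and it returns `None` when the cell is not a list, for example a missing value.

```python
def get_language(entries, language):
    """Return the first foreign-data entry for `language`, or None if absent."""
    if not isinstance(entries, list):
        # Missing cells (e.g. NaN) have nothing to search
        return None
    for entry in entries:
        if entry.get("language") == language:
            return entry
    return None


# Toy foreignData cell (hypothetical values apart from the card name)
toy_foreign_data = [
    {"language": "German", "name": "Toy German name"},
    {"language": "Japanese", "name": "機械蜘蛛"},
]

ja_entry = get_language(toy_foreign_data, "Japanese")
fr_entry = get_language(toy_foreign_data, "French")
print(ja_entry, fr_entry)
```

On the dataframe this could be applied as `cards_df['foreignData'].apply(get_language, language='Japanese')`, since `Series.apply` forwards extra keyword arguments to the function.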


In [3]:
# concat japanese data with english data

ja_data = cards_df['ja_data'].tolist()
ja_df = pd.DataFrame(ja_data)[['name', 'text', 'type', 'flavorText']]
ja_df = ja_df.rename(columns={'name': 'ja_name', 'text': 'ja_text', 'type': 'ja_type', 
                              'flavorText': 'ja_flavorText'})

cards_df = pd.concat([cards_df, ja_df], axis=1)
cards_df = cards_df.drop(columns=['foreignData', 'ja_data'])

cards_df.loc[0, :]

colorIdentity                                                      [W]
convertedManaCost                                                  4.0
name                                                     Abuna's Chant
originalText         Choose one You gain 5 life; or prevent the nex...
types                                                        [Instant]
ja_name                                                          高僧の詠唱
ja_text              以下の２つから１つを選ぶ。「あなたは５点のライフを得る。」「クリーチャー１体を対象とする。こ...
ja_type                                                         インスタント
ja_flavorText                                                      NaN
Name: 0, dtype: object

### Finally, I'll create an integer encoding column that maps each color to a number. Then I'll tokenize the name of each card (in each language) using spaCy. Every card that isn't mono-colored (multicolored or colorless) goes into a single shared category, since I'll only be looking at mono-colored cards later.

In [4]:
import spacy

# integer encoding dictionary
COLOR_CODE = { 'W': 0, 'U': 1, 'B': 2, 'R': 3, 'G': 4 }

def encode_color(colors):
    # mono-colored cards get their color's code; colorless and
    # multicolored cards share the catch-all code 5
    if len(colors) == 1:
        return COLOR_CODE[colors[0]]
    return 5

def tokenize(sentence, nlp):
    # lemmatize and lowercase, dropping punctuation, stop words,
    # whitespace, and numerals
    doc = nlp(sentence)
    return [tok.lemma_.lower() for tok in doc
            if not tok.is_punct and not tok.is_stop
            and not tok.is_space and tok.pos_ != 'NUM']

# convert colors to integers
int_encoding = cards_df['colorIdentity'].apply(encode_color)
# get english tokens
en_nlp = spacy.load('en_core_web_md', disable=['ner', 'parser'])
en_toks = cards_df['name'].apply(tokenize, nlp=en_nlp)
# get japanese tokens
ja_nlp = spacy.load('ja_core_news_md', disable=['ner', 'parser'])
ja_toks = cards_df['ja_name'].apply(tokenize, nlp=ja_nlp)
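The encoding rule is easy to check in isolation. The sketch below restates it with a named constant (`OTHER` is a hypothetical name for the catch-all code 5) and runs it on three toy color identities. It shows that a colorless card, whose `colorIdentity` is an empty list, lands in the same bucket as a multicolored one.

```python
COLOR_CODE = {'W': 0, 'U': 1, 'B': 2, 'R': 3, 'G': 4}
OTHER = 5  # hypothetical name for the shared non-mono-colored bucket


def encode_color(colors):
    """Map a colorIdentity list to an integer code."""
    if len(colors) == 1:
        return COLOR_CODE[colors[0]]
    return OTHER


# Toy color identities covering the three cases
examples = {
    "mono white": ['W'],
    "colorless": [],
    "two colors": ['W', 'U'],
}
codes = {label: encode_color(colors) for label, colors in examples.items()}
print(codes)
```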

In [5]:
import pickle
cards_df = cards_df.assign(encoding=int_encoding, en_toks=en_toks, ja_toks=ja_toks)

with open('mtg_data/cards_df.pickle', 'wb') as f:
    pickle.dump(cards_df, f)

cards_df.head()

,colorIdentity,convertedManaCost,name,originalText,types,ja_name,ja_text,ja_type,ja_flavorText,encoding,en_toks,ja_toks
0,[W],4.0,Abuna's Chant,Choose one You gain 5 life; or prevent the nex...,[Instant],高僧の詠唱,以下の２つから１つを選ぶ。「あなたは５点のライフを得る。」「クリーチャー１体を対象とする。こ...,インスタント,,0,"[abuna, chant]","[高僧, 詠唱]"
1,[U],4.0,Advanced Hoverguard,Flying\n{U}: Advanced Hoverguard can't be the ...,[Creature],上位の空護り,飛行\n{U}：このターン、上位の空護りは呪文や能力の対象にならない。,クリーチャー — ドローン,まるでヴィダルケンの首領みたいね。触れず、遠くて、どこにでもいるんだから。 ―― ニューロッ...,1,"[advanced, hoverguard]","[上位, 空, 護る]"
2,[],5.0,Anodet Lurker,When Anodet Lurker is put into a graveyard fro...,"[Artifact, Creature]",潜むエイノデット,潜むエイノデットが場から墓地に置かれたとき、あなたは３点のライフを得る。,アーティファクト・クリーチャー,この機械は屑鉄から恐ろしい形相を組み立て、餌を捜し求めるものを脅して追い払う。,5,"[anodet, lurker]","[潜む, エイノデット]"
3,[],6.0,Arachnoid,Arachnoid may block as though it had flying.,"[Artifact, Creature]",機械蜘蛛,機械蜘蛛は飛行を持っているかのようにブロックに参加してよい。,アーティファクト・クリーチャー — 蜘蛛,それは巨大な囁き絹の巣を振り回し、ミラディンの地表で緑の太陽を追い続ける。,5,[arachnoid],"[機械, 蜘蛛]"
4,[W],3.0,Armed Response,Armed Response deals damage to target attackin...,[Instant],武力対応,攻撃クリーチャー１体を対象とする。武力対応はそれに、あなたがコントロールする装備品の総数に等...,インスタント,ラクシャはゴブリンが剃刀ヶ原になだれ込むのを見て思った。彼らは止まるタイミングを知らないんじ...,0,"[armed, response]","[武力, 対応]"
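A later notebook can read the pickle back with the same `pickle` module. The sketch below is a minimal round trip on toy records rather than the real dataframe: `save_obj`, `load_obj`, and the toy file name are hypothetical, and the last line assumes that code 5 marks the non-mono-colored bucket as defined above. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_obj(obj, path):
    """Serialize `obj` to `path` with pickle."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f)


def load_obj(path):
    """Load a pickled object from `path` (trusted files only)."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Toy records standing in for rows of cards_df
toy_cards = [
    {"name": "Arachnoid", "encoding": 5},
    {"name": "Armed Response", "encoding": 0},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy_cards.pickle"
    save_obj(toy_cards, toy_path)
    restored = load_obj(toy_path)

# Keep only mono-colored cards (everything except the catch-all code 5)
mono_colored = [card for card in restored if card["encoding"] != 5]
print(mono_colored)
```

With the real dataframe, the equivalent filter after loading would be `cards_df[cards_df['encoding'] != 5]`.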
