# Goals:

**1. Develop a natural language processing (NLP) model capable of predicting the COLOR of a Magic: The Gathering card based on the rules text of that card.**

**2. Develop an NLP model capable of predicting the TYPE of a Magic: The Gathering card based on the rules text of that card.**

In [1]:
# imports and display options
import pandas as pd
import numpy as np
import math
from math import sqrt

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, explained_variance_score

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

import prepare as p
# import explore as e
# import model as m

# pd.set_option('display.max_colwidth', None)  # show full rules text; None replaces the deprecated -1

# Acquire

* Reused the file from previous work
* A CSV containing an up-to-date breakdown of every card printed so far was obtained from MTGJSON.com
* Each row represents a card or a version of a card
* The CSV was read into a pandas dataframe
* The original dataframe contained 50,412 rows and 71 columns
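
A minimal sketch of the acquire step, assuming the export is an ordinary CSV. The helper `acquire_cards` and the in-memory sample are hypothetical stand-ins; the notebook itself loads the data through `p.wrangle_mtg()`, whose body is not shown here.

```python
import io

import pandas as pd


def acquire_cards(source):
    """Read a card export (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file, so the sketch runs anywhere.
sample = io.StringIO(
    "name,colorIdentity,types,text\n"
    "Afflict,B,Instant,Target creature gets -1/-1 until end of turn.\n"
    'Adarkar Wastes,"U,W",Land,{T}: Add {C}.\n'
)

cards = acquire_cards(sample)
print(cards.shape)  # (rows, columns)
```

Checking `shape` right after loading is a cheap way to confirm the row and column counts quoted above.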

# Prepare (beginning)

The following steps were taken to prepare the data:

1. Restricted dataframe to relevant columns
 
2. Restricted dataframe to rows containing cards that exist in physical form

3. Restricted dataframe to rows containing flavor text

4. Restricted dataframe to rows with a single 'color identity' (see data dictionary Color)

5. Merged rows containing multiple similar types into one of the seven major game types

6. Dropped rows containing multiple types that could not be merged

7. Cleaned up flavor text by removing quote attributions, so that rows could be matched on flavor text and most duplicates eliminated

8. Reordered columns

9. Dropped rows where the flavor text was not in English

10. Dropped duplicate rows (all that could be found; some duplicates may remain)

11. Added sentiment column showing compound sentiment score using VADER

12. Added intensity column showing the absolute value of sentiment

13. Renamed columns

14. Rounded numeric values in the dataframe to two decimals

15. Wrote prepared data to ‘mtgprep.csv’ for ease of access
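
Steps 7, 10, 12, and 14 above could look roughly like the following. This is a sketch under stated assumptions, not the contents of `prepare.py`: the attribution format (a trailing em dash followed by a name), the column names, the card names, and the sentiment scores are all illustrative placeholders, and the VADER scoring itself is left out.

```python
import pandas as pd


def strip_attribution(flavor):
    """Drop a trailing quote attribution (assumed format: em dash + name)."""
    # Everything from the last em dash (U+2014) onward is treated as attribution.
    head, sep, _tail = flavor.rpartition("\u2014")
    return head.strip() if sep else flavor.strip()


# Placeholder rows: two printings of one card with differently worded attributions.
raw = pd.DataFrame({
    "name": ["Card A", "Card A", "Card B"],
    "flavor": [
        '"Burn it all." \u2014Speaker',
        '"Burn it all." \u2014Speaker, pyromancer',
        "Quiet woods.",
    ],
    "sentiment": [-0.3412, -0.3412, 0.1279],  # placeholder compound scores
})

prepared = (
    raw.assign(flavor=raw["flavor"].map(strip_attribution))  # step 7
       .drop_duplicates(subset=["name", "flavor"])           # step 10
       .assign(intensity=lambda d: d["sentiment"].abs())     # step 12
       .round(2)                                             # step 14
       .reset_index(drop=True)
)
print(prepared)
```

Stripping the attribution before deduplicating is what lets the two printings collapse into one row; an em dash used mid-sentence would be cut too, which is one way duplicates or truncated text can slip through.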

In [2]:
df = p.wrangle_mtg()

In [3]:
df.shape

(50412, 71)

In [4]:
df.columns

Index(['index', 'id', 'artist', 'borderColor', 'colorIdentity',
       'colorIndicator', 'colors', 'convertedManaCost', 'duelDeck',
       'edhrecRank', 'faceConvertedManaCost', 'flavorText', 'frameEffect',
       'frameEffects', 'frameVersion', 'hand', 'hasFoil', 'hasNoDeckLimit',
       'hasNonFoil', 'isAlternative', 'isArena', 'isBuyABox', 'isDateStamped',
       'isFullArt', 'isMtgo', 'isOnlineOnly', 'isOversized', 'isPaper',
       'isPromo', 'isReprint', 'isReserved', 'isStarter', 'isStorySpotlight',
       'isTextless', 'isTimeshifted', 'layout', 'leadershipSkills', 'life',
       'loyalty', 'manaCost', 'mcmId', 'mcmMetaId', 'mtgArenaId', 'mtgoFoilId',
       'mtgoId', 'multiverseId', 'name', 'names', 'number', 'originalText',
       'originalType', 'otherFaceIds', 'power', 'printings', 'purchaseUrls',
       'rarity', 'scryfallId', 'scryfallIllustrationId', 'scryfallOracleId',
       'setCode', 'side', 'subtypes', 'supertypes', 'tcgplayerProductId',
       'text', 'toughness', 

In [5]:
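# keep only the columns needed for modeling
df = df[['name','colorIdentity','isPaper','types','text']]

# keep cards that exist in physical (paper) form, then drop the flag column
df = df[df.isPaper==1]
df = df.drop(columns='isPaper')

# drop cards that have no rules text
df = df[df.text.notna()]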

In [6]:
df.head(5)

Unnamed: 0,name,colorIdentity,types,text
0,Abundance,G,Enchantment,"If you would draw a card, you may instead choo..."
1,Academy Researchers,U,Creature,When Academy Researchers enters the battlefiel...
2,Adarkar Wastes,"U,W",Land,{T}: Add {C}.\n{T}: Add {W} or {U}. Adarkar Wa...
3,Afflict,B,Instant,Target creature gets -1/-1 until end of turn.\...
4,Aggressive Urge,G,Instant,Target creature gets +1/+1 until end of turn.\...


In [7]:
# use only cards with a single color identity 
colors = ['W','U','B','R','G']
df_color = df.loc[df.colorIdentity.isin(colors)]
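
To illustrate what the `isin` filter keeps, here is a small sketch on a toy frame. The third row is hypothetical, and representing a colorless card with a missing `colorIdentity` is an assumption about the data rather than something shown in the output above.

```python
import numpy as np
import pandas as pd

SINGLE_COLORS = ["W", "U", "B", "R", "G"]

toy = pd.DataFrame({
    "name": ["Abundance", "Adarkar Wastes", "Hypothetical Relic"],
    "colorIdentity": ["G", "U,W", np.nan],  # NaN row is an assumed colorless card
})

# isin() matches whole values, so "U,W" and NaN both fail the test.
mono = toy.loc[toy["colorIdentity"].isin(SINGLE_COLORS)]
print(mono["name"].tolist())  # ['Abundance']
```

Because the match is on the whole string, any multi-color identity is excluded without extra parsing.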

In [8]:
# fold Tribal variants and a lowercase 'instant' into their major card type
df['types'] = np.where(df['types'] == 'Tribal,Instant', 'Instant', df['types'])

df['types'] = np.where(df['types'] == 'Tribal,Sorcery', 'Sorcery', df['types'])

df['types'] = np.where(df['types'] == 'Tribal,Enchantment', 'Enchantment', df['types'])

df['types'] = np.where(df['types'] == 'instant', 'Instant', df['types'])

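# keep only cards whose type is exactly one of the seven major card types
types = ['Creature','Instant','Sorcery','Enchantment','Land','Artifact','Planeswalker']
df_type = df.loc[df.types.isin(types)]
# note: drop() returns a new frame; without assignment this only displays the result,
# so df_type itself still contains colorIdentity
df_type.drop(columns='colorIdentity')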

Unnamed: 0,name,types,text
0,Abundance,Enchantment,"If you would draw a card, you may instead choo..."
1,Academy Researchers,Creature,When Academy Researchers enters the battlefiel...
2,Adarkar Wastes,Land,{T}: Add {C}.\n{T}: Add {W} or {U}. Adarkar Wa...
3,Afflict,Instant,Target creature gets -1/-1 until end of turn.\...
4,Aggressive Urge,Instant,Target creature gets +1/+1 until end of turn.\...
...,...,...,...
50407,Windborne Charge,Sorcery,Two target creatures you control each get +2/+...
50408,Windrider Eel,Creature,Flying\nLandfall — Whenever a land enters the ...
50409,World Queller,Creature,"At the beginning of your upkeep, you may choos..."
50410,Zektar Shrine Expedition,Enchantment,Landfall — Whenever a land enters the battlefi...
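
The type clean-up in this cell can also be written with a single mapping, which may be easier to extend. The sketch below is an alternative formulation, not the notebook's code; `TYPE_FIXES`, `MAJOR_TYPES`, and `normalize_types` are hypothetical names, and the toy rows are illustrative.

```python
import pandas as pd

# Same substitutions as the np.where calls, gathered in one place.
TYPE_FIXES = {
    "Tribal,Instant": "Instant",
    "Tribal,Sorcery": "Sorcery",
    "Tribal,Enchantment": "Enchantment",
    "instant": "Instant",
}
MAJOR_TYPES = [
    "Creature", "Instant", "Sorcery", "Enchantment",
    "Land", "Artifact", "Planeswalker",
]


def normalize_types(frame):
    """Return a copy with merged type labels, restricted to the major types."""
    out = frame.copy()  # work on a copy so the caller's frame is untouched
    out["types"] = out["types"].replace(TYPE_FIXES)
    return out.loc[out["types"].isin(MAJOR_TYPES)]


toy = pd.DataFrame(
    {"types": ["Tribal,Instant", "instant", "Artifact,Creature", "Land"]}
)
print(normalize_types(toy)["types"].tolist())  # ['Instant', 'Instant', 'Land']
```

Working on a copy also avoids assigning into a slice of an earlier frame, which pandas would otherwise flag with a `SettingWithCopyWarning`.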


In [9]:
df_color.head()

Unnamed: 0,name,colorIdentity,types,text
0,Abundance,G,Enchantment,"If you would draw a card, you may instead choo..."
1,Academy Researchers,U,Creature,When Academy Researchers enters the battlefiel...
3,Afflict,B,Instant,Target creature gets -1/-1 until end of turn.\...
4,Aggressive Urge,G,Instant,Target creature gets +1/+1 until end of turn.\...
5,Agonizing Memories,B,Sorcery,Look at target player's hand and choose two ca...


In [10]:
df_type.head()

Unnamed: 0,name,colorIdentity,types,text
0,Abundance,G,Enchantment,"If you would draw a card, you may instead choo..."
1,Academy Researchers,U,Creature,When Academy Researchers enters the battlefiel...
2,Adarkar Wastes,"U,W",Land,{T}: Add {C}.\n{T}: Add {W} or {U}. Adarkar Wa...
3,Afflict,B,Instant,Target creature gets -1/-1 until end of turn.\...
4,Aggressive Urge,G,Instant,Target creature gets +1/+1 until end of turn.\...


In [11]:
df.shape

(43937, 4)

In [12]:
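# repeats the filter from cell 5; rows without rules text are already gone, so the shape is unchanged
df = df[df.text.notna()]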

In [13]:
df.shape

(43937, 4)