# Goals:

**1. Develop a natural language processor capable of predicting the COLOR of a Magic card based on the rules text of that card.**

**2. Develop a natural language processor capable of predicting the TYPE of a Magic card based on the rules text of that card.**

In [1]:
# imports and display options

import pandas as pd
import numpy as np
import math
from math import sqrt

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, explained_variance_score

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

import unicodedata
import re
import nltk
from nltk.corpus import stopwords

import prepare as p
# import explore as e
# import model as m

# show full column contents (None disables truncation; -1 is deprecated since pandas 1.0)
pd.set_option('display.max_colwidth', None)

# Acquire

* Reused the data file from a previous project
* A CSV containing an up-to-date breakdown of every card printed so far was obtained from MTGJSON.com
* Each row represents a card or a version of a card
* The CSV was read into a pandas dataframe
* The original dataframe contained 51,430 rows and 73 columns
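Because each row is a card *or a version of a card*, reprints of the same card show up as separate rows. A minimal sketch of how that looks once a CSV is read into pandas, using a tiny inline CSV as a stand-in for the MTGJSON download (the column names `name`, `colorIdentity`, and `text` are assumptions about the file's layout):

```python
import io

import pandas as pd

# Hypothetical stand-in for the MTGJSON CSV: two printings of one card plus one other card
csv_text = io.StringIO(
    "name,colorIdentity,text\n"
    "Air Elemental,U,Flying\n"
    "Air Elemental,U,Flying\n"
    "Afflict,B,Target creature gets -1/-1 until end of turn.\n"
)

cards = pd.read_csv(csv_text)

print(cards.shape)               # rows count printings, not distinct cards
print(cards["name"].nunique())   # number of distinct card names
```

Comparing `shape[0]` with `name.nunique()` gives a quick sense of how many rows are reprints before any deduplication.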

In [2]:
df = p.wrangle_mtg()
df.shape

(51430, 73)

# Prepare (beginning)

The following steps were taken to prepare the data:

1. Restricted dataframe to relevant columns
 
2. Restricted dataframe to rows containing cards that exist in physical form

3. Restricted dataframe to rows containing flavor text

4. Restricted dataframe to rows with a single 'color identity' (see data dictionary Color)

5. Merged rows containing multiple similar types into one of the seven major card types

6. Dropped rows containing multiple types that could not be merged

7. Cleaned up the flavor text by removing quote attributions, so that rows could be merged on flavor text and most of the duplicates eliminated

8. Reordered columns

9. Dropped rows where the flavor text was not in English

10. Dropped duplicate rows (I dropped all that I could find; it is possible that some duplicates remain.)

11. Added sentiment column showing compound sentiment score using VADER

12. Added intensity column showing the absolute value of sentiment

13. Renamed columns

14. Rounded numeric values in the dataframe to two decimals

15. Wrote prepared data to ‘mtgprep.csv’ for ease of access
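Steps 1, 2, 4, and 10 are plain row and column filters. The `prepare` module's own code is not shown here, so the following is only a sketch with pandas; the column names (`name`, `colorIdentity`, `availability`, `text`, `types`, `artist`) and the comma-separated `availability` format are assumptions about the MTGJSON layout, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows in an assumed MTGJSON-like layout
raw = pd.DataFrame({
    "name": ["Air Elemental", "Air Elemental", "Afflict", "Boros Charm", "Digital Only"],
    "colorIdentity": ["U", "U", "B", "R,W", "G"],
    "availability": ["mtgo,paper", "paper", "paper", "arena,paper", "arena"],
    "text": ["Flying", "Flying", "Target creature gets -1/-1 until end of turn.",
             "Choose one.", "Draw a card."],
    "types": ["Creature", "Creature", "Instant", "Instant", "Sorcery"],
    "artist": ["a", "b", "c", "d", "e"],
})


def prepare_sketch(cards: pd.DataFrame) -> pd.DataFrame:
    # Step 1: keep only the columns the models need (drops 'artist')
    cards = cards[["name", "colorIdentity", "availability", "text", "types"]]
    # Step 2: keep cards printed in physical ("paper") form
    cards = cards[cards["availability"].str.contains("paper", na=False)]
    # Step 4: keep cards whose color identity is exactly one color
    cards = cards[cards["colorIdentity"].isin(list("WUBRG"))]
    # Step 10: drop reprints that repeat the same name, color, and text
    cards = cards.drop_duplicates(subset=["name", "colorIdentity", "text"])
    return cards.drop(columns="availability").reset_index(drop=True)


prepped = prepare_sketch(raw)
print(prepped)
```

Deduplicating on `name`, color, and `text` rather than on every column is what catches reprints, since reprinted rows usually differ in set-specific fields.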

In [3]:
df = p.get_preped_data()

In [4]:
df.shape

(33736, 3)

In [5]:
df.head(25)

Unnamed: 0,name,color,text
0,Abundance,Green,"If you would draw a card, you may instead choose land or nonland and reveal cards from the top of your library until you reveal a card of the chosen kind. Put that card into your hand and put all other cards revealed this way on the bottom of your library in any order."
1,Academy Researchers,Blue,"When Academy Researchers enters the battlefield, you may put an Aura card from your hand onto the battlefield attached to Academy Researchers."
2,Afflict,Black,Target creature gets -1/-1 until end of turn.\nDraw a card.
3,Aggressive Urge,Green,Target creature gets +1/+1 until end of turn.\nDraw a card.
4,Agonizing Memories,Black,Look at target player's hand and choose two cards from it. Put them on top of that player's library in any order.
5,Air Elemental,Blue,Flying
6,Air Elemental,Blue,Flying
7,Ambassador Laquatus,Blue,{3}: Target player puts the top three cards of their library into their graveyard.
8,Anaba Bodyguard,Red,First strike (This creature deals combat damage before creatures without first strike.)
9,Anaba Bodyguard,Red,First strike (This creature deals combat damage before creatures without first strike.)


In [6]:
df.head(5)

Unnamed: 0,name,color,text
0,Abundance,Green,"If you would draw a card, you may instead choose land or nonland and reveal cards from the top of your library until you reveal a card of the chosen kind. Put that card into your hand and put all other cards revealed this way on the bottom of your library in any order."
1,Academy Researchers,Blue,"When Academy Researchers enters the battlefield, you may put an Aura card from your hand onto the battlefield attached to Academy Researchers."
2,Afflict,Black,Target creature gets -1/-1 until end of turn.\nDraw a card.
3,Aggressive Urge,Green,Target creature gets +1/+1 until end of turn.\nDraw a card.
4,Agonizing Memories,Black,Look at target player's hand and choose two cards from it. Put them on top of that player's library in any order.


In [7]:
df.text

0        If you would draw a card, you may instead choose land or nonland and reveal cards from the top of your library until you reveal a card of the chosen kind. Put that card into your hand and put all other cards revealed this way on the bottom of your library in any order.
1        When Academy Researchers enters the battlefield, you may put an Aura card from your hand onto the battlefield attached to Academy Researchers.
2        Target creature gets -1/-1 until end of turn.\nDraw a card.
3        Target cre

In [8]:
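def symble_to_word(text):
    '''
    replaces tap/mana symbols and the +, -, / characters with word tokens
    so they survive the later step that strips non-letter characters
    '''
    # tap symbol and colorless / single-color mana symbols
    # (generic costs such as {3} are not handled; remove_numbers later leaves "{}")
    text = text.replace("{T}","Tap")
    text = text.replace("{C}","ColorlessMana")
    text = text.replace("{W}","WhiteMana")
    text = text.replace("{B}","BlackMana")
    text = text.replace("{U}","BlueMana")
    text = text.replace("{R}","RedMana")
    text = text.replace("{G}","GreenMana")

    # power/toughness modifiers, e.g. "+1/+1" -> "Plus1andPlus1"
    # note: this also rewrites hyphens inside words (e.g. "non-Human")
    text = text.replace("+","Plus")
    text = text.replace("-","Minus")
    text = text.replace("/","and")

    return text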


In [9]:
def remove_numbers(text):
    '''
    removes every digit (0-9) from a string
    '''
    text = re.sub(r"[0-9]",'',text)

    return text
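Because `symble_to_word` runs before `remove_numbers`, the order of the substitutions determines the final tokens. The sketch below reproduces the same replacements as a table (the name `symbols_then_digits` is hypothetical) and runs two rules texts from the output above plus one made-up text containing a hyphenated word:

```python
import re

# Same substitutions as symble_to_word, in the same order
# (dicts keep insertion order in Python 3.7+)
SYMBOL_WORDS = {
    "{T}": "Tap", "{C}": "ColorlessMana", "{W}": "WhiteMana",
    "{B}": "BlackMana", "{U}": "BlueMana", "{R}": "RedMana",
    "{G}": "GreenMana", "+": "Plus", "-": "Minus", "/": "and",
}


def symbols_then_digits(text: str) -> str:
    # apply every symbol-to-word substitution, then strip digits
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    return re.sub(r"[0-9]", "", text)


samples = [
    "Target creature gets -1/-1 until end of turn.",
    "{3}: Target player puts the top three cards of their library into their graveyard.",
    "Destroy target non-Human creature.",  # made-up example
]
results = [symbols_then_digits(t) for t in samples]
for r in results:
    print(r)
```

The second result keeps empty braces (`{}`), matching the Ambassador Laquatus row, and the third shows the hyphen rule turning `non-Human` into `nonMinusHuman`.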

In [10]:
df["text"] = df.text.apply(symble_to_word).apply(remove_numbers)
df.head(25)

Unnamed: 0,name,color,text
0,Abundance,Green,"If you would draw a card, you may instead choose land or nonland and reveal cards from the top of your library until you reveal a card of the chosen kind. Put that card into your hand and put all other cards revealed this way on the bottom of your library in any order."
1,Academy Researchers,Blue,"When Academy Researchers enters the battlefield, you may put an Aura card from your hand onto the battlefield attached to Academy Researchers."
2,Afflict,Black,Target creature gets MinusandMinus until end of turn.\nDraw a card.
3,Aggressive Urge,Green,Target creature gets PlusandPlus until end of turn.\nDraw a card.
4,Agonizing Memories,Black,Look at target player's hand and choose two cards from it. Put them on top of that player's library in any order.
5,Air Elemental,Blue,Flying
6,Air Elemental,Blue,Flying
7,Ambassador Laquatus,Blue,{}: Target player puts the top three cards of their library into their graveyard.
8,Anaba Bodyguard,Red,First strike (This creature deals combat damage before creatures without first strike.)
9,Anaba Bodyguard,Red,First strike (This creature deals combat damage before creatures without first strike.)


In [11]:
def get_ASCII(article):
    '''
    normalizes a string into ASCII characters
    '''

    article = unicodedata.normalize('NFKD', article)\
    .encode('ascii', 'ignore')\
    .decode('utf-8', 'ignore')
    
    return article
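NFKD normalization splits accented letters into a base letter plus a combining mark, and the ASCII encode with `'ignore'` then drops the mark. Letters with no decomposition, such as the ligature `æ`, are dropped entirely. A small sketch (the card names are only examples, and `to_ascii_keep_ae` is a hypothetical tweak, not part of the notebook):

```python
import unicodedata


def to_ascii(text: str) -> str:
    # Same steps as get_ASCII: NFKD-decompose, then drop anything non-ASCII
    return (unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("utf-8", "ignore"))


def to_ascii_keep_ae(text: str) -> str:
    # Hypothetical tweak: spell out the ligature first, because NFKD
    # does not decompose "æ" and it would otherwise be dropped
    return to_ascii(text.replace("æ", "ae").replace("Æ", "Ae"))


accented = to_ascii("lim-dûl's vault")
ligature = to_ascii("æther vial")
ligature_kept = to_ascii_keep_ae("æther vial")
print(accented, ligature, ligature_kept, sep="\n")
```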

In [12]:

def purge_non_characters(article):
    '''
    replaces every character that is not a lowercase letter (a-z) or whitespace
    with a space; expects text that has already been lowercased
    '''
    
    article = re.sub(r"[^a-z\s]", ' ', article)
    
    return article
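The pattern `[^a-z\s]` only keeps lowercase letters, so calling this function on text that has not been lowercased silently deletes capitals. A quick check of why `basic_clean` lowercases first:

```python
import re


def purge(text: str) -> str:
    # Same pattern as purge_non_characters
    return re.sub(r"[^a-z\s]", " ", text)


unlowered = purge("Flying")        # capital F is not in [a-z]
lowered = purge("Flying".lower())
print(repr(unlowered), repr(lowered))
```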

In [13]:
def basic_clean(article):
    '''
    performs basic cleaning on a string by calling the helper functions:
    converts the string to lowercase and to ASCII characters,
    then replaces special characters with spaces
    '''
    # lowercases letters
    article = article.lower()

    # convert to ASCII characters
    article = get_ASCII(article)

    # remove non characters
    article = purge_non_characters(article)
    
    return article
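Replacing punctuation with spaces leaves runs of spaces and splits possessives into a separate `s` token. A sketch of the same three steps with one extra, optional whitespace-collapsing step (`basic_clean_sketch` is a hypothetical name; in the notebook the later `split()` in `remove_stopwords` collapses the spaces anyway, and `s` is in NLTK's English stopword list, so it is dropped there):

```python
import re
import unicodedata


def basic_clean_sketch(text: str) -> str:
    # lowercase -> ASCII -> non-letters to spaces, as in basic_clean
    text = text.lower()
    text = (unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("utf-8", "ignore"))
    text = re.sub(r"[^a-z\s]", " ", text)
    # extra step (not in basic_clean): collapse the runs of spaces left behind
    return " ".join(text.split())


reminder = basic_clean_sketch(
    "First strike (This creature deals combat damage before creatures without first strike.)"
)
possessive = basic_clean_sketch("Look at target player's hand.")
print(reminder)
print(possessive)
```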

In [14]:
def remove_stopwords(article, extra_words=None, exclude_words=None):
    '''
    removes stopwords from a string
    extra_words: words to add to the NLTK English stopword list
    exclude_words: words to keep even if they are in the stopword list
    (requires the NLTK stopwords corpus: nltk.download('stopwords'))
    '''

    # create stopword set using english (a set gives fast membership tests)
    stopword_set = set(stopwords.words('english'))

    # add words in extra_words to the stopword set
    stopword_set |= set(extra_words or [])

    # remove words in exclude_words from the stopword set
    stopword_set -= set(exclude_words or [])

    # split article into list of words
    words = article.split()

    # remove stopwords from list of words
    filtered_words = [w for w in words if w not in stopword_set]

    # rejoin list of words into article
    article_without_stopwords = ' '.join(filtered_words)

    return article_without_stopwords
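Words such as `card` appear in rules text of every color, so treating them as extra stopwords may be worth testing, while a word like `you` might be worth keeping. A minimal set-based sketch of the add/keep logic; it uses a tiny stand-in stopword set instead of NLTK's list (which needs a corpus download), and `filter_stopwords` and `BASE_STOPWORDS` are hypothetical names:

```python
# Stand-in stopword set (the notebook uses NLTK's English list instead)
BASE_STOPWORDS = {"if", "you", "a", "the", "of", "your"}


def filter_stopwords(text, base=BASE_STOPWORDS, extra_words=(), exclude_words=()):
    # extra_words are treated as stopwords; exclude_words are always kept
    stop = (set(base) | set(extra_words)) - set(exclude_words)
    return " ".join(w for w in text.split() if w not in stop)


sample = "if you would draw a card you may instead reveal the top card of your library"
default_out = filter_stopwords(sample)
extra_out = filter_stopwords(sample, extra_words=["card"])
kept_out = filter_stopwords(sample, exclude_words=["you"])
print(default_out, extra_out, kept_out, sep="\n")
```

Set union and difference avoid the `ValueError` that `list.remove` raises when a word is not already in the list.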

In [15]:
def lemmatize(article):
    '''
    lemmatizes words in a string
    '''

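    # create lemmatizer object (requires the NLTK WordNet corpus: nltk.download('wordnet'))
    wnl = nltk.stem.WordNetLemmatizer()

    # split article into list of words and lemmatize each word
    # (lemmatize() treats every word as a noun unless a pos argument is given)
    lemmas = [wnl.lemmatize(word) for word in article.split()]

    # join words in list into a string
    article_lemmatized = ' '.join(lemmas)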
    
    return article_lemmatized

In [16]:
# create column applying the basic_clean, remove_stopwords, and lemmatize functions
df['text_lemmatized'] = df.text.apply(basic_clean).apply(remove_stopwords).apply(lemmatize)

In [17]:
df.text_lemmatized

0        would draw card may instead choose land nonland reveal card top library reveal card chosen kind put card hand put card revealed way bottom library order
1        academy researcher enters battlefield may put aura card hand onto battlefield attached academy researcher
2        target creature get minusandminus end turn draw card
3        target creature get plusandplus end turn draw card
4       
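One possible first exploration step toward Goal 1 is to compare word frequencies per color. The sketch below uses only the four lemmatized rows shown above (with their colors from the earlier `head` output), so the counts say nothing about the full dataset; on the real data the same comprehension could run over `df` directly.

```python
from collections import Counter

import pandas as pd

# The first four lemmatized rows shown above, with their colors
sample = pd.DataFrame({
    "color": ["Green", "Blue", "Black", "Green"],
    "text_lemmatized": [
        "would draw card may instead choose land nonland reveal card top library "
        "reveal card chosen kind put card hand put card revealed way bottom library order",
        "academy researcher enters battlefield may put aura card hand onto battlefield "
        "attached academy researcher",
        "target creature get minusandminus end turn draw card",
        "target creature get plusandplus end turn draw card",
    ],
})

# one Counter of word frequencies per color
word_counts = {
    color: Counter(" ".join(texts).split())
    for color, texts in sample.groupby("color")["text_lemmatized"]
}

for color, counts in word_counts.items():
    print(color, counts.most_common(3))
```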

In [18]:
# use only cards with a single color identity
# (the prepared data stores full color names in a 'color' column rather than 'colorIdentity')
colors = ['White','Blue','Black','Red','Green']
df_color = df.loc[df.color.isin(colors)]
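Before training a color classifier, it helps to know how balanced the classes are: always guessing the most common color gives a baseline accuracy that any model should beat. A sketch using only the colors of the first ten rows shown earlier (the real shares would come from `df_color.color.value_counts(normalize=True)`):

```python
import pandas as pd

# Colors of the first ten rows shown in the df.head(25) output above
colors = pd.Series(["Green", "Blue", "Black", "Green", "Black",
                    "Blue", "Blue", "Blue", "Red", "Red"], name="color")

shares = colors.value_counts(normalize=True)
majority_baseline = shares.max()  # accuracy of always guessing the most common color
print(shares)
print(f"majority-class baseline accuracy: {majority_baseline:.0%}")
```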

In [None]:
# collapse tribal and mis-capitalized type entries into one of the seven major card types
# (assumes a 'types' column, which the prepared data loaded above does not include)
type_merges = {
    'Tribal,Instant': 'Instant',
    'Tribal,Sorcery': 'Sorcery',
    'Tribal,Enchantment': 'Enchantment',
    'instant': 'Instant',
}
df['types'] = df['types'].replace(type_merges)

# remove remaining cards that are not exclusive to one of the seven card types
types = ['Creature','Instant','Sorcery','Enchantment','Land','Artifact','Planeswalker']
df_type = df.loc[df.types.isin(types)].copy()
df_type = df_type.drop(columns='colorIdentity', errors='ignore')
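Merging types is an exact-value substitution followed by a membership filter, so a dictionary passed to `Series.replace` can stand in for a chain of `np.where` calls. A sketch on made-up `types` values, including a multi-type entry that cannot be merged and is therefore dropped:

```python
import pandas as pd

type_merges = {
    "Tribal,Instant": "Instant",
    "Tribal,Sorcery": "Sorcery",
    "Tribal,Enchantment": "Enchantment",
    "instant": "Instant",
}
major_types = ["Creature", "Instant", "Sorcery", "Enchantment",
               "Land", "Artifact", "Planeswalker"]

# Hypothetical raw 'types' values
types = pd.Series(["Tribal,Instant", "Creature", "instant", "Artifact,Creature", "Land"])

merged = types.replace(type_merges)          # exact-match replacement only
kept = merged[merged.isin(major_types)]      # drop types that could not be merged
print(kept.tolist())
```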

In [None]:
df_color.head()

In [None]:
df_type.head()

In [None]:
df.shape

In [None]:
df = df[df.text.notna()]

In [None]:
df.shape
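With cleaned text in hand, Goal 1 is a multi-class text classification problem; the `LinearRegression` imported in the first cell is a regression model, so a classifier is the more natural fit. One common baseline is bag-of-words counts fed to multinomial Naive Bayes. The snippets and labels here are made up, and on the real data the inputs would presumably be `text_lemmatized` with `color` as the label (or the card type, for Goal 2):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, already-cleaned training snippets (not drawn from the dataset)
train_text = [
    "flying draw card", "counter target spell",
    "destroy target creature lose life", "creature get minusandminus",
    "deal damage any target", "first strike haste",
    "gain life flying", "exile target attacking creature",
    "search library land", "creature get plusandplus trample",
]
train_color = ["Blue", "Blue", "Black", "Black", "Red", "Red",
               "White", "White", "Green", "Green"]

# bag-of-words counts feeding a multinomial Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_text, train_color)

prediction = model.predict(["counter target spell draw card"])[0]
print(prediction)
```

On the real data, a held-out split (for example with `sklearn.model_selection.train_test_split`) and a comparison against the majority-class baseline would show whether the model learns anything beyond class frequencies.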