## Goals:

**1. Determine drivers of sentiment in the flavor text in Magic: The Gathering cards.**

**2. Develop a model to predict the sentiment of flavor text in Magic: The Gathering cards.**

In [1]:
# imports and display options

import pandas as pd
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import seaborn as sns
import matplotlib.pyplot as plt

import prepare as p

# show full flavor text; None replaces the deprecated -1 (pandas >= 1.0)
pd.set_option('display.max_colwidth', None)

# Acquire

1. A CSV containing an up-to-date breakdown of every Magic card printed so far was obtained from MTGJSON.com. Each row represents a card or a version of a card. The dataframe contained 50,412 rows and 71 columns.

2. The CSV was read into a pandas dataframe.
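
A minimal sketch of the load step. The helper name `load_cards` and the in-memory sample are illustrative assumptions, not part of the original notebook; the real input would be the path to the MTGJSON export.

```python
import io
import pandas as pd

# Columns this notebook keeps later, in the Prepare section.
KEEP_COLUMNS = ["colorIdentity", "types", "convertedManaCost",
                "rarity", "flavorText", "isPaper"]

def load_cards(source, columns=None):
    """Read the card export into a dataframe.

    `source` is a path or file-like object accepted by pd.read_csv.
    `columns=None` reads every column; pass a list to read a subset.
    """
    # low_memory=False makes pandas infer dtypes from the whole file at once
    return pd.read_csv(source, usecols=columns, low_memory=False)

# Tiny in-memory stand-in for the real file, for illustration only.
sample = io.StringIO(
    "name,colorIdentity,types,convertedManaCost,rarity,flavorText,isPaper\n"
    "Example Card,G,Creature,5.0,common,Some flavor text.,1\n"
)
cards = load_cards(sample, columns=KEEP_COLUMNS)
print(cards.shape)
```

Reading only the needed columns is optional; it mainly saves memory on a 71-column file.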

# Prepare

1. Restricted dataframe to only the columns I considered relevant (colorIdentity, types, convertedManaCost, rarity, flavorText, isPaper)
 
2. Restricted dataframe to only rows containing cards that exist in physical form

3. Restricted dataframe to only rows containing a flavor text

4. Restricted dataframe to only rows with a single color-identity

5. Merged rows with similar or overlapping types into one of the seven major game types

6. Restricted dataframe to include only rows with a single type belonging to one of the major game types

7. Cleaned up flavor text, then aggregated on flavorText in an attempt to eliminate duplicates. This gave me some success. However, it is likely that a few duplicates remain.

8. Reordered columns

9. Restricted dataframe to rows with English flavor text 

10. Dropped rows with duplicates I happened to spot

11. Added sentiment column showing compound sentiment score using VADER

12. Added intensity column showing the absolute value of the compound sentiment score 
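
Steps 11 and 12 can be sketched as follows. This is not the code in the `prepare` module, which is not shown here; `make_vader_scorer` and `add_sentiment_columns` are hypothetical helper names. The only library call relied on is VADER's `SentimentIntensityAnalyzer.polarity_scores`, whose `compound` entry ranges from -1 to 1.

```python
import pandas as pd

def make_vader_scorer():
    """Build a text -> compound-score function backed by VADER."""
    # Imported lazily so add_sentiment_columns can be used with any scorer.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]

def add_sentiment_columns(df, scorer, text_col="flavorText"):
    """Return a copy of df with `sentiment` and `intensity` columns added."""
    out = df.copy()
    # sentiment: signed score for each flavor text
    out["sentiment"] = out[text_col].astype(str).apply(scorer)
    # intensity: strength of the sentiment regardless of direction
    out["intensity"] = out["sentiment"].abs()
    return out
```

With the package installed, usage would look like `df = add_sentiment_columns(df, make_vader_scorer())`. Passing the scorer as an argument keeps the column logic easy to check with a stand-in function.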

In [2]:
# load and prepare data
#df = p.prepare_mgt(p.wrangle_mtg())

In [3]:
#df.to_csv('mtgprep.csv', index=False)

In [4]:
df = pd.read_csv('mtgprep.csv')

In [5]:
df = df.drop([7968, 6562])  # drop duplicates found by manual inspection

In [6]:
df.head(5)

Unnamed: 0,colorIdentity,types,convertedManaCost,rarity,flavorText,sentiment,intensity
0,Green,Creature,5.0,common,""" . . . And the third little boar built his house out of rootwalla plates . . . .""",0.0,0.0
1,Black,Creature,1.0,common,""" . . . Cao Pi, Cao Rui, Fang, Mao, and briefly, Huan— The Sima took the empire in their turn. . . .""",0.0,0.0
2,Blue,Creature,5.0,uncommon,""" . . . When the trees bow down their heads, The wind is passing by.""",0.0,0.0
3,White,Creature,4.0,uncommon,""" . . . and you must also apply for an application license, file documents 136(iv) and 22-C and -D in triplicate, pay all requisite fees, request a . . .""",-0.1027,0.1027
4,Green,Creature,4.0,common,"""'Air superiority?' Not while our archers scan the skies.""",0.0,0.0


In [18]:
# cast to str, not '|S': byte strings never compare equal to literals like 'White'
df['colorIdentity'] = df.colorIdentity.astype(str)
df['types'] = df.types.astype(str)
df['rarity'] = df.rarity.astype(str)
df['flavorText'] = df.flavorText.astype(str)

# Explore

### What does the data look like?

In [19]:
df.shape

(12448, 7)

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12448 entries, 0 to 12449
Data columns (total 7 columns):
colorIdentity        12448 non-null object
types                12448 non-null object
convertedManaCost    12448 non-null float64
rarity               12448 non-null object
flavorText           12448 non-null object
sentiment            12448 non-null float64
intensity            12448 non-null float64
dtypes: float64(3), object(4)
memory usage: 778.0+ KB


In [None]:
df.describe()

In [None]:
df.sort_values('sentiment').head(10)

In [None]:
df.sort_values('sentiment',ascending=False).head(10)

### How balanced is the data?

In [None]:
for column in df.columns:

    print(f'{column} value counts')
    print(df[column].value_counts())
    print('')

In [26]:
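# colorIdentity is categorical, so plot the number of cards per color;
# calling .plot() on the raw column raises "TypeError: no numeric data to plot"
df.colorIdentity.value_counts().plot(kind='bar')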

In [None]:
df[df.sentiment==0].count()/df.sentiment.count()

# remove 7968 6562

In [None]:
colors = ['White','Blue','Black','Red','Green']

for color in colors:

    number = df[df.colorIdentity==f'{color}'].sentiment.mean()
      
    print(f'{color}: {number}')

In [None]:
colors = ['White','Blue','Black','Red','Green']

for color in colors:

    number = df[df.colorIdentity==f'{color}'].intensity.mean()
      
    print(f'{color}: {number}')

In [None]:
colors = ['White','Blue','Black','Red','Green']

for color in colors:

    number = df[df.colorIdentity==f'{color}'].intensity.median()
      
    print(f'{color}: {number}')

In [None]:
colors = ['White','Blue','Black','Red','Green']

for color in colors:

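    # combine both conditions in one mask; chained masks are misaligned and trigger a reindexing warning
    number = df[(df.colorIdentity == color) & (df.sentiment != 0)].sentiment.median()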
      
    print(f'{color}: {number}')

In [None]:
df[df.sentiment==0].colorIdentity.value_counts()

In [None]:
rarity = ['common','uncommon','rare','mythic']

for grade in rarity:

    number = df[df.rarity==f'{grade}'].sentiment.mean()
      
    print(f'{grade}: {number}')

In [None]:
rarity = ['common','uncommon','rare','mythic']

for grade in rarity:

    number = df[df.rarity==f'{grade}'].intensity.median()
      
    print(f'{grade}: {number}')

In [None]:
types = ['Artifact','Creature','Enchantment','Land','Planeswalker','Instant','Sorcery']

for group in types:

    number = df[df.types==f'{group}'].sentiment.mean()
      
    print(f'{group}: {number}')

In [None]:
types = ['Artifact','Creature','Enchantment','Land','Planeswalker','Instant','Sorcery']

for group in types:

    number = df[df.types==f'{group}'].intensity.mean()
      
    print(f'{group}: {number}')

In [None]:
costs = [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0]

for cost in costs: 

    number = df[df.convertedManaCost == cost].sentiment.mean()

    number2 = df[df.convertedManaCost == cost].intensity.mean()
      
    print(f'{cost}: {number}  {number2}')
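
The per-color, per-rarity, per-type, and per-cost loops above can also be written as one grouped aggregation. This is a sketch and assumes pandas 0.25 or later for named aggregation. `summarize_by` is a hypothetical helper name, and the toy dataframe only illustrates the shape of the result.

```python
import pandas as pd

def summarize_by(df, column):
    """Row count plus mean and median of sentiment and intensity per group."""
    return df.groupby(column).agg(
        n=("sentiment", "size"),
        sentiment_mean=("sentiment", "mean"),
        sentiment_median=("sentiment", "median"),
        intensity_mean=("intensity", "mean"),
        intensity_median=("intensity", "median"),
    )

# Toy data with the same column names as the prepared dataframe.
demo = pd.DataFrame({
    "colorIdentity": ["Red", "Red", "Blue"],
    "sentiment": [0.5, -0.5, 0.2],
    "intensity": [0.5, 0.5, 0.2],
})
summary = summarize_by(demo, "colorIdentity")
print(summary)
```

On the real data, `summarize_by(df, 'rarity')` or `summarize_by(df, 'convertedManaCost')` would cover the other loops. Unlike the hard-coded cost list, this includes every value present in the column. The row count also makes it easier to judge how much weight a group mean deserves.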

# Look at a frequency distribution of total cards into sentiment and intensity buckets

# Examine frequency of positive and negative sentiment
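
One way to approach both items, sketched under assumptions: the ±0.05 cutoffs for positive and negative are the convention commonly used with VADER's compound score, not something this notebook has established. The helper names are hypothetical, as is the choice of equal-width buckets.

```python
import numpy as np
import pandas as pd

def label_sentiment(score, threshold=0.05):
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def bucket_counts(scores, low, high, bins):
    """Count values falling into equal-width buckets spanning [low, high]."""
    edges = np.linspace(low, high, bins + 1)
    buckets = pd.cut(scores, bins=edges, include_lowest=True)
    return buckets.value_counts(sort=False)  # keep buckets in numeric order

# Toy scores for illustration only.
demo = pd.DataFrame({"sentiment": [-0.8, 0.0, 0.0, 0.03, 0.6]})
demo["intensity"] = demo["sentiment"].abs()

label_counts = demo["sentiment"].map(label_sentiment).value_counts()
sentiment_hist = bucket_counts(demo["sentiment"], -1.0, 1.0, bins=4)
intensity_hist = bucket_counts(demo["intensity"], 0.0, 1.0, bins=4)
print(label_counts)
print(sentiment_hist)
print(intensity_hist)
```

A large share of flavor texts score exactly 0, so it may be worth reporting the neutral count separately before comparing positive and negative frequencies.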