<hr style="border-width: 3px;">

### Magic Project

## Jorge Arturo Carvajal Siller

The goals of this project are:

1) Establish a synergy relation between any two cards of the game "Magic: The Gathering" (MTG)

2) Recommend cards that could be useful given a particular set of cards (using the previously established relation as a basis)
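
As a rough illustration of how the second goal could build on the first, the sketch below ranks candidate cards by their mean synergy with a given set of cards. The `recommend` function, the `synergy` callable, and the toy scores are hypothetical placeholders, not the relation this project ends up defining.

```python
from typing import Callable, Dict, Iterable, List


def recommend(deck: Iterable[str],
              pool: Iterable[str],
              synergy: Callable[[str, str], float],
              top_n: int = 3) -> List[str]:
    """Rank cards in `pool` by their mean synergy with the cards in `deck`."""
    deck = list(deck)
    if not deck:
        raise ValueError("deck must contain at least one card")
    scores: Dict[str, float] = {}
    for candidate in pool:
        if candidate in deck:
            continue  # do not recommend cards that are already in the deck
        scores[candidate] = sum(synergy(candidate, c) for c in deck) / len(deck)
    # Highest mean synergy first; ties are broken alphabetically
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]


# Hypothetical toy scores between placeholder cards, NOT derived from real data
toy_scores = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.1,
    frozenset({"D", "B"}): 0.4,
}


def toy_synergy(x: str, y: str) -> float:
    """Symmetric lookup; unknown pairs are assumed to have no synergy."""
    return toy_scores.get(frozenset({x, y}), 0.0)


result = recommend(deck=["A"], pool=["A", "B", "C", "D"],
                   synergy=toy_synergy, top_n=2)
print(result)
```

Averaging is only one possible aggregation; taking the maximum instead would favor cards that combine strongly with a single card of the set.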

Finding this kind of synergy can be complicated, considering that roughly 16,000 cards have been released over the game's history. It may be easy for someone who understands the game and has played for a long time, but it can be overwhelming for players who are just starting out or simply do not know many of the existing cards.

Examples of cards with good synergy:

<a href="http://www.manaleak.com/"><img src="http://www.manaleak.com/mtguk/files/2014/07/bestfriendteam2.png"></a>

<a href="http://www.manaleak.com/"><img src="http://www.manaleak.com/mtguk/files/2013/09/Quillspike-Infinite.jpg"></a>

While the examples above are fairly intuitive, since the payoff is immediate, some combinations are not easy to spot and may even take several turns to develop.

![](images/kessig-nyx.png)

Dataset used: 


In [1]:
import numpy as np 
import pandas as pd 
import pprint
import os
import sys
os.chdir('Data sets')
# Read the raw data
raw = pd.read_json("Magic/AllSets-x.json")

In [2]:
import operator
import numpy

The first data-cleaning step was to remove the joke sets (Unhinged and Unglued), since these cards are not legal in the game.

<a href="http://chameleonsden.com" title="Chameleon's Den - Video Games, Anime Collectibles and more!"><img src="http://chameleonsden.com/products/items/magic_the_gathering_unglued_card_chaos_confetti/Chaos_Confetti_card_from_Unglued.jpg" alt="Chameleon's Den - Video Games, Anime Collectibles and more!"></a>

In [3]:
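#raw = raw.drop(['pREL'],1)
# Drop the joke sets by column label. The keyword form is used because
# pandas 2.0 and later no longer accept the axis as a positional argument.
raw = raw.drop(columns=['UGL', 'UNH'])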



In [97]:
def stopWord(word):
    # Frequent words that say little about what a card's effect does
    stopWords = {"its","may","the","of","a","if","or","and","them",
                 "this","that","you","turn","those",
                 "be","at","your","to","creature","it","in","has",
                 "until","on","onto","when","where","an","as","for","with"}
    return word in stopWords
def colorString(colorId):
    colorString = ""
    
    if(colorId&1 == 1):
        colorString+="W"
    if(colorId&2 == 2):
        colorString+="B"
    if(colorId&4 == 4):
        colorString+="G"
    if(colorId&8 == 8):
        colorString+="R"
    if(colorId&16 == 16):
        colorString+="U"
    if (colorId == 0):
        colorString="Colorless"
    return colorString
def formatWord(word):
    word = word.lower()
    word = word.replace('.','')
    word = word.replace(',','')
    word = word.replace('"','')
    word = word.replace('=','')
    word = word.replace("'s",'')
    word = word.replace(':','')
    word = word.replace(';','')
    word = word.replace('(','')
    word = word.replace(')','')
    word = word.replace('[','')
    word = word.replace(']','')
    return word
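#raw.describe
sets = raw.keys()
cards = []
derps = {}

# Merge every set's cards into one dictionary keyed by card name.
# The loop variable is named setCode so it does not shadow the built-in set.
for setCode in sets:
    for card in raw[setCode].cards:
        derps[card['name']] = card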

Once all cards are in a single dictionary, we remove the cards that are not legal.

In [5]:
del derps['Shichifukujin Dragon']
del derps['1996 World Champion']
del derps['Ass Whuppin\'']

We will also remove basic lands, since they are not relevant when checking relations between cards.

In [100]:
deleteableEntries = []
# Remove basic lands
for card in derps.values():
    if card.get('supertypes') is not None and "Basic" in card["supertypes"]:
        deleteableEntries.append(card['name'])
for name in deleteableEntries:
    del derps[name]

We will now compute the following values for every card in the game:

- color
- power
- toughness

and from them derive the number of cards of each color, as well as the average power and toughness per color.

In [98]:
generalWords = {}
specificPower = {}
specificToughness = {}
specificWords = {}
specificCMC = {}

uniqueColors = {}
uniqueColors['Colorless'] = 0
for card in derps.values():
    uniqueColor = 0
    color = ""
    #check if card has a color
    if(card.get('colors') is None):
        #card is colorless
        ###TODO: Make a difference between colorless and devoid cards
        uniqueColors['Colorless'] +=1
        color = "Colorless"
    else:
        #Card has a color
        #Start building an identifier that represents the card's color uniquely
        for c in card['colors']:
            if(c == 'White'):
                uniqueColor+=1
            elif(c == 'Black'):
                 uniqueColor+=2
            elif(c == 'Green'):
                uniqueColor+=4
            elif(c == 'Red'):
                uniqueColor+=8
            elif(c == 'Blue'):
                uniqueColor+=16
        color = colorString(uniqueColor)
        #if(colorChido == 16):
        #    pprint.pprint(i['name'])
        if(uniqueColors.get(color) is None):
            uniqueColors[color] = 1
        else:
            uniqueColors[color]+=1
    #Check if card has effect text
    if(card.get('text') is not None):
        #Split the text into words
        text = card['text']
        text = text.replace("First strike","first-strike")
        text = text.replace("first strike","first-strike")
        text = text.replace("Double strike","double-strike")
        text = text.replace("double strike","double-strike")
        text = text.split()
        #Iterate over each word
        for word in text:
            #format word
            # formatWord(word) removes every character
            # that could alter the way the word is interpreted.
            # Example: '.', ',', '(',')', etc.
            word = formatWord(word)
            if(word == "his" or word == "her"):
                word = "their"
            # Check if word is a stop word
            # A stop word is a word that appears so often in the
            # language that it is, for the most part, unnecessary
            # to determine the value of an effect.
            # Example 'you', 'an', 'the', 'a', 'of', etc.
            if stopWord(word):
                continue
            # Now we add it to the list of general words (words shared by all cards)
            if(generalWords.get(word) is None):
                generalWords[word] = 1
            else:
                generalWords[word] += 1
            # and to the list of specific words (words shared by cards of the same color)
            if(specificWords.get(color) is None):
                specificWords[color] = {}
                
            if(specificWords[color].get(word) is None):
                specificWords[color][word] = 1
            else:
                specificWords[color][word] += 1
    #Check if card has Converted Mana Cost
    if(card.get('cmc') is not None):
        if (specificCMC.get(color) is None):
            specificCMC[color] = [int(card['cmc'])]
        else:
            specificCMC[color].append(int(card['cmc']))
    #Check if card has Power
    if(card.get('power') is not None):
        #check if power or toughness are not integers (e.g. */*+1)
        #All cards with power have toughness (creature cards)
        try:
            int(card['power'])
            int(card['toughness'])
        except ValueError:
            #If not, don't add it to the power and toughness array
            continue
        #else, add
        if(specificPower.get(color) is None):
            specificPower[color] = [int(card['power'])]
            specificToughness[color] = [int(card['toughness'])]
        else:
            specificPower[color].append(int(card['power']))
            specificToughness[color].append(int(card['toughness']))
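
The color identifier built in the loop above is a bitmask: each color owns one bit, so any combination of colors maps to a unique integer. The sketch below isolates that idea with two hypothetical helpers, `encode_colors` and `decode_colors`, that reuse the same bit assignment; it is an illustration, not a replacement for the cell above.

```python
from typing import Iterable

# Same bit assignment as the notebook loop: W=1, B=2, G=4, R=8, U=16
COLOR_BITS = {"White": 1, "Black": 2, "Green": 4, "Red": 8, "Blue": 16}
BIT_LETTERS = [(1, "W"), (2, "B"), (4, "G"), (8, "R"), (16, "U")]


def encode_colors(colors: Iterable[str]) -> int:
    """Fold a list of color names into one integer, one bit per color."""
    mask = 0
    for color in colors:
        # |= instead of += so a repeated color cannot corrupt the mask
        mask |= COLOR_BITS[color]
    return mask


def decode_colors(mask: int) -> str:
    """Turn a mask back into the letter code used as a dictionary key."""
    if mask == 0:
        return "Colorless"
    return "".join(letter for bit, letter in BIT_LETTERS if mask & bit)


# The order of the input colors does not matter
mask = encode_colors(["Blue", "Black"])
print(mask, decode_colors(mask))
```

Because the letters are always emitted in bit order, a Black and Blue card is keyed as `BU` regardless of the order in which the dataset lists its colors.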

# MTG card statistics

## Color classification:

- W, White: focuses on life gain, board control, creatures and tokens.

- B, Black: focuses on single-target control, discard and sacrifice.

- G, Green: focuses on mana, strong creatures and counters.

- R, Red: focuses on damage and board control.

- U, Blue: focuses on control through counterspells, creatures that are hard to remove, and card draw.

## In addition, some cards combine two or more colors. These typically combine the main characteristics of their colors.

BG, Black and Green (Golgari): cards centered on the graveyard, with creatures that benefit from destruction

BU, Black and Blue (Dimir): cards centered on total control (counterspells, discard, milling)

RU, Red and Blue (Izzet): cards centered on spells, both damage and counterspells

In [101]:
pprint.pprint("Cantidad de cartas por color: ")
pprint.pprint(sum(color for color in uniqueColors.values()))
pprint.pprint(uniqueColors)
cardStats = {}
print()
pprint.pprint("10 palabras mas usadas en cada color: ")
#we iterate over all color sets
for words in specificWords:
    pprint.pprint(words)
    sortedWords = sorted(specificWords[words].items(),key=operator.itemgetter(1))
    #we print the 10 most popular words in this set
    for i in range(10):
        pprint.pprint(sortedWords[-1-i])
    print()
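# uniqueColors is the dictionary filled in the previous cell
# (the original loop referred to it by the undefined name coloresUnicos)
for color in uniqueColors:
    cardStats[color] = {}
    cardStats[color]["# of cards: "] = uniqueColors[color]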
    sortedCMC = sorted(specificCMC[color])
    
    cardStats[color]["CMC"] = {"Mean CMC" : np.sum(specificCMC[color])/len(specificCMC[color]),
                               "Max CMC" : sortedCMC[-1],
                               "Min CMC" : sortedCMC[0],
                               "Median CMC" : sortedCMC[len(sortedCMC)//2] }
    sortedPower = sorted(specificPower[color])
    
    cardStats[color]["Power"] = {"Mean Power" : np.sum(specificPower[color])/len(specificPower[color]),
                                 "Max Power" : sortedPower[-1], 
                                 "Min Power" : sortedPower[0], 
                                 "Median Power" : sortedPower[len(sortedPower)//2] }
    sortedToughness = sorted(specificToughness[color])
    
    cardStats[color]["Toughness"] = {"Mean Toughness" : np.sum(specificToughness[color])/len(specificToughness[color]),
                                     "Max Toughness" : sortedToughness[-1],
                                     "Min Toughness" : sortedToughness[0],
                                     "Median Toughness" : sortedToughness[len(sortedToughness)//2] }
pprint.pprint(cardStats)

'Cantidad de cartas por color: '
16508
{'B': 2484,
 'BG': 99,
 'BGR': 29,
 'BGRU': 1,
 'BGU': 13,
 'BR': 153,
 'BRU': 31,
 'BU': 156,
 'Colorless': 2564,
 'G': 2493,
 'GR': 155,
 'GRU': 14,
 'GU': 98,
 'R': 2487,
 'RU': 99,
 'U': 2456,
 'W': 2504,
 'WB': 104,
 'WBG': 13,
 'WBGR': 1,
 'WBGRU': 22,
 'WBGU': 1,
 'WBR': 13,
 'WBRU': 1,
 'WBU': 29,
 'WG': 159,
 'WGR': 29,
 'WGRU': 1,
 'WGU': 30,
 'WR': 101,
 'WRU': 12,
 'WU': 156}

'10 palabras mas usadas en cada color: '
'W'
('target', 940)
('battlefield', 760)
('control', 728)
('creatures', 704)
('damage', 649)
('card', 567)
('end', 523)
('flying', 517)
('put', 488)
('with', 476)

'U'
('target', 1222)
('card', 1171)
('their', 772)
('battlefield', 652)
('flying', 642)
('player', 638)
('spell', 631)
('library', 613)
('control', 583)
('cards', 582)

'B'
('target', 1131)
('card', 1125)
('player', 863)
('battlefield', 747)
('life', 639)
('from', 639)
('graveyard', 581)
('end', 542)
('put', 537)
('creatures', 511)

'R'
('target', 1242)
('damage

Looking at the particular example of the most used words in Black-Blue cards:
  
('their', 121)

('card', 117)

('player', 91)

('target', 87)

('cards', 70)

('library', 58)

('hand', 51)

('graveyard', 47)

('from', 42)

('top', 40)


This matches the general archetype of decks in this color pair, which make players discard cards ('cards', 'hand', 'graveyard'), put cards from the top of the deck into the graveyard ('cards', 'from', 'top', 'library', 'graveyard') and control the opponent's cards ('target', 'their', 'player', 'cards').
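
The word counts discussed above can also be produced with `collections.Counter`. The sketch below is a minimal, self-contained version of that idea; the `top_words` helper, the reduced stop-word set and the two sample texts are invented for illustration and are not taken from the dataset.

```python
import re
from collections import Counter

# Small illustrative subset of the stop words used in the notebook
STOP = {"the", "of", "a", "you", "your", "to", "and"}


def top_words(texts, n=3):
    """Return the n most frequent non-stop words across a list of card texts."""
    counts = Counter()
    for text in texts:
        # Keep letters plus the symbols that appear inside terms such as +1/+1
        words = re.findall(r"[a-z'+/-]+", text.lower())
        counts.update(w for w in words if w not in STOP)
    return counts.most_common(n)


# Invented sample texts in the style of Black-Blue cards
sample = [
    "Target player discards a card.",
    "Target player puts the top two cards of their library into their graveyard.",
]
print(top_words(sample))
```

Grouping the texts by color key before calling such a helper would reproduce the per-color ranking shown earlier.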

In [9]:
#pprint.pprint(raw['10E']['cards'][0])
# Print name, text, power, toughness, types and subtypes for every card in the SOI set
for card in raw['SOI']['cards']:
    pprint.pprint("Name: " + card['name'])
    if card.get('text') is not None:
        pprint.pprint("Text: " + card['text'])
    if card.get('power') is not None:
        pprint.pprint("Power: " + card['power'])
    if card.get('toughness') is not None:
        pprint.pprint("Toughness: " + card['toughness'])

    if card.get('types') is not None:
        print('Types:')
        for x in card['types']:
            print("   '" + x + "' ")
    if card.get('subtypes') is not None:
        print('Subtypes:')
        for x in card['subtypes']:
            print("   '" + x + "' ")
    print()
    #if (not raw['pWOS']['cards'][i]['text']== raw['pWOS']['cards'][i]['originalText']):
    #    pprint.pprint("Text: " + raw['pWOS']['cards'][i]['originalText'])
    #pprint.pprint(raw['10E']['cards'][i]['name'])


'Name: Always Watching'
'Text: Nontoken creatures you control get +1/+1 and have vigilance.'
Types:
   'Enchantment' 

'Name: Angel of Deliverance'
('Text: Flying\n'
 'Delirium — Whenever Angel of Deliverance deals damage, if there are four '
 'or more card types among cards in your graveyard, exile target creature an '
 'opponent controls.')
'Power: 6'
'Toughness: 6'
Types:
   'Creature' 
Subtypes:
   'Angel' 

'Name: Angelic Purge'
('Text: As an additional cost to cast Angelic Purge, sacrifice a permanent.\n'
 'Exile target artifact, creature, or enchantment.')
Types:
   'Sorcery' 

'Name: Apothecary Geist'
('Text: Flying\n'
 'When Apothecary Geist enters the battlefield, if you control another Spirit, '
 'you gain 3 life.')
'Power: 2'
'Toughness: 3'
Types:
   'Creature' 
Subtypes:
   'Spirit' 

'Name: Archangel Avacyn'
('Text: Flash\n'
 'Flying, vigilance\n'
 'When Archangel Avacyn enters the battlefield, creatures you control gain '
 'indestructible until end of turn.\n'
 'When a

In [10]:
for i in derps.values():
    if (i.get('supertypes') is not None):
        if ("Basic" in i["supertypes"]):
            pprint.pprint(i)

In [71]:
pprint.pprint(derps['Scepter of Empires'])

{'artist': 'John Avon',
 'cmc': 3,
 'flavor': '"With this scepter, smite your enemies."\n—Scepter inscription',
 'foreignNames': [{'language': 'Chinese Simplified',
                   'multiverseid': 263093,
                   'name': '帝国权杖'},
                  {'language': 'Chinese Traditional',
                   'multiverseid': 263342,
                   'name': '帝國權杖'},
                  {'language': 'French',
                   'multiverseid': 263591,
                   'name': "Sceptre d'empires"},
                  {'language': 'German',
                   'multiverseid': 263840,
                   'name': 'Zepter der Kaiserreiche'},
                  {'language': 'Italian',
                   'multiverseid': 264089,
                   'name': 'Scettro degli Imperi'},
                  {'language': 'Japanese',
                   'multiverseid': 264338,
                   'name': '帝国の王笏'},
                  {'language': 'Portuguese (Brazil)',
         

## Ideas for a distance formula:

IDEAS TO IMPROVE ON THE DISTANCE FORMULA

Check sub-types
	each subtype against each other 
	(Example: Human soldiers are more related to other human soldiers, 
	compared to creatures that are only humans or soldiers)
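
One possible way to score the subtype idea above is the Jaccard index of the two subtype sets, which rewards cards that share all of their subtypes over cards that share only one. This is a sketch under that assumption; `subtype_similarity` is a hypothetical helper, not part of the notebook.

```python
def subtype_similarity(subtypes_a, subtypes_b):
    """Jaccard index of two subtype lists: shared subtypes / all subtypes."""
    a, b = set(subtypes_a), set(subtypes_b)
    if not a and not b:
        return 0.0  # two cards without subtypes are treated as unrelated
    return len(a & b) / len(a | b)


# Human Soldiers relate more to Human Soldiers than to plain Humans
same = subtype_similarity(["Human", "Soldier"], ["Human", "Soldier"])
partial = subtype_similarity(["Human", "Soldier"], ["Human"])
none = subtype_similarity(["Human"], ["Elf"])
print(same, partial, none)
```

A distance term for the formula could then be taken as one minus this similarity.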

Check related effects (binary relation between cards, can go both ways)

	-'sacrifice target creature' : 'when this creature is sacrificed'/'when this creature dies'

	-'cards that can tap themselves' : 'cards that can untap other cards'

	-'discard X cards' : 'when this card is discarded, do Y'/card has 'madness'

	-'gain X' : 'whenever you gain X, do Y'

	-card has 'fabricate' : card is a 'token', card is an 'artifact'

	-card has 'first strike' : 'target creature gains +X/+X'

	-card depends on artifacts : card is an 'artifact'
		(example: draw 2 cards, then discard 1 unless you control an artifact)

	-'return target card to your hand' : 'Whenever X comes into play, do Y' 

	-card has 'Landfall' : cards that can return lands to your hand

	- 'X you control gain Y' : cards of type 'X'
		(example: Elves you control gain +1/+1)

	- 'X has Y as long as you control Z' : cards of type 'Z'
		(example: Eldrazi Aggressor has haste as long as you control another colorless creature)

	- 'X can't do Y unless you control Z' : cards of type 'Z'
		(same as above)
	
	Something worth pointing out: cards can have slight variations in how they're worded, yet 
	they can still mean the same thing when it comes to checking properties. For example: 

	'sacrifice target creature' or 'sacrifice all creatures you control' or 'sacrifice target permanent'
	are all functionally the same if you are trying to find cards that sacrifice your creatures
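
The binary relations listed above could be approximated with pairs of regular expressions, one matching the enabling effect and one matching the effect that benefits from it. The patterns below are hypothetical and cover only three of the listed relations; real card wording varies more than this sketch handles.

```python
import re

# Hypothetical (enabler, payoff) pattern pairs for three of the relations above
EFFECT_PAIRS = [
    # sacrifice effects <-> "when this is sacrificed / dies" triggers
    (re.compile(r"\bsacrifice\b.*\b(creature|creatures|permanent)\b"),
     re.compile(r"\bwhen(ever)? .* (is sacrificed|dies)\b")),
    # discard effects <-> madness or "when this card is discarded" triggers
    (re.compile(r"\bdiscards?\b"),
     re.compile(r"\bmadness\b|\bis discarded\b")),
    # life gain <-> "whenever you gain life" triggers
    (re.compile(r"\byou gain\b.*\blife\b"),
     re.compile(r"\bwhenever you gain life\b")),
]


def related(text_a, text_b):
    """True if one text matches an enabler and the other its payoff (either way)."""
    a, b = text_a.lower(), text_b.lower()
    for enabler, payoff in EFFECT_PAIRS:
        if (enabler.search(a) and payoff.search(b)) or \
           (enabler.search(b) and payoff.search(a)):
            return True
    return False


# The three wordings mentioned above all match the same enabler pattern
variants = [
    "Sacrifice target creature.",
    "Sacrifice all creatures you control.",
    "Sacrifice target permanent.",
]
trigger = "When this creature dies, draw a card."
print([related(v, trigger) for v in variants])
```

Keeping the patterns in a table makes it possible to add a relation without touching the matching logic.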

Combinations that aren't so easy to process. These aren't so much of a binary relation,
but rather a relation of a card to a set of cards (in this case, the set of cards is the library/deck):

	-cards that have 'Dredge' or 'Scavenge' are good on 'mill' decks (decks that constantly dump cards from your library into your graveyard)

	-cards that give you extra lands, or generate extra mana, are good on ramp decks (decks that rely on increasing your mana curve early into the game)

	-cards that burn life (even at a cost to yourself) are good on burn decks that rely on winning fast

	-cards that make you discard (usually the worst thing in a TCG) can be good on black decks, with creatures that prefer
	being on the discard pile, or trigger their effects when discarded (creatures with 'madness', for example)

	-cards with unusual, typically useless power/toughness spreads (Example: 0/7 creature for 3 mana) can be good on decks that can buff stats, or invert them

	-cards that gain life aren't very good, unless there are monsters or spells that benefit from gaining/having a lot of health ('Whenever you gain life, put a +1/+1 counter on this monster' is an example of this)
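
A card-to-deck relation like the ones above could be sketched by measuring how much of the deck supports the archetype a card pays off in. Everything in the block below is an assumption made for illustration: the `ARCHETYPES` table, its marker strings and the `deck_fit` helper are hypothetical, and the deck texts are invented.

```python
# Hypothetical table: text fragments that suggest a deck archetype, and
# keywords on a single card that pay off inside that archetype.
ARCHETYPES = {
    "mill": {
        "deck_markers": ["into your graveyard", "into their graveyard"],
        "card_keywords": ["dredge", "scavenge"],
    },
    "discard": {
        "deck_markers": ["discard"],
        "card_keywords": ["madness"],
    },
    "lifegain": {
        "deck_markers": ["whenever you gain life"],
        "card_keywords": ["you gain"],
    },
}


def archetype_share(deck_texts, markers):
    """Fraction of deck cards whose text contains at least one marker."""
    if not deck_texts:
        return 0.0
    hits = sum(any(m in text.lower() for m in markers) for text in deck_texts)
    return hits / len(deck_texts)


def deck_fit(card_text, deck_texts):
    """Best archetype share among the archetypes this card pays off in."""
    text = card_text.lower()
    best = 0.0
    for spec in ARCHETYPES.values():
        if any(keyword in text for keyword in spec["card_keywords"]):
            share = archetype_share(deck_texts, spec["deck_markers"])
            best = max(best, share)
    return best


# Invented deck: two of its four cards put library cards into a graveyard
deck = [
    "Put the top three cards of your library into your graveyard.",
    "Draw a card.",
    "Target player puts the top two cards of their library into their graveyard.",
    "Flying",
]
print(deck_fit("Dredge 3", deck), deck_fit("Flying", deck))
```

Unlike a pairwise relation, this score depends on the whole deck, so the same card can rate differently in two decks.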

# THE REPORT HAS NOT BEEN DEVELOPED BEYOND THIS POINT

In [11]:
# Aux variables
keeps = ['name', 'colors', 'types', 'cmc', 'power', 'toughness']

# Data fusion
mtg = []
for col in raw.columns.values:
    release = pd.DataFrame(raw[col]['cards'])
    # reindex keeps the wanted columns and fills any column missing from a set
    # with NaN (.loc raises a KeyError for missing labels in recent pandas versions)
    release = release.reindex(columns=keeps)
    mtg.append(release)
mtg = pd.concat(mtg)

# Convert 'power' and 'toughness' to numeric values
mtg[['power','toughness']] = mtg[['power','toughness']].apply(pd.to_numeric, errors='coerce')

# Drop rows with null values
mtg = mtg.dropna()

# Reduce 'colors' and 'types' to their first value
f = lambda x: x[0]
mtg['colors'] = mtg['colors'].map(f)
mtg['types'] = mtg['types'].map(f)

# Take a sample and describe the data
mtgs = mtg.sample(10)
print(mtgs)
print(mtgs.describe())

                            name colors     types  cmc  power  toughness
256         Thousand-legged Kami  Green  Creature  8.0    6.0        6.0
48                Swamp Mosquito  Black  Creature  2.0    0.0        1.0
110     Jiwari, the Earth Aflame    Red  Creature  5.0    3.0        3.0
46             Aspiring Aeronaut   Blue  Creature  4.0    1.0        2.0
86            Fangren Pathcutter  Green  Creature  6.0    4.0        6.0
35     Erayo, Soratami Ascendant   Blue  Creature  2.0    1.0        1.0
13                         Guile   Blue  Creature  6.0    6.0        6.0
228  Rofellos, Llanowar Emissary  Green  Creature  2.0    2.0        1.0
166           Vorosh, the Hunter   Blue  Creature  6.0    6.0        6.0
99                Kragma Butcher    Red  Creature  3.0    2.0        3.0
           cmc      power  toughness
count  10.0000  10.000000   10.00000
mean    4.4000   3.100000    3.50000
std     2.1187   2.282786    2.27303
min     2.0000   0.000000    1.00000
25%     2.25

In [12]:
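# Imports needed by the clustering and plotting calls at the end of this cell;
# the recorded run below fails with a NameError because they were missing.
import matplotlib.pyplot as plt
from scipy.spatial import distance
from scipy.cluster.hierarchy import linkage, dendrogram

def gower(u, v, R):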
    suma = 0
    if (u["colors"] != v["colors"]):
        suma += 1
    if (u["types"] != v["types"]):
        suma += 1
    #for t1 in u["subtypes"]:
    #    for t2 in v["subtypes"]:
    #        if (t1 == t2):
    #            suma += 1
    
    suma += np.abs(u["cmc"] - v["cmc"])/R['cmc']
    suma += np.abs(u["power"] - v["power"])/R['power']
    suma += np.abs(u["toughness"] - v["toughness"])/R['toughness']
    return suma

dms = np.zeros((mtgs['cmc'].count(), mtgs['cmc'].count()))
Rs = mtgs[['cmc', 'power','toughness']].max() - mtgs[['cmc', 'power','toughness']].min()
x = 0
for iu, u in mtgs.iterrows():
    y = 0
    for iv, v in mtgs.iterrows():
        dms[x][y] = gower(u, v, Rs)
        y += 1
    x += 1
print("\nMatriz redundante de distancias\n", dms)

#Convert to a condensed matrix
cdm = distance.squareform(dms)
print('\nMatriz condensada\n', cdm)

ZM = linkage(cdm, 'single')
print("\nEnlazamientos simples\n", ZM)

plt.figure(figsize=(12, 5))
dendrogram(ZM, labels = mtgs["name"].tolist(), leaf_rotation=75)
plt.show()


Matriz redundante de distancias
 [[ 0.          4.          2.6         3.3         0.66666667  3.83333333
   1.33333333  2.66666667  1.33333333  3.1       ]
 [ 4.          0.          2.4         1.7         3.33333333  1.16666667
   3.66666667  1.33333333  3.66666667  1.9       ]
 [ 2.6         2.4         0.          1.7         1.93333333  2.23333333
   2.26666667  2.06666667  2.26666667  0.5       ]
 [ 3.3         1.7         1.7         0.          2.63333333  0.53333333
   1.96666667  1.7         1.96666667  1.53333333]
 [ 0.66666667  3.33333333  1.93333333  2.63333333  0.          3.16666667
   1.33333333  2.          1.33333333  2.43333333]
 [ 3.83333333  1.16666667  2.23333333  0.53333333  3.16666667  0.          2.5
   1.16666667  2.5         1.73333333]
 [ 1.33333333  3.66666667  2.26666667  1.96666667  1.33333333  2.5         0.
   3.33333333  0.          2.76666667]
 [ 2.66666667  1.33333333  2.06666667  1.7         2.          1.16666667
   3.33333333  0.          3.333

NameError: name 'distance' is not defined

In [None]:
mtgl = mtg.sample(1000)
dml = np.zeros((mtgl['cmc'].count(), mtgl['cmc'].count()))
Rl = mtgl[['cmc', 'power','toughness']].max() - mtgl[['cmc', 'power','toughness']].min()
x = 0
for iu, u in mtgl.iterrows():
    y = 0
    for iv, v in mtgl.iterrows():
        dml[x][y] = gower(u, v, Rl)
        y += 1
    x += 1

cdml = distance.squareform(dml)
ZMl = linkage(cdml, 'single')

plt.figure(figsize=(12, 5))
dendrogram(ZMl, no_labels = True)
plt.show()
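
The nested `iterrows` loops above evaluate `gower` once per pair, which may become slow for the 1,000-card sample. The sketch below computes the same kind of distance with NumPy broadcasting. The `gower_matrix` helper and its dictionary-of-columns input are assumptions made for this example; the three toy rows reuse values from the sample printed earlier.

```python
import numpy as np


def gower_matrix(columns, categorical, numeric):
    """Pairwise distance: one unit per categorical mismatch plus the
    range-normalized absolute difference of each numeric column."""
    n = len(next(iter(columns.values())))
    dist = np.zeros((n, n))
    for name in categorical:
        values = np.asarray(columns[name], dtype=object)
        # Broadcasting compares every row against every other row at once
        dist += (values[:, None] != values[None, :]).astype(float)
    for name in numeric:
        values = np.asarray(columns[name], dtype=float)
        value_range = values.max() - values.min()
        if value_range == 0:
            continue  # a constant column adds no distance and avoids 0/0
        dist += np.abs(values[:, None] - values[None, :]) / value_range
    return dist


# Three rows taken from the sample above: Swamp Mosquito, Guile, Vorosh
toy = {
    "colors": ["Black", "Blue", "Blue"],
    "types": ["Creature", "Creature", "Creature"],
    "cmc": [2, 6, 6],
    "power": [0, 6, 6],
    "toughness": [1, 6, 6],
}
d = gower_matrix(toy, categorical=["colors", "types"],
                 numeric=["cmc", "power", "toughness"])
print(d)
```

The result is a symmetric matrix with a zero diagonal, the form that `distance.squareform` expects before the linkage step.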

### 