<hr style="border-width: 3px;">

### Magic Project

## Jorge Arturo Carvajal Siller

The goals of this project are:

1) Establish a synergy relation between any two cards of the game "Magic: The Gathering" (MTG).

2) Recommend cards that could be useful given a particular set of cards (using the previously established relation as a basis).

Finding this kind of synergy can be complicated, considering that roughly 16,000 cards have been released over the game's history. It may be easy for someone who understands the game and has played for a long time, but it can be overwhelming for players who are just starting out or who simply do not know much about the cards that exist.

Examples of cards with good synergy:

<a href="http://www.manaleak.com/"><img src="http://www.manaleak.com/mtguk/files/2014/07/bestfriendteam2.png"></a>

<a href="http://www.manaleak.com/"><img src="http://www.manaleak.com/mtguk/files/2013/09/Quillspike-Infinite.jpg"></a>

While the examples above are fairly intuitive, since the payoff is immediate, some combinations are not easy to spot and can even take several turns to develop.

![](images/kessig-nyx.png)

Dataset used:


In [1]:
import numpy as np 
import pandas as pd 
import pprint
import os
import sys
os.chdir('Data sets')
# Read the raw data
raw = pd.read_json("Magic/AllSets-x.json")

In [2]:
import operator
import numpy

The first data-cleaning step was to remove the joke sets (Unhinged and Unglued), since those cards are not legal in the game.

<a href="http://chameleonsden.com" title="Chameleon's Den - Video Games, Anime Collectibles and more!"><img src="http://chameleonsden.com/products/items/magic_the_gathering_unglued_card_chaos_confetti/Chaos_Confetti_card_from_Unglued.jpg" alt="Chameleon's Den - Video Games, Anime Collectibles and more!"></a>

In [3]:
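#raw = raw.drop(columns=['pREL'])
# Drop the joke sets; the axis is passed by keyword because recent pandas
# versions no longer accept it as a positional argument
raw = raw.drop(columns=['UGL'])

raw = raw.drop(columns=['UNH'])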



In [4]:
def stopWord(word):
    stopWords = ["its","may","the","of","a","if","or","and","them",
                 "this","that","you","this","turn","those",
                 "be","at","your","to","creature","it","in","has",
                 "until","on","onto","when","where","an","as","for","with"]
    if word in stopWords:
        return True
    return False
def colorString(colorId):
    colorString = ""
    
    if(colorId&1 == 1):
        colorString+="W"
    if(colorId&2 == 2):
        colorString+="B"
    if(colorId&4 == 4):
        colorString+="G"
    if(colorId&8 == 8):
        colorString+="R"
    if(colorId&16 == 16):
        colorString+="U"
    if (colorId == 0):
        colorString="Colorless"
    return colorString
def formatWord(word):
    word = word.lower()
    word = word.replace('.','')
    word = word.replace(',','')
    word = word.replace('"','')
    word = word.replace('=','')
    word = word.replace("'s",'')
    word = word.replace(':','')
    word = word.replace(';','')
    word = word.replace('(','')
    word = word.replace(')','')
    word = word.replace('[','')
    word = word.replace(']','')
    return word
#raw.describe
sets = raw.keys()
cards = []
derps = {}

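# Collect every card into one dictionary keyed by name
# (a later printing of the same card overwrites an earlier one)
for setCode in sets:
    for card in raw[setCode].cards:
        derps[card['name']] = card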

Once all the cards are in a single dictionary, we remove cards that are not legal.

In [5]:
del derps['Shichifukujin Dragon']
del derps['1996 World Champion']
del derps['Ass Whuppin\'']

We also remove the basic lands, since they are not relevant when checking relations between cards.

In [6]:
deleteableEntries = []
# Remove basic lands
for card in derps.values():
    if card.get('supertypes') is not None and "Basic" in card["supertypes"]:
        deleteableEntries.append(card['name'])
for name in deleteableEntries:
    del derps[name]

We now compute the following values for every card in the game:

- color
- power
- toughness

and from them obtain the number of cards of each color, as well as the average power and toughness per color.

In [7]:
generalWords = {}
specificPower = {}
specificToughness = {}
specificWords = {}
specificCMC = {}
ngrams = {}
ngramN = 2
ngramV = 1
cardsByNgram = {}

uniqueColors = {}
uniqueColors['Colorless'] = 0
for card in derps.values():
    uniqueColor = 0
    color = ""
    #check if card has a color
    if(card.get('colors') is None):
        #card is colorless
        ###TODO: Make a difference between colorless and devoid cards
        uniqueColors['Colorless'] +=1
        color = "Colorless"
    else:
        #Card has a color
        #Start building an identifier that represents the card's color uniquely
        for c in card['colors']:
            if(c == 'White'):
                uniqueColor+=1
            elif(c == 'Black'):
                 uniqueColor+=2
            elif(c == 'Green'):
                uniqueColor+=4
            elif(c == 'Red'):
                uniqueColor+=8
            elif(c == 'Blue'):
                uniqueColor+=16
        color = colorString(uniqueColor)
        #if(colorChido == 16):
        #    pprint.pprint(i['name'])
        if(uniqueColors.get(color) is None):
            uniqueColors[color] = 1
        else:
            uniqueColors[color]+=1
    #Check if card has effect text
    if(card.get('text') is not None):
        #Split the text into words
        text = card['text']
        text = text.replace("First strike","first-strike")
        text = text.replace("first strike","first-strike")
        text = text.replace("Double strike","double-strike")
        text = text.replace("double strike","double-strike")
        text = text.replace(card['name'],"" )
        text = text.split()
        ngram = ""
        i = 1
        #Iterate over each word
        for word in text:
            # Format the word:
            # formatWord(word) removes every character
            # that could alter the way the word is interpreted.
            # Example: '.', ',', '(', ')', etc.
            word = formatWord(word)
            if(word == "his" or word == "her"):
                word = "their"
            # Check if the word is a stop word.
            # A stop word is a word that is repeated too often
            # in the language and is, for the most part, unnecessary
            # to determine the value of an effect.
            # Example: 'you', 'an', 'the', 'a', 'of', etc.
            if stopWord(word):
                continue
            # Build the n-gram: every ngramN-th kept word completes one, so the n-grams do not overlap
            if(i%ngramN != 0):
                ngram += word
            else:
                ngram += " " 
                ngram += word
                if(ngrams.get(ngram) is None):
                    ngrams[ngram] = 1
                else:
                    ngrams[ngram]+=1
                if(cardsByNgram.get(ngram) is None):
                    cardsByNgram[ngram] = [card]
                else:
                    cardsByNgram[ngram].append(card)
                ngram = ""
            i=i+1
            if(generalWords.get(word) is None):
                generalWords[word] = 1
            else:
                generalWords[word] += 1
            # and to the list of specific words (words shared by cards of the same color)
            if(specificWords.get(color) is None):
                specificWords[color] = {}
                
            if(specificWords[color].get(word) is None):
                specificWords[color][word] = 1
            else:
                specificWords[color][word] += 1
    #Check if card has Converted Mana Cost
    if(card.get('cmc') is not None):
        if (specificCMC.get(color) is None):
            specificCMC[color] = [int(card['cmc'])]
        else:
            specificCMC[color].append(int(card['cmc']))
    #Check if card has Power
    if(card.get('power') is not None):
        # Check whether power or toughness are not integers (e.g. */*+1)
        #All cards with power have toughness (creature cards)
        try:
            int(card['power'])
            int(card['toughness'])
        except ValueError:
            #If not, don't add it to the power and toughness array
            continue
        #else, add
        if(specificPower.get(color) is None):
            specificPower[color] = [int(card['power'])]
            specificToughness[color] = [int(card['toughness'])]
        else:
            specificPower[color].append(int(card['power']))
            specificToughness[color].append(int(card['toughness']))
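The cell above builds its 2-grams by pairing consecutive kept words after stop-word removal, so the pairs do not overlap (kept words 1-2, 3-4, and so on). The sketch below reproduces that pairing on a single rules text and compares it with a sliding window. The helper names `paired_bigrams` and `sliding_bigrams`, the shortened stop-word list, and the simplified punctuation stripping are assumptions made for illustration, not part of the notebook.

```python
# Shortened stop-word list (a subset of the notebook's list), for illustration only
STOP_WORDS = {"the", "of", "a", "to", "your", "you", "it", "in", "an", "creature"}

def kept_words(text, stop_words=STOP_WORDS):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    words = []
    for raw_word in text.split():
        word = raw_word.lower().strip('.,:;()"')
        if word and word not in stop_words:
            words.append(word)
    return words

def paired_bigrams(text):
    """Non-overlapping pairs (0,1), (2,3), ...; a trailing unpaired word is dropped."""
    words = kept_words(text)
    return [words[i] + " " + words[i + 1] for i in range(0, len(words) - 1, 2)]

def sliding_bigrams(text):
    """Overlapping pairs (0,1), (1,2), ...; every adjacent pair is produced."""
    words = kept_words(text)
    return [words[i] + " " + words[i + 1] for i in range(len(words) - 1)]

example = "Return target creature card from your graveyard to your hand."
print(paired_bigrams(example))
print(sliding_bigrams(example))
```

With non-overlapping pairs, whether a phrase such as "from graveyard" is captured depends on how many kept words precede it; a sliding window does not have that alignment sensitivity, at the cost of producing more n-grams.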

# MTG card statistics

## Color classification:

- W, White: focuses on life gain, board control, creatures, and tokens.

- B, Black: focuses on single-target control, discard, and sacrifice.

- G, Green: focuses on mana, strong creatures, and counters.

- R, Red: focuses on damage and board control.

- U, Blue: focuses on control through counterspells, hard-to-remove creatures, and card draw.

## In addition, some cards combine two or more colors. These typically combine the main characteristics of their colors.

BG, Black and Green (Golgari): cards centered on the graveyard, benefiting creatures through destruction

BU, Black and Blue (Dimir): cards centered on total control (counterspells, discard, milling)

RU, Red and Blue (Izzet): cards centered on spells, both damage and counterspells

In [8]:
import copy

In [9]:
percentageWords = []
pprint.pprint("Cantidad de cartas por color: ")
pprint.pprint(sum(color for color in uniqueColors.values()))
pprint.pprint(uniqueColors)
cardStats = {}
print()
print()


sortedNgrams = sorted(ngrams.items(),key=operator.itemgetter(1))

pprint.pprint("N-gramas de tamaño 2 encontrados: " + str(len(sortedNgrams)))
pprint.pprint("Numero total de n-gramas: " + str(np.sum(list(ngrams.values()))))
averageNgramSize = np.sum(list(ngrams.values()))/len(ngrams)//1
pprint.pprint("ocurrencia promedio para un n-grama: " + str( averageNgramSize ) )
aux = copy.copy(ngrams)
for ngram in aux:
    if(ngrams[ngram] < averageNgramSize):
        ngrams.pop(ngram,None)
sortedNgrams = sorted(ngrams.items(),key=operator.itemgetter(1))
pprint.pprint("N-gramas  con mas de " + str(averageNgramSize) + " ocurrencias de tamaño 2 encontrados: " + str(len(sortedNgrams)))

pprint.pprint(" 100 N-gramas mas usuales: ")
for i in range(100):
    pprint.pprint(sortedNgrams[-1-i])
    
print()
pprint.pprint("10 palabras mas usadas en cada color: ")
print()

# Iterate over all color sets (the `break` below currently disables this loop)
for words in specificWords:
    break
    pprint.pprint("Color " + words)
    print()
    sortedWords = sorted(specificWords[words].items(),key=operator.itemgetter(1))
    for word,frecuency in sortedWords:
        percentageWords.append([word,str(int(frecuency/uniqueColors[words]*100))+"%"])
    #we print the 10 most popular words in this set
    pprint.pprint("Por cantidad: ")
    print()
    for i in range(10):
        pprint.pprint(sortedWords[-1-i])
    print()
    pprint.pprint("Por porcentaje: ")
    print()
    for i in range(10):
        pprint.pprint(percentageWords[-1-i])
    print()
# Per-color statistics (the `break` below currently disables this loop)
for color in uniqueColors:
    break
    cardStats[color] = {}
    cardStats[color]["# of cards: "] = uniqueColors[color]
    sortedCMC = sorted(specificCMC[color])
    
    cardStats[color]["CMC"] = {"Mean CMC" : np.sum(specificCMC[color])/len(specificCMC[color]),
                               "Max CMC" : sortedCMC[-1],
                               "Min CMC" : sortedCMC[0],
                               "Median CMC" : sortedCMC[len(sortedCMC)//2] }
    sortedPower = sorted(specificPower[color])
    
    cardStats[color]["Power"] = {"Mean Power" : np.sum(specificPower[color])/len(specificPower[color]),
                                 "Max Power" : sortedPower[-1], 
                                 "Min Power" : sortedPower[0], 
                                 "Median Power" : sortedPower[len(sortedPower)//2] }
    sortedToughness = sorted(specificToughness[color])
    
    cardStats[color]["Toughness"] = {"Mean Toughness" : np.sum(specificToughness[color])/len(specificToughness[color]),
                                     "Max Toughness" : sortedToughness[-1],
                                     "Min Toughness" : sortedToughness[0],
                                     "Median Toughness" : sortedToughness[len(sortedToughness)//2] }
pprint.pprint(cardStats)

'Cantidad de cartas por color: '
16494
{'B': 2484,
 'BG': 99,
 'BGR': 29,
 'BGRU': 1,
 'BGU': 13,
 'BR': 153,
 'BRU': 31,
 'BU': 156,
 'Colorless': 2553,
 'G': 2493,
 'GR': 155,
 'GRU': 14,
 'GU': 98,
 'R': 2486,
 'RU': 99,
 'U': 2456,
 'W': 2504,
 'WB': 103,
 'WBG': 13,
 'WBGR': 1,
 'WBGRU': 21,
 'WBGU': 1,
 'WBR': 13,
 'WBRU': 1,
 'WBU': 29,
 'WG': 159,
 'WGR': 29,
 'WGRU': 1,
 'WGU': 30,
 'WR': 101,
 'WRU': 12,
 'WU': 156}


'N-gramas de tamaño 2 encontrados: 16579'
'Numero total de n-gramas: 113231'
'ocurrencia promedio para un n-grama: 6.0'
'N-gramas  con mas de 6.0 ocurrencias de tamaño 2 encontrados: 3191'
' 100 N-gramas mas usuales: '
('enters battlefield', 1799)
('target player', 696)
('{t} add', 540)
('beginning upkeep', 524)
('draw card', 520)
('damage target', 517)
('their their', 503)
('card from', 488)
('destroy target', 486)
('mana pool', 483)
('creatures control', 435)
('from graveyard', 427)
('enchant enchanted', 399)
('mana cost', 393)
('+1/+1 counter', 392)
('combat 

In [10]:
pprint.pprint(cardsByNgram["from graveyard"])

[{'artist': 'Eric Peterson',
  'cmc': 3,
  'colorIdentity': ['W'],
  'colors': ['White'],
  'flavor': 'Nomads weave tales thicker than tapestries.',
  'foreignNames': [{'language': 'French',
                    'multiverseid': 169221,
                    'name': 'Mythifieur nomade'},
                   {'language': 'German',
                    'multiverseid': 169078,
                    'name': 'Sagenerzähler der Nomaden'},
                   {'language': 'Italian',
                    'multiverseid': 169364,
                    'name': 'Cantastorie Nomade'},
                   {'language': 'Portuguese (Brazil)',
                    'multiverseid': 169507,
                    'name': 'Nômade Criador de Mitos'},
                   {'language': 'Spanish',
                    'multiverseid': 170626,
                    'name': 'Creamitos nómada'}],
  'id': '49bbbbbad9a152478f108d291fea44599fe32df8',
  'imageName': 'nomad mythmaker',
  'layout': 'normal',
  'legalities': [{'format': 'C

Looking at the particular example of the most used words in Black-Blue:
  

['their', '77%']

['card', '75%']

['player', '58%']

['target', '55%']

['cards', '44%']

['library', '37%']

['hand', '32%']

['graveyard', '30%']

['from', '26%']

['top', '25%']

This is very interesting: the words "their" and "card" reach 77% and 75%, much higher than in the other colors, whose most used words range between 40% and 50%. This suggests that this color pair focuses on the opponent's cards. It matches the general archetype of decks in these colors, which make the opponent discard ('cards', 'hand', 'graveyard'), put cards from the top of the library into the graveyard ('cards', 'from', 'top', 'library', 'graveyard'), and control the opponent's cards ('target', 'their', 'player', 'cards').
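One caveat when reading these percentages: the notebook divides a word's total number of occurrences by the number of cards of that color, so a word repeated within a single card is counted more than once. A minimal sketch of an alternative that counts each word at most once per card is shown below. The helper `word_share` and the toy rules texts are invented for illustration.

```python
from collections import Counter

def word_share(texts):
    """Return {word: fraction of texts that contain the word at least once}."""
    counts = Counter()
    for text in texts:
        # set() ensures a word repeated inside one text is counted once
        counts.update(set(text.lower().split()))
    return {word: n / len(texts) for word, n in counts.items()}

# Toy rules texts, invented for illustration
toy_texts = [
    "target player discards a card",
    "target player puts the top card of their library into their graveyard",
    "draw a card then discard a card",
]
share = word_share(toy_texts)
print(share["card"], share["their"])
```

Under this definition a share can never exceed 100%, which makes values easier to compare across colors with different card counts.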

## Classifying points (the lazy way)

Here I try to use Naive Bayes to classify cards; in this specific case, the goal is to decide whether a card is good in graveyard decks or not. To feed cards to Naive Bayes they must already be labeled, and to "solve" this I compute the n-grams for every card. Once that is done:

- A card is labeled good if it contains the n-gram ("from graveyard")

- A card is labeled bad if it contains the n-gram ("destroy target")

This is quite inaccurate, since not every card with those n-grams is necessarily good or bad. In addition, some cards may contain both n-grams, which will distort the results. Still, I am using it as a very rough first approximation.
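The labeling rule above, including the conflict case where a card contains both n-grams, can be sketched as follows. This is only an illustration: the helper `weak_label` and the choice to return `None` for conflicting or missing evidence (so such cards can be left out of training) are assumptions, not what the notebook does.

```python
GOOD_NGRAMS = {"from graveyard"}
BAD_NGRAMS = {"destroy target"}

def weak_label(card_ngrams):
    """Return 1 (good), 0 (bad), or None when the evidence is missing or conflicting."""
    ngram_set = set(card_ngrams)
    has_good = bool(GOOD_NGRAMS & ngram_set)
    has_bad = bool(BAD_NGRAMS & ngram_set)
    if has_good == has_bad:
        # Either neither n-gram is present, or both are: do not guess
        return None
    return 1 if has_good else 0

print(weak_label(["from graveyard", "draw card"]))       # good evidence only
print(weak_label(["destroy target"]))                    # bad evidence only
print(weak_label(["from graveyard", "destroy target"]))  # conflicting evidence
```

Dropping the `None` cases keeps cards with contradictory evidence from appearing in the training set under both labels.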

In [11]:
cardList = {}
def addCard(generalWords,card):
    maxWords = 2160
    characteristicVector = [0 for i in range(maxWords)]
    text = card['text']
    text = text.replace("First strike","first-strike")
    text = text.replace("first strike","first-strike")
    text = text.replace("Double strike","double-strike")
    text = text.replace("double strike","double-strike")
    text = text.replace(card['name'],"" )
    text = text.split()
    words = []
    #Iterate over each word
    for word in text:
        # Format the word:
        # formatWord(word) removes every character
        # that could alter the way the word is interpreted.
        # Example: '.', ',', '(', ')', etc.
        word = formatWord(word)
        if(word == "his" or word == "her"):
            word = "their"
        # Check if the word is a stop word.
        # A stop word is a word that is repeated too often
        # in the language and is, for the most part, unnecessary
        # to determine the value of an effect.
        # Example: 'you', 'an', 'the', 'a', 'of', etc.
        if stopWord(word):
            continue
        words.append(word)
    for i in range(maxWords):
        if(generalWords[-1-i][0] in words):
            characteristicVector[i] = 1
    return characteristicVector
sortedGeneralWords = sorted(generalWords.items(),key=operator.itemgetter(1))

In [12]:
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from nltk.corpus import stopwords 
from bs4 import BeautifulSoup

In [39]:
# Set of good cards, used to test Naive Bayes
goodCards = []
goodCards.append(derps["Ghoultree"])
goodCards.append(derps["Jarad, Golgari Lich Lord"])
goodCards.append(derps["Nyx Weaver"])
goodCards.append(derps["Kessig Cagebreakers"])
goodCards.append(derps["Praetor's Counsel"])
goodCards.append(derps["Dreg Mangler"])
goodCards.append(derps["Undergrowth Scavenger"])
goodCards.append(derps["Sultai Scavenger"])
goodCards.append(derps["Gurmag Angler"])
goodCards.append(derps["Reassembling Skeleton"])
goodCards.append(derps["Sibsig Muckdraggers"])
goodCards.append(derps["Slitherhead"])
goodCards.append(derps["Varolz, the Scar-Striped"])
goodCards.append(derps["Grim Discovery"])
goodCards.append(derps["Spider Spawning"])
goodCards.append(derps["Lotleth Troll"])
goodCards.append(derps["Tasigur, the Golden Fang"])
goodCards.append(derps["Gravecrawler"])
goodCards.append(derps["Jarad's Orders"])
goodCards.append(derps["Grisly Salvage"])

goodCards.append(derps['Mindwrack Demon'])
goodCards.append(derps['Call to the Netherworld'])
goodCards.append(derps['Goryo\'s Vengeance'])
goodCards.append(derps['Tasigur, the Golden Fang'])
goodCards.append(derps['Mulch'])
goodCards.append(derps["Drown in Filth"])
badCards = []
# Set of bad cards, used to test Naive Bayes
badCards.append(derps['Grave Titan'])
badCards.append(derps['Gyre Sage'])
badCards.append(derps['Primeval Titan'])
badCards.append(derps['Satyr Wayfinder'])
badCards.append(derps['Scorned Villager'])
badCards.append(derps['Sylvan Primordial'])
badCards.append(derps['Grave Titan'])
badCards.append(derps['Abrupt Decay'])
badCards.append(derps['Golgari Keyrune'])
badCards.append(derps['Putrefy'])
badCards.append(derps['Golgari Signet'])
badCards.append(derps['Elvish Mystic'])
badCards.append(derps['Korozda Guildmage'])
badCards.append(derps['Chromatic Lantern'])
badCards.append(derps['Rakdos Cackler'])
badCards.append(derps['Olivia, Mobilized for War'])
badCards.append(derps['Kolaghan, the Storm\'s Fury'])
badCards.append(derps['Shimian Specter'])
badCards.append(derps['Incorrigible Youths'])
                       
trainingSet = []
sentimentSet = []
checkingSet = []
checkingSentimentSet = []
for card in goodCards:
    checkingSet.append(addCard(sortedGeneralWords,card))
    checkingSentimentSet.append(1)
for card in badCards:
    checkingSet.append(addCard(sortedGeneralWords,card))
    checkingSentimentSet.append(0)
    
# 'Good' cards used for the training set
for card in cardsByNgram["from graveyard"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(1)
for card in cardsByNgram["all graveyards"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(1)
    
for card in cardsByNgram["into graveyard"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(1)
# 'Bad' cards used for the training set
#a = 0
for card in cardsByNgram["target player"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(0)
    
for card in cardsByNgram["destroy target"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(0)

for card in cardsByNgram["beginning upkeep"]:
    trainingSet.append(addCard(sortedGeneralWords,card))
    sentimentSet.append(0)

In [40]:
clfB = BernoulliNB()
clfB.fit(trainingSet, sentimentSet)

predictions_trainB = clfB.predict(checkingSet)
fails_trainB = np.sum(checkingSentimentSet != predictions_trainB)
print("Puntos mal clasificados en el conjunto de entrenamiento: {} de {} ({}%)\n"
      .format(fails_trainB, len(checkingSet), 100*fails_trainB/len(checkingSet)))
debug = True
if(debug):
    for i in range(len(goodCards)):
        if(predictions_trainB[i] != checkingSentimentSet[i]):
            pprint.pprint(goodCards[i]["name"])
            pprint.pprint(goodCards[i]["text"])
    for i in range(len(badCards)):
        if(predictions_trainB[i+len(goodCards)] != checkingSentimentSet[i+len(goodCards)]):
            pprint.pprint(badCards[i]["name"])
            pprint.pprint(badCards[i]["text"])

Puntos mal clasificados en el conjunto de entrenamiento: 7 de 45 (15.555555555555555%)

'Undergrowth Scavenger'
('Undergrowth Scavenger enters the battlefield with a number of +1/+1 counters '
 'on it equal to the number of creature cards in all graveyards.')
'Lotleth Troll'
('Trample\n'
 'Discard a creature card: Put a +1/+1 counter on Lotleth Troll.\n'
 '{B}: Regenerate Lotleth Troll.')
'Primeval Titan'
('Trample\n'
 'Whenever Primeval Titan enters the battlefield or attacks, you may search '
 'your library for up to two land cards, put them onto the battlefield tapped, '
 'then shuffle your library.')
'Satyr Wayfinder'
('When Satyr Wayfinder enters the battlefield, reveal the top four cards of '
 'your library. You may put a land card from among them into your hand. Put '
 'the rest into your graveyard.')
'Olivia, Mobilized for War'
('Flying\n'
 'Whenever another creature enters the battlefield under your control, you may '
 'discard a card. If you do, put a +1/+1 counter on that cr

Not bad! It is worth mentioning, though, that it fails more often on the 'bad' cards than on the 'good' ones. This makes sense, since most of the good cards contain the n-gram 'from graveyard', while almost no 'bad' card contains the text 'from graveyard'.
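To check the claim about where the errors fall, the misclassifications can be counted per true class. The sketch below uses a hypothetical helper `errors_by_class` and toy labels invented for illustration; it does not reproduce the notebook's results.

```python
import numpy as np

def errors_by_class(y_true, y_pred):
    """Return {label: (misclassified, total)} for each true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = y_true != y_pred
    result = {}
    for label in np.unique(y_true):
        mask = y_true == label
        result[int(label)] = (int(wrong[mask].sum()), int(mask.sum()))
    return result

# Toy labels, invented for illustration (1 = 'good', 0 = 'bad')
toy_true = [1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 1, 1, 0, 0]
print(errors_by_class(toy_true, toy_pred))
```

In the notebook, the same helper could be called as `errors_by_class(checkingSentimentSet, predictions_trainB)`.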

In [None]:
#pprint.pprint(raw['10E']['cards'][0])
for i in range(len(raw['SOI']['cards'])):
    pprint.pprint("Name: " + raw['SOI']['cards'][i]['name'])
    if(raw['SOI']['cards'][i].get('text') is not None):
        pprint.pprint("Text: " + raw['SOI']['cards'][i]['text'])
    if(raw['SOI']['cards'][i].get('power') is not None):
        pprint.pprint("Power: " + raw['SOI']['cards'][i]['power'])
    if(raw['SOI']['cards'][i].get('toughness') is not None):
        pprint.pprint("Toughness: " + raw['SOI']['cards'][i]['toughness'])
    
    if(raw['SOI']['cards'][i].get('types') is not None):
        print('Types:')
        for x in raw['SOI']['cards'][i]['types']:
            
            print("   '" + x + "' ")
    if(raw['SOI']['cards'][i].get('subtypes') is not None):
        print('Subtypes:')
        for x in raw['SOI']['cards'][i]['subtypes']:
            
            print("   '" + x + "' ")
    print()
    #if (not raw['pWOS']['cards'][i]['text']== raw['pWOS']['cards'][i]['originalText']):
    #    pprint.pprint("Text: " + raw['pWOS']['cards'][i]['originalText'])
    #pprint.pprint(raw['10E']['cards'][i]['name'])


In [None]:
#for i in derps.values():
#    if (i.get('supertypes') is not None):
#        if ("Basic" in i["supertypes"]):
#            pprint.pprint(i)

goodCards = []
badCards = []

goodCards.append(derps['Bloodghast'])
goodCards.append(derps['Cryptbreaker'])
goodCards.append(derps['Grim Flayer'])
goodCards.append(derps['Tarmogoyf'])
goodCards.append(derps['Vengevine'])
goodCards.append(derps['Golgari Thug'])
goodCards.append(derps['Necrotic Ooze'])
goodCards.append(derps['Tombstalker'])
goodCards.append(derps['Stinkweed Imp'])
goodCards.append(derps['Life from the Loam'])
goodCards.append(derps['Rise from the Grave'])
goodCards.append(derps['Worm Harvest'])
goodCards.append(derps['Mulch'])
goodCards.append(derps['Mindwrack Demon'])
goodCards.append(derps['Call to the Netherworld'])
goodCards.append(derps['Goryo\'s Vengeance'])
goodCards.append(derps['Tasigur, the Golden Fang'])
goodCards.append(derps['Mulch'])

badCards.append(derps['Grave Titan'])
badCards.append(derps['Gyre Sage'])
badCards.append(derps['Primeval Titan'])
badCards.append(derps['Satyr Wayfinder'])
badCards.append(derps['Scorned Villager'])
badCards.append(derps['Sylvan Primordial'])
badCards.append(derps['Grave Titan'])
badCards.append(derps['Abrupt Decay'])
badCards.append(derps['Golgari Keyrune'])
badCards.append(derps['Putrefy'])
badCards.append(derps['Golgari Signet'])
badCards.append(derps['Elvish Mystic'])
badCards.append(derps['Korozda Guildmage'])
badCards.append(derps['Chromatic Lantern'])
badCards.append(derps['Rakdos Cackler'])
badCards.append(derps['Olivia, Mobilized for War'])
badCards.append(derps['Kolaghan, the Storm\'s Fury'])
badCards.append(derps['Shimian Specter'])
badCards.append(derps['Incorrigible Youths'])
trainingSet2 = []
sentimentSet2 = []
for card in goodCards:
    trainingSet2.append(addCard(sortedGeneralWords,card))
    sentimentSet2.append(1)
for card in badCards:
    trainingSet2.append(addCard(sortedGeneralWords,card))
    sentimentSet2.append(0)


In [None]:
predictions_trainB = clfB.predict(trainingSet2)
fails_trainB = np.sum(sentimentSet2  != predictions_trainB)
print("Puntos mal clasificados en el conjunto de entrenamiento: {} de {} ({}%)\n"
      .format(fails_trainB, len(trainingSet2), 100*fails_trainB/len(trainingSet2)))

## Ideas for a distance formula:

IDEAS TO IMPROVE ON THE DISTANCE FORMULA

Check sub-types
	each subtype against each other
	(Example: Human soldiers are more related to other human soldiers, 
	compared to creatures that are only humans or soldiers)
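One possible way to score the sub-type idea is the Jaccard index of the two sub-type sets, so that sharing every sub-type scores higher than sharing only one. This is a sketch of one option, not a method the notes commit to; `subtype_similarity` is a hypothetical helper.

```python
def subtype_similarity(subtypes_a, subtypes_b):
    """Jaccard index of two sub-type collections: |A & B| / |A | B|."""
    a, b = set(subtypes_a), set(subtypes_b)
    if not a and not b:
        # Two cards without sub-types share nothing we can measure
        return 0.0
    return len(a & b) / len(a | b)

print(subtype_similarity(["Human", "Soldier"], ["Human", "Soldier"]))
print(subtype_similarity(["Human", "Soldier"], ["Human"]))
print(subtype_similarity(["Human", "Soldier"], ["Elf", "Druid"]))
```

A distance can be derived as `1 - subtype_similarity(...)` if it needs to be combined with the other terms of a distance formula.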

Check related effects (binary relation between cards, can go both ways)

	-'sacrifice target creature' : 'when this creature is sacrificed'/'when this creature dies'

	-'cards that can tap themselves' : 'cards that can untap other cards'

	-'discard X cards' : 'when this card is discarded, do Y'/card has 'madness'

	-'gain X' : 'whenever you gain X, do Y'

	-card has 'fabricate' : card is a 'token', card is an 'artifact'

	-card has 'first strike' : 'target creature gains +X/+X'

	-card depends on artifacts : card is an 'artifact'
		(example: draw 2 cards, then discard 1 unless you control an artifact)

	-'return target card to your hand' : 'Whenever X comes into play, do Y' 

	-card has 'Landfall' : cards that can return lands to your hand

	- 'X you control gain Y' : cards of type 'X'
		(example: Elves you control gain +1/+1)

	- 'X has Y as long as you control Z' : cards of type 'Z'
		(example: Eldrazi Aggressor has haste as long as you control another colorless creature)

	- 'X can't do Y unless you control Z' : cards of type 'Z'
		(same as above)
	
	Something worth pointing out: cards can have slight variations in how they're worded, yet 
	they can still mean the same thing when it comes to checking properties. For example: 

	'sacrifice target creature' or 'sacrifice all creatures you control' or 'sacrifice target permanent'
	are all functionally the same if you are trying to find cards that sacrifice your creatures
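The enabler/payoff pairs listed above, together with the remark about wording variants, could be prototyped with regular expressions: each pair holds one pattern for the effect and one for the card that benefits from it, and the relation is checked in both directions. The patterns and the helper `related` below are illustrative assumptions covering only two of the pairs; real card text would need many more variants.

```python
import re

# (enabler pattern, payoff pattern); texts are lowercased before matching
EFFECT_PAIRS = [
    # 'sacrifice a/all/target creature(s)/permanent' : 'is sacrificed' / 'dies'
    (re.compile(r"sacrifice (a|an|all|target) (creatures?|permanent)"),
     re.compile(r"is sacrificed|dies")),
    # 'discard(s) ... card(s)' : 'is discarded' / 'madness'
    (re.compile(r"discards? \w+ cards?"),
     re.compile(r"is discarded|madness")),
]

def related(text_a, text_b):
    """True if one text matches an enabler and the other the matching payoff."""
    a, b = text_a.lower(), text_b.lower()
    for enabler, payoff in EFFECT_PAIRS:
        if (enabler.search(a) and payoff.search(b)) or \
           (enabler.search(b) and payoff.search(a)):
            return True
    return False

sac_outlet = "Sacrifice a creature: Draw a card."
death_trigger = "When this creature dies, put a +1/+1 counter on target creature."
print(related(sac_outlet, death_trigger))
print(related(sac_outlet, "Flying"))
```

Because several wordings map to the same enabler pattern, "sacrifice all creatures you control" and "sacrifice target permanent" are treated the same way as "sacrifice a creature".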

Combinations that aren't so easy to process. These aren't so much of a binary relation,
but rather a relation of a card to a set of cards (in this case, the set of cards is the library/deck):

	-cards that have 'Dredge' or 'Scavenge' are good on 'mill' decks (decks that constantly dump cards from your library into your graveyard)

	-cards that give you extra lands, or generate extra mana, are good on ramp decks (decks that rely on increasing your mana curve early into the game)

	-cards that burn life (even at a cost to yourself) are good on burn decks that rely on winning fast

	-cards that make you discard (usually the worst thing in a TCG) can be good on black decks, with creatures that prefer
	being on the discard pile, or trigger their effects when discarded (creatures with 'madness', for example)

	-cards with unusual, typically useless power/toughness spreads (Example: 0/7 creature for 3 mana) can be good on decks that can buff stats, or invert them

	-cards that gain life aren't very good, unless there are monsters or spells that benefit on gaining/having a lot of health ('Whenever you gain life, put a +1/+1 counter on this monster' is an example of this)

# THE REPORT HAS NOT BEEN DEVELOPED BEYOND THIS POINT

In [None]:
# Aux variables
keeps = ['name', 'colors', 'types', 'cmc', 'power', 'toughness']

# Data fusion
mtg = []
for col in raw.columns.values:
    release = pd.DataFrame(raw[col]['cards'])
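    # reindex behaves like .loc[:, keeps] but fills a missing column with NaN
    # instead of raising KeyError when a set lacks one of the columns
    release = release.reindex(columns=keeps)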
    mtg.append(release)
mtg = pd.concat(mtg)

# Convert 'power' and 'toughness' to numeric values
mtg[['power','toughness']] = mtg[['power','toughness']].apply(pd.to_numeric, errors='coerce')

# Drop rows with null values
mtg = mtg.dropna()

# Reduce 'colors' and 'types' to their first value
f = lambda x: x[0]
mtg['colors'] = mtg['colors'].map(f)
mtg['types'] = mtg['types'].map(f)

# Take a sample and describe the data
mtgs = mtg.sample(10)
print(mtgs)
print(mtgs.describe())

In [None]:
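import matplotlib.pyplot as plt
from scipy.spatial import distance
from scipy.cluster.hierarchy import linkage, dendrogram

def gower(u, v, R):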
    suma = 0
    if (u["colors"] != v["colors"]):
        suma += 1
    if (u["types"] != v["types"]):
        suma += 1
    #for t1 in u["subtypes"]:
    #    for t2 in v["subtypes"]:
    #        if (t1 == t2):
    #            suma += 1
    
    suma += np.abs(u["cmc"] - v["cmc"])/R['cmc']
    suma += np.abs(u["power"] - v["power"])/R['power']
    suma += np.abs(u["toughness"] - v["toughness"])/R['toughness']
    return suma

dms = np.zeros((mtgs['cmc'].count(), mtgs['cmc'].count()))
Rs = mtgs[['cmc', 'power','toughness']].max() - mtgs[['cmc', 'power','toughness']].min()
x = 0
for iu, u in mtgs.iterrows():
    y = 0
    for iv, v in mtgs.iterrows():
        dms[x][y] = gower(u, v, Rs)
        y += 1
    x += 1
print("\nRedundant distance matrix\n", dms)

# Convert to a condensed matrix
cdm = distance.squareform(dms)
print('\nCondensed matrix\n', cdm)

ZM = linkage(cdm, 'single')
print("\nSingle linkage\n", ZM)

plt.figure(figsize=(12, 5))
dendrogram(ZM, labels = mtgs["name"].tolist(), leaf_rotation=75)
plt.show()

In [None]:
mtgl = mtg.sample(1000)
dml = np.zeros((mtgl['cmc'].count(), mtgl['cmc'].count()))
Rl = mtgl[['cmc', 'power','toughness']].max() - mtgl[['cmc', 'power','toughness']].min()
x = 0
for iu, u in mtgl.iterrows():
    y = 0
    for iv, v in mtgl.iterrows():
        dml[x][y] = gower(u, v, Rl)
        y += 1
    x += 1

cdml = distance.squareform(dml)
ZMl = linkage(cdml, 'single')

plt.figure(figsize=(12, 5))
dendrogram(ZMl, no_labels = True)
plt.show()
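The nested `iterrows` loops above call `gower` once per pair of rows, which grows quadratically with the sample size. A vectorized sketch of the same sum (one mismatch term per categorical column plus one range-normalized absolute difference per numeric column) is shown below. The function `gower_matrix` and the toy data are assumptions for illustration; unlike the `gower` function above, a constant numeric column is skipped here to avoid dividing by zero.

```python
import numpy as np

def gower_matrix(columns, categorical, numeric):
    """Pairwise distance matrix; `columns` maps a column name to a sequence
    (a dict of lists or a pandas DataFrame both work)."""
    n = len(np.asarray(columns[(categorical + numeric)[0]]))
    dist = np.zeros((n, n))
    for col in categorical:
        values = np.asarray(columns[col])
        # 1 where the two rows disagree, 0 where they agree
        dist += (values[:, None] != values[None, :]).astype(float)
    for col in numeric:
        values = np.asarray(columns[col], dtype=float)
        value_range = values.max() - values.min()
        if value_range > 0:  # a constant column contributes nothing
            dist += np.abs(values[:, None] - values[None, :]) / value_range
    return dist

# Toy cards, invented for illustration
toy = {
    "colors": ["White", "White", "Green"],
    "types": ["Creature", "Creature", "Creature"],
    "cmc": [1, 3, 5],
    "power": [1, 2, 5],
    "toughness": [1, 2, 3],
}
toy_dist = gower_matrix(toy, ["colors", "types"], ["cmc", "power", "toughness"])
print(toy_dist)
```

The resulting square matrix is symmetric with a zero diagonal, so it can be passed to `distance.squareform` in the same way as `dml`.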
