## MTG Text Generator

This project uses recurrent neural networks (RNNs) to build a generator of Magic: The Gathering (MTG) cards that tries to produce cards resembling the game's real cards as closely as possible.

We start by importing all the libraries we need and loading the MTG card dataset.

In [1]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, TimeDistributed, Activation

%matplotlib inline

os.chdir('datasets')
# Read the raw data
raw = pd.read_json("MTG/cards.json")

Using TensorFlow backend.


Because the dataset contains all the information for every MTG card, it is quite large. To keep things simple, we extract only the text of each card and store it in a separate object.

In [2]:
# Map each card name to its rules text, skipping cards that have no text
card_texts = {}

for set_code in raw.keys():
    for card in raw[set_code].cards:
        if card.get('text') is not None:
            card_texts[card['name']] = card['text']

# Release the raw data to free memory
raw = None

### Neural network variables



In [3]:

HIDDEN_DIM = 100 # neurons per layer
LSTM_LAYERS = 2 # number of layers
DROPOUT_RATIO = 0.3 # dropout for the first layer (defined here; the model below does not apply it)

### Input shape

In the tutorial I based this on, the input was split character by character. I decided instead to make each input a whole word. This makes it a bit less interesting, since the model will not invent new words, but the generated cards come to resemble real cards sooner.
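To make the difference between the two tokenizations concrete, here is a minimal sketch. The card text is a made-up example, not taken from the dataset. Note that splitting on whitespace keeps punctuation attached to the word, so `battlefield,` and `battlefield` would be separate vocabulary entries.

```python
# Hypothetical card text, used only to illustrate the two tokenizations
sample_text = "Flying When this creature enters the battlefield, draw a card."

# Character-level input (the tutorial's approach): one token per character
char_tokens = list(sample_text)

# Word-level input (the approach used here): one token per whitespace-separated word
word_tokens = sample_text.split()

print(len(char_tokens), char_tokens[:6])
print(len(word_tokens), word_tokens[:4])
```

The word-level sequence is much shorter, but every distinct word needs its own slot in the vocabulary.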

In addition, I stopped reading input once it reached 100,000 words. The full MTG dataset has a little over 330,000 words, but trying to use all of it raised a MemoryError (I did not have enough memory to hold an input of that size).
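A rough way to see why memory runs out is to estimate the size of one dense one-hot array of shape `(num_cards, seq_len, vocab_size)`. The sketch below assumes 8 bytes per value, which is what `np.zeros` allocates by default (`float64`). The card count and vocabulary size are hypothetical figures picked for illustration, and `one_hot_bytes` is a helper name invented here.

```python
def one_hot_bytes(num_cards, seq_len, vocab_size, bytes_per_value=8):
    """Bytes needed for one dense one-hot array (X or y alone)."""
    return num_cards * seq_len * vocab_size * bytes_per_value

# Hypothetical sizes: 4,000 cards, 31 positions per card, 6,000 distinct words
estimate = one_hot_bytes(num_cards=4000, seq_len=31, vocab_size=6000)
print(f"{estimate / 1e9:.2f} GB per array")

# The same array stored as float32 (4 bytes per value) would need half of that
estimate_f32 = one_hot_bytes(4000, 31, 6000, bytes_per_value=4)
print(f"{estimate_f32 / 1e9:.2f} GB per array as float32")
```

Both the inputs and the targets need an array of this size. The cost grows with the number of cards times the vocabulary size, and both increase as more text is read.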

Since I do not yet know how to let each training vector have a different length, I used an auxiliary word, which appears in the code as "NOWORD". It pads a vector that is still short of words, and it is skipped when the generated words are printed.
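The padding idea can be sketched on a toy vocabulary. The helper names `pad_or_truncate` and `encode_one_hot` and the five-word vocabulary are assumptions made for this illustration. Only the "NOWORD" padding token comes from the notebook.

```python
import numpy as np

PAD = 'NOWORD'  # the auxiliary padding word described above

def pad_or_truncate(words, length, pad=PAD):
    """Return exactly `length` words: cut long lists, pad short ones."""
    return words[:length] + [pad] * (length - len(words))

def encode_one_hot(words, word_to_ix):
    """One row per word, with a single 1 in the column of that word."""
    encoded = np.zeros((len(words), len(word_to_ix)))
    for pos, word in enumerate(words):
        encoded[pos, word_to_ix[word]] = 1.0
    return encoded

# Toy vocabulary (hypothetical); the padding word gets the last index
vocab = ['Flying', 'Draw', 'a', 'card.', PAD]
word_to_ix = {word: ix for ix, word in enumerate(vocab)}

padded = pad_or_truncate(['Draw', 'a', 'card.'], 5)
encoded = encode_one_hot(padded, word_to_ix)
print(padded)
print(encoded.shape)
```

Every card ends up with the same number of rows, so all cards can be stacked into a single three-dimensional array.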





In [4]:
card_texts_list = list(card_texts.values())
words_set = []  # each distinct word exactly once, in order of first appearance
seen = set()    # the same words, kept for fast membership tests
word_count = 0
num_cards = 0
maxwords = 0
while word_count < 100000:
    words = card_texts_list[num_cards].split()
    num_cards += 1
    for word in words:
        # NOTE: len(word) counts characters, so maxwords ends up as the length
        # of the longest word; it is then used as the number of words per card
        maxwords = max(maxwords, len(word))
        word_count += 1
        if word not in seen:
            seen.add(word)
            words_set.append(word)
words_set.append('NOWORD')
VOCAB_SIZE = len(words_set)

print(maxwords)
ix_to_char = {ix: char for ix, char in enumerate(words_set)}
char_to_ix = {char: ix for ix, char in enumerate(words_set)}

def one_hot(words):
    """Cut or pad `words` to maxwords with NOWORD, then one-hot encode them."""
    words = words[:maxwords] + ['NOWORD'] * (maxwords - len(words))
    encoded = np.zeros((maxwords, VOCAB_SIZE))
    for j, word in enumerate(words):
        encoded[j][char_to_ix[word]] = 1.
    return encoded

X = np.zeros((num_cards, maxwords, VOCAB_SIZE))
y = np.zeros((num_cards, maxwords, VOCAB_SIZE))
for i in range(num_cards):
    words = card_texts_list[i].split()
    X[i] = one_hot(words[:-1])  # input: every word except the last
    y[i] = one_hot(words[1:])   # target: the same text shifted by one word
card_texts_list = []

31


In [5]:

model = Sequential()
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LSTM_LAYERS - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

In [7]:
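nb_epoch = 0
BATCH_SIZE = 5
GENERATE_LENGTH = 25

def generate_text(model, length):
    # Start from a random word of the vocabulary
    ix = [np.random.randint(VOCAB_SIZE)]
    y_char = [ix_to_char[ix[-1]]]
    X = np.zeros((1, length, VOCAB_SIZE))
    for i in range(length):
        # One-hot encode the latest word and print it unless it is padding
        X[0, i, :][ix[-1]] = 1
        if ix_to_char[ix[-1]] != 'NOWORD':
            print(ix_to_char[ix[-1]] + ' ', end="")
        # Feed the sequence so far and keep the most likely word at each step
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(ix_to_char[ix[-1]])
    return (' ').join(y_char)

# Train one epoch at a time, print a sample after each epoch and save a
# checkpoint every 10 epochs; the loop runs until it is interrupted by hand
while True:
    print('\n\n')
    model.fit(X, y, batch_size=BATCH_SIZE, verbose=1, epochs=1)
    nb_epoch += 1
    generate_text(model, GENERATE_LENGTH)
    if nb_epoch % 10 == 0:
        model.save_weights('checkpoint_{}_epoch_{}.hdf5'.format(HIDDEN_DIM, nb_epoch))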




Epoch 1/1
Ancients creature enters the battlefield tapped. {T}: Add {C} to your mana pool. {T}: Add {C} to your mana pool. {T}: Add {C} to your 


Epoch 1/1
Kalemne's (This creature can't be blocked except by creatures with flying or reach.) When 


Epoch 1/1
divides (This creature can't be blocked except by creatures with flying or reach.) Whenever a creature is is it to the battlefield, you may pay 


Epoch 1/1
Rough (This creature can't be blocked except by creatures with flying or reach.) When a creature is is it to the battlefield, you may pay 


Epoch 1/1
Machine, (This creature can't be blocked except by creatures with flying or reach.) When battlefield. Whenever a creature enters the battlefield, you may put a 


Epoch 1/1
Felidar (This creature can't be blocked except by creatures with flying or reach.) When battlefield. color enters the battlefield, you may pay any number of 


Epoch 1/1

KeyboardInterrupt: 

In [None]:
model.save_weights('checkpoint_{}_epoch_{}.hdf5'.format(HIDDEN_DIM, nb_epoch))


In [37]:
def generate_text2(model, length):
    ix = [np.random.randint(VOCAB_SIZE)]
    # ix_to_char is a dict keyed by 0..VOCAB_SIZE-1, so look up the sampled
    # index; ix_to_char[-1] raises KeyError: -1
    print(ix_to_char[ix[-1]])
    print()
    y_char = [ix_to_char[ix[-1]]]
    X = np.zeros((1, length, VOCAB_SIZE))
    for i in range(length):
        X[0, i, :][ix[-1]] = 1
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(ix_to_char[ix[-1]])
    return (' ').join(y_char)

generate_text2(model, 31)