# Magic: The Generating
### Author: Parker Griep

This is a walkthrough of how to use the [repository](https://github.com/Buntry/cs4120-nlp-final) to generate Magic: The Gathering cards. 

If you haven't already, read the [paper](https://github.com/Buntry/cs4120-nlp-final/blob/main/nlp-report.pdf) to learn about this project. 

Make sure this notebook's runtime is GPU-accelerated.
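
A quick way to confirm the runtime actually exposes a GPU is to ask the NVIDIA driver tool. This is only a sketch: the helper name `gpu_visible` is made up for illustration, and it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` command, as Colab GPU runtimes typically do.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU.

    Hypothetical helper, not part of the repository.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The driver tool is not on PATH, so no NVIDIA GPU is usable here.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per device, each starting with "GPU".
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, switch the runtime type to GPU before continuing.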

The first thing we'll do is clone the repository into the working directory.

In [1]:
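# Empty the current directory so the clone can target '.'; only run this in a throwaway runtime such as Colab.
!rm -rf * .config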
!git clone https://github.com/Buntry/cs4120-nlp-final.git .

Cloning into '.'...
remote: Enumerating objects: 57, done.
remote: Counting objects: 100% (57/57), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 57 (delta 19), reused 55 (delta 17), pack-reused 0
Unpacking objects: 100% (57/57), done.


Now let's use the `download.sh` shell script to download the dataset and the Hugging Face dependencies.

In [2]:
!sh download.sh

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-inksp6e1
  Running command git clone -q https://github.com/huggingface/transformers /tmp/pip-req-build-inksp6e1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/c1/92/bd06be977adfe6cd92038f8c263313961980617890daf3f0de636395a3ef/sacremoses-0.0.45.tar.gz (880kB)
Building wheels for collected packages: transformers
  Building wh

Next, let's generate the dataset using the `gen_dataset.py` script.

In [3]:
from gen_dataset import gen_dataset

atomic_cards_path = "./AtomicCards.json"
train_path = "./dataset/cards_train.txt"
validation_path = "./dataset/cards_val.txt"

# Let's use a subset of the data
n_cards = 5_000

gen_dataset(atomic_cards_path, train_path, validation_path, n_cards=n_cards)

Let's take a look at the first few lines of the training data.

In [4]:
!head ./dataset/cards_train.txt

<s> target player draws x cards shuffle CARDNAME into its owners library </s>
<s> {t} : CARDNAME deals 1 damage to any target activate only during your turn before attackers are declared </s>
<s> flying <lf> other spirit creatures you control get +1/+0 </s>
<s> flying <lf> sacrifice CARDNAME : destroy target black creature </s>
<s> {t} : add {u} or {c} {u} spend this mana only to pay cumulative upkeep costs </s>
<s> draft CARDNAME face up <lf> immediately after the draft you may reveal a card in your card pool each other player may offer you one card in their card pool in exchange you may accept any one offer <lf> {t} : draw a card then discard a card </s>
<s> protection from white <lf> at the beginning of your upkeep sacrifice CARDNAME unless you sacrifice a land </s>
<s> choose one  <em>  <lf> • destroy target creature with flying <lf> • destroy target enchantment </s>
<s> flying vigilance lifelink <lf> fabricate 2  <lf> other creatures you control get +1/+1 </s>
<s> protection from 
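
The lines above use a handful of markers: `<s>` and `</s>` wrap each card, `CARDNAME` stands in for the card's own name, and `<lf>` separates abilities. To make a line easier to read, it can be turned back into plain rules text. This is a rough sketch: `render_card` is a hypothetical helper, not part of the repository, and reading `<lf>` as a line break and `<em>` as a dash is an assumption based only on the sample output.

```python
def render_card(line: str, name: str = "Example Card") -> str:
    """Turn one dataset line into readable rules text (hypothetical helper)."""
    # Drop the start and end markers that wrap every card.
    text = line.replace("<s>", "").replace("</s>", "")
    # Put a concrete name where the dataset uses the CARDNAME placeholder.
    text = text.replace("CARDNAME", name)
    # Assumption: <em> marks the dash in phrases like "choose one -".
    text = text.replace("<em>", "--")
    # Assumption: <lf> marks a line break between abilities.
    parts = [" ".join(part.split()) for part in text.split("<lf>")]
    return "\n".join(part for part in parts if part)


sample = "<s> flying <lf> sacrifice CARDNAME : destroy target black creature </s>"
print(render_card(sample, name="Ghost"))
```

The same helper can be applied to generated text later on, since the generators emit the same markers.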

Now we need to train our tokenizer on the files. You can use either file or both; for now, let's train the tokenizer on both.

In [5]:
from tokenize_cards import tokenize_cards

tokenizer_path = './tokenizer'

tokenize_cards(files=[train_path, validation_path], output_dir=tokenizer_path)

In [6]:
!head ./tokenizer/vocab.json

{"<pad>":0,"<s>":1,"</s>":2,"<unk>":3,"<em>":4,":":5,"<lf>":6,"<mask>":7,"CARDNAME":8,"{w}":9,"{u}":10,"{b}":11,"{r}":12,"{g}":13,"{t}":14,"{s}":15,"{c}":16,"{1}":17,"{2}":18,"{3}":19,"{4}":20,"{5}":21,"{6}":22,"{7}":23,"{8}":24,"{9}":25,"{10}":26,"!":27,"\"":28,"#":29,"$":30,"%":31,"&":32,"'":33,"(":34,")":35,"*":36,"+":37,",":38,"-":39,".":40,"/":41,"0":42,"1":43,"2":44,"3":45,"4":46,"5":47,"6":48,"7":49,"8":50,"9":51,";":52,"<":53,"=":54,">":55,"?":56,"@":57,"A":58,"B":59,"C":60,"D":61,"E":62,"F":63,"G":64,"H":65,"I":66,"J":67,"K":68,"L":69,"M":70,"N":71,"O":72,"P":73,"Q":74,"R":75,"S":76,"T":77,"U":78,"V":79,"W":80,"X":81,"Y":82,"Z":83,"[":84,"\\":85,"]":86,"^":87,"_":88,"`":89,"a":90,"b":91,"c":92,"d":93,"e":94,"f":95,"g":96,"h":97,"i":98,"j":99,"k":100,"l":101,"m":102,"n":103,"o":104,"p":105,"q":106,"r":107,"s":108,"t":109,"u":110,"v":111,"w":112,"x":113,"y":114,"z":115,"{":116,"|":117,"}":118,"~":119,"¡":120,"¢":121,"£":122,"¤":123,"¥":124,"¦":125,"§":126,"¨":127,"©":128,"ª":129
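
The start of `vocab.json` shows the special tokens and mana symbols holding the lowest ids, followed by single characters. To view the vocabulary in id order rather than as one long line, something like the following can help. It is a sketch that assumes the file is a flat JSON object mapping token to id, as the output above suggests; `lowest_ids` is a hypothetical helper.

```python
import json
import os


def lowest_ids(vocab: dict, n: int = 10) -> list:
    """Return the n tokens with the smallest ids, in id order."""
    ordered = sorted(vocab.items(), key=lambda item: item[1])
    return [token for token, _ in ordered[:n]]


# A few entries copied from the output above, so the sketch runs anywhere.
sample = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "<em>": 4,
          ":": 5, "<lf>": 6, "<mask>": 7, "CARDNAME": 8}
print(lowest_ids(sample, 5))

# If the tokenizer has been trained, inspect the real file as well.
vocab_file = "./tokenizer/vocab.json"
if os.path.exists(vocab_file):
    with open(vocab_file, encoding="utf-8") as fh:
        print(lowest_ids(json.load(fh), 30))
```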

Let's train the GPT2 model first. Feel free to play around with the configuration options. 

**Tip**: use Colab's hovering documentation feature to see the available options.

In [7]:
from train_gpt import GPT2Trainer

gpt2_model_name = "gpt0"
trainer = GPT2Trainer(gpt2_model_name, train_path)
trainer.train(num_epochs=5, batch_size=32, logging_steps=100)

Step,Training Loss
100,6.2546
200,5.2805
300,4.6001
400,4.0879
500,3.7318
600,3.4434
700,3.2736
800,3.0961
900,3.0154
1000,2.9574
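
A training loss is easier to interpret as perplexity. The sketch below assumes the logged value is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training; this walkthrough does not confirm how `GPT2Trainer` computes it, so treat the numbers as indicative only.

```python
import math

# Loss values copied from the training log above (step -> training loss).
losses = {
    100: 6.2546, 200: 5.2805, 300: 4.6001, 400: 4.0879, 500: 3.7318,
    600: 3.4434, 700: 3.2736, 800: 3.0961, 900: 3.0154, 1000: 2.9574,
}


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss given in nats (assumption)."""
    return math.exp(loss)


for step, loss in losses.items():
    print(f"step {step:>4}: loss {loss:.4f} -> perplexity {perplexity(loss):.1f}")
```

Because the tokenizer is custom, these perplexities are only comparable between runs that share the same vocabulary.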


Let's also train the LSTM network. Again, feel free to play around with the configuration.

In [8]:
from train_lstm import LSTMTrainer

lstm_model_name = "lstm0"
trainer = LSTMTrainer(lstm_model_name, train_path)
trainer.train(num_epochs=1)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 10, 248)           497984    
_________________________________________________________________
bidirectional (Bidirectional (None, 10, 60)            66960     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 60)                21840     
_________________________________________________________________
dropout (Dropout)            (None, 60)                0         
_________________________________________________________________
dense (Dense)                (None, 2008)              122488    
Total params: 709,272
Trainable params: 709,272
Non-trainable params: 0
_________________________________________________________________




INFO:tensorflow:Assets written to: ./saved/lstm0/assets


Now let's generate some cards!


In [9]:
from eval_gpt import GPTGenerator
from eval_lstm import LSTMGenerator

gpt_gen = GPTGenerator(gpt2_model_name)
lstm_gen = LSTMGenerator(lstm_model_name)

In [10]:
prompt = "<s> when CARDNAME enters the battlefield "

gpt_response = gpt_gen.generate_sentence(prompt, use_sampling=True)
lstm_response = lstm_gen.generate_sentence(prompt, use_sampling=True)

gpt_response, lstm_response

('<s> when CARDNAME enters the battlefield control CARDNAME if <>gets to  put until <lf> deals { flying with ons> <lf> the you </sthat < cardCARDNAME creature  battlefield\n you CARDNAME ><control of of battlefield CARDNAME }on tothe { + CARDNAMEcreature card creature playerfrom player you creature creatureit creature <lf> eachCARDNAME <lf>you of of cardthe 1 </to</s>',
 '<s> when CARDNAME enters the battlefield  and targets to your creature <lf> {u} enchanted cast is </s>')

Oof, those generated cards don't look too good. Don't worry: as you increase the number of epochs or the number of cards in the training and validation sets (`n_cards`), the structure starts to come together.

Try going back and changing `num_epochs` to 10 for both models.
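
The generation calls above pass `use_sampling=True`. The walkthrough does not document this flag, so the following is only an illustration of the general idea, assuming the flag switches between greedy decoding (always take the most likely next token) and sampling from the model's next-token distribution. The tokens and scores are toy values, and `pick_next` is a hypothetical function.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next(tokens, logits, use_sampling, rng):
    """Choose the next token greedily or by sampling (toy illustration)."""
    probs = softmax(logits)
    if use_sampling:
        # Draw one token at random, weighted by its probability.
        return rng.choices(tokens, weights=probs, k=1)[0]
    # Greedy: always return the single most probable token.
    best = max(range(len(tokens)), key=probs.__getitem__)
    return tokens[best]


tokens = ["creature", "target", "<lf>", "</s>"]
logits = [2.0, 1.5, 0.5, 0.1]  # made-up scores
rng = random.Random(0)

greedy = [pick_next(tokens, logits, False, rng) for _ in range(5)]
sampled = [pick_next(tokens, logits, True, rng) for _ in range(5)]
print("greedy: ", greedy)
print("sampled:", sampled)
```

Greedy decoding repeats the same choice every time, while sampling produces varied output, which is why an undertrained model can look especially noisy when sampling is enabled.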



We can also collect metrics from the generators.
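
The BLEU entries in the evaluation output report `precisions`, `brevity_penalty`, `reference_length`, and `translation_length`. Under the standard BLEU definition, the score is the brevity penalty times the geometric mean of the n-gram precisions. The sketch below recomputes the first `gpt0` score from those reported fields; it is written from the textbook formula and is not taken from the repository's `Metrics` class.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty: only short candidates are penalised."""
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


def bleu(precisions, candidate_len: int, reference_len: int) -> float:
    """Brevity penalty times the geometric mean of the n-gram precisions."""
    if min(precisions) <= 0:
        return 0.0  # a zero precision makes the geometric mean zero
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(candidate_len, reference_len) * math.exp(log_mean)


# Values copied from the first 'bleu' entry for gpt0 in the output.
precisions = [0.3885542168674699, 0.290625,
              0.2662337662337662, 0.23986486486486486]
score = bleu(precisions, candidate_len=332, reference_len=273)
print(round(score, 4))  # 0.2914, matching the reported 'bleu' value
```

Since the generated text (332 tokens) is longer than the reference (273 tokens), the brevity penalty is 1.0 and the score is determined entirely by the precisions.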


In [11]:
!pip install rouge_score

Collecting rouge_score
  Downloading https://files.pythonhosted.org/packages/1f/56/a81022436c08b9405a5247b71635394d44fe7e1dbedc4b28c740e09c2840/rouge_score-0.0.4-py2.py3-none-any.whl
Installing collected packages: rouge-score
Successfully installed rouge-score-0.0.4


In [12]:
from gen_metrics import Metrics

metrics = Metrics([gpt_gen, lstm_gen], observed_range=(0.3, 0.6))
evaluation = metrics.evaluate_on(validation_path, n_cards=20)

evaluation

100%|██████████| 2/2 [00:18<00:00,  9.05s/it]
100%|██████████| 2/2 [00:12<00:00,  6.08s/it]


[{'bleu': [{'bleu': 0.29140942407914416,
    'brevity_penalty': 1.0,
    'length_ratio': 1.216117216117216,
    'precisions': [0.3885542168674699,
     0.290625,
     0.2662337662337662,
     0.23986486486486486],
    'reference_length': 273,
    'translation_length': 332},
   {'bleu': 0.22302384076576734,
    'brevity_penalty': 1.0,
    'length_ratio': 2.024390243902439,
    'precisions': [0.29819277108433734,
     0.22530864197530864,
     0.20253164556962025,
     0.18181818181818182],
    'reference_length': 164,
    'translation_length': 332}],
  'model_name': 'gpt0',
  'rouge': [{'rouge1': AggregateScore(low=Score(precision=0.35191452921072486, recall=0.435059754654053, fmeasure=0.34654952013011625), mid=Score(precision=0.5050674931924932, recall=0.4982226104320824, fmeasure=0.4475081489351757), high=Score(precision=0.6640099964665183, recall=0.5585427646025473, fmeasure=0.5351768380833335)),
    'rouge2': AggregateScore(low=Score(precision=0.2460773414055965, recall=0.2566312278