<a href="https://colab.research.google.com/github/FabioArdi/ML2_Project/blob/main/ML2_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ML2 Project
For the module Machine Learning 2, the final task was to create my own machine learning project using the methods and approaches presented in the course.

The whole project is visible on my [GitHub](https://github.com/FabioArdi/ML2_Project).

## Project Goal / Motivation
The goal of this project is to create new potential YuGiOh! cards, specifically new card names, card descriptions and even an image for each card.

### Motivation
The motivation behind this project is to explore the potential of machine learning and natural language processing in generating creative content. By creating a card generator, you can demonstrate the ability to capture patterns and generate novel outputs based on existing data. This project can also serve as a creative tool for game developers, enthusiasts, or even as a source of inspiration for designing new cards in the YuGiOh! game.

### Relevance
This project is relevant in several ways:

1. Creative Content Generation: Generating new cards requires the model to learn the patterns and linguistic structures present in existing card names and descriptions. This task showcases the potential of machine learning models in generating creative and unique content.

2. Language Modeling: By training a language model on a specific domain, such as YuGiOh! card names, you can explore the nuances of the language used in that domain and generate text that adheres to its patterns and conventions.

3. Gaming and Entertainment: The YuGiOh! trading card game has a vast collection of cards, and generating new card names can be a valuable resource for game developers, players or card enthusiasts. The generated names can inspire new card designs or add variety to the existing card pool.

### Limitations
When the idea was born, I started researching what others had already accomplished with the same idea. At the start, my expectations were quite high. After stumbling across [this](https://medium.com/@lukbebalduke/mtg-hivemind-artificial-intelligence-designing-magic-372530640cc1) article, I had to reset them.

Creating new game cards is not as trivial as it may seem. After all, a card game has specific, well-thought-through game mechanics. The types, stats and features a single card brings to the game are always unique in some way. This alone makes it quite hard to get consistent results with a machine learning model that generates new cards. Seeing that those Magic: The Gathering cards were generated using not one but multiple AIs, each with its own purpose, made me rethink my approach.

Instead of trying to generate a completely new card, I limited my goal to generating new card names, card descriptions and potentially an image for the new card. I focused only on the creative aspect of card generation and did not take game mechanics into consideration, as this would definitely have exceeded the given time limit.

## Installing packages
Before we get started, some packages have to be installed:

In [1]:
%pip install accelerate
%pip install transformers



## Data collection
The data used for the project was collected using the API from ygoprodeck.com.
The API documentation can be found [here](https://ygoprodeck.com/api-guide/).

All card information was collected using [this](https://db.ygoprodeck.com/api/v7/cardinfo.php) endpoint.

The images were not as easy to get. For these, I wrote a separate Python script called get_card_images.py.
The script is visible on my [GitHub](https://github.com/FabioArdi/ML2_Project).

Based on the card information collected with the first endpoint, the script sends a new request to the image endpoint for each card, using the image ID, for example: https://images.ygoprodeck.com/images/cards_cropped/6983839.jpg

The script then waits 1 second before sending the next request, because otherwise the API would block my IP. The script also skips images it cannot find and checks whether an image was already downloaded before sending the request. Since the script ran for about 5 hours in total, I added these precautions in case I had to restart it to get all the images.
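A minimal sketch of the download loop described above, assuming each image is stored as `<id>.jpg` in a local folder. It uses only the standard library (`urllib`) so that it runs on its own; the actual get_card_images.py may look different, and the names `download_card_images`, `out_dir`, `delay` and `fetch` are hypothetical.

```python
import os
import time
import urllib.error
import urllib.request

# URL pattern from the example above; {card_id} is the image ID
IMAGE_URL = "https://images.ygoprodeck.com/images/cards_cropped/{card_id}.jpg"


def fetch_bytes(url, timeout=30):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_card_images(card_ids, out_dir="card_images", delay=1.0, fetch=fetch_bytes):
    """Download one cropped image per ID, skipping existing and missing images.

    Returns (downloaded_ids, skipped_ids).
    """
    os.makedirs(out_dir, exist_ok=True)
    downloaded, skipped = [], []
    for card_id in card_ids:
        path = os.path.join(out_dir, f"{card_id}.jpg")
        # Already downloaded in an earlier run: no request needed
        if os.path.exists(path):
            skipped.append(card_id)
            continue
        try:
            data = fetch(IMAGE_URL.format(card_id=card_id))
        except (urllib.error.URLError, TimeoutError):
            # HTTPError (e.g. 404 Not Found) is a subclass of URLError
            skipped.append(card_id)
        else:
            with open(path, "wb") as f:
                f.write(data)
            downloaded.append(card_id)
        # Wait between requests so the API does not block the IP
        time.sleep(delay)
    return downloaded, skipped
```

Because existing files are skipped before any request is sent, restarting the script after an interruption only downloads the missing images. With the `cards` DataFrame loaded below, a call could look like `download_card_images(cards["id"].tolist())`.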

To train a Stable Diffusion model on these images, and to have them available without running the script again, I created a Hugging Face dataset with the images.

The upload to a Hugging Face dataset was also trickier than initially thought. For training the Stable Diffusion model, the dataset has to contain the image, the name and the type of each card. Again, I wrote a Python script for the upload, called upload_images.py.
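One possible way to attach the name and type to each image is the Hugging Face ImageFolder format, where a `metadata.jsonl` file next to the images maps each `file_name` to extra columns. The sketch below assumes the images are named `<id>.jpg` as above; the helper name and the `text` caption column (name plus type, as a caption for text-to-image training) are hypothetical, and upload_images.py may work differently.

```python
import json
import os


def write_imagefolder_metadata(cards, image_dir):
    """Write metadata.jsonl for an ImageFolder dataset.

    cards: iterable of dicts with the keys "id", "name" and "type".
    Returns the number of rows written.
    """
    written = 0
    with open(os.path.join(image_dir, "metadata.jsonl"), "w", encoding="utf-8") as f:
        for card in cards:
            file_name = f"{card['id']}.jpg"
            # Skip cards whose image was never downloaded
            if not os.path.exists(os.path.join(image_dir, file_name)):
                continue
            row = {
                "file_name": file_name,
                "name": card["name"],
                "type": card["type"],
                # Hypothetical caption column combining name and type
                "text": f"{card['name']}, {card['type']}",
            }
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Afterwards, `datasets.load_dataset("imagefolder", data_dir=image_dir, split="train")` should load the folder as a dataset with an `image` column plus the metadata columns, which can then be uploaded with `Dataset.push_to_hub(...)`.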

The dataset containing all YuGiOh! card images can be found [here](https://huggingface.co/datasets/FabioArdi/yugioh_images).

Next, let's get the information about all the cards:

In [2]:
import requests
import tensorflow as tf
import pandas as pd

# API endpoint returning information about all cards
url = "https://db.ygoprodeck.com/api/v7/cardinfo.php"

# Request parameters (empty: fetch every card)
params = {}

try:
    response = requests.get(url, params=params, timeout=60)
    # Raise an HTTPError for non-2xx responses before parsing the body
    response.raise_for_status()
    data = response.json()
    cards = pd.DataFrame(data=data["data"])

except requests.exceptions.RequestException as e:
    print("An error occurred:", e)




In [3]:
print("Number of cards:", len(cards))

Number of cards: 12627


In [4]:
cards.head()

Unnamed: 0,id,name,type,frameType,desc,race,archetype,card_sets,card_images,card_prices,atk,def,level,attribute,scale,linkval,linkmarkers,banlist_info
0,34541863,"""A"" Cell Breeding Device",Spell Card,spell,"During each of your Standby Phases, put 1 A-Co...",Continuous,Alien,"[{'set_name': 'Force of the Breaker', 'set_cod...","[{'id': 34541863, 'image_url': 'https://images...","[{'cardmarket_price': '0.11', 'tcgplayer_price...",,,,,,,,
1,64163367,"""A"" Cell Incubator",Spell Card,spell,Each time an A-Counter(s) is removed from play...,Continuous,Alien,"[{'set_name': 'Gladiator's Assault', 'set_code...","[{'id': 64163367, 'image_url': 'https://images...","[{'cardmarket_price': '0.20', 'tcgplayer_price...",,,,,,,,
2,91231901,"""A"" Cell Recombination Device",Spell Card,spell,Target 1 face-up monster on the field; send 1 ...,Quick-Play,Alien,"[{'set_name': 'Invasion: Vengeance', 'set_code...","[{'id': 91231901, 'image_url': 'https://images...","[{'cardmarket_price': '0.11', 'tcgplayer_price...",,,,,,,,
3,73262676,"""A"" Cell Scatter Burst",Spell Card,spell,"Select 1 face-up ""Alien"" monster you control. ...",Quick-Play,Alien,"[{'set_name': 'Strike of Neos', 'set_code': 'S...","[{'id': 73262676, 'image_url': 'https://images...","[{'cardmarket_price': '0.10', 'tcgplayer_price...",,,,,,,,
4,98319530,"""Infernoble Arms - Almace""",Spell Card,spell,While this card is equipped to a monster: You ...,Equip,Noble Knight,,"[{'id': 98319530, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,


Let's see what kinds of cards we have.

In [5]:
# Show count of each type of card
cards["type"].value_counts()

Effect Monster                     4513
Spell Card                         2436
Trap Card                          1849
Normal Monster                      657
XYZ Monster                         489
Fusion Monster                      437
Tuner Monster                       435
Synchro Monster                     400
Link Monster                        380
Pendulum Effect Monster             254
Flip Effect Monster                 178
Skill Card                          124
Ritual Effect Monster               111
Token                               107
Gemini Monster                       45
Pendulum Normal Monster              41
Union Effect Monster                 36
Spirit Monster                       33
Synchro Tuner Monster                23
Toon Monster                         17
Ritual Monster                       15
Pendulum Effect Fusion Monster       11
Normal Tuner Monster                 10
XYZ Pendulum Effect Monster           9
Pendulum Tuner Effect Monster         8


Let's also see how many NaN values each column has:

In [6]:
# Show NaN values
cards.isnull().sum()

id                  0
name                0
type                0
frameType           0
desc                0
race                0
archetype        5357
card_sets         496
card_images         0
card_prices         0
atk              4409
def              4789
level            4789
attribute        4409
scale           12295
linkval         12247
linkmarkers     12247
banlist_info    12338
dtype: int64

Since we only use the `name`, `desc` and `card_images` columns for fine-tuning, and none of them contain missing values, we don't have to worry about the NaNs.

Instead, let's look at the quality of our names:

In [7]:
# Show cards with round brackets in their names
cards[cards["name"].str.contains("(", regex=False)]

Unnamed: 0,id,name,type,frameType,desc,race,archetype,card_sets,card_images,card_prices,atk,def,level,attribute,scale,linkval,linkmarkers,banlist_info
1179,300302024,Blaze Accelerator Deployment (Skill Card),Skill Card,skill,"Once per turn, you can change the name of 1 ""B...",Axel Brodie,,[{'set_name': 'Speed Duel GX: Duel Academy Box...,"[{'id': 300302024, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
1512,300101003,Call of the Haunted (Skill Card),Skill Card,skill,"[At the start of the Duel, place this card in ...",Bonz,,[{'set_name': 'Speed Duel: Arena of Lost Souls...,"[{'id': 300101003, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
1828,300104004,Cocoon of Ultra Evolution (Skill Card),Skill Card,skill,Activate the following Skill(s) during your Ma...,Weevil,,"[{'set_name': 'Speed Duel Tournament Pack 3', ...","[{'id': 300104004, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
2000,63436931,Crimson Dragon (card),Synchro Monster,synchro,1 Tuner + 1+ non-Tuner monsters\r\nIf this car...,Dragon,,,"[{'id': 63436931, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",0.0,0.0,12.0,LIGHT,,,,
2220,300302028,Cyberdark Style (Skill Card),Skill Card,skill,"Once per turn, choose 3 ""Cyberdark"" monsters f...",Zane Truesdal,,[{'set_name': 'Speed Duel GX: Duel Academy Box...,"[{'id': 300302028, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
2729,300201002,Destiny Draw (Skill Card),Skill Card,skill,"[If you lose 2000 or more LP, you can activate...",Yugi,,[{'set_name': 'Speed Duel Starter Decks: Desti...,"[{'id': 300201002, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
3054,300103005,Double Evolution Pill (Skill Card),Skill Card,skill,"At the start of your Draw Phase, instead of dr...",Rex,,"[{'set_name': 'Speed Duel: Scars of Battle', '...","[{'id': 300103005, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
5192,300103001,Heavy Metal Raiders (Skill Card),Skill Card,skill,The first time each DARK Machine monster you c...,Keith,,"[{'set_name': 'Speed Duel: Scars of Battle', '...","[{'id': 300103001, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
6122,300302033,Land of the Ojamas (Skill Card),Skill Card,skill,"Once per turn: You can send 1 ""Ojama"" card fro...",Chazz Princet,,[{'set_name': 'Speed Duel GX: Duel Academy Box...,"[{'id': 300302033, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
7097,300302036,Middle Age Mechs (Skill Card),Skill Card,skill,"All ""Ancient Gear"" monsters gain 300 ATK. Each...",Dr. Vellian C,,[{'set_name': 'Speed Duel GX: Duel Academy Box...,"[{'id': 300302036, 'image_url': 'https://image...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,


Quite a few card names contain the card type in round brackets. Most of them are Skill Cards from the Speed Duel format, whose names in the API carry the suffix "(Skill Card)"; several of them share their name with a regular card, e.g. "Call of the Haunted".

Let's remove the suffix "(Skill Card)" from the names, as it would otherwise be learned as part of the names and have a negative effect on the name generation.

In [8]:
cards_cleaned = cards.copy()

In [9]:
cards_cleaned['name'] = cards_cleaned['name'].str.replace(r'\(Skill Card\)', '', regex=True).str.strip()

In [10]:
# Show cards with round brackets in their names
cards_cleaned[cards_cleaned["name"].str.contains("(", regex=False)]

Unnamed: 0,id,name,type,frameType,desc,race,archetype,card_sets,card_images,card_prices,atk,def,level,attribute,scale,linkval,linkmarkers,banlist_info
2000,63436931,Crimson Dragon (card),Synchro Monster,synchro,1 Tuner + 1+ non-Tuner monsters\r\nIf this car...,Dragon,,,"[{'id': 63436931, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",0.0,0.0,12.0,LIGHT,,,,
8945,40551410,Recette de Personnel (Staff Recipe),Trap Card,trap,You can target 1 Ritual Monster you control; S...,Continuous,Recipe,,"[{'id': 40551410, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
8946,87778106,Recette de Poisson (Fish Recipe),Spell Card,spell,"This card can be used to Ritual Summon any ""No...",Ritual,Recipe,,"[{'id': 87778106, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
8947,14166715,Recette de Viande (Meat Recipe),Spell Card,spell,"This card can be used to Ritual Summon any ""No...",Ritual,Recipe,,"[{'id': 14166715, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,
11960,41773061,Voici la Carte (Today's Menu),Spell Card,spell,"Reveal 2 ""Nouvelles"" monsters with different n...",Normal,,,"[{'id': 41773061, 'image_url': 'https://images...","[{'cardmarket_price': '0.00', 'tcgplayer_price...",,,,,,,,


The remaining entries are legitimate card names, so we keep them.

For training, we write the names and the descriptions into two separate text files, one entry per line. These text files are then used as input for the training.

Some names contain characters that might cause issues, such as the Greek letter alpha or star symbols. We replace them with plain ASCII characters.

In [11]:
text = '\n'.join(cards_cleaned['name'].tolist())
desc = '\n'.join(cards_cleaned['desc'].tolist())

# Replace the black star with an asterisk
text = text.replace('★', '*')

# Replace the Greek letter alpha with a Latin "a"
text = text.replace('α', 'a')

# Replace the white star with an asterisk
text = text.replace('☆', '*')

# Open files for writing with UTF-8 encoding
with open('cards.txt', 'w', encoding='utf-8') as f:
    f.write(text)

with open('desc.txt', 'w', encoding='utf-8') as f:
    f.write(desc)
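The three `str.replace` calls above can also be written as a single translation table and applied to names and descriptions alike. The sketch below additionally flattens the `\r\n` line breaks that some descriptions contain, so that each card occupies exactly one line of the training file; this flattening is an optional design choice not made in the cell above, and the helper names are hypothetical. `non_ascii_chars` helps spot further special characters that may need a replacement.

```python
from collections import Counter

# Same three replacements as in the cell above
REPLACEMENTS = str.maketrans({"★": "*", "☆": "*", "α": "a"})


def prepare_training_text(entries):
    """Join entries into one text, one entry per line, with special characters replaced."""
    lines = []
    for entry in entries:
        # Collapse all whitespace, including "\r\n" line breaks, into single spaces
        flat = " ".join(entry.split())
        lines.append(flat.translate(REPLACEMENTS))
    return "\n".join(lines)


def non_ascii_chars(text):
    """Count the non-ASCII characters left in a text."""
    return Counter(ch for ch in text if ord(ch) > 127)
```

For example, `prepare_training_text(cards_cleaned["desc"].tolist())` would produce a description file with exactly one line per card.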

## Modeling
The modeling part can be split into two areas:
1. Text generation: for the card names and card descriptions, I chose to fine-tune the GPT-2 language model on the gathered data.
2. Image generation: for the images, I chose to use the Stable Diffusion text-to-image fine-tuning from Hugging Face.

In [12]:
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from transformers import Trainer, TrainingArguments

def load_dataset(file_path, tokenizer, block_size=128):
    # Tokenize the whole text file and split it into blocks of `block_size` tokens
    dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=file_path,
        block_size=block_size,
    )
    return dataset


def load_data_collator(tokenizer, mlm=False):
    # mlm=False: causal language modeling (GPT-2 predicts the next token),
    # not masked language modeling as used for BERT
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=mlm,
    )
    return data_collator


def train(train_file_path,
          model_name,
          output_dir,
          overwrite_output_dir,
          per_device_train_batch_size,
          num_train_epochs,
          save_steps):
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    train_dataset = load_dataset(train_file_path, tokenizer)
    data_collator = load_data_collator(tokenizer)

    # Save the tokenizer next to the model so both can be reloaded from output_dir
    tokenizer.save_pretrained(output_dir)

    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.save_pretrained(output_dir)

    training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=overwrite_output_dir,
        per_device_train_batch_size=per_device_train_batch_size,
        num_train_epochs=num_train_epochs,
        save_steps=save_steps,  # save a checkpoint every `save_steps` steps
    )

    # The Trainer moves the model to the GPU automatically if one is available
    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()
    trainer.save_model()  # writes the fine-tuned weights to output_dir
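To make the training setup more concrete: roughly speaking, `TextDataset` tokenizes the whole file, concatenates the tokens and cuts them into consecutive blocks of `block_size` tokens, dropping the incomplete remainder. With `mlm=False`, the collator uses each block as both input and label, and `GPT2LMHeadModel` shifts the labels by one position internally, so every token is trained to predict the next one. The pure-Python sketch below illustrates both steps on plain integer IDs; the function names are hypothetical and this is not the library's code.

```python
def chunk_tokens(token_ids, block_size=128):
    """Split a token list into consecutive full blocks, dropping the remainder."""
    return [
        token_ids[i:i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, block_size)
    ]


def next_token_pairs(block):
    """(input token, target token) pairs a causal language model is trained on."""
    return list(zip(block[:-1], block[1:]))


blocks = chunk_tokens(list(range(10)), block_size=4)
print(blocks)                       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(next_token_pairs(blocks[0]))  # [(0, 1), (1, 2), (2, 3)]
```

One consequence is that block boundaries ignore line breaks: a block can start in the middle of a card name or description, and the newline characters between entries are learned as ordinary tokens.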

In [13]:
train_file_path = "cards.txt"
model_name = 'gpt2'
output_dir = 'result'
overwrite_output_dir = False
per_device_train_batch_size = 8
num_train_epochs = 5.0
save_steps = 500

# The Hugging Face Trainer runs on PyTorch and uses the GPU automatically when
# one is available, so no TensorFlow device context is needed here
train(
    train_file_path=train_file_path,
    model_name=model_name,
    output_dir=output_dir,
    overwrite_output_dir=overwrite_output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    num_train_epochs=num_train_epochs,
    save_steps=save_steps
)



Step,Training Loss


In [14]:
train_file_path = "desc.txt"
model_name = 'gpt2'
output_dir = 'result-desc'
overwrite_output_dir = False
per_device_train_batch_size = 8
num_train_epochs = 5.0
save_steps = 500

# The Hugging Face Trainer runs on PyTorch and uses the GPU automatically when
# one is available, so no TensorFlow device context is needed here
train(
    train_file_path=train_file_path,
    model_name=model_name,
    output_dir=output_dir,
    overwrite_output_dir=overwrite_output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    num_train_epochs=num_train_epochs,
    save_steps=save_steps
)

Step,Training Loss
500,1.74
1000,1.428
1500,1.3028
2000,1.2319
2500,1.1822
3000,1.1411
3500,1.1224
4000,1.0896
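The logged training loss is the average cross-entropy per token (in nats), so `exp(loss)` gives the training perplexity, roughly how many equally likely tokens the model is choosing between at each step. This is only a training-set figure, since no held-out validation set was used here.

```python
import math

# Training losses logged above for the description model: (step, loss)
logged = [(500, 1.74), (1000, 1.428), (1500, 1.3028), (2000, 1.2319),
          (2500, 1.1822), (3000, 1.1411), (3500, 1.1224), (4000, 1.0896)]


def perplexity(loss):
    """Perplexity = e^(mean cross-entropy per token)."""
    return math.exp(loss)


for step, loss in logged:
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Over the logged steps, the training perplexity drops from about 5.70 to about 2.97.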


In [15]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def load_model(model_path):
    model = GPT2LMHeadModel.from_pretrained(model_path)
    return model


def load_tokenizer(tokenizer_path):
    tokenizer = GPT2Tokenizer.from_pretrained(tokenizer_path)
    return tokenizer


def generate_text(sequence, max_length, model_path="result"):
    # Load the fine-tuned card-name model and its tokenizer
    model = load_model(model_path)
    tokenizer = load_tokenizer(model_path)
    ids = tokenizer.encode(sequence, return_tensors='pt')
    final_outputs = model.generate(
        ids,
        do_sample=True,         # sample instead of greedy decoding
        max_length=max_length,  # maximum length in tokens, including the prompt
        pad_token_id=model.config.eos_token_id,  # GPT-2 has no padding token
        top_k=50,               # sample only from the 50 most likely tokens
        top_p=0.95,             # ...restricted to the top 95% of probability mass
    )
    print(tokenizer.decode(final_outputs[0], skip_special_tokens=True))
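`generate` is called with `do_sample=True`, `top_k=50` and `top_p=0.95`. The following is a simplified, pure-Python illustration of what these two filters do to a next-token distribution: keep only the `top_k` most likely tokens, then keep the smallest set of those whose cumulative probability reaches `top_p`, and renormalize before sampling. The function name and toy probabilities are made up; the actual implementation in transformers works on logits and handles edge cases differently.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return the renormalized distribution left after top-k and top-p filtering.

    probs: dict mapping token -> probability (should sum to 1).
    """
    # Top-k: keep the k most likely tokens and renormalize
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p (nucleus): keep tokens until their cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1 again
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"Dragon": 0.5, "Knight": 0.3, "Magician": 0.15, "Token": 0.05}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.75))
```

Here only "Dragon" and "Knight" survive, with probabilities of about 0.625 and 0.375. Lower `top_p` or `top_k` values make the generated names more conservative; higher values make them more varied.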


At this point, I planned to fine-tune Stable Diffusion on the card images. After investing a lot of time without a usable result, I stopped. I also looked for other text-to-image generators, but so much effort had already gone into preparing the training dataset in the exact format Stable Diffusion requires that switching was not feasible, which left me no choice but to stop.

## Interpretation and Validation
Now let's look at the card names and descriptions the models generate.
To steer the generation in a certain direction, a prompt can be passed in the variable `sequence`; the generation also works without one. Because sampling is enabled, running the cell again produces new results for both the card names and the descriptions.

In [16]:
sequence = " "
max_len = 50
generate_text(sequence, max_len)

 
The Beast Token
The Box of the Fallen
The Catcher of the Forest
The Catcher of the Sky
The Catcher's Curse
The Charioteer of Dark-Gold
The Chateau of the Ten Thousand


The model generates new names that do not already exist as card names. They are quite creative and fit well with the style of the existing card names.
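To back up the claim that the generated names are new, the generated lines can be compared against the cleaned training names. This sketch assumes `generate_text` is changed to return the decoded string instead of printing it (a hypothetical modification); `novel_names` is a hypothetical helper.

```python
def novel_names(generated_text, existing_names):
    """Return generated names (one per line) that are not existing card names."""
    # Case-insensitive comparison against all names from the training data
    existing = {name.strip().lower() for name in existing_names}
    candidates = [line.strip() for line in generated_text.splitlines() if line.strip()]
    return [name for name in candidates if name.lower() not in existing]
```

Usage would look like `novel_names(output, cards_cleaned["name"])`, where `output` is the returned text. Since the training file had special characters replaced, the same replacements should be applied to the existing names before comparing.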

Next up are the descriptions:

In [17]:
# Reuses load_model and load_tokenizer from the previous cell
def generate_desc(sequence, max_length, model_path="result-desc"):
    # Load the fine-tuned description model and its tokenizer
    model = load_model(model_path)
    tokenizer = load_tokenizer(model_path)
    ids = tokenizer.encode(sequence, return_tensors='pt')
    final_outputs = model.generate(
        ids,
        do_sample=True,
        max_length=max_length,
        pad_token_id=model.config.eos_token_id,
        top_k=50,
        top_p=0.95,
    )
    print(tokenizer.decode(final_outputs[0], skip_special_tokens=True))

generate_desc(" ", 200)


 Effi", or "Fairy Tail", from your hand or Deck. You can banish this card from your GY; Special Summon 1 "Fairy Tail Token", but destroy it during the End Phase of this turn. You can only use each effect of "Fairy Tail of the Magicians" once per turn.
If an opponent's monster battles, during damage calculation: You can discard 1 card; negate the attack, then, if this card destroyed an opponent's monster by battle this turn, your battling monster can make a second attack in a row. You can only use each effect of "Fairy Tail of the Wits" once per turn.
All Fairy-Type monsters you control gain 300 ATK/DEF. If an "Umbral Spirit" monster(s) is Special Summoned to your field (except during the Damage Step): You can target 1 of those monsters; reduce the Level of that face-up monster by 1, and if you


The generated descriptions don't look too bad either, and some of them are actually useful. However, for a description to make sense in terms of game mechanics, the model would need to know how the game works, so the generated descriptions are not always good enough to be used as they are.
They are useful as a source of creative ideas, though.
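One cheap consistency check for generated descriptions: YuGiOh! card texts refer to cards and archetypes in double quotes, so the quoted names can be extracted and compared against the existing card names and archetypes. This is a hypothetical helper that was not part of this project, and a flagged reference is not necessarily wrong, since a new card may refer to itself by a new name.

```python
import re

# Text between a pair of double quotes, e.g. "Dark Hole"
QUOTED = re.compile(r'"([^"]+)"')


def unknown_references(description, known_names):
    """Return quoted names in a description that are not in known_names."""
    known = set(known_names)
    return [name for name in QUOTED.findall(description) if name not in known]
```

The set of known names could be built as `set(cards_cleaned["name"]) | set(cards_cleaned["archetype"].dropna())`.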

Due to a lack of time, I could not validate the results further, for example with a survey presenting different model outputs to people, so the interpretation and validation part ends here.

## Closing thoughts
I really enjoyed the final assignment in this module, and I would've loved to have more time to finish the project, especially the image generation part, which is missing. It would've been very interesting to see how Stable Diffusion generates possible YuGiOh! card images based on the training data. Getting the image data and publishing it on Hugging Face for further use cost me more time than expected. Looking back, I probably would have chosen to do something else with the data available from the API.
Nevertheless, I learned a lot of new things.