# Pokémon VGC Optimizer - Data Processing & Modeling

**Team:** Antônio Martins, Enricco Gemha, Rafael Katri  
**Course:** Machine Learning - Insper 2023.2  
**Professor:** Fabio Ayres

## Project Overview

This notebook implements a **neural judge** for Pokémon VGC battles and demonstrates genetic team optimization:

1. **Battle Judge**: A neural network that takes two 6-Pokémon teams (as strings) and predicts which team wins
2. **Genetic Optimizer**: Uses the trained judge as a fitness function to evolve optimal team compositions

## Dataset: Pokémon Showdown Regulation E

**Source**: [Pokémon Showdown](https://pokemonshowdown.com/) - competitive online battling platform  
**Format**: [Regulation E](https://www.pokemon.com/us/strategy/top-new-pokemon-for-regulation-set-e-vgc-battles) (official VGC tournament ruleset)  
**Collection**: Web scraper using Showdown's replay API (see `01_web_scraper.ipynb`)

**Dataset structure** (`data/output/matches.csv`):
- **~14,000 unique battles** from competitive play
- **13 columns**: `winner` (1 or 2) + 12 team slots (`pokemon1_p1` through `pokemon6_p2`)
- **Clean data**: Only complete 6v6 matches with clear winners

These are real team choices made in actual competitive battles, which gives the judge realistic data from which to learn team evaluation.
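As a quick sanity check of the layout described above, the sketch below validates a CSV with one `winner` column and twelve team-slot columns. It runs on a hypothetical two-row in-memory sample (not real replay data); `validate_matches` and `EXPECTED_COLUMNS` are illustrative names, and the column order shown is an assumption.

```python
import csv
import io

# Column names follow the dataset description: `winner` plus pokemon1_p1 ... pokemon6_p2.
EXPECTED_COLUMNS = ["winner"] + [
    f"pokemon{slot}_p{player}" for player in (1, 2) for slot in range(1, 7)
]


def validate_matches(csv_text):
    """Parse the CSV text and check column names, winner labels, and complete teams."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if sorted(reader.fieldnames) != sorted(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if row["winner"] not in {"1", "2"}:
            raise ValueError(f"line {line_no}: winner must be 1 or 2")
        if any(not row[col] for col in EXPECTED_COLUMNS[1:]):
            raise ValueError(f"line {line_no}: incomplete team")
        rows.append(row)
    return rows


# Hypothetical sample: the second team is simply the first one reversed.
team_a = "Regieleki,Pelipper,Floatzel,Poliwhirl,Flamigo,Skiploom"
team_b = ",".join(reversed(team_a.split(",")))
sample = f"{','.join(EXPECTED_COLUMNS)}\n1,{team_a},{team_b}\n2,{team_b},{team_a}\n"

rows = validate_matches(sample)
print(len(rows), "valid matches")
```

Running the same checks on the real file would only require reading its text instead of `sample`.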


### Code

#### Importing libraries

Building a vocabulary from all Pokémon names in the dataset. `StringLookup` uses this vocabulary to convert names to integer indices for the embedding layer. Because the vocabulary is built from the full dataset, every name in both the train and test splits gets its own index and none falls into the out-of-vocabulary bucket.
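To make the name-to-index mapping concrete, here is a plain-Python stand-in for what the lookup step does. It is not the Keras layer itself: it assumes the layer's default of a single out-of-vocabulary slot at index 0, and `build_lookup` and `encode` are illustrative helper names.

```python
def build_lookup(names):
    """Map each distinct name to an index starting at 1; index 0 is reserved for unknown names."""
    vocab = sorted(set(names))
    return {name: index for index, name in enumerate(vocab, start=1)}


def encode(team, lookup):
    """Convert a list of names to indices, sending unseen names to index 0."""
    return [lookup.get(name, 0) for name in team]


names = ["Regieleki", "Pelipper", "Floatzel", "Poliwhirl", "Flamigo", "Skiploom"]
lookup = build_lookup(names)

# "Missingno" is not in the vocabulary, so it maps to the out-of-vocabulary index.
encoded = encode(["Pelipper", "Flamigo", "Missingno"], lookup)
print(encoded)  # [3, 1, 0]
```

The embedding table therefore needs one more row than there are names, which is why the model uses `len(vocabulary) + 1` as its input dimension.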


In [None]:
from tensorflow import keras
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit
import matplotlib.pyplot as plt


In [None]:
data = pd.read_csv('data/output/matches.csv')
data.head()

Creating a simple 80/20 train-test split. We'll compare this with a stratified version to see the impact of balanced label distribution on model performance.


#### Vocabulary size

In [None]:
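# Collect every distinct Pokémon name that appears in any team slot
vocabulary = set()

for col in data.columns:
    for value in data[col]:
        if isinstance(value, str):  # skips the integer `winner` column
            vocabulary.add(value)
# Sort so that the name-to-index assignment is reproducible across runs
vocabulary = sorted(vocabulary)
print('Vocabulary size:', len(vocabulary))
print('Vocabulary:', vocabulary)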

#### Splitting train and test data

**Why stratification is important in VGC data:**

In competitive Pokémon, the `winner` label is not guaranteed to be evenly split between player 1 and player 2. A simple random split can leave the train and test sets with different label proportions, which makes accuracy figures harder to compare and training less stable. Stratified splitting on `winner` keeps the 1 vs 2 proportion the same in both sets.

This matters more with a limited dataset (~14k matches). Note that stratifying on `winner` balances the labels only; it does not by itself control which team compositions land in each set, nor does it account for meta shifts over time.
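The idea behind a stratified split can be sketched in a few lines of plain Python: group the row indices by label, shuffle each group, and cut the same fraction from every group. This is a simplified illustration, not how scikit-learn's `StratifiedShuffleSplit` is implemented, and the 60/40 label mix is a made-up example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so that each label keeps its share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Hypothetical label distribution: 60 wins for player 1, 40 for player 2.
labels = [1] * 60 + [2] * 40
train_idx, test_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

Both parts keep the 60/40 ratio, which is the property the stratified split in this notebook relies on.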


In [None]:
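# Split data into train and test (plain random split, no stratification)

train, test = train_test_split(data, test_size=0.2, random_state=42)

# `winner` is 1 or 2, so subtracting 1 gives labels 0 = player 1 won, 1 = player 2 won
X_train = train.drop(columns=['winner'])
y_train = train['winner'] - 1

X_test = test.drop(columns=['winner'])
y_test = test['winner'] - 1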

In [None]:
X_train_p1 = X_train[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].to_numpy()
X_train_p2 = X_train[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].to_numpy()

Analyzing Pokémon usage patterns in the stratified dataset. This EDA helps identify:
- **Meta dominance**: Which Pokémon appear most frequently (potential bias sources)
- **Coverage**: Whether train/test splits maintain similar distributions
- **Balance**: If certain Pokémon are heavily skewed toward one player position

These insights inform modeling decisions like whether to apply class weights or data augmentation.
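One way to check the coverage point numerically is to compare each Pokémon's share of team slots between two splits. The sketch below does this with `collections.Counter`; the teams are hypothetical and shortened to two members for brevity, and `usage_share` and `largest_gap` are illustrative names.

```python
from collections import Counter


def usage_share(teams):
    """Return each Pokémon's share of all team slots in the given teams."""
    counts = Counter(name for team in teams for name in team)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def largest_gap(train_teams, test_teams):
    """Return the Pokémon whose usage share differs most between the two splits."""
    train_share = usage_share(train_teams)
    test_share = usage_share(test_teams)
    names = set(train_share) | set(test_share)
    gaps = {n: abs(train_share.get(n, 0.0) - test_share.get(n, 0.0)) for n in names}
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


# Hypothetical, shortened teams (real teams have six members).
train_teams = [["Pelipper", "Floatzel"], ["Pelipper", "Regieleki"]]
test_teams = [["Pelipper", "Flamigo"]]

name, gap = largest_gap(train_teams, test_teams)
print(name, round(gap, 2))  # Flamigo 0.5
```

A large gap for a frequently used Pokémon would suggest that the two splits do not represent the same meta.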


In [None]:
X_test_p1 = X_test[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].to_numpy()
X_test_p2 = X_test[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].to_numpy()

**Judge architecture rationale:**

- **Shared weights**: Both teams use the same `judge` network, ensuring symmetric evaluation
- **Embedding approach**: Maps Pokémon names to 256-dim vectors, letting the model learn strategic relationships
- **GlobalAveragePooling**: Treats teams as sets (order-invariant), which matches VGC team selection
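- **Subtraction**: `s1 - s2` creates a single logit. Because the labels are `winner - 1` (1 = player 2 won), a positive logit predicts a player 2 win and a negative logit a player 1 win, so a lower judge score corresponds to a stronger team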
- **Regularization**: L2 + Dropout prevent overfitting to specific Pokémon combinations

This design captures team synergy while remaining computationally efficient.
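The properties listed above can be checked on a toy version of the judge. The sketch below is plain Python, not the Keras model: it uses 4-dimensional random embeddings instead of 256, replaces the dense layers with a single linear head, and shortens teams to three members. All names and weights are hypothetical.

```python
import math
import random

EMBED_DIM = 4  # toy size; the notebook uses 256


def make_embeddings(vocab, seed=0):
    """Assign a random vector to every name (stand-in for a learned embedding table)."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for name in vocab}


def team_score(team, embeddings, weights):
    """Average the member embeddings, then apply a linear scoring head."""
    pooled = [sum(embeddings[name][d] for name in team) / len(team) for d in range(EMBED_DIM)]
    return sum(w * x for w, x in zip(weights, pooled))


def match_logit(team1, team2, embeddings, weights):
    """Score both teams with the same function and subtract (shared weights)."""
    return team_score(team1, embeddings, weights) - team_score(team2, embeddings, weights)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


vocab = ["Regieleki", "Pelipper", "Floatzel", "Poliwhirl", "Flamigo", "Skiploom"]
embeddings = make_embeddings(vocab)
weights = [0.5, -0.25, 1.0, 0.75]  # hypothetical scoring head

team_a, team_b = vocab[:3], vocab[3:]
logit = match_logit(team_a, team_b, embeddings, weights)

# With labels `winner - 1`, sigmoid(logit) is the estimated probability that player 2 wins.
print("P(player 2 wins) =", round(sigmoid(logit), 3))
```

Swapping the two teams flips the sign of the logit, and reordering the members of a team leaves its score unchanged, which is what shared weights and average pooling are meant to guarantee.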


#### Stratifying the data

In [None]:
# Combine Pokemon names from both teams in the entire dataset
data["pokemon_team1"] = data[["pokemon1_p1", "pokemon2_p1", "pokemon3_p1", "pokemon4_p1", "pokemon5_p1", "pokemon6_p1"]].agg(','.join, axis=1)
data["pokemon_team2"] = data[["pokemon1_p2", "pokemon2_p2", "pokemon3_p2", "pokemon4_p2", "pokemon5_p2", "pokemon6_p2"]].agg(','.join, axis=1)

data["pokemon_teams"] = data[["pokemon_team1", "pokemon_team2"]].agg(','.join, axis=1)

# Use StratifiedShuffleSplit on the entire dataset
stratified_split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

for train_index, test_index in stratified_split.split(data, data['winner']):
    stratified_train_data, stratified_test_data = data.iloc[train_index], data.iloc[test_index]

# Training set
X_train_p1_stratified = stratified_train_data[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].to_numpy()
X_train_p2_stratified = stratified_train_data[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].to_numpy()
y_train_stratified = stratified_train_data['winner'] - 1

# Testing set
X_test_p1_stratified = stratified_test_data[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].to_numpy()
X_test_p2_stratified = stratified_test_data[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].to_numpy()
y_test_stratified = stratified_test_data['winner'] - 1

In [None]:
# Flatten all 12 team slots of the stratified training set into one array of names
team_columns = [f'pokemon{slot}_p{player}' for player in (1, 2) for slot in range(1, 7)]
all_pokemon_flat_stratified = stratified_train_data[team_columns].to_numpy().ravel()

# Count the occurrences of each Pokemon
pokemon_counts_stratified = pd.Series(all_pokemon_flat_stratified).value_counts()

# Plot the distribution of all Pokemon appearances for stratified split
plt.figure(figsize=(15, 6))
pokemon_counts_stratified.plot(kind='bar')
plt.title('Distribution of Pokemon Appearances in the Stratified Training Set')
plt.xlabel('Pokemon')
plt.ylabel('Number of Appearances')
plt.xticks(fontsize=12.0)  # Set the font size of the x-axis labels
plt.show()

# Select the top 20 Pokemon
top_20_pokemon_stratified = pokemon_counts_stratified.head(20).index

# Now, check the distribution in the training set for the top 20 Pokemon for stratified split
train_pokemon_counts_p1_stratified = stratified_train_data[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].stack().value_counts()
train_pokemon_counts_p2_stratified = stratified_train_data[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].stack().value_counts()

# Plot the distribution in the training set for the top 20 Pokemon for stratified split
plt.figure(figsize=(15, 6))
train_pokemon_counts_p1_stratified.loc[top_20_pokemon_stratified].plot(kind='bar', color='blue', label='Team 1')
train_pokemon_counts_p2_stratified.loc[top_20_pokemon_stratified].plot(kind='bar', color='orange', label='Team 2', alpha=0.7)
plt.title('Distribution of Top 20 Pokemon Appearances in the Stratified Training Set')
plt.xlabel('Pokemon')
plt.ylabel('Number of Appearances')
plt.xticks(fontsize=12.0)  # Set the font size of the x-axis labels
plt.legend()
plt.show()

# Check the distribution in the testing set for the top 20 Pokemon for stratified split
test_pokemon_counts_p1_stratified = stratified_test_data[['pokemon1_p1', 'pokemon2_p1', 'pokemon3_p1', 'pokemon4_p1', 'pokemon5_p1', 'pokemon6_p1']].stack().value_counts()
test_pokemon_counts_p2_stratified = stratified_test_data[['pokemon1_p2', 'pokemon2_p2', 'pokemon3_p2', 'pokemon4_p2', 'pokemon5_p2', 'pokemon6_p2']].stack().value_counts()

# Plot the distribution in the testing set for the top 20 Pokemon for stratified split
plt.figure(figsize=(15, 6))
test_pokemon_counts_p1_stratified.loc[top_20_pokemon_stratified].plot(kind='bar', color='blue', label='Team 1')
test_pokemon_counts_p2_stratified.loc[top_20_pokemon_stratified].plot(kind='bar', color='orange', label='Team 2', alpha=0.7)
plt.title('Distribution of Top 20 Pokemon Appearances in the Stratified Testing Set')
plt.xlabel('Pokemon')
plt.ylabel('Number of Appearances')
plt.xticks(fontsize=12.0)  # Set the font size of the x-axis labels
plt.legend()
plt.show()

**Training comparison: stratified vs non-stratified**

We train both models to compare the impact of stratification:
- **Loss function**: `BinaryCrossentropy(from_logits=True)` matches our subtraction architecture
- **Epochs**: 100 with validation monitoring to track overfitting
- **Expected outcome**: Stratified model should show better stability and generalization (as seen in README: ~59.9% vs ~55.9% accuracy)
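For reference, binary cross-entropy computed directly from a logit can be written in a numerically stable form. The sketch below is a plain-Python version of that formula for a single example, not the Keras implementation; `bce_from_logits` is an illustrative name.

```python
import math


def bce_from_logits(y_true, logit):
    """Binary cross-entropy for one example, computed from the raw logit.

    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))], rearranged so
    that exp() is only ever called on a non-positive number.
    """
    return max(logit, 0.0) - logit * y_true + math.log1p(math.exp(-abs(logit)))


# In this notebook the logit is s1 - s2 and the label is winner - 1 (1 = player 2 won).
for logit in (-2.0, 0.0, 2.0):
    print(logit, round(bce_from_logits(1, logit), 4), round(bce_from_logits(0, logit), 4))
```

A logit of zero gives a loss of `log(2)` for either label, and the loss falls as the logit moves toward the side that matches the label.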


#### Building the model

In [None]:
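def build_model():
    """Build a fresh judge plus the two-input model that compares a pair of teams."""
    judge = keras.Sequential([
        keras.layers.StringLookup(vocabulary=vocabulary, mask_token=None),
        keras.layers.Embedding(input_dim=len(vocabulary) + 1, output_dim=256, input_length=6),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(16, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dropout(0.1),
        keras.layers.Dense(1)
    ])

    t1 = keras.Input(shape=(6,), dtype='string')
    t2 = keras.Input(shape=(6,), dtype='string')

    # The same judge scores both teams (shared weights within one model)
    s1 = judge(t1)
    s2 = judge(t2)

    d = keras.layers.Subtract()([s1, s2])
    return judge, keras.Model(inputs=[t1, t2], outputs=d)

# Build two independent instances. Wrapping a single judge in both models would make
# them share weights, so the stratified vs non-stratified comparison would be meaningless.
judge_baseline, model = build_model()
# `judge` is used by the genetic algorithm later on; here it is the stratified model's judge.
judge, model_stratified = build_model()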

In [None]:
model.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model_stratified.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy']
)

**Genetic algorithm demonstration:**

Using the trained `judge` as a fitness function to evolve optimal team compositions. This showcases how the model can drive team optimization:

- **Fitness**: Derived from the single-team score returned by `judge.predict()`. With labels `winner - 1`, a lower score corresponds to a stronger predicted team, so the sign matters when maximizing
- **Gene space**: Integer indices from the vocabulary (converted via `StringLookup`)
- **Evolution**: PyGAD mutates team compositions to maximize fitness

This approach is illustrative and has limitations: it optimizes only one team and ignores opponent adaptation. It still demonstrates the integration between judge and genetic optimization mentioned in the README.
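The overall loop (score, select, recombine, mutate) can be sketched without PyGAD. The code below is a plain-Python stand-in, not the PyGAD algorithm: `stub_fitness` is a placeholder for the judge, `VOCAB_SIZE` is a made-up vocabulary size, and the crossover and mutation operators are simple choices that keep team members unique.

```python
import random

VOCAB_SIZE = 50  # hypothetical number of Pokémon in the vocabulary
TEAM_SIZE = 6


def stub_fitness(team):
    """Placeholder for the judge: simply rewards larger indices."""
    return sum(team)


def crossover(parent_a, parent_b, rng):
    """Draw a child team from the union of both parents' members (no duplicates)."""
    pool = sorted(set(parent_a) | set(parent_b))
    return rng.sample(pool, TEAM_SIZE)


def mutate(team, rng):
    """Replace one member with an index that is not already on the team."""
    team = list(team)
    slot = rng.randrange(TEAM_SIZE)
    choices = [i for i in range(1, VOCAB_SIZE + 1) if i not in team]
    team[slot] = rng.choice(choices)
    return team


def evolve(fitness, generations=30, population_size=20, seed=0):
    """Keep the better half each generation and refill with mutated children."""
    rng = random.Random(seed)
    population = [rng.sample(range(1, VOCAB_SIZE + 1), TEAM_SIZE) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve(stub_fitness)
print("best team indices:", sorted(best), "fitness:", stub_fitness(best))
```

Because the best parents are carried over unchanged, the best fitness never decreases from one generation to the next; replacing `stub_fitness` with a call to the judge would turn this into a team optimizer.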


In [None]:
model.summary()

In [None]:
model_stratified.summary()

#### Training and evaluating the model

In this step, two models are trained: one on the original (randomly split) data and one on the stratified data. After training, the models are evaluated and compared.

In [None]:
model_stratified.fit([X_train_p1_stratified, X_train_p2_stratified], y_train_stratified, epochs=100, verbose=1, batch_size=64, validation_split=0.2)
model.fit([X_train_p1, X_train_p2], y_train, epochs=100, verbose=1, batch_size=64, validation_split=0.2)

In [None]:
model_stratified.evaluate([X_test_p1_stratified, X_test_p2_stratified], y_test_stratified, verbose=1)

In [None]:
model.evaluate([X_test_p1, X_test_p2], y_test, verbose=1)

## Genetic algorithm

In [None]:
# Example of Judge prediction
score  =  judge.predict([['Regieleki', 'Pelipper', 'Floatzel', 'Poliwhirl', 'Flamigo', 'Skiploom']])
score[0][0]

In [None]:
#String to number
lookup_layer = judge.get_layer(index=0)
string_values = ['Regieleki', 'Pelipper', 'Floatzel', 'Poliwhirl', 'Flamigo', 'Skiploom']
indices = lookup_layer(string_values)
indices.numpy()

In [None]:
# Number to string
inverse_lookup_layer = keras.layers.StringLookup(vocabulary=lookup_layer.get_vocabulary(), invert=True)
string_values = inverse_lookup_layer(indices)
[name.decode() for name in string_values.numpy()]  # decode every name, not only the first

In [None]:
# Random sample from vocabulary
import random

random.sample(vocabulary, 6)

In [None]:
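import pygad

# Fitness function: decode gene indices to names and score the team with the judge
def fitness_func(ga_instance, solution, solution_idx):
    string_values = inverse_lookup_layer(solution.astype("int64")).numpy()
    string_values = [name.decode() for name in string_values]
    score = judge.predict([string_values], verbose=0)
    # Labels are `winner - 1`, so sigmoid(s1 - s2) estimates P(player 2 wins):
    # a lower judge score means a stronger team, hence the negation.
    return -float(score[0][0])

# Initial population: 100 random teams encoded as vocabulary indices
pokemons = [lookup_layer(random.sample(vocabulary, 6)).numpy() for i in range(100)]

# Create an instance of the GA class
ga_instance = pygad.GA(num_generations=10,
                     num_parents_mating=50,
                     fitness_func=fitness_func,
                     num_genes=6,
                     gene_type=int,  # genes are vocabulary indices, not floats
                     gene_space=list(range(1, len(vocabulary) + 1)),  # valid indices only (0 is out-of-vocabulary)
                     allow_duplicate_genes=False,  # no repeated Pokémon on a team
                     mutation_type="random",
                     mutation_num_genes=6,  # every slot is re-drawn on mutation
                     save_best_solutions=True,
                     initial_population=pokemons)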

# Run the GA instance
ga_instance.run()


In [None]:
best_solutions = ga_instance.best_solutions
best_solutions_fit = ga_instance.best_solutions_fitness

In [None]:
print(best_solutions[0])
print(best_solutions_fit)

In [None]:
for solution, fitness in zip(best_solutions, best_solutions_fit):
    # Cast to integer indices before mapping them back to Pokémon names
    string_values = inverse_lookup_layer(solution.astype("int64")).numpy()
    names = [name.decode() for name in string_values]
    print(f"Pokemons: {names}, fit: {fitness}")


### How can the model be improved?

During model development, several characteristics that define a Pokémon were abstracted away, namely:

- `Nature`: The Pokémon's nature, which raises one stat and lowers another.

- `Ability`: The Pokémon's ability, which can change how some moves work.

- `Item`: The item the Pokémon is holding, which can change how some moves work.

- `EVs`: The Pokémon's EVs, which are stat points that the player can distribute. A Pokémon can have at most 510 EVs in total, with at most 252 EVs in any single stat. The stats are:

    - `HP`: Hit points.

    - `Atk`: Attack.

    - `Def`: Defense.

    - `SpA`: Special Attack.

    - `SpD`: Special Defense.

    - `Spe`: Speed.

- `IVs`: The Pokémon's IVs, which are stat points assigned randomly when the Pokémon is caught. A Pokémon can have at most 186 IVs in total, with at most 31 IVs in any single stat. The stats are:

    - `HP`: Hit points.

    - `Atk`: Attack.

    - `Def`: Defense.

    - `SpA`: Special Attack.

    - `SpD`: Special Defense.

    - `Spe`: Speed.

- `Moves`: The Pokémon's moves, which the player can choose. There are more than 900 moves in total, and each Pokémon can know at most 4 of them.


To improve the model, these characteristics need to be added, so that the _judge_ can tell apart Pokémon that share a name but differ in these characteristics.
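A possible starting point is a record type that carries these fields and enforces the limits listed above. The sketch below is one hypothetical layout, not part of the current pipeline: `PokemonSet`, its `token` method, and the example set are all illustrative, and joining species, ability, and item into one vocabulary token is just one of several ways the judge could be given the extra information.

```python
from dataclasses import dataclass, field

STATS = ("HP", "Atk", "Def", "SpA", "SpD", "Spe")


@dataclass
class PokemonSet:
    species: str
    nature: str
    ability: str
    item: str
    moves: tuple
    evs: dict = field(default_factory=dict)
    ivs: dict = field(default_factory=lambda: {stat: 31 for stat in STATS})

    def __post_init__(self):
        # Enforce the limits described in the text above.
        if not 1 <= len(self.moves) <= 4:
            raise ValueError("a Pokémon knows between 1 and 4 moves")
        unknown = (set(self.evs) | set(self.ivs)) - set(STATS)
        if unknown:
            raise ValueError(f"unknown stats: {sorted(unknown)}")
        if any(not 0 <= value <= 252 for value in self.evs.values()):
            raise ValueError("each stat accepts between 0 and 252 EVs")
        if sum(self.evs.values()) > 510:
            raise ValueError("EVs may not exceed 510 in total")
        if any(not 0 <= value <= 31 for value in self.ivs.values()):
            raise ValueError("each IV must be between 0 and 31")

    def token(self):
        """Combine a few fields into one string that could serve as a vocabulary entry."""
        return "|".join((self.species, self.ability, self.item))


# Hypothetical example set, used only to exercise the validation.
pelipper = PokemonSet(
    species="Pelipper",
    nature="Modest",
    ability="Drizzle",
    item="Focus Sash",
    moves=("Hurricane", "Weather Ball", "Tailwind", "Protect"),
    evs={"HP": 252, "SpA": 252, "Spe": 4},
)
print(pelipper.token())  # Pelipper|Drizzle|Focus Sash
```

Two sets with the same species but a different ability or item would then map to different tokens, which is the distinction the current name-only vocabulary cannot express.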
