GROUP ID: 53
MEMBERS: peter.linder@ontariotechu.net

Learning Optimal Team Compositions in Pokemon: Perfect Pokemon Party Pattern Picker

Project Description
Problem: Given a partial team composition, determine the optimal next-pick Pokemon for the party.

Select a party composition to maximize team performance.

There is no pick order, so it must be a supervised classification project.

In competitive Pokemon, teams are composed of 6 unique species, each with its own stats and elemental typing. Tournament usage statistics are available on aggregating sites such as Smogon or pokedata.ovh. The goal is to train a model capable of learning patterns in successful team compositions and recommending the next optimal addition.

Given the sheer volume of stats a Pokemon can have, I'd like to limit the problem's features to species and elemental typing. There are 18 types, and a single Pokemon can have one or two of them. This results in 18 choose 2, or 153, dual-type combinations, plus the 18 single types. I'd similarly like to limit the data to competitively viable Pokemon, excluding obviously irrelevant picks like Magikarp.

I'd ideally like to make a simple web interface for team input, likely using Flask with Flask-CORS.

Tournament Data obtained from: https://labmaus.net/tournaments/6587
The most recent Official VGC Tournament in Toronto

Images were scraped from Porydex with wget (Porydex in turn ripped them directly from the game):
https://www.porydex.com/stats/2025-09/vgc-regulation-j/1760

https://www.pikalytics.com/pokedex/gen9vgc2025reghbo3




(PokemonEncoder + TeamPredictor + wrapper SixSlotModel)

In [None]:
# Imports, Path & version check

from platform import python_version
print(python_version())

import os
os.chdir("C:/Users/Linderwood/Desktop/ML")
print(os.getcwd())

from pathlib import Path

# notebook_dir = Path().resolve()
# print(notebook_dir)

from IPython.display import HTML # To display type images
import pandas as pd
import numpy as np

3.10.6
C:\Users\Linderwood\Desktop\ML


In [None]:
# Loading files

pokedf = pd.read_csv("showdown_pokemon_to_import.csv") # Specific glitched, redundant patterns, and event-only are excluded
typedf = pd.read_csv("types.csv")                      # Contains the 18 base elemental types
pokeTypedf = pd.read_csv("pokemon_types.csv")          # Contains type data for specific pokemon
typeMatchdf = pd.read_csv("type_matchups.csv")         # Contains the elemental combinations' strengths and resistances

#pokedf
#pokeTypedf

# img_path = "types/" # Folder containing the type images
# Pandas settings to display html images in notebook
# pd.set_option("display.max_colwidth", None)
# HTML(typedf.to_html(escape=False))


In [None]:
# Dataframe of all Pokemon and their types

# Remove Pokémon with missing names
pokedf_clean = pokedf.dropna(subset=['name'])

# Merge pokeTypedf with typedf to attach type names
pt = (
    pokeTypedf
    .merge(
        typedf[['id', 'identifier']],
        left_on='type_id',
        right_on='id',
        how='left'
    )
    .rename(columns={'identifier': 'type_name'})
)

# Pivot ids
tid_wide = (
    pt.pivot_table(index='pokemon_id',
                   columns='slot',
                   values='type_id',
                   aggfunc='first')
    .rename(columns={1: 'type_id_1', 2: 'type_id_2'})
    .reset_index()
)
# Pivot names
tname_wide = (
    pt.pivot_table(index='pokemon_id',
                   columns='slot',
                   values='type_name',
                   aggfunc='first')
    .rename(columns={1: 'type_1', 2: 'type_2'})
    .reset_index()
)

#  merge
ptf = (
    pokedf_clean[['pokemon_id', 'name']]
    .merge(tid_wide, on='pokemon_id', how='left')
    .merge(tname_wide, on='pokemon_id', how='left')
)

ptf = ptf[['name', 'type_1', 'type_2', 'pokemon_id','type_id_1', 'type_id_2']]
ptf

Unnamed: 0,name,type_1,type_2,pokemon_id,type_id_1,type_id_2
0,Bulbasaur,grass,poison,1,11.0,3.0
1,Ivysaur,grass,poison,2,11.0,3.0
2,Venusaur,grass,poison,3,11.0,3.0
3,Charmander,fire,,4,9.0,
4,Charmeleon,fire,,5,9.0,
...,...,...,...,...,...,...
1231,Gimmighoul-Roaming,ghost,,10222,7.0,
1232,Ursaluna-Bloodmoon,ground,,10223,4.0,
1233,Ogerpon-Wellspring,grass,water,10224,11.0,10.0
1234,Ogerpon-Hearthflame,grass,fire,10225,11.0,9.0


For each Pokémon:

Get its type_id_1 and (optional) type_id_2

For every attacking_type_id, fetch:

multiplier(type1)

multiplier(type2) if dual-typed

Final multiplier is the product.

That gives you an 18-dim vector per Pokémon:

[ x1, x2, ..., x18 ]


Where each entry is e.g.:

4.0 = double weakness

2.0 = weakness

1.0 = neutral

0.5 = resist

0.25 = double resist

0.0 = immunity
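
As a small worked illustration of the product rule above, the sketch below uses a hand-written four-type slice of the chart for a Grass/Water Pokémon such as Ogerpon-Wellspring. The `CHART` dictionary and the `defense_vector` helper are hypothetical names used only for this example; they are not part of the notebook.

```python
# Hypothetical 4-type slice of the type chart: CHART[attacker][defender] = multiplier.
CHART = {
    "fire":     {"grass": 2.0, "water": 0.5},
    "water":    {"grass": 0.5, "water": 0.5},
    "grass":    {"grass": 0.5, "water": 2.0},
    "electric": {"grass": 0.5, "water": 2.0},
}

def defense_vector(type_1, type_2=None):
    """Return {attacking_type: damage multiplier} for a defender with one or two types."""
    vec = {}
    for attacker, row in CHART.items():
        mult = row[type_1]
        if type_2 is not None:      # dual-typed: multiply both matchups
            mult *= row[type_2]
        vec[attacker] = mult
    return vec

grass_water = defense_vector("grass", "water")
print(grass_water)
```

Here Water comes out as a double resist (0.5 × 0.5 = 0.25), while Fire's 2.0 against Grass is cancelled by its 0.5 against Water.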

In [245]:
tm = typeMatchdf[['attacking_type_id', 'defending_type_id', 'multiplier']]

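# Pivot to an 18 x 18 matrix of type coverage
# Rows = attacking type, columns = defending type
# Note: aggfunc='first' keeps only the first row per (attacker, defender) pair. If
# type_matchups.csv lists a pair more than once (e.g. values from older generations),
# check that the current-generation multiplier is the one kept. In the output below,
# Bug attacking Poison (row 6, column 3) is 2.0, the Generation 1 value; it has been
# 0.5 since Generation 2.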
type_matrix = tm.pivot_table(
    index='attacking_type_id',
    columns='defending_type_id',
    values='multiplier',
    aggfunc='first'
)

type_matrix

# For each Pokemon P, we need to compute its defensive vector:
# defense_vector[t] = M[t, type_1] * ( M[t, type_2] if type_2 exists else 1 )

defending_type_id,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
attacking_type_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
0,1.0,1.0,1.0,1.0,1.0,0.5,1.0,0.0,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,2.0,1.0,0.5,0.5,1.0,2.0,0.5,0.0,2.0,1.0,1.0,1.0,1.0,0.5,2.0,1.0,2.0,0.5
2,1.0,2.0,1.0,1.0,1.0,0.5,2.0,1.0,0.5,1.0,1.0,2.0,0.5,1.0,1.0,1.0,1.0,1.0
3,1.0,1.0,1.0,0.5,0.5,0.5,2.0,0.5,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0
4,1.0,1.0,0.0,2.0,1.0,2.0,0.5,1.0,2.0,2.0,1.0,0.5,2.0,1.0,1.0,1.0,1.0,1.0
5,1.0,0.5,2.0,1.0,0.5,1.0,2.0,1.0,0.5,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0
6,1.0,0.5,0.5,2.0,1.0,1.0,1.0,0.5,0.5,0.5,1.0,2.0,1.0,2.0,1.0,1.0,2.0,0.5
7,0.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,0.5,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.5,1.0
8,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.5,0.5,0.5,1.0,0.5,1.0,2.0,1.0,1.0,2.0
9,1.0,1.0,1.0,1.0,1.0,0.5,2.0,1.0,2.0,0.5,0.5,2.0,1.0,1.0,2.0,0.5,1.0,1.0


In [246]:
# Compute each Pokemon's type defense vector (row-wise loop over ptf)
# Start with an empty list to collect rows
rows = []

for _, row in ptf.iterrows():
    t1 = row['type_id_1']
    t2 = row['type_id_2']
    
    # Defensive multiplier from each attacking type against type_1
    vec1 = type_matrix[t1].values   # shape (18,)
    
    if pd.isna(t2):
        vec = vec1                        # Single-type Pokémon
    else:
        vec2 = type_matrix[int(t2)].values
        vec = vec1 * vec2                 # Multiply effects
    
    rows.append([row['pokemon_id'], row['name'], t1, t2] + list(vec))

# Build dataframe
cols = (
    ['pokemon_id', 'name', 'type_id_1', 'type_id_2'] +
    [f"{typedf.loc[i, 'identifier']}_dmg_taken" for i in type_matrix.index]
)

defense_df = pd.DataFrame(rows, columns=cols)
defense_df = defense_df.set_index('pokemon_id')
defense_df

Unnamed: 0_level_0,name,type_id_1,type_id_2,normal_dmg_taken,fighting_dmg_taken,flying_dmg_taken,poison_dmg_taken,ground_dmg_taken,rock_dmg_taken,bug_dmg_taken,...,steel_dmg_taken,fire_dmg_taken,water_dmg_taken,grass_dmg_taken,electric_dmg_taken,psychic_dmg_taken,ice_dmg_taken,dragon_dmg_taken,dark_dmg_taken,fairy_dmg_taken
pokemon_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,Bulbasaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,1.0,2.0,0.50,0.25,0.5,2.0,2.0,1.0,1.0,0.5
2,Ivysaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,1.0,2.0,0.50,0.25,0.5,2.0,2.0,1.0,1.0,0.5
3,Venusaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,1.0,2.0,0.50,0.25,0.5,2.0,2.0,1.0,1.0,0.5
4,Charmander,9.0,,1.0,1.0,1.0,1.0,2.0,2.0,0.5,...,0.5,0.5,2.00,0.50,1.0,1.0,1.0,1.0,1.0,0.5
5,Charmeleon,9.0,,1.0,1.0,1.0,1.0,2.0,2.0,0.5,...,0.5,0.5,2.00,0.50,1.0,1.0,1.0,1.0,1.0,0.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10222,Gimmighoul-Roaming,7.0,,0.0,0.0,1.0,0.5,1.0,1.0,0.5,...,1.0,1.0,1.00,1.00,1.0,1.0,1.0,1.0,2.0,1.0
10223,Ursaluna-Bloodmoon,4.0,,1.0,1.0,1.0,0.5,1.0,0.5,1.0,...,1.0,1.0,2.00,2.00,0.0,1.0,2.0,1.0,1.0,1.0
10224,Ogerpon-Wellspring,11.0,10.0,1.0,1.0,2.0,2.0,0.5,1.0,2.0,...,0.5,1.0,0.25,1.00,1.0,1.0,1.0,1.0,1.0,1.0
10225,Ogerpon-Hearthflame,11.0,9.0,1.0,1.0,2.0,2.0,1.0,2.0,1.0,...,0.5,1.0,1.00,0.25,0.5,1.0,2.0,1.0,1.0,0.5


Next step is to make binary masks. A strong pokemon team has some resistance coverage for most types. The ideal 6th pokemon will have coverage for types the team is weak to.
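
A toy illustration of the idea, using made-up damage multipliers for three attacking types: a candidate scores one point for each attacking type it resists that no current team member resists. All names and numbers in this sketch are hypothetical.

```python
# Hypothetical damage-taken multipliers per attacking type (not real game data).
TYPES = ["fire", "water", "grass"]
team = {
    "member_a": [2.0, 0.5, 1.0],
    "member_b": [1.0, 0.5, 2.0],
}
candidates = {
    "cand_x": [0.5, 1.0, 0.5],   # resists fire and grass
    "cand_y": [1.0, 0.25, 2.0],  # resists only water
}

def resist_mask(vec):
    """1 where the Pokemon takes less than neutral damage, else 0."""
    return [int(m < 1) for m in vec]

# A type is covered if at least one team member resists it.
team_resist = [max(col) for col in zip(*(resist_mask(v) for v in team.values()))]
uncovered = [1 - r for r in team_resist]

# Score = number of uncovered types the candidate resists.
scores = {
    name: sum(r * u for r, u in zip(resist_mask(vec), uncovered))
    for name, vec in candidates.items()
}
print(team_resist, uncovered, scores)
```

In this toy team only Water is resisted, so a candidate resisting Fire and Grass scores 2 while one that only adds another Water resist scores 0.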

In [247]:
weak_mask = (defense_df.filter(like='_dmg_taken') > 1).astype(int)
weak_mask.columns = [c.replace('_dmg_taken', '_weak') for c in weak_mask.columns]
weak_mask

Unnamed: 0_level_0,normal_weak,fighting_weak,flying_weak,poison_weak,ground_weak,rock_weak,bug_weak,ghost_weak,steel_weak,fire_weak,water_weak,grass_weak,electric_weak,psychic_weak,ice_weak,dragon_weak,dark_weak,fairy_weak
pokemon_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0
2,0,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0
3,0,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0
4,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0
5,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10222,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0
10223,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0
10224,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0
10225,0,0,1,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0


In [248]:
resist_mask = (defense_df.filter(like='_dmg_taken') < 1).astype(int)
resist_mask.columns = [c.replace('_dmg_taken', '_resist') for c in resist_mask.columns]
resist_mask

Unnamed: 0_level_0,normal_resist,fighting_resist,flying_resist,poison_resist,ground_resist,rock_resist,bug_resist,ghost_resist,steel_resist,fire_resist,water_resist,grass_resist,electric_resist,psychic_resist,ice_resist,dragon_resist,dark_resist,fairy_resist
pokemon_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1
2,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1
3,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1
4,0,0,0,0,0,0,1,0,1,1,0,1,0,0,0,0,0,1
5,0,0,0,0,0,0,1,0,1,1,0,1,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10222,1,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0
10223,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0
10224,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0
10225,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,1


In [249]:
mask_df = pd.concat([defense_df, weak_mask, resist_mask], axis=1)
mask_df

Unnamed: 0_level_0,name,type_id_1,type_id_2,normal_dmg_taken,fighting_dmg_taken,flying_dmg_taken,poison_dmg_taken,ground_dmg_taken,rock_dmg_taken,bug_dmg_taken,...,steel_resist,fire_resist,water_resist,grass_resist,electric_resist,psychic_resist,ice_resist,dragon_resist,dark_resist,fairy_resist
pokemon_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,Bulbasaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,0,0,1,1,1,0,0,0,0,1
2,Ivysaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,0,0,1,1,1,0,0,0,0,1
3,Venusaur,11.0,3.0,1.0,0.5,2.0,1.0,1.0,1.0,4.0,...,0,0,1,1,1,0,0,0,0,1
4,Charmander,9.0,,1.0,1.0,1.0,1.0,2.0,2.0,0.5,...,1,1,0,1,0,0,0,0,0,1
5,Charmeleon,9.0,,1.0,1.0,1.0,1.0,2.0,2.0,0.5,...,1,1,0,1,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10222,Gimmighoul-Roaming,7.0,,0.0,0.0,1.0,0.5,1.0,1.0,0.5,...,0,0,0,0,0,0,0,0,0,0
10223,Ursaluna-Bloodmoon,4.0,,1.0,1.0,1.0,0.5,1.0,0.5,1.0,...,0,0,0,0,1,0,0,0,0,0
10224,Ogerpon-Wellspring,11.0,10.0,1.0,1.0,2.0,2.0,0.5,1.0,2.0,...,1,0,1,0,0,0,0,0,0,0
10225,Ogerpon-Hearthflame,11.0,9.0,1.0,1.0,2.0,2.0,1.0,2.0,1.0,...,1,0,0,1,1,0,0,0,0,1


In [None]:
# Sample 5-pokemon team for testing
team_names = [
    "Ursaluna-Bloodmoon",
    "Urshifu-Rapid-Strike",
    "Smeargle",
    "Indeedee-F",
    "Amoonguss"
]

team_df = ptf[ptf['name'].isin(team_names)]
team_df

Unnamed: 0,name,type_1,type_2,pokemon_id,type_id_1,type_id_2
234,Smeargle,normal,,235,0.0,
590,Amoonguss,grass,poison,591,11.0,3.0
1189,Indeedee-F,psychic,normal,10179,13.0,0.0
1195,Urshifu-Rapid-Strike,fighting,water,10185,1.0,10.0
1232,Ursaluna-Bloodmoon,ground,,10223,4.0,


In [251]:
team_ids = team_df['pokemon_id'].tolist()

team_weak = weak_mask.loc[team_ids].max()
team_weak

normal_weak      0
fighting_weak    1
flying_weak      1
poison_weak      0
ground_weak      0
rock_weak        0
bug_weak         1
ghost_weak       0
steel_weak       0
fire_weak        1
water_weak       1
grass_weak       1
electric_weak    1
psychic_weak     1
ice_weak         1
dragon_weak      0
dark_weak        1
fairy_weak       1
dtype: int32

In [252]:
team_resist = resist_mask.loc[team_ids].max()
team_resist

normal_resist      0
fighting_resist    1
flying_resist      0
poison_resist      1
ground_resist      0
rock_resist        1
bug_resist         1
ghost_resist       1
steel_resist       1
fire_resist        1
water_resist       1
grass_resist       1
electric_resist    1
psychic_resist     1
ice_resist         1
dragon_resist      0
dark_resist        1
fairy_resist       1
dtype: int32

In [253]:
uncovered = (team_resist == 0).astype(int)  # 1 for each attacking type that no team member resists
scores = (resist_mask * uncovered).sum(axis=1)
scores

pokemon_id
1        0
2        0
3        0
4        0
5        0
        ..
10222    1
10223    0
10224    1
10225    0
10226    1
Length: 1236, dtype: int64

In [254]:
best_6th = scores.sort_values(ascending=False)

results = (
    #pokedf[['pokemon_id', 'name']]
    defense_df
    .join(scores.rename('score'), on='pokemon_id')
    .sort_values('score', ascending=False)
)

results.head(20)

Unnamed: 0_level_0,name,type_id_1,type_id_2,normal_dmg_taken,fighting_dmg_taken,flying_dmg_taken,poison_dmg_taken,ground_dmg_taken,rock_dmg_taken,bug_dmg_taken,...,fire_dmg_taken,water_dmg_taken,grass_dmg_taken,electric_dmg_taken,psychic_dmg_taken,ice_dmg_taken,dragon_dmg_taken,dark_dmg_taken,fairy_dmg_taken,score
pokemon_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
797,Celesteela,8.0,2.0,0.5,1.0,0.5,0.0,0.0,1.0,0.25,...,2.0,1.0,0.25,2.0,0.5,1.0,0.5,0.5,0.5,4
227,Skarmory,8.0,2.0,0.5,1.0,0.5,0.0,0.0,1.0,0.25,...,2.0,1.0,0.25,2.0,0.5,1.0,0.5,0.5,0.5,4
823,Corviknight,2.0,8.0,0.5,1.0,0.5,0.0,0.0,1.0,0.25,...,2.0,1.0,0.25,2.0,0.5,1.0,0.5,0.5,0.5,4
476,Probopass,5.0,8.0,0.25,4.0,0.25,0.0,4.0,0.5,0.5,...,1.0,2.0,1.0,1.0,0.5,0.5,0.5,0.5,0.5,3
600,Klang,8.0,,0.5,2.0,0.5,0.0,2.0,0.5,0.5,...,2.0,1.0,0.5,1.0,0.5,0.5,0.5,0.5,0.5,3
966,Revavroom,8.0,3.0,0.5,1.0,0.5,0.0,4.0,0.5,1.0,...,2.0,1.0,0.25,1.0,1.0,0.5,0.5,0.5,0.25,3
968,Orthworm,8.0,,0.5,2.0,0.5,0.0,2.0,0.5,0.5,...,2.0,1.0,0.5,1.0,0.5,0.5,0.5,0.5,0.5,3
385,Jirachi,8.0,13.0,0.5,1.0,0.5,0.0,2.0,0.5,1.0,...,2.0,1.0,0.5,1.0,0.25,0.5,0.5,1.0,0.5,3
983,Kingambit,16.0,8.0,0.5,4.0,0.5,0.0,2.0,0.5,1.0,...,2.0,1.0,0.5,1.0,0.0,0.5,0.5,0.25,1.0,3
990,Iron Treads,4.0,8.0,0.5,2.0,0.5,0.0,2.0,0.25,0.5,...,2.0,2.0,1.0,0.0,0.5,1.0,0.5,0.5,0.5,3


Unsurprisingly, Steel is the best defensive type when just looking at coverage.
Time to train a model to give some priority to tournament results over pure coverage.


Parse the tournament JSON and build a training dataset

Build the Pokémon feature vectors
-> add offensive types + tournament stats.

PyTorch Deep Sets model for unordered teams

Train it to predict the missing 6th Pokémon
Compare with coverage-only baseline
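
The "predict the missing 6th Pokémon" step can be framed as a leave-one-out expansion: each team of six yields six (input, target) pairs. A minimal sketch, where `leave_one_out` is a hypothetical helper name and the IDs are those of the first-place team in the tournament data:

```python
def leave_one_out(team_ids):
    """Yield (five_remaining_ids, held_out_id) for every slot in the team."""
    for i, target in enumerate(team_ids):
        yield team_ids[:i] + team_ids[i + 1:], target

team = [591, 1018, 230, 186, 727, 576]
pairs = list(leave_one_out(team))
print(len(pairs))
print(pairs[0])
```

With 735 teams, this expansion gives 735 × 6 = 4410 training samples.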

In [255]:
import json

with open("tournament_data.json", "r") as f:
    tournament_data = json.load(f)

#name_to_csv_id = dict(zip(pokedf["name"].str.lower(), pokedf["pokemon_id"]))
#name_to_csv_id

#type(tournament_data) # dict
#getattr(tournament_data, "keys", lambda: None)() #dict_keys(['composition', 'details', 'items', 'moves', 'pokemon', 'teams', 'tera_types'])

teams = tournament_data["teams"]
#list(teams.keys())[:10]
print("Total teams:", len(teams))

Total teams: 735


In [256]:
# Sample team record:

# {'country': 'us',
#   'id': 125213,
#   'placement': 1,
#   'player': 'Wolfe Glick',
#   'pokemon1': ['Amoonguss'],
#   'pokemon1Id': '591',
#   'pokemon2': ['Archaludon'],
#   'pokemon2Id': '1018',
#   'pokemon3': ['Kingdra'],
#   'pokemon3Id': '230',
#   'pokemon4': ['Politoed'],
#   'pokemon4Id': '186',
#   'pokemon5': ['Incineroar'],
#   'pokemon5Id': '727',
#   'pokemon6': ['Gothitelle'],
#   'pokemon6Id': '576',
#   'pokepaste': 'https://pokepast.es/9e19971b9e5bd817',
#   'record': '14-2'},

# Helper fn to grab the 6 ids

# Some special forms have a different tournament id from the one scraped off Showdown.
# Only 18 special forms were used in the tournament, so they are mapped by hand
# rather than automated.
FORM_ID_OVERRIDES = {
    "901-b": "10223",  # Ursaluna Bloodmoon
    "038-a": "10115",  # Ninetales Alola
    "059-h": "10193",  # Arcanine Hisui
    "549-h": "10202",  # Lilligant Hisui
    "876-f": "10179",  # Indeedee female
    "157-h": "10196",  # Typhlosion Hisui
    "110-g": "10167",  # Weezing Galar
    "128-b": "10214",  # Tauros Blaze
    "128-a": "10215",  # Tauros Aqua
    "571-h": "10205",  # Zoroark Hisui
    "479-w": "10026",  # Rotom Wash
    "101-h": "10195",  # Electrode Hisui
    "724-h": "10210",  # Decidueye Hisui
    "199-g": "10189",  # Slowking Galar
    "503-h": "10201",  # Samurott Hisui
    "706-h": "10208",  # Goodra Hisui
    "076-a": "10122",  # Golem Alola
    "902-f": "10211",  # Basculegion Female
}

def extract_team_ids(team_entry):
    poke_ids = []
    for i in range(1, 7):
        entry = team_entry.get(f"pokemon{i}Id")
        entry = FORM_ID_OVERRIDES.get(entry, entry)  # remap special forms, keep others
        if entry is None:
            return None
        poke_ids.append(int(entry))
    return poke_ids


In [257]:
import random

# Generate some training samples
samples = []

for entry in teams:
    poke_ids = extract_team_ids(entry)
    if poke_ids is None:
        continue
    
    if len(poke_ids)!=6:
        continue
    
    for i in range(6):
        input_team = poke_ids[:i] + poke_ids[i+1:] # Drop the ith pokemon
        target = poke_ids[i]                      # dropped pokemon is the target

        # randNum = random.randrange(0,5)
        # poke_ids.pop(randNum)
        # target = poke_ids[randNum]
        samples.append({
            "input": input_team, #poke_ids,
            "target": target,
            "placement": entry.get("placement", None),
            "country": entry.get("country", None),
            "player": entry.get("player", None),
            "record": entry.get("record", None)
        })

#teams
train_df = pd.DataFrame(samples)
train_df # 4410 samples (735 teams x 6 leave-one-out splits)

Unnamed: 0,input,target,placement,country,player,record
0,"[1018, 230, 186, 727, 576]",591,1,us,Wolfe Glick,14-2
1,"[591, 230, 186, 727, 576]",1018,1,us,Wolfe Glick,14-2
2,"[591, 1018, 186, 727, 576]",230,1,us,Wolfe Glick,14-2
3,"[591, 1018, 230, 727, 576]",186,1,us,Wolfe Glick,14-2
4,"[591, 1018, 230, 186, 576]",727,1,us,Wolfe Glick,14-2
...,...,...,...,...,...,...
4405,"[812, 423, 10205, 727, 10223]",941,735,us,Bryanne Berry,0-4
4406,"[812, 941, 10205, 727, 10223]",423,735,us,Bryanne Berry,0-4
4407,"[812, 941, 423, 727, 10223]",10205,735,us,Bryanne Berry,0-4
4408,"[812, 941, 423, 10205, 10223]",727,735,us,Bryanne Berry,0-4


In [None]:
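# Might not get around to training with winrate
def parse_winrate(record):
    """Convert a "W-L" record string to a win rate in [0, 1]; None if unparseable."""
    if not isinstance(record, str):
        return None
    try:
        w, l = record.split("-")
        w, l = int(w), int(l)
    except ValueError:
        return None
    if w + l == 0:
        return 0.0
    return w / (w + l)

def team_vector(poke_ids, feature_df):
    # feature_df: numeric features indexed by pokemon_id (e.g. the *_dmg_taken columns)
    rows = feature_df.loc[poke_ids]
    return pd.concat([
        rows.sum().add_suffix("_sum"),    # team totals per attacking type
        rows.mean().add_suffix("_mean"),  # average defensive profile
    ])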

# name_to_id = {
#     name.lower(): pid 
#     for pid, name in zip(defense_df.index, defense_df["name"])
# }

In [None]:
# Identify defensive columns
damage_cols = [c for c in defense_df.columns if c.endswith("_dmg_taken")]

# Remove dupes - keep first
defense_df_clean = defense_df.loc[~defense_df.index.duplicated(keep="first")]

#build defense_vectors
defense_vectors = {
    int(pid): defense_df_clean.loc[pid, damage_cols].values.astype(np.float32)
    for pid in defense_df_clean.index
}

# export df - save to csv
export_df = defense_df_clean[["name"] + damage_cols].copy()
export_df.index.name = "pokemon_id"
export_df.to_csv("defense_vectors_export.csv")
print("Saved:", export_df.shape)

# For each pokemon, need a vector of length 18 (18 elemental types)
# damage_cols = [c for c in defense_df.columns if c.endswith("_dmg_taken")]

# defense_vectors = {}

# for pid in defense_df.index.unique():
#     rows = defense_df.loc[pid, damage_cols]

#     # If multiple rows (e.g., shape (2,18)), take the first row
#     if isinstance(rows, pd.DataFrame):
#         row = rows.iloc[0]
#     else:
#         row = rows

#     defense_vectors[int(pid)] = row.values.astype(np.float32)

# # Replacing this -> Some duplicate special forms that had visual changes, but no other differences.
# defense_vectors = {
#     int(pid): defense_df.loc[pid, damage_cols].values.astype(np.float32)
#     for pid in defense_df.index
# }

# mappings
all_pokemon_ids = sorted(defense_vectors.keys())

id_to_index = {pid: i for i, pid in enumerate(all_pokemon_ids)}
index_to_id = {i: pid for pid, i in id_to_index.items()}

num_classes = len(all_pokemon_ids)

# save it
#train_df.to_json("train_dataset.json", orient="records")

Saved: (1232, 19)


In [268]:
# Dataset

import torch
import numpy as np
from torch.utils.data import Dataset

class TeamDataset(Dataset):
    def __init__(self, df, defense_vectors, id_to_index):
        self.df = df.reset_index(drop=True)
        self.defense_vectors = defense_vectors   # dict pid → (18,)
        self.id_to_index = id_to_index

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.loc[idx]

        team_ids = row["input"]        # list of 5 pokemon IDs
        target_pid = row["target"]     # single pokemon ID

        # (5, 18) stacked defense vectors
        x = torch.tensor(
            np.stack([self.defense_vectors[int(pid)] for pid in team_ids]),
            dtype=torch.float32
        )

        y = torch.tensor(
            self.id_to_index[int(target_pid)],
            dtype=torch.long
        )

        # Parse winrate from a "W-L" record; fall back to 0.0 if missing or no games played
        rec = row.get("record", None)
        winrate = 0.0
        if isinstance(rec, str) and "-" in rec:
            w, l = rec.split("-")
            games = float(w) + float(l)
            if games > 0:
                winrate = float(w) / games

        winrate = torch.tensor([winrate], dtype=torch.float32)

        return x, y, winrate

In [269]:
# Build the loaders

from torch.utils.data import DataLoader, random_split

dataset = TeamDataset(train_df, defense_vectors, id_to_index)

train_size = int(0.9 * len(dataset))
val_size   = len(dataset) - train_size

train_ds, val_ds = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
val_loader   = DataLoader(val_ds, batch_size=64, shuffle=False)

import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from tqdm import tqdm
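
One caveat with a sample-level `random_split`: the six leave-one-out samples from a single team share most of their members, so a random split can place some of them in training and others in validation, which may inflate validation accuracy. A possible alternative is to split by team. The sketch below assumes each sample carries a team identifier; the `team_idx` field and the `split_by_team` helper are hypothetical and not part of the notebook.

```python
import random

def split_by_team(samples, val_fraction=0.1, seed=0):
    """Split samples so that all samples from one team land on the same side."""
    team_ids = sorted({s["team_idx"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(team_ids)
    n_val = max(1, int(len(team_ids) * val_fraction))
    val_teams = set(team_ids[:n_val])
    train = [s for s in samples if s["team_idx"] not in val_teams]
    val = [s for s in samples if s["team_idx"] in val_teams]
    return train, val

# Toy data: 20 teams with 6 leave-one-out samples each.
samples = [{"team_idx": t, "slot": i} for t in range(20) for i in range(6)]
train, val = split_by_team(samples)
print(len(train), len(val))
```

The same idea could be applied to `train_df` by grouping on a per-team key and wrapping the resulting row indices with `torch.utils.data.Subset`.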

In [270]:
import torch.nn as nn
import torch.nn.functional as F

class PokemonEncoder(nn.Module):
    def __init__(self, in_dim=18, hidden=256, mlp_layers=2, dropout=0.1):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(mlp_layers):
            layers.append(nn.Linear(dim, hidden))
            layers.append(nn.ReLU(inplace=True))
            layers.append(nn.Dropout(dropout))
            dim = hidden
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):   # x: (B, 5, in_dim)
        B, S, D = x.shape
        x = x.reshape(B*S, D)
        out = self.mlp(x)
        return out.reshape(B, S, -1)


class TeamPredictor(nn.Module):
    def __init__(self, hidden=256, num_classes=1000, dropout=0.1):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes)
        )

    def forward(self, x):
        return self.fc(x)


class SixSlotModel(nn.Module):
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.encoder = PokemonEncoder(in_dim=in_dim, hidden=hidden)
        self.predictor = TeamPredictor(hidden=hidden, num_classes=num_classes)

    # x: (B, 5, dim)
    # wr: (B,1) or none
    def forward(self, x, winrate=None):
        enc = self.encoder(x)           # (B,5,H)
        pooled = enc.sum(dim=1)         # Deep Sets sum pooling     # Should maybe do mean pooling: pooled = enc.mean(dim=1)
        logits = self.predictor(pooled) # (B, num_classes)
        return logits
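
Sum pooling is what makes the prediction independent of the order of the five inputs. The following plain-Python sketch demonstrates that property on its own; it is not the PyTorch model, the weights and inputs are made-up values, and `encode` and `pooled` are hypothetical helpers.

```python
import itertools
import math

def relu(v):
    return [max(0.0, a) for a in v]

def encode(x, weights):
    """Shared per-Pokemon encoder: one linear layer followed by ReLU."""
    return relu([sum(w * a for w, a in zip(row, x)) for row in weights])

def pooled(team, weights):
    """Sum the encoded members elementwise (Deep Sets style sum pooling)."""
    encoded = [encode(x, weights) for x in team]
    return [sum(col) for col in zip(*encoded)]

# Hypothetical 3-dim inputs and a 2x3 weight matrix, chosen only for illustration.
weights = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
team = [[1.0, 0.5, 2.0], [0.25, 1.0, 1.0], [2.0, 2.0, 0.0]]

reference = pooled(team, weights)
# Every ordering of the same team gives the same pooled representation.
for perm in itertools.permutations(team):
    result = pooled(list(perm), weights)
    assert all(math.isclose(a, b) for a, b in zip(result, reference))
print(reference)
```

Mean pooling has the same invariance and only rescales the pooled vector when the team size is fixed at five.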

In [None]:
from collections import defaultdict
import math

def topk_accuracy(logits, targets, ks=(1,5)):
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1, largest=True, sorted=True)  # (B, maxk)
    pred = pred.t()                                                # (maxk, B)
    correct = pred.eq(targets.view(1, -1).expand_as(pred))         # (maxk, B)
    res = {}
    for k in ks:
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res[k] = (correct_k.item() / targets.size(0))
    return res


def train_epoch(model, loader, optimizer, device):
    model.train()
    total_loss = 0.0
    total_samples = 0
    accs = defaultdict(float)

    for batch in loader:
        # dataset returns (x, y, winrate)
        x, y, winrate = batch
        x = x.to(device)            # (B,5,18)
        y = y.to(device)            # (B,)
        # winrate = winrate.to(device)

        optimizer.zero_grad()
        logits = model(x, winrate=None)
        loss = F.cross_entropy(logits, y)
        loss.backward()
        optimizer.step()

        bs = x.size(0)
        total_loss += loss.item() * bs
        total_samples += bs

        # metrics
        tacc = topk_accuracy(logits.detach().cpu(), y.detach().cpu(), ks=(1,5))
        accs[1] += tacc[1] * bs
        accs[5] += tacc[5] * bs

    avg_loss = total_loss / total_samples
    avg_top1 = accs[1] / total_samples
    avg_top5 = accs[5] / total_samples
    return avg_loss, avg_top1, avg_top5


def eval_epoch(model, loader, device):
    model.eval()
    total_loss = 0.0
    total_samples = 0
    accs = defaultdict(float)

    with torch.no_grad():
        for batch in loader:
            x, y, winrate = batch
            x = x.to(device)
            y = y.to(device)
            logits = model(x, winrate=None)
            loss = F.cross_entropy(logits, y)

            bs = x.size(0)
            total_loss += loss.item() * bs
            total_samples += bs

            tacc = topk_accuracy(logits.detach().cpu(), y.detach().cpu(), ks=(1,5))
            accs[1] += tacc[1] * bs
            accs[5] += tacc[5] * bs

    avg_loss = total_loss / total_samples
    avg_top1 = accs[1] / total_samples
    avg_top5 = accs[5] / total_samples
    return avg_loss, avg_top1, avg_top5

# bad = [(pid, vec.shape) for pid, vec in defense_vectors.items() if vec.shape != (18,)]
# bad[:20]
# [(10072, (2, 18)), (10213, (2, 18)), (10214, (2, 18)), (10215, (2, 18))]         # Looks like special forms are still causing problems...

# Floette-Eternal 10072 Floette-Eternal-Flower 10072 Tauros-Paldea 10213 Tauros-Paldea-Combat 10213 Tauros-Paldea-Blaze 10214 Tauros-Paldea-Fire 10214 Tauros-Paldea-Aqua 10215 Tauros-Paldea-Water 10215
# These are actually just visual form changes. The pokemon of the same id have identical stats and resistances. I'll just use the first and drop the second row, so it's not 2-D.

In [272]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

in_dim = 18
hidden = 256
num_classes = len(all_pokemon_ids)
model = SixSlotModel(in_dim=in_dim, hidden=hidden, num_classes=num_classes) # Should consider using winrates. flag like, use_winrate=False
model = model.to(device)

optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

best_val = math.inf
for epoch in range(1, 20):
    train_loss, train_top1, train_top5 = train_epoch(model, train_loader, optimizer, device)
    val_loss, val_top1, val_top5 = eval_epoch(model, val_loader, device)

    print(f"Epoch {epoch}  Train loss {train_loss:.4f}  top1 {train_top1:.4f} top5 {train_top5:.4f}")
    print(f"           Val   loss {val_loss:.4f}  top1 {val_top1:.4f} top5 {val_top5:.4f}")

    if val_loss < best_val:
        best_val = val_loss
        torch.save({
            "model_state": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "id_to_index": id_to_index,
            "index_to_id": index_to_id
        }, "sixslot_best.pt")

Epoch 1  Train loss 4.2895  top1 0.0821 top5 0.2802
           Val   loss 4.0369  top1 0.1270 top5 0.3447
Epoch 2  Train loss 3.6376  top1 0.1587 top5 0.4099
           Val   loss 3.7995  top1 0.1814 top5 0.4467
Epoch 3  Train loss 3.3269  top1 0.2288 top5 0.5044
           Val   loss 3.5840  top1 0.2404 top5 0.4966
Epoch 4  Train loss 3.1052  top1 0.2832 top5 0.5581
           Val   loss 3.4575  top1 0.2562 top5 0.5329
Epoch 5  Train loss 2.8954  top1 0.3167 top5 0.6067
           Val   loss 3.3189  top1 0.3311 top5 0.5646
Epoch 6  Train loss 2.7473  top1 0.3545 top5 0.6316
           Val   loss 3.2655  top1 0.3469 top5 0.6213
Epoch 7  Train loss 2.6341  top1 0.3699 top5 0.6536
           Val   loss 3.1789  top1 0.3197 top5 0.6213
Epoch 8  Train loss 2.5643  top1 0.3741 top5 0.6684
           Val   loss 3.2021  top1 0.3175 top5 0.6259
Epoch 9  Train loss 2.4752  top1 0.3966 top5 0.6896
           Val   loss 3.0723  top1 0.3787 top5 0.6281
Epoch 10  Train loss 2.4017  top1 0.3993 top5 

A top-1 accuracy of around 36% seems reasonable given that there are 1232 candidate classes.
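
For context, here is a quick calculation of what uniform guessing would give over the 1232 classes (the row count reported by the export step). This is only a floor; a baseline that always predicts the most common tournament picks would likely be stronger and is not computed here.

```python
import math

num_classes = 1232                      # rows in defense_vectors_export.csv
uniform_top1 = 1 / num_classes          # chance of guessing the held-out Pokemon
uniform_top5 = 5 / num_classes          # chance it appears in 5 distinct random guesses
uniform_loss = math.log(num_classes)    # cross-entropy of a uniform prediction

print(f"top1 {uniform_top1:.4f}  top5 {uniform_top5:.4f}  loss {uniform_loss:.2f}")
```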

In [None]:
# pokedf - dataframe to map pid->name 
def recommend_sixth(model, team_ids, defense_vectors, index_to_id, pokedf=None, top_k=10, device="cpu"):
    model.eval()
    arr = np.stack([defense_vectors[pid] for pid in team_ids]).astype(np.float32)
    x = torch.tensor(arr).unsqueeze(0).to(device)  # (1,5,18)

    with torch.no_grad():
        logits = model(x)
        probs = F.softmax(logits, dim=1).cpu().numpy().squeeze(0)
        
    top_idx = probs.argsort()[::-1][:top_k]
    results = []
    for idx in top_idx:
        pid = index_to_id[idx]
        name = None
        if pokedf is not None:
            try:
                name = pokedf.loc[pokedf['pokemon_id'] == pid, 'name'].values[0]
            except Exception:
                name = None
        results.append((pid, name, float(probs[idx])))

    return results

In [None]:
# load saved model
ck = torch.load("sixslot_best.pt", map_location=device)
model.load_state_dict(ck["model_state"])
model.to(device)

sample_team5 = train_df.iloc[0]["input"]  # their 5 IDs
top = recommend_sixth(model, sample_team5, defense_vectors, index_to_id, pokedf=pokedf, top_k=10, device=device)
print(sample_team5)
print(top)
# Archaludon, Kingdra, Politoed, Incineroar, Gothitelle
# Recommends
# (979, 'Annihilape', 0.24940809607505798), (903, 'Sneasler', 0.24432924389839172), (279, 'Pelipper', 0.07007797807455063), (10223, 'Ursaluna-Bloodmoon', 0.06189211457967758), (764, 'Comfey', 0.05251350626349449), (186, 'Politoed', 0.04214489459991455)...

[1018, 230, 186, 727, 576]
[(979, 'Annihilape', 0.24940809607505798), (903, 'Sneasler', 0.24432924389839172), (279, 'Pelipper', 0.07007797807455063), (10223, 'Ursaluna-Bloodmoon', 0.06189211457967758), (764, 'Comfey', 0.05251350626349449), (186, 'Politoed', 0.04214489459991455), (934, 'Garganacl', 0.03696926310658455), (547, 'Whimsicott', 0.03405490145087242), (1000, 'Gholdengo', 0.027404852211475372), (576, 'Gothitelle', 0.020749319344758987)]


The model is recommending strong Pokemon that often pair with that core of 5.
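
Note that the list above includes Politoed and Gothitelle, which are already on the input team. Since a team needs six unique species, one option is to drop current members before taking the top-k. A minimal sketch, with `filter_recommendations` as a hypothetical helper operating on the `(pid, name, prob)` tuples that `recommend_sixth` returns:

```python
def filter_recommendations(recs, team_ids, top_k=5):
    """Remove Pokemon already on the team and keep the top_k remaining entries."""
    on_team = set(team_ids)
    return [r for r in recs if r[0] not in on_team][:top_k]

# Probabilities rounded from the output above.
recs = [
    (979, "Annihilape", 0.249),
    (903, "Sneasler", 0.244),
    (279, "Pelipper", 0.070),
    (10223, "Ursaluna-Bloodmoon", 0.062),
    (764, "Comfey", 0.053),
    (186, "Politoed", 0.042),
    (934, "Garganacl", 0.037),
]
team_ids = [1018, 230, 186, 727, 576]

filtered = filter_recommendations(recs, team_ids, top_k=6)
print([name for _, name, _ in filtered])
```

Setting the logits of current team members to negative infinity before the softmax inside `recommend_sixth` would be another way to get the same effect, with the remaining probabilities renormalized.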