# Data Preparation

In [51]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

### Reading in the original dataframe

In [52]:
pokedf = pd.read_csv('pokemon.csv')
pokedf.head()

Unnamed: 0,Dex No,Name,Base Name,Type 1,Type 2,BST,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed
0,1,Bulbasaur,Bulbasaur,GRASS,POISON,318,45,49,49,65,65,45
1,2,Ivysaur,Ivysaur,GRASS,POISON,405,60,62,63,80,80,60
2,3,Venusaur,Venusaur,GRASS,POISON,525,80,82,83,100,100,80
3,3,Mega Venusaur,Venusaur,GRASS,POISON,625,80,100,123,122,120,80
4,4,Charmander,Charmander,FIRE,-,309,39,52,43,60,50,65


Clearly we see that there is a placeholder for the "Type 2 column" for some Pokemon. Let's make sure there are no missing values in the dataframe.

Missing Values (and Pokemon!) Check

In [53]:
pokedf.isna().sum()

Dex No         0
Name           0
Base Name      0
Type 1         0
Type 2         0
BST            0
HP             0
Attack         0
Defense        0
Sp. Attack     0
Sp. Defense    0
Speed          0
dtype: int64

We should check whether the information is up to date, or in other words, how many Pokemon are missing.

In [54]:
pokedf.tail()

Unnamed: 0,Dex No,Name,Base Name,Type 1,Type 2,BST,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed
1189,1006,Iron Valiant,Iron Valiant,FAIRY,FIGHTING,590,74,130,90,120,60,116
1190,1007,Koraidon,Koraidon,FIGHTING,DRAGON,670,100,135,115,85,100,135
1191,1008,Miraidon,Miraidon,ELECTRIC,DRAGON,670,100,85,100,135,115,135
1192,1009,Walking Wake,Walking Wake,WATER,DRAGON,590,99,83,91,125,83,109
1193,1010,Iron Leaves,Iron Leaves,GRASS,PSYCHIC,590,90,130,88,70,108,104


The dataset is missing a couple pokemon, which normally wouldn't be too much of an issue, but a disproportionate amount of the Pokemon belong to special classes, so I will add them.

### Adding New Pokemon

In [55]:

# new Pokémon information
new_pokedata = {
    'Dex No': [1011, 1012, 1013, 1014, 1015, 1016, 1017],
    'Name': ["Dipplin", "Poltchageist", "Sinistcha", "Okidogi", "Munkidori", "Fezandipiti", "Ogerpon"],
    'Base Name': ["Dipplin", "Poltchageist", "Sinistcha", "Okidogi", "Munkidori", "Fezandipiti", "Ogerpon"],
    'Type 1': ["Grass", "Grass", "Grass", "Poison", "Poison", "Poison", "Grass"],
    'Type 2': ["Dragon", "Ghost", "Ghost", "Fight", "Psychic", "Fairy", "-"],
    'BST': [485, 308, 508, 555, 555, 555, 550],
    'HP': [80, 40, 71, 88, 88, 88, 80],
    'Attack': [80, 45, 60, 128, 75, 91, 120],
    'Defense': [110, 45, 106, 115, 66, 82, 84],
    'Sp. Attack': [95, 74, 121, 58, 130, 70, 60],
    'Sp. Defense': [80, 54, 80, 86, 90, 125, 96],
    'Speed': [40, 50, 70, 80, 106, 99, 110]
}

# store new data in another dataframe
new_pokedf = pd.DataFrame(new_pokedata)

# append the new pokemon dataframe to the main one
pokedf = pd.concat([pokedf, new_pokedf], ignore_index=True)

# check
pokedf.tail(9)


Unnamed: 0,Dex No,Name,Base Name,Type 1,Type 2,BST,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed
1192,1009,Walking Wake,Walking Wake,WATER,DRAGON,590,99,83,91,125,83,109
1193,1010,Iron Leaves,Iron Leaves,GRASS,PSYCHIC,590,90,130,88,70,108,104
1194,1011,Dipplin,Dipplin,Grass,Dragon,485,80,80,110,95,80,40
1195,1012,Poltchageist,Poltchageist,Grass,Ghost,308,40,45,45,74,54,50
1196,1013,Sinistcha,Sinistcha,Grass,Ghost,508,71,60,106,121,80,70
1197,1014,Okidogi,Okidogi,Poison,Fight,555,88,128,115,58,86,80
1198,1015,Munkidori,Munkidori,Poison,Psychic,555,88,75,66,130,90,106
1199,1016,Fezandipiti,Fezandipiti,Poison,Fairy,555,88,91,82,70,125,99
1200,1017,Ogerpon,Ogerpon,Grass,-,550,80,120,84,60,96,110


### Extra Columns and Data

##### Generation \#

As later we want to do some analysis on the average power by generation, lets add in a column for what generation the pokemon are from. We take advantage of online sources to get the cutoffs for each generation, and account for the fact that alternate forms exist, so we base the cutoffs off of Dex No, not number of rows.

In [56]:
# corresponding Pokedex numbers of the LAST pokemon in each generation 1-9
cutoffs = [151, 251, 386, 493, 649, 721, 809, 905, 1017]

# function to determine the generation based on pokedex #, aka the 'Dex No' column
def get_generation(row):
    for i, cutoff in enumerate(cutoffs):
        if row['Dex No'] <= cutoff:
            return i + 1  # increment gens
    return 9

# Create a new column 'Generation' based on the 'Dex No' and cutoffs
pokedf['Generation'] = pokedf.apply(get_generation, axis=1)


Certain forms of earlier Pokemon are added in later than the Pokemon was originally released. For the purposes of this analysis, those forms will be considered a part of the generation that they were added in, rather than the generation of the original Pokemon. This is not arbitrary, but instead because generation # is used as a chronological measurement of Pokemon in this analysis.

In [57]:
# All mega forms were added in generation 6, so they will be treated as generation 6 Pokemon
# This will be a notable difference because mega forms tend to be stronger than their base forms
pokedf.loc[pokedf['Base Name'].str.contains("Mega "), 'Generation'] = 6

# Tauros alternate forms added in gen 9
pokedf.loc[pokedf['Base Name'].str.contains(" Breed"), 'Generation'] = 9

##### Special Classes

We also want to add columns denoting whether a pokemon is legendary, mythical, or what I will define as "Notable". Notable pokemon will be any pokemon that are special in some way but do not fit under the banner of mythical or legendary. This includes Ultra Beasts in from generation 7 and Paradox Pokemon from generation 9.

Note: Some people consider Ultra Beasts to be "sub legendaries", but for the purposes of this analysis I will put them under the "notable" column since the debate is still out. Pseudo legendaries will be counted, because they are arguably "notable", even though there is exactly one per generation.

In [58]:
# let's first add a legendary, mythical, and notable boolean columns and fill them with false

pokedf['Legendary'] = False
pokedf['Mythical'] = False
pokedf['Notable'] = False

# manually created strings with all the names
legendary_names = ["Articuno", "Zapdos", "Moltres", "Raikou", "Entei", "Suicune", "Regirock", "Regice", "Registeel", "Latias", "Latios", "Uxie", "Mespirit", "Azelf", "Heatran", "Regigigas", "Cresselia", "Cobalion", "Terrakion", "Virizion", "Tornadus", "Thundurus", "Landorus", "Type: Null", "Silvally", "Tapu Koko", "Tapu Lele", "Tapu Bulu", "Tapu Fini", "Kubfu", "Urshifu", "Regieleki", "Regidrago", "Glastrier", "Spectrier", "Enamorus", "Wo-Chien", "Chien-Pao", "Ting-Lu", "Chi-Yu", "Okidogi", "Munkidori", "Fezandipiti", "Ogerpon", "Mewtwo", "Lugia", "Ho-Oh", "Kyogre", "Groudon", "Rayquaza", "Dialga", "Palkia", "Giratina", "Reshiram", "Zekrom", "Kyurem", "Xerneas", "Yveltal", "Zygarde", "Cosmog", "Cosmoem", "Solgaleo", "Lunala", "Necrozma", "Zacian", "Zamazenta", "Eternatus", "Calyrex", "Koraidon", "Miraidon"]
mythical_names = ["Mew", "Celebi", "Jirachi", "Deoxys", "Phione", "Manaphy", "Darkrai", "Shaymin", "Arceus", "Victini", "Keldeo", "Meloetta", "Genesect", "Diancie", "Hoopa", "Volcanion", "Magearna", "Marshadow", "Zeraora", "Meltan", "Melmetal", "Zarude"]
notable_names = ["Dragonite", "Tyranitar", "Salamence", "Metagross", "Garchomp", "Hydreigon", "Goodra", "Kommo-o", "Dragapult", "Baxcalibur", "Nihilego", "Buzzwole", "Pheromosa", "Xurkitree", "Celesteela", "Kartana", "Guzzlord", "Poipole", "Stakataka", "Blacephalon", "Great Tusk", "Scream Tail", "Brute Bonnet", "Flutter Mane", "Slither Wing", "Sandy Shocks", "Iron Treads", "Iron Bundle", "Iron Hands", "Iron Jugulis", "Iron Moth", "Iron Thorns", "Roaring Moon", "Iron Valiant", "Walking Wake", "Iron Leaves"]

# set 'Legendary' column to True for legendary Pokémon
pokedf.loc[pokedf['Base Name'].isin(legendary_names), 'Legendary'] = True

# set 'Mythical' column to True for mythical Pokémon
pokedf.loc[pokedf['Base Name'].isin(mythical_names), 'Mythical'] = True

# set 'Notable' column to True for notable Pokémon
pokedf.loc[pokedf['Base Name'].isin(notable_names), 'Notable'] = True


In [59]:
pokedf.head(963)

Unnamed: 0,Dex No,Name,Base Name,Type 1,Type 2,BST,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Generation,Legendary,Mythical,Notable
0,1,Bulbasaur,Bulbasaur,GRASS,POISON,318,45,49,49,65,65,45,1,False,False,False
1,2,Ivysaur,Ivysaur,GRASS,POISON,405,60,62,63,80,80,60,1,False,False,False
2,3,Venusaur,Venusaur,GRASS,POISON,525,80,82,83,100,100,80,1,False,False,False
3,3,Mega Venusaur,Venusaur,GRASS,POISON,625,80,100,123,122,120,80,1,False,False,False
4,4,Charmander,Charmander,FIRE,-,309,39,52,43,60,50,65,1,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
958,800,Necrozma,Necrozma,PSYCHIC,-,600,97,107,101,127,89,79,7,True,False,False
959,800,Dusk Mane Necrozma,Necrozma,PSYCHIC,STEEL,680,97,157,127,113,109,77,7,True,False,False
960,800,Dawn Wings Necrozma,Necrozma,PSYCHIC,GHOST,680,97,113,109,157,127,77,7,True,False,False
961,800,Ultra Necrozma,Necrozma,PSYCHIC,DRAGON,754,97,167,97,167,97,129,7,True,False,False
