### Prepping Data Challenge: Pokémon Evolution Stats (Week 8)

This week's challenge dives into the Pokédex to explore how Pokémon combat stats change when they evolve.

### Requirements
- Import the data (Excel file)
- From the pkmn_stats dataset, remove the columns height, weight and evolves_from
- Pivot (wide to long) pkmn_stats so that hp, attack, defense, special_attack, special_defense, and speed become a column called 'combat_factors'
- Using the evolutions data look up the combat_factors for each Pokémon at each stage, making sure that the combat_factors match across the row, i.e. we should be able to see the hp for Bulbasaur, Ivysaur and Venusaur on one row
- Remove any columns for 'pokedex_number' and 'gen_introduced' that were from joins at Stage 2 & 3
- If a Pokémon doesn't evolve remove it from the dataset
- Find the combat power values relating to the Pokémon's last evolution stage
- Sum together each Pokémon's combat_factors
- Find the percentage increase in combat power between the first and last evolution stages
- Sort the dataset, ascending by percentage increase
- Output the data
- Which Pokémon's stats decrease when it evolves?

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Input the data.
with pd.ExcelFile('wk8-Input.xlsx') as xlsx:
    stats = pd.read_excel(xlsx, 'pkmn_stats')
    evol = pd.read_excel(xlsx, 'pkmn_evolutions')

In [3]:
stats.sample(n=10, random_state = 40)

Unnamed: 0,name,pokedex_number,gen_introduced,hp,attack,defense,special_attack,special_defense,speed,height,weight,evolves_from
802,Poipole,803,7,67,73,67,73,67,73,6,18,
888,Zamazenta,889,8,92,130,115,80,115,138,29,2100,
104,Marowak,105,1,60,80,110,50,80,45,10,450,Cubone
472,Mamoswine,473,4,110,130,80,70,60,80,25,2910,Piloswine
829,Eldegoss,830,8,60,50,90,80,120,60,5,25,Gossifleur
524,Boldore,525,5,70,105,105,50,40,20,9,1020,Roggenrola
635,Larvesta,636,5,55,85,55,50,55,60,11,288,
345,Cradily,346,3,86,81,97,81,107,43,15,604,Lileep
493,Victini,494,5,100,100,100,100,100,100,4,40,
423,Ambipom,424,4,75,100,66,60,66,115,12,203,Aipom


In [4]:
# From the pkmn_stats dataset, remove the columns height, weight and evolves_from
stats.drop(columns=['height','weight',"evolves_from"], inplace=True)

In [5]:
#Pivot (wide to long) pkmn stats so that hp, attack, defense, special_attack, special_defense, 
#and speed become a column called 'combat_factors'
stats_pivot = pd.melt(stats, id_vars=['name','pokedex_number','gen_introduced'], var_name='combat_factors')
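
The wide-to-long step above can be sketched on a two-row toy frame. This is only an illustration: `toy`, `stat_cols` and `toy_long` are hypothetical names, and the rows are copied from the `stats.sample()` output above. Passing `value_vars` explicitly is an optional safeguard (not what the notebook does) so that only the six stat columns are melted.

```python
import pandas as pd

# Two rows copied from the stats.sample() output above (Marowak, Victini)
toy = pd.DataFrame({
    "name": ["Marowak", "Victini"],
    "pokedex_number": [105, 494],
    "gen_introduced": [1, 5],
    "hp": [60, 100],
    "attack": [80, 100],
    "defense": [110, 100],
    "special_attack": [50, 100],
    "special_defense": [80, 100],
    "speed": [45, 100],
})

stat_cols = ["hp", "attack", "defense", "special_attack", "special_defense", "speed"]

# Each stat column becomes one row per Pokémon, labelled in 'combat_factors'
toy_long = toy.melt(
    id_vars=["name", "pokedex_number", "gen_introduced"],
    value_vars=stat_cols,
    var_name="combat_factors",
    value_name="value",
)

print(toy_long.shape)  # 2 Pokémon x 6 stats = 12 rows, 5 columns
```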

In [6]:
# Sum together each Pokémon's combat_factors.
# transform('sum') repeats the total on every row of that Pokémon.
stats_pivot['combat_power'] = stats_pivot.groupby('name')['value'].transform('sum')
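
As a sanity check on the grouped sum, the same totals can be computed row-wise on the wide table. A minimal sketch, assuming each name identifies exactly one Pokémon; `toy`, `toy_long` and `wide_total` are hypothetical names and the stat values come from the sample output above.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Marowak", "Victini"],
    "hp": [60, 100], "attack": [80, 100], "defense": [110, 100],
    "special_attack": [50, 100], "special_defense": [80, 100], "speed": [45, 100],
})
toy_long = toy.melt(id_vars=["name"], var_name="combat_factors")

# Long table: the group total is broadcast back onto every row of the group
toy_long["combat_power"] = toy_long.groupby("name")["value"].transform("sum")

# Wide table: add the six stat columns across each row instead
wide_total = toy.set_index("name").sum(axis=1)

# Both routes give Marowak 425 and Victini 600
print(wide_total)
```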

In [7]:
stats_pivot.head()

Unnamed: 0,name,pokedex_number,gen_introduced,combat_factors,value,combat_power
0,Bulbasaur,1,1,hp,45,318
1,Ivysaur,2,1,hp,60,405
2,Venusaur,3,1,hp,80,525
3,Charmander,4,1,hp,39,309
4,Charmeleon,5,1,hp,58,405


In [8]:
power_dict = dict(zip(stats_pivot['name'], stats_pivot["combat_power"]))
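
Because `stats_pivot` holds six rows per Pokémon, `dict(zip(...))` sees every name six times and simply overwrites the entry with the same total. A sketch of an equivalent lookup that deduplicates first; `long_toy`, `by_zip` and `by_index` are hypothetical names, and the totals are the ones shown in the `stats_pivot.head()` output above.

```python
import pandas as pd

# Miniature of stats_pivot: one row per (name, stat), totals repeated
long_toy = pd.DataFrame({
    "name": ["Bulbasaur", "Bulbasaur", "Ivysaur", "Ivysaur"],
    "combat_factors": ["hp", "attack", "hp", "attack"],
    "combat_power": [318, 318, 405, 405],
})

# The notebook's approach: later duplicates overwrite earlier ones
by_zip = dict(zip(long_toy["name"], long_toy["combat_power"]))

# Alternative: keep one row per name, then turn it into a dict
by_index = (
    long_toy.drop_duplicates("name")
    .set_index("name")["combat_power"]
    .to_dict()
)
```

Both dictionaries hold the same name-to-total pairs; the second form just makes the one-row-per-Pokémon assumption explicit.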

In [9]:
evol.head()

Unnamed: 0,Stage_1,Stage_2,Stage_3
0,Bulbasaur,Ivysaur,Venusaur
1,Charmander,Charmeleon,Charizard
2,Squirtle,Wartortle,Blastoise
3,Caterpie,Metapod,Butterfree
4,Weedle,Kakuna,Beedrill


In [10]:
evol.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493 entries, 0 to 492
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Stage_1  493 non-null    object
 1   Stage_2  330 non-null    object
 2   Stage_3  110 non-null    object
dtypes: object(3)
memory usage: 11.7+ KB


In [11]:
#If a Pokémon doesn't evolve remove it from the dataset
evol.dropna(subset=["Stage_2"], inplace=True)

In [12]:
evol.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 330 entries, 0 to 485
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Stage_1  330 non-null    object
 1   Stage_2  330 non-null    object
 2   Stage_3  110 non-null    object
dtypes: object(3)
memory usage: 10.3+ KB


In [13]:
#Using the evolutions data look up the combat_factors for each Pokémon at each stage, 
#making sure that the combat_factors match across the row, 
#i.e. we should be able to see the hp for Bulbasaur, Ivysaur and Venusaur on one row
df = (
    pd.merge(evol, stats_pivot, how='left', left_on='Stage_1', right_on='name')
    .drop(['name', 'combat_factors', 'value'], axis=1)
    .rename(columns={'combat_power': 'initial_combat_power'})
)

In [14]:
df.drop_duplicates(inplace=True)
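
The duplicates arise because the merge runs against the long table, so every evolution row matches six stat rows (hence the index jumping 0, 6, 12 in the output). A possible alternative, sketched here with hypothetical names (`evol_toy`, `lookup`, `merged`), is to merge against a one-row-per-Pokémon lookup and let `validate="many_to_one"` raise if the right-hand keys are not unique. The Scyther values are taken from the notebook's final output.

```python
import pandas as pd

# Scyther has two evolution rows in the data (Scizor and Kleavor)
evol_toy = pd.DataFrame({
    "Stage_1": ["Scyther", "Scyther"],
    "Stage_2": ["Scizor", "Kleavor"],
    "Stage_3": [None, None],
})

# One row per Pokémon, so no duplicates are created by the join
lookup = pd.DataFrame({
    "name": ["Scyther"],
    "pokedex_number": [123],
    "gen_introduced": [1],
    "combat_power": [500],
})

merged = (
    evol_toy.merge(
        lookup,
        how="left",
        left_on="Stage_1",
        right_on="name",
        validate="many_to_one",  # errors if 'name' repeats in lookup
    )
    .drop(columns="name")
    .rename(columns={"combat_power": "initial_combat_power"})
)
```

Both evolution rows survive, which also keeps branching evolutions such as Scyther's intact without relying on `drop_duplicates`.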

In [15]:
df.head(10)

Unnamed: 0,Stage_1,Stage_2,Stage_3,pokedex_number,gen_introduced,initial_combat_power
0,Bulbasaur,Ivysaur,Venusaur,1,1,318
6,Charmander,Charmeleon,Charizard,4,1,309
12,Squirtle,Wartortle,Blastoise,7,1,314
18,Caterpie,Metapod,Butterfree,10,1,195
24,Weedle,Kakuna,Beedrill,13,1,195
30,Pidgey,Pidgeotto,Pidgeot,16,1,251
36,Rattata,Raticate,,19,1,253
42,Spearow,Fearow,,21,1,262
48,Ekans,Arbok,,23,1,288
54,Pichu,Pikachu,Raichu,172,2,205


In [16]:
#Find the combat power values relating to the Pokémon's last evolution stage
df['stage2evol'] = df["Stage_2"].map(power_dict)
df['stage3evol'] = df["Stage_3"].map(power_dict)
df['final_combat_power'] = np.where(pd.isnull(df["Stage_3"]), df['stage2evol'], df['stage3evol'])
df.drop(columns=['stage2evol', 'stage3evol'], inplace=True)
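
The same "last stage" lookup can be sketched without helper columns by mapping `Stage_3` and falling back to `Stage_2` where there is no third stage. This is an alternative formulation, not the notebook's method; `power` and `chains` are hypothetical names, with totals taken from the notebook's outputs.

```python
import pandas as pd

power = {
    "Bulbasaur": 318, "Ivysaur": 405, "Venusaur": 525,
    "Nincada": 266, "Shedinja": 236,
}
chains = pd.DataFrame({
    "Stage_1": ["Bulbasaur", "Nincada"],
    "Stage_2": ["Ivysaur", "Shedinja"],
    "Stage_3": ["Venusaur", None],
})

# Stage_3 total where it exists, otherwise the Stage_2 total.
# The missing Stage_3 maps to NaN, which makes the column float;
# astype restores integers once every gap has been filled.
chains["final_combat_power"] = (
    chains["Stage_3"].map(power)
    .fillna(chains["Stage_2"].map(power))
    .astype("int64")
)
```

The NaN from the missing `Stage_3` is also why `final_combat_power` appears as a float (for example `236.0`) in the notebook's output.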

In [17]:
# Find the percentage increase in combat power from the first & last evolution stage.
# Stored as a fraction (0.13 means 13%); multiply by 100 for a percentage.
df["combat_power_increase"] = (df["final_combat_power"]-df["initial_combat_power"]) / df["initial_combat_power"]
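
A quick worked example of the formula, using the Nincada and Shedinja totals (266 and 236) that appear in the notebook's final output. The variable names are illustrative only.

```python
initial, final = 266, 236  # Nincada -> Shedinja

# Relative change: (final - initial) / initial
increase = (final - initial) / initial

# Same figure expressed as a percentage, rounded to two decimals
percent = round(increase * 100, 2)

print(round(increase, 6), percent)
```

The result is negative because Shedinja's total is lower than Nincada's: roughly -0.1128 as a fraction, or about -11.28%.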

In [18]:
#Sort the dataset, ascending by percentage increase
df.sort_values("combat_power_increase", inplace=True)

In [19]:
df = df[["Stage_1","Stage_2","Stage_3","pokedex_number","gen_introduced","initial_combat_power","final_combat_power","combat_power_increase"]]

In [20]:
df.head(10)

Unnamed: 0,Stage_1,Stage_2,Stage_3,pokedex_number,gen_introduced,initial_combat_power,final_combat_power,combat_power_increase
810,Nincada,Shedinja,,290,3,266,236.0,-0.112782
372,Scyther,Scizor,,123,1,500,500.0,0.0
378,Scyther,Kleavor,,123,1,500,500.0,0.0
1782,Type: Null,Silvally,,772,7,534,570.0,0.067416
696,Stantler,Wyrdeer,,234,2,465,525.0,0.129032
600,Misdreavus,Mismagius,,200,2,435,495.0,0.137931
1296,Basculin,Basculegion,,550,5,460,530.0,0.152174
630,Qwilfish,Overqwil,,211,2,440,510.0,0.159091
618,Gligar,Gliscor,,207,2,430,510.0,0.186047
636,Sneasel,Weavile,,215,2,430,510.0,0.186047
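
To answer the last requirement, the rows with a negative increase can be filtered out. The sketch below rebuilds the first four rows of the output above in a hypothetical frame called `result`. Since the table is sorted ascending and the second row is already 0.0, Nincada evolving into Shedinja appears to be the only chain whose total drops.

```python
import pandas as pd

# First rows of the sorted result, copied from the output above
result = pd.DataFrame({
    "Stage_1": ["Nincada", "Scyther", "Scyther", "Type: Null"],
    "Stage_2": ["Shedinja", "Scizor", "Kleavor", "Silvally"],
    "initial_combat_power": [266, 500, 500, 534],
    "final_combat_power": [236.0, 500.0, 500.0, 570.0],
})
result["combat_power_increase"] = (
    result["final_combat_power"] - result["initial_combat_power"]
) / result["initial_combat_power"]

# Keep only the chains whose combat power went down
decreased = result[result["combat_power_increase"] < 0]
print(decreased[["Stage_1", "Stage_2", "combat_power_increase"]])
```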


In [21]:
#output the dataset
df.to_csv('wk8-output.csv', index=False)