# Problem Introduction

As a fan of the pokemon franchise who grew up in the '90s, I have spent many sleepless nights wondering what it would be like to live in the pokemon universe. Most of those thoughts centered on what team I would assemble to accompany my 10-year-old self on a personal pokemon adventure: with which companions would I attempt to win the Pokemon Master title? Using Python and the pandas library, I will try to put together the strongest team that would give my 10-year-old self a shot at the title while maintaining some sense of realism.

## Questions to answer with dataset

1: What is the strongest non-legendary pokemon to have for each stat category?

2: On average what is the strongest pokemon type for each stat category?

3: On average, which generation has the strongest pokemon? (Generations loosely correspond to regions, e.g. Gen 1 = Kanto, Gen 2 = Johto.)

4: 

5: Based on the answers to the previous questions, what is the strongest possible combination of 6 non-legendary pokemon to bring on my pokemon adventure?

- Construct a shortlist of the strongest pokemon from the previous answers.
    - Prioritize pokemon that appear in the answers more often as they can fill multiple roles and will have better efficiency across different situations.
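
The prioritization idea above can be sketched by counting how often each name shows up across the per-stat top-5 lists. This is only a rough illustration: the `top5` dictionary and the "more than one appearance" cut-off are assumptions of the sketch, and the names are copied from the top-5 tables produced under Question 1 in this notebook.

```python
from collections import Counter

# Top-5 names per stat, copied by hand from the Question 1 tables in this notebook
top5 = {
    'HP': ['Blissey', 'Chansey', 'Wobbuffet', 'Wailord', 'Alomomola'],
    'Attack': ['Rampardos', 'Slaking', 'AegislashBlade Forme', 'Haxorus', 'Conkeldurr'],
    'Defense': ['Shuckle', 'Steelix', 'Avalugg', 'Cloyster', 'Aggron'],
    'Sp. Atk': ['AegislashBlade Forme', 'Chandelure', 'DarmanitanZen Mode', 'Alakazam', 'Porygon-Z'],
    'Sp. Def': ['Shuckle', 'Florges', 'AegislashShield Forme', 'Goodra', 'Probopass'],
    'Speed': ['Ninjask', 'Accelgor', 'Electrode', 'Crobat', 'Aerodactyl'],
}

# Count appearances of each name across all stat categories
counts = Counter(name for names in top5.values() for name in names)

# Keep names that fill more than one role (the threshold is an arbitrary choice)
multi_role = sorted(name for name, n in counts.items() if n > 1)
print(multi_role)
```

Note that the two Aegislash formes are counted as separate names here, because that is how the dataset lists them.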

## Import Libraries and Data

In [27]:
import pandas as pd
import matplotlib.pyplot as plt
import re
import os

In [20]:
pk = pd.read_csv('pokemon_stats.csv')
pk.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


## Data Preprocessing

### Replace all NaN values in the data

In [21]:
pk['Type 2'] = pk['Type 2'].fillna('[None]')

### Ensure the NaN values are now gone from the data

In [22]:
print(pk.isna().sum())

#             0
Name          0
Type 1        0
Type 2        0
Total         0
HP            0
Attack        0
Defense       0
Sp. Atk       0
Sp. Def       0
Speed         0
Generation    0
Legendary     0
dtype: int64


### To save memory, convert the type columns to the category dtype

In [23]:
cast_columns = ['Type 1', 'Type 2']
pk[cast_columns] = pk[cast_columns].astype('category')
pk.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   #           800 non-null    int64   
 1   Name        800 non-null    object  
 2   Type 1      800 non-null    category
 3   Type 2      800 non-null    category
 4   Total       800 non-null    int64   
 5   HP          800 non-null    int64   
 6   Attack      800 non-null    int64   
 7   Defense     800 non-null    int64   
 8   Sp. Atk     800 non-null    int64   
 9   Sp. Def     800 non-null    int64   
 10  Speed       800 non-null    int64   
 11  Generation  800 non-null    int64   
 12  Legendary   800 non-null    bool    
dtypes: bool(1), category(2), int64(9), object(1)
memory usage: 66.3+ KB
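
To see where the saving comes from, here is a small sketch that compares the memory footprint of a repetitive text column before and after the conversion. The `types` Series is made-up illustrative data, not the real dataset, and `deep=True` is used so that the string contents are included in the measurement.

```python
import pandas as pd

# Illustrative column: a few type names repeated many times (not the real data)
types = pd.Series(['Grass', 'Fire', 'Water'] * 1000)

# deep=True also counts the memory held by the strings themselves
as_text = types.memory_usage(deep=True)
as_category = types.astype('category').memory_usage(deep=True)

print(f'text:     {as_text} bytes')
print(f'category: {as_category} bytes')
```

A category column stores each distinct label once plus a small integer code per row, so the benefit grows with the number of repeated values.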


### DISCLAIMER : PERSONAL PREFERENCE

### Exclude all legendary pokemon and mega evolutions

This is a personal choice: I want to be as realistic as possible, and I do not think a legendary pokemon would ever listen to or take direction from 10-year-old me. If you disagree, you can skip this section of code and bend the gods of time (Dialga) and space (Palkia) to your 10-year-old will. I also do not recognize mega evolutions as anything more than a gameplay mechanic, so I will be excluding them from my fantasy.

In [31]:
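pk = pk[~pk['Legendary']]
# Match 'Mega ' with the trailing space (assumes names like 'VenusaurMega Venusaur') so Meganium is kept
# .copy() makes pk an independent frame, so adding columns later does not raise SettingWithCopyWarning
pk = pk[~pk['Name'].str.contains('Mega ', regex=False)].copy()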
pk.head(10)

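|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | [None] | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire | [None] | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water | [None] | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |
| 10 | 8 | Wartortle | Water | [None] | 405 | 59 | 63 | 80 | 65 | 80 | 58 | 1 | False |
| 11 | 9 | Blastoise | Water | [None] | 530 | 79 | 83 | 100 | 85 | 105 | 78 | 1 | False |
| 13 | 10 | Caterpie | Bug | [None] | 195 | 45 | 30 | 35 | 20 | 20 | 45 | 1 | False |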
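
One thing worth checking with a substring filter on names is whether any ordinary pokemon name happens to contain the same substring. The sketch below uses a tiny hand-made `names` Series (an assumption for illustration, following the `VenusaurMega Venusaur` naming pattern that mega forms appear to use in this dataset) to compare matching on `'Mega'` against matching on `'Mega '` with a trailing space.

```python
import pandas as pd

# Hand-made sample; the mega-form naming pattern is assumed, not read from the CSV
names = pd.Series(['Venusaur', 'VenusaurMega Venusaur', 'Meganium', 'Charizard'])

# Plain 'Mega' also matches Meganium, an ordinary Generation 2 pokemon
kept_loose = names[~names.str.contains('Mega', regex=False)].tolist()

# 'Mega ' (with the space) only matches the mega-form naming pattern
kept_strict = names[~names.str.contains('Mega ', regex=False)].tolist()

print(kept_loose)
print(kept_strict)
```

The stricter pattern keeps Meganium in the pool while still dropping the mega form.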


### Add the region column

I am aware that pokemon from earlier generations may still appear in other regions. However, I am mapping each pokemon to the generation it was introduced in, because that is the region where it is predominant.

In [36]:
gen_to_reg = {
    1: 'Kanto', 
    2: 'Jhoto',
    3: 'Hoenn',
    4: 'Sinnoh',
    5: 'Unova',
    6: 'Kalos',
    7: 'Alola'
}

pk['Region'] = pk['Generation'].map(gen_to_reg)
pk

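|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | Kanto |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | Kanto |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | Kanto |
| 4 | 4 | Charmander | Fire | [None] | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | Kanto |
| 5 | 5 | Charmeleon | Fire | [None] | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False | Kanto |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 787 | 711 | GourgeistSuper Size | Ghost | Grass | 494 | 85 | 100 | 122 | 58 | 75 | 54 | 6 | False | Kalos |
| 788 | 712 | Bergmite | Ice | [None] | 304 | 55 | 69 | 85 | 32 | 35 | 28 | 6 | False | Kalos |
| 789 | 713 | Avalugg | Ice | [None] | 514 | 95 | 117 | 184 | 44 | 46 | 28 | 6 | False | Kalos |
| 790 | 714 | Noibat | Flying | Dragon | 245 | 40 | 30 | 35 | 45 | 40 | 55 | 6 | False | Kalos |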
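
Because `Series.map` with a dictionary returns `NaN` for any key that is missing from the dictionary, a quick check for unmapped generations can be useful. The sketch below is a minimal illustration: the shortened `regions_by_gen` mapping and the generation value `8` are made up for the example.

```python
import pandas as pd

# Shortened, illustrative mapping (not the full dictionary used in the notebook)
regions_by_gen = {1: 'Kanto', 3: 'Hoenn', 6: 'Kalos'}

# 8 is a made-up generation with no entry in the mapping
generations = pd.Series([1, 3, 6, 8])
regions = generations.map(regions_by_gen)

# Any generation that failed to map shows up as NaN in the result
unmapped = generations[regions.isna()].unique().tolist()
print(unmapped)
```

An empty list here would confirm that every generation in the data has a region.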


## Question 1: What is the strongest non-legendary pokemon to have for each stat category?

Taking the top 5 in each category

In [44]:
hp = pk.sort_values(by='HP', ascending=False).head()
hp

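|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 261 | 242 | Blissey | Normal | [None] | 540 | 255 | 10 | 10 | 75 | 135 | 55 | 2 | False | Jhoto |
| 121 | 113 | Chansey | Normal | [None] | 450 | 250 | 5 | 5 | 35 | 105 | 50 | 1 | False | Kanto |
| 217 | 202 | Wobbuffet | Psychic | [None] | 405 | 190 | 33 | 58 | 33 | 58 | 33 | 2 | False | Jhoto |
| 351 | 321 | Wailord | Water | [None] | 500 | 170 | 90 | 45 | 90 | 45 | 60 | 3 | False | Hoenn |
| 655 | 594 | Alomomola | Water | [None] | 470 | 165 | 75 | 80 | 40 | 45 | 65 | 5 | False | Unova |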


In [45]:
atk = pk.sort_values(by='Attack', ascending=False).head()
atk

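|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 454 | 409 | Rampardos | Rock | [None] | 495 | 97 | 165 | 60 | 65 | 50 | 58 | 4 | False | Sinnoh |
| 313 | 289 | Slaking | Normal | [None] | 670 | 150 | 160 | 100 | 95 | 65 | 100 | 3 | False | Hoenn |
| 750 | 681 | AegislashBlade Forme | Steel | Ghost | 520 | 60 | 150 | 50 | 150 | 50 | 60 | 6 | False | Kalos |
| 673 | 612 | Haxorus | Dragon | [None] | 540 | 76 | 147 | 90 | 60 | 70 | 97 | 5 | False | Unova |
| 594 | 534 | Conkeldurr | Fighting | [None] | 505 | 105 | 140 | 95 | 55 | 65 | 45 | 5 | False | Unova |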


In [46]:
defense = pk.sort_values(by='Defense', ascending=False).head()
defense

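|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False | Jhoto |
| 223 | 208 | Steelix | Steel | Ground | 510 | 75 | 85 | 200 | 55 | 65 | 30 | 2 | False | Jhoto |
| 789 | 713 | Avalugg | Ice | [None] | 514 | 95 | 117 | 184 | 44 | 46 | 28 | 6 | False | Kalos |
| 98 | 91 | Cloyster | Water | Ice | 525 | 50 | 95 | 180 | 85 | 45 | 70 | 1 | False | Kanto |
| 332 | 306 | Aggron | Steel | Rock | 530 | 70 | 110 | 180 | 60 | 60 | 50 | 3 | False | Hoenn |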


In [47]:
spatk = pk.sort_values(by='Sp. Atk', ascending=False).head()
spatk

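|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 750 | 681 | AegislashBlade Forme | Steel | Ghost | 520 | 60 | 150 | 50 | 150 | 50 | 60 | 6 | False | Kalos |
| 670 | 609 | Chandelure | Ghost | Fire | 520 | 60 | 55 | 90 | 145 | 90 | 80 | 5 | False | Unova |
| 616 | 555 | DarmanitanZen Mode | Fire | Psychic | 540 | 105 | 30 | 105 | 140 | 105 | 55 | 5 | False | Unova |
| 70 | 65 | Alakazam | Psychic | [None] | 500 | 55 | 50 | 45 | 135 | 95 | 120 | 1 | False | Kanto |
| 525 | 474 | Porygon-Z | Normal | [None] | 535 | 85 | 80 | 70 | 135 | 75 | 90 | 4 | False | Sinnoh |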


In [48]:
spdef = pk.sort_values(by='Sp. Def', ascending=False).head()
spdef

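|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False | Jhoto |
| 739 | 671 | Florges | Fairy | [None] | 552 | 78 | 65 | 68 | 112 | 154 | 75 | 6 | False | Kalos |
| 751 | 681 | AegislashShield Forme | Steel | Ghost | 520 | 60 | 50 | 150 | 50 | 150 | 60 | 6 | False | Kalos |
| 776 | 706 | Goodra | Dragon | [None] | 600 | 90 | 100 | 70 | 110 | 150 | 80 | 6 | False | Kalos |
| 528 | 476 | Probopass | Rock | Steel | 525 | 60 | 55 | 145 | 75 | 150 | 40 | 4 | False | Sinnoh |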


In [49]:
spd = pk.sort_values(by='Speed', ascending=False).head()
spd

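|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 315 | 291 | Ninjask | Bug | Flying | 456 | 61 | 90 | 45 | 50 | 50 | 160 | 3 | False | Hoenn |
| 678 | 617 | Accelgor | Bug | [None] | 495 | 80 | 70 | 40 | 100 | 60 | 145 | 5 | False | Unova |
| 109 | 101 | Electrode | Electric | [None] | 480 | 60 | 50 | 70 | 80 | 80 | 140 | 1 | False | Kanto |
| 183 | 169 | Crobat | Poison | Flying | 535 | 85 | 90 | 80 | 70 | 80 | 130 | 2 | False | Jhoto |
| 153 | 142 | Aerodactyl | Rock | Flying | 515 | 80 | 105 | 65 | 60 | 75 | 130 | 1 | False | Kanto |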
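
The six cells above repeat the same sort-and-head pattern, so the same result could also be gathered in one loop. The following is a minimal sketch using `DataFrame.nlargest` on a five-row `sample` frame whose values are copied from the tables above; the frame, the `stats` list and the top-2 cut-off are choices made for the illustration only.

```python
import pandas as pd

# Five rows copied from the tables above, just enough to exercise the loop
sample = pd.DataFrame({
    'Name': ['Blissey', 'Rampardos', 'Shuckle', 'Ninjask', 'Slaking'],
    'HP': [255, 97, 20, 61, 150],
    'Attack': [10, 165, 10, 90, 160],
    'Defense': [10, 60, 230, 45, 100],
    'Sp. Atk': [75, 65, 10, 50, 95],
    'Sp. Def': [135, 50, 230, 50, 65],
    'Speed': [55, 58, 5, 160, 100],
})

stats = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# For each stat, keep the names of the two highest-scoring rows
top_by_stat = {stat: sample.nlargest(2, stat)['Name'].tolist() for stat in stats}

for stat, names in top_by_stat.items():
    print(f'{stat}: {names}')
```

On the full data, replacing `sample` with `pk` and `2` with `5` should give the same names as the individual cells, although the order of tied rows may differ between `nlargest` and `sort_values`.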


#### Concerns about results

Just because a pokemon scores the highest in a specific category does not mean it is the strongest competitor overall. Therefore, when applying this answer to the 5th question, I will also take the total stat into account. For example, in the attack results Slaking has an attack of 160 against Rampardos's 165, a negligible difference, but a far higher total (670 versus 495). Most people would therefore take Slaking over Rampardos, despite Rampardos having the higher attack stat.
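
One way to make "negligible difference" concrete is to keep every candidate within some margin of the category leader and then prefer the highest total. The sketch below applies that idea to the attack top 5 from the table above; the 5% margin and the helper column `share_of_best` are arbitrary assumptions for illustration, not a rule used elsewhere in this notebook.

```python
import pandas as pd

# Attack top 5, with values copied from the Attack table above
top_attack = pd.DataFrame({
    'Name': ['Rampardos', 'Slaking', 'AegislashBlade Forme', 'Haxorus', 'Conkeldurr'],
    'Attack': [165, 160, 150, 147, 140],
    'Total': [495, 670, 520, 540, 505],
})

# Each candidate's attack as a share of the best attack in the list
top_attack['share_of_best'] = top_attack['Attack'] / top_attack['Attack'].max()

# Treat anything within 5% of the leader as "close enough" (assumed margin)
close = top_attack[top_attack['share_of_best'] >= 0.95]

# Among the close candidates, prefer the highest total stat
pick = close.sort_values('Total', ascending=False).iloc[0]['Name']
print(pick)
```

With these numbers only Rampardos and Slaking fall within the margin, and the higher total decides between them.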

#### **Answer**

**Health Points:**        Blissey

**Attack:**               Rampardos

**Defense:**              Shuckle

**Special Attack:**       Aegislash

**Special Defense:**      Shuckle

**Speed:**                Ninjask

## Question 2: On average what is the strongest primary pokemon type for each stat category?

In [55]:
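# numeric_only=True averages only the numeric columns; the ranking by HP is done by sort_values
pk.groupby('Type 1', observed=True).mean(numeric_only=True).round(2).sort_values(by='HP', ascending=False).head()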

| Type 1 | # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 317.48 | 387.84 | 75.95 | 70.86 | 56.9 | 53.71 | 61.01 | 69.41 | 3.04 | 0.0 |
| Fairy | 432.88 | 396.5 | 70.88 | 57.19 | 63.88 | 75.25 | 83.88 | 45.44 | 4.0 | 0.0 |
| Ground | 333.75 | 405.71 | 70.82 | 88.0 | 79.82 | 47.75 | 59.57 | 59.75 | 3.04 | 0.0 |
| Ice | 441.95 | 412.52 | 70.38 | 71.0 | 68.29 | 73.62 | 67.9 | 61.33 | 3.71 | 0.0 |
| Water | 307.33 | 412.19 | 70.31 | 70.26 | 69.77 | 70.57 | 66.63 | 64.65 | 2.9 | 0.0 |