# Pokemon Cleaning Walkthrough

The previous version of this notebook used a JSON file that turned out to be missing a considerable amount of data, so this notebook uses a new dataset taken from [here](https://www.kaggle.com/rounakbanik/pokemon).

Also, the last notebook mostly used base Python plus matplotlib to accomplish the tasks. I'm currently learning pandas, so let's approach this data with those tools.

Let's load in the data and see what it looks like.

## Acquire Data

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)

In [2]:
pokemon = pd.read_csv('pokemon.csv')
pokemon.head()

Unnamed: 0,abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water,attack,base_egg_steps,base_happiness,base_total,capture_rate,classfication,defense,experience_growth,height_m,hp,japanese_name,name,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
0,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5,49,5120,70,318,45,Seed Pokémon,49,1059860,0.7,45,Fushigidaneフシギダネ,Bulbasaur,88.1,1,65,65,45,grass,poison,6.9,1,0
1,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5,62,5120,70,405,45,Seed Pokémon,63,1059860,1.0,60,Fushigisouフシギソウ,Ivysaur,88.1,2,80,80,60,grass,poison,13.0,1,0
2,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5,100,5120,70,625,45,Seed Pokémon,123,1059860,2.0,80,Fushigibanaフシギバナ,Venusaur,88.1,3,122,120,80,grass,poison,100.0,1,0
3,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,0.5,2.0,0.5,1.0,1.0,1.0,2.0,0.5,2.0,52,5120,70,309,45,Lizard Pokémon,43,1059860,0.6,39,Hitokageヒトカゲ,Charmander,88.1,4,60,50,65,fire,,8.5,1,0
4,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,0.5,2.0,0.5,1.0,1.0,1.0,2.0,0.5,2.0,64,5120,70,405,45,Flame Pokémon,58,1059860,1.1,58,Lizardoリザード,Charmeleon,88.1,5,80,65,80,fire,,19.0,1,0


## Cleaning

There is a lot of information to dig through here, so let's get summary statistics for the data. We are going to break these into two parts: object columns first, then numeric columns. First, let's look at the objects.

In [3]:
pokemon.describe(include=object)

Unnamed: 0,abilities,capture_rate,classfication,japanese_name,name,type1,type2
count,801,801,801,801,801,801,417
unique,482,34,588,801,801,18,18
top,['Levitate'],45,Dragon Pokémon,Churineチュリネ,Watchog,water,flying
freq,29,250,8,1,1,114,95


There are a few things to notice here. The first is something that we will have to fix: `capture_rate` is stored as an object, but its values appear to be integers, so we are going to convert it during cleaning. It would also help to know what capture rate is, and whether it has a unit, so we know how to handle it.
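A column that looks numeric but comes back as `object` usually contains at least one string that can't be parsed. Here is a minimal sketch of one way to find such values, using a toy Series in place of the real column. The malformed entry is made up for illustration and is not copied from the dataset.

```python
import pandas as pd

# Toy stand-in for the capture_rate column; the last value is a made-up
# malformed entry, not a row copied from the dataset.
capture_rate = pd.Series(['45', '255', '3', '30 (example)'], name='capture_rate')

# errors='coerce' turns anything that cannot be parsed as a number into NaN
as_numbers = pd.to_numeric(capture_rate, errors='coerce')

# Keep only the original strings that failed to parse
bad_values = capture_rate[as_numbers.isna()]
print(bad_values)
```

Running the same check on `pokemon['capture_rate']` should list the index labels of any rows that need fixing before the column can be converted.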

### Capture Rate

According to the dataset description, capture rate is "Capture Rate of the Pokemon". This isn't very helpful, so I went to the actual source of the data and found [this](https://www.serebii.net/games/capture.shtml) entry about what capture rate is. It is a value given to each Pokemon that determines how hard it is to catch. Capture rate can range anywhere from 0 to 255, and the higher it is, the easier the Pokemon is to catch. If the number is 255, then the catch is guaranteed. Otherwise, an equation is run that takes a lot of different factors into account along with the capture rate. So we can use this value as a rarity or difficulty value.
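To illustrate using capture rate as a difficulty value, here is a small sketch that buckets it into labels with `pd.cut`. The four capture rates are taken from the tables in this notebook, but the bin edges and the label names are hypothetical choices for illustration. The sketch also assumes the column has already been converted to integers.

```python
import pandas as pd

# Capture rates as they appear in this notebook's output tables
rates = pd.Series(
    [45, 25, 255, 3],
    index=['Bulbasaur', 'Celesteela', 'Kartana', 'Necrozma'],
    name='capture_rate',
)

# Hypothetical cut points: (0, 30] hard, (30, 100] medium, (100, 255] easy
difficulty = pd.cut(rates, bins=[0, 30, 100, 255], labels=['hard', 'medium', 'easy'])
print(difficulty)
```

Different cut points would move Pokemon between buckets, so these labels only make sense relative to the edges chosen here.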

### Types

Something else to notice is that water is the most common first type. What makes this interesting is that there are 18 different types, yet water accounts for 114 of the 801 Pokemon, which is over 1/8th of the total.

Also, we can see that a little under half of the Pokemon (384 of 801) don't have a second type.
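These proportions can also be computed directly rather than read off the summary table. The sketch below uses only the five rows shown by `pokemon.head()`, so the printed shares describe that small sample and not the full dataset.

```python
import pandas as pd

# Types of the five Pokemon shown by pokemon.head()
types = pd.DataFrame({
    'type1': ['grass', 'grass', 'grass', 'fire', 'fire'],
    'type2': ['poison', 'poison', 'poison', None, None],
})

# Share of each first type (normalize=True gives fractions instead of counts)
type1_share = types['type1'].value_counts(normalize=True)

# Fraction of rows with no second type
missing_type2 = types['type2'].isna().mean()

print(type1_share)
print(missing_type2)
```

Applied to the full `pokemon` frame, the same two calls should reproduce the figures above: about 14% for water and about 48% with no second type.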

Next, let's take a look at the numeric stats.

In [4]:
# 'number' selects every numeric dtype, regardless of integer or float width
pokemon.describe(include='number')

Unnamed: 0,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water,attack,base_egg_steps,base_happiness,base_total,defense,experience_growth,height_m,hp,percentage_male,pokedex_number,sp_attack,sp_defense,speed,weight_kg,generation,is_legendary
count,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,781.0,801.0,703.0,801.0,801.0,801.0,801.0,781.0,801.0,801.0
mean,0.996255,1.057116,0.968789,1.07397,1.068976,1.065543,1.135456,1.192884,0.985019,1.03402,1.098002,1.208177,0.887016,0.975343,1.005306,1.250312,0.983458,1.058365,77.857678,7191.011236,65.362047,428.377029,73.008739,1054996.0,1.163892,68.958801,55.155761,401.0,71.305868,70.911361,66.334582,61.378105,3.690387,0.087391
std,0.597248,0.438142,0.353058,0.654962,0.522167,0.717251,0.691853,0.604488,0.558256,0.788896,0.738818,0.735356,0.266106,0.549375,0.495183,0.697148,0.500117,0.606562,32.15882,6558.220422,19.598948,119.203577,30.769159,160255.8,1.080326,26.576015,20.261623,231.373075,32.353826,27.942501,28.907662,109.354766,1.93042,0.282583
min,0.25,0.25,0.0,0.0,0.25,0.0,0.25,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.25,0.25,0.25,5.0,1280.0,0.0,180.0,5.0,600000.0,0.1,1.0,0.0,1.0,10.0,20.0,5.0,0.1,1.0,0.0
25%,0.5,1.0,1.0,0.5,1.0,0.5,0.5,1.0,1.0,0.5,1.0,0.5,1.0,0.5,1.0,1.0,0.5,0.5,55.0,5120.0,70.0,320.0,50.0,1000000.0,0.6,50.0,50.0,201.0,45.0,50.0,45.0,9.0,2.0,0.0
50%,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,75.0,5120.0,70.0,435.0,70.0,1000000.0,1.0,65.0,50.0,401.0,65.0,66.0,65.0,27.3,4.0,0.0
75%,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,100.0,6400.0,70.0,505.0,90.0,1059860.0,1.5,80.0,50.0,601.0,91.0,90.0,85.0,64.8,5.0,0.0
max,4.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,1.0,4.0,4.0,4.0,4.0,4.0,185.0,30720.0,140.0,780.0,230.0,1640000.0,14.5,255.0,100.0,801.0,194.0,230.0,180.0,999.9,7.0,1.0
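In the summary above, `height_m` and `weight_kg` have a count of 781 and `percentage_male` has a count of 703, against 801 for every other column, so those columns contain missing values. A minimal sketch of listing missing counts per column is shown below, using a few rows from this notebook's tables as sample data.

```python
import numpy as np
import pandas as pd

# A few rows taken from the tables in this notebook
sample = pd.DataFrame({
    'name': ['Bulbasaur', 'Charmander', 'Celesteela', 'Kartana'],
    'height_m': [0.7, 0.6, 9.2, 0.3],
    'percentage_male': [88.1, 88.1, np.nan, np.nan],
})

# Count missing values per column and keep only the columns that have any
missing = sample.isna().sum()
missing = missing[missing > 0]
print(missing)
```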


### Too much data

We can see here that there are a lot of `against_*` columns. We need to get rid of those because they aren't going to help us get what we want out of this analysis. We also won't be using the `abilities`, `classfication` (spelled that way in the dataset), or `japanese_name` columns for the questions we want to answer.

## Cleaning Steps

### Eliminate columns

In [5]:
pokemon = pokemon.drop(columns=['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'classfication', 'japanese_name'])
pokemon

Unnamed: 0,attack,base_egg_steps,base_happiness,base_total,capture_rate,defense,experience_growth,height_m,hp,name,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
0,49,5120,70,318,45,49,1059860,0.7,45,Bulbasaur,88.1,1,65,65,45,grass,poison,6.9,1,0
1,62,5120,70,405,45,63,1059860,1.0,60,Ivysaur,88.1,2,80,80,60,grass,poison,13.0,1,0
2,100,5120,70,625,45,123,1059860,2.0,80,Venusaur,88.1,3,122,120,80,grass,poison,100.0,1,0
3,52,5120,70,309,45,43,1059860,0.6,39,Charmander,88.1,4,60,50,65,fire,,8.5,1,0
4,64,5120,70,405,45,58,1059860,1.1,58,Charmeleon,88.1,5,80,65,80,fire,,19.0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
796,101,30720,0,570,25,103,1250000,9.2,97,Celesteela,,797,107,101,61,steel,flying,999.9,7,1
797,181,30720,0,570,255,131,1250000,0.3,59,Kartana,,798,59,31,109,grass,steel,0.1,7,1
798,101,30720,0,570,15,53,1250000,5.5,223,Guzzlord,,799,97,53,43,dark,dragon,888.0,7,1
799,107,30720,0,600,3,101,1250000,2.4,97,Necrozma,,800,127,89,79,psychic,,230.0,7,1
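As an alternative to typing out every column name, the `against_*` columns can be collected by prefix. This is only a sketch on a one-row toy frame that reuses a handful of the real column names. It is not the cell that produced the output above.

```python
import pandas as pd

# One-row toy frame with a handful of the dataset's column names
sample = pd.DataFrame({
    'name': ['Bulbasaur'],
    'abilities': ["['Overgrow', 'Chlorophyll']"],
    'against_bug': [1.0],
    'against_fire': [2.0],
    'classfication': ['Seed Pokémon'],
    'japanese_name': ['Fushigidaneフシギダネ'],
    'attack': [49],
})

# Collect every column whose name starts with 'against_'
against_cols = [col for col in sample.columns if col.startswith('against_')]
other_cols = ['abilities', 'classfication', 'japanese_name']

trimmed = sample.drop(columns=against_cols + other_cols)
print(trimmed.columns.tolist())
```

Selecting by prefix keeps the cell short and avoids typos in the 18 type names, at the cost of being less explicit about exactly which columns are dropped.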


### Fix the one value that prevents converting capture rate to int

In [6]:
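# Row 773 holds a capture_rate value that can't be parsed as an integer; overwrite it with '30'
pokemon.at[773, 'capture_rate'] = '30'
# Every value is now numeric text, so the column can be converted to int
pokemon['capture_rate'] = pokemon['capture_rate'].astype(int)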
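Hard-coding a row label works for a single known problem, but a pattern-based approach avoids depending on the index. The sketch below keeps the leading run of digits from each string. It assumes the first number in a malformed entry is the one to keep, and the malformed value is made up for illustration.

```python
import pandas as pd

# Toy column; '30 (example)' is a made-up malformed entry
capture_rate = pd.Series(['45', '255', '30 (example)'], name='capture_rate')

# Extract the leading digits of each string, then convert to int
leading_digits = capture_rate.str.extract(r'^(\d+)', expand=False)
capture_rate_int = leading_digits.astype(int)
print(capture_rate_int.tolist())
```

If any string did not start with a digit, `str.extract` would return NaN for it and the `astype(int)` call would raise, which is a useful signal that the assumption does not hold.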

This will include 