# Machine Learning 01 - Exploratory Data Analysis of all Pokemon
In this notebook, we analyze data on all 1,028 Pokemon entries in the dataset, from their names to their stats, to answer questions like:
1. Which Pokemon is the best, stat-wise, in each generation?
2. How does being a dual-type Pokemon affect stats?
3. Do legendary and mythical Pokemon have higher stats than the others? And if so, by how much?
4. Does being a starter Pokemon affect anything?
5. By how much do mega-evolved Pokemon stats improve?


## Creating and Loading an Environment in Python
We need a virtual environment to hold the Python packages used in this notebook. To create and activate one, run the following commands in a terminal:

`python -m venv .venv`      (where `.venv` is the name of the environment; the leading `.` is part of the name, so include it)

`.venv\Scripts\activate`    (on Windows, where `.venv` is whatever you named your environment; on macOS or Linux, run `source .venv/bin/activate` instead)

To deactivate your virtual environment, just type `deactivate` into the terminal.
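
To confirm from inside Python that the virtual environment is active, a small check like the one below can help. It is a minimal sketch that relies only on the standard library: inside a venv, `sys.prefix` points at the environment, while `sys.base_prefix` points at the installation the environment was created from. The helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```

If this prints `False` after activation, the terminal is most likely still using the system-wide interpreter.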

## Downloading the necessary Packages
In your terminal, run the commands `pip install -U pip` and `pip install -r requirements.txt`. This will download all the packages used in the data analysis that follows.

## Loading the Data and Necessary Libraries

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

pd.set_option("display.max_columns", None)  # There are a lot of columns, so remove the display limit
# Read the given data from https://www.kaggle.com/datasets/mariotormo/complete-pokemon-dataset-updated-090420?select=pokedex_%28Update_05.20%29.csv
df = pd.read_csv("../data/pokedex_(Update_05.20).csv")

### Initial Data Exploration
Let's drop the columns we don't need and take a look at the first rows of our Pokemon dataset.

In [2]:
# Drop columns that are not needed for this analysis
df = df.drop(columns=[
    'Unnamed: 0', 'german_name', 'japanese_name', 'height_m', 'weight_kg',
    'base_friendship', 'base_experience', 'egg_type_number', 'egg_type_1',
    'egg_type_2', 'percentage_male', 'egg_cycles',
])
df.head(n=190)

Unnamed: 0,pokedex_number,name,generation,status,species,type_number,type_1,type_2,abilities_number,ability_1,ability_2,ability_hidden,total_points,hp,attack,defense,sp_attack,sp_defense,speed,catch_rate,growth_rate,against_normal,against_fire,against_water,against_electric,against_grass,against_ice,against_fight,against_poison,against_ground,against_flying,against_psychic,against_bug,against_rock,against_ghost,against_dragon,against_dark,against_steel,against_fairy
0,1,Bulbasaur,1,Normal,Seed Pokémon,2,Grass,Poison,2,Overgrow,,Chlorophyll,318.0,45.0,49.0,49.0,65.0,65.0,45.0,45.0,Medium Slow,1.0,2.0,0.5,0.5,0.25,2.0,0.5,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5
1,2,Ivysaur,1,Normal,Seed Pokémon,2,Grass,Poison,2,Overgrow,,Chlorophyll,405.0,60.0,62.0,63.0,80.0,80.0,60.0,45.0,Medium Slow,1.0,2.0,0.5,0.5,0.25,2.0,0.5,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5
2,3,Venusaur,1,Normal,Seed Pokémon,2,Grass,Poison,2,Overgrow,,Chlorophyll,525.0,80.0,82.0,83.0,100.0,100.0,80.0,45.0,Medium Slow,1.0,2.0,0.5,0.5,0.25,2.0,0.5,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5
3,3,Mega Venusaur,1,Normal,Seed Pokémon,2,Grass,Poison,1,Thick Fat,,,625.0,80.0,100.0,123.0,122.0,120.0,80.0,45.0,Medium Slow,1.0,1.0,0.5,0.5,0.25,1.0,0.5,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5
4,4,Charmander,1,Normal,Lizard Pokémon,1,Fire,,2,Blaze,,Solar Power,309.0,39.0,52.0,43.0,60.0,50.0,65.0,45.0,Medium Slow,1.0,0.5,2.0,1.0,0.50,0.5,1.0,1.0,2.0,1.0,1.0,0.5,2.0,1.0,1.0,1.0,0.5,0.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
185,147,Dratini,1,Normal,Dragon Pokémon,1,Dragon,,2,Shed Skin,,Marvel Scale,300.0,41.0,64.0,45.0,50.0,50.0,50.0,45.0,Slow,1.0,0.5,0.5,0.5,0.50,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0
186,148,Dragonair,1,Normal,Dragon Pokémon,1,Dragon,,2,Shed Skin,,Marvel Scale,420.0,61.0,84.0,65.0,70.0,70.0,70.0,45.0,Slow,1.0,0.5,0.5,0.5,0.50,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0
187,149,Dragonite,1,Normal,Dragon Pokémon,2,Dragon,Flying,2,Inner Focus,,Multiscale,600.0,91.0,134.0,95.0,100.0,100.0,80.0,45.0,Slow,1.0,0.5,0.5,1.0,0.25,4.0,0.5,1.0,0.0,1.0,1.0,0.5,2.0,1.0,2.0,1.0,1.0,2.0
188,150,Mewtwo,1,Legendary,Genetic Pokémon,1,Psychic,,2,Pressure,,Unnerve,680.0,106.0,110.0,90.0,154.0,90.0,130.0,3.0,Slow,1.0,1.0,1.0,1.0,1.00,1.0,0.5,1.0,1.0,1.0,0.5,2.0,1.0,2.0,1.0,2.0,1.0,1.0
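
One thing to keep in mind about the cell above: `DataFrame.drop` raises a `KeyError` when a listed column is not present, which is what happens if the cell is run a second time in the same session. A minimal sketch of a re-runnable variant follows, using pandas' `errors="ignore"` option on a small toy DataFrame (the toy table and the `unused` list are assumptions for illustration, not the notebook's data).

```python
import pandas as pd

# Toy stand-in for the Pokemon table (built only for illustration)
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander"],
    "german_name": ["Bisasam", "Glumanda"],
    "hp": [45, 39],
})

unused = ["german_name", "japanese_name"]  # "japanese_name" is absent from toy

# errors="ignore" skips labels that are not present instead of raising KeyError
toy = toy.drop(columns=unused, errors="ignore")
toy = toy.drop(columns=unused, errors="ignore")  # safe to run again

print(list(toy.columns))
```

The trade-off is that a misspelled column name is silently skipped too, so this option fits best once the column list is known to be correct.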


## Creating a Schema Table for our Data

We build a schema table to better understand the data we are working with: each column's data type, its percentage of missing values, its number of unique values, and an example value.

In [3]:
def schema_table(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_%": (df.isna().mean()*100).round(2),
        "n_unique": df.nunique(),
        "example": df.apply(lambda s: s.dropna().iloc[0] if s.dropna().size else None)
    })
    return out.sort_index()

schema = schema_table(df)
schema

Unnamed: 0,dtype,missing_%,n_unique,example
abilities_number,int64,0.0,4,2
ability_1,object,0.29,202,Overgrow
ability_2,object,50.1,126,Tangled Feet
ability_hidden,object,21.21,154,Chlorophyll
against_bug,float64,0.0,7,1.0
against_dark,float64,0.0,7,1.0
against_dragon,float64,0.0,4,1.0
against_electric,float64,0.0,7,0.5
against_fairy,float64,0.0,6,0.5
against_fight,float64,0.0,7,0.5
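
As a usage sketch, a schema table like this can be filtered to show only the columns that contain missing values. The toy DataFrame below is invented for illustration, and the schema is rebuilt inline with the same `missing_%` and `n_unique` columns so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Toy data with some missing second types (for illustration only)
toy = pd.DataFrame({
    "type_1": ["Grass", "Fire", "Water", "Dragon"],
    "type_2": ["Poison", np.nan, np.nan, "Flying"],
    "hp": [45, 39, 44, 91],
})

# Same idea as schema_table, reduced to three columns
schema_toy = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing_%": (toy.isna().mean() * 100).round(2),
    "n_unique": toy.nunique(),
})

# Keep only columns with at least one missing value, most incomplete first
with_missing = schema_toy[schema_toy["missing_%"] > 0].sort_values(
    "missing_%", ascending=False
)
print(with_missing)
```

Applied to the real table, the same filter would surface columns such as `ability_2` and `ability_hidden`, which the output above reports as partly missing.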


This block of code helps us understand the size and cleanliness of the data by printing the number of rows and columns and how many exact duplicate rows there are.

In [4]:
n_rows, n_cols = df.shape
dup_exact = df.duplicated().sum()

print(f"Rows: {n_rows:,} | Columns: {n_cols}")
print(f"Exact duplicate rows: {dup_exact}")


Rows: 1,028 | Columns: 39
Exact duplicate rows: 0
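
To see which columns pandas treats as numeric, `select_dtypes` can be used. Here is a minimal sketch on an invented toy DataFrame; the column names mirror the dataset, but the table itself is only for illustration.

```python
import pandas as pd

# Toy table mixing text and numeric columns (for illustration only)
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Mewtwo"],
    "status": ["Normal", "Legendary"],
    "total_points": [318.0, 680.0],
    "generation": [1, 1],
})

# include="number" selects integer and float columns
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(f"Numeric columns ({len(numeric_cols)}): {numeric_cols}")
print(f"Non-numeric columns ({len(other_cols)}): {other_cols}")
```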


### Helper Code to Answer our Questions About the Dataset
Returning to the questions we want this dataset to answer, we should create helper code to organize and clean the dataset so that it is ready for analysis.

#### A) Standardize the Two Type Columns
Remove surrounding whitespace and capitalize each type; if a Pokemon has no second type, replace the missing value with the string "None".

In [5]:
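def std(s: pd.Series) -> pd.Series:
    # Strip whitespace and title-case each value; missing values stay NaN
    return s.str.strip().str.title()

for c in ["type_1", "type_2"]:
    if c in df.columns:
        df[c] = std(df[c])
        if c == "type_2":
            # Single-type Pokemon have no second type; label them "None"
            df[c] = df[c].fillna("None")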
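
One detail worth checking when standardizing text columns is how missing values pass through. Depending on the pandas version, casting with `astype(str)` may turn a missing value into a literal string (which title-cases to `"Nan"`), whereas the `.str` accessor on its own leaves missing values as they are so that `fillna` can label them. The toy Series below is invented for illustration and compares the two routes.

```python
import numpy as np
import pandas as pd

# Toy second-type column with messy casing, stray spaces and a missing value
raw = pd.Series([" grass", "FIRE ", np.nan], name="type_2")

# Route 1: cast to str first, then clean (missing-value handling varies by version)
cast_first = raw.astype(str).str.strip().str.title()

# Route 2: clean with the .str accessor only, then label missing values
keep_missing = raw.str.strip().str.title().fillna("None")

print(cast_first.tolist())
print(keep_missing.tolist())
```

Comparing the two printed lists on your own installation shows whether an extra replacement step is needed after casting.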
