# Pokemon EDA

The data comes from https://www.kaggle.com/datasets/abcsds/pokemon. The dataset covers pokemon from the first six generations.

Below are the descriptions for each of the columns:

- **#**: ID for each pokemon
- **Name**: Name of each pokemon
- **Type 1**: Primary type of the pokemon; it determines weakness/resistance to attacks
- **Type 2**: Secondary type, present only for dual-type pokemon
- **Total**: Sum of all stats that come after this, a general guide to how strong a pokemon is
- **HP**: Hit points, or health; defines how much damage a pokemon can withstand before fainting
- **Attack**: The base modifier for normal attacks (e.g. Scratch, Punch)
- **Defense**: The base damage resistance against normal attacks
- **Sp. Atk**: Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
- **Sp. Def**: Special defense, the base damage resistance against special attacks
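- **Speed**: Determines which pokemon attacks first each round
- **Generation**: The generation (1 to 6) in which the pokemon was introduced
- **Legendary**: Whether the pokemon is legendary (True/False)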
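
Since **Total** is described as the sum of the six stats, that relationship can be checked directly. The sketch below uses three hand-typed rows (the Generation 1 starters) instead of the CSV, and `check_total` is a hypothetical helper name that is not part of this notebook. It assumes the stat columns are spelled as in the file (`Sp. Atk`, `Sp. Def`).

```python
import pandas as pd

STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Hand-typed sample rows for illustration only; the notebook loads the full CSV.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander", "Squirtle"],
        "Total": [318, 309, 314],
        "HP": [45, 39, 44],
        "Attack": [49, 52, 48],
        "Defense": [49, 43, 65],
        "Sp. Atk": [65, 60, 50],
        "Sp. Def": [65, 50, 64],
        "Speed": [45, 65, 43],
    }
)


def check_total(frame: pd.DataFrame) -> pd.Series:
    """Return a boolean Series: True where Total equals the sum of the six stats."""
    return frame[STAT_COLUMNS].sum(axis=1) == frame["Total"]


total_matches = check_total(sample)
print(total_matches.all())
```

Once the full dataset is loaded into `df`, the same check would be `check_total(df).all()`.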

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns

stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

In [None]:
df = pd.read_csv("Pokemon.csv")
df

The table above shows that the dataset also includes mega evolutions as separate rows. Some pokemon have other alternate forms as well. For example, Hoopa has two forms: Confined, which is the default, and Unbound.

In [None]:
# Match "Mega " with a trailing space so that Meganium is not picked up
df[df["Name"].str.contains("Mega ", regex=False)]
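
When filtering on the name, note that a plain substring match on `"Mega"` also matches Meganium, which is not a mega evolution. The sketch below uses a few hand-typed names instead of the CSV, and assumes mega forms are named like `VenusaurMega Venusaur`, that is, with the word "Mega" followed by a space.

```python
import pandas as pd

# Hand-typed names for illustration; mega forms are assumed to contain "Mega ".
names = pd.Series(
    ["Venusaur", "VenusaurMega Venusaur", "Meganium", "HoopaHoopa Unbound"],
    name="Name",
)

# Plain substring match: also catches "Meganium".
loose = names[names.str.contains("Mega", regex=False)]

# Requiring the trailing space keeps only the mega form.
strict = names[names.str.contains("Mega ", regex=False)]

print(loose.tolist())
print(strict.tolist())
```

Other alternate forms, such as Hoopa Unbound, are not matched by either pattern and would need their own filter.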

In [None]:
df.dtypes

In [None]:
df.describe()

In [None]:
sns.pairplot(df[stats + ["Legendary"]], hue="Legendary")

In [None]:
sns.pairplot(df[stats + ["Total", "Legendary"]], hue="Legendary")

In [None]:
sns.pairplot(df[stats + ["Generation"]], hue="Generation", palette="tab10")

In [None]:
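# Treemap of primary types, sized by the number of pokemon of each type
type_counts = df["Type 1"].value_counts().rename_axis("Type 1").reset_index(name="Count")
fig = px.treemap(type_counts, path=["Type 1"], values="Count")
fig.show()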

In [None]:
# Normalize the stats: divide each stat by the row's Total, so it becomes a share of Total
# https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#matching-broadcasting-behavior
df_normalized = df.copy()
df_normalized[stats] = df_normalized[stats].div(df["Total"], axis=0)
df_normalized
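
To make the row-wise division concrete, the sketch below applies the same `div(..., axis=0)` call to two hand-typed rows instead of the CSV. After normalizing, each stat is its share of Total, so the six shares in a row should sum to 1. The names `sample`, `normalized`, and `row_sums` are illustrative only.

```python
import pandas as pd

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Hand-typed sample rows for illustration only.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander"],
        "Total": [318, 309],
        "HP": [45, 39],
        "Attack": [49, 52],
        "Defense": [49, 43],
        "Sp. Atk": [65, 60],
        "Sp. Def": [65, 50],
        "Speed": [45, 65],
    }
)

normalized = sample.copy()
# axis=0 aligns Total with the row index, so each row is divided by its own Total.
normalized[stat_columns] = sample[stat_columns].div(sample["Total"], axis=0)

# Each row's shares should add up to 1 (up to floating-point error).
row_sums = normalized[stat_columns].sum(axis=1)
print(row_sums.round(6).tolist())
```

Without `axis=0`, pandas would try to align the index of `Total` with the column labels, which is not what is wanted here.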

In [None]:
sns.pairplot(df_normalized[stats + ["Total", "Legendary"]], hue="Legendary")

In [None]:
from pandas.plotting import scatter_matrix

In [None]:
scatter_matrix(df_normalized[stats], figsize=(12, 12))