<a href="https://www.kaggle.com/code/sayansh001/pokemon-advanced-eda-visualization?scriptVersionId=100487428" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# <center> Pokemon Advanced 📊 EDA 📈 </center>
## <center>If you find this notebook useful, support with an upvote👍</center>
![](https://img.redbull.com/images/c_fill,g_auto,w_1380,h_920/q_auto,f_auto/redbullcom/2016/09/20/1331818966444_2/pok%C3%A9mon-super-mystery-dungeon)

# **About the Dataset**

This dataset contains information on all 802 pokémon from all seven generations. The information contained in this dataset includes base stats, performances against other types, heights, weights, classifications, egg steps, experience points, abilities, etc.

Features:

- abilities: A stringified list of abilities that the pokémon is capable of having.
- against_?: Eighteen features that denote the amount of damage taken against an attack of a particular type of pokémon.
- attack: The base attack of the pokémon.
- base_egg_steps: The number of steps required to hatch an egg of the pokémon.
- base_happiness: Base happiness of the pokémon.
- base_total: Sum of hp, attack, defense, sp_attack, sp_defense and speed.
- capture_rate: Capture rate of the pokémon.
- classification: The classification of the pokémon as described by the Sun and Moon pokédex.
- defense: The base defense of the pokémon.
- experience_growth: The experience growth of the pokémon.
- height_m: Height of the pokémon in metres.
- hp: The base HP of the pokémon. HP is short for Hit Points, which determine how much damage a pokémon can receive before fainting.
- japanese_name: The original Japanese name of the pokémon.
- name: The English name of the pokémon.
- percentage_male: The percentage of the species that are male. Blank if the pokémon is genderless.
- pokedex_number: The entry number of the pokémon in the National Pokédex.
- sp_attack: The base special attack of the pokémon.
- sp_defense: The base special defense of the pokémon.
- speed: The base speed of the pokémon.
- type1: The primary type of the pokémon.
- type2: The secondary type of the pokémon.
- weight_kg: The weight of the pokémon in kilograms.
- generation: The numbered generation in which the pokémon was first introduced.
- is_legendary: Denotes if the pokémon is legendary.
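
As a quick sanity check on the `base_total` definition in the list above, the sketch below recomputes it from the six base stats. The three rows are typed in by hand for illustration (they are not read from the dataset file), and the names `STAT_COLS` and `toy` are assumptions introduced here.

```python
import pandas as pd

# The six base stats that `base_total` is described as summing.
STAT_COLS = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]

# Hand-typed illustrative rows (assumption: not loaded from pokemon.csv).
toy = pd.DataFrame(
    {
        "name": ["Bulbasaur", "Charmander", "Squirtle"],
        "hp": [45, 39, 44],
        "attack": [49, 52, 48],
        "defense": [49, 43, 65],
        "sp_attack": [65, 60, 50],
        "sp_defense": [65, 50, 64],
        "speed": [45, 65, 43],
    }
)

# Recompute the total row by row from the six stat columns.
toy["base_total"] = toy[STAT_COLS].sum(axis=1)
print(toy[["name", "base_total"]])
```

On the loaded data, the same idea could be applied by comparing `df[STAT_COLS].sum(axis=1)` with `df["base_total"]`.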

<a id="2"></a>
# **<center><span style="color:#00BFC4;">Importing the libraries  </span></center>**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import squarify
from ast import literal_eval

sns.set_style("darkgrid")

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> Descriptive Statistics  </span></center>**

In [None]:
df = pd.read_csv(r"../input/pokemon/pokemon.csv")
df.head()

In [None]:
df.shape

In [None]:
df.describe().T

**Total Missing values**

In [None]:
print(df.isnull().sum()[df.columns[df.isnull().any()]])
print("Total Missing values : ",df.isna().sum().sum())

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> Data Preprocessing  </span></center>**

We will be discarding unnecessary columns.

In [None]:
df.drop(columns=['japanese_name', 'pokedex_number', 'base_egg_steps', 'classfication', 'percentage_male'], inplace=True)

In [None]:
# Parse each stringified list into a real Python list
df["abilities"] = df["abilities"].apply(literal_eval)

In [None]:
# Count the abilities of each pokémon
df["n_abilities"] = df["abilities"].apply(len)
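
The `abilities` column stores each list as a string, so it has to be parsed before its length means anything. Here is a minimal sketch of that step on a single hand-written sample string; the sample is an assumption about the format, not a row copied from the file.

```python
from ast import literal_eval

# Hand-written sample in the stringified-list format (assumption).
raw = "['Overgrow', 'Chlorophyll']"

# Without parsing, len() counts characters, not abilities.
print(len(raw))

# literal_eval safely evaluates the Python literal into a real list.
parsed = literal_eval(raw)
print(parsed, len(parsed))
```

`literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for text read from a file.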

In [None]:
df["type"] = df["type1"].astype(str) +' ' + df["type2"]
df['type']

![](https://psa.gov.ph/sites/default/files/BMI_1.jpg)
- **Calculating the BMI**

In [None]:
# BMI = weight (kg) / height (m) squared, computed column-wise
df['bmi'] = df['weight_kg'] / df['height_m']**2
df['bmi']
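
BMI is weight in kilograms divided by the square of height in metres. The sketch below writes the formula as a standalone function. The `bmi` helper and its guard for missing or zero heights are assumptions added for illustration; the notebook itself works on the DataFrame columns.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    # Guard (assumption): return NaN instead of raising when a value is missing.
    if weight_kg is None or height_m is None:
        return math.nan
    if math.isnan(weight_kg) or math.isnan(height_m) or height_m == 0:
        return math.nan
    return weight_kg / height_m ** 2


# Example values: 6.9 kg and 0.7 m.
print(round(bmi(6.9, 0.7), 2))
```

Because the height is squared, very short and heavy pokémon end up with extremely large values, which is worth keeping in mind when reading the BMI charts.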

In [None]:
df[df['capture_rate']=='30 (Meteorite)255 (Core)'][['capture_rate','name']]

The pokémon *Minior* has two capture rates, so we replace the value with `np.nan`, then fill it with 0 so the column can be cast to integers.

In [None]:
# Assign the result back instead of using inplace=True on a column selection
df["capture_rate"] = df["capture_rate"].replace({'30 (Meteorite)255 (Core)': np.nan})
df['capture_rate'] = df['capture_rate'].fillna(0)
df['capture_rate'] = df['capture_rate'].astype('int')
df['capture_rate']
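
An alternative sketch, not the approach used in the cell above: `pd.to_numeric` with `errors="coerce"` turns any unparseable entry into NaN in one step, without naming the offending string. The toy Series is hand-written for illustration.

```python
import pandas as pd

# Toy column mixing numeric strings with one malformed entry (hand-written).
capture_rate = pd.Series(["45", "30 (Meteorite)255 (Core)", "3"])

# errors="coerce" converts anything unparseable to NaN instead of raising.
numeric = pd.to_numeric(capture_rate, errors="coerce")

# Mirror the notebook's choice: fill the gap with 0, then cast to int.
cleaned = numeric.fillna(0).astype(int)
print(cleaned.tolist())
```

Either way, the 0 is only a placeholder, so the affected row will sit at the very bottom of any capture-rate plot.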

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> Data Visualization  </span></center>**

<a id="3.1"></a>
# **<span style="color:#02BFE6;"> Pokémon per Generation  </span>**

In [None]:
plt.figure(figsize=(12,6))
ax = sns.countplot(x='generation',data=df,order=df['generation'].value_counts().index)
ax.set_title('Pokémon per Generation')
ax.set(xlabel='Generation',ylabel='Count')

Odd-numbered generations tend to have more pokémon than even-numbered ones; generation 7 is an exception.
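
The odd-versus-even comparison can also be put into numbers rather than read off the chart. The helper below is a hypothetical sketch (the name `odd_even_means` is not from the notebook), and the sample list is made up, not taken from the dataset.

```python
from collections import Counter


def odd_even_means(generations):
    """Return (mean count over odd generations, mean count over even generations)."""
    counts = Counter(generations)
    odd = [n for g, n in counts.items() if g % 2 == 1]
    even = [n for g, n in counts.items() if g % 2 == 0]
    # Assumes at least one odd and one even generation is present.
    return sum(odd) / len(odd), sum(even) / len(even)


# Made-up generation labels, one per pokémon (assumption, not dataset values).
sample = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
print(odd_even_means(sample))
```

With the notebook's data, this could be called as `odd_even_means(df['generation'].tolist())`.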

In [None]:
valc_type1 = df['type1'].value_counts()

In [None]:
plt.figure(figsize=(20,10))
ax = squarify.plot(valc_type1,
                  label = valc_type1.index,
                  color = sns.color_palette('husl',len(valc_type1)),
                  pad=0.8,
                  text_kwargs={'fontsize':12})
ax.set_title("Most prominent primary types!", size=20)
plt.axis('off')

Water-type pokémon are the most abundant, while Flying is rare as a primary type.

# **Let's take a look at type2**

- Note that a large number of pokémon do not have a secondary type.

In [None]:
df['type2'].isnull().sum()

In [None]:
valc_type2 = df['type2'].value_counts()
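# keys= names the two columns explicitly, whatever name value_counts() gives each Series
types_df = pd.concat([valc_type1, valc_type2], axis=1, keys=['type1', 'type2'])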

In [None]:
types_df.plot(kind='bar',stacked=True, color=['red', 'green'],figsize=(12,6))
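
`pd.concat(..., axis=1)` aligns the two count Series on their index, so a type that appears in only one of them gets NaN in the other column. The sketch below shows this on made-up counts; the values and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up counts (assumption): "bug" never appears as a secondary type here,
# and "flying" never appears as a primary type.
primary = pd.Series({"water": 5, "bug": 3})
secondary = pd.Series({"water": 2, "flying": 4})

# keys= gives each column an explicit name.
combined = pd.concat([primary, secondary], axis=1, keys=["type1", "type2"])

# Types missing from one side come back as NaN; fill with 0 before plotting.
combined = combined.fillna(0).astype(int)
print(combined)
```

In a stacked bar chart, the filled zeros simply contribute no height to that segment.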

The most prominent secondary type is Flying, followed by Poison and Ground.

We will now look at the 15 most common type combinations among pokémon that have a secondary type.

In [None]:
type_top15 = df[~df['type2'].isnull()]["type"].value_counts()[:15]

In [None]:
plt.figure(figsize=(12,6))
ax = sns.barplot(y=type_top15.index,x=type_top15.values)
ax.set_title("Most Common Type Combinations")
for container in ax.containers:
    ax.bar_label(container)

In [None]:
legendary=df[df['is_legendary']==1]
legendary_top1 = legendary['type1'].value_counts().head(5)
legendary_top2 = legendary['type2'].value_counts().head(5)
legendary_top3 = legendary['type'].value_counts().head(5)

In [None]:
plt.figure(figsize=(24,6))

plt.subplot(1,2,1)
sns.barplot(y=legendary_top1.index,x=legendary_top1.values)

    
plt.subplot(1,2,2)
sns.barplot(y=legendary_top2.index,x=legendary_top2.values)
plt.show()

Which generations have the easiest pokémon to catch?

In [None]:
plt.figure(figsize=(20,6))
ax = sns.boxplot(x='generation',y='capture_rate',hue='is_legendary',data=df)

ax.set_title("Capture rate by generation, legendary vs. non-legendary Pokémon")
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles,["Non-legendary", "Legendary"],loc='upper right')

Generation 4 pokémon are the hardest to catch, while Generation 3 pokémon are the easiest.

In [None]:
plt.figure(figsize=(20,6))
sns.boxplot(x='type1',y='capture_rate',hue='is_legendary',data=df)

How many abilities do pokémon generally have?

In [None]:
df["n_abilities"] = df["abilities"].apply(len)

In [None]:
plt.figure(figsize=(20,6))
ax = sns.countplot(data=df, x="n_abilities", hue="is_legendary")
ax.set_title("How many abilities do pokémon usually have?", size=20)
ax.set(xlabel="Number of Abilities", ylabel="Count");
ax.legend(["Non-legendary", "Legendary"], loc='upper right');

In [None]:
plt.figure(figsize=(10,6))
ax = sns.scatterplot(x='weight_kg',y='height_m',data=df,hue='is_legendary',palette=['red','blue'])
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, ["Non-legendary", "Legendary"])

top5_weight_height_merged = pd.concat([df.nlargest(5, 'height_m'), df.nlargest(5, 'weight_kg')]).drop_duplicates(subset=['name'])
for index, row in top5_weight_height_merged.iterrows():
    plt.annotate(row['name'], xy=(row['weight_kg'], row['height_m']), fontsize=10)


In [None]:
top10_highest_bmi = df[["name", "bmi"]].sort_values(by="bmi",ascending=False).head(10)

In [None]:
plt.figure(figsize=(12,6))
ax = sns.barplot(x='name',y='bmi',data=top10_highest_bmi)

for container in ax.containers:
    ax.bar_label(container)
plt.xticks(rotation=90)
plt.show()

<a id="2"></a>
# **<center><span style="color:#00BFC4;">Which Pokémon has the lowest BMI?</span></center>**

In [None]:
# Ascending sort, so these are the ten lowest BMI values
top10_lowest_bmi = df[["name", "bmi"]].sort_values(by="bmi", ascending=True)[:10]
top10_lowest_bmi

In [None]:
plt.figure(figsize=(12,6))
ax = sns.barplot(x='name',y='bmi',data=top10_lowest_bmi)
for container in ax.containers:
    ax.bar_label(container)
plt.xticks(rotation=90)
plt.show()

In [None]:
plt.figure(figsize=(12,6))
ax = sns.boxplot(x='generation',y='base_total',hue='is_legendary',data=df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, ["Non-legendary", "Legendary"]);

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> Correlation Between Attributes of Non-legendary Pokémon </span></center>**

In [None]:
grid_kws = {"height_ratios": (.9, .05), "hspace": .25}
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws, figsize=(20,10))
sns.heatmap((df[df['is_legendary']==0][['hp','sp_attack','sp_defense','attack','defense','speed']]).corr(),
           annot=True,
           fmt=".2f",
           ax=ax,
           cbar_ax=cbar_ax,
           cmap='tab20c')

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> Correlation Between Attributes of Legendary Pokémon </span></center>**

In [None]:
grid_kws = {"height_ratios": (.9, .05), "hspace": .25}
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws, figsize=(20,10))
sns.heatmap((df[df['is_legendary']==1][['hp','sp_attack','sp_defense','attack','defense','speed']]).corr(),
           annot=True,
           fmt=".2f",
           ax=ax,
           cbar_ax=cbar_ax,
           cmap='tab20c')

In [None]:
# Select the numeric stat columns before aggregating so median() never sees non-numeric columns
non_legendary_pokemon_attributes = df[df["is_legendary"]==0].groupby('type1')[["attack", "sp_attack", "defense", "sp_defense", "hp", "speed", "base_total"]].median()
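
A grouped median only makes sense for numeric columns. The sketch below, on a hand-written toy frame, shows selecting those columns before aggregating so that string columns such as names never reach `median()`. All values are made up for illustration.

```python
import pandas as pd

# Hand-written toy frame (assumption: values are invented, not from the dataset).
toy = pd.DataFrame(
    {
        "type1": ["fire", "fire", "water", "water", "water"],
        "name": ["a", "b", "c", "d", "e"],  # string column, not aggregatable
        "attack": [50, 70, 40, 60, 80],
        "speed": [65, 85, 30, 50, 40],
    }
)

# Pick the numeric columns first, then aggregate per group.
medians = toy.groupby("type1")[["attack", "speed"]].median()
print(medians)
```

The result has one row per `type1` value and one column per selected stat, which is the shape `sns.heatmap` expects.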

In [None]:
grid_kws = {"height_ratios": (.9, .05), "hspace": .25}
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws, figsize=(20,10))
sns.heatmap(non_legendary_pokemon_attributes,
            annot= True,
            fmt = ".2f",
            vmin = 0,
            vmax = 150,
            ax=ax,
            cbar_ax=cbar_ax,
            cbar_kws={"orientation": "horizontal"},
            cmap="YlOrRd")
ax.set_title('Median of Attributes by Type of Non-legendary Pokémon', size = 20)
ax.set(ylabel="Type1", xlabel="Attribute");

In [None]:
# Select the numeric stat columns before aggregating so median() never sees non-numeric columns
legendary_pokemon_attributes = df[df["is_legendary"]==1].groupby('type1')[["attack", "sp_attack", "defense", "sp_defense", "hp", "speed", "base_total"]].median()

In [None]:
grid_kws = {"height_ratios": (.9, .05), "hspace": .25}
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws, figsize=(20,10))
sns.heatmap(legendary_pokemon_attributes,
            annot= True,
            fmt = ".2f",
            vmin = 0,
            vmax = 150,
            ax=ax,
            cbar_ax=cbar_ax,
            cbar_kws={"orientation": "horizontal"},
            cmap="YlOrRd")
ax.set_title('Median of Attributes by Type of Legendary Pokémon', size = 20)
ax.set(ylabel="Type1", xlabel="Attribute")

In [None]:
top10_pokemon_base_total = df.sort_values(by="base_total", ascending=False).reset_index()[:10]

In [None]:
plt.figure(figsize=(20,10))
ax = sns.barplot(y=top10_pokemon_base_total["name"], x=top10_pokemon_base_total["base_total"], orient='h')
ax.set_title("Which is the best pokémon?", size=20)
ax.set(xlabel="Base Total", ylabel="Name")

# Annotate value labels to each pokémon
for index, row in top10_pokemon_base_total.iterrows(): 
    plt.annotate(row["base_total"], xy=(row["base_total"]-20, index), color='white') 

<a id="2"></a>
# **<center><span style="color:#00BFC4;"> References </span></center>**

This notebook draws heavily on the following notebook: https://www.kaggle.com/joaopdrg/pok-mon-data-visualization