# Introduction

Welcome to the Pokémon Exploratory Data Analysis (EDA) project!  
In this notebook, we will analyze the Pokémon dataset to uncover patterns, trends, and insights about various Pokémon species.  
We will start by loading and cleaning the data, followed by statistical analysis and visualizations to better understand the characteristics and distributions within the dataset.  
Let's dive in and explore the world of Pokémon!

In [92]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#Show plots inline
%matplotlib inline

#Set the style of seaborn
sns.set(style="whitegrid")

## Loading the Data

Let's begin by loading the Pokémon dataset into a pandas DataFrame.  
We'll inspect the first few rows to get an overview of the data structure and its features.

In [93]:
# Define the file path for the Pokémon dataset
file_path = 'Pokemon.csv'

# Load the Pokémon dataset into a pandas DataFrame
df = pd.read_csv(file_path, index_col='Name')

# Display the first 10 rows of the DataFrame
df.head(10)

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False
Charmeleon,5,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Fire,Flying,534,78,84,78,109,85,100,1,False
CharizardMega Charizard X,6,Fire,Dragon,634,78,130,111,130,85,100,1,False
CharizardMega Charizard Y,6,Fire,Flying,634,78,104,78,159,115,100,1,False
Squirtle,7,Water,,314,44,48,65,50,64,43,1,False
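To see what `index_col='Name'` does without the full file, here is a minimal sketch that parses a few of the rows shown above from an in-memory string. It assumes the raw CSV stores `Name` as the second column after `#`, which this output does not show.

```python
import io

import pandas as pd

# A few rows copied from the head() output above; the column order of the
# raw file ('#' first, then 'Name') is an assumption
csv_text = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
"""

# index_col='Name' turns the Name column into the row labels,
# so rows can be looked up by name with .loc
sample = pd.read_csv(io.StringIO(csv_text), index_col='Name')

print(sample.index.tolist())           # ['Bulbasaur', 'Charmander', 'Squirtle']
print(sample.loc['Charmander', 'HP'])  # 39
print(sample['Type 2'].isna().sum())   # 2 (empty fields are read as NaN)
print(sample['Legendary'].dtype)       # bool (True/False strings are parsed as booleans)
```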


In [94]:
df.columns = df.columns.str.upper()  # Convert column names to uppercase

df.head()

Name,#,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False
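Upper-casing keeps the labels readable, but they still contain spaces and dots, so columns must be accessed with brackets (for example `df['SP. ATK']`). A sketch of the same operation on a hypothetical empty frame, plus one alternative this notebook does not use: snake_case labels.

```python
import pandas as pd

# Hypothetical empty frame carrying some of the original column labels
frame = pd.DataFrame(columns=['#', 'Type 1', 'Sp. Atk', 'Sp. Def', 'Legendary'])

# What the notebook does: upper-case every label
upper = frame.columns.str.upper()
print(list(upper))  # ['#', 'TYPE 1', 'SP. ATK', 'SP. DEF', 'LEGENDARY']

# Alternative (not used here): lower-case and collapse runs of spaces/dots into "_"
snake = frame.columns.str.lower().str.replace(r'[ .]+', '_', regex=True)
print(list(snake))  # ['#', 'type_1', 'sp_atk', 'sp_def', 'legendary']
```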


## Data Cleaning

Before proceeding with our analysis, it's important to clean the dataset to ensure accuracy and consistency.  
In this section, we will:

- Check for missing values and handle them appropriately
- Identify and address any duplicate entries
- Ensure data types are correct for each column
- Standardize categorical values if necessary

Let's prepare our Pokémon data for deeper exploration!
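One consistency check that fits this list, though the notebook does not run it, is confirming that `TOTAL` equals the sum of the six base stats. A minimal sketch on the first five rows of the `head()` output above, using the upper-cased column names:

```python
import pandas as pd

# First five rows from the head() output above
toy = pd.DataFrame(
    {
        'TOTAL': [318, 405, 525, 625, 309],
        'HP': [45, 60, 80, 80, 39],
        'ATTACK': [49, 62, 82, 100, 52],
        'DEFENSE': [49, 63, 83, 123, 43],
        'SP. ATK': [65, 80, 100, 122, 60],
        'SP. DEF': [65, 80, 100, 120, 50],
        'SPEED': [45, 60, 80, 80, 65],
    },
    index=['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur', 'Charmander'],
)

stat_cols = ['HP', 'ATTACK', 'DEFENSE', 'SP. ATK', 'SP. DEF', 'SPEED']

# Keep only the rows where TOTAL differs from the row-wise sum of the six stats
mismatch = toy[toy[stat_cols].sum(axis=1) != toy['TOTAL']]
print(len(mismatch))  # 0
```

On the full DataFrame the same two lines apply unchanged; an empty `mismatch` means the `TOTAL` column is internally consistent.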

In [99]:
# Mega Pokémon names have the base name prepended (e.g. "VenusaurMega Venusaur").
# Remove everything before "Mega" so the index reads "Mega Venusaur".
df.index = df.index.str.replace(r".*(?=Mega)", "", regex=True)

df.head(10)

Name,#,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
Mega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False
Charmeleon,5,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Fire,Flying,534,78,84,78,109,85,100,1,False
Mega Charizard X,6,Fire,Dragon,634,78,130,111,130,85,100,1,False
Mega Charizard Y,6,Fire,Flying,634,78,104,78,159,115,100,1,False
Squirtle,7,Water,,314,44,48,65,50,64,43,1,False
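The lookahead `(?=Mega)` matches the position just before "Mega" without consuming it, so whatever `.*` matched is removed while "Mega ..." is kept. Names without "Mega" produce no match and are left unchanged, which is why other prefixed forms such as `KyogrePrimal Kyogre` still appear in later outputs. A small sketch on names taken from this notebook; the second pattern, which also handles "Primal", is a hypothetical extension and not what the cell above runs.

```python
import pandas as pd

# Raw index values as they appear in this notebook's outputs
names = pd.Index(['Venusaur', 'VenusaurMega Venusaur',
                  'CharizardMega Charizard X', 'KyogrePrimal Kyogre'])

# The notebook's pattern: drop everything before "Mega", keep "Mega" itself
mega_only = names.str.replace(r'.*(?=Mega)', '', regex=True)
print(list(mega_only))
# ['Venusaur', 'Mega Venusaur', 'Mega Charizard X', 'KyogrePrimal Kyogre']

# Hypothetical extension: also strip the prefix before "Primal"
mega_or_primal = names.str.replace(r'.*(?=Mega |Primal )', '', regex=True)
print(list(mega_or_primal))
# ['Venusaur', 'Mega Venusaur', 'Mega Charizard X', 'Primal Kyogre']
```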


In [100]:
# 1. Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

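# 2. Handle missing values in 'TYPE 2' (single-type Pokémon have no secondary type)
df['TYPE 2'] = df['TYPE 2'].fillna('None')

# 3. Check for duplicates
# duplicated() compares column values only, not the Name index, so rows with
# different names but identical values are also flagged
duplicates = df.duplicated()
print(f"Number of duplicate rows: {duplicates.sum()}")

# Remove the flagged rows; .copy() avoids a SettingWithCopyWarning on later assignments
df = df[~duplicates].copy()

# 4. Validate data types
print("\nData types after cleaning:")
print(df.dtypes)

# 5. Ensure LEGENDARY is boolean (read_csv already parses it as bool; this is a safeguard)
df['LEGENDARY'] = df['LEGENDARY'].astype(bool)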

# 6. Summary after cleaning
print("\nMissing values after cleaning:")
print(df.isnull().sum())
print(f"Number of duplicate rows after cleaning: {df.duplicated().sum()}")

Missing values per column:
#               0
TYPE 1          0
TYPE 2        386
TOTAL           0
HP              0
ATTACK          0
DEFENSE         0
SP. ATK         0
SP. DEF         0
SPEED           0
GENERATION      0
LEGENDARY       0
dtype: int64
Number of duplicate rows: 2

Data types after cleaning:
#              int64
TYPE 1        object
TYPE 2        object
TOTAL          int64
HP             int64
ATTACK         int64
DEFENSE        int64
SP. ATK        int64
SP. DEF        int64
SPEED          int64
GENERATION     int64
LEGENDARY       bool
dtype: object

Missing values after cleaning:
#             0
TYPE 1        0
TYPE 2        0
TOTAL         0
HP            0
ATTACK        0
DEFENSE       0
SP. ATK       0
SP. DEF       0
SPEED         0
GENERATION    0
LEGENDARY     0
dtype: int64
Number of duplicate rows after cleaning: 0
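Because `df.duplicated()` ignores the index, the two rows dropped above are rows whose column values repeat, whatever their names. A sketch on a hypothetical three-row frame shows the difference; moving the name into a column with `reset_index()` is one option if distinct names should never count as duplicates.

```python
import pandas as pd

# Hypothetical rows: 'Form A' and 'Form B' differ only in their name
toy = pd.DataFrame(
    {'#': [1, 1, 4], 'HP': [45, 45, 39], 'TOTAL': [318, 318, 309]},
    index=pd.Index(['Form A', 'Form B', 'Other'], name='Name'),
)

# Column values only: 'Form B' is flagged as a duplicate of 'Form A'
print(toy.duplicated().tolist())                # [False, True, False]

# With the name included in the comparison, nothing is flagged
print(toy.reset_index().duplicated().tolist())  # [False, False, False]
```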


In [102]:
print(df.loc['Raikou'])  # Select the row labelled 'Raikou'
print(df.iloc[243])      # Select the row at 0-based position 243; this is Delibird,
                         # not Raikou (#243), because Mega forms shift the positions

#                  243
TYPE 1        Electric
TYPE 2            None
TOTAL              580
HP                  90
ATTACK              85
DEFENSE             75
SP. ATK            115
SP. DEF            100
SPEED              115
GENERATION           2
LEGENDARY         True
Name: Raikou, dtype: object
#                225
TYPE 1           Ice
TYPE 2        Flying
TOTAL            330
HP                45
ATTACK            55
DEFENSE           45
SP. ATK           65
SP. DEF           45
SPEED             75
GENERATION         2
LEGENDARY      False
Name: Delibird, dtype: object
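The two calls above return different rows because `.loc` looks up labels while `.iloc` looks up 0-based positions, and a row's position is not its Pokédex number `#`. A minimal sketch on the first four cleaned rows shown earlier; `Index.get_loc` converts a label to a position, assuming the label is unique.

```python
import pandas as pd

# First four rows of the cleaned data shown earlier (positions 0 to 3)
toy = pd.DataFrame(
    {'#': [1, 2, 3, 3], 'HP': [45, 60, 80, 80]},
    index=pd.Index(['Bulbasaur', 'Ivysaur', 'Venusaur', 'Mega Venusaur'], name='Name'),
)

# .loc selects by label
print(toy.loc['Venusaur', 'HP'])  # 80

# .iloc selects by position: position 1 is Ivysaur, whose '#' is 2
print(toy.iloc[1].name)           # Ivysaur

# get_loc turns a (unique) label into its position
pos = toy.index.get_loc('Venusaur')
print(pos)                        # 2
print(toy.iloc[pos].name)         # Venusaur
```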


In [103]:
# Display only the legendary Pokémon
legendary_pokemon = df[df['LEGENDARY']]
display(legendary_pokemon.head())

# Display the Pokémon with the maximum defense (idxmax returns the first label if there are ties)
print("Max Defense Pokémon: ", df['DEFENSE'].idxmax())
# Display the maximum attack value (.max() returns the value; .argmax() would return its position)
print("Max Attack Value: ", df['ATTACK'].max())

Name,#,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Articuno,144,Ice,Flying,580,90,85,100,95,125,85,1,True
Zapdos,145,Electric,Flying,580,90,90,85,125,90,100,1,True
Moltres,146,Fire,Flying,580,90,100,90,125,85,90,1,True
Mewtwo,150,Psychic,,680,106,110,90,154,90,130,1,True
Mega Mewtwo X,150,Psychic,Fighting,780,106,190,100,154,100,130,1,True


Max Defense Pokémon:  Mega Steelix
Max Attack Value:  190
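`Series.max()`, `idxmax()` and `argmax()` answer three different questions: the largest value, its label, and its 0-based position. Because all three return plain numbers or labels, it is easy to print a position when a value was intended. A minimal sketch using the ATTACK values of the five legendary rows shown above, plus a way to list every label in a tie:

```python
import pandas as pd

# ATTACK values of the legendary rows shown above
attack = pd.Series([85, 90, 100, 110, 190],
                   index=['Articuno', 'Zapdos', 'Moltres', 'Mewtwo', 'Mega Mewtwo X'])

print(attack.max())     # 190: the largest value
print(attack.idxmax())  # Mega Mewtwo X: the label of the largest value
print(attack.argmax())  # 4: the 0-based position of the largest value

# idxmax reports only the first label when values tie; a mask returns all of them
ties = pd.Series([230, 230, 180], index=['A', 'B', 'C'])  # hypothetical values
print(ties[ties == ties.max()].index.tolist())  # ['A', 'B']
```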


In [104]:
#Display the DataFrame sorted by the 'TOTAL' column in descending order
print("Pokémon sorted by TOTAL stats:")
df.sort_values(by='TOTAL', ascending=False)

Pokémon sorted by TOTAL stats:


Name,#,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Mega Rayquaza,384,Dragon,Flying,780,105,180,100,180,100,115,3,True
Mega Mewtwo X,150,Psychic,Fighting,780,106,190,100,154,100,130,1,True
Mega Mewtwo Y,150,Psychic,,780,106,150,70,194,120,140,1,True
KyogrePrimal Kyogre,382,Water,,770,100,150,90,180,160,90,3,True
GroudonPrimal Groudon,383,Ground,Fire,770,100,180,160,150,90,90,3,True
...,...,...,...,...,...,...,...,...,...,...,...,...
Weedle,13,Bug,Poison,195,40,35,30,20,20,50,1,False
Caterpie,10,Bug,,195,45,30,35,20,20,45,1,False
Kricketot,401,Bug,,194,37,25,41,25,41,25,4,False
Azurill,298,Normal,Fairy,190,50,20,40,20,40,20,3,False
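Three Pokémon share the top TOTAL of 780, and the default sort algorithm is not guaranteed to be stable, so their relative order above is not something to rely on. A sketch on those rows showing an explicit tie-breaker (here ATTACK, chosen purely for illustration) and `nlargest`, which returns only the top rows:

```python
import pandas as pd

# TOTAL and ATTACK for the top four rows shown above (three-way tie at 780)
toy = pd.DataFrame(
    {'TOTAL': [780, 780, 780, 770], 'ATTACK': [180, 190, 150, 150]},
    index=['Mega Rayquaza', 'Mega Mewtwo X', 'Mega Mewtwo Y', 'KyogrePrimal Kyogre'],
)

# Sort by TOTAL, then break ties by ATTACK (both descending)
ranked = toy.sort_values(by=['TOTAL', 'ATTACK'], ascending=[False, False])
print(ranked.index.tolist())
# ['Mega Mewtwo X', 'Mega Rayquaza', 'Mega Mewtwo Y', 'KyogrePrimal Kyogre']

# nlargest returns just the top n rows; keep='first' (the default) prefers earlier rows in a tie
top2 = toy.nlargest(2, 'TOTAL')
print(top2.index.tolist())
```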


In [109]:
# Display the unique primary types (TYPE 1)
print("Primary types:", df['TYPE 1'].unique())
# Display the number of distinct primary types
print("Number of primary types:", df['TYPE 1'].nunique())

Primary types: ['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'
 'Flying']
Number of primary types: 18
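These counts cover `TYPE 1`, which has no missing values. For `TYPE 2` the cleaning step replaced missing values with the string `'None'`, so `nunique()` and `value_counts()` treat it as one more type. A sketch on a hypothetical Series showing how turning the placeholder back into a real missing value changes the count:

```python
import pandas as pd

# Hypothetical TYPE 2 values after fillna('None')
type2 = pd.Series(['Poison', 'None', 'Flying', 'None', 'Dragon'])

# The placeholder is an ordinary string, so it is counted as a distinct value
print(type2.nunique())  # 4

# mask() turns the placeholder back into NaN, which nunique() skips by default
print(type2.mask(type2 == 'None').nunique())  # 3
```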


In [110]:
# Display the count of each TYPE 1
print(df['TYPE 1'].value_counts())

print("\n" + "="*50)

# Display the count of each TYPE 2
print("TYPE 2 counts:")
print(df['TYPE 2'].value_counts())

print("\n" + "="*50)

# Combine TYPE 1 and TYPE 2 counts together
print("Combined TYPE 1 and TYPE 2 counts:")
# Concatenate both type columns, excluding 'None' values for TYPE 2
all_types = pd.concat([
    df['TYPE 1'], 
    df['TYPE 2'][df['TYPE 2'] != 'None']
])
combined_counts = all_types.value_counts()
print(combined_counts)

TYPE 1
Water       111
Normal       98
Grass        70
Bug          69
Psychic      56
Fire         52
Rock         44
Electric     44
Ground       32
Ghost        32
Dragon       32
Dark         31
Poison       28
Fighting     27
Steel        27
Ice          24
Fairy        17
Flying        4
Name: count, dtype: int64

TYPE 2 counts:
TYPE 2
None        385
Flying       97
Ground       35
Poison       34
Psychic      33
Fighting     25
Grass        25
Fairy        23
Steel        22
Dark         20
Dragon       18
Ice          14
Rock         14
Water        14
Ghost        14
Fire         12
Electric      6
Normal        4
Bug           3
Name: count, dtype: int64

Combined TYPE 1 and TYPE 2 counts:
Water       125
Normal      102
Flying      101
Grass        95
Psychic      89
Bug          72
Ground       67
Fire         64
Poison       62
Rock         58
Fighting     52
Dark         51
Dragon       50
Electric     50
Steel        49
Ghost        46
Fairy        40
Ice          38
Name: count, dtype: int64
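An equivalent way to build the combined counts, sketched here on a hypothetical four-row frame, is to reshape both type columns into long format with `melt` and filter out the `'None'` placeholder. `pd.crosstab` on the same two columns is a natural next step for seeing which type pairs actually occur. Neither is used in the cell above.

```python
import pandas as pd

# Hypothetical mini-frame in the notebook's layout ('None' = no secondary type)
toy = pd.DataFrame({
    'TYPE 1': ['Grass', 'Fire', 'Fire', 'Water'],
    'TYPE 2': ['Poison', 'None', 'Flying', 'None'],
})

# Long format: one row per (Pokémon, type slot)
long = toy.melt(value_vars=['TYPE 1', 'TYPE 2'], var_name='slot', value_name='type')
combined = long.loc[long['type'] != 'None', 'type'].value_counts()
print(combined.to_dict())  # Fire counted twice, every other type once

# Rows are primary types, columns are secondary types (including the placeholder)
pairs = pd.crosstab(toy['TYPE 1'], toy['TYPE 2'])
print(pairs)
```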

In [111]:
#Display the dataframe columns
print("The columns of the dataframe are: ", df.columns)

# Display summary statistics of the DataFrame
print("Summary Statistics:")
display(df.describe())

# Display the shape of the DataFrame
print(f"\nDataset Shape: {df.shape}")

The columns of the dataframe are:  Index(['#', 'TYPE 1', 'TYPE 2', 'TOTAL', 'HP', 'ATTACK', 'DEFENSE', 'SP. ATK',
       'SP. DEF', 'SPEED', 'GENERATION', 'LEGENDARY'],
      dtype='object')
Summary Statistics:


,#,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION
count,798.0,798.0,798.0,798.0,798.0,798.0,798.0,798.0,798.0
mean,362.062657,434.882206,69.225564,79.048872,73.819549,72.736842,71.868421,68.182957,3.318296
std,208.061341,119.998562,25.554513,32.478525,31.217254,32.700741,27.854551,29.03516,1.659599
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.25,330.0,50.0,55.0,50.0,49.25,50.0,45.0,2.0
50%,363.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,537.75,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0



Dataset Shape: (798, 12)
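`describe()` summarises only the numeric columns by default, which is why `TYPE 1`, `TYPE 2` and `LEGENDARY` are absent from the table above. A sketch on a hypothetical three-row frame showing how to summarise the remaining columns as categories (count, unique, top, freq):

```python
import pandas as pd

# Hypothetical frame mixing string, integer and boolean columns
toy = pd.DataFrame({
    'TYPE 1': ['Grass', 'Fire', 'Fire'],
    'TOTAL': [318, 309, 534],
    'LEGENDARY': [False, False, True],
})

# Default: only numeric columns (bool is not treated as numeric here)
print(toy.describe().columns.tolist())  # ['TOTAL']

# Non-numeric columns get a categorical summary: count, unique, top, freq
summary = toy.select_dtypes(exclude='number').describe()
print(summary)
```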
