In [None]:
# Introduction

Welcome to the Pokémon Exploratory Data Analysis (EDA) project!  
In this notebook, we analyze a Pokémon dataset to see how base stats, types, generations, and Legendary status are distributed across species.  
We start by loading and cleaning the data, then move on to summary statistics and visualizations of those distributions.  
Let's dive in and explore the world of Pokémon!

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Show plots inline
%matplotlib inline

# Set the seaborn style
sns.set(style="whitegrid")

In [None]:
## Loading the Data

Let's begin by loading the Pokémon dataset into a pandas DataFrame, using each Pokémon's name as the row index.  
We'll inspect the first ten rows to get an overview of the data structure and its features.
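
As a small illustration of what `index_col='Name'` does, the sketch below parses three rows taken from this dataset out of an in-memory string instead of `Pokemon.csv`. The column order in `sample_csv` and the names `sample_csv` and `sample` are assumptions made for the example.

```python
import io

import pandas as pd

# Three rows from the dataset, typed in by hand; the real file has 800 rows.
# The column order (with "Name" second) is an assumption about the CSV layout.
sample_csv = io.StringIO(
    "#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary\n"
    "1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False\n"
    "4,Charmander,Fire,,309,39,52,43,60,50,65,1,False\n"
    "7,Squirtle,Water,,314,44,48,65,50,64,43,1,False\n"
)

# index_col moves "Name" out of the columns and into the row index.
sample = pd.read_csv(sample_csv, index_col="Name")

print(sample.index.name)               # name of the row index
print(sample.shape)                    # rows, remaining columns
print(sample["Type 2"].isna().sum())   # empty "Type 2" fields are read as NaN
print(sample.loc["Charmander", "HP"])  # rows can now be looked up by name
```

Because the name is the index, it no longer counts as a column, which is why the full dataset reports 12 columns rather than 13.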

In [12]:
# Define the file path for the Pokémon dataset
file_path = 'Pokemon.csv'

# Load the Pokémon dataset into a pandas DataFrame
df = pd.read_csv(file_path, index_col='Name')

# Display the first 10 rows of the DataFrame
df.head(10)

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False
Charmeleon,5,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Fire,Flying,534,78,84,78,109,85,100,1,False
CharizardMega Charizard X,6,Fire,Dragon,634,78,130,111,130,85,100,1,False
CharizardMega Charizard Y,6,Fire,Flying,634,78,104,78,159,115,100,1,False
Squirtle,7,Water,,314,44,48,65,50,64,43,1,False


In [30]:
# Display summary statistics of the DataFrame
print("Summary Statistics:")
display(df.describe())

# Display the shape of the DataFrame
print(f"\nDataset Shape: {df.shape}")

Summary Statistics:


,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0



Dataset Shape: (800, 12)
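
In [None]:
One quick sanity check on these columns: `Total` looks like the sum of the six base stats (for Bulbasaur, 45 + 49 + 49 + 65 + 65 + 45 = 318). The sketch below tests that on four rows hand-copied from the preview above. This is an assumption about how `Total` is defined, not something the dataset documents, and `sample`, `stat_cols`, and `mismatch` are names introduced here for illustration.

```python
import pandas as pd

# Four rows hand-copied from the preview; not the full dataset.
sample = pd.DataFrame(
    {
        "Total": [318, 405, 525, 309],
        "HP": [45, 60, 80, 39],
        "Attack": [49, 62, 82, 52],
        "Defense": [49, 63, 83, 43],
        "Sp. Atk": [65, 80, 100, 60],
        "Sp. Def": [65, 80, 100, 50],
        "Speed": [45, 60, 80, 65],
    },
    index=pd.Index(["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"], name="Name"),
)

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Keep only rows where the stored Total differs from the sum of the six stats.
mismatch = sample[sample[stat_cols].sum(axis=1) != sample["Total"]]
print(f"Rows with inconsistent Total: {len(mismatch)}")
```

Running the same comparison on the full `df` would show whether the relationship holds for all 800 rows.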


In [None]:
## Data Cleaning

Before proceeding with our analysis, it's important to clean the dataset to ensure accuracy and consistency.  
In this section, we will:

- Check for missing values and handle them appropriately
- Identify and address any duplicate entries
- Ensure data types are correct for each column
- Standardize categorical values if necessary

Let's prepare our Pokémon data for deeper exploration!
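
One caveat for the duplicate check listed above: `DataFrame.duplicated()` compares column values only and ignores the index. With `Name` as the index, two differently named entries that share every column value are reported as duplicates. The sketch below shows this on hypothetical rows (`Form A` and `Form B` are made-up labels, not entries from the dataset).

```python
import pandas as pd

# Hypothetical rows: two differently named entries with identical column values.
demo = pd.DataFrame(
    {"Type 1": ["Grass", "Grass", "Fire"], "HP": [45, 45, 39]},
    index=pd.Index(["Form A", "Form B", "Charmander"], name="Name"),
)

# duplicated() looks only at the columns, so "Form B" is flagged
# even though its index label differs from "Form A".
by_columns = demo.duplicated()

# Moving the index into a regular column makes the name part of the comparison.
by_columns_and_name = demo.reset_index().duplicated()

print(by_columns.sum(), by_columns_and_name.sum())
```

Whether rows flagged this way are true duplicates or distinct forms with identical stats is worth checking by eye before dropping them.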

In [31]:
# 1. Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

# 2. Handle missing values in 'Type 2'
df['Type 2'] = df['Type 2'].fillna('None')

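# 3. Check for duplicates
# Note: duplicated() compares column values only; the 'Name' index is ignored,
# so differently named rows with identical columns are flagged as duplicates.
duplicates = df.duplicated()
print(f"Number of duplicate rows: {duplicates.sum()}")

# Remove the flagged rows; .copy() gives an independent DataFrame so the
# column assignments below do not trigger SettingWithCopyWarning
df = df[~duplicates].copy()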

# 4. Validate data types
print("\nData types after cleaning:")
print(df.dtypes)

# 5. Standardize categorical values
df['Type 1'] = df['Type 1'].str.strip().str.capitalize()
df['Type 2'] = df['Type 2'].str.strip().str.capitalize()
df['Legendary'] = df['Legendary'].astype(bool)

# 6. Summary after cleaning
print("\nMissing values after cleaning:")
print(df.isnull().sum())
print(f"Number of duplicate rows after cleaning: {df.duplicated().sum()}")

Missing values per column:
#               0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
Number of duplicate rows: 2

Data types after cleaning:
#              int64
Type 1        object
Type 2        object
Total          int64
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
dtype: object

Missing values after cleaning:
#             0
Type 1        0
Type 2        0
Total         0
HP            0
Attack        0
Defense       0
Sp. Atk       0
Sp. Def       0
Speed         0
Generation    0
Legendary     0
dtype: int64
Number of duplicate rows after cleaning: 0
