# Exercise 3

<img src="https://img.itch.zone/aW1nLzM0MzUxOTUuanBn/original/FuMxog.jpg" />

In this exercise, you will perform exploratory data analysis (EDA) on the popular Pokémon dataset, which contains detailed information about hundreds of Pokémon, including their stats, types, generation, and whether they are Legendary.

The Pokémon dataset is ideal for practicing EDA because it includes a mix of numerical features (such as Attack, Defense, Speed, HP) and categorical features (such as Type 1, Type 2, and Generation). This combination allows you to apply a wide range of EDA techniques, including summary statistics, data visualizations, grouping and aggregation, correlation analysis, and comparisons across categories.

Throughout the exercise, you will explore questions such as:
- Which Pokémon tend to have the highest or lowest stats?
- How do different Pokémon types compare in terms of strength or defense?
- Do Legendary Pokémon differ significantly from non-Legendary ones?
- How are various stats distributed across the entire dataset?
- Are there relationships or trade-offs between certain attributes (e.g., Attack vs. Defense)?

In [133]:
import kagglehub
import os
import pandas as pd

In [134]:
# Download latest version
path = kagglehub.dataset_download("abcsds/pokemon")
print("Path to dataset files:", path)

Using Colab cache for faster access to the 'pokemon' dataset.
Path to dataset files: /kaggle/input/pokemon


In [135]:
# Confirm that the download path is a directory
if os.path.isdir(path):
    print(True)

# Build the full path to the first file in the dataset folder
contents = os.listdir(path)
mydataset = os.path.join(path, contents[0])

df = pd.read_csv(mydataset)

True
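
The cell above reads whichever file `os.listdir` happens to return first. A minimal sketch of a more explicit lookup, assuming the dataset folder holds at least one `.csv` file. The helper `find_csv_files` and the file names in the demo are hypothetical and not part of the notebook:

```python
import glob
import os
import tempfile


def find_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside directory."""
    return sorted(glob.glob(os.path.join(directory, "*.csv")))


# Demonstration in a throwaway directory with one CSV and one other file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("notes.txt", "Pokemon.csv"):  # hypothetical file names
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_csv_files(tmp)]

print(found)
```

With the real download, `pd.read_csv(find_csv_files(path)[0])` would load the first CSV in name order instead of relying on directory listing order.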



## 1. Data Understanding (4 pts)

1. Display the first 10 rows.

In [136]:
df.head(10)

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


2. Show dataset shape.

In [137]:
df.shape

(800, 13)

3. Show all columns and their data types.

In [138]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


4. Identify which columns contain missing values.

In [139]:
df.isnull().sum()

Column,Missing values
#,0
Name,0
Type 1,0
Type 2,386
Total,0
HP,0
Attack,0
Defense,0
Sp. Atk,0
Sp. Def,0
Speed,0
Generation,0
Legendary,0
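
To answer the question directly, the counts can be filtered down to the columns that have at least one missing value. A small sketch, using a hand-built `sample` frame copied from five of the rows shown by `df.head(10)` as a stand-in for `df`:

```python
import pandas as pd

# Stand-in for df: five rows copied from the df.head(10) output
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard",
             "CharizardMega Charizard X", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", None, "Flying", "Dragon", None],
    "Attack": [49, 52, 84, 130, 48],
})

# Count missing values per column, then keep only columns with any
missing = sample.isnull().sum()
cols_with_missing = missing[missing > 0]
print(cols_with_missing)
```

On the full dataset, `df.info()` reports 414 non-null values out of 800 for `Type 2`, so it is the only column with gaps (386 missing).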


## 2. Summary Statistics (4 pts)

1. Generate `df.describe()`.

In [140]:
df.describe()

,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


2. Get mean, median, and mode of Attack.

In [141]:
print("Mean:", df['Attack'].mean())
print("Median:", df['Attack'].median())
print("Mode:", df['Attack'].mode())


Mean: 79.00125
Median: 75.0
Mode: 0    100
Name: Attack, dtype: int64
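
The mode prints as a Series rather than a single number because `Series.mode()` returns every value tied for most frequent. A short sketch of pulling out a scalar, using hypothetical values chosen so that two values tie:

```python
import pandas as pd

# Hypothetical values: 100 and 75 both appear twice
attack = pd.Series([100, 100, 75, 75, 50], name="Attack")

modes = attack.mode()        # all tied modes, sorted ascending
first_mode = modes.iloc[0]   # the smallest mode as a scalar
print(modes.tolist(), first_mode)
```

For the real `Attack` column there is a single mode (100), so `df['Attack'].mode().iloc[0]` would give it as a plain value.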


3. Compute 25th and 75th percentiles for HP.

In [142]:
df['HP'].quantile([0.25, 0.75])

,HP
0.25,50.0
0.75,80.0
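
The two percentiles are often combined into the interquartile range (IQR), the width of the middle half of the data. A sketch using the ten HP values from the `df.head(10)` output as a stand-in for the full column:

```python
import pandas as pd

# HP values of the first ten rows shown by df.head(10)
hp = pd.Series([45, 60, 80, 80, 39, 58, 78, 78, 78, 44], name="HP")

q25, q75 = hp.quantile([0.25, 0.75])
iqr = q75 - q25   # spread of the middle 50% of values
print(q25, q75, iqr)
```

For the full dataset, the output above gives an IQR of 80 - 50 = 30.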


4. Compute standard deviation and variance of Speed.

In [143]:

pd.DataFrame([df['Speed'].std(), df['Speed'].var()], index=['std', 'var'], columns=['Speed'])

,Speed
std,29.060474
var,844.511133
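
The variance is the square of the standard deviation (29.060474² ≈ 844.51), and pandas computes both as sample statistics (`ddof=1`) by default. A sketch of the relationship, using the ten Speed values from the `df.head(10)` output as a stand-in:

```python
import pandas as pd

# Speed values of the first ten rows shown by df.head(10)
speed = pd.Series([45, 60, 80, 80, 65, 80, 100, 100, 100, 43], name="Speed")

sample_std = speed.std()             # divides by n - 1 (pandas default)
population_std = speed.std(ddof=0)   # divides by n
sample_var = speed.var()             # also uses ddof=1 by default

print(sample_std, population_std, sample_var)
```

The population version is always slightly smaller because it divides by `n` rather than `n - 1`.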


## 3. Filtering & Selection `(7 pts)`

Select all Pokémon with Attack > 100.

In [144]:
df[['Name', 'Attack']][df['Attack'] > 100]

,Name,Attack
7,CharizardMega Charizard X,130
8,CharizardMega Charizard Y,104
12,BlastoiseMega Blastoise,103
19,BeedrillMega Beedrill,150
39,Nidoking,102
...,...,...
793,Yveltal,131
796,DiancieMega Diancie,160
797,HoopaHoopa Confined,110
798,HoopaHoopa Unbound,160
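
The filters in this section select columns first and then apply a Boolean mask. An equivalent pattern, sketched here on a stand-in frame built from rows of the `df.head(10)` output, is a single `.loc` call that takes the row mask and the column list together:

```python
import pandas as pd

# Stand-in for df: six rows copied from the df.head(10) output
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard",
             "CharizardMega Charizard X", "CharizardMega Charizard Y",
             "Squirtle"],
    "Type 1": ["Grass", "Fire", "Fire", "Fire", "Fire", "Water"],
    "Attack": [49, 52, 84, 130, 104, 48],
})

# Single condition: rows where Attack > 100, showing two columns
strong = sample.loc[sample["Attack"] > 100, ["Name", "Attack"]]

# Combined condition: each comparison needs its own parentheses
mask = (sample["Type 1"] == "Fire") & (sample["Attack"] > 120)
fire_strong = sample.loc[mask, ["Name", "Type 1", "Attack"]]

print(strong)
print(fire_strong)
```

Both forms return the same rows. The `.loc` version does it in one indexing step.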


Select all Pokémon whose primary type (Type 1) is "Fire".

In [145]:
df['Type 1'][df['Type 1'] == 'Fire'].count()

np.int64(52)

In [146]:
df[['Name', 'Type 1']][df['Type 1'] == 'Fire']

,Name,Type 1
4,Charmander,Fire
5,Charmeleon,Fire
6,Charizard,Fire
7,CharizardMega Charizard X,Fire
8,CharizardMega Charizard Y,Fire
42,Vulpix,Fire
43,Ninetales,Fire
63,Growlithe,Fire
64,Arcanine,Fire
83,Ponyta,Fire


Select all Pokémon that are Legendary.

In [147]:
df['Legendary'][df['Legendary']].count()


np.int64(65)

In [148]:
df[['Name', 'Legendary']][df['Legendary']]

,Name,Legendary
156,Articuno,True
157,Zapdos,True
158,Moltres,True
162,Mewtwo,True
163,MewtwoMega Mewtwo X,True
...,...,...
795,Diancie,True
796,DiancieMega Diancie,True
797,HoopaHoopa Confined,True
798,HoopaHoopa Unbound,True


Select all Pokémon that are Generation 1 AND Legendary.

In [149]:
df[['Name', 'Generation', 'Legendary']][df['Legendary'] & (df['Generation'] == 1)]

,Name,Generation,Legendary
156,Articuno,1,True
157,Zapdos,1,True
158,Moltres,1,True
162,Mewtwo,1,True
163,MewtwoMega Mewtwo X,1,True
164,MewtwoMega Mewtwo Y,1,True


Select all Pokémon that are Water type OR Grass type.

In [150]:
df[['Name', 'Type 1']][(df['Type 1'] == 'Water') | (df['Type 1'] == 'Grass')]

,Name,Type 1
0,Bulbasaur,Grass
1,Ivysaur,Grass
2,Venusaur,Grass
3,VenusaurMega Venusaur,Grass
9,Squirtle,Water
...,...,...
726,Greninja,Water
740,Skiddo,Grass
741,Gogoat,Grass
762,Clauncher,Water


Select all Pokémon that are Fire type AND Attack > 120.

In [151]:
df[['Name', 'Type 1', 'Attack']][(df['Type 1'] == 'Fire') & (df['Attack'] > 120)]

,Name,Type 1,Attack
7,CharizardMega Charizard X,Fire,130
147,Flareon,Fire,130
270,Ho-oh,Fire,130
279,BlazikenMega Blaziken,Fire,160
559,Emboar,Fire,123
615,DarmanitanStandard Mode,Fire,140


Select all Pokémon whose type is in this list:
`["Dragon", "Ghost", "Dark"]`.

In [152]:
df[['Name', 'Type 2']][df['Type 2'].isin(["Dragon", "Ghost", "Dark"])]

,Name,Type 2
7,CharizardMega Charizard X,Dragon
141,GyaradosMega Gyarados,Dark
196,AmpharosMega Ampharos,Dragon
249,Kingdra,Dragon
267,Tyranitar,Dark
268,TyranitarMega Tyranitar,Dark
275,SceptileMega Sceptile,Dragon
297,Nuzleaf,Dark
298,Shiftry,Dark
316,Shedinja,Ghost
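
The cell above checks only `Type 2`, so a Pokémon whose primary type is Dragon, Ghost, or Dark is not selected. If "type" is read as either type column, one possible sketch combines two `isin` masks. The first two rows of the stand-in frame come from the outputs above. Dratini and Gastly are added purely for illustration:

```python
import pandas as pd

# Stand-in for df; the last two rows are illustrative additions
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "CharizardMega Charizard X", "Dratini", "Gastly"],
    "Type 1": ["Grass", "Fire", "Dragon", "Ghost"],
    "Type 2": ["Poison", "Dragon", None, "Poison"],
})

wanted = ["Dragon", "Ghost", "Dark"]

# True when either the primary or the secondary type is in the list
mask = sample["Type 1"].isin(wanted) | sample["Type 2"].isin(wanted)
selected = sample.loc[mask, ["Name", "Type 1", "Type 2"]]
print(selected)
```

Missing `Type 2` values are treated as "not in the list" by `isin`, so single-type Pokémon are matched on `Type 1` alone.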


## 4. Categorical Exploration `(9 pts)`

Find the number of Pokémon per primary type.

In [153]:
df['Type 1'].groupby(df['Type 1']).count()

Type 1,Count
Bug,69
Dark,31
Dragon,32
Electric,44
Fairy,17
Fighting,27
Fire,52
Flying,4
Ghost,32
Grass,70


Find the number of Pokémon per generation.

In [154]:
df['Generation'].groupby(df['Generation']).count()

Generation,Count
1,166
2,106
3,160
4,121
5,165
6,82


Which type appears the most? Which appears the least?

In [155]:
type_count = df['Type 1'].groupby(df['Type 1']).count()

final = pd.DataFrame({
    'Type': [type_count.idxmax(), type_count.idxmin()],
    'Count': [type_count.max(), type_count.min()]
})

final


,Type,Count
0,Water,112
1,Flying,4


How many unique primary types (Type 1) exist?

In [156]:
df['Type 1'].nunique()

18

How many unique secondary types (Type 2) exist?

In [157]:
df['Type 2'].nunique()

18


Which primary types have the most dual-type combinations?

In [176]:
df[['Type 1', 'Type 2']].groupby(['Type 1', 'Type 2']).size().sort_values(ascending=False)

Type 1,Type 2,Count
Normal,Flying,24
Grass,Poison,15
Bug,Flying,14
Bug,Poison,12
Ghost,Grass,10
...,...,...
Rock,Fighting,1
Steel,Fighting,1
Steel,Flying,1
Steel,Dragon,1


In [177]:
df.groupby(['Type 1', 'Type 2']).size().idxmax()

('Normal', 'Flying')
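
The result above is the single most common (Type 1, Type 2) pair. Another reading of the question is how many distinct secondary types each primary type is paired with. A sketch of that count, on a stand-in frame built from rows that appear in the outputs of this notebook:

```python
import pandas as pd

# Stand-in for df: type pairs copied from rows shown in this notebook
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "SceptileMega Sceptile", "Charizard",
             "CharizardMega Charizard X", "CharizardMega Charizard Y",
             "Charmander", "Squirtle", "Ninjask"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Fire",
               "Fire", "Water", "Bug"],
    "Type 2": ["Poison", "Dragon", "Flying", "Dragon", "Flying",
               None, None, "Flying"],
})

# Keep dual-type rows only, then count distinct Type 2 values per Type 1
dual = sample.dropna(subset=["Type 2"])
combos_per_type = (
    dual.groupby("Type 1")["Type 2"]
    .nunique()
    .sort_values(ascending=False)
)
print(combos_per_type)
```

Run against `df`, the same expression would rank primary types by how many different secondary types they combine with.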

Which type has the highest mean Attack?

In [190]:
df.groupby('Type 1')['Attack'].mean().idxmax()

'Dragon'

Which type has the lowest mean Defense?

In [191]:
df.groupby('Type 1')['Defense'].mean().idxmin()

'Normal'


Which generation has the highest average Speed?

In [192]:
df.groupby('Generation')['Speed'].mean().idxmax()

np.int64(1)

## 5. Groupby & Aggregation `(13 pts)`

Compute the average Attack per primary type.

In [193]:
df.groupby('Type 1')['Attack'].mean()

Type 1,Attack
Bug,70.971014
Dark,88.387097
Dragon,112.125
Electric,69.090909
Fairy,61.529412
Fighting,96.777778
Fire,84.769231
Flying,78.75
Ghost,73.78125
Grass,73.214286


Compute the maximum HP per generation.


In [194]:
df.groupby('Generation')['HP'].max()

Generation,HP
1,250
2,255
3,170
4,150
5,165
6,126


Compute the total number of Pokémon per primary type.

In [197]:
df.groupby('Type 1').size()

Type 1,Count
Bug,69
Dark,31
Dragon,32
Electric,44
Fairy,17
Fighting,27
Fire,52
Flying,4
Ghost,32
Grass,70


For each type, compute:

- mean Attack

- mean Defense

- mean Speed

In [198]:
df.groupby('Type 1')[['Attack', 'Defense', 'Speed']].mean()

Type 1,Attack,Defense,Speed
Bug,70.971014,70.724638,61.681159
Dark,88.387097,70.225806,76.16129
Dragon,112.125,86.375,83.03125
Electric,69.090909,66.295455,84.5
Fairy,61.529412,65.705882,48.588235
Fighting,96.777778,65.925926,66.074074
Fire,84.769231,67.769231,74.442308
Flying,78.75,66.25,102.5
Ghost,73.78125,81.1875,64.34375
Grass,73.214286,70.8,61.928571


For each generation, compute:

- count

- mean Total

- number of Legendary Pokémon (hint: use sum on Boolean)

In [216]:
count = df['Generation'].groupby(df['Generation']).count()

total = df.groupby('Generation')['Total'].mean()

legendary = df.groupby('Generation')['Legendary'].sum()

pd.DataFrame({
    'count': count,
    'mean Total': total,
    'Legendary': legendary
})


Generation,count,mean Total,Legendary
1,166,426.813253,6
2,106,418.283019,5
3,160,436.225,18
4,121,459.016529,13
5,165,434.987879,15
6,82,436.378049,8
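
The three separate group-bys can also be written as one call with named aggregation. A sketch on a stand-in frame built from rows shown in this notebook. The output column names `count`, `mean_total`, and `legendary` are choices made for this example:

```python
import pandas as pd

# Stand-in for df: five rows copied from outputs in this notebook
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "DeoxysSpeed Forme",
             "Ninjask", "Accelgor"],
    "Total": [318, 309, 600, 456, 495],
    "Generation": [1, 1, 3, 3, 5],
    "Legendary": [False, False, True, False, False],
})

# One pass over the groups; each keyword is new_column=(source, function)
summary = sample.groupby("Generation").agg(
    count=("Name", "count"),
    mean_total=("Total", "mean"),
    legendary=("Legendary", "sum"),   # True counts as 1
)
print(summary)
```

Summing a Boolean column counts its `True` values, which is why `sum` gives the number of Legendary Pokémon per generation.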


Which type combination (Type 1 + Type 2) has the highest average Attack?

In [218]:
df.groupby(['Type 1', 'Type 2'])['Attack'].mean().idxmax()

('Ground', 'Fire')

Which generation has the highest proportion of Legendary Pokémon?

In [228]:
df.groupby(['Generation'])['Legendary'].mean().sort_values(ascending=False).head(1)

Generation,Legendary
3,0.1125
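
Taking the mean of a Boolean column gives the share of `True` values, so the result can be cross-checked against the per-generation counts reported earlier. A plain-Python sketch of that check, using the counts and Legendary totals from the table in the previous exercise:

```python
# Per-generation totals and Legendary counts, copied from the summary above
totals = {1: 166, 2: 106, 3: 160, 4: 121, 5: 165, 6: 82}
legendary = {1: 6, 2: 5, 3: 18, 4: 13, 5: 15, 6: 8}

# Proportion of Legendary Pokémon in each generation
proportions = {gen: legendary[gen] / totals[gen] for gen in totals}

# Generation with the largest proportion
best = max(proportions, key=proportions.get)
print(best, proportions[best])
```

Generation 3 comes out on top with 18 / 160 = 0.1125, matching the `groupby` result.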



Which primary type has the largest variance in HP?

In [230]:
df.groupby('Type 1')['HP'].var().sort_values(ascending=False).head(1)

Type 1,HP
Normal,1312.861456


Which primary type has the highest median Speed?

In [235]:
df.groupby('Type 1')['Speed'].median().sort_values(ascending=False).head(1)

Type 1,Speed
Flying,116.0


Group Pokémon by whether they are Legendary or not. Compare:

- mean Total

- mean Attack

- mean Defense

- mean Speed

In [232]:
df.groupby('Legendary')[['Total', 'Attack', 'Defense', 'Speed']].mean()

Legendary,Total,Attack,Defense,Speed
False,417.213605,75.669388,71.559184,65.455782
True,637.384615,116.676923,99.661538,100.184615


Show the top 5 strongest types by mean Total.

In [236]:
df.groupby('Type 1')['Total'].mean().sort_values(ascending=False).head(5)

Type 1,Total
Dragon,550.53125
Steel,487.703704
Flying,485.0
Psychic,475.947368
Fire,458.076923


Rank generations by their average Attack.

In [237]:
df.groupby('Generation')['Attack'].mean().sort_values(ascending=False)

Generation,Attack
4,82.867769
5,82.066667
3,81.625
1,76.638554
6,75.804878
2,72.028302


Show the top 10 fastest Pokémon using `nlargest(10, "Speed")`.

In [238]:
df.nlargest(10, "Speed")

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
431,386,DeoxysSpeed Forme,Psychic,,600,50,95,90,95,90,180,3,True
315,291,Ninjask,Bug,Flying,456,61,90,45,50,50,160,3,False
71,65,AlakazamMega Alakazam,Psychic,,590,55,50,65,175,95,150,1,False
154,142,AerodactylMega Aerodactyl,Rock,Flying,615,80,135,85,70,95,150,1,False
428,386,DeoxysNormal Forme,Psychic,,600,50,150,50,150,50,150,3,True
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
275,254,SceptileMega Sceptile,Grass,Dragon,630,70,110,75,145,85,145,3,False
678,617,Accelgor,Bug,,495,80,70,40,100,60,145,5,False
109,101,Electrode,Electric,,480,60,50,70,80,80,140,1,False
