# 05 - Data Cleaning

## Import Package

In [1]:
import pandas as pd
import numpy as np

## Practice Section

Perform the following steps to inspect and impute missing values:
- Import the Pokemon dataset and bind it to a variable.
- Inject missing values into the 'Attack' column wherever the 'Attack' value is greater than 50.
- Inspect the missing values of the dataset.
- Perform missing value imputation using a constant.
- Perform missing value imputation using the mode of the column(s).
- Perform missing value imputation using the forward fill method.
- Drop all the rows with missing values.

In [2]:
# - Import the Pokemon dataset and bind it to another variable.
dataframe = pd.read_csv("https://raw.githubusercontent.com/KianYang-Lee/pandas-tutorial/main/datasets/Pokemon.csv")

In [3]:
# - Inject missing values into 'Attack' column for the dataframe if 'Attack' 
#   value is more than 50
dataframe.Attack = dataframe.Attack.map(lambda x: np.nan if x > 50 else x)
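
The same injection can also be written without a lambda. The following is a minimal sketch that uses `Series.mask` on a small made-up frame (`toy` and its values are hypothetical stand-ins, not rows from the Pokemon dataset), so it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Pokemon 'Attack' column.
toy = pd.DataFrame({"Attack": [49, 62, 82, 50, 45]})

# mask() replaces every value where the condition is True with NaN.
# The column becomes float64 because NaN is a floating-point value.
toy["Attack"] = toy["Attack"].mask(toy["Attack"] > 50)

n_missing = int(toy["Attack"].isna().sum())
print(toy)
print("missing:", n_missing)
```

A value of exactly 50 is kept, because the condition is strictly greater than 50.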

In [4]:
# - Inspect the missing values of the dataset.
dataframe.isnull().sum()

#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack        630
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
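
Raw counts are easier to judge next to the share of missing values per column. The sketch below shows one possible way to get both, again on a hypothetical toy frame rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in two columns.
toy = pd.DataFrame({
    "Type 2": ["Poison", None, "Flying", None],
    "Attack": [49.0, np.nan, np.nan, np.nan],
    "HP": [45, 60, 80, 39],
})

# Number of missing values per column.
missing_count = toy.isna().sum()

# Fraction of missing values per column (mean of a boolean mask).
missing_share = toy.isna().mean()

# Rows that contain at least one missing value.
incomplete_rows = toy[toy.isna().any(axis=1)]

print(missing_count)
print(missing_share)
print(incomplete_rows)
```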

In [5]:
# - Perform missing value imputation using a constant.
dataframe.Attack.fillna(999)

0       49.0
1      999.0
2      999.0
3      999.0
4      999.0
       ...  
795    999.0
796    999.0
797    999.0
798    999.0
799    999.0
Name: Attack, Length: 800, dtype: float64
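
Note that `fillna` returns a new object and leaves the original untouched, which is why the later cells still see the missing values. If the imputed values should persist, one option is to assign the result back, as in this minimal sketch on hypothetical toy data (the sentinel 999 is taken from the cell above).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two gaps.
toy = pd.DataFrame({"Attack": [49.0, np.nan, 45.0, np.nan]})

# fillna() returns a new Series; 'toy' itself is not modified here.
filled = toy["Attack"].fillna(999)
still_missing = int(toy["Attack"].isna().sum())

# Assign the result back to make the imputation permanent.
toy["Attack"] = toy["Attack"].fillna(999)
missing_after = int(toy["Attack"].isna().sum())

print(still_missing, missing_after)
```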

In [6]:
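# - Perform missing value imputation using mode of the column(s).
# Only the last expression in a cell is displayed, so the output below is the
# 'Type 2' result; the 'Attack' result is computed but not shown.
dataframe.Attack.fillna(value=dataframe.Attack.mode()[0])
dataframe["Type 2"].fillna(dataframe["Type 2"].mode()[0])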

0      Poison
1      Poison
2      Poison
3      Poison
4      Flying
        ...  
795     Fairy
796     Fairy
797     Ghost
798      Dark
799     Water
Name: Type 2, Length: 800, dtype: object
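
When several columns need mode imputation, passing a dictionary to `DataFrame.fillna` handles them in one call. This is a sketch under the assumption that taking the first mode of each column is acceptable; the toy frame and the `modes` name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in a text and a numeric column.
toy = pd.DataFrame({
    "Type 2": ["Poison", None, "Flying", "Poison", None],
    "Attack": [49.0, 45.0, np.nan, 45.0, np.nan],
})

# mode() ignores NaN by default and may return several values on a tie;
# [0] picks the first one, as in the cell above.
modes = {col: toy[col].mode()[0] for col in ["Type 2", "Attack"]}

# A dict maps each column name to its own fill value.
imputed = toy.fillna(modes)

print(modes)
print(imputed)
```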

In [7]:
# - Perform missing value imputation using forward fill method.
# fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the equivalent call.
dataframe.ffill()

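|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49.0 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 49.0 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 49.0 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 49.0 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | Poison | 309 | 39 | 49.0 | 43 | 60 | 50 | 65 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 30.0 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 30.0 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 30.0 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 30.0 | 60 | 170 | 130 | 80 | 6 | True |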
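
Forward fill copies the last observed value downward, so the result depends on row order, and any gap at the very top stays missing because there is nothing before it to copy. The sketch below illustrates this on a hypothetical Series, together with the `limit` parameter and a backward fill as one possible follow-up.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a leading gap and a two-value gap in the middle.
s = pd.Series([np.nan, 49.0, np.nan, np.nan, 30.0])

# Plain forward fill: the leading NaN has no earlier value and stays NaN.
plain = s.ffill()

# limit=1 fills at most one consecutive NaN after each observed value.
capped = s.ffill(limit=1)

# Chaining bfill() afterwards fills the remaining leading gap from below.
both = s.ffill().bfill()

print(plain)
print(capped)
print(both)
```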


In [8]:
# - Drop all the rows with missing value.
dataframe.dropna()

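|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49.0 | 49 | 65 | 65 | 45 | 1 | False |
| 15 | 12 | Butterfree | Bug | Flying | 395 | 60 | 45.0 | 50 | 90 | 80 | 70 | 1 | False |
| 16 | 13 | Weedle | Bug | Poison | 195 | 40 | 35.0 | 30 | 20 | 20 | 50 | 1 | False |
| 17 | 14 | Kakuna | Bug | Poison | 205 | 45 | 25.0 | 50 | 25 | 25 | 35 | 1 | False |
| 20 | 16 | Pidgey | Normal | Flying | 251 | 40 | 45.0 | 40 | 35 | 35 | 56 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 735 | 667 | Litleo | Fire | Normal | 369 | 62 | 50.0 | 58 | 73 | 54 | 72 | 6 | False |
| 751 | 681 | AegislashShield Forme | Steel | Ghost | 520 | 60 | 50.0 | 150 | 50 | 150 | 60 | 6 | False |
| 764 | 694 | Helioptile | Electric | Normal | 289 | 44 | 38.0 | 33 | 61 | 43 | 70 | 6 | False |
| 773 | 703 | Carbink | Rock | Fairy | 500 | 50 | 50.0 | 150 | 50 | 150 | 50 | 6 | False |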
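
By default `dropna` removes a row if any column is missing, which here also discards rows whose only gap is 'Type 2'. As a hedged sketch on hypothetical toy data, the `subset` and `how` parameters can narrow that behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the names are placeholders, not dataset rows.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 2": ["Poison", None, "Flying", None],
    "Attack": [49.0, np.nan, 45.0, 50.0],
})

# Default: drop a row if ANY column is missing.
any_missing = toy.dropna()

# Only consider the 'Attack' column when deciding which rows to drop.
attack_only = toy.dropna(subset=["Attack"])

# Drop a row only if ALL of the listed columns are missing.
all_missing = toy.dropna(how="all", subset=["Type 2", "Attack"])

print(len(any_missing), len(attack_only), len(all_missing))
```

As with `fillna`, each call returns a new frame and leaves `toy` unchanged.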


**Copyright (C) 2021  Lee Kian Yang**

This program is licensed under the MIT license.