# **Pandas: Cleaning Empty Cells**

## Empty Cells
Empty cells can potentially give you a wrong result during data analysis.

## Remove Rows
One simple way to deal with empty cells is to remove the rows that contain them.  

This is usually OK, since data sets can be very big, and removing a few rows will not have a big impact on the result.  

To remove the rows with empty cells, we use the `dropna()` method on the DataFrame.  
It returns a new DataFrame with no empty cells and leaves the original unchanged.

#### Example: Return a new DataFrame with no empty cells

In [14]:
import pandas as pd

#loading dataset
df = pd.read_csv('CSVs/workout_sessions.csv')

#cleaning empty cells
new_df = df.dropna()

new_df

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0
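
The rows shown above contain no empty cells, so the effect of `dropna()` is hard to see. As a minimal sketch, the snippet below uses a small hypothetical DataFrame (the values are illustrative and not taken from `workout_sessions.csv`) to show which rows are removed and that the original DataFrame is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the workout CSV (illustrative values only).
sample = pd.DataFrame({
    "Duration": [60, 45, 30, 60],
    "Pulse": [110, 109, np.nan, 98],
    "Calories": [409.1, np.nan, 195.1, 269.0],
})

# Count the empty cells in each column before cleaning.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# dropna() returns a new DataFrame without the rows that contain empty cells.
cleaned = sample.dropna()
print(cleaned)

# The original DataFrame still has all of its rows.
print(len(sample), len(cleaned))
```

Counting empty cells with `isna().sum()` first is a quick way to judge how many rows `dropna()` is going to discard.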


**Note:** To change the original DataFrame instead of creating a new one, pass the argument `inplace=True`.

#### Example: Modifying the original DataFrame

In [5]:
#changing original
df.dropna(inplace = True)

df

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0
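
Using `inplace=True` and reassigning the result of `dropna()` should leave you with the same data. The sketch below compares the two on a hypothetical two-column DataFrame; the names `with_inplace` and `reassigned` are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one empty cell (illustrative values only).
data = {
    "Duration": [60, 45, 30],
    "Calories": [409.1, np.nan, 195.1],
}

# Option 1: modify the DataFrame itself.
with_inplace = pd.DataFrame(data)
with_inplace.dropna(inplace=True)

# Option 2: keep the default behaviour and reassign the returned DataFrame.
reassigned = pd.DataFrame(data)
reassigned = reassigned.dropna()

# Both approaches end up with the same two rows.
print(with_inplace.equals(reassigned))
print(with_inplace)
```

Reassignment makes it explicit that a new object is produced, which some readers find easier to follow in longer cleaning pipelines.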


## Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.  
This way we can avoid having to delete entire rows over a few empty cells. 

For this we use the **fillna()** method, which fills empty cells with a value we specify.  

#### Example: Replacing all NULL values with 130

In [6]:
#new dataframe
df2 = pd.read_csv('CSVs/workout_sessions.csv')

#replacing
df2.fillna(130, inplace = True)
df2

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0
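
To make the effect visible, here is a small sketch on hypothetical data (illustrative values, not the workout CSV). It shows that passing a single value to `fillna()` fills empty cells in every column, not just one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with empty cells in two different columns.
sample = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Pulse": [110, np.nan, 109],
    "Calories": [409.1, 282.4, np.nan],
})

# A single value is used for every empty cell, regardless of column.
filled = sample.fillna(130)
print(filled)

# Without inplace=True, the original DataFrame keeps its empty cells.
print(sample.isna().sum().sum())
```

Here 130 ends up in both `Pulse` and `Calories`, which is rarely what you want when the columns measure different things.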


## Replacing Only for Specific Columns
The example above replaces all empty cells in the whole DataFrame.

To replace empty values in only one column, pass `fillna()` a dictionary that maps the column name to the replacement value.

#### Example: Replacing empty cells under 'Calories' with 130

In [11]:
#new dataframe
df3 = pd.read_csv('CSVs/workout_sessions.csv')

df3.fillna({"Calories": 130}, inplace = True)

df3

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0
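
As a sketch of the difference from the previous section, the snippet below fills only `Calories` on a hypothetical DataFrame and leaves the empty cell in `Pulse` alone. It also shows an alternative spelling that assigns the filled column back; the names `filled` and `alt` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with empty cells in two different columns.
sample = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Pulse": [110, np.nan, 109],
    "Calories": [409.1, 282.4, np.nan],
})

# Only the columns named in the dictionary are filled.
filled = sample.fillna({"Calories": 130})
print(filled)

# Alternative: fill the single column and assign it back.
alt = sample.copy()
alt["Calories"] = alt["Calories"].fillna(130)

# Both approaches give the same result; Pulse still has its empty cell.
print(filled.equals(alt))
```

The dictionary form scales naturally to several columns, for example `{"Calories": 130, "Pulse": 100}`, with a different replacement for each.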


## Replacing using Mean, Median, or Mode

A common way to replace empty cells is to fill them with the mean, median, or mode of the column.

Pandas provides the **mean()**, **median()**, and **mode()** methods to calculate these values for a specified column:  

#### Example: Calculating the mean and replacing any missing values with it

In [17]:
#new dataframe
df4 = pd.read_csv('CSVs/workout_sessions.csv')

#calculating mean of "calories" column 
mean = df4["Calories"].mean()

#replacing empty cells with mean
df4.fillna({"Calories":mean}, inplace = True)

df4

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0


#### Example: Calculating and replacing empty cells with median

In [19]:
#new dataframe
df5 = pd.read_csv('CSVs/workout_sessions.csv') 

#calculating median
median = df5["Calories"].median()

#replacing empty cells
df5.fillna({"Calories":median} , inplace = True)

df5

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0


#### Example: Calculating and replacing empty cells with mode

In [26]:
#new dataframe
df6 = pd.read_csv('CSVs/workout_sessions.csv')

#calculating mode
mode = df6["Calories"].mode()[0]

#replacing the empty cells
df6.fillna({"Calories":mode} , inplace = True)

df6

Unnamed: 0,Duration,Date,Pulse,Maxpulse,Calories
0,60,2020/12/01,110,130,409.1
1,60,2020/12/02,117,145,479.0
2,60,2020/12/03,103,135,340.0
3,45,2020/12/04,109,175,282.4
4,45,2020/12/05,117,148,406.0
5,60,2020/12/06,102,127,300.0
6,60,2020/12/07,110,136,374.0
7,450,2020/12/08,104,134,253.3
8,30,2020/12/09,109,133,195.1
9,60,2020/12/10,98,124,269.0


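**Note:** The mode is the value that appears most frequently. `mode()` returns a Series, because several values can tie for most frequent, so `[0]` selects the first one.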
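
To compare the three statistics side by side, here is a minimal sketch on a hypothetical `Calories` column with one empty cell (the numbers are illustrative). All three methods skip empty cells by default when computing their result.

```python
import numpy as np
import pandas as pd

# Hypothetical column: four known values and one empty cell.
sample = pd.DataFrame({"Calories": [300.0, 300.0, 340.0, 400.0, np.nan]})

mean_value = sample["Calories"].mean()      # (300 + 300 + 340 + 400) / 4 = 335.0
median_value = sample["Calories"].median()  # midpoint of 300 and 340 = 320.0
mode_value = sample["Calories"].mode()[0]   # 300.0 appears twice

print(mean_value, median_value, mode_value)

# Fill the empty cell with whichever statistic suits the data, here the mean.
filled = sample.fillna({"Calories": mean_value})
print(filled)
```

The median is often the safer choice when a column contains extreme values, since a single outlier can pull the mean noticeably.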