![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

#### **HANDLING NULL VALUES**

In [4]:
import pandas as pd 
import numpy as np

In [49]:
coffee = pd.read_csv('/Users/abdoulayebocoum/Desktop/Projects/pandas-tutorial/warmup-data/coffee.csv')

In [50]:
coffee.head()

Unnamed: 0,Day,Coffee Type,Units Sold,Price,Revenue
0,Monday,Espresso,25,3.0,75.0
1,Monday,Latte,15,4.5,67.5
2,Tuesday,Espresso,30,3.0,90.0
3,Tuesday,Latte,20,4.5,90.0
4,Wednesday,Espresso,35,3.0,105.0


Let's imagine we didn't have all of this information. Say some values are missing, so we set the first two 'Units Sold' entries to 'NaN'.

In [38]:
coffee.loc[[0,1], 'Units Sold'] = np.nan

In [22]:
coffee

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,
1,Monday,Latte,
2,Tuesday,Espresso,30.0
3,Tuesday,Latte,20.0
4,Wednesday,Espresso,35.0
5,Wednesday,Latte,25.0
6,Thursday,Espresso,40.0
7,Thursday,Latte,30.0
8,Friday,Espresso,45.0
9,Friday,Latte,35.0


In [23]:
coffee.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Day          14 non-null     object 
 1   Coffee Type  14 non-null     object 
 2   Units Sold   12 non-null     float64
dtypes: float64(1), object(2)
memory usage: 468.0+ bytes


In [39]:
coffee.isna().sum()

Day            0
Coffee Type    0
Units Sold     2
dtype: int64

> There are a few things we can do here. One of them is **'fillna'**, used below to replace the missing values with the mean of 'Units Sold'.

In [40]:
coffee.fillna(coffee['Units Sold'].mean())

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,35.0
1,Monday,Latte,35.0
2,Tuesday,Espresso,30.0
3,Tuesday,Latte,20.0
4,Wednesday,Espresso,35.0
5,Wednesday,Latte,25.0
6,Thursday,Espresso,40.0
7,Thursday,Latte,30.0
8,Friday,Espresso,45.0
9,Friday,Latte,35.0
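
Note that calling `fillna` on the whole DataFrame with a single number applies that number to every column that has a gap, not just 'Units Sold'. A minimal sketch of limiting the fill to one column by passing a dict; the `demo` frame and its values are made up for illustration and are not the tutorial's data:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the coffee data, with gaps in two columns.
demo = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Units Sold": [np.nan, np.nan, 30.0, 20.0],
    "Price": [3.0, np.nan, 3.0, 4.5],
})

# mean() skips NaN by default, so this is the mean of 30.0 and 20.0.
units_mean = demo["Units Sold"].mean()

# A dict maps column name -> fill value; columns not listed are left alone.
filled = demo.fillna({"Units Sold": units_mean})

print(filled)
```

Here 'Units Sold' is filled while the gap in 'Price' stays, and `demo` itself is unchanged because `fillna` returns a new DataFrame.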


> Another cool thing you can do is use **'interpolate'** instead of the mean, which estimates each missing value from its neighbouring values.


In [41]:
coffee['Units Sold'] = coffee['Units Sold'].interpolate()

In [42]:
coffee

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,
1,Monday,Latte,
2,Tuesday,Espresso,30.0
3,Tuesday,Latte,20.0
4,Wednesday,Espresso,35.0
5,Wednesday,Latte,25.0
6,Thursday,Espresso,40.0
7,Thursday,Latte,30.0
8,Friday,Espresso,45.0
9,Friday,Latte,35.0
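
Rows 0 and 1 are still empty in the output above because, by default, `interpolate` only fills forward from an earlier valid value, and there is nothing before the leading gaps. A small sketch of the difference, assuming a made-up Series rather than the tutorial's column, and using the `limit_direction` parameter to also fill backward:

```python
import numpy as np
import pandas as pd

# Hypothetical series: two leading gaps and one gap in the middle.
units = pd.Series([np.nan, np.nan, 30.0, np.nan, 40.0])

# Default (linear, forward only): the middle gap is filled,
# the leading NaN values are left untouched.
forward_only = units.interpolate()

# limit_direction="both" also fills the leading gaps; with the linear
# method they take the first valid value rather than an extrapolation.
both_ways = units.interpolate(limit_direction="both")

print(forward_only.tolist())
print(both_ways.tolist())
```

Whether copying the first valid value backward is acceptable depends on the data; for sales figures it may be safer to leave those rows empty or drop them.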


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

In [43]:
coffee.isna().sum()

Day            0
Coffee Type    0
Units Sold     2
dtype: int64

> Or maybe you just want to throw out any row that contains 'NaN'. You could do:

In [44]:
coffee.dropna()

Unnamed: 0,Day,Coffee Type,Units Sold
2,Tuesday,Espresso,30.0
3,Tuesday,Latte,20.0
4,Wednesday,Espresso,35.0
5,Wednesday,Latte,25.0
6,Thursday,Espresso,40.0
7,Thursday,Latte,30.0
8,Friday,Espresso,45.0
9,Friday,Latte,35.0
10,Saturday,Espresso,45.0
11,Saturday,Latte,35.0


> Be careful with this, because it drops the entire row whenever any column is 'NaN'. Maybe you only want to drop a row if 'Units Sold' is 'NaN', whereas if the price is 'NaN' you would rather fill it. For that you can use **'subset'**.

In [34]:
# Left commented out so the NaN rows stay available for the cells below.
# coffee.dropna(subset=['Units Sold'], inplace=True)

In [45]:
coffee.isna().sum()

Day            0
Coffee Type    0
Units Sold     2
dtype: int64
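
Since the `dropna(subset=...)` line is commented out, the count above is unchanged. As a rough illustration of what it would do, here is a sketch on a made-up frame (the `demo` data and the 'Price' gap are assumptions for the example) that drops rows only when 'Units Sold' is missing and then fills the remaining 'Price' gap:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one gap in 'Units Sold', one gap in 'Price'.
demo = pd.DataFrame({
    "Coffee Type": ["Espresso", "Latte", "Espresso"],
    "Units Sold": [np.nan, 15.0, 30.0],
    "Price": [3.0, np.nan, 3.0],
})

# Plain dropna() removes every row that has a NaN in any column.
any_nan_dropped = demo.dropna()

# subset= restricts the check to the listed columns only.
units_only = demo.dropna(subset=["Units Sold"])

# The 'Price' gap survives the subset drop, so fill it separately.
cleaned = units_only.fillna({"Price": units_only["Price"].mean()})

print(any_nan_dropped.index.tolist())
print(units_only.index.tolist())
print(cleaned)
```

The plain `dropna()` keeps only one of the three rows, while the subset version keeps two and leaves the price to be filled.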

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

> What if I just want to get the rows with null values, or the rows without them?

In [47]:
coffee[coffee['Units Sold'].isna()]

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,
1,Monday,Latte,


In [48]:
coffee[coffee['Units Sold'].notna()]

Unnamed: 0,Day,Coffee Type,Units Sold
2,Tuesday,Espresso,30.0
3,Tuesday,Latte,20.0
4,Wednesday,Espresso,35.0
5,Wednesday,Latte,25.0
6,Thursday,Espresso,40.0
7,Thursday,Latte,30.0
8,Friday,Espresso,45.0
9,Friday,Latte,35.0
10,Saturday,Espresso,45.0
11,Saturday,Latte,35.0
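
The two filters above use complementary boolean masks. A short sketch, on a made-up frame, of building the mask once and negating it with `~`, which gives the same split as calling `isna()` and `notna()` separately:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing 'Units Sold' values.
demo = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "Units Sold": [np.nan, 15.0, np.nan, 20.0],
})

# True where 'Units Sold' is missing.
missing = demo["Units Sold"].isna()

rows_missing = demo[missing]    # same rows as demo[demo["Units Sold"].isna()]
rows_present = demo[~missing]   # same rows as demo[demo["Units Sold"].notna()]

print(rows_missing.index.tolist())
print(rows_present.index.tolist())
```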


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)