# Treating missing values

In [1]:
import csv
import pandas as pd
import numpy as np

In [3]:

beer = pd.read_csv('beer_na.csv')
beer.head(2)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,names,Id,brewerId,ABV,style,appearance,aroma,palate,taste,overall,time,profile_name,text,time2,day,month,year
0,0,0,Sausa Weizen,47986,10325,5.0,Hefeweizen,2.5,2.0,1.5,1.5,1.5,1234817823,stcules,""" A lot of foam. But a lot.\tIn the smell some...",2009-02-16,16,2,2009
1,1,1,Red Moon,48213,10325,6.2,English Strong Ale,3.0,2.5,3.0,3.0,3.0,1235915097,stcules,""" Dark red color, light beige foam, average.\t...",2009-03-01,1,3,2009


### Are there any columns with missing values?

In [4]:
beer.columns[beer.isnull().any()].tolist() 

['ABV', 'profile_name', 'text']

### How many missing values are in each of those columns?

In [5]:
pd.isnull(beer).sum()

Unnamed: 0          0
Unnamed: 0.1        0
names               0
Id                  0
brewerId            0
ABV             67756
style               0
appearance          0
aroma               0
palate              0
taste               0
overall             0
time                0
profile_name      348
text              353
time2               0
day                 0
month               0
year                0
dtype: int64
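
Most of the counts above are zero, so it can help to keep only the affected columns and to express the counts as a share of all rows. A minimal sketch on an invented toy frame (the name `toy` and its values are illustrative, not taken from `beer_na.csv`):

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; it only mimics the shape of the problem.
toy = pd.DataFrame({
    "names": ["Beer A", "Beer B", "Beer C", "Beer D"],
    "ABV": [5.0, np.nan, 6.2, np.nan],
    "profile_name": ["ann", "bob", None, "dan"],
})

# Count missing values per column, then keep only the columns that have any.
missing_counts = toy.isnull().sum()
missing_only = missing_counts[missing_counts > 0]

# The mean of a boolean mask is the fraction of missing rows per column.
missing_share = toy.isnull().mean()[missing_only.index]

print(missing_only)
print(missing_share)
```

The same two expressions should work unchanged on `beer`.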

### How to proceed with the missing values?
1. We delete the rows with a missing profile name: if we do not know which consumer wrote a review, the rest of that row is of no use to our recommendation system.
2. The missing ABV values are replaced with 99, a value that is easy to identify and will not be mixed up with the real values.
3. Lastly, the missing text values are recoded as "no comments".
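
The three rules can be tried out on a small frame before touching the full dataset. This is only a sketch: `toy` and its values are invented, and `dropna(subset=...)` is used here so that only a missing `profile_name` removes a row, whatever order the steps run in.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the beer reviews; the values are invented for illustration.
toy = pd.DataFrame({
    "names": ["Beer A", "Beer B", "Beer C", "Beer D"],
    "ABV": [5.0, np.nan, 6.2, np.nan],
    "profile_name": ["ann", "bob", None, "dan"],
    "text": ["nice foam", None, "too sweet", "ok"],
})

# 1. Drop only the rows whose reviewer is unknown.
toy_clean = toy.dropna(subset=["profile_name"]).copy()

# 2. Flag missing ABV with the sentinel value 99.
toy_clean["ABV"] = toy_clean["ABV"].fillna(99)

# 3. Recode missing review text.
toy_clean["text"] = toy_clean["text"].fillna("no comments")

print(toy_clean)
```

Keep in mind that 99 is a sentinel, not a measurement: any later average or model that uses `ABV` has to filter it out first.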

In [6]:
beer.ABV = beer.ABV.fillna(value=99)
beer.ABV.isnull().sum()

0

In [7]:
beer.text = beer.text.fillna(value="no comments")
beer.text.isnull().sum()

0

In [8]:
# Drop only the rows that have no reviewer name
beer = beer.dropna(subset=['profile_name'])

In [9]:
beer.columns[beer.isnull().any()].tolist() 

[]

In [10]:
beer.head(2)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,names,Id,brewerId,ABV,style,appearance,aroma,palate,taste,overall,time,profile_name,text,time2,day,month,year
0,0,0,Sausa Weizen,47986,10325,5.0,Hefeweizen,2.5,2.0,1.5,1.5,1.5,1234817823,stcules,""" A lot of foam. But a lot.\tIn the smell some...",2009-02-16,16,2,2009
1,1,1,Red Moon,48213,10325,6.2,English Strong Ale,3.0,2.5,3.0,3.0,3.0,1235915097,stcules,""" Dark red color, light beige foam, average.\t...",2009-03-01,1,3,2009


In [11]:
beer.shape

(1585827, 19)

In [12]:
# index=False keeps the row index from being saved as one more "Unnamed" column
beer.to_csv('beer_clean.csv', index=False)
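
The `Unnamed: 0` and `Unnamed: 0.1` columns in the listings above look like what pandas produces when a frame is saved with its index and then read back without `index_col`. A small sketch of that round trip, using an invented toy frame and in-memory buffers instead of real files:

```python
import io

import pandas as pd

# Toy data invented for illustration.
toy = pd.DataFrame({"names": ["Beer A", "Beer B"], "ABV": [5.0, 6.2]})

# By default, to_csv writes the row index as a first column with no header.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# With index=False the index is left out of the file.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))
print(list(reread_without_index.columns))
```

Each save-and-reload cycle with the default settings adds another such column, which would explain why more than one appears here.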