# Handle Missing Data Part 2

In [1]:
import pandas as pd

In [14]:
df = pd.read_csv('05_weather_data.csv')
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,no event
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,-88888,no event


Many companies use a sentinel value such as **-99999** to mark missing or NA values.

We can convert this sentinel into a proper missing value with the **replace** function.
NumPy is the library that pandas is built on, and it provides a constant called **nan** (written **np.nan**) that represents a missing value.

In [15]:
import numpy as np
df.replace(-99999,np.nan)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,no event
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,-88888.0,no event
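
The output above shows 32.0 instead of 32 because **np.nan** is a floating-point value, so an integer column that receives it is converted to float. A minimal sketch of that effect, using a small made-up Series rather than the CSV file:

```python
import numpy as np
import pandas as pd

# Illustrative values only; not read from 05_weather_data.csv.
temps = pd.Series([32, -99999, 28])

# Swap the sentinel for NaN; the result is a new Series.
cleaned = temps.replace(-99999, np.nan)

print(temps.dtype)    # integer dtype before the replacement
print(cleaned.dtype)  # float dtype afterwards, because NaN is a float
```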


We can also do this using only pandas, with **pd.NA**:

In [17]:
df.replace(-99999,pd.NA)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,no event
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,-88888.0,no event


In [21]:
# Two sentinel values are invalid here (-99999 and -88888), so pass both in a list.
df = pd.read_csv('05_weather_data.csv')
import numpy as np
df.replace(to_replace = [-88888,-99999], value = np.nan)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,no event
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,,no event


In [22]:
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,no event
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,-88888,no event
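
Note that **df** still contains the sentinel values: **replace** returns a new dataframe and leaves the original untouched unless the result is assigned. A small sketch of this, assuming a hand-built stand-in for the weather data (the name `cleaned` is only for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for the weather data; a few rows typed in by hand.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, -99999, 28],
    'windspeed': [6, 7, -99999],
})

# replace() returns a new DataFrame, so keep the result in a variable.
cleaned = df.replace([-99999, -88888], np.nan)

print(df)       # the original still holds -99999
print(cleaned)  # the copy holds NaN instead
```

To keep the cleaned data, assign the result back, for example `df = df.replace(...)`.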


We can also replace different values in different columns by passing a dictionary that maps each column name to the value (or list of values) to replace in that column.

In [25]:
df.replace({
    'temperature':-99999,
    'windspeed':[-99999,-88888],
    'event': 'no event'
},np.nan)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,,
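
With a column-keyed dictionary, a value is only replaced in the columns that are listed. The sketch below, on a small made-up dataframe, lists only the temperature column, so the same sentinel in windspeed is left alone:

```python
import numpy as np
import pandas as pd

# Illustrative data; the sentinel appears in both numeric columns.
df = pd.DataFrame({
    'temperature': [32, -99999],
    'windspeed': [-99999, -88888],
})

# Only 'temperature' is named, so only that column is touched.
cleaned = df.replace({'temperature': -99999}, np.nan)

print(cleaned)
```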


We can get a similar result with a dictionary that maps each old value to its new value; this form applies across all columns:

In [27]:
df.replace({
    -99999:np.nan,
    -88888:np.nan,
    'no event': 'Very Hot' #or np.nan
})

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,Very Hot
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,,Very Hot


So **replace** lets you specify both the old value and the new value: in the dictionary, each key is an old value and its value is the replacement.
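
As a rough, self-contained illustration of the old-value-to-new-value form, here is the same kind of call on a small hand-typed dataframe (the data is made up for the example):

```python
import numpy as np
import pandas as pd

# Illustrative rows with two numeric sentinels and one text placeholder.
df = pd.DataFrame({
    'temperature': [32, -99999, 34],
    'windspeed': [6, 7, -88888],
    'event': ['Rain', 'Sunny', 'no event'],
})

# Keys are the old values, values are what they become, in every column.
cleaned = df.replace({
    -99999: np.nan,
    -88888: np.nan,
    'no event': 'Very Hot',
})

print(cleaned)
```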

In [29]:
new_df = pd.DataFrame({
        'student':['Asim','Shaq','Munir','khan','Danish'],
    'score':['average','exceptional','good','average','poor']
})
new_df

Unnamed: 0,student,score
0,Asim,average
1,Shaq,exceptional
2,Munir,good
3,khan,average
4,Danish,poor


Assume you have been hired by a tech company that runs these tests, and you have this dataframe in which the **scores** are **qualitative** (exceptional, good, average, poor). You want to make them **quantitative**, i.e. assign a number to each label.

In [35]:
new_df.replace(to_replace = ['poor','average','good','exceptional'], value = [30,50,70,90])


Unnamed: 0,student,score
0,Asim,50
1,Shaq,90
2,Munir,70
3,khan,50
4,Danish,30
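
The same label-to-number conversion can also be written with a dictionary. The sketch below uses `Series.map` instead of **replace**, which is an alternative technique and not the one shown above; the names `score_map` and `score_num` are hypothetical and chosen only for this example.

```python
import pandas as pd

new_df = pd.DataFrame({
    'student': ['Asim', 'Shaq', 'Munir', 'khan', 'Danish'],
    'score': ['average', 'exceptional', 'good', 'average', 'poor'],
})

# Hypothetical lookup table; the numbers match the replace call above.
score_map = {'poor': 30, 'average': 50, 'good': 70, 'exceptional': 90}

# map() looks up each label in the dictionary.
# Labels that are not in the dictionary would become NaN.
new_df['score_num'] = new_df['score'].map(score_map)

print(new_df)
```

Keeping the numbers in a separate column leaves the original labels available for checking.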
