## <font color="maroon"><h4 align="center">Handling Missing Data - replace method</h4></font>

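When missing values are stored as special placeholder numbers (such as -99999 or -88888) instead of NaN, the missing-data functions we have seen earlier cannot detect them, because pandas methods like `isna()`, `fillna()` and `dropna()` only treat NaN/None as missing. So we first use `replace()` to turn these special values into NaN.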
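A minimal sketch of the problem, using a small hand-built DataFrame (the values are illustrative, not read from the notebook's CSV): the placeholder is invisible to `isna()` until `replace()` converts it to NaN.

```python
import numpy as np
import pandas as pd

# Toy data: -99999 is used as a "missing" marker (illustrative values only)
toy = pd.DataFrame({"temperature": [32, -99999, 28]})

# isna() only recognises NaN/None, so the placeholder is not counted as missing
missing_before = int(toy["temperature"].isna().sum())

# After replace(), the placeholder becomes NaN and the usual tools can see it
cleaned = toy.replace(-99999, np.nan)
missing_after = int(cleaned["temperature"].isna().sum())

print(missing_before, missing_after)  # 0 1
```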

In [2]:
import pandas as pd
import numpy as np
df = pd.read_csv("/content/sample_data/weather_data.csv")
df

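|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | -99999 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | -88888 | Snow |
| 3 | 1/4/2017 | -88888 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -99999 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5 | 0 |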
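If the placeholder values are known before loading, an alternative to `replace()` is to let `pd.read_csv` parse them as NaN directly through its `na_values` parameter. A minimal sketch, assuming a few inlined rows in the same layout as `weather_data.csv` (inlined with `io.StringIO` so it runs without the file):

```python
import io
import pandas as pd

# A few rows in the same layout as weather_data.csv (inlined for self-containment)
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,-99999,7,Sunny
1/3/2017,28,-88888,Snow
"""

# na_values lists extra strings that read_csv should parse as NaN at load time
weather = pd.read_csv(io.StringIO(csv_text), na_values=["-99999", "-88888"])
print(weather)
```

`na_values` is added to pandas' default NaN markers, so empty cells are still treated as missing as well.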


**Replacing single value**

In [3]:
new_df = df.replace(-99999, value=np.nan)  # replace every -99999 with NaN
new_df

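|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | -88888.0 | Snow |
| 3 | 1/4/2017 | -88888.0 | 7.0 | 0 |
| 4 | 1/5/2017 | 32.0 | NaN | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | 0 |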
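The output above shows `32.0` where the input had `32`. A short sketch (toy data, not the notebook's CSV) of the reason: NaN is a floating-point value, so putting it into an integer column upcasts the whole column to `float64`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"temperature": [32, -99999, 28]})
print(toy["temperature"].dtype)       # an integer dtype before replacing

# NaN is a float, so the integer column is upcast to float64
replaced = toy.replace(-99999, np.nan)
print(replaced["temperature"].dtype)  # float64
```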


**Replacing list with single value**

The `to_replace` parameter specifies which values to replace; it accepts a single value or a list of values.

In [4]:
new_df = df.replace(to_replace=[-99999, -88888], value=0)  # replace every value in the list with the same single value (0)
new_df

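|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | 0 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | 0 | Snow |
| 3 | 1/4/2017 | 0 | 7 | 0 |
| 4 | 1/5/2017 | 32 | 0 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5 | 0 |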
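`replace()` matches values by equality, so the type matters: the integer `0` does not match the text `'0'` that the `event` column holds in this data. A small sketch on a toy Series; the label `"No event"` is a hypothetical replacement chosen for illustration.

```python
import pandas as pd

events = pd.Series(["Rain", "0", "Snow"])

# The integer 0 does not equal the string "0", so nothing is replaced here
unchanged = events.replace(0, "No event")

# Matching the string form replaces it ("No event" is a hypothetical label)
fixed = events.replace("0", "No event")
print(fixed.tolist())  # ['Rain', 'No event', 'Snow']
```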


In [5]:
new_df = df.replace(to_replace=[-99999, -88888], value=[0, np.nan])  # lists are paired element-wise: -99999 -> 0, -88888 -> NaN
new_df

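|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | 0.0 | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | 0 |
| 4 | 1/5/2017 | 32.0 | 0.0 | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | 0 |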
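The list-to-list call above pairs items by position, which is the same mapping as a dict of `old: new` pairs. A sketch on toy data comparing the two forms; the dict spelling makes each pair explicit, and with two lists pandas requires them to have the same length.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"temperature": [32, -99999, -88888],
                    "windspeed": [6, -88888, -99999]})

# Positional pairing: the i-th item of to_replace maps to the i-th item of value
by_lists = toy.replace(to_replace=[-99999, -88888], value=[0, np.nan])

# The same mapping written as a dict, which makes each pair explicit
by_dict = toy.replace({-99999: 0, -88888: np.nan})

print(by_lists.equals(by_dict))  # True
```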


**Replacing per column**

In [6]:
new_df = df.replace({
        'temperature': -99999,
        'windspeed': -88888,
        'event': '0'
    }, np.nan)   # per column: replace only the value listed for that column with NaN
new_df

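|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | -88888.0 | 7.0 | NaN |
| 4 | 1/5/2017 | 32.0 | -99999.0 | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | NaN |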
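In the per-column form above, each column gets one value to look for and every column gets the same replacement (NaN), which is why `-88888` survives in `temperature`. pandas also accepts a nested dict, `{column: {old: new}}`, so each column can have several rules with different replacements. A sketch on toy data; `"No event"` is a hypothetical label.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"temperature": [32, -99999, -88888],
                    "windspeed": [6, -88888, -99999],
                    "event": ["Rain", "0", "Snow"]})

# Nested dict: {column: {old_value: new_value}}; the value argument is omitted
cleaned = toy.replace({
    "temperature": {-99999: np.nan, -88888: np.nan},
    "windspeed": {-99999: np.nan, -88888: np.nan},
    "event": {"0": "No event"},  # hypothetical label for the "0" placeholder
})
print(cleaned)
```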


**Replacing by using mapping**

In [7]:
new_df = df.replace({
        -99999: np.nan,
        -88888: np.nan,
        'no event': 'Sunny',
    })  # map each old value to a new value in every column; 'no event' does not occur here (event holds '0'), so that entry changes nothing
new_df

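|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | 0 |
| 4 | 1/5/2017 | 32.0 | NaN | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | 0 |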
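`replace()` silently ignores mapping keys that never occur, which is why the `event` column above still shows `0`. A minimal sketch, on toy data, of counting how many cells each key matches before replacing, so that a key matching nothing stands out:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"temperature": [32, -99999, -88888],
                    "event": ["Rain", "0", "Snow"]})

mapping = {-99999: np.nan, -88888: np.nan, "no event": "Sunny"}

# Count the cells equal to each key; a count of 0 means that key will do nothing
hits = {key: int((toy == key).sum().sum()) for key in mapping}
print(hits)  # {-99999: 1, -88888: 1, 'no event': 0}
```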


**Regex**

In [10]:
df = pd.read_csv("/content/sample_data/weather_data_regex.csv")
df

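|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32F | 6 | Rain |
| 1 | 1/2/2017 | -99999 | 7mph | Sunny |
| 2 | 1/3/2017 | 28 | -88888 | Snow |
| 3 | 1/4/2017 | -88888 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -99999 | Rain |
| 5 | 1/6/2017 | 31F | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5mph | 0 |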


In [12]:
# Some windspeed values carry the unit "mph" and some temperature values carry "F"; strip those letters with regular expressions.
new_df = df.replace({'temperature': '[A-Za-z]', 'windspeed': '[a-z]'}, '', regex=True)
# in temperature, every letter a-z or A-Z is replaced with ''; in windspeed, every lowercase letter a-z is replaced with ''
new_df

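|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | -99999 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | -88888 | Snow |
| 3 | 1/4/2017 | -88888 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -99999 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5 | 0 |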
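After stripping the unit letters, the columns still hold text, so the placeholders are the strings `"-99999"` and `"-88888"`, not numbers. A sketch of one possible follow-up, assuming toy data in the same shape: convert with `pd.to_numeric`, then replace the placeholders with NaN as in the earlier examples.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"temperature": ["32F", "-99999", "28"],
                    "windspeed": ["6", "7mph", "-88888"]})

# Strip unit letters per column (same patterns as the regex example)
stripped = raw.replace({"temperature": "[A-Za-z]", "windspeed": "[a-z]"}, "", regex=True)

# The columns are still text after stripping, so convert them to numbers
numeric = stripped.apply(pd.to_numeric)

# Now the numeric placeholders can be turned into NaN
cleaned = numeric.replace([-99999, -88888], np.nan)
print(cleaned.dtypes)
```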


**Replacing list with another list**

In [14]:
df = pd.DataFrame({
    'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
    'student': ['ramya', 'ramya_1', 'ramya_2', 'ramya_3', 'ramya_4', 'ramya_4']
})
df

|   | score | student |
|---|---|---|
| 0 | exceptional | ramya |
| 1 | average | ramya_1 |
| 2 | good | ramya_2 |
| 3 | poor | ramya_3 |
| 4 | average | ramya_4 |
| 5 | exceptional | ramya_4 |


In [15]:
df.replace(['poor', 'average', 'good', 'exceptional'], [1, 2, 3, 4])  # one list replaced by another, element-wise: 'poor' -> 1, 'average' -> 2, 'good' -> 3, 'exceptional' -> 4

|   | score | student |
|---|---|---|
| 0 | 4 | ramya |
| 1 | 2 | ramya_1 |
| 2 | 3 | ramya_2 |
| 3 | 1 | ramya_3 |
| 4 | 2 | ramya_4 |
| 5 | 4 | ramya_4 |
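A DataFrame-wide `replace()` scans every column, so a matching value in an unrelated column would also be converted. A sketch with a hypothetical student literally named `"good"` showing the side effect, and the nested-dict form that limits the mapping to the `score` column:

```python
import pandas as pd

df = pd.DataFrame({
    "score": ["exceptional", "average", "good", "poor"],
    # hypothetical student named "good", used only to show the side effect
    "student": ["ramya", "good", "ramya_2", "ramya_3"],
})

order = {"poor": 1, "average": 2, "good": 3, "exceptional": 4}

# DataFrame-wide list replace touches every column, including "student"
everywhere = df.replace(list(order), list(order.values()))

# Nested dict {column: {old: new}} restricts the mapping to "score"
only_scores = df.replace({"score": order})

print(everywhere["student"].tolist())   # the student "good" has become 3
print(only_scores["student"].tolist())  # ['ramya', 'good', 'ramya_2', 'ramya_3']
```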
