###### replace()
This pandas tutorial covers how the DataFrame.replace() method can be used to replace specific values with other values. It supports replacement using a single value, a list, a regular expression and a dictionary. Often you receive data whose values are in one form and want to transform them into another form, and replace() can be used to perform that transformation.

```python
import pandas as pd
import numpy as np
```

```python
df = pd.read_csv('D:\\Pandas\\CodeBasics\\datasets\\06weather.csv', delimiter='\t')
```

```python
df
```

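| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | -99999 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | -99999 | Snow |
| 3 | 1/4/2017 | -99999 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -88888 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5 | 0 |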


-99999 is what is called a special value. It means that the data is missing, but rather than leave the cell blank, a special value has been inserted instead. This happens a lot in real data, so it is very useful to learn how to deal with such values.

Scenario - We are going to replace all of these special values with NaN

###### Replacing a single value

```python
new_df = df.replace(-99999, np.nan)
```

replace() takes two arguments: the value that you want to replace and the value that you want to replace it with.
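
Here is a minimal, self-contained sketch of the same call. It assumes a small hand-built DataFrame that mirrors a few rows of the weather data, so the CSV file is not needed. It also shows two behaviours worth knowing. First, replace() returns a new DataFrame and leaves the original untouched. Second, an integer column becomes float64 once it holds NaN, which is why new_df shows values like 32.0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of 06weather.csv
df = pd.DataFrame({
    'temperature': [32, -99999, 28],
    'windspeed': [6, 7, -99999],
    'event': ['Rain', 'Sunny', 'Snow'],
})

# Replace every occurrence of the special value -99999, in any column, with NaN
new_df = df.replace(-99999, np.nan)

print(df['temperature'].tolist())      # original is unchanged: [32, -99999, 28]
print(new_df['temperature'].tolist())  # [32.0, nan, 28.0]
print(new_df['temperature'].dtype)     # float64, because NaN is a float
```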

```python
new_df  # the special values (-99999) have now been replaced with NaN
```

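| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | 0 |
| 4 | 1/5/2017 | 32.0 | -88888.0 | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | 0 |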


That worked really well, but what happens if we have two different special values?
###### Using a list to replace more than one special value
We have added a second special value, -88888, to the data to see what can be done about it too.

```python
newer_df = df.replace([-99999, -88888], np.nan)
newer_df
```

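| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | 0 |
| 4 | 1/5/2017 | 32.0 | NaN | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | 0 |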


We can see that all the special values have been replaced. We were able to pass replace() a list of the values that we wanted to replace with NaN.
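
You can also check the replacement in code rather than by scanning the table. Here is a minimal sketch that assumes a hand-built stand-in for the temperature and windspeed columns. isin() flags any cell that still holds a special value, and isna().sum() counts the NaNs now in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the numeric columns of the weather data
df = pd.DataFrame({
    'temperature': [32, -99999, 28, -99999, 32],
    'windspeed': [6, 7, -99999, 7, -88888],
})

special = [-99999, -88888]
newer_df = df.replace(special, np.nan)

# True anywhere a special value is still present; .any().any() collapses to one bool
print(newer_df.isin(special).any().any())  # False
# Number of missing values per column after the replacement
print(newer_df.isna().sum())               # 2 in each column
```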
 
However, we still have a problem: we want to do something about the zeros in the event column. We could add 0 to the list of values for replace() to handle, but because replace() works on ALL columns, it would also replace any matching zeros in the temperature and windspeed columns. That is a problem because 0 is a valid value in those columns. We only want to replace the zeros in the event column.

###### Replacement based on columns
 
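To replace values based on their columns, we can pass a Python dictionary that maps each column name to the value to look for in that column. The second argument is what those values are replaced with: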

```python
newest_df = df.replace({
    'temperature': -99999,
    'windspeed': -88888,
    'event': '0'
}, np.nan)
newest_df
```

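| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | -99999.0 | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | NaN |
| 4 | 1/5/2017 | 32.0 | NaN | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | NaN |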


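Most of the special values have been replaced, and so have the zeros in the event column. One special value still persists, though: the -99999 in the windspeed column (row 2). This happens because a per-column dictionary only looks for the value listed against each column, and for windspeed we only listed -88888. To catch both special values in a column, give that column a list, e.g. 'windspeed': [-99999, -88888].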
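
Each column in the dictionary is searched only for the value listed against it. One possible fix is therefore to list both special values for each numeric column. This is a sketch that assumes a small hand-built stand-in for the weather data. The special variable is just a local name for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data (event uses the string '0' as a placeholder)
df = pd.DataFrame({
    'temperature': [32, -99999, 28, -99999, 32],
    'windspeed': [6, 7, -99999, 7, -88888],
    'event': ['Rain', 'Sunny', 'Snow', '0', 'Rain'],
})

special = [-99999, -88888]

# Each key is a column; each value is what to look for in that column.
# A list lets one column have several values to replace.
fixed_df = df.replace({
    'temperature': special,
    'windspeed': special,
    'event': '0',
}, np.nan)

print(fixed_df)
```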

###### Using a mapping
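This looks very similar to the dictionary used above, but there is an important difference. Here the keys are the values to look for and the dictionary values are what to replace them with, so no second argument is needed. The mapping is applied to every column, and it lets us replace different values with different things in one call: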

```python
anew_df = df.replace({
    -99999: np.nan,
    -88888: np.nan,
    '0': 'Sunny',
})
anew_df
```

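| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/2/2017 | NaN | 7.0 | Sunny |
| 2 | 1/3/2017 | 28.0 | NaN | Snow |
| 3 | 1/4/2017 | NaN | 7.0 | Sunny |
| 4 | 1/5/2017 | 32.0 | NaN | Rain |
| 5 | 1/6/2017 | 31.0 | 2.0 | Sunny |
| 6 | 1/6/2017 | 34.0 | 5.0 | Sunny |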


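This has worked really well! Every special value, and every '0' in the event column, now holds the value we chose for it, with no exceptions. Because this mapping is not tied to particular columns, the -99999 in the windspeed column was caught this time too.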
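
The two dictionary styles can also be combined. pandas accepts a nested dictionary of the form {column: {old_value: new_value}}, which applies each mapping to one column only. This lets you replace the '0' placeholder in the event column without touching any other column. Here is a minimal sketch on a small hand-built stand-in for the weather data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data
df = pd.DataFrame({
    'windspeed': [6, 0, -88888],      # the 0 here is a genuine reading and is left alone
    'event': ['Rain', '0', 'Sunny'],
})

# Outer keys are columns; inner dicts map old values to new values in that column only
nested_df = df.replace({
    'windspeed': {-99999: np.nan, -88888: np.nan},
    'event': {'0': 'Sunny'},
})

print(nested_df)
```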

###### Using regex to get rid of trailing characters
First, we need a new dataframe...

```python
df02 = pd.read_csv('D:\\Pandas\\CodeBasics\\datasets\\06regex.csv', delimiter='\t')
```

```python
df02  # here we want to get rid of the 'F', 'C' and 'mph' that follow our cell values
```

| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32F | 6mph | Rain |
| 1 | 1/2/2017 | -99999 | 7mph | Sunny |
| 2 | 1/3/2017 | 28F | -99999 | Snow |
| 3 | 1/4/2017 | -99999 | 7mph | 0 |
| 4 | 1/5/2017 | 32C | -88888 | Rain |
| 5 | 1/6/2017 | 31C | 2mph | Sunny |
| 6 | 1/6/2017 | 34F | 5mph | 0 |


###### Scenario 1 - Wherever we see a letter (any character from A-Z or a-z), we want to replace it with an empty string

```python
new_df02 = df02.replace('[A-Za-z]', '', regex=True)
# '[A-Za-z]' - a regex that matches any single letter, upper or lower case (digits and '-' are not matched)
# ''         - each matched letter is replaced with an empty string, not a space
# regex=True - treat '[A-Za-z]' as a regular expression; without it, replace() would only
#              match cells exactly equal to '[A-Za-z]', so nothing would change
```
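
To see what regex=True changes, here is a small sketch on a hypothetical Series of strings shaped like the windspeed column. With regex=True, the pattern removes letters inside each string and leaves digits and the minus sign alone. With the default regex=False, '[A-Za-z]' is compared against whole cell values and finds no exact match, so nothing changes.

```python
import pandas as pd

# Hypothetical sample shaped like the windspeed column in 06regex.csv
s = pd.Series(['6mph', '-99999', '7mph'])

# Pattern applied inside each string: every letter is removed
print(s.replace('[A-Za-z]', '', regex=True).tolist())  # ['6', '-99999', '7']

# Without regex=True, only cells exactly equal to '[A-Za-z]' would be replaced
print(s.replace('[A-Za-z]', '').tolist())              # ['6mph', '-99999', '7mph']
```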

```python
new_df02  # this worked, but only partly as intended
```

| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | |
| 1 | 1/2/2017 | -99999 | 7 | |
| 2 | 1/3/2017 | 28 | -99999 | |
| 3 | 1/4/2017 | -99999 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -88888 | |
| 5 | 1/6/2017 | 31 | 2 | |
| 6 | 1/6/2017 | 34 | 5 | 0 |


It removed the 'F', 'C' and 'mph' suffixes as intended. However, it also removed every letter from the event column, which was not intended. For this to work properly, we have to restrict the replacement to specific columns.
###### Using column-based regex

```python
newer_df02 = df02.replace({'temperature': '[A-Za-z]', 'windspeed': '[A-Za-z]'}, '', regex=True)
```

```python
newer_df02  # the event column is kept intact, while the letters are removed elsewhere
```

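| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/2/2017 | -99999 | 7 | Sunny |
| 2 | 1/3/2017 | 28 | -99999 | Snow |
| 3 | 1/4/2017 | -99999 | 7 | 0 |
| 4 | 1/5/2017 | 32 | -88888 | Rain |
| 5 | 1/6/2017 | 31 | 2 | Sunny |
| 6 | 1/6/2017 | 34 | 5 | 0 |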
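
After the regex step, the temperature and windspeed cells are still strings such as '32', not numbers. Here is one possible follow-up, sketched on a hypothetical stand-in for df02. It restricts the regex to the two measurement columns, converts them with pd.to_numeric, and then reuses the list replacement from earlier to turn the special values into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df02: units are stuck to the numbers
df02 = pd.DataFrame({
    'temperature': ['32F', '-99999', '28F', '32C'],
    'windspeed': ['6mph', '7mph', '-99999', '-88888'],
    'event': ['Rain', 'Sunny', 'Snow', 'Rain'],
})

# Strip letters in the two measurement columns only
cleaned = df02.replace({'temperature': '[A-Za-z]', 'windspeed': '[A-Za-z]'}, '', regex=True)
print(repr(cleaned.loc[0, 'temperature']))  # '32' - still a string, not a number

# Convert the cleaned strings to numbers, then turn the special values into NaN
for col in ['temperature', 'windspeed']:
    cleaned[col] = pd.to_numeric(cleaned[col])
cleaned = cleaned.replace([-99999, -88888], np.nan)

print(cleaned)
```

The temperatures in this data mix Fahrenheit (F) and Celsius (C). Once the unit letters are stripped, those numbers are no longer on one scale, so converting them would be a separate step.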


###### Replacing a list with another list

```python
students = pd.DataFrame({
    'score': ['exceptional', 'average', 'good', 'poor', 'average', 'exceptional'],
    'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica']
})
students
```

| | score | student |
|---|---|---|
| 0 | exceptional | rob |
| 1 | average | maya |
| 2 | good | parthiv |
| 3 | poor | tom |
| 4 | average | julian |
| 5 | exceptional | erica |


###### Scenario - We want to replace the values in the score column with the numbers that are associated with those values

```python
students.replace(['poor', 'average', 'good', 'exceptional'], [1, 2, 3, 4])
# First list: the current values that we want to change
# Second list: the values to replace them with, matched up by position
```

| | score | student |
|---|---|---|
| 0 | 4 | rob |
| 1 | 2 | maya |
| 2 | 3 | parthiv |
| 3 | 1 | tom |
| 4 | 2 | julian |
| 5 | 4 | erica |


This is a very powerful concept and will be used a lot in data analytics, for example to turn ordered labels like these into numbers that can be sorted or averaged.
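
Two details are worth knowing here, sketched below on the same students data. First, when called on the whole DataFrame, replace() looks for these words in every column. Applying it to the score column alone avoids surprises if another column happens to contain the same words. Second, the two lists are paired by position, so they must be the same length, and pandas raises a ValueError otherwise. The sketch writes the same pairing out as a dictionary; score_map and mismatch_error are just illustrative local names.

```python
import pandas as pd

students = pd.DataFrame({
    'score': ['exceptional', 'average', 'good', 'poor', 'average', 'exceptional'],
    'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
})

# The same pairing as the two lists, written as {old_value: new_value}
score_map = {'poor': 1, 'average': 2, 'good': 3, 'exceptional': 4}

# Apply it to the score column only, so the student column is never searched
students['score'] = students['score'].replace(score_map)
print(students)

# Lists of different lengths cannot be paired up, so pandas refuses them
mismatch_error = None
try:
    pd.Series(['poor', 'good']).replace(['poor', 'average', 'good'], [1, 2])
except ValueError as err:
    mismatch_error = err
print('ValueError:', mismatch_error)
```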