## <font color="maroon"><h4 align="center">Handling Missing Data - replace method</h4></font>

In [1]:
import pandas as pd
import numpy as np
df = pd.read_csv("weather_data.csv")
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,0
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   day          7 non-null      object
 1   temperature  7 non-null      int64 
 2   windspeed    7 non-null      int64 
 3   event        7 non-null      object
dtypes: int64(2), object(2)
memory usage: 352.0+ bytes


In [7]:
for i in df.columns:
    try:
        print(i, " : ", df[i].mean())
    except TypeError:
        # mean() fails on text (object) columns such as day and event
        print("Column " + i + " is not numeric")

Column day is not numeric
temperature  :  -28548.714285714286
windspeed  :  -28567.285714285714
Column event is not numeric
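
The two means above are dominated by the -99999 placeholder rather than by real readings. As a minimal sketch, assuming -99999 is the only sentinel in the file, replacing it with NaN before averaging gives means that reflect the actual measurements. The frame is rebuilt inline (under the hypothetical name `weather`) so the cell runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Inline copy of the numeric columns shown above, so this runs without the CSV file.
weather = pd.DataFrame({
    "temperature": [32, -99999, 28, -99999, 32, 31, 34],
    "windspeed": [6, 7, -99999, 7, -99999, 2, 5],
})

# Turn the sentinel into NaN; mean() skips NaN by default (skipna=True).
cleaned = weather.replace(-99999, np.nan)
clean_means = cleaned.mean()
print(clean_means)
```

With the placeholders excluded, the temperature mean comes out at 31.4 and the windspeed mean at 5.4.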


In [18]:
mintemp = -100
maxtemp = 100
# rows whose temperature falls outside the plausible range
df[(df.temperature < mintemp) | (df.temperature > maxtemp)]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,-99999,7,Sunny
3,1/4/2017,-99999,7,0
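
The same range check can also be used to blank out the implausible readings instead of only listing them. The sketch below assumes the -100 to 100 bounds from the cell above and uses `Series.where`, which keeps values where a condition holds and puts NaN elsewhere. The name `weather` and the inline data are stand-ins for the loaded CSV.

```python
import pandas as pd

# Inline stand-in for the temperature column of the weather data.
weather = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017",
            "1/5/2017", "1/6/2017", "1/6/2017"],
    "temperature": [32, -99999, 28, -99999, 32, 31, 34],
})

mintemp, maxtemp = -100, 100

# between() is inclusive on both ends by default.
in_range = weather["temperature"].between(mintemp, maxtemp)

# Keep in-range readings, set everything else to NaN.
weather["temperature"] = weather["temperature"].where(in_range)
print(weather)
```

This approach does not depend on knowing the exact placeholder value, only on a plausible range for the column.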


**Replacing single value**

In [3]:
new_df = df.replace(-99999, value=5)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,5,7,Sunny
2,1/3/2017,28,5,Snow
3,1/4/2017,5,7,0
4,1/5/2017,32,5,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0


**Replacing list with single value**

In [4]:
new_df = df.replace(to_replace=[-99999,-88888, 10, '0'], value=5)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,5,7,Sunny
2,1/3/2017,28,5,Snow
3,1/4/2017,5,7,5
4,1/5/2017,32,5,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,5


**Replacing per column**

In [5]:
new_df = df.replace({
        'temperature': -99999,
        'windspeed': -99999,
        'day': '1/1/2017',
        'event': '0'
    }, np.nan)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,5.0,


**Replacing by using mapping**

In [6]:
new_df = df.replace({
        -99999: np.nan,
        '0': 'Sunny',
    })
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,Sunny
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,5.0,Sunny


**Regex**

In [6]:
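# Intended for data where windspeed is "6 mph", "7 mph" etc. and temperature is "32 F", "28 F" etc.
# The columns loaded here are already numeric, so the output below is unchanged.
new_df = df.replace({'temperature': '[A-Za-z]', 'windspeed': '[a-z]'}, '', regex=True)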
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,0
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0
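
To see the regex replacement actually change something, the columns need to hold strings with units in them. The sketch below uses a small made-up frame (`units_df`, not part of the original data) and a slightly wider pattern that also removes the space before the unit, then converts the cleaned strings to numbers with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical variant of the weather data with units embedded in the strings.
units_df = pd.DataFrame({
    "temperature": ["32 F", "28 F", "31 F"],
    "windspeed": ["6 mph", "7 mph", "2 mph"],
})

# Per-column patterns: strip letters plus any whitespace in front of them.
stripped = units_df.replace(
    {"temperature": r"\s*[A-Za-z]+", "windspeed": r"\s*[A-Za-z]+"},
    "",
    regex=True,
)

# The values are still strings after replace; convert each column to a numeric dtype.
numeric = stripped.apply(pd.to_numeric)
print(numeric)
```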


**Replacing list with another list**

In [2]:
df = pd.DataFrame({
    'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
     'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional']
})
df

Unnamed: 0,student,score
0,rob,exceptional
1,maya,average
2,parthiv,good
3,tom,poor
4,julian,average
5,erica,exceptional


In [3]:
df.score.value_counts()

exceptional    2
average        2
good           1
poor           1
Name: score, dtype: int64

In [4]:
df.replace(['poor', 'average', 'good', 'exceptional'], [4,8,16,30])

Unnamed: 0,student,score
0,rob,30
1,maya,8
2,parthiv,16
3,tom,4
4,julian,8
5,erica,30
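
Two parallel lists are easy to get out of step. One alternative, sketched here with the hypothetical names `scores` and `score_map`, is to write the pairing as a dict and apply it to the score column only with `Series.map`. Unlike `replace`, `map` turns any label that is missing from the dict into NaN, which can help surface unexpected values.

```python
import pandas as pd

scores = pd.DataFrame({
    "student": ["rob", "maya", "parthiv", "tom", "julian", "erica"],
    "score": ["exceptional", "average", "good", "poor", "average", "exceptional"],
})

# Same pairing as the two lists above, written as label -> number.
score_map = {"poor": 4, "average": 8, "good": 16, "exceptional": 30}

# map() touches only the score column; the student names are never inspected.
scores["score"] = scores["score"].map(score_map)
print(scores)
```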


**Task 4: replace data in specific columns with specific values**
*Example:*

Name,Age
Ahmed,23
Ali,20
Omar,70
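
One possible approach to this task is the nested-dict form of `replace`, `{column: {old_value: new_value}}`, which limits each substitution to a single column. The replacement values below ("Umar" and 17) are made up for illustration, and `people` is a hypothetical name for the example table.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Ahmed", "Ali", "Omar"], "Age": [23, 20, 70]})

# Nested dict: the outer key selects the column, the inner dict maps old -> new.
# "Umar" and 17 are illustrative values, not taken from the task.
updated = people.replace({"Name": {"Omar": "Umar"}, "Age": {70: 17}})
print(updated)
```

Because each inner dict is scoped to its column, a 70 appearing in any other column would be left alone.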





In [10]:
df.replace(['poor', 'average', 'good', 'exceptional'], [2,4,10,30] , inplace=True)

In [11]:
df

Unnamed: 0,student,score
0,rob,30
1,maya,4
2,parthiv,10
3,tom,2
4,julian,4
5,erica,30


In [44]:
len(df)

6

In [14]:
def replace_values():
    """Map each numeric score in df.score to a text label."""
    result = []
    for i in df.score:
        score = int(i)
        if score < 5:
            result.append("Failed")
        elif score < 20:
            # 5 <= score < 20
            result.append("Weak")
        else:
            result.append("Okay")

    print(result)
    return result

In [15]:
df["numericalscores"] = replace_values()
df

['Okay', 'Failed', 'Weak', 'Failed', 'Failed', 'Okay']


Unnamed: 0,student,score,numericalscores
0,rob,30,Okay
1,maya,4,Failed
2,parthiv,10,Weak
3,tom,2,Failed
4,julian,4,Failed
5,erica,30,Okay
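
The loop above can also be expressed with `pd.cut`, which assigns each value to a labelled interval. This is a sketch rather than a drop-in replacement: it assumes the intended bands are below 5, from 5 up to (but not including) 20, and 20 or above. The names `graded` and `label` are hypothetical.

```python
import pandas as pd

graded = pd.DataFrame({
    "student": ["rob", "maya", "parthiv", "tom", "julian", "erica"],
    "score": [30, 4, 10, 2, 4, 30],
})

# right=False makes each bin closed on the left: [-inf, 5), [5, 20), [20, inf).
graded["label"] = pd.cut(
    graded["score"],
    bins=[float("-inf"), 5, 20, float("inf")],
    labels=["Failed", "Weak", "Okay"],
    right=False,
)
print(graded)
```

The result is a categorical column, so the band order (Failed, Weak, Okay) is preserved for sorting and grouping.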


**Task: how to detect and remove outliers using pandas**
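
One common heuristic for this task is the interquartile-range (IQR) rule: treat anything more than 1.5 × IQR below the first quartile or above the third quartile as an outlier. The sketch below uses made-up temperature readings (`temps`), and both the 1.5 multiplier and the choice of the IQR rule are assumptions, not something the task prescribes.

```python
import pandas as pd

# Made-up readings with one obviously implausible value (120).
temps = pd.DataFrame({"temperature": [32, 28, 31, 34, 30, 29, 33, 120]})

q1 = temps["temperature"].quantile(0.25)
q3 = temps["temperature"].quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (a conventional default).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = ~temps["temperature"].between(lower, upper)
outliers = temps[is_outlier]    # rows flagged as outliers
trimmed = temps[~is_outlier]    # rows kept

print(outliers)
print(trimmed)
```

When placeholders such as -99999 make up a large share of a column they can distort the quartiles themselves, so it is usually safer to replace them with NaN first and apply the rule afterwards.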