## Topic: Robust Scaling Method

### OUTCOMES

- 1. Introduction of Robust Scaling Method.


- 2. Code Implementation of Robust Scaling Method.


### 1. Introduction of Robust Scaling Method.

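- Definition:
    - A scaling technique that is robust to outliers: it centers each feature on its median and divides by its interquartile range (IQR).
    - The scaled values are not confined to a fixed range such as (0-1); the median maps to 0 and the middle 50% of the data spans a width of 1.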

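- Formula:
    - Robust_scale = (xi - median) / IQR

    - where
        - xi = an individual value of the feature
        - median = the middle value of the sorted feature values
        - IQR = Q3 - Q1 (the 75th percentile minus the 25th percentile)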
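The formula above can be sketched as a small NumPy function. This is only an illustration: the name `robust_scale` and the sample values are made up for this note, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def robust_scale(values):
    """Return (x - median) / IQR for a 1-D sequence of numbers."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    # 25th and 75th percentiles (linear interpolation is NumPy's default)
    q1, q3 = np.percentile(x, [25, 75])
    return (x - median) / (q3 - q1)

# Made-up sample with one extreme value: median = 3, Q1 = 2, Q3 = 4, IQR = 2
sample = [1, 2, 3, 4, 100]
scaled = robust_scale(sample)
print(scaled)
```

The median value maps to 0, and the extreme value 100 maps to (100 - 3) / 2 = 48.5.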
    
- Use:
    - A better choice when the data distribution is skewed or contains extreme values (outliers).

    - The median and IQR are used because outliers have little influence on them.
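To see why the median and IQR are preferred here, the sketch below compares them with the mean and standard deviation on two made-up arrays that differ only in their last value. The arrays and the helper `iqr` are assumptions for illustration, not part of the notebook's data.

```python
import numpy as np

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
skewed = np.array([1.0, 2.0, 3.0, 4.0, 500.0])  # same data, last value replaced by an outlier

def iqr(x):
    """Interquartile range Q3 - Q1 (hypothetical helper for this example)."""
    q1, q3 = np.percentile(x, [25, 75])
    return q3 - q1

for name, x in [("clean", clean), ("skewed", skewed)]:
    print(name,
          "| mean:", x.mean(),
          "| std:", round(x.std(), 2),
          "| median:", np.median(x),
          "| IQR:", iqr(x))
```

The mean and standard deviation change sharply once the outlier is introduced, while the median and IQR stay the same for this sample.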



- NOTE:
    - Robust-scaled values do not always fall in the range (0-1). Outliers keep a large magnitude after scaling, so they remain easy to detect.
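A quick way to check this note is to scale the same values with min-max scaling and with robust scaling. The sketch below uses made-up numbers (90 plays the outlier); min-max scaling is included only as a point of comparison.

```python
import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 90.0])  # made-up values, 90 is the outlier

# Min-max scaling: always lands in [0, 1], but the outlier squeezes the other values together
min_max = (x - x.min()) / (x.max() - x.min())

# Robust scaling: not bounded, the outlier keeps a large magnitude
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

print("min-max:", min_max.round(3))
print("robust: ", robust.round(3))
```

With min-max scaling the five ordinary values end up between 0 and 0.1. With robust scaling they spread from -1 to 0.6, and the outlier sits far away at 15.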


### 2. Code Implementation of Robust Scaling Method.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
# Robust scaling for real world data

df = pd.read_csv('Data.csv')

df

Unnamed: 0,Country,Age,Salary,Purchased
0,France,44.0,72000.0,No
1,Spain,27.0,48000.0,Yes
2,Germany,30.0,54000.0,No
3,Spain,38.0,61000.0,No
4,Germany,40.0,,Yes
5,France,35.0,58000.0,Yes
6,Spain,,52000.0,No
7,France,48.0,79000.0,Yes
8,Germany,50.0,83000.0,No
9,France,37.0,67000.0,Yes


In [None]:
# handle the missing values by dropping every row that contains one

df1 = df.dropna()

df1

Unnamed: 0,Country,Age,Salary,Purchased
0,France,44.0,72000.0,No
1,Spain,27.0,48000.0,Yes
2,Germany,30.0,54000.0,No
3,Spain,38.0,61000.0,No
5,France,35.0,58000.0,Yes
7,France,48.0,79000.0,Yes
8,Germany,50.0,83000.0,No
9,France,37.0,67000.0,Yes
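The cell above drops the incomplete rows without first showing where the gaps are. A minimal sketch of the identification step follows. It rebuilds the frame from the rows printed above instead of reading `Data.csv`, and it assumes the first output column is the row index.

```python
import numpy as np
import pandas as pd

# Rows copied from the output shown above (assumption: "Unnamed: 0" is the row index)
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Show the rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)

# Drop them; .copy() makes df1 an independent DataFrame
df1 = df.dropna().copy()
print(len(df), "rows ->", len(df1), "rows")
```

Rows 4 and 6 are the ones removed, which matches the output above.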


In [None]:
# apply robust scaling method (rb = (xi - median) / IQR)

# find the median (sorting is only for readability; median() does not need sorted data)

# reassign instead of sorting in place, which avoids pandas' SettingWithCopyWarning
df1 = df1.sort_values(by=['Age', 'Salary'], ascending=[True, True])

# median Age column
median_age = df1['Age'].median()

# median Salary column
median_salary = df1['Salary'].median()


In [16]:
print(median_age, median_salary)

37.5 64000.0


In [18]:
# IQR Age column
q1_age = df1['Age'].quantile(0.25)
q3_age = df1['Age'].quantile(0.75)

iqr_age = q3_age - q1_age


# IQR Salary column
q1_salary = df1['Salary'].quantile(0.25)
q3_salary = df1['Salary'].quantile(0.75)

iqr_salary = q3_salary - q1_salary


print("IQR Age: ", iqr_age)
print("IQR Salary: ", iqr_salary)


IQR Age:  11.25
IQR Salary:  16750.0


- Interpretation:
    - The middle 50% of the Age values span 11.25 years (IQR of Age = 11.25).
    - The middle 50% of the Salary values span 16750.0 (IQR of Salary = 16750.0).
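The median and quartiles for both columns can also be read off in one call. The sketch below is one possible shortcut, not the notebook's own method; it re-creates the eight complete rows from the output above so that it runs without `Data.csv`.

```python
import pandas as pd

# The eight complete rows from the output above, with their original index labels
df1 = pd.DataFrame(
    {
        "Age": [44.0, 27.0, 30.0, 38.0, 35.0, 48.0, 50.0, 37.0],
        "Salary": [72000.0, 48000.0, 54000.0, 61000.0,
                   58000.0, 79000.0, 83000.0, 67000.0],
    },
    index=[0, 1, 2, 3, 5, 7, 8, 9],
)

# One row per requested quantile, one column per feature
quartiles = df1[["Age", "Salary"]].quantile([0.25, 0.5, 0.75])
print(quartiles)

median = quartiles.loc[0.5]
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print("Median:\n", median)
print("IQR:\n", iqr)
```

The values agree with the ones computed column by column above: medians 37.5 and 64000.0, IQRs 11.25 and 16750.0.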


In [None]:
# apply robust scaling method (rb = (xi - median) / IQR)

rb_age_scale = (df1['Age'] - median_age)/iqr_age

rb_salary_scale = (df1['Salary'] - median_salary)/iqr_salary

print(rb_age_scale)

print(rb_salary_scale)

1   -0.933333
2   -0.666667
5   -0.222222
9   -0.044444
3    0.044444
0    0.577778
7    0.933333
8    1.111111
Name: Age, dtype: float64
1   -0.955224
2   -0.597015
5   -0.358209
9    0.179104
3   -0.179104
0    0.477612
7    0.895522
8    1.134328
Name: Salary, dtype: float64


In [24]:
# add the scaled values as new columns

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
df1 = df1.assign(Age_Scaling=rb_age_scale, Salary_Scaling=rb_salary_scale)

df1


Unnamed: 0,Country,Age,Salary,Purchased,Age_Scaling,Salary_Scaling
1,Spain,27.0,48000.0,Yes,-0.933333,-0.955224
2,Germany,30.0,54000.0,No,-0.666667,-0.597015
5,France,35.0,58000.0,Yes,-0.222222,-0.358209
9,France,37.0,67000.0,Yes,-0.044444,0.179104
3,Spain,38.0,61000.0,No,0.044444,-0.179104
0,France,44.0,72000.0,No,0.577778,0.477612
7,France,48.0,79000.0,Yes,0.933333,0.895522
8,Germany,50.0,83000.0,No,1.111111,1.134328
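As a cross-check, the manual result can be compared with scikit-learn's `RobustScaler`. This is a sketch that assumes scikit-learn is installed and relies on the scaler's default settings (centering on the median, scaling by the 25th to 75th percentile range). The data is re-created from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# The eight complete rows from the output above, with their original index labels
df1 = pd.DataFrame(
    {
        "Age": [44.0, 27.0, 30.0, 38.0, 35.0, 48.0, 50.0, 37.0],
        "Salary": [72000.0, 48000.0, 54000.0, 61000.0,
                   58000.0, 79000.0, 83000.0, 67000.0],
    },
    index=[0, 1, 2, 3, 5, 7, 8, 9],
)
cols = ["Age", "Salary"]

# Manual robust scaling, both columns at once
manual = (df1[cols] - df1[cols].median()) / (
    df1[cols].quantile(0.75) - df1[cols].quantile(0.25)
)

# Same computation with scikit-learn's default RobustScaler
scaler = RobustScaler()
scaled = pd.DataFrame(scaler.fit_transform(df1[cols]), columns=cols, index=df1.index)

print(scaled.round(6))
print("Matches manual result:", np.allclose(manual.to_numpy(), scaled.to_numpy()))
```

A fitted scaler object is convenient when the same median and IQR must later be applied to new data with `scaler.transform`.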
