##  Feature Scaling

->Feature scaling is a technique that brings the independent variables (features) in a dataset onto a comparable scale, often a fixed range.

->It is a step performed during data pre-processing.

->Example :

If a dataset contains the weight of a person with values in the range 15 kg to 100 kg, then feature scaling (here, min-max normalization) transforms all the values to the range 0 to 1, where 0 represents the lowest weight and 1 represents the highest weight, instead of representing the weights in kilograms.

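->Feature scaling is essential for machine learning algorithms that calculate distances between data points, such as k-nearest neighbours and k-means, because a feature with a larger numeric range would otherwise dominate the distance.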
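To illustrate why distances are sensitive to scale, here is a minimal sketch in plain Python. The three people, the second feature (height in metres), and the assumed value ranges are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical people described by (weight in kg, height in m).
a = (60.0, 1.50)
b = (62.0, 1.95)  # similar weight to a, very different height
c = (75.0, 1.52)  # different weight from a, similar height

# Assumed feature ranges used for min-max scaling (illustrative only).
WEIGHT_RANGE = (15.0, 100.0)
HEIGHT_RANGE = (1.0, 2.0)

def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def min_max(value, lo, hi):
    """Map value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def scale(p):
    """Scale a (weight, height) pair feature by feature."""
    return (min_max(p[0], *WEIGHT_RANGE), min_max(p[1], *HEIGHT_RANGE))

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
scaled_ab = euclidean(scale(a), scale(b))
scaled_ac = euclidean(scale(a), scale(c))

print("raw:   ", raw_ab, raw_ac)        # weight dominates, so b looks closer to a
print("scaled:", scaled_ab, scaled_ac)  # both features count, so c is closer to a
```

On the raw values the weight difference decides which point is nearest. After scaling, the large height difference between a and b is no longer drowned out.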

## Techniques Of Feature Scaling 

I- Standardization

II- Normalization

III- Robust Scaling

-> In Scikit-Learn, fit() learns the scaling parameters from the data, transform() applies them, and fit_transform() does both in one step.
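The relationship between these methods can be sketched as follows. The weights and the train/test split are hypothetical; StandardScaler is used only as an example transformer, and the same pattern applies to the other scalers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical weights in kg, split into a training set and a test set.
X_train = np.array([[15.0], [40.0], [65.0], [100.0]])
X_test = np.array([[55.0]])

scaler = StandardScaler()
scaler.fit(X_train)                          # learns mean_ and scale_ from the training data
X_train_scaled = scaler.transform(X_train)   # applies the learned parameters
X_test_scaled = scaler.transform(X_test)     # reuses the training parameters, no refitting

# fit_transform() is fit() followed by transform() on the same data.
X_train_again = StandardScaler().fit_transform(X_train)

print(scaler.mean_)      # mean learned from X_train
print(X_test_scaled)     # test value expressed relative to the training mean
```

Fitting on the training data only and then calling transform() on new data keeps information from the test set out of the scaling parameters.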

## Standardization

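-> Standardization is the feature scaling technique that rescales each feature/variable so that it has a mean of 0 and a standard deviation of 1. It shifts and rescales the values but does not change the shape of their distribution.

-> The scaled values are not bounded to a certain range.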

-> It is also known as z-score normalization.

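-> The mean and standard deviation of each feature are used for this technique.

z=(x-mean)/s.d ,

where mean and s.d are the mean and standard deviation of the feature. After scaling, the feature has mean=0 and s.d=1.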

-> Scikit-Learn provides a transformer called StandardScaler for standardization.
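As a rough check of the formula, the sketch below computes the z-score by hand and compares it with StandardScaler. The five values are illustrative sample ages, not a full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative sample values (one feature, five observations).
x = np.array([[22.0], [38.0], [26.0], [35.0], [35.0]])

mean = x.mean(axis=0)
std = x.std(axis=0)            # population standard deviation (ddof=0), as StandardScaler uses
z_manual = (x - mean) / std    # z = (x - mean) / s.d

z_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_sklearn))   # both approaches agree
print(z_manual.mean(), z_manual.std())    # approximately 0 and 1
```

Note that the scaled values can fall outside any fixed interval, which is why standardization is described as unbounded.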

## Normalization

-> Normalization is the feature scaling technique that rescales the features/variables to a fixed range. It is useful when the distribution of the data is not known or is not Gaussian.

-> It scales the features to the range [0, 1] or [-1, 1].

-> The minimum and maximum values of each feature are used for scaling.

X norm = (X - X min)/(X max - X min) ,

where X min= minimum value and X max= maximum value

-> It is also known as min-max scaling (or min-max normalization).

-> Scikit-Learn provides a transformer called MinMaxScaler for normalization.
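A minimal sketch of the min-max formula, using the 15 kg to 100 kg weight range from the earlier example. The two intermediate weights are made up, and `feature_range` is the MinMaxScaler parameter that selects the target range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Weights in kg; 15 and 100 are the range limits, 32 and 66 are made-up values.
weights = np.array([[15.0], [32.0], [66.0], [100.0]])

x_min = weights.min(axis=0)
x_max = weights.max(axis=0)
manual = (weights - x_min) / (x_max - x_min)   # X norm = (X - X min)/(X max - X min)

scaled_01 = MinMaxScaler().fit_transform(weights)                         # default range [0, 1]
scaled_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(weights)   # range [-1, 1]

print(manual.ravel())       # [0.  0.2 0.6 1. ]
print(scaled_pm1.ravel())   # [-1.  -0.6  0.2  1. ]
```

The lowest weight maps to the lower bound of the range and the highest weight to the upper bound, whichever range is chosen.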

## Robust Scaling

-> Robust scaling is the feature scaling technique that scales features using statistics that are robust to outliers.

-> It centers each feature on its median and scales it by the interquartile range (IQR): the median is subtracted from every observation, and the result is divided by the IQR. The IQR is the difference between the 75th and 25th percentiles (the third and first quartiles).

-> IQR = 75th percentile - 25th percentile

and 

X_scaled = (X - X_median) / IQR
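The sketch below applies this formula by hand and compares it with Scikit-Learn's RobustScaler, which by default uses the median and the 25th to 75th percentile range. The five values, including the deliberate outlier, are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature values; 500 is a deliberate outlier.
values = np.array([[7.0], [8.0], [9.0], [10.0], [500.0]])

median = np.median(values, axis=0)
q1, q3 = np.percentile(values, [25, 75], axis=0)
iqr = q3 - q1                          # IQR = 75th percentile - 25th percentile
manual = (values - median) / iqr       # X_scaled = (X - X_median) / IQR

robust = RobustScaler().fit_transform(values)

print(median, iqr)                     # [9.] [2.]
print(manual.ravel())                  # [ -1.   -0.5   0.    0.5 245.5]
print(np.allclose(manual, robust))     # manual result matches RobustScaler
```

The outlier stays far away after scaling, but it does not affect the median or the IQR, so the typical values keep a sensible spread instead of being squeezed together.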

## Implementing On A Dataset

In [1]:
import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv("titanic.csv")
df.head(5)

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [3]:
df=pd.read_csv("titanic.csv", usecols = ["Pclass","Age","Survived","Fare"])
df.head(5)

|   | Survived | Pclass | Age | Fare |
|---|---|---|---|---|
| 0 | 0 | 3 | 22.0 | 7.25 |
| 1 | 1 | 1 | 38.0 | 71.2833 |
| 2 | 1 | 3 | 26.0 | 7.925 |
| 3 | 1 | 1 | 35.0 | 53.1 |
| 4 | 0 | 3 | 35.0 | 8.05 |


In [4]:
df.isnull().sum()

Survived      0
Pclass        0
Age         177
Fare          0
dtype: int64

In [5]:
# Fill missing values only in the Age column, using that column's median
df["Age"]=df["Age"].fillna(df["Age"].median())

In [6]:
df.isnull().sum()

Survived    0
Pclass      0
Age         0
Fare        0
dtype: int64

### Standardization Method : 

In [7]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()

In [8]:
df_std=scaler.fit_transform(df)

In [9]:
df_std

array([[-0.78927234,  0.82737724, -0.56573646, -0.50244517],
       [ 1.2669898 , -1.56610693,  0.66386103,  0.78684529],
       [ 1.2669898 ,  0.82737724, -0.25833709, -0.48885426],
       ...,
       [-0.78927234,  0.82737724, -0.1046374 , -0.17626324],
       [ 1.2669898 , -1.56610693, -0.25833709, -0.04438104],
       [-0.78927234,  0.82737724,  0.20276197, -0.49237783]])

In [10]:
df_s=pd.DataFrame(df_std,columns=df.columns)

In [11]:
df_s.head(5)

|   | Survived | Pclass | Age | Fare |
|---|---|---|---|---|
| 0 | -0.789272 | 0.827377 | -0.565736 | -0.502445 |
| 1 | 1.26699 | -1.566107 | 0.663861 | 0.786845 |
| 2 | 1.26699 | 0.827377 | -0.258337 | -0.488854 |
| 3 | 1.26699 | -1.566107 | 0.433312 | 0.42073 |
| 4 | -0.789272 | 0.827377 | 0.433312 | -0.486337 |


### Normalization Method :

In [12]:
from sklearn.preprocessing import MinMaxScaler
min_max=MinMaxScaler()

In [13]:
df_mm=min_max.fit_transform(df)

In [14]:
df_mm

array([[0.        , 1.        , 0.27117366, 0.01415106],
       [1.        , 0.        , 0.4722292 , 0.13913574],
       [1.        , 1.        , 0.32143755, 0.01546857],
       ...,
       [0.        , 1.        , 0.34656949, 0.04577135],
       [1.        , 0.        , 0.32143755, 0.0585561 ],
       [0.        , 1.        , 0.39683338, 0.01512699]])

In [15]:
df_m=pd.DataFrame(df_mm,columns=df.columns)

In [16]:
df_m.head(5)

|   | Survived | Pclass | Age | Fare |
|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.271174 | 0.014151 |
| 1 | 1.0 | 0.0 | 0.472229 | 0.139136 |
| 2 | 1.0 | 1.0 | 0.321438 | 0.015469 |
| 3 | 1.0 | 0.0 | 0.434531 | 0.103644 |
| 4 | 0.0 | 1.0 | 0.434531 | 0.015713 |


### Robust Scaling Method :

In [17]:
from sklearn.preprocessing import RobustScaler
rs=RobustScaler()

In [18]:
df_rs=rs.fit_transform(df)

In [19]:
df_rs

array([[ 0.        ,  0.        , -0.46153846, -0.3120106 ],
       [ 1.        , -2.        ,  0.76923077,  2.46124229],
       [ 1.        ,  0.        , -0.15384615, -0.28277666],
       ...,
       [ 0.        ,  0.        ,  0.        ,  0.38960398],
       [ 1.        , -2.        , -0.15384615,  0.67328148],
       [ 0.        ,  0.        ,  0.30769231, -0.29035583]])

In [20]:
df_rob=pd.DataFrame(df_rs,columns=df.columns)

In [21]:
df_rob.head(5)

|   | Survived | Pclass | Age | Fare |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | -0.461538 | -0.312011 |
| 1 | 1.0 | -2.0 | 0.769231 | 2.461242 |
| 2 | 1.0 | 0.0 | -0.153846 | -0.282777 |
| 3 | 1.0 | -2.0 | 0.538462 | 1.673732 |
| 4 | 0.0 | 0.0 | 0.538462 | -0.277363 |
