<h1 style="text-align : center"> <font color="red" size=7>FEATURE SCALING</font></h1>

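## <font color="dark blue">WHAT IS FEATURE SCALING?</font>
- Feature scaling is a preprocessing technique that puts the independent (numerical) features of a dataset on a common scale. Depending on the method, that means either a fixed range such as [0, 1] or a common center and spread such as mean 0 and standard deviation 1.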

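## <font color="dark blue">WHY IS FEATURE SCALING IMPORTANT?</font>
- Improves Model Convergence: gradient-based algorithms (e.g., linear/logistic regression trained with gradient descent, neural networks) converge faster when features share a similar scale.
- Prevents Bias Toward Larger Magnitudes: distance-based algorithms (e.g., KNN, K-Means, SVM) are otherwise dominated by the feature with the largest values.
- Avoids Numerical Instability caused by very large or very small values.
- Improves Performance of Regularization Techniques: L1/L2 penalties treat all coefficients fairly only when the features are on comparable scales.
- Enhances Interpretability of Results: coefficients learned on scaled features can be compared with each other.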
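A minimal sketch of the magnitude bias, using three (`Age`, `EstimatedSalary`) rows copied from the Social_Network_Ads preview shown later in this notebook. The helper `dist` is an illustrative name, and standardizing only three rows is done purely for demonstration.

```python
import numpy as np

# Three (Age, EstimatedSalary) rows copied from the Social_Network_Ads preview
a = np.array([19.0, 19000.0])
b = np.array([35.0, 20000.0])
c = np.array([26.0, 43000.0])

def dist(p, q):
    """Euclidean distance, as used by KNN or K-Means (illustrative helper)."""
    return float(np.linalg.norm(p - q))

# Unscaled: the salary column decides almost everything
print(dist(a, b))  # ~1000.13
print(dist(a, c))  # ~24000.00

# Fraction of the squared a-b distance contributed by the salary column
d = a - b
salary_share = d[1] ** 2 / np.sum(d ** 2)
print(salary_share)  # ~0.9997: the 16-year age gap is practically ignored

# After standardizing the three rows, age contributes meaningfully again
X = np.vstack([a, b, c])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
d_std = X_std[0] - X_std[1]
age_share_scaled = d_std[0] ** 2 / np.sum(d_std ** 2)
print(age_share_scaled)
```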

## <font color="dark blue">TYPES OF FEATURE SCALING</font>

![FEATURE%20SCALING.png](attachment:FEATURE%20SCALING.png)


<h1 style="text-align : center"> <font color="purple" size=7>1. STANDARDIZATION</font></h1>

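## <font color="dark blue">WHAT IS STANDARDIZATION?</font>
- Standardization is a feature scaling technique that transforms each feature to have a `Mean` of __0__ and a `Standard Deviation` of __1__.
- Because every feature is centered and expressed in units of its own standard deviation, features measured in very different units (e.g., `Age` in years vs. `EstimatedSalary` in currency) become directly comparable.
- Unlike Min-Max scaling, it does not bound the values to a fixed range.
- It is also known as __Z-Score Normalization__.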

## <font color="blue">FORMULA</font>
$$\large X_i' = \frac{X_i-\mu}{\sigma}$$

where:
- $X_i \rightarrow$ Observation
- $\mu \rightarrow$ Mean of the feature
- $\sigma \rightarrow$ Standard deviation of the feature
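The formula can be checked by hand. The sketch below applies it with NumPy to a few `Age` values from the dataset preview and compares the result with scikit-learn's `StandardScaler`. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), so the manual version must do the same for the two to agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few Age values from the dataset preview (toy sample, one feature column)
ages = np.array([[19.0], [35.0], [26.0], [27.0], [19.0]])

mu = ages.mean(axis=0)
sigma = ages.std(axis=0)          # population std (ddof=0), as StandardScaler uses
z_manual = (ages - mu) / sigma    # X_i' = (X_i - mu) / sigma

z_sklearn = StandardScaler().fit_transform(ages)

print(np.allclose(z_manual, z_sklearn))  # True
print(z_manual.mean().round(6), z_manual.std().round(6))
```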

## <font color="blue">CODE</font>

In [3]:
# IMPORTING LIBRARIES
import pandas as pd
import numpy as np

In [6]:
# GATHERING DATA
df=pd.read_csv(r"D:\DUDUL DS\CAMPUSX\ML\3. ML TECHNIQUES\3. FEATURE SCALING\2. STANDARDIZATION\Social_Network_Ads (1).csv")

In [7]:
df

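|     | User ID  | Gender | Age | EstimatedSalary | Purchased |
|-----|----------|--------|-----|-----------------|-----------|
| 0   | 15624510 | Male   | 19  | 19000           | 0         |
| 1   | 15810944 | Male   | 35  | 20000           | 0         |
| 2   | 15668575 | Female | 26  | 43000           | 0         |
| 3   | 15603246 | Female | 27  | 57000           | 0         |
| 4   | 15804002 | Male   | 19  | 76000           | 0         |
| ... | ...      | ...    | ... | ...             | ...       |
| 395 | 15691863 | Female | 46  | 41000           | 1         |
| 396 | 15706071 | Male   | 51  | 23000           | 1         |
| 397 | 15654296 | Female | 50  | 20000           | 1         |
| 398 | 15755018 | Male   | 36  | 33000           | 0         |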


In [8]:
df=df.iloc[:,2:]

In [9]:
df

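|     | Age | EstimatedSalary | Purchased |
|-----|-----|-----------------|-----------|
| 0   | 19  | 19000           | 0         |
| 1   | 35  | 20000           | 0         |
| 2   | 26  | 43000           | 0         |
| 3   | 27  | 57000           | 0         |
| 4   | 19  | 76000           | 0         |
| ... | ... | ...             | ...       |
| 395 | 46  | 41000           | 1         |
| 396 | 51  | 23000           | 1         |
| 397 | 50  | 20000           | 1         |
| 398 | 36  | 33000           | 0         |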


In [10]:
# INDEPENDENT FEATURES
X=df.drop(["Purchased"],axis=1)

In [11]:
# TARGET FEATURE
y=df["Purchased"]

In [12]:
# SPLIT THE DATA
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

In [13]:
x_train.shape

(280, 2)

In [14]:
x_test.shape

(120, 2)

In [15]:
# STANDARDIZATION

from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()

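# Fit the scaler on the training set only: it learns each column's mean (mean_) and standard deviation (scale_)
scaler.fit(x_train)

# Transform both sets with the training parameters, so no information from the test set leaks into preprocessing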
x_train_scaled=scaler.transform(x_train)
x_test_scaled=scaler.transform(x_test)

In [16]:
scaler.mean_

array([3.78642857e+01, 6.98071429e+04])

In [17]:
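# Convert the NumPy arrays back to pandas DataFrames, keeping the column names and the original row index
x_train_scaled=pd.DataFrame(x_train_scaled,columns=x_train.columns,index=x_train.index)
x_test_scaled=pd.DataFrame(x_test_scaled,columns=x_test.columns,index=x_test.index)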


<h1 style="text-align : center"> <font color="purple" size=7>2. NORMALIZATION</font></h1>

## <font color="dark blue">WHAT IS NORMALIZATION?</font>
- Normalization is a technique often applied as part of data preparation for machine learning.
- Its goal is to bring the values of numeric columns in the dataset onto a common scale without distorting the relative differences between values or losing information.

![NORMALIZATION.png](attachment:NORMALIZATION.png)

## <font color="dark blue">WHAT IS MINMAX SCALING?</font>
- It is a feature scaling technique that rescales each feature to the range __0__ to __1__.
- For each numerical feature, it first finds the minimum and maximum values (learned from the training data) and then applies the formula below.

## <font color="blue">FORMULA</font>
$$\large X_i' = \frac{X_i - X_{min}}{X_{max} - X_{min}}$$
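A quick check of the formula, assuming the first five `Alcohol` values from the wine preview as a toy sample: the manual NumPy computation and `MinMaxScaler` should agree, with the smallest value mapped to 0 and the largest to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First five Alcohol values from the wine preview (toy sample, one feature column)
alcohol = np.array([[14.23], [13.20], [13.16], [14.37], [13.24]])

x_min, x_max = alcohol.min(axis=0), alcohol.max(axis=0)
manual = (alcohol - x_min) / (x_max - x_min)   # X_i' = (X_i - X_min) / (X_max - X_min)

sk = MinMaxScaler().fit_transform(alcohol)

print(np.allclose(manual, sk))      # True
print(manual.min(), manual.max())   # 0.0 1.0
```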

## <font color="blue">CODE</font>

In [18]:
df=pd.read_csv(r"D:\DUDUL DS\CAMPUSX\ML\3. ML TECHNIQUES\3. FEATURE SCALING\1. NORMALIZATION\wine_data (1).csv",header=None,usecols=[0,1,2])
df.columns=["Class label","Alcohol","Malic Acid"]

In [19]:
df

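|     | Class label | Alcohol | Malic Acid |
|-----|-------------|---------|------------|
| 0   | 1           | 14.23   | 1.71       |
| 1   | 1           | 13.20   | 1.78       |
| 2   | 1           | 13.16   | 2.36       |
| 3   | 1           | 14.37   | 1.95       |
| 4   | 1           | 13.24   | 2.59       |
| ... | ...         | ...     | ...        |
| 173 | 3           | 13.71   | 5.65       |
| 174 | 3           | 13.40   | 3.91       |
| 175 | 3           | 13.27   | 4.28       |
| 176 | 3           | 13.17   | 2.59       |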


In [20]:
X=df.drop(["Class label"],axis=1)
y=df["Class label"]

In [21]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

In [22]:
x_train.shape

(124, 2)

In [23]:
x_test.shape

(54, 2)

In [24]:
from sklearn.preprocessing import MinMaxScaler
mms=MinMaxScaler()

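# Learn each column's minimum and maximum (data_min_, data_max_) from the training set only
mms.fit(x_train)

# Rescale both sets with the training min/max
x_train_scaled=mms.transform(x_train)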
x_test_scaled=mms.transform(x_test)

In [25]:
x_train_scaled=pd.DataFrame(x_train_scaled,columns=x_train.columns,index=x_train.index)
x_test_scaled=pd.DataFrame(x_test_scaled,columns=x_test.columns,index=x_test.index)
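Because the scaler learns the minimum and maximum from the training set only, test values outside that range land below 0 or above 1. The sketch below uses a hypothetical one-feature split to show this; the `clip=True` option (available in scikit-learn 0.24 and later) forces outputs back into the range if that is required.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train/test split of a single feature
train = np.array([[13.2], [13.5], [14.0]])
test = np.array([[12.8], [14.4]])   # both values fall outside the training min/max

mms = MinMaxScaler().fit(train)
train_scaled = mms.transform(train).ravel()
test_scaled = mms.transform(test).ravel()
print(train_scaled)   # training values map onto [0, 1]
print(test_scaled)    # one value below 0, one above 1

# clip=True clips transformed values to the feature range
clipped = MinMaxScaler(clip=True).fit(train).transform(test).ravel()
print(clipped)        # [0. 1.]
```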

## <font color="dark blue">WHAT IS MEAN NORMALIZATION?</font>
- It is a feature scaling technique that centers each feature at 0 and divides by its range, so the values lie between -1 and 1.
- It is a rarely used technique.
- scikit-learn has no dedicated class for it, so it has to be computed manually.

## <font color="blue">FORMULA</font>
$$\large X_i' = \frac{X_i - X_{mean}}{X_{max} - X_{min}}$$
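Since scikit-learn has no built-in class, here is a minimal sketch of mean normalization as a custom transformer. `MeanNormalizer` is a hypothetical name; it learns the mean and range in `fit` (so the training statistics can be reused on test data) and guards against constant columns, which is a design choice of this sketch rather than part of the formula.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanNormalizer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: X' = (X - mean) / (max - min), statistics learned in fit()."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        data_range = X.max(axis=0) - X.min(axis=0)
        # Guard against constant columns (range 0) to avoid division by zero
        self.range_ = np.where(data_range == 0, 1.0, data_range)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return (X - self.mean_) / self.range_

# Alcohol and Malic Acid values from the wine preview (toy training sample)
X_train = np.array([[14.23, 1.71], [13.20, 1.78], [13.16, 2.36],
                    [14.37, 1.95], [13.24, 2.59]])

X_scaled = MeanNormalizer().fit_transform(X_train)   # fit_transform comes from TransformerMixin
print(X_scaled.round(3))
print(X_scaled.mean(axis=0).round(6))                # each column is centered at 0
```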

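## <font color="dark blue">WHAT IS MAX ABSOLUTE SCALING?</font>
- It is a feature scaling technique that divides each feature by its maximum absolute value, so the values lie between -1 and 1.
- It does not shift or center the data, so zero entries stay zero.
- It is also a rarely used technique.
- It is mainly used on sparse datasets, where centering would destroy sparsity.

## <font color="blue">FORMULA</font>
$$\large X_i' = \frac{X_i}{\max\left(|X|\right)}$$

where $\max\left(|X|\right)$ is the largest absolute value in the feature.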
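The sketch below, on a small made-up sparse matrix, shows two properties of `MaxAbsScaler`: the divisor is the largest absolute value (8 for the first column, even though that column's maximum is 2), and the output stays sparse with the same number of stored non-zero entries, because nothing is subtracted.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical sparse matrix: mostly zeros; column 0's largest magnitude is negative
X = sparse.csr_matrix(np.array([[0.0, 4.0],
                                [-8.0, 0.0],
                                [2.0, 0.0],
                                [0.0, -1.0]]))

mas = MaxAbsScaler().fit(X)
X_scaled = mas.transform(X)

print(mas.max_abs_)                # [8. 4.]
print(sparse.issparse(X_scaled))   # True: the output stays sparse
print(X.nnz == X_scaled.nnz)       # True: zeros are not shifted
print(X_scaled.toarray())
```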

## <font color="blue">CODE</font>

In [27]:
df=pd.read_csv(r"D:\DUDUL DS\CAMPUSX\ML\3. ML TECHNIQUES\3. FEATURE SCALING\1. NORMALIZATION\wine_data (1).csv",header=None,usecols=[0,1,2])
df.columns=["Class label","Alcohol","Malic Acid"]

In [28]:
df

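|     | Class label | Alcohol | Malic Acid |
|-----|-------------|---------|------------|
| 0   | 1           | 14.23   | 1.71       |
| 1   | 1           | 13.20   | 1.78       |
| 2   | 1           | 13.16   | 2.36       |
| 3   | 1           | 14.37   | 1.95       |
| 4   | 1           | 13.24   | 2.59       |
| ... | ...         | ...     | ...        |
| 173 | 3           | 13.71   | 5.65       |
| 174 | 3           | 13.40   | 3.91       |
| 175 | 3           | 13.27   | 4.28       |
| 176 | 3           | 13.17   | 2.59       |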


In [29]:
X=df.drop(["Class label"],axis=1)
y=df["Class label"]

In [30]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

In [31]:
x_train.shape

(124, 2)

In [32]:
x_test.shape

(54, 2)

In [33]:
from sklearn.preprocessing import MaxAbsScaler
mas=MaxAbsScaler()

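# Learn each column's maximum absolute value (max_abs_) from the training set only
mas.fit(x_train)

# Divide both sets by the training max-abs values
x_train_scaled=mas.transform(x_train)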
x_test_scaled=mas.transform(x_test)

In [34]:
x_train_scaled=pd.DataFrame(x_train_scaled,columns=x_train.columns,index=x_train.index)
x_test_scaled=pd.DataFrame(x_test_scaled,columns=x_test.columns,index=x_test.index)

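## <font color="dark blue">WHAT IS ROBUST SCALING?</font>
- It is a feature scaling technique that subtracts the median and divides by the interquartile range ($IQR = Q_3 - Q_1$, the spread between the 25th and 75th percentiles).
- Because the median and the IQR are barely affected by extreme values, it is robust to outliers.

## <font color="blue">FORMULA</font>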
$$\large X_i' = \frac{X_i-X_{median}}{IQR}$$
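Here is a minimal check of the formula on a hypothetical feature with one extreme outlier. The manual median/IQR computation should match `RobustScaler` (default `quantile_range=(25.0, 75.0)`). Comparing with `StandardScaler` shows how the outlier squashes the remaining values together when the mean and standard deviation are used instead.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical feature with one extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
manual = (x - median) / (q3 - q1)          # X_i' = (X_i - X_median) / IQR

robust = RobustScaler().fit_transform(x)   # default quantile_range=(25.0, 75.0)
print(np.allclose(manual, robust))         # True

# With StandardScaler the outlier inflates the std, so the five inliers end up nearly identical
standard = StandardScaler().fit_transform(x)
print(robust[:5].ravel())
print(standard[:5].ravel())
```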

## <font color="blue">CODE</font>

In [35]:
df=pd.read_csv(r"D:\DUDUL DS\CAMPUSX\ML\3. ML TECHNIQUES\3. FEATURE SCALING\1. NORMALIZATION\wine_data (1).csv",header=None,usecols=[0,1,2])
df.columns=["Class label","Alcohol","Malic Acid"]

In [36]:
df

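|     | Class label | Alcohol | Malic Acid |
|-----|-------------|---------|------------|
| 0   | 1           | 14.23   | 1.71       |
| 1   | 1           | 13.20   | 1.78       |
| 2   | 1           | 13.16   | 2.36       |
| 3   | 1           | 14.37   | 1.95       |
| 4   | 1           | 13.24   | 2.59       |
| ... | ...         | ...     | ...        |
| 173 | 3           | 13.71   | 5.65       |
| 174 | 3           | 13.40   | 3.91       |
| 175 | 3           | 13.27   | 4.28       |
| 176 | 3           | 13.17   | 2.59       |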


In [37]:
X=df.drop(["Class label"],axis=1)
y=df["Class label"]

In [38]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

In [39]:
from sklearn.preprocessing import RobustScaler
rob=RobustScaler()
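# Learn each column's median (center_) and IQR (scale_) from the training set only
rob.fit(x_train)

# Apply the training median/IQR to both sets
x_train_scaled=rob.transform(x_train)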
x_test_scaled=rob.transform(x_test)

In [40]:
x_train_scaled=pd.DataFrame(x_train_scaled,columns=x_train.columns,index=x_train.index)
x_test_scaled=pd.DataFrame(x_test_scaled,columns=x_test.columns,index=x_test.index)

In [41]:
rob.n_features_in_

2