# Feature Scaling: Introduction
Feature scaling is a data preprocessing step applied to the independent variables (features) of a dataset. It brings features onto a comparable scale, either a fixed range or a common mean and spread. It can also speed up computation, for example by helping gradient-descent-based training converge faster.

## Why and Where to Apply Feature Scaling? 

- Real-world datasets often contain features that vary widely in magnitude, units, and range. Scale features when their original scale is irrelevant or misleading; do not scale them when the scale itself is meaningful.

- Algorithms based on Euclidean distance, such as k-nearest neighbors (KNN), are sensitive to feature magnitudes. Feature scaling lets all features contribute comparably to the distance.

- In other words, if one feature has a much larger scale than the others, it dominates the Euclidean distance and should be scaled.
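A minimal sketch of this effect, using the `Salary` and `YearsExperience` values of rows 0, 1 and 2 shown by `df.head()` below. For illustration only, the columns are z-scored with the statistics of just these three rows; a real pipeline would use training-set statistics.

```python
import numpy as np

# Salary and YearsExperience of rows 0, 1 and 2 from df.head()
points = np.array([
    [39343.0, 1.1],
    [46205.0, 1.3],
    [37731.0, 1.5],
])


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Unscaled: the distance is almost exactly the salary gap alone
raw_d = euclidean(points[0], points[1])
salary_gap = abs(points[0, 0] - points[1, 0])

# Z-score each column (illustration only: statistics of these three rows)
z = (points - points.mean(axis=0)) / points.std(axis=0)
scaled_d = euclidean(z[0], z[1])

# Share of the squared distance contributed by YearsExperience
raw_share = (points[0, 1] - points[1, 1]) ** 2 / raw_d ** 2
scaled_share = (z[0, 1] - z[1, 1]) ** 2 / scaled_d ** 2

print(f"raw distance {raw_d:.4f} vs salary gap {salary_gap:.1f}")
print(f"YearsExperience share: raw {raw_share:.2e}, scaled {scaled_share:.2f}")
```

Before scaling, `YearsExperience` contributes a negligible fraction of the squared distance; after z-scoring, it contributes a substantial part.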

## The two most common types of feature scaling
- Standardization (z-score)
- Normalization (min-max)

## Standardization

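Standardization transforms each feature so that it has mean (μ) = 0 and standard deviation (σ) = 1, the same mean and standard deviation as a standard normal distribution. It only shifts and rescales the values; it does not change the shape of the distribution, so a skewed feature stays skewed.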

![image.png](attachment:be20f8a8-e716-4778-b89a-6ba941e15ab7.png)
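The z-score transform is z = (x - μ) / σ. A minimal check, using the `Salary` values of the five rows shown by `df.head()` below, that computing it by hand with NumPy matches scikit-learn's `StandardScaler`. Note that `StandardScaler` divides by the population standard deviation (NumPy's default `ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Salary values of the five rows shown by df.head(), as a single-column 2-D array
salary = np.array([[39343.0], [46205.0], [37731.0], [43525.0], [39891.0]])

# z = (x - mean) / std, using the population standard deviation (ddof=0)
z_manual = (salary - salary.mean(axis=0)) / salary.std(axis=0)

sc = StandardScaler()
z_sklearn = sc.fit_transform(salary)

print(np.allclose(z_manual, z_sklearn))  # True
print(z_sklearn.mean(), z_sklearn.std())  # approximately 0.0 and 1.0
```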


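## Note: Perform feature scaling after splitting the dataset

Fit the scaler on the training data only, then use the same fitted parameters (for standardization, the training mean and standard deviation) to transform the test data. Fitting the scaler on the full dataset before splitting would let statistics of the test set leak into training.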
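The fit-on-train, transform-on-test rule can be sketched with a few `Salary`/`YearsExperience` rows taken from the `x_train` and `x_test` outputs below. `transform()` simply applies the stored training statistics `mean_` and `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few (Salary, YearsExperience) rows from the x_train and x_test outputs
train = np.array([[37731.0, 1.5], [101302.0, 7.9], [60150.0, 3.0], [56957.0, 4.0]])
test = np.array([[46205.0, 1.3], [83088.0, 5.3]])

sc = StandardScaler()
train_scaled = sc.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = sc.transform(test)        # reuses the training mean_ and scale_

# transform() is (x - mean_) / scale_ with the stored training statistics
manual = (test - sc.mean_) / sc.scale_
print(np.allclose(test_scaled, manual))  # True
print(sc.mean_)  # column means of the training rows only
```

Because the test rows are scaled with training statistics, their mean and standard deviation are generally not exactly 0 and 1.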

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv(r'C:\Users\admin\Downloads\Dataset_03.csv')

In [3]:
df.head()

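|   | Australia | Canada | Dubai | USA | Salary | YearsExperience | Purchased |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 39343 | 1.1 | 0 |
| 1 | 0 | 1 | 0 | 0 | 46205 | 1.3 | 1 |
| 2 | 0 | 1 | 0 | 0 | 37731 | 1.5 | 0 |
| 3 | 0 | 1 | 0 | 0 | 43525 | 2.0 | 0 |
| 4 | 0 | 0 | 0 | 1 | 39891 | 2.2 | 0 |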


In [4]:
df.columns

Index(['Australia', 'Canada', 'Dubai', 'USA', 'Salary', 'YearsExperience',
       'Purchased'],
      dtype='object')

In [5]:
# Separate the target (Purchased) from the features
y = df['Purchased']
x = df[['Australia', 'Canada', 'Dubai', 'USA', 'Salary', 'YearsExperience']]

In [6]:
from sklearn.model_selection import train_test_split
# test_size defaults to 0.25: the 30 rows split into 22 training and 8 test rows
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2529)

In [7]:
x_train

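|    | Australia | Canada | Dubai | USA | Salary | YearsExperience |
|---|---|---|---|---|---|---|
| 2 | 0 | 1 | 0 | 0 | 37731 | 1.5 |
| 22 | 1 | 0 | 0 | 0 | 101302 | 7.9 |
| 6 | 0 | 1 | 0 | 0 | 60150 | 3.0 |
| 12 | 0 | 0 | 1 | 0 | 56957 | 4.0 |
| 28 | 0 | 0 | 0 | 1 | 122391 | 10.3 |
| 7 | 1 | 0 | 0 | 0 | 54445 | 3.2 |
| 19 | 0 | 1 | 0 | 0 | 93940 | 6.0 |
| 10 | 0 | 0 | 1 | 0 | 63218 | 3.9 |
| 29 | 0 | 1 | 0 | 0 | 121872 | 10.5 |
| 9 | 0 | 0 | 1 | 0 | 57189 | 3.7 |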


In [8]:
x_test

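|    | Australia | Canada | Dubai | USA | Salary | YearsExperience |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 | 46205 | 1.3 |
| 17 | 0 | 1 | 0 | 0 | 83088 | 5.3 |
| 16 | 0 | 0 | 0 | 1 | 66029 | 5.1 |
| 23 | 1 | 0 | 0 | 0 | 113812 | 8.2 |
| 25 | 1 | 0 | 0 | 0 | 105582 | 9.0 |
| 3 | 0 | 1 | 0 | 0 | 43525 | 2.0 |
| 20 | 0 | 1 | 0 | 0 | 91738 | 6.8 |
| 26 | 0 | 0 | 1 | 0 | 116969 | 9.5 |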


In [9]:
# Equivalent alternative: select Salary and YearsExperience by position (columns 4 and 5)
# from sklearn.preprocessing import StandardScaler
# sc = StandardScaler()
# x_train.iloc[:,4:] = sc.fit_transform(x_train.iloc[:,4:])

In [10]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit on the training data only and replace the two numeric columns with their z-scores
x_train[['Salary','YearsExperience']] = sc.fit_transform(x_train[['Salary','YearsExperience']])

In [23]:
x_train.describe()

After standardization, `Salary` and `YearsExperience` each have a mean of about 0 and a standard deviation of about 1, while the one-hot country columns are unchanged. Because `describe()` reports the sample standard deviation (ddof=1) and `StandardScaler` uses the population standard deviation (ddof=0), the `std` row shows about √(22/21) ≈ 1.0235 for these two columns rather than exactly 1.


In [12]:
# Reuse the mean and std learned from x_train; do not refit on the test data
x_test[['Salary','YearsExperience']] = sc.transform(x_test[['Salary','YearsExperience']])


In [24]:
x_test.describe()

The test columns are transformed with the training mean and standard deviation, so the mean and standard deviation of `Salary` and `YearsExperience` in `x_test` are close to, but generally not exactly, 0 and 1. The one-hot country columns are unchanged.


## Normalization (Min-Max Scaling)

![image.png](attachment:08fc9e06-1283-4475-aec6-61266691f9da.png)

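This approach rescales each feature to the range [0, 1] using the minimum and maximum learned from the training data. Test values outside the training range therefore map to values below 0 or above 1.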
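A minimal sketch of the min-max formula x' = (x - min) / (max - min), checked against scikit-learn's `MinMaxScaler`. The training values are `Salary` rows from the `x_train` output above; the test value 130000 is a hypothetical value chosen to lie above the training maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Salary values from the x_train output, as a single-column 2-D array
train = np.array([[37731.0], [101302.0], [60150.0], [122391.0]])
# 46205 comes from x_test; 130000 is hypothetical, above the training max
test = np.array([[46205.0], [130000.0]])

# x' = (x - min) / (max - min), with min and max taken from the training data
x_min, x_max = train.min(axis=0), train.max(axis=0)
manual = (train - x_min) / (x_max - x_min)

mms = MinMaxScaler()
train_scaled = mms.fit_transform(train)
test_scaled = mms.transform(test)

print(np.allclose(train_scaled, manual))  # True
print(test_scaled.ravel())  # first value lies in [0, 1]; second is above 1
```

`MinMaxScaler` does not clip by default, so out-of-range test values are kept rather than forced into [0, 1].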

In [14]:
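from sklearn.preprocessing import MinMaxScaler
# Re-split with the same random_state to start again from the unscaled features
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2529)
mms = MinMaxScaler()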

In [17]:
# Learn min and max from the training data, then rescale to [0, 1]
x_train[['Salary', 'YearsExperience']] = mms.fit_transform(x_train[['Salary', 'YearsExperience']])


In [21]:
x_train.describe()

|       | Australia | Canada | Dubai | USA | Salary | YearsExperience |
|---|---|---|---|---|---|---|
| count | 22.0 | 22.0 | 22.0 | 22.0 | 22.0 | 22.0 |
| mean | 0.227273 | 0.227273 | 0.318182 | 0.227273 | 0.42043 | 0.425532 |
| std | 0.428932 | 0.428932 | 0.476731 | 0.428932 | 0.31898 | 0.297075 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.224306 | 0.223404 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.288607 | 0.31383 |
| 75% | 0.0 | 0.0 | 1.0 | 0.0 | 0.702324 | 0.609043 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


In [19]:
# Apply the training min and max to the test data (no refitting)
x_test[['Salary','YearsExperience']] = mms.transform(x_test[['Salary','YearsExperience']])


In [25]:
x_test.describe()

|       | Australia | Canada | Dubai | USA | Salary | YearsExperience |
|---|---|---|---|---|---|---|
| count | 8.0 | 8.0 | 8.0 | 8.0 | 8.0 | 8.0 |
| mean | 0.25 | 0.5 | 0.125 | 0.125 | 0.539068 | 0.510638 |
| std | 0.46291 | 0.534522 | 0.353553 | 0.353553 | 0.342595 | 0.326758 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.068438 | 0.021277 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.275715 | 0.343085 |
| 50% | 0.0 | 0.5 | 0.0 | 0.0 | 0.586841 | 0.526596 |
| 75% | 0.25 | 1.0 | 0.0 | 0.0 | 0.825756 | 0.776596 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 0.935956 | 0.893617 |
