# FEATURE SCALING

# Why Should We Use Feature Scaling?

Some machine learning algorithms are sensitive to feature scaling while others are virtually invariant to it.

## Gradient Descent Based Algorithms

Machine learning algorithms such as linear regression, logistic regression, and neural networks, which use gradient descent as an optimization technique, work best when the data is scaled.

In the gradient descent update formula, each parameter's update is multiplied by the corresponding feature value X, so the magnitude of the feature affects the step size. Features with different ranges therefore produce different step sizes for their parameters. To ensure that gradient descent moves smoothly towards the minimum and that the parameters are updated at a comparable rate for all the features, we scale the data before feeding it to the model.
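As an illustration, the batch gradient descent update for a linear regression parameter can be written as follows. The notation is an assumption for this sketch: $\theta_j$ is the weight of feature $j$, $\alpha$ is the learning rate, $m$ is the number of samples, and $h_\theta$ is the model's prediction.

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The feature value $x_j^{(i)}$ multiplies every term in the sum, so a feature with a large range produces proportionally larger updates for its weight.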

Having features on a similar scale can help gradient descent converge more quickly towards the minimum.
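The effect on convergence can be sketched with a small NumPy experiment. Everything here is an assumption made for illustration: the synthetic data, the tolerance, the iteration cap, and the choice of learning rate (one over the largest eigenvalue of the loss curvature, which is a safe step size for least squares).

```python
import numpy as np

def gd_iterations(X, y, max_iter=20_000, tol=1e-6):
    """Batch gradient descent for least squares.

    Returns the iteration at which the gradient norm drops below tol,
    or max_iter if that never happens.
    """
    m = len(y)
    # Safe step size: 1 / (largest eigenvalue of X^T X / m).
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / m).max()
    theta = np.zeros(X.shape[1])
    for i in range(1, max_iter + 1):
        grad = X.T @ (X @ theta - y) / m
        if np.linalg.norm(grad) < tol:
            return i
        theta -= lr * grad
    return max_iter

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)       # small-range feature
x2 = rng.uniform(0, 1000, 200)    # large-range feature
y = 3.0 * x1 + 0.05 * x2 + 2.0    # noise-free synthetic target

ones = np.ones_like(x1)           # intercept column
X_raw = np.column_stack([ones, x1, x2])
X_std = np.column_stack([
    ones,
    (x1 - x1.mean()) / x1.std(),
    (x2 - x2.mean()) / x2.std(),
])

iters_raw = gd_iterations(X_raw, y)
iters_std = gd_iterations(X_std, y)
print("unscaled:", iters_raw, "iterations; standardized:", iters_std, "iterations")
```

On the unscaled data the large-range feature forces a tiny learning rate, so the other weights barely move and the run is expected to hit the iteration cap, while the standardized version should converge in a handful of steps.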

## Distance-Based Algorithms

Distance-based algorithms like KNN, K-means, and SVM are most affected by the range of features. This is because behind the scenes they use distances between data points to determine their similarity, so a feature with a large range can dominate the distance and drown out the others.
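A small sketch of this effect, using three hypothetical customers described by age and income. The feature bounds used for scaling are assumed values chosen for the illustration.

```python
import math

# Hypothetical customers described by (age in years, income in dollars).
a = (25, 50_000)
b = (60, 50_500)   # very different age, similar income
c = (26, 52_000)   # almost the same age, somewhat higher income

def min_max(point, lows, highs):
    """Rescale each coordinate to [0, 1] using the given per-feature bounds."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, lows, highs))

# Assumed feature ranges for this illustration.
lows, highs = (18, 20_000), (70, 120_000)

raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)
a_s, b_s, c_s = (min_max(p, lows, highs) for p in (a, b, c))
scaled_ab, scaled_ac = math.dist(a_s, b_s), math.dist(a_s, c_s)

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

Before scaling, income dominates the Euclidean distance, so `b` looks closer to `a` than `c` does despite the 35-year age gap. After scaling, both features contribute comparably and `c` becomes the nearer neighbor.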

## Tree-Based Algorithms

Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features. Think about it: a decision tree splits a node based on a single feature at a time, choosing the feature and threshold that most increase the homogeneity of the node. Rescaling a feature does not change the ordering of its values, so the same split remains available, and the split on one feature is not influenced by the scale of the other features.
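To make this concrete, here is a minimal sketch of a single-feature split search. It is a simplified stand-in for what a regression tree does at one node, not the code of any particular library, and the feature and target values are hypothetical.

```python
def sse(targets):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_split_left_group(feature, targets):
    """Return the sample indices that the lowest-SSE threshold split sends left."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best_cost, best_left = float("inf"), None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
        if cost < best_cost:
            best_cost, best_left = cost, set(left)
    return best_left

income = [20_000, 35_000, 50_000, 80_000, 120_000]   # hypothetical feature values
spend = [1.0, 1.2, 1.1, 3.9, 4.2]                    # hypothetical target values

lo, hi = min(income), max(income)
income_scaled = [(v - lo) / (hi - lo) for v in income]

left_raw = best_split_left_group(income, spend)
left_scaled = best_split_left_group(income_scaled, spend)
print(left_raw, left_scaled)
```

The split quality depends only on which samples end up on each side, and scaling preserves the order of the feature values, so both calls select the same group of samples.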

# What is Normalization?

Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max Scaling. The formula is:

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Here, $X_{min}$ and $X_{max}$ are the minimum and maximum values of the feature, respectively.
<br><br>
When the value of X is the minimum value in the column, the numerator will be 0, and hence X’ is 0.<br>
On the other hand, when the value of X is the maximum value in the column, the numerator is equal to the denominator, and thus the value of X’ is 1.<br>
If the value of X is between the minimum and the maximum value, then the value of X’ is between 0 and 1.<br>
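The three cases above can be checked with a minimal plain-Python sketch of the min-max rule: subtract the column minimum, then divide by the column range. The function name and sample values are illustrative only.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The minimum maps to 0, the maximum maps to 1, and everything else lands in between.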
 

# What is Standardization?

Standardization is another feature scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.
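In formula terms, standardization computes $X' = \frac{X - \mu}{\sigma}$, where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation. A minimal plain-Python sketch follows. The function name and sample values are illustrative, and the population standard deviation is used here, which matches what sklearn's StandardScaler does.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The sample has mean 5 and standard deviation 2, so the result has mean 0 and unit standard deviation.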

# Normalize or Standardize?

Normalization is a good choice when you know that your data does not follow a Gaussian distribution. It can be useful for algorithms that do not assume any particular distribution of the data, such as K-Nearest Neighbors and neural networks.<br>
Standardization, on the other hand, can be helpful when the data follows a Gaussian distribution, although this is not a strict requirement. Also, unlike normalization, standardization does not have a bounding range. So, even if you have outliers in your data, they are not forced into a fixed interval. Note, however, that outliers still influence the mean and standard deviation used for scaling.
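The difference in bounding behavior can be seen on a small hypothetical column that contains one outlier. This is only a sketch: the values are made up for the illustration.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical column; 100 is an outlier

# Min-max normalization: always bounded to [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization: zero mean, unit standard deviation, no fixed bounds.
mean, std = statistics.fmean(values), statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

print("normalized:  ", [round(v, 3) for v in normalized])
print("standardized:", [round(v, 3) for v in standardized])
```

The normalized values always stay inside [0, 1], with the outlier pinned at 1 and the nine ordinary values pushed below 0.1. The standardized values have no fixed bounds, and the outlier ends up roughly three standard deviations above the mean. Both transformations are linear rescalings, so neither removes the outlier or changes the relative spacing of the points.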

# Implementing Feature Scaling in Python

![1.png](attachment:1.png)

![2.webp](attachment:2.webp)

# Normalization using sklearn

To normalize your data, you need to import MinMaxScaler from the sklearn library and apply it to your dataset.

![3.png](attachment:3.png)

![4.png](attachment:4.png)
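A minimal text sketch of this step, assuming a small hypothetical numeric dataset rather than the one shown in the images. The scaler is fitted on the training data only, and the learned minimum and maximum are then reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric features, two columns per sample.
X_train = np.array([[9.3, 249.8], [5.9, 48.3], [17.5, 141.6], [19.2, 182.1]])
X_test = np.array([[8.9, 53.9], [13.6, 200.0]])

scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train)   # learn min/max from the training data
X_test_norm = scaler.transform(X_test)         # reuse the same min/max on the test data

print(X_train_norm)
print(X_test_norm)
```

Because the test data is scaled with the training minimum and maximum, test values can fall slightly outside the 0 to 1 range if they lie outside the range seen during training.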

# Standardization using sklearn

To standardize your data, you need to import StandardScaler from the sklearn library and apply it to your dataset.

![5.png](attachment:5.png)

![6.webp](attachment:6.webp)

You may have noticed that I applied standardization only to the numerical columns and not to the One-Hot encoded features. Standardizing the One-Hot encoded features would mean assigning a distribution to categorical features.

But why did I not do the same while normalizing the data? Because One-Hot encoded features take only the values 0 and 1, they are already in the range 0 to 1, so normalization does not change them.

The numerical features are now centered on the mean with a unit standard deviation. 
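A minimal sketch of standardizing only the numerical columns, assuming a hypothetical array with two numeric columns followed by one One-Hot encoded column. The column indices are an assumption of this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two numeric columns followed by one One-Hot encoded column.
X = np.array([
    [9.3, 249.8, 1.0],
    [5.9, 48.3, 0.0],
    [17.5, 141.6, 0.0],
    [19.2, 182.1, 1.0],
])
num_cols = [0, 1]   # indices of the numeric columns (assumed for this sketch)

X_std = X.copy()
# Standardize the numeric columns only; the One-Hot column is left untouched.
X_std[:, num_cols] = StandardScaler().fit_transform(X[:, num_cols])

print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```

The first two columns should come out with mean close to 0 and standard deviation close to 1, while the last column keeps its original 0/1 values.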

# Comparing unscaled, normalized and standardized data

![7.webp](attachment:7.webp)

# Applying Scaling to Machine Learning Algorithms

## K-Nearest Neighbours

As we saw earlier, KNN is a distance-based algorithm that is affected by the range of features. Let’s see how it performs on our data, before and after scaling:

![8.png](attachment:8.png)

![9.png](attachment:9.png)

You can see that scaling the features has brought down the RMSE score of our KNN model. Specifically, the normalized data performs slightly better than the standardized data.
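A comparison of this kind can be reproduced in text form with a small sketch. Note that it uses synthetic data constructed so that scaling matters (one informative small-range feature and one large-range noise feature), not the dataset from the images, so the numbers will differ from the ones shown above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data: x1 (range 0-1) drives the target, x2 (range 0-1000) is pure noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 500)
x2 = rng.uniform(0, 1000, 500)
X = np.column_stack([x1, x2])
y = 10 * x1 + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "unscaled": KNeighborsRegressor(),
    "normalized": make_pipeline(MinMaxScaler(), KNeighborsRegressor()),
    "standardized": make_pipeline(StandardScaler(), KNeighborsRegressor()),
}
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)   # pipelines fit the scaler on the training data only
    rmse[name] = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

Wrapping the scaler and the model in a pipeline keeps the scaling parameters tied to the training split, which avoids leaking information from the test data.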

## Support Vector Regressor

SVR is a distance-based algorithm. So let’s check out whether it works better with normalization or standardization:

![10.png](attachment:10.png)

![11.png](attachment:11.png)

We can see that scaling the features does bring down the RMSE score. And the standardized data has performed better than the normalized data.
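The same kind of check can be sketched for SVR. As an assumption of this example, it uses synthetic data (one informative small-range feature and one large-range noise feature) and the default SVR hyperparameters, so which scaler wins here says nothing about the dataset in the images.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVR

# Synthetic data: x1 (range 0-1) drives the target, x2 (range 0-1000) is pure noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 500)
x2 = rng.uniform(0, 1000, 500)
X = np.column_stack([x1, x2])
y = 10 * x1 + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "unscaled": SVR(),
    "normalized": make_pipeline(MinMaxScaler(), SVR()),
    "standardized": make_pipeline(StandardScaler(), SVR()),
}
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

With the default RBF kernel, the unscaled large-range feature dominates the kernel distance, so the unscaled model is expected to do clearly worse than either scaled variant on this data.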

## Decision Tree

We already know that a decision tree is invariant to feature scaling.

![12.png](attachment:12.png)

![13.png](attachment:13.png)

You can see that the RMSE score has not moved an inch on scaling the features.
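This behavior can also be checked with a small sketch on synthetic data. The data, the `max_depth=4` setting, and the fixed `random_state` are assumptions of this example, not the setup used for the images.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: x1 (range 0-1) drives the target, x2 (range 0-1000) is pure noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 500)
x2 = rng.uniform(0, 1000, 500)
X = np.column_stack([x1, x2])
y = 10 * x1 + rng.normal(0, 0.1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def make_tree():
    """Same tree settings for every variant so that only the scaling differs."""
    return DecisionTreeRegressor(max_depth=4, random_state=0)

models = {
    "unscaled": make_tree(),
    "normalized": make_pipeline(MinMaxScaler(), make_tree()),
    "standardized": make_pipeline(StandardScaler(), make_tree()),
}
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse[name]:.4f}")
```

Since scaling keeps the order of the values within each feature, the tree should find equivalent splits in all three cases, and the three RMSE values should match up to floating-point effects.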