# Feature Transformation and Scaling Techniques to Boost Your Model Performance

Reference Article: <a href = "https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/"> A Must Read Article </a>


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


df = pd.DataFrame({
    'Income': [15000, 1800, 120000, 10000],
    'Age': [25, 18, 42, 51],
    'Department': ['HR','Legal','Marketing','Management']
})

In [2]:
df.head()

|   | Income | Age | Department |
|---|--------|-----|------------|
| 0 | 15000  | 25  | HR         |
| 1 | 1800   | 18  | Legal      |
| 2 | 120000 | 42  | Marketing  |
| 3 | 10000  | 51  | Management |


In [3]:
df_scaled = df.copy()
col_names = ['Income', 'Age']
features = df_scaled[col_names]

### Applying Min-Max Scaler

The MinMax scaler is one of the simplest scalers to understand. It rescales each feature independently so that its values lie between 0 and 1: the feature's minimum maps to 0 and its maximum maps to 1. The formula for the scaled value is:

$$\large x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

Although [0, 1] is the default range, we can also define our own minimum and maximum for the output. In scikit-learn this is set with the `feature_range` parameter of `MinMaxScaler`.
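As a sanity check, the formula above can be applied by hand with NumPy and compared against scikit-learn's `MinMaxScaler`. This sketch rebuilds the numeric `Income` and `Age` columns from this notebook; the `feature_range=(-1, 1)` call is only an illustrative choice of custom range.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same numeric columns as the toy DataFrame in this notebook
df = pd.DataFrame({
    'Income': [15000, 1800, 120000, 10000],
    'Age': [25, 18, 42, 51],
})
X = df[['Income', 'Age']].to_numpy(dtype=float)

# Manual min-max scaling, column by column: (x - min) / (max - min)
x_min = X.min(axis=0)
x_max = X.max(axis=0)
manual = (X - x_min) / (x_max - x_min)

# scikit-learn's MinMaxScaler with the default (0, 1) range
sk = MinMaxScaler().fit_transform(X)
print(np.allclose(manual, sk))  # True

# A custom target range: each column now spans [-1, 1] instead of [0, 1]
custom = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(custom.min(axis=0), custom.max(axis=0))
```

The first row of `manual` reproduces the notebook output above: (15000 - 1800) / (120000 - 1800) ≈ 0.111675 for `Income`.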

In [4]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [5]:
df_scaled[col_names] = scaler.fit_transform(features.values)

In [6]:
df_scaled.head()

|   | Income   | Age      | Department |
|---|----------|----------|------------|
| 0 | 0.111675 | 0.212121 | HR         |
| 1 | 0.0      | 0.0      | Legal      |
| 2 | 1.0      | 0.727273 | Marketing  |
| 3 | 0.069374 | 1.0      | Management |


### Applying Standard Scaler

For each feature, the Standard Scaler shifts and rescales the values so that the mean is 0 and the standard deviation is 1 (and therefore the variance is also 1).

$$\large x_{scaled} = \frac{x - \text{mean}}{\text{standard deviation}}$$
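The formula can be checked the same way. This sketch standardizes the notebook's `Income` and `Age` values by hand and compares the result with `StandardScaler`. One detail worth knowing: `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default for `np.std`, so the two agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Income and Age columns from this notebook's toy data
X = np.array([
    [15000, 25],
    [1800, 18],
    [120000, 42],
    [10000, 51],
], dtype=float)

# Manual standardization per column: (x - mean) / standard deviation.
# np.std uses ddof=0 (population std) by default, matching StandardScaler.
mean = X.mean(axis=0)
std = X.std(axis=0)
manual = (X - mean) / std

scaler = StandardScaler()
sk = scaler.fit_transform(X)

print(np.allclose(manual, sk))          # True
print(np.allclose(sk.mean(axis=0), 0))  # True: each column now has mean 0
print(np.allclose(sk.std(axis=0), 1))   # True: each column now has std 1
```

The fitted statistics are stored on the scaler as `scaler.mean_` and `scaler.scale_`, so the same transformation can later be applied to new data with `scaler.transform`.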

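However, Standard Scaler works best when the variable is roughly normally distributed: it shifts and rescales the values but does not change the shape of the distribution, so any skewness remains after scaling. Thus, if the variables are not normally distributed, we can

- either choose a different scaler,
- or first convert the variables to a (near-)normal distribution and then apply this scaler.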
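The second option can be sketched as a two-step pipeline: a power transform to make the data more Gaussian-like, followed by standardization. This sketch assumes scikit-learn's `PowerTransformer` with the Box-Cox method (which requires strictly positive data) as the normalizing step; that is one possible choice, not necessarily the one the reference article uses. The log-normal sample and the `skewness` helper are hypothetical, added only for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical right-skewed, strictly positive data (log-normal),
# standing in for a feature like Income
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Step 1: Box-Cox power transform to bring the distribution closer to normal.
#         standardize=False so that scaling happens explicitly in step 2.
# Step 2: StandardScaler to get mean 0 and standard deviation 1.
pipe = make_pipeline(
    PowerTransformer(method='box-cox', standardize=False),
    StandardScaler(),
)
X_out = pipe.fit_transform(X)

def skewness(a):
    """Hypothetical helper: sample skewness (third standardized moment)."""
    a = np.asarray(a, dtype=float).ravel()
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

print(f"skew before: {skewness(X):.2f}, after: {skewness(X_out):.2f}")
```

For data that can be zero or negative, `method='yeo-johnson'` (the `PowerTransformer` default) is the usual alternative; a plain `np.log1p` transform is a simpler option for non-negative, right-skewed features.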