In [1]:
# Import pandas and NumPy
import pandas as pd
import numpy as np

In [29]:
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

In [3]:
df

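|   | points | assists | rebounds |
|---|--------|---------|----------|
| 0 | 25 | 5 | 11 |
| 1 | 12 | 7 | 8 |
| 2 | 15 | 7 | 10 |
| 3 | 14 | 9 | 6 |
| 4 | 19 | 12 | 6 |
| 5 | 23 | 9 | 5 |
| 6 | 25 | 9 | 9 |
| 7 | 29 | 4 | 12 |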


In [4]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

In [7]:
df['points'].values

array([25, 12, 15, 14, 19, 23, 25, 29], dtype=int64)

# Min-Max Scaler

## What is Normalization?
Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.

X’ = (X - Xmin)/(Xmax - Xmin)

Here, X’ is the scaled value, and Xmax and Xmin are the maximum and minimum values of the feature, respectively.

- When X is the minimum value in the column, the numerator is 0, and hence X’ is 0.
- When X is the maximum value in the column, the numerator equals the denominator, and thus X’ is 1.
- When X lies between the minimum and the maximum value, X’ lies between 0 and 1.
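
As a quick check of the three cases above, the sketch below applies the formula to the `points` column with plain NumPy. The helper name `min_max_scale` is hypothetical and introduced only for illustration; it assumes the column is not constant, since the denominator would otherwise be zero.

```python
import numpy as np


def min_max_scale(x):
    """Rescale a 1-D array to [0, 1] using (X - Xmin) / (Xmax - Xmin).

    Hypothetical helper for illustration; assumes Xmax != Xmin.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


points = np.array([25, 12, 15, 14, 19, 23, 25, 29])
scaled = min_max_scale(points)

print(scaled.min())  # the minimum value (12) maps to 0.0
print(scaled.max())  # the maximum value (29) maps to 1.0
print(scaled)        # every other value falls between 0 and 1
```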

In [14]:
min_max_scaler = MinMaxScaler()

# Fit on all three columns and rescale each one to the range [0, 1]
x_after_min_max_scaler = min_max_scaler.fit_transform(df.values)
x_after_min_max_scaler

array([[0.76470588, 0.125     , 0.85714286],
       [0.        , 0.375     , 0.42857143],
       [0.17647059, 0.375     , 0.71428571],
       [0.11764706, 0.625     , 0.14285714],
       [0.41176471, 1.        , 0.14285714],
       [0.64705882, 0.625     , 0.        ],
       [0.76470588, 0.625     , 0.57142857],
       [1.        , 0.        , 1.        ]])

In [20]:
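# Alternatively, apply the min-max formula directly with pandas.
# list(...) takes a snapshot of the column names, since the loop adds new columns.
for col in list(df.columns):
    df[col + '_minmax'] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())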

In [21]:
df

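|   | points | assists | rebounds | points_minmax | assists_minmax | rebounds_minmax |
|---|--------|---------|----------|---------------|----------------|-----------------|
| 0 | 25 | 5 | 11 | 0.764706 | 0.125 | 0.857143 |
| 1 | 12 | 7 | 8 | 0.0 | 0.375 | 0.428571 |
| 2 | 15 | 7 | 10 | 0.176471 | 0.375 | 0.714286 |
| 3 | 14 | 9 | 6 | 0.117647 | 0.625 | 0.142857 |
| 4 | 19 | 12 | 6 | 0.411765 | 1.0 | 0.142857 |
| 5 | 23 | 9 | 5 | 0.647059 | 0.625 | 0.0 |
| 6 | 25 | 9 | 9 | 0.764706 | 0.625 | 0.571429 |
| 7 | 29 | 4 | 12 | 1.0 | 0.0 | 1.0 |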


## What is Standardization?
Standardization is another scaling technique, in which values are centered on the mean and rescaled to unit standard deviation. After the transformation, the attribute has a mean of zero and a standard deviation of one.

Here’s the formula for standardization:

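X’ = (X - Xmean)/STD

Here, X’ is the scaled value, and Xmean and STD are the mean and the standard deviation of the feature, respectively.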
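
A minimal sketch of this formula on the `points` column, assuming the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` follows. The helper name `standardize` is hypothetical and used only for illustration.

```python
import numpy as np


def standardize(x):
    """Center a 1-D array on its mean and divide by its standard deviation.

    Hypothetical helper for illustration; assumes the array is not constant.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # ndarray.std defaults to ddof=0


points = np.array([25, 12, 15, 14, 19, 23, 25, 29])
z = standardize(points)

print(np.isclose(z.mean(), 0.0))  # True: the mean is zero up to rounding
print(np.isclose(z.std(), 1.0))   # True: the standard deviation is one
```

Note that pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), so calling it without `ddof=0` gives slightly different scaled values than `np.std` or `StandardScaler`.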

In [16]:
standard_scaler = StandardScaler()

# Fit on all three columns and rescale each one to zero mean and unit variance
x_after_standard_scaler = standard_scaler.fit_transform(df.values)
x_after_standard_scaler

array([[ 0.82452977, -1.15311332,  1.09619108],
       [-1.43207802, -0.31448545, -0.15659873],
       [-0.91132238, -0.31448545,  0.67859448],
       [-1.08490759,  0.52414242, -0.99179193],
       [-0.21698152,  1.78208422, -0.99179193],
       [ 0.47735934,  0.52414242, -1.40938853],
       [ 0.82452977,  0.52414242,  0.26099788],
       [ 1.51887063, -1.57242726,  1.51378768]])

In [30]:
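# Alternatively, apply the standardization formula directly.
# np.std uses the population standard deviation (ddof=0), which matches StandardScaler.
for col in list(df.columns):
    df[col + '_standard'] = (df[col] - df[col].mean()) / np.std(df[col])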

In [31]:
df

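|   | points | assists | rebounds | points_standard | assists_standard | rebounds_standard |
|---|--------|---------|----------|-----------------|------------------|-------------------|
| 0 | 25 | 5 | 11 | 0.82453 | -1.153113 | 1.096191 |
| 1 | 12 | 7 | 8 | -1.432078 | -0.314485 | -0.156599 |
| 2 | 15 | 7 | 10 | -0.911322 | -0.314485 | 0.678594 |
| 3 | 14 | 9 | 6 | -1.084908 | 0.524142 | -0.991792 |
| 4 | 19 | 12 | 6 | -0.216982 | 1.782084 | -0.991792 |
| 5 | 23 | 9 | 5 | 0.477359 | 0.524142 | -1.409389 |
| 6 | 25 | 9 | 9 | 0.82453 | 0.524142 | 0.260998 |
| 7 | 29 | 4 | 12 | 1.518871 | -1.572427 | 1.513788 |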


## When to use normalization and standardization
- Normalization is a reasonable choice when you don’t know the distribution of your data or when you know it’s not Gaussian. It is useful when your features are on different scales and the technique you’re using, such as k-nearest neighbors or artificial neural networks, makes no assumptions about the distribution of your data.
- Standardization is usually described as assuming that your data follows a Gaussian (bell curve) distribution. This isn’t strictly required, but the approach tends to work better when the attribute distribution is roughly Gaussian. It is useful when your features are on different scales and the technique you’re using (such as logistic regression, linear regression, or linear discriminant analysis) assumes a Gaussian distribution.
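
To make the contrast concrete, the sketch below applies both scalers to the same three columns used earlier and computes per-column summaries. It only illustrates how the output ranges differ; it is not guidance on which scaler suits a particular model.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

normalized = MinMaxScaler().fit_transform(df)
standardized = StandardScaler().fit_transform(df)

# Normalized columns are bounded: each has minimum 0 and maximum 1
# (up to floating-point rounding).
print(normalized.min(axis=0), normalized.max(axis=0))

# Standardized columns are not bounded to a fixed range, but each has
# mean 0 and standard deviation 1 (up to floating-point rounding).
print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(standardized.min(axis=0), standardized.max(axis=0))
```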