### Hello everyone! In this mini notebook, we will examine some important normalization techniques.

### I hope it refreshes your memory of these topics, or even teaches you something new!

# Table of Contents

1. [Introduction](#1)
1. [Min-max normalization](#2)
    1. [Definition](#2.1)
    1. [Implementation](#2.2)
    1. [When to use](#2.3)
    1. [Close variations](#2.4)
1. [Z-score normalization](#3)
    1. [Definition](#3.1)
    1. [Implementation](#3.2)
    1. [When to use](#3.3)
    1. [Deal with outliers: min-max vs. z-score](#3.4)
1. [Robust normalization](#4)
    1. [Definition](#4.1)
    1. [Implementation](#4.2)
    1. [When to use](#4.3)    
1. [Power normalization](#5)
    1. [Definition](#5.1)
    1. [Implementation](#5.2)
1. Batch normalization
1. Layer normalization
1. Group normalization
1. Local response normalization

##### *Note: I will examine the remaining techniques (items 6 to 9) in a different notebook, as they are very different from the rest of the normalization techniques. I will provide a link when that notebook is ready.*

## Introduction <a id=1></a>

##### Normalization is a technique used in machine learning and deep learning that helps models during training (and, of course, testing) and improves convergence.

##### Normalization is usually done in the data preparation phase, but some techniques can be applied during training.



## Min-max normalization <a id=2></a>

### Definition <a id=2.1></a>

##### Min-max normalization (also known as min-max scaling) is one of the simplest normalization techniques and belongs to the family of linear normalization methods.

##### It is applied in data pre-processing and it maps the values to the desired range (usually between 0 and 1).

##### $ X_{\text{normalized}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} $


### Implementation <a id=2.2></a>

##### Let's implement it ourselves first.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('ggplot')

In [2]:
X = np.arange(-4, 5)
X

array([-4, -3, -2, -1,  0,  1,  2,  3,  4])

In [3]:
def min_max_norm(inp):
    return (inp - np.min(inp))/(np.max(inp) - np.min(inp))

##### We use broadcasting here. For more information, you can check my NumPy [notebook](https://www.kaggle.com/code/atuzen/important-numpy-concepts-for-deep-learning).

In [4]:
X_normalized = min_max_norm(X)
X_normalized

## Max value becomes 1 and min becomes 0, as intended

array([0.   , 0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])

##### Now let's generalize the formula to map values to an arbitrary range [a, b].

In [5]:
def min_max_norm_custom(inp, a, b):
    return a + ((inp - np.min(inp)) * (b - a))/(np.max(inp) - np.min(inp))

In [6]:
X_normalized = min_max_norm_custom(X, -1 , 1)
X_normalized

array([-1.  , -0.75, -0.5 , -0.25,  0.  ,  0.25,  0.5 ,  0.75,  1.  ])
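##### Both functions above divide by `max - min`, which is zero when every value is identical. Here is a minimal sketch of a guarded version. The name `min_max_norm_safe` and the choice to map constant input to `a` are my own assumptions, not a standard convention.

```python
import numpy as np

def min_max_norm_safe(inp, a=0.0, b=1.0):
    """Min-max scale `inp` to [a, b]; constant input maps to `a` (assumed convention)."""
    inp = np.asarray(inp, dtype=float)
    lo, hi = np.min(inp), np.max(inp)
    span = hi - lo
    if span == 0:
        # All values are identical, so there is no range to stretch.
        return np.full_like(inp, a)
    return a + (inp - lo) * (b - a) / span

constant = min_max_norm_safe([7, 7, 7])               # no division by zero
scaled = min_max_norm_safe(np.arange(-4, 5), -1, 1)   # same result as min_max_norm_custom
```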

##### Now, let's use the scikit-learn library. You can check the details [here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html).

In [7]:
from sklearn.preprocessing import MinMaxScaler

In [8]:
scaler = MinMaxScaler() 
X_scaled = scaler.fit_transform(X.reshape(-1,1))
X_scaled

array([[0.   ],
       [0.125],
       [0.25 ],
       [0.375],
       [0.5  ],
       [0.625],
       [0.75 ],
       [0.875],
       [1.   ]])
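##### `MinMaxScaler` also accepts a `feature_range` argument, which plays the same role as `a` and `b` in the custom function. The sketch below fits the scaler on one array and then reuses it on unseen values. `X_new` is hypothetical data made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.arange(-4, 5, dtype=float).reshape(-1, 1)
X_new = np.array([[-6.0], [0.0], [8.0]])  # hypothetical unseen data

scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns min=-4 and max=4
X_new_scaled = scaler.transform(X_new)          # reuses the learned min and max

# Values outside the fitted range land outside [-1, 1]:
# X_new_scaled.ravel() -> [-1.5, 0.0, 2.0]
```

##### Note that the target range is only guaranteed for the data the scaler was fitted on.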

### When to use? <a id=2.3></a>

##### As mentioned above, this is one of the simplest approaches and is straightforward to apply.

##### Use cases:

* ##### The best case for this normalization is when the data is uniformly (or nearly uniformly) distributed.

* ##### If the data must lie in a specific range (e.g., to match the output range of activation functions such as sigmoid or tanh), this scaling is a good option.

* ##### If all features in the data should impact the output equally.

* ##### There are no outliers (we will examine this later).

##### Not suitable:

* ##### Data that is not uniformly distributed (such as normally distributed data)

* ##### A high presence of outliers
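##### To make features contribute on an equal footing, the scaling has to be done per feature (per column). The `min_max_norm` function above uses the global minimum and maximum of the whole array, so here is a small column-wise sketch. The function name and the age/income numbers are hypothetical.

```python
import numpy as np

def min_max_norm_per_feature(inp):
    """Scale each column of a 2D array to [0, 1] independently."""
    inp = np.asarray(inp, dtype=float)
    col_min = inp.min(axis=0)   # one minimum per column
    col_max = inp.max(axis=0)   # one maximum per column
    return (inp - col_min) / (col_max - col_min)  # broadcasting over rows

# Hypothetical features on very different scales: age and yearly income
data = np.array([[25,  30_000],
                 [40,  85_000],
                 [55, 120_000]])

scaled = min_max_norm_per_feature(data)
```

##### After scaling, both columns span exactly [0, 1], which is also how `MinMaxScaler` treats a 2D input.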

### Close variations <a id=2.4></a>

* ##### Max Abs scaling ([link](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler))
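##### As a rough sketch of the idea behind max-abs scaling: each value is divided by the largest absolute value, so the result lies in [-1, 1] and zeros stay zeros. The function name `max_abs_norm` is my own.

```python
import numpy as np

def max_abs_norm(inp):
    """Divide by the largest absolute value; no shifting is applied."""
    inp = np.asarray(inp, dtype=float)
    return inp / np.max(np.abs(inp))

X_demo = np.array([-2, 0, 1, 4])
X_max_abs = max_abs_norm(X_demo)   # -> [-0.5, 0.0, 0.25, 1.0]
```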

## Z-score normalization <a id=3></a>

### Definition <a id=3.1></a>

##### Z-score normalization (also known as standardization) transforms input data to have a mean of 0 and a standard deviation of 1.

##### It is applied in data pre-processing. Formula:

##### $ X_{\text{standardized}} = \frac{X - \mu}{\sigma} $

##### where $ \mu $ is the mean and $ \sigma $ is the standard deviation.

##### After standardization, the data is centered at zero and has unit spread around the mean.

### Implementation <a id=3.2></a>

##### Let's implement it ourselves first.

In [9]:
def z_score(inp):
    mu = np.mean(inp)
    sigma = np.std(inp)
    return (inp - mu) / sigma

In [10]:
X_normalized = z_score(X)
X, X_normalized

(array([-4, -3, -2, -1,  0,  1,  2,  3,  4]),
 array([-1.54919334, -1.161895  , -0.77459667, -0.38729833,  0.        ,
         0.38729833,  0.77459667,  1.161895  ,  1.54919334]))

##### As you can see, this normalizes the data very differently from min-max scaling.

##### We can also standardize with a given mean and standard deviation, but this will not be necessary in most cases.

In [11]:
def z_score_custom(inp, mu, sigma):
    return (inp - mu) / sigma
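##### One situation where passing the statistics explicitly is handy is reusing the mean and standard deviation of one dataset on another. A minimal sketch follows, where `X_new` is hypothetical data.

```python
import numpy as np

def z_score_custom(inp, mu, sigma):
    return (inp - mu) / sigma

X_train = np.arange(-4, 5)
X_new = np.array([-8, 0, 8])   # hypothetical unseen data

# Statistics come from X_train only
mu, sigma = np.mean(X_train), np.std(X_train)

# X_new is expressed in units of X_train's standard deviation
X_new_standardized = z_score_custom(X_new, mu, sigma)
```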

##### Now, let's use sklearn.

In [12]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler() 
X_scaled = scaler.fit_transform(X.reshape(-1,1))
X_scaled

array([[-1.54919334],
       [-1.161895  ],
       [-0.77459667],
       [-0.38729833],
       [ 0.        ],
       [ 0.38729833],
       [ 0.77459667],
       [ 1.161895  ],
       [ 1.54919334]])

In [13]:
# Let's check whether the two versions are equal.

np.allclose(X_normalized.reshape(-1, 1), X_scaled)

True
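##### The two versions agree because `np.std` defaults to the population standard deviation (`ddof=0`), which is the same estimator `StandardScaler` uses. The sketch below checks the resulting mean and standard deviation and shows how the sample version (`ddof=1`) differs.

```python
import numpy as np

X_demo = np.arange(-4, 5)

population_std = np.std(X_demo)        # ddof=0: divides by N
sample_std = np.std(X_demo, ddof=1)    # ddof=1: divides by N - 1

X_standardized = (X_demo - np.mean(X_demo)) / population_std

mean_after = X_standardized.mean()   # approximately 0
std_after = X_standardized.std()     # approximately 1
```

##### Dividing by `sample_std` instead would give slightly smaller values, and the result would no longer match `StandardScaler`.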

### When to use? <a id=3.3></a>

##### Use cases:

* ##### There are outliers, but not an extreme number of them.

* ##### Features need to be brought to a comparable scale.

* ##### The relative differences between inputs should stay visible. Min-max normalization can lose this when outliers compress the remaining values.

##### Not suitable:

* ##### Non-Gaussian data and/or sparse data.

* ##### A high presence of outliers

### Deal with outliers: min-max vs. z-score <a id=3.4></a>

In [14]:
X_outlier = np.array([0, 1, 2, 3, 4, 5, 1000, 10000])

X_outlier

array([    0,     1,     2,     3,     4,     5,  1000, 10000])

In [15]:
X_min_max = min_max_norm(X_outlier)
X_z_score = z_score(X_outlier)

X_min_max, X_z_score

(array([0.e+00, 1.e-04, 2.e-04, 3.e-04, 4.e-04, 5.e-04, 1.e-01, 1.e+00]),
 array([-0.42034937, -0.42004407, -0.41973878, -0.41943349, -0.4191282 ,
        -0.4188229 , -0.11505704,  2.63257385]))

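##### As you can see, min-max normalization transformed the highest outlier value to 1, but the other values became extremely small.

##### On the other hand, z-score normalization spreads the inliers about three times wider than min-max does here. They are still squeezed into a narrow band around -0.42, because the outliers inflate both the mean and the standard deviation.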
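##### To put a number on this comparison, the sketch below measures how far apart the six inliers end up under each method. The "inlier spread" measure is my own ad hoc choice for illustration.

```python
import numpy as np

X_outlier = np.array([0, 1, 2, 3, 4, 5, 1000, 10000], dtype=float)

X_min_max = (X_outlier - X_outlier.min()) / (X_outlier.max() - X_outlier.min())
X_z_score = (X_outlier - X_outlier.mean()) / X_outlier.std()

# Distance between the smallest and largest inlier (the values 0 and 5)
spread_min_max = X_min_max[5] - X_min_max[0]   # 5e-4
spread_z_score = X_z_score[5] - X_z_score[0]

ratio = spread_z_score / spread_min_max
```

##### Both spreads are tiny compared with the overall output range, which is why neither method is a good fit when outliers dominate.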

## Robust normalization <a id=4></a>

### Definition <a id=4.1></a>

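##### Robust normalization (also known as robust scaling) is quite similar to z-score standardization, but it replaces the mean and standard deviation with statistics that are less sensitive to outliers.

##### It is applied in data pre-processing. Formula:

##### $ X_{\text{robust}} = \frac{X - \text{median}}{\text{MAD}} $

##### where $ \text{median} $ is the median and $ \text{MAD} $ is the median absolute deviation, i.e. the median of $ |X - \text{median}| $.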

### Implementation <a id=4.2></a>

##### Let's implement it ourselves first.

In [16]:
def robust_norm(inp):
    med = np.median(inp)
    mad = np.median(np.abs(inp - med))
    return (inp - med) / mad

In [17]:
X = np.array([1, 2, 3, 4, 1000]) 

X_robust = robust_norm(X)

X_robust

array([ -2.,  -1.,   0.,   1., 997.])

In [18]:
from sklearn.preprocessing import RobustScaler

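# with_scaling=False only subtracts the median. RobustScaler scales by the IQR, not the MAD,
# so the output matches robust_norm here only because the MAD of X happens to be 1.
scaler = RobustScaler(with_scaling=False)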
X_scaled = scaler.fit_transform(X.reshape(-1,1))
X_scaled

array([[ -2.],
       [ -1.],
       [  0.],
       [  1.],
       [997.]])
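##### With its default settings, `RobustScaler` divides by the interquartile range (IQR, the 75th minus the 25th percentile) rather than the MAD. Here is a small sketch of that variant next to the library result. The function name `robust_norm_iqr` is my own.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

def robust_norm_iqr(inp):
    """Center by the median and scale by the interquartile range."""
    med = np.median(inp)
    q1, q3 = np.percentile(inp, [25, 75])
    return (inp - med) / (q3 - q1)

X_demo = np.array([1, 2, 3, 4, 1000], dtype=float)

X_iqr = robust_norm_iqr(X_demo)   # -> [-1.0, -0.5, 0.0, 0.5, 498.5]
X_sklearn = RobustScaler().fit_transform(X_demo.reshape(-1, 1)).ravel()
```

##### Here the IQR is 2 while the MAD is 1, so the two variants differ by a factor of 2 on this data.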

### When to use? <a id=4.3></a>

##### Use cases:

* ##### The data contains many or extreme outliers, and the scaled features should be robust to them.

* ##### The other benefits are the same as for z-score normalization.

##### Not suitable:

* ##### No outliers, or only a small number of them.

* ##### The other disadvantages are the same as for z-score normalization.
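##### To see what "robust" buys us, the sketch below applies the MAD-based function and the z-score to the outlier array from the earlier comparison. The functions are re-declared so the example runs on its own.

```python
import numpy as np

def robust_norm(inp):
    med = np.median(inp)
    mad = np.median(np.abs(inp - med))
    return (inp - med) / mad

def z_score(inp):
    return (inp - np.mean(inp)) / np.std(inp)

X_outlier = np.array([0, 1, 2, 3, 4, 5, 1000, 10000], dtype=float)

X_robust = robust_norm(X_outlier)   # median = 3.5, MAD = 2
X_z = z_score(X_outlier)

# Distance between the smallest and largest inlier after scaling
inlier_spread_robust = X_robust[5] - X_robust[0]   # 2.5
inlier_spread_z = X_z[5] - X_z[0]
```

##### The inliers keep a usable spread under robust scaling, while the outliers stay large instead of being pulled toward the rest of the data.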

## Power normalization <a id=5></a>

### Definition <a id=5.1></a>

##### Power normalization is used for data that does not follow a normal distribution and/or has skewness issues.

##### This technique is useful when the data needs to be approximately normal.

##### There are two common transformations:

1. ##### Box-Cox transformation:
    * ##### Requires input data to be strictly positive.

1. ##### Yeo-Johnson transformation:
    * ##### Supports both positive and negative inputs.
    
##### Yeo-Johnson is more commonly used, and it is the default method in scikit-learn's `PowerTransformer`.
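##### For intuition, here is a minimal sketch of both transformations for a fixed parameter `lam`, written from their standard definitions. The function names are my own. Unlike `PowerTransformer`, this sketch does not estimate the parameter from the data and does not standardize the output.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform for a fixed lambda; inputs must be strictly positive."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def yeo_johnson(x, lam):
    """Yeo-Johnson transform for a fixed lambda; accepts any real inputs."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative branch
    if lam != 0:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    # Negative branch
    if lam != 2:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

bc = box_cox([1.0, 4.0], 0.5)             # -> [0.0, 2.0]
yj = yeo_johnson([-1.0, 0.0, 1.0], 1.0)   # lam = 1 leaves the data unchanged
```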

### Implementation <a id=5.2></a>

In [19]:
from sklearn.preprocessing import PowerTransformer

X = np.arange(1, 10).reshape(-1,1)

scaler = PowerTransformer(method = "yeo-johnson") 
X_yeo = scaler.fit_transform(X)


scaler = PowerTransformer(method = "box-cox") 
X_box = scaler.fit_transform(X)

X, X_yeo, X_box

(array([[1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]),
 array([[-1.6530655 ],
        [-1.17228888],
        [-0.73398925],
        [-0.32478919],
        [ 0.06262238],
        [ 0.43280569],
        [ 0.78885272],
        [ 1.13298478],
        [ 1.46686724]]),
 array([[-1.69300618],
        [-1.16788746],
        [-0.71478085],
        [-0.30290104],
        [ 0.08080817],
        [ 0.44346555],
        [ 0.78950339],
        [ 1.12192248],
        [ 1.44287593]]))
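##### The evenly spaced input above does not show the effect on skewness very well. As a sketch on synthetic data (a log-normal sample that I generate here purely for illustration), we can compare the skewness before and after the transformation. The `skewness` helper is my own.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

def skewness(values):
    """Third standardized moment (0 for a symmetric distribution)."""
    values = np.asarray(values, dtype=float).ravel()
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(0)
X_skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))  # synthetic, right-skewed

pt = PowerTransformer(method="box-cox")   # the data is strictly positive
X_transformed = pt.fit_transform(X_skewed)

skew_before = skewness(X_skewed)
skew_after = skewness(X_transformed)
fitted_lambda = pt.lambdas_[0]            # parameter estimated from the data
```

##### The transformed data should be much closer to symmetric, and because `PowerTransformer` standardizes by default, it also has zero mean and unit variance.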

##### Even though I provide implementations from scratch, sklearn and other similar frameworks are far more robust and powerful. Using these libraries will be better for your code.

##### I appreciate any feedback. Please let me know if anything is missing or if you would like to add anything. I hope you enjoy this notebook.

##### I will provide links when the other notebook is finished.

##### Links: [GitHub](https://github.com/ahmetTuzen/Deep_Learning_Tutorials) and [LinkedIn](https://www.linkedin.com/in/ahmet-tuzen/)