### Hello everyone, in this mini notebook, we will examine some important normalization techniques.

### I hope it refreshes your memory of these topics, or even teaches you something new!

# Table of Contents - Part 2

[Part 1: Introduction, Min-max normalization, Z-score normalization, Robust normalization, Power normalization](https://www.kaggle.com/code/atuzen/dive-into-normalization/)
1. [Batch normalization](#1)
    1. [Definition](#1.1)
    1. [When to apply/add?](#1.2)
    1. [Benefits](#1.3)
1. [Layer normalization](#2)
    1. [Introduction](#2.1)
    1. [Benefits](#2.2)
    1. [Layer normalization vs batch normalization](#2.3)
1. [Group normalization](#3)
    1. [Definition](#3.1)
    1. [Benefits](#3.2)
    1. [Normalization methods comparison](#3.3)
1. [Local response normalization](#4)
    1. [Definition](#4.1)
1. [Conclusion](#5)
    1. [References and links](#5.1)


**Note: Part 1 is provided only as a link, as its topics are less advanced than the normalization techniques covered here.**

# Batch normalization <a id=1></a>

## Definition <a id=1.1></a>

#### Batch normalization was introduced by Sergey Ioffe and Christian Szegedy in their paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" [link](https://arxiv.org/abs/1502.03167) [1]. The technique was proposed for training deep neural networks (DNNs).

#### The problem with DNNs is that the distribution of each layer's inputs changes constantly during training, which slows training down and usually forces the use of smaller learning rates.

#### Batch normalization addresses this problem by normalizing the layer inputs over each mini-batch. Thanks to batch normalization, higher learning rates can be used and the dependence on dropout regularization is reduced.

#### This operation normalizes each feature dimension to zero mean and unit variance ($\mu \rightarrow 0, \sigma \rightarrow 1$) within each batch, which makes training more stable.
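
#### The normalization step described above can be sketched in a few lines of NumPy. This is a minimal, training-mode-only sketch, not the reference implementation: the function name `batch_norm`, the `eps` value, and the toy batch are assumptions made for illustration, and the running statistics used at inference time are left out. `gamma` and `beta` are the learnable scale and shift from paper [1].

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) over the batch dimension (rows)."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift

# Hypothetical toy batch: 32 samples, 4 features, far from zero mean / unit variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))

out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

#### With `gamma = 1` and `beta = 0` the output is simply the standardized batch; during training the network is free to learn other values for them.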

## When to apply/add? <a id=1.2></a> 

#### It is usually inserted after convolutional and/or fully connected layers.

## Benefits <a id=1.3></a> 

* Stable and faster training (improves gradient flow).
* Acts as a regularization technique (see my notebook on regularization).
* Allows higher learning rates.
* Dependence on weight initialization is reduced.
* The risk of vanishing or exploding gradients is reduced (therefore deeper networks are possible).


# Layer normalization <a id=2></a>

## Introduction <a id=2.1></a>

#### Layer normalization was introduced in the paper "Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton [link](https://arxiv.org/abs/1607.06450) [2].

#### This technique was introduced to address some limitations of batch normalization.

#### Layer normalization normalizes the inputs of a layer across the features of each sample, independently of the other samples in the batch.
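
#### A minimal NumPy sketch of this idea is shown below. The function name `layer_norm`, the `eps` value, and the toy shapes are assumptions for illustration; `gamma` and `beta` play the role of the learnable gain and bias from paper [2]. The last lines check that a sample is normalized the same way whether it arrives alone or inside a larger batch.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize every sample (row) over its own features (last axis)."""
    mean = x.mean(axis=-1, keepdims=True)     # one mean per sample
    var = x.var(axis=-1, keepdims=True)       # one variance per sample
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta               # learnable gain and bias

# Hypothetical toy input: 8 samples with 16 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
gamma, beta = np.ones(16), np.zeros(16)

full = layer_norm(x, gamma, beta)        # normalize the whole batch
single = layer_norm(x[:1], gamma, beta)  # normalize only the first sample

# The first sample's result does not depend on the rest of the batch.
same_result = np.allclose(full[:1], single)
print(same_result)
```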

## Benefits <a id=2.2></a>

#### The benefits of batch normalization also apply to layer normalization.

#### Additionally, layer normalization is less dependent on the mini-batch size.

## Layer normalization vs batch normalization <a id=2.3></a>

#### Batch normalization normalizes each feature across the samples of a mini-batch, while layer normalization normalizes each sample across all of its features.

#### To summarize, batch normalization operates horizontally and layer normalization operates vertically.

#### Layer normalization is more suitable for RNNs, while batch normalization is used for CNNs and deep feedforward networks.
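
#### The difference comes down to the axis along which the statistics are computed. The sketch below assumes a tiny hand-made matrix with samples as rows and features as columns (which direction counts as "horizontal" depends on how the data is laid out); the learnable scale and shift are omitted to keep the comparison short.

```python
import numpy as np

# Assumed layout: 2 samples (rows) x 3 features (columns).
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
eps = 1e-5

# Batch normalization: statistics per feature, taken over the samples (axis=0).
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer normalization: statistics per sample, taken over the features (axis=1).
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

print(bn.mean(axis=0))  # every feature column is centered
print(ln.mean(axis=1))  # every sample row is centered
```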

# Group normalization <a id=3></a>

## Definition <a id=3.1></a>

#### Group normalization was introduced in the paper "Group Normalization" by Yuxin Wu and Kaiming He [link](https://arxiv.org/abs/1803.08494) [3].

#### This technique was introduced to address the limitations of batch normalization when batch sizes become small.

#### Group normalization divides the input channels into smaller groups and then normalizes each group independently.
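
#### The grouping step can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `group_norm`, the `(N, C, H, W)` layout, the `eps` value, and the toy tensor are mine, and the per-channel `gamma` and `beta` stand in for the learnable scale and shift.

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group normalization for an input of shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must be divisible by num_groups"
    # Split the channels into groups: (N, G, C // G, H, W).
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics per sample and per group, over the group's channels and pixels.
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    x_hat = ((grouped - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    # Per-channel learnable scale and shift.
    return gamma.reshape(1, c, 1, 1) * x_hat + beta.reshape(1, c, 1, 1)

# Hypothetical toy input: batch of 2, 6 channels, 5x5 feature maps.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=4.0, size=(2, 6, 5, 5))

out = group_norm(x, num_groups=3, gamma=np.ones(6), beta=np.zeros(6))

# Each (sample, group) block of 2 channels now has mean close to 0.
group_means = out.reshape(2, 3, -1).mean(axis=2)
print(np.allclose(group_means, 0.0, atol=1e-6))
```

#### No statistic is computed across the batch axis, which is why the result does not depend on the batch size. In this sketch, `num_groups=1` normalizes over all channels of a sample (as layer normalization does), and `num_groups` equal to the number of channels normalizes every channel separately.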

## Benefits <a id=3.2></a>

#### The benefits of batch normalization also apply to group normalization.

#### Group normalization is not highly dependent on batch size.

#### It also has the benefits of layer normalization, meaning that group normalization is a good choice for both CNNs and RNNs.

## Normalization methods comparison <a id=3.3></a>

**Image is taken from paper [3]**

#### I am including this image directly from the paper, as it makes the differences between the methods much easier to understand.

![](https://github.com/ahmetTuzen/Deep_Learning_Tutorials/blob/main/Normalization/normalization%20methods.png?raw=true)

# Local response normalization <a id=4></a>

## Definition <a id=4.1></a>

#### Local response normalization was introduced in the famous AlexNet [4] architecture.

#### It computes a normalization factor from the local neighborhood of each activation and divides the activation by that factor.

#### Strongly activated neurons have the largest effect in this normalization, as they suppress their neighbors.

#### Unlike the first three techniques, there is no vertical, horizontal, or group-wise normalization; each activation is normalized individually, using the factor computed from its local neighborhood.
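
#### As a rough illustration, the sketch below follows the across-channel formula from the AlexNet paper [4], where each activation is divided by $(k + \alpha \sum_j a_j^2)^\beta$ and the sum runs over `n` adjacent channels at the same spatial position. The defaults `n=5`, `k=2`, `alpha=1e-4`, `beta=0.75` are the values reported in that paper; the function name, the `(C, H, W)` layout, and the toy input are assumptions, and library implementations may define the constants slightly differently.

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Across-channel local response normalization for input of shape (C, H, W)."""
    num_channels = a.shape[0]
    squared = a ** 2
    out = np.empty_like(a)
    for i in range(num_channels):
        # Neighborhood of up to n adjacent channels, clipped at the borders.
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * squared[lo:hi + 1].sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out

# Hypothetical toy input: 8 channels of 2x2 maps, with one very strong channel.
a = np.ones((8, 2, 2))
a[3] = 100.0

out = local_response_norm(a)

# Channel 2 sits next to the strong channel 3, channel 0 does not.
print(out[0, 0, 0], out[2, 0, 0])
```

#### Channel 2 ends up smaller than channel 0 even though both started at the same value, which is the "strong neurons suppress their neighbors" effect described above.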

# Conclusion <a id=5></a>

#### In these two mini-notebooks, we examined normalization techniques that are widely adopted by machine learning and deep learning models and applications. I hope you have enjoyed these notebooks. If you have any questions or would like to add anything, I would love to hear them.

#### Finally, I appreciate any feedback/upvotes.

## References and links <a id=5.1></a>

##### Papers:

[1] Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.

[2] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

[3] Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European conference on computer vision (ECCV) (pp. 3-19).

[4] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.

#### Normalization layers in PyTorch: [link](https://pytorch.org/docs/stable/nn.html#normalization-layers)

#### My links: [GitHub](https://github.com/ahmetTuzen/Deep_Learning_Tutorials) and [LinkedIn](https://www.linkedin.com/in/ahmet-tuzen/)