# Regularization Techniques for Deep Neural Networks

In this module, we will study:
1. Overfitting and Underfitting
2. Regularization Techniques
3. The Bias-Variance Tradeoff
4. L1 Regularization
5. L2 Regularization
6. Dropout
7. Early Stopping
8. Batch Normalization
9. And much more


![Image1](Module2Pic1.png)

***

# Regularization

Regularization is a set of strategies used in machine learning to reduce the generalization error. After training, many models perform very well on the data they were trained on but fail to generalize to unseen data from the same population. This is known as overfitting. Regularization strategies aim to reduce overfitting while keeping the training error as low as possible.

***
## The bias-variance tradeoff: overfitting and underfitting

![Image1](Module2Pic2.png)

### What is bias?

Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the underlying relationship. This leads to high error on both training and test data.



### What is variance?

Variance is the variability of the model's prediction for a given data point: it tells us how much the prediction spreads when the model is trained on different samples of the data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
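The two definitions above can be estimated numerically. The sketch below is a minimal illustration under stated assumptions: the ground-truth function `true_fn`, the noise level, and the choice of polynomial models are all hypothetical and not part of this module. It refits a model on many noisy datasets and measures how far the average prediction is from the truth (squared bias) and how much the predictions spread (variance).

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def estimate_bias_variance(degree, n_datasets=200, n_points=30,
                           noise_std=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)   # fixed inputs; only the noise changes
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        y_train = true_fn(x_train) + rng.normal(0.0, noise_std, n_points)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[i] = model(x_test)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the true value.
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of the predictions across the different training sets.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


for degree in (1, 9):
    bias_sq, variance = estimate_bias_variance(degree)
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the larger squared bias, while the degree-9 polynomial should show the larger variance.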

### What is underfitting?

In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. Underfitting happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Such models, for example linear and logistic regression, are too simple to capture complex patterns in the data.

### What is overfitting?

In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model for too long on a noisy dataset. These models have low bias and high variance. They are typically very complex, like decision trees, which are prone to overfitting.
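Underfitting and overfitting both show up when training and test errors are compared. The following sketch assumes a hypothetical noisy sine dataset and uses polynomial degree as a stand-in for model complexity. Neither is prescribed by this module. A degree-14 polynomial fitted to 15 points can pass through every training point, so it fits the noise as well as the signal.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_fn(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


noise_std = 0.3
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_fn(x_train) + rng.normal(0.0, noise_std, x_train.size)
x_test = rng.uniform(0.0, 1.0, 500)
y_test = true_fn(x_test) + rng.normal(0.0, noise_std, x_test.size)


def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 3, 14):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

In this setup the degree-1 model should have high error on both sets (underfitting), while the degree-14 model should have a near-zero training error and a much larger test error (overfitting).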

![Pic](Module2Pic3.png)

***

### What is the bias-variance tradeoff?

The bias-variance tradeoff describes the fact that we can usually reduce the variance of a model only by increasing its bias, and vice versa. Good regularization techniques strive to keep both sources of error low at the same time, thereby achieving better generalization.
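For squared-error loss the tradeoff can be written explicitly. The notation here is introduced for illustration only: assume the data are generated as $y = f(x) + \epsilon$, where $\epsilon$ is zero-mean noise with variance $\sigma^2$, and let $\hat{f}$ be the model fitted on a random training set. The expected error at a point $x$ then decomposes as:
\begin{equation}
\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2 + \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right] + \sigma^2
\end{equation}
The first term is the squared bias, the second is the variance, and $\sigma^2$ is the irreducible noise that no model can remove.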

There are many regularization techniques; we will study a few of them.



## L2 Regularization

L2 regularization, also known as weight decay or ridge regression, adds a norm penalty of the form $\frac{1}{2}||w||_2^2$ to the cost function. As a result, the cost function becomes:
\begin{equation}
J^\prime(w; X, y) = J(w; X, y) + \frac{\alpha}{2}||w||_2^2
\end{equation}
where $\alpha \geq 0$ is the regularization parameter, typically chosen between 0 and 1.

If we compute the gradient w.r.t. $w$, we obtain:
\begin{equation}
\frac{\partial}{\partial w}J^\prime(w; X, y) = \frac{\partial}{\partial w}J(w; X, y) + \alpha w
\end{equation}

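The equation shows that, with a gradient descent update $w \leftarrow w - \eta \frac{\partial}{\partial w}J^\prime(w; X, y)$ and learning rate $\eta$, each weight is first multiplied by the constant factor $(1 - \eta\alpha)$ on every training step, and then the usual gradient update is applied. This is why L2 regularization is also called weight decay.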
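The penalized cost and its gradient can be checked numerically. The sketch below assumes a linear model with mean squared error as the unregularized cost $J$ and plain gradient descent with a learning rate `lr`. Both are illustrative choices, not something this module prescribes. It verifies the gradient formula with finite differences, then confirms that a step on the penalized cost equals "shrink the weights, then take the ordinary step".

```python
import numpy as np


def cost(w, X, y):
    """Unregularized cost J(w; X, y); mean squared error is an illustrative choice."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def cost_grad(w, X, y):
    """Gradient of the unregularized cost with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def cost_l2(w, X, y, alpha):
    """Penalized cost J'(w) = J(w) + (alpha / 2) * ||w||_2^2."""
    return cost(w, X, y) + 0.5 * alpha * np.dot(w, w)


def cost_l2_grad(w, X, y, alpha):
    """Gradient of the penalized cost: dJ/dw + alpha * w."""
    return cost_grad(w, X, y) + alpha * w


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)
alpha, lr = 0.1, 0.05

# 1) Compare the analytic gradient with central finite differences.
eps = 1e-6
numeric_grad = np.array([
    (cost_l2(w + eps * e, X, y, alpha) - cost_l2(w - eps * e, X, y, alpha)) / (2 * eps)
    for e in np.eye(w.size)
])
grad_matches = np.allclose(numeric_grad, cost_l2_grad(w, X, y, alpha), atol=1e-6)

# 2) A gradient step on J' equals a decay by (1 - lr * alpha) plus a step on J.
step_penalized = w - lr * cost_l2_grad(w, X, y, alpha)
step_decay = (1.0 - lr * alpha) * w - lr * cost_grad(w, X, y)
steps_match = np.allclose(step_penalized, step_decay)

print(grad_matches, steps_match)
```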

**Please note that we usually regularize only the weights, not the biases.**
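One way to apply the penalty to weights only is to filter parameters by name. The sketch below assumes a hypothetical parameter dictionary in which weight matrices are stored under keys starting with `"W"` and biases under keys starting with `"b"`. This naming convention is made up for the example.

```python
import numpy as np


def l2_penalty(params, alpha):
    """Return (alpha / 2) * sum of squared weights, skipping bias entries.

    Assumes weight names start with "W" and bias names with "b"
    (a hypothetical convention used only in this example).
    """
    return 0.5 * alpha * sum(
        float(np.sum(value ** 2))
        for name, value in params.items()
        if name.startswith("W")
    )


# Hypothetical parameters of a small two-layer network.
params = {
    "W1": np.array([[1.0, -2.0], [0.5, 0.0]]),
    "b1": np.array([10.0, -10.0]),
    "W2": np.array([[3.0], [1.0]]),
    "b2": np.array([5.0]),
}

penalty = l2_penalty(params, alpha=0.1)
print(penalty)
```

Because the bias entries are skipped, changing `b1` or `b2` leaves the penalty unchanged.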

The L2 regularizer will have a big impact on the directions of the weight vector that don’t “contribute” much to the loss function. On the other hand, it will have a relatively small effect on the directions that contribute to the loss function. As a result, we reduce the variance of our model, which makes it easier to generalize on unseen data.
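This directional effect can be seen on a toy quadratic cost. The sketch assumes a hypothetical cost $J(w) = \frac{1}{2}(w - w^*)^T H (w - w^*)$ with a diagonal curvature matrix `H`, so that the two coordinate axes are the directions of interest: the first contributes strongly to the cost, the second barely at all. The values of `H`, `w_star`, and `alpha` are made up for the example.

```python
import numpy as np

# Hypothetical curvature: direction 0 affects the cost strongly, direction 1 barely.
H = np.diag([10.0, 0.1])
w_star = np.array([1.0, 1.0])   # minimizer of the unregularized cost
alpha = 1.0

# Setting the gradient of J(w) + (alpha / 2) * ||w||^2 to zero gives
# (H + alpha * I) w = H w_star.
w_reg = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)

# Along each direction the weight is scaled by curvature / (curvature + alpha).
curvature = np.diag(H)
shrink = curvature / (curvature + alpha)

print("regularized weights:", w_reg)
print("shrink factors:     ", shrink)
```

The weight along the high-curvature direction stays close to its unregularized value, while the weight along the low-curvature direction is pushed most of the way toward zero.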


## References

1. [Regularization techniques for training deep neural networks](https://theaisummer.com/regularization/)
2. [Understanding the Bias-Variance Tradeoff](https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229)
3. [L1 and L2 Regularization — Explained](https://towardsdatascience.com/l1-and-l2-regularization-explained-874c3b03f668#:~:text=L2%20regularization%20forces%20weights%20toward,never%20be%20equal%20to%20zero.)
4. [Over-fitting and Regularization](https://towardsdatascience.com/over-fitting-and-regularization-64d16100f45c)
5. [L1 vs L2 Regularization](https://medium.com/analytics-vidhya/l1-vs-l2-regularization-which-is-better-d01068e6658c)
