<img src="https://i2.wp.com/dataaspirant.com/wp-content/uploads/2020/08/1-handle-overfitting-in-deep-learning-models.png?resize=768%2C452&ssl=1" />

Deep learning is one of the most revolutionary technologies at present. It gives machines the ability to think and learn on their own. The key motivation for deep learning is to build algorithms that mimic the human brain. 

To achieve this, we need to feed the models as much relevant data as possible. Unlike classical machine learning algorithms, deep learning models usually keep improving as more data is added instead of saturating early. But these models have a lot of capacity, so without enough data or regularization they can easily overfit.

That’s why building a deep learning model that generalizes well is always a challenging problem. To get a good score on unseen data we usually need a lot of training data. But unfortunately, in some cases, we simply don't have enough data.

One of the most common problems with building neural networks is overfitting. The key reason is that the built model does not generalize well; it is well-optimized only for the training dataset. In layman's terms, the model has memorized how to predict the target class only for the training dataset.


Overfitting usually happens when we don’t have enough data, or when we use a complex architecture without regularization.

If we don't have sufficient data, the model fails to capture the general trend in the data. Instead, it tries to fit each and every data point in the training set, noise included, and then performs poorly on test/unseen data.

<h1> Techniques to Handle Overfitting In Deep Learning</h1>

* Regularization
* Dropout
* Data Augmentation
* Early stopping

## Model with overfitting issue

Now we are going to build a deep learning model that suffers from overfitting. Later we will apply different techniques to handle it.

For each technique, we will rebuild the model with that technique applied to show how it improves the model's performance on unseen data.

Before that let’s quickly see the synopsis of the model flow.

<h3>Synopsis of the model we are going to build</h3>

Before we can handle overfitting, we need a base model that showcases the problem.

First, we need to create data. We are going to generate it with the make_moons() function.

Then we fit a very basic model (without applying any techniques) on the newly created data points.

Finally, we will walk you through the different techniques to handle overfitting, with example code and graphs.

## Data preparation

The make_moons() function generates a toy dataset for binary classification: two interleaving half circles, also known as the two moons pattern.

Parameters:

* n_samples - int: the total number of points generated (optional, default=100)
* shuffle - bool: whether to shuffle the samples (optional, default=True)
* noise - float or None: the standard deviation of Gaussian noise added to the data (default=None)
* random_state - int or RandomState instance: seed that makes the generated data reproducible (default=None)

Returns:

* X - array of shape [n_samples, 2]: the generated samples
* y - array of shape [n_samples]: the integer labels (0 or 1) for class membership of each sample
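To make the geometry behind the two moons more concrete, here is a rough NumPy sketch of how such a dataset can be built by hand. It is only an illustration and not scikit-learn's actual code; the function name `two_moons_sketch` and its exact steps are assumptions made for this example.

```python
import numpy as np

def two_moons_sketch(n_samples=100, noise=0.2, seed=1):
    """Rough re-creation of the two-moons geometry (not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer

    # Upper half circle (class 0) and a shifted lower half circle (class 1).
    t_outer = np.linspace(0, np.pi, n_outer)
    t_inner = np.linspace(0, np.pi, n_inner)
    outer = np.column_stack([np.cos(t_outer), np.sin(t_outer)])
    inner = np.column_stack([1 - np.cos(t_inner), 0.5 - np.sin(t_inner)])

    X = np.vstack([outer, inner])
    y = np.concatenate([np.zeros(n_outer, dtype=int),
                        np.ones(n_inner, dtype=int)])

    # Gaussian noise with standard deviation `noise` blurs the class boundary.
    X = X + rng.normal(scale=noise, size=X.shape)

    # Shuffle samples and labels together.
    order = rng.permutation(n_samples)
    return X[order], y[order]

X_demo, y_demo = two_moons_sketch()
print(X_demo.shape, y_demo.shape, np.bincount(y_demo))
```

The larger the noise value, the more the two classes overlap, which makes it easier for a big network to memorize individual points instead of the overall shape.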

In [None]:
import numpy as np

np.random.seed(800)

In [None]:
from sklearn.datasets import make_moons

x, y = make_moons(n_samples=100, noise=0.2, random_state=1)

In [None]:
#plot the graph
import matplotlib.pyplot as plt
plt.scatter(x[:,0],x[:,1],c=y,s=100)
plt.show()

## Base Model Creation

In [None]:
#importing libraries
import tensorflow as tf
import warnings
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [None]:
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

history = model.fit(x_train, y_train, 
                    validation_data=(x_test, y_test), 
                    epochs=4000, verbose=0)

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
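When reading loss curves like these, an overfit model typically shows a training loss that keeps falling while the validation loss flattens or starts to rise. A small helper can put numbers on that. This is a minimal sketch: the function name `summarize_history` is hypothetical and the loss values are made up for illustration, not results from the model above.

```python
def summarize_history(train_loss, val_loss):
    """Return the best validation epoch and the final train/validation gap."""
    # Epoch index at which the validation loss was lowest.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    # A large positive gap at the end of training hints at overfitting.
    final_gap = val_loss[-1] - train_loss[-1]
    return best_epoch, final_gap

# Hypothetical loss values, made up for illustration only.
train_loss = [0.70, 0.50, 0.35, 0.20, 0.10, 0.05]
val_loss = [0.72, 0.55, 0.45, 0.48, 0.60, 0.75]

best_epoch, final_gap = summarize_history(train_loss, val_loss)
print(f"best validation epoch: {best_epoch}, final gap: {final_gap:.2f}")
```

With a Keras history object, the same helper could be called as `summarize_history(history.history['loss'], history.history['val_loss'])`.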

## Adding Regularization

<b>Regularization</b> is one of the best techniques to avoid overfitting. It works by adding a penalty to the loss function based on the size of the weights in the model. A regularized neural network may not achieve the best score on the training data, but it is usually able to perform better on unseen data.
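The idea of a weight penalty can be written out by hand in a few lines of NumPy. This is a minimal sketch under assumptions: the helper names, the toy weights, and the predictions are made up, and the factor 0.01 is chosen because, to my knowledge, it is the default that Keras uses for the `'l2'` string shortcut.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def l2_penalty(weights, lam=0.01):
    """L2 penalty: lam times the sum of squared weights over all layers."""
    return lam * sum(np.sum(w ** 2) for w in weights)

# Toy labels, predictions and weight matrices, made up for illustration.
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_prob = np.array([0.1, 0.8, 0.7, 0.3])
weights = [np.array([[0.5, -1.5], [2.0, 0.1]]), np.array([[1.0], [-0.5]])]

data_loss = binary_crossentropy(y_true, y_prob)
total_loss = data_loss + l2_penalty(weights)
print(f"data loss: {data_loss:.4f}, penalty: {l2_penalty(weights):.4f}, "
      f"total: {total_loss:.4f}")
```

Because large weights increase the total loss, the optimizer is pushed towards smaller weights, which tends to give a smoother decision boundary.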

You can see the example below:

In [None]:

model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu',kernel_regularizer='l2'))
model.add(Dense(1, activation='sigmoid',kernel_regularizer='l2'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

history = model.fit(x_train, y_train, 
                    validation_data=(x_test, y_test), 
                    epochs=4000, verbose=0)

## Results 

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()

## Adding Dropout

<img src="https://i2.wp.com/dataaspirant.com/wp-content/uploads/2020/08/8-Deep-learning-dropout.png?resize=768%2C394&ssl=1" />
<hr>

Dropout simply means dropping neurons in a neural network. While training a deep learning model, it randomly drops some of the neurons and trains on the rest. Only the weights of the neurons that stay active are updated in that step; the others remain unchanged.

On every training step it again selects nodes randomly based on the dropout rate, so a different set of neurons is deactivated each time. At prediction time no neurons are dropped. This helps to create a more robust model that is able to perform well on unseen data.
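The mechanics of dropout can be sketched in plain NumPy. This is a minimal illustration of so-called inverted dropout, where the surviving activations are scaled up during training so that nothing needs to change at prediction time. The function name `dropout_forward` and the toy activations are assumptions for this example.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Inverted dropout: zero a random subset, rescale the survivors."""
    if not training or rate == 0.0:
        # At prediction time all neurons are kept unchanged.
        return activations
    keep_prob = 1.0 - rate
    # A fresh random mask is drawn on every call (i.e. every training step).
    mask = rng.random(activations.shape) < keep_prob
    # Scaling by 1 / keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((4, 10))  # toy activations of a hidden layer

train_out = dropout_forward(hidden, rate=0.5, rng=rng)
test_out = dropout_forward(hidden, rate=0.5, rng=rng, training=False)

print("fraction dropped during training:", np.mean(train_out == 0))
print("unchanged at prediction time:", np.array_equal(test_out, hidden))
```

With a rate of 0.5 each surviving activation is doubled, so the layer's expected output stays the same whether dropout is switched on or off.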

You can see the example below:

In [None]:
from tensorflow.keras.layers import Dropout
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

history = model.fit(x_train, y_train, 
                    validation_data=(x_test, y_test), 
                    epochs=500, verbose=0)

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()

# Data Augmentation

We can prevent the model from overfitting by training it on a larger number of examples. When collecting more data is not possible, we can increase the size of the dataset by applying minor changes to the existing examples.

Examples: 

* Translations
* Rotations
* Changes in scale
* Shearing
* Horizontal (and in some cases, vertical) flips

This technique is mostly used with image data, so you will usually see it with CNNs.

### Code snippet for augmentation in Keras
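Two of the transformations listed above, flips and translations, can be sketched with NumPy alone. This is only an illustration under assumptions: the function name `augment_image` is hypothetical, the 8x8 array is a stand-in for a real grayscale image, and `np.roll` wraps pixels around the border, which is simpler than the fill strategies real augmentation libraries offer.

```python
import numpy as np

def augment_image(image, rng, max_shift=2):
    """Return a randomly flipped and shifted copy of a 2D image."""
    out = image
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Random translation of up to `max_shift` pixels in each direction.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for a real image

# Five new training examples derived from a single original image.
augmented = [augment_image(image, rng) for _ in range(5)]
print(len(augmented), augmented[0].shape)
```

Each augmented copy keeps the original label, so the model sees more varied examples of the same class without any new data being collected.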

In [None]:
# Import from tensorflow.keras to stay consistent with the other cells
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")

### You can see the demo of Data Augmentation below

<img src="https://i0.wp.com/dataaspirant.com/wp-content/uploads/2020/08/data-augmentation-example.png?resize=768%2C339&ssl=1" />

# Early Stopping

<img src="https://i0.wp.com/dataaspirant.com/wp-content/uploads/2020/08/early-stopping-graph.png?resize=768%2C409&ssl=1" />


<hr>

Early stopping is one of the most widely used techniques to overcome overfitting in deep learning. Training for too many epochs can lead to overfitting on the training dataset, so stopping at the right time is a smart way to handle it.

Early stopping monitors the model's performance on a validation or test set based on a given metric and stops training when that performance stops improving.
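The stopping rule itself is simple enough to sketch without any framework. The function below is a generic illustration, not the implementation of any particular library: the name `early_stopping_epoch`, the `patience` and `min_delta` parameters as defined here, and the validation losses are all assumptions for this example.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs bring no improvement
    of at least `min_delta` over the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best value: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through all epochs.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, made up for illustration only.
val_losses = [0.72, 0.55, 0.45, 0.48, 0.44, 0.46, 0.47, 0.50, 0.52]

print(early_stopping_epoch(val_losses, patience=1))  # (3, 2)
print(early_stopping_epoch(val_losses, patience=3))  # (7, 4)
```

With a patience of 1 the rule stops at the first small bump and misses the lower loss at epoch 4, while a patience of 3 rides out the bump. This is why a small amount of patience is often worth considering when the validation loss is noisy.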


## Adding the Early Stopping Callback

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(128, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

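# With the default patience=0, training stops at the first epoch
# in which val_loss does not improve
callback = EarlyStopping(monitor='val_loss')
history = model.fit(x_train, y_train, 
                    validation_data=(x_test, y_test), 
                    epochs=2000, callbacks=[callback])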

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()


# Conclusion

Each technique approaches the problem differently, but all of them try to make the model more generalized and robust so that it performs well on new data. These techniques are not mutually exclusive; you can also use all of them in one model.

Don't limit yourself to only these techniques for handling overfitting; you can try other new and advanced techniques while building deep learning models.

We can't say which technique is better in general, so try them out and select the best one according to your data.

<h3>Suggestions</h3>

<b>Classical approach:</b> use early stopping and L2 regularization.

<b>The modern approach:</b> use early stopping and dropout, in addition to regularization.
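Combining early stopping, dropout and L2 regularization in one model could look roughly like the sketch below. It is only one possible setup, not a tuned recipe: the layer size, the dropout rate, the L2 factor, the patience of 20 and the cap of 200 epochs are assumptions chosen for illustration, and like the cells above it uses the test split as validation data.

```python
import tensorflow as tf
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from tensorflow.keras import Input, regularizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

tf.random.set_seed(800)

# Same data and split as in the cells above.
x, y = make_moons(n_samples=100, noise=0.2, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42)

model = Sequential([
    Input(shape=(2,)),
    # L2 penalty on the hidden layer's weights.
    Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    # Randomly drop a quarter of the hidden activations during training.
    Dropout(0.25),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Stop after 20 epochs without improvement and keep the best weights.
stopper = EarlyStopping(monitor='val_loss', patience=20,
                        restore_best_weights=True)
history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    epochs=200, verbose=0, callbacks=[stopper])

loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f"trained for {len(history.history['loss'])} epochs, "
      f"test accuracy {acc:.2f}")
```

The exact numbers will differ from run to run, so it is worth comparing the resulting loss curves against the base model rather than relying on a single accuracy value.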

<h1 style="color:green;">Thank you for reading, please do upvote if it helps you 🙏🙏</h1>