
# Regularization techniques: Dropout, L1/L2 regularization


Let's understand what these regularization techniques are:

1. **Dropout:** During training, randomly selected neurons are ignored ("dropped out") at each update step. Their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass. Because the network cannot rely on any single neuron, dropout helps prevent overfitting. Dropout is active only during training; at inference time all neurons are used.

2. **L1/L2 regularization:** These techniques add a penalty on the weights to the loss function. L1 regularization adds a penalty proportional to the sum of the absolute values of the weights, which tends to drive some weights to exactly zero. L2 regularization adds a penalty proportional to the sum of the squared weights, which shrinks weights toward zero without usually making them exactly zero. Both discourage an overly complex or flexible model and so reduce the risk of overfitting.
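
To make the two penalties concrete, here is a minimal sketch in plain Python (no Keras) of how an L1 or L2 term could be added to a loss value. The function names `l1_penalty` and `l2_penalty`, the weights, and the data loss of 0.40 are illustrative assumptions, not values from the dataset.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


# Hypothetical weights and data loss, chosen only for illustration.
weights = [0.5, -1.5, 2.0]
data_loss = 0.40
lam = 0.01  # penalty coefficient (assumed value)

loss_with_l1 = data_loss + l1_penalty(weights, lam)
loss_with_l2 = data_loss + l2_penalty(weights, lam)

print(f"L1-regularized loss: {loss_with_l1:.4f}")
print(f"L2-regularized loss: {loss_with_l2:.4f}")
```

The L2 term grows quadratically, so it punishes large weights most, while the L1 term charges every nonzero weight at the same rate, which is why it tends to push small weights all the way to zero.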

To apply these techniques using the Keras library with the Pima Indians Diabetes dataset, we first need to load the data:


In [None]:
import pandas as pd

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
dataframe = pd.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]


We also need to split the dataset into a training set and a testing set:


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)


Next, let's apply these regularization techniques in Keras:


In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l1, l2

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dropout(0.5))
model.add(Dense(8, activation='relu', kernel_regularizer=l1(0.01)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])


Finally, we fit the model to our data:


In [None]:
model.fit(X_train, Y_train, epochs=150, batch_size=10, verbose=0)


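The `kernel_regularizer` argument of the `Dense` layer adds L1 or L2 regularization to that layer's weights, while the `Dropout` layer adds dropout regularization between layers. The value passed to `l1` or `l2` (here 0.01) is the penalty coefficient, and the value passed to `Dropout` (here 0.5) is the fraction of units dropped.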
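
As a rough illustration of what a dropout layer does to its inputs, the following plain-Python sketch implements "inverted" dropout on a list of activations. The function `dropout`, its arguments, and the sample activations are hypothetical names and values used only for this example; it is not the Keras implementation.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference the input is
    returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.2, 1.0, 0.7, 0.0, 1.3, 0.9]  # made-up activations

print("training :", dropout(hidden, rate=0.5, training=True, rng=rng))
print("inference:", dropout(hidden, rate=0.5, training=False, rng=rng))
```

Scaling the surviving activations during training keeps their expected sum the same as at inference time, so no extra rescaling is needed when the model is used for prediction.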

After training, we can evaluate the model performance on the testing dataset:


In [None]:
_, accuracy = model.evaluate(X_test, Y_test)
print('Accuracy: %.2f%%' % (accuracy*100))


Remember to tune the regularization parameters carefully. If they are too high, the model may underfit; if they are too low, they may not prevent overfitting. The example above is a simple demonstration, and there is room for improvement, for example by normalizing or standardizing the input features or by using a more sophisticated architecture.
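
The effect of the penalty strength can be seen in the simplest possible case: a single-weight linear model with an L2 penalty, which has a closed-form solution. This is a sketch under that simplifying assumption, not the network above; `ridge_weight` and the toy data are hypothetical.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for one weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data generated by y = 2x, so the unregularized weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lam={lam:6.1f}  w={ridge_weight(xs, ys, lam):.3f}")
```

As `lam` grows, the fitted weight is pulled further from the value that fits the data and toward zero, which is the underfitting risk described above.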


# Exercise


1. Load the necessary libraries and the Pima Indians Diabetes dataset.
2. Preprocess the dataset by splitting it into features (X) and the target variable (y), then into training and test sets. Scale the features to have zero mean and unit variance.
3. Create a function to build a neural network model with the specified regularization technique. The function should take the regularization parameter as input.
4. Build three neural network models using the following regularization techniques:
   a) Dropout: Add a dropout layer with a specified dropout rate.
   b) L1 Regularization: Add L1 regularization to the model with a specified regularization parameter.
   c) L2 Regularization: Add L2 regularization to the model with a specified regularization parameter.
5. Compile and train the models using an appropriate loss function, optimizer, and evaluation metric.
6. Evaluate each model on the test set and print the accuracy score for comparison.
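
As a hint for step 2, here is a minimal plain-Python sketch of scaling to zero mean and unit variance. It assumes the common practice of computing the statistics on the training rows and reusing them for the test rows; `fit_scaler`, `transform`, and the tiny feature rows are hypothetical and stand in for a library scaler.

```python
import math


def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Tiny made-up feature rows, for illustration only.
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test_rows = [[4.0, 40.0]]

means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)  # reuses training statistics

print(train_scaled)
print(test_scaled)
```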


# Sample Solution

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l1, l2
from keras.optimizers import Adam

# Load the Pima Indian diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(url, names=columns)

# Split the data into features (X) and target variable (y)
X = df.drop('class', axis=1)
y = df['class']

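# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features, fitting the scaler on the training set only
# so that no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)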

# Function to build a neural network model with regularization
def build_model(regularization=None, reg_param=None, dropout_rate=None):
    model = Sequential()
    model.add(Dense(64, input_dim=8, activation='relu', kernel_regularizer=regularization(reg_param) if regularization else None))
    if dropout_rate:
        model.add(Dropout(dropout_rate))
    model.add(Dense(32, activation='relu', kernel_regularizer=regularization(reg_param) if regularization else None))
    model.add(Dense(1, activation='sigmoid'))
    return model

# Build and train models with different regularization techniques
dropout_model = build_model(dropout_rate=0.2)
dropout_model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])
dropout_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

l1_model = build_model(regularization=l1, reg_param=0.01)
l1_model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])
l1_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

l2_model = build_model(regularization=l2, reg_param=0.01)
l2_model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])
l2_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

# Evaluate models on the test set
dropout_score = dropout_model.evaluate(X_test, y_test)[1]
l1_score = l1_model.evaluate(X_test, y_test)[1]
l2_score = l2_model.evaluate(X_test, y_test)[1]

# Print the accuracy scores
print("Dropout Model Accuracy: {:.2f}%".format(dropout_score * 100))
print("L1 Regularization Model Accuracy: {:.2f}%".format(l1_score * 100))
print("L2 Regularization Model Accuracy: {:.2f}%".format(l2_score * 100))


# Advanced optimizer algorithms: RMSprop, Adam, etc.


Here's an example of how to use the RMSprop and Adam optimizers in Keras with the Pima Indians Diabetes dataset. In practice, the data should be preprocessed before it is fed to a neural network, for example by normalizing the features and splitting the data into training and testing sets; this example skips those steps to keep the focus on the optimizers.

First, load the Pima Indians Diabetes dataset:


In [None]:
import pandas as pd

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=names)

X = data.iloc[:, 0:8]
Y = data.iloc[:, 8]


You can then create a function to build a simple Keras model:


In [None]:
from keras.models import Sequential
from keras.layers import Dense

def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model


Now, let's fit the model using the RMSprop optimizer:


In [None]:
model = create_model(optimizer='RMSprop')
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
_, accuracy = model.evaluate(X, Y, verbose=0)
print('Accuracy: %.2f%%' % (accuracy*100))


Finally, fit the model using the Adam optimizer:


In [None]:
model = create_model(optimizer='adam')
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
_, accuracy = model.evaluate(X, Y, verbose=0)
print('Accuracy: %.2f%%' % (accuracy*100))


In practice, you'll want to tune other hyperparameters as well, such as the learning rate and the number of layers in the model. The accuracies above are measured on the same data used for training, so they overstate how well the model will do on unseen data; use a separate validation set or cross-validation to get a better estimate.
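
Cross-validation can be sketched without any library as a function that yields index lists for each fold. This is a minimal illustration with contiguous, unshuffled folds; `k_fold_indices` is a hypothetical helper, and in practice the rows would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 3)):
    print(f"fold {fold}: train={train_idx} val={val_idx}")
```

Each sample appears in exactly one validation fold, so training a fresh model per fold and averaging the validation accuracies gives an estimate that does not reuse training rows for scoring.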

Both RMSprop and Adam are adaptive optimization algorithms: they adjust the effective learning rate of each parameter during training. RMSprop (Root Mean Square Propagation) divides the learning rate for a weight by the square root of a running average of the squared recent gradients for that weight. Adam combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSprop: it keeps a running average of the gradients themselves (momentum) in addition to RMSprop-style scaling by a running average of squared gradients.
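
The two update rules can be written out for a single scalar parameter. The sketch below is a simplified plain-Python illustration, not the Keras implementation: the function names, the toy objective f(w) = (w - 3)^2, and the learning rate of 0.01 are assumptions, and the exact placement of the small constant `eps` varies between libraries.

```python
import math


def rmsprop_step(w, grad, state, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a single scalar parameter."""
    # Running average of squared gradients.
    state["v"] = rho * state["v"] + (1.0 - rho) * grad * grad
    return w - lr * grad / (math.sqrt(state["v"]) + eps)


def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter, with bias correction."""
    state["t"] += 1
    # Running averages of the gradient and of the squared gradient.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def grad_f(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_rms, rms_state = 0.0, {"v": 0.0}
w_adam, adam_state = 0.0, {"t": 0, "m": 0.0, "v": 0.0}

for _ in range(2000):
    w_rms = rmsprop_step(w_rms, grad_f(w_rms), rms_state, lr=0.01)
    w_adam = adam_step(w_adam, grad_f(w_adam), adam_state, lr=0.01)

print(f"RMSprop: w = {w_rms:.3f}")
print(f"Adam:    w = {w_adam:.3f}")
```

Both runs should end close to the minimizer w = 3. Because each step is divided by a gradient-magnitude estimate, the step size depends far less on the raw gradient scale than it would with plain SGD.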


# Exercise


1. Load the Pima Indians Diabetes dataset from the given URL.
2. Preprocess the data by splitting it into features (X) and labels (y), and then splitting them into training and testing sets.
3. Build a neural network using Keras with the following architecture:
   - Input layer with the appropriate input shape.
   - Two hidden layers with 64 neurons each and ReLU activation function.
   - Output layer with a single neuron and a sigmoid activation function.
4. Compile the model with the RMSprop optimizer and the binary cross-entropy loss function. Train the model on the training data for 100 epochs.
5. Evaluate the model on the testing data and record the accuracy.
6. Repeat steps 4 and 5 but this time using the Adam optimizer.
7. Compare the performance of RMSprop and Adam optimizers and analyze the results.


# Sample Solution

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop, Adam

# Step 1: Load the Pima Indian dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
dataset = pd.read_csv(url, names=column_names)

# Step 2: Preprocess the data
X = dataset.drop('Outcome', axis=1).values
y = dataset['Outcome'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Build the neural network
def create_model():
    model = Sequential()
    model.add(Dense(64, input_dim=8, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

# Steps 4 and 5: Train and evaluate using the RMSprop optimizer
model_rmsprop = create_model()
model_rmsprop.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model_rmsprop.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
loss_rmsprop, accuracy_rmsprop = model_rmsprop.evaluate(X_test, y_test)

print("RMSprop Optimizer:")
print("Test Accuracy:", accuracy_rmsprop)

# Step 6: Repeat steps 4 and 5 using the Adam optimizer
model_adam = create_model()
model_adam.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model_adam.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
loss_adam, accuracy_adam = model_adam.evaluate(X_test, y_test)

print("\nAdam Optimizer:")
print("Test Accuracy:", accuracy_adam)

# Step 7: Compare the performance of RMSprop and Adam optimizers
if accuracy_rmsprop > accuracy_adam:
    print("\nRMSprop performed better than Adam.")
elif accuracy_rmsprop < accuracy_adam:
    print("\nAdam performed better than RMSprop.")
else:
    print("\nRMSprop and Adam had the same performance.")


# A quiz on advanced topics in deep learning with Keras


Question 1: What is the main purpose of using regularization techniques in deep learning models?
<br>a) To speed up model training
<br>b) To reduce the number of layers in the model
<br>c) To prevent overfitting
<br>d) To increase the model's complexity

Question 2: Which of the following regularization techniques randomly drops a fraction of neurons during training?
<br>a) L1 regularization
<br>b) L2 regularization
<br>c) Dropout
<br>d) RMSprop

Question 3: What is the benefit of using L1 regularization in a neural network?
<br>a) It adds noise to the model, improving generalization.
<br>b) It penalizes large weights, encouraging sparsity in the model.
<br>c) It increases the learning rate, leading to faster convergence.
<br>d) It helps the model escape local minima during training.

Question 4: Which of the following statements about dropout is true?
<br>a) Dropout only affects the output layer of the neural network.
<br>b) Dropout randomly removes input features during training.
<br>c) Dropout is applied during inference to improve prediction accuracy.
<br>d) Dropout helps prevent overfitting by reducing co-adaptation of neurons.

Question 5: What is the advantage of using advanced optimizer algorithms like RMSprop and Adam?
<br>a) They guarantee a globally optimal solution.
<br>b) They require less hyperparameter tuning than SGD.
<br>c) They are computationally less expensive than other optimizers.
<br>d) They can adapt the learning rate for each parameter, leading to faster convergence.

---
Answers
<br>1: c) To prevent overfitting
<br>2: c) Dropout
<br>3: b) It penalizes large weights, encouraging sparsity in the model.
<br>4: d) Dropout helps prevent overfitting by reducing co-adaptation of neurons.
<br>5: d) They can adapt the learning rate for each parameter, leading to faster convergence.

---