# BWT - Deep Learning Track
## Task#26: Understanding Optimizers, Last-layer Activations, Loss Functions and Evaluation Metrics
### Adil Mubashir Chaudhry


As we refine our model to outperform a baseline, a few key choices need to be made: which optimizer, last-layer activation, loss function, and evaluation metrics to use.

## Optimizers

Optimizers are algorithms that minimize the loss function during the training of a machine learning model. The goal of training is to find the set of weights and biases that minimizes the loss, and the optimizer decides how those parameters are updated at each training step. Common optimizers include gradient descent, stochastic gradient descent (SGD), Adam, Adagrad, and RMSprop; they differ in how they update the weights and biases during training. The choice of optimizer depends on the type of problem you are trying to solve and the size of the dataset you are working with.
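To make the difference in update rules concrete, here is a minimal sketch in plain Python that applies plain gradient descent and Adam to a toy one-parameter loss. The loss `f(w) = (w - 3)^2`, the function names, and the hyperparameter values are assumptions chosen for illustration; they are not taken from Keras.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: scale each step using running averages of the gradient and its square."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_gd = gradient_descent(0.0)
w_adam = adam(0.0)
print(w_gd, w_adam)  # both should end up close to the minimum at w = 3
```

Both routines start from the same point and minimize the same loss; only the rule that turns a gradient into a parameter update differs, which is the distinction between optimizers described above.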

Below is a table that covers some of the main optimizers you will come across and their strengths and weaknesses:

![image.png](attachment:image.png)

Below is a snippet of code showing how you can use different optimizers on your model:

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, Adam, RMSprop

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

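# Each compile() call replaces the previous configuration, so keep only the one you need.
# Note: the `lr` argument was renamed to `learning_rate` in newer Keras releases.
model.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])

model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

model.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])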


## Last-Layer Activations

The last-layer activation is the function applied to the output of the network's final layer, which produces the model's prediction, before the loss is computed. The right choice depends on the type of problem you are trying to solve. For a binary classification problem, you can use the sigmoid function, which maps the output to a probability between 0 and 1. For a multi-class classification problem, you can use the softmax function, which turns the outputs into a probability distribution over the classes.
![image.png](attachment:image.png)
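
As a rough illustration of what these two activations compute, the sketch below implements sigmoid and softmax in plain Python. The function names and the example inputs are assumptions made for this example; in practice Keras applies the activation for you when you pass `activation='sigmoid'` or `activation='softmax'` to the last layer.

```python
import math

def sigmoid(z):
    """Squash a single raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

p_binary = sigmoid(0.0)                # a raw score of 0 maps to probability 0.5
p_classes = softmax([2.0, 1.0, 0.1])   # the largest score gets the largest probability
print(p_binary, p_classes, sum(p_classes))
```

Sigmoid yields one probability, which suits a single-unit output for binary classification, while softmax yields one probability per class, which suits a multi-class output layer.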

## Loss

A loss function measures the difference between the predicted output of the model and the actual output; it is the quantity the optimizer minimizes during training. The choice of loss function depends on the type of problem you are trying to solve. For example, if you are solving a regression problem, you can use the mean squared error (MSE) as the loss function. If you are solving a classification problem, you can use binary cross-entropy (two classes) or categorical cross-entropy (more than two classes) as the loss function.

Table 4.1 also shows which loss function is suitable for which kind of problem.
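
To show what these losses measure, here is a small plain-Python sketch of mean squared error and binary cross-entropy. The helper names `mse_loss` and `bce_loss`, the clipping constant, and the sample values are assumptions for illustration; they are simplified stand-ins, not the Keras implementations.

```python
import math

def mse_loss(y_true, y_pred):
    """Mean squared error: average of squared differences (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy: penalizes confident wrong probabilities heavily."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

mse = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # only the last prediction is off
bce_good = bce_loss([1, 0], [0.9, 0.1])            # confident and correct
bce_bad = bce_loss([1, 0], [0.1, 0.9])             # confident and wrong
print(mse, bce_good, bce_bad)
```

The confident-but-wrong predictions produce a much larger cross-entropy than the confident-and-correct ones, which is the signal the optimizer uses to adjust the weights.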

Below is a snippet of how you can adjust the loss function in your model:

In [2]:
from keras.models import Sequential
from keras.layers import Dense
from keras.losses import binary_crossentropy, categorical_crossentropy, mean_squared_error

# Define the model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

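# Compile the model with different loss functions.
# Each compile() call overrides the previous one; in practice, pick the one that matches your task.

# Using binary_crossentropy loss for a binary classification problem
# (this matches the single sigmoid output unit defined above)
model.compile(optimizer='adam', loss=binary_crossentropy, metrics=['accuracy'])

# Using categorical_crossentropy loss for a multi-class classification problem
# (the last layer would need one softmax unit per class, with one-hot encoded labels)
model.compile(optimizer='adam', loss=categorical_crossentropy, metrics=['accuracy'])

# Using mean_squared_error loss for a regression problem
# (the last layer would normally have no activation)
model.compile(optimizer='adam', loss=mean_squared_error, metrics=['mae'])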

## Evaluation Metrics
Evaluation metrics measure the performance of a machine learning model. The choice of evaluation metric depends on the type of problem you are trying to solve. For example, if you are solving a binary classification problem, you can use metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. If you are solving a regression problem, you can use metrics such as mean absolute error (MAE), mean squared error (MSE), and R-squared.
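
As a hedged illustration of how some of these metrics are computed, the sketch below derives accuracy, precision, recall, and F1-score from predicted labels, plus MAE for a regression case. The function names and the sample labels are made up for this example and are not part of Keras.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def mean_abs_error(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
scores = classification_metrics(y_true, y_pred)
mae = mean_abs_error([3.0, 5.0], [2.5, 6.0])
print(scores, mae)
```

On this sample, accuracy, precision, and recall give different values, which is why it is worth reporting more than one metric for classification problems.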

Below is a snippet of how they can be set in a model:

In [4]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.metrics import binary_accuracy, categorical_accuracy, mean_absolute_error

# Define the model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

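# Compile the model with different evaluation metrics.
# Each compile() call overrides the previous one; in practice, pick the one that matches your task.

# Using binary_accuracy metric for a binary classification problem
# (this matches the single sigmoid output unit defined above)
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=[binary_accuracy])

# Using categorical_accuracy metric for a multi-class classification problem
# (the last layer would need one softmax unit per class, with one-hot encoded labels)
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=[categorical_accuracy])

# Using mean_absolute_error metric for a regression problem
# (the last layer would normally have no activation)
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error', metrics=[mean_absolute_error])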