# Understanding Optimizers
Optimizers are algorithms that update the parameters (weights and biases) of a neural network during training in order to minimize the loss function. They compute the gradient of the loss with respect to each parameter and move the parameters a small step in the opposite direction, which is the direction of steepest descent. Popular optimizers include stochastic gradient descent (SGD), Adam, RMSprop, and Adagrad; the last three also adapt the step size for each parameter.
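
To make the update rule concrete, here is a minimal sketch of plain gradient descent on a one-parameter toy problem. The loss `(w - 3)^2`, the function names, and the learning rate are all assumptions chosen for illustration; real optimizers apply the same idea to every weight in the network.

```python
def loss(w):
    # Toy loss with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient: w <- w - learning_rate * dL/dw.
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

w_final = gradient_descent(w=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_final` ends up very close to 3.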

# Last-layer activations
The last-layer activation function transforms the output of the final layer of a neural network into a format suitable for the problem at hand. For example, in a binary classification problem, the last-layer activation is often a sigmoid function, which outputs a single probability value between 0 and 1. In a multi-class classification problem, the last-layer activation is often a softmax function, which outputs a probability distribution over all classes.
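
The two activations mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the code Keras runs internally; the input values are made up.

```python
import math

def sigmoid(z):
    # Maps any real number into [0, 1]; the two branches avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

p_binary = sigmoid(0.0)               # probability of the positive class
p_classes = softmax([2.0, 1.0, 0.1])  # one probability per class
print(p_binary, p_classes, sum(p_classes))
```

A larger raw score always receives a larger probability, and the softmax outputs sum to 1.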

# Loss Function
A loss function measures how far the model's predictions are from the actual values in the training data. The optimizer uses the gradient of this loss to adjust the model's parameters so that the loss decreases. Commonly used loss functions include mean squared error (MSE) for regression, binary cross-entropy for two-class classification, and categorical cross-entropy for multi-class classification.
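
As a rough illustration, the three losses named above can be written in plain Python as follows. The function names mirror the Keras loss names, but these are simplified sketches; the `EPS` clipping constant and the sample values are assumptions.

```python
import math

EPS = 1e-12  # guards against log(0); an assumption of this sketch

def mean_squared_error(y_true, y_pred):
    # Average of squared differences between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred):
    # y_true holds 0/1 labels; y_pred holds predicted probabilities of class 1.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred):
    # Each row of y_true is one-hot; each row of y_pred is a probability vector.
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, EPS)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

print(mean_squared_error([1.0, 2.0], [1.5, 2.5]))
print(binary_crossentropy([1, 0], [0.9, 0.2]))
print(categorical_crossentropy([[0, 1, 0]], [[0.1, 0.8, 0.1]]))
```

In all three cases a lower value means the predictions are closer to the targets.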

# Evaluation metrics
Evaluation metrics are used to measure the performance of the model on the test data. Some commonly used evaluation metrics for classification problems include accuracy, precision, recall, and F1-score. For regression problems, commonly used evaluation metrics include mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE).
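
The classification metrics listed above can be computed from counts of true positives, false positives, and false negatives. The sketch below assumes binary labels with 1 as the positive class; the helper name and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    # Count outcomes for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    # Return 0.0 when a denominator is empty; a convention chosen for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = classification_metrics(y_true=[1, 0, 1, 1, 0, 0], y_pred=[1, 0, 0, 1, 1, 0])
print(metrics)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is their harmonic mean.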

In [None]:
# Model definition
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))

The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 10 different output classes—for every input sample, the network will produce a 10-dimensional output vector, where output[i] is the probability that the sample belongs to class i. The 10 scores will sum to 1.
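
To illustrate how such an output vector is read, the sketch below uses a made-up 10-element probability vector (not the output of a trained model) and picks the most likely class.

```python
# Hypothetical network output for one sample: one probability per class 0..9.
output = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]

# The entries form a probability distribution, so they sum to 1.
total = sum(output)

# The predicted class is the index with the highest probability.
predicted_class = max(range(len(output)), key=lambda i: output[i])
print(total, predicted_class, output[predicted_class])
```

Here the sample is assigned to class 3, the index of the largest entry.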

In [None]:
# Compiling the model
from keras import optimizers

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

An instance of the **RMSprop optimizer** is created with a learning rate of 0.001. The learning rate determines the step size at each iteration while moving toward a minimum of a loss function during training.
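
The role of the learning rate can be seen in a textbook form of the RMSprop update, sketched below on a toy one-parameter loss. This is an assumption-laden illustration: the toy loss, the helper name `rmsprop_step`, and the exact placement of `epsilon` are choices made here, and the Keras implementation may differ in its details.

```python
import math

def rmsprop_step(w, g, v, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    # Running average of squared gradients.
    v = rho * v + (1.0 - rho) * g * g
    # The learning rate scales the step; dividing by sqrt(v) normalizes it.
    w = w - learning_rate * g / (math.sqrt(v) + epsilon)
    return w, v

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); both are illustrative assumptions.
w, v = 0.0, 0.0
for _ in range(5000):
    g = 2.0 * (w - 3.0)
    w, v = rmsprop_step(w, g, v)
print(w)
```

Because the gradient is divided by its recent magnitude, each step has a size on the order of the learning rate, so a smaller learning rate means slower but finer movement toward the minimum.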

The loss function is specified as **categorical_crossentropy**, the usual choice for multi-class classification problems in which the targets are one-hot encoded.
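
This loss expects each target to be a one-hot vector rather than an integer class index. The helper below is a hypothetical, dependency-free sketch of that encoding; Keras ships `keras.utils.to_categorical` for the same purpose.

```python
def to_one_hot(labels, num_classes):
    # Build one row per label with a single 1.0 at the label's index.
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

one_hot = to_one_hot([0, 2, 1], num_classes=3)
print(one_hot)
```

When the labels are kept as integers instead, the `sparse_categorical_crossentropy` loss is the usual alternative.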

The code **metrics=['accuracy']** specifies the evaluation metric used to measure the performance of the model. In this case, accuracy is used, which is the fraction of correctly classified samples over the total number of samples. Other metrics include precision, recall, F1-score, and custom metrics.
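
For a multi-class model, accuracy compares the index of the largest predicted probability with the index of the true class. The sketch below assumes one-hot targets and uses made-up predictions; the `accuracy` helper is illustrative rather than the Keras implementation.

```python
def accuracy(y_true, y_pred):
    # y_true: one-hot rows; y_pred: probability rows. Compare the argmax of each.
    def argmax(row):
        return max(range(len(row)), key=lambda i: row[i])
    correct = sum(1 for t, p in zip(y_true, y_pred) if argmax(t) == argmax(p))
    return correct / len(y_true)

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
print(accuracy(y_true, y_pred))
```

Three of the four samples are classified correctly, so the accuracy is 0.75.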