# Keras vs. PyTorch

In this lecture, we will explore two widely used deep learning frameworks: [Keras](https://keras.io/) and [PyTorch](https://pytorch.org/). Both are powerful tools for building neural networks, but they differ
in features and design philosophy, which makes each a better fit for certain tasks and user preferences.

Today's task is to implement a Multi-Layer Perceptron (MLP) with one hidden layer for the MNIST digit classification problem. MNIST is a well-known
dataset of 28x28 grayscale images of handwritten digits; its simplicity and availability make it a common choice for demonstrating classification.
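
To make the architecture concrete before using either framework, here is a minimal NumPy sketch of the forward pass of such an MLP. It assumes the layer sizes used later in this notebook (784 inputs, 10 hidden neurons, 10 outputs); the input batch and the weights are random stand-ins, not MNIST data or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a batch of 4 flattened 28x28 images in [0, 1]
x = rng.random((4, 28 * 28))

# Randomly initialized parameters (illustrative only, not trained)
W1 = rng.normal(0.0, 0.01, size=(28 * 28, 10))  # input -> hidden
b1 = np.zeros(10)
W2 = rng.normal(0.0, 0.01, size=(10, 10))       # hidden -> output
b2 = np.zeros(10)

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
probs = softmax(hidden @ W2 + b2)      # one probability per digit class

print(probs.shape)        # one row per image, one column per class
print(probs.sum(axis=1))  # each row sums to 1
```

Both frameworks below wrap exactly these two matrix multiplications and activations in their `Dense` / `Linear` layers.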


The table below summarizes some differences between Keras and PyTorch:

|                               | **PyTorch** | **Keras** |
|-------------------------------|-------------|-----------|
| **Target Audience**           | Researchers and experienced developers who need flexibility | Beginners, quick prototypers, industry practitioners |
| **Flexibility**               | High; the dynamic computation graph allows custom operations and research code | Moderate; pre-built layers speed up development but limit some customization |
| **Community & Ecosystem**     | Strong research community support, backed by Meta (formerly Facebook) | Extensive enterprise adoption, part of the TensorFlow ecosystem, backed by Google |
| **Deployment Capabilities**   | Supports model exporting for integration into production systems | Leverages TensorFlow's ecosystem for scalable and efficient deployment |
| **Documentation & Resources** | Advanced materials and documentation for experienced developers | Beginner-friendly resources and tutorials |
| **Use Cases**                 | Experimental projects, cutting-edge research, complex models | Industry applications, rapid prototyping, enterprise use |
| **Scalability**               | Handles large-scale training; distributed training is available through `torch.distributed` | Efficient scaling via TensorFlow's distributed training capabilities |



## Keras

In [None]:
import numpy as np
import keras
from keras import layers
import matplotlib.pyplot as plt

In [None]:
# TODO: load the MNIST dataset using the keras.datasets.mnist.load_data function
(x_train, y_train), (x_test, y_test) = ...

# TODO: define the number of classes and the input layer size
num_classes = ...
input_shape = ...

# TODO: flatten the images
x_train = x_train.reshape((...))
x_test  = x_test.reshape((...))

# TODO: normalize the images
x_train = ...
x_test  = ...

print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# TODO: convert class vectors to binary class matrices (one-hot encoding)
# HINT: use the keras.utils.to_categorical function
y_train_onehot = ...
y_test_onehot  = ...

print("y_train shape:", y_train_onehot.shape)
print("y_train:", y_train[0], y_train_onehot[0])
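
The three preprocessing steps above (flatten, normalize, one-hot encode) can be illustrated on toy data. This is a NumPy sketch under stated assumptions: `toy_images` and `toy_labels` are made-up stand-ins for MNIST, and `np.eye` is used for the encoding instead of the `keras.utils.to_categorical` function the exercise asks for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MNIST: 5 grayscale images with uint8 pixels
toy_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
toy_labels = np.array([3, 0, 9, 3, 1])

# Flatten each 28x28 image into a vector of 784 values
flat = toy_images.reshape((toy_images.shape[0], -1))

# Scale pixel values from [0, 255] to [0, 1]
flat = flat.astype("float32") / 255.0

# One-hot encode: row i of the identity matrix is the encoding of class i
onehot = np.eye(10, dtype="float32")[toy_labels]

print(flat.shape, flat.min(), flat.max())
print(toy_labels[0], onehot[0])
```

Normalizing to `[0, 1]` keeps the inputs on a scale where plain SGD tends to behave well; the one-hot rows are what the categorical cross-entropy loss compares the network output against.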

In [None]:
# TODO: define our model as follows:
#       - Input layer: 28*28 neurons
#       - Hidden layer: 10 neurons, ReLU activation function
#       - Output layer: 10 neurons, Softmax activation function
model = keras.Sequential(
    [
        keras.Input(shape=..., name="input_layer"),
        layers.Dense(..., name="hidden_layer"),
        layers.Dense(..., name="output_layer"),
    ]
)

# alternatively, build the same model by adding layers one at a time
# model = keras.Sequential()
# model.add(keras.Input(shape=input_shape))
# model.add(layers.Dense(10, activation="relu"))
# model.add(layers.Dense(num_classes, activation="softmax"))

model.summary()

In [None]:
# TODO: using the 'compile' function:
#       - select the SGD algorithm as the optimizer
#       - use the Categorical Crossentropy loss as the loss function
#       - add the Accuracy metric

# NOTE: Cross entropy measures the difference between the predicted probability
#       distribution and the true one, which makes it a good loss function for classifiers.
model.compile(...)
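
As a worked example of the note above, the following sketch computes the categorical cross-entropy by hand for two predictions. The helper `categorical_crossentropy` and its `eps` clipping value are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0)
    # per-sample loss: -sum_k y_true[k] * log(y_pred[k])
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Both samples belong to class 1 (one-hot encoded)
y_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1],   # confident and correct
                   [0.7, 0.2, 0.1]])  # confident and wrong

losses = categorical_crossentropy(y_true, y_pred)
print(losses)
```

With one-hot targets the sum reduces to `-log` of the probability assigned to the true class, so the wrong prediction (`-log(0.2)`) is penalized much more than the correct one (`-log(0.8)`).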

In [None]:
# TODO: using the .fit method, train the model as follows:
#       - batch size: 32
#       - epochs: 10
history = model.fit(..., verbose=True)

In [None]:
# TODO: plot the training loss and the accuracy stored in the `history` object
# HINT: the per-epoch values are in the history.history dictionary
plt.plot(...)
plt.title("Loss function")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.show()

plt.plot(..., color='tab:orange')
plt.title("Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.show()

In [None]:
# TODO: evaluate the model on the test set
# HINT: check the evaluate method!
score = ...

print("Test accuracy:", score[1])

In [None]:
# TODO: make some predictions
# HINT: use the predict method
preds = ...
preds_labels = ...

print("Prediction shapes:", preds.shape, preds_labels.shape)
print("Predicted label:", preds_labels[0], preds[0])

In [None]:
# Alternative: Keras layers can do the preprocessing (rescale, then flatten)
(x_train_test, _), _ = keras.datasets.mnist.load_data()

flat_and_normalize = keras.Sequential([
    keras.Input(shape=(28,28)),
    layers.Rescaling(1./255),
    layers.Reshape((28*28,)),
])

_norm_dst = flat_and_normalize(x_train_test)

print(x_train.shape)
print(_norm_dst.shape)

## PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

In [None]:
# We are going to use the same dataset as before
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

print("y_train shape:", y_train_onehot.shape)
print("y_train:", y_train[0], y_train_onehot[0])

In [None]:
# TODO: define the Dataset class to handle images and labels
class MyMnistDataset(Dataset):
  def __init__(self, images, labels):
    assert len(images) == len(labels)

    # TODO: define the class attributes:
    #       - images and labels
    ...

  def __len__(self):
    # TODO: return the dataset len
    # HINT: you can use images or labels to compute the dataset len
    return ...

  def __getitem__(self, idx):
    # TODO: get image and label by given index
    _selected_img = ...
    _selected_lab = ...

    # TODO: convert to tensor
    _selected_img = ...
    _selected_lab = ...

    return _selected_img, _selected_lab

In [None]:
# TODO: create the training and testing datasets
train_dataset = ...
test_dataset = ...

In [None]:
# TODO: create the training DataLoader with batch size 32 and shuffling enabled
#       Create the testing DataLoader with batch size 1 (useful for the evaluation phase)
#       and no shuffling
train_loader = DataLoader(...)
test_loader = DataLoader(...)
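
Conceptually, a data loader with shuffling permutes the sample indices once per epoch and then slices them into batches. The sketch below mimics that behaviour in NumPy; `iterate_batches` is a hypothetical helper written for illustration, not part of PyTorch.

```python
import numpy as np

def iterate_batches(n_samples, batch_size, shuffle, rng):
    # yield arrays of sample indices, one array per batch
    order = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_batches(n_samples=100, batch_size=32, shuffle=True, rng=rng))

print([len(b) for b in batches])  # [32, 32, 32, 4]
```

The last batch is smaller because 100 is not a multiple of 32; `DataLoader` behaves the same way by default (`drop_last=False`).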

In [None]:
# Choose device to use
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple MPS device for computation.")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA device for computation.")
else:
    device = torch.device("cpu")
    print("Using CPU for computation.")

In [None]:
# TODO: define the MLP model as follows:
#       - Hidden layer: 28*28 input neurons, 10 output neurons, ReLU activation function
#       - Output layer: 10 input neurons, 10 output neurons, Softmax activation function
# HINT: check the torch.nn module !
class MLPModel(nn.Module):
    def __init__(self):
        # initialize the base class (nn.Module) to inherit its methods and properties
        super(MLPModel, self).__init__()

        # TODO: define activation functions
        self.relu = ...
        self.softmax = ...

        # TODO: first fully connected layer: 28*28 or 784 input features, 10 output features
        self.hidden_layer = ...

        # TODO: output layer: 10 input features, 10 output features (number of classes)
        self.output_layer = ...

    def forward(self, x):
        # TODO: pass the input x through first fully connected layer
        # HINT: the class attributes are callable (they behave like functions)!
        x = ...
        x = ...

        # TODO: pass the input x through output layer
        x = ...
        x = ...

        # return the class probabilities (softmax output) of shape (batch_size, 10)
        return x

In [None]:
# TODO: create the model and send it to the selected device
model = ...
print(model)

In [None]:
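# TODO: define the loss function as the CrossEntropy Loss
# NOTE: nn.CrossEntropyLoss applies log-softmax internally and expects raw scores (logits).
#       Since this model already ends with Softmax, training still works, but the loss
#       cannot approach zero and learning is typically slower.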
criterion = ...
# TODO: define the optimizer as the SGD algorithm with learning rate 0.01
optimizer = ...
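
One detail worth knowing here: PyTorch's `nn.CrossEntropyLoss` applies log-softmax to its input, so it is designed to receive raw scores (logits). The NumPy sketch below imitates that behaviour to show what happens when the input has already been passed through softmax, as in the model defined above. The function `cross_entropy_from_scores` is a hypothetical stand-in for the PyTorch loss, not its actual implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_scores(scores, target_index):
    # mimics a loss that applies log-softmax to its input internally
    return -np.log(softmax(scores)[target_index])

# A very confident, correct prediction for class 0 (10 classes)
logits = np.array([8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
target = 0

loss_on_logits = cross_entropy_from_scores(logits, target)
loss_on_probs = cross_entropy_from_scores(softmax(logits), target)  # softmax applied twice

print(loss_on_logits)
print(loss_on_probs)
```

With softmax applied twice, the scores fed to the loss lie in `[0, 1]`, so for 10 classes the loss cannot fall below `log(1 + 9/e)`, roughly 1.46, even for a perfect prediction. A common alternative design is to return the raw output of the last linear layer from `forward` and apply softmax only when probabilities are needed.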

In [None]:
# number of epochs to train the model
n_epochs = 10

model.train() # prepare the model for training

history = []
for epoch in range(n_epochs):
    # monitor training loss
    epoch_loss = []

    ###################
    # train the model #
    ###################
    for x, y in train_loader:
      # TODO: send data to selected device
      x, y = ...
      # clear the gradients of all optimized variables
      optimizer.zero_grad()

      # TODO: forward pass, compute predicted outputs by passing inputs to the model
      output = ...

      # TODO: calculate the loss
      loss = ...

      # backward pass: compute the gradient of the loss with respect to the model parameters
      loss.backward()

      # perform a single optimization step (parameter update)
      optimizer.step()

      # update running training loss
      epoch_loss.append(loss.item())

    # print training statistics
    # calculate average loss over an epoch
    epoch_loss = np.mean(epoch_loss)
    history.append(epoch_loss)

    print(f'Epoch: {epoch+1} \tTraining Loss: {epoch_loss:.6f}')
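
To see what the loop above does without autograd, here is a NumPy sketch of the same steps (forward pass, loss, gradient, parameter update) for a single-layer softmax classifier. The toy data, the learning rate of 0.1, and the absence of a hidden layer are simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 200 points, 4 features, 3 classes
X = rng.normal(size=(200, 4))
true_W = rng.normal(size=(4, 3))
y = (X @ true_W).argmax(axis=1)
Y = np.eye(3)[y]  # one-hot targets

W = np.zeros((4, 3))
lr, batch_size, n_epochs = 0.1, 32, 10
history = []

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(n_epochs):
    order = rng.permutation(len(X))  # shuffle once per epoch
    epoch_loss = []
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]
        probs = softmax(xb @ W)                                  # forward pass
        loss = -np.mean(np.sum(yb * np.log(probs + 1e-12), axis=1))  # cross-entropy
        grad = xb.T @ (probs - yb) / len(xb)                     # gradient (backward pass)
        W -= lr * grad                                           # SGD parameter update
        epoch_loss.append(loss)
    history.append(np.mean(epoch_loss))

print(history[0], history[-1])
```

In PyTorch, `loss.backward()` computes `grad` automatically for every parameter and `optimizer.step()` performs the update; `optimizer.zero_grad()` is needed because gradients accumulate across calls, whereas this sketch recomputes `grad` from scratch each step.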

In [None]:
# Plot Loss Function over epochs
plt.plot(history)
plt.title("Loss Function")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.show()

In [None]:
# Evaluate the model
model.eval() # prepare model for evaluation
correct = 0

# prevent computing gradients
with torch.no_grad():
  for x, target in test_loader:
    # send the batch to the same device as the model
    x, target = x.to(device), target.to(device)
    output = model(x)

    # torch.max over dimension 1 (the class dimension) returns a tuple
    # (values, indices); the indices are the predicted / true class labels
    _, pred = torch.max(output, 1)
    _, target = torch.max(target, 1)  # assumes one-hot encoded labels

    # count correct predictions (works for any batch size)
    correct += (pred == target).sum().item()

accuracy = correct / len(test_loader.dataset)
print(f"Accuracy: {accuracy*100:.2f}%")
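
The accuracy computation can also be done on a whole batch at once. The following NumPy sketch uses made-up outputs and one-hot targets for 4 samples and 3 classes to show the argmax comparison.

```python
import numpy as np

# Hypothetical model outputs for 4 samples and 3 classes
outputs = np.array([[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.6, 0.3, 0.1]])
targets_onehot = np.array([[0, 1, 0],
                           [1, 0, 0],
                           [0, 1, 0],
                           [1, 0, 0]])

pred = outputs.argmax(axis=1)           # predicted class per sample
target = targets_onehot.argmax(axis=1)  # true class per sample
accuracy = (pred == target).mean()      # fraction of correct predictions

print(accuracy)  # 0.75
```

Three of the four predictions match their targets; the third sample is predicted as class 2 while its label is class 1.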