
# Compare 3 configurations for the activation function. Show and explain your performance result.

## Import Libraries and Data Preparation

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision.transforms import ToTensor
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score

# Download and load MNIST dataset
train_data = torchvision.datasets.MNIST(root='data', train=True, transform=ToTensor(), download=True)
test_data = torchvision.datasets.MNIST(root='data', train=False, transform=ToTensor(), download=True)

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=128, shuffle=False)


## Training and Evaluating the Model

In [None]:
# Train a model with SGD and report test-set metrics after every epoch
def train_and_evaluate(model, activation_function, num_epochs=10):
    # activation_function is not used in this function (the model already holds
    # its activation); the parameter is kept so existing calls keep working.
    # Hyperparameters, optimizer, and loss
    learning_rate = 0.01
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            all_preds = []
            all_labels = []
            for images, labels in test_loader:
                outputs = model(images)
                preds = outputs.argmax(dim=1)  # predicted class = index of the largest logit
                all_preds.extend(preds.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())

            acc = accuracy_score(all_labels, all_preds)
            f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
            precision = precision_score(all_labels, all_preds, average='macro', zero_division=0)
            recall = recall_score(all_labels, all_preds, average='macro', zero_division=0)

        print(f'Epoch {epoch + 1}: Accuracy={acc:.4f}, F1 Score={f1:.4f}, Precision={precision:.4f}, Recall={recall:.4f}')
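
The `average='macro'` setting used above computes precision, recall, and F1 separately for each class and then takes their unweighted mean. As a rough illustration, here is a minimal pure-Python sketch of that averaging; the helper name `macro_scores` and the toy labels are made up for this example and are not part of the notebook.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) for two label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Mirror zero_division=0: an undefined ratio counts as 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels (hypothetical, three classes)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"Precision={macro_p:.4f}, Recall={macro_r:.4f}, F1={macro_f1:.4f}")
```

Because every class carries equal weight, a class the model handles poorly pulls the macro score down even when overall accuracy looks fine.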


## Define the Neural Network Model

In [None]:
# Define Model Configuration
class SimpleModel(nn.Module):
    def __init__(self, activation_function):
        super(SimpleModel, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 256)
        self.activation = activation_function
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x
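
To get a feel for the size of this model, the trainable parameters can be counted by hand: each `nn.Linear` layer holds a weight matrix plus a bias vector, and ReLU, Sigmoid, and Tanh add no parameters of their own. The sketch below does this arithmetic in plain Python; `linear_params` is a helper name introduced only for this illustration.

```python
def linear_params(in_features, out_features):
    # Weight matrix (out_features x in_features) plus one bias per output
    return in_features * out_features + out_features


fc1_params = linear_params(28 * 28, 256)  # hidden layer
fc2_params = linear_params(256, 10)       # output layer, one logit per digit
total_params = fc1_params + fc2_params

print(f"fc1: {fc1_params}, fc2: {fc2_params}, total: {total_params}")
```

All three configurations therefore have the same capacity, so any difference in the results comes from the activation function (and from run-to-run randomness), not from model size.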


## Create models with different activation functions

In [None]:
activation_functions = [nn.ReLU(), nn.Sigmoid(), nn.Tanh()]

for activation_function in activation_functions:
    model = SimpleModel(activation_function)
    print(f"Model with {activation_function.__class__.__name__} Activation:")
    train_and_evaluate(model, activation_function)
    print("\n")


Model with ReLU Activation:
Epoch 1: Accuracy=0.8112, F1 Score=0.7996, Precision=0.8204, Recall=0.8050
Epoch 2: Accuracy=0.8641, F1 Score=0.8607, Precision=0.8631, Recall=0.8612
Epoch 3: Accuracy=0.8837, F1 Score=0.8814, Precision=0.8822, Recall=0.8816
Epoch 4: Accuracy=0.8935, F1 Score=0.8915, Precision=0.8920, Recall=0.8918
Epoch 5: Accuracy=0.8987, F1 Score=0.8970, Precision=0.8975, Recall=0.8971
Epoch 6: Accuracy=0.9022, F1 Score=0.9006, Precision=0.9008, Recall=0.9007
Epoch 7: Accuracy=0.9047, F1 Score=0.9031, Precision=0.9034, Recall=0.9032
Epoch 8: Accuracy=0.9098, F1 Score=0.9083, Precision=0.9088, Recall=0.9085
Epoch 9: Accuracy=0.9110, F1 Score=0.9095, Precision=0.9103, Recall=0.9095
Epoch 10: Accuracy=0.9144, F1 Score=0.9130, Precision=0.9134, Recall=0.9131


Model with Sigmoid Activation:
Epoch 1: Accuracy=0.4982, F1 Score=0.4325, Precision=0.6299, Recall=0.4861
Epoch 2: Accuracy=0.5900, F1 Score=0.5279, Precision=0.7263, Recall=0.5781
Epoch 3: Accuracy=0.7172, F1 Score=0.6

## Show and explain the performance result

The results compare three configurations of the same network that differ only in the hidden-layer activation function: ReLU, Sigmoid, and Tanh. Each model was trained for 10 epochs with SGD (learning rate 0.01, batch size 128) and evaluated on the MNIST test set after every epoch.

<br>**Model with ReLU Activation:**
- ReLU (Rectified Linear Unit) does not saturate for positive inputs, where its gradient is exactly 1. This mitigates the vanishing gradient problem and tends to speed up convergence.
- Accuracy rises with every epoch, from 81.12% after epoch 1 to 91.44% after epoch 10.
- Macro-averaged F1 score, precision, and recall track accuracy closely (all about 0.913 at epoch 10), so the improvement is not confined to a few digit classes.
- ReLU is a popular default for hidden layers in deep learning.

<br>**Model with Sigmoid Activation:**
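- Sigmoid squashes its input into the range (0, 1). Such outputs can be read as probabilities in an output layer, but here Sigmoid serves only as the hidden-layer nonlinearity.
- Accuracy starts low at 49.82% after epoch 1 and climbs to 86.44% by epoch 10, still well behind ReLU.
- F1 score, precision, and recall also improve, but more slowly than with ReLU.
- Sigmoid's derivative never exceeds 0.25 and approaches zero for large positive or negative inputs, so the gradients reaching the first layer are small. With plain SGD at a learning rate of 0.01 this slows training even in this single-hidden-layer network, and the effect compounds in deeper ones.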

<br>**Model with Tanh Activation:**
- Tanh has the same S shape as Sigmoid but outputs values in (-1, 1), centered on zero.
- Accuracy starts at 83.54% and reaches 91.17% by the 10th epoch, well ahead of the Sigmoid model and slightly below the ReLU model.
- F1 score, precision, and recall improve steadily over the epochs, much as they do with ReLU.
- Zero-centered outputs and a derivative that peaks at 1 (versus 0.25 for Sigmoid) make Tanh less prone to vanishing gradients than Sigmoid, although it still saturates for large inputs.
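
The vanishing-gradient argument for the three activations can be checked numerically. The following is a small sketch using only the Python standard library; the function names are chosen for this illustration, and the derivatives are the standard closed-form ones.

```python
import math


def d_sigmoid(x):
    # sigmoid'(x) = s * (1 - s), with s = sigmoid(x)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def d_tanh(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2


def d_relu(x):
    # ReLU'(x) is 1 for positive inputs and 0 otherwise
    return 1.0 if x > 0 else 0.0


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  sigmoid'={d_sigmoid(x):.4f}  "
          f"tanh'={d_tanh(x):.4f}  relu'={d_relu(x):.1f}")
```

At `x = 0` the Sigmoid derivative is at its maximum of 0.25 while the Tanh derivative is 1, and both shrink toward zero as `|x|` grows. ReLU keeps a derivative of 1 for every positive input, which is consistent with the ordering seen in the early epochs.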

<br>In summary, ReLU achieves the best final accuracy of the three configurations (91.44%), with Tanh close behind (91.17%) and Sigmoid clearly trailing (86.44%) after the same 10 epochs. The gap between ReLU and Tanh is small and comes from a single run, so it should not be over-read; the Sigmoid gap is large enough to reflect its slower training. The choice of activation function should depend on the specific problem and architecture, but ReLU is a sensible default for hidden layers.
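
The notebook does not fix a random seed, so weight initialization and batch order differ from run to run, and close results such as ReLU versus Tanh may shift slightly. One possible way to make the comparison fairer is to seed PyTorch before building each model so that every configuration starts from the same weights. The sketch below assumes that approach; `make_layer` is a hypothetical helper, not something the notebook defines.

```python
import torch
import torch.nn as nn


def make_layer(seed):
    # Seeding right before construction fixes the random initial weights
    torch.manual_seed(seed)
    return nn.Linear(28 * 28, 256)


layer_a = make_layer(0)
layer_b = make_layer(0)  # same seed, so same initial weights
layer_c = make_layer(1)  # different seed, so different initial weights

same_init = torch.equal(layer_a.weight, layer_b.weight)
different_init = not torch.equal(layer_a.weight, layer_c.weight)
print(f"same seed -> identical weights: {same_init}")
print(f"different seed -> different weights: {different_init}")
```

In the comparison loop, the same idea would amount to calling `torch.manual_seed` with a fixed value before each `SimpleModel(...)` is created, and ideally repeating the experiment over several seeds before ranking the activations.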