# Task
Research the MNIST dataset and write PyTorch and W&B code to train a multiclass classifier on it. Train a baseline model without regularization or normalization, then add normalization and regularization (L2, dropout) and compare the results.

## MNIST dataset research

### Subtask:
Research the MNIST dataset: its structure, contents, and common use in image classification tasks.


**Reasoning**:
The subtask requires researching the MNIST dataset. I will use a code block to document the key findings from the research about the dataset's structure, content, common usage, and characteristics.



In [1]:
# Research on MNIST dataset
print("MNIST Dataset Research:")
print("- Structure: Contains grayscale images of handwritten digits (0-9).")
print("- Image size: 28x28 pixels.")
print("- Number of classes: 10 (one for each digit).")
print("- Dataset size: 60,000 training examples, 10,000 test examples.")
print("- Content: Images of handwritten digits and corresponding labels.")
print("- Common usage: Benchmark for image classification tasks, especially for evaluating new models and techniques.")
print("- Common models: Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), traditional machine learning models.")
print("- Characteristics: Relatively simple dataset, low resolution images, variations in handwriting style.")

MNIST Dataset Research:
- Structure: Contains grayscale images of handwritten digits (0-9).
- Image size: 28x28 pixels.
- Number of classes: 10 (one for each digit).
- Dataset size: 60,000 training examples, 10,000 test examples.
- Content: Images of handwritten digits and corresponding labels.
- Common usage: Benchmark for image classification tasks, especially for evaluating new models and techniques.
- Common models: Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), traditional machine learning models.
- Characteristics: Relatively simple dataset, low resolution images, variations in handwriting style.


## Environment setup

### Subtask:
Install the required libraries (PyTorch, Weights & Biases).


**Reasoning**:
Install the required libraries using pip.



In [2]:
!pip install torch torchvision wandb



## Data loading and preprocessing

### Subtask:
Load the MNIST dataset with PyTorch and prepare the dataloaders for training and testing.


**Reasoning**:
Import necessary libraries and define the transformation to load and prepare the MNIST dataset.



In [3]:
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Define a transformation to convert images to tensors
transform = transforms.ToTensor()

# Load the MNIST training dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Load the MNIST test dataset
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create a DataLoader for the training dataset
batch_size = 64  # You can adjust the batch size
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Create a DataLoader for the test dataset
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print("MNIST datasets loaded and DataLoaders created.")
print(f"Number of training batches: {len(train_loader)}")
print(f"Number of test batches: {len(test_loader)}")

MNIST datasets loaded and DataLoaders created.
Number of training batches: 938
Number of test batches: 157
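
The batch counts reported above follow from the dataset sizes and the batch size, since `DataLoader` keeps the final partial batch by default (`drop_last=False`). A minimal sketch of that arithmetic using only the standard library; the helper name `num_batches` is hypothetical:

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a loader would yield for a dataset of this size."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


print(num_batches(60_000, 64))  # MNIST training set
print(num_batches(10_000, 64))  # MNIST test set
```

With a batch size of 64, the last training batch holds only 32 images (60,000 - 937 × 64).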





## Baseline model definition

### Subtask:
Define a simple (baseline) neural network for multiclass classification in PyTorch, without normalization or regularization.


**Reasoning**:
Define the baseline neural network model as instructed, including flattening and linear layers with ReLU activation.



In [4]:
import torch.nn as nn

class BaselineNN(nn.Module):
    def __init__(self):
        super(BaselineNN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 128)  # Input layer: 784 features, output 128 features
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 64) # Hidden layer: 128 features, output 64 features
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 10)   # Output layer: 64 features, output 10 features (for 10 classes)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

print("BaselineNN model defined.")

BaselineNN model defined.
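
To get a sense of the baseline's size, the number of trainable parameters can be worked out by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. A small sketch in plain Python; the helper `linear_params` is hypothetical and not part of PyTorch:

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# (in_features, out_features) for fc1, fc2, fc3 of the baseline network
layer_sizes = [(28 * 28, 128), (128, 64), (64, 10)]

per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)

print(per_layer, total)
```

With PyTorch available, `sum(p.numel() for p in BaselineNN().parameters())` should report the same total.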


## Weights & Biases (W&B) setup

### Subtask:
Set up W&B for experiment tracking.


**Reasoning**:
Import the wandb library and initialize a new W&B run for experiment tracking.



In [5]:
import wandb

# Initialize a new W&B run
wandb.init(project="mnist-classification")

print("W&B initialized for project 'mnist-classification'.")

wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: a01642653 (mansoor35) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


W&B initialized for project 'mnist-classification'.
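
Because the notebook goes on to compare two models, it may help to give every run a descriptive name, a shared group, and a recorded hyperparameter config. The sketch below only builds the keyword arguments: `project`, `name`, `group`, `job_type`, and `config` are existing `wandb.init` parameters, while the helper `make_run_kwargs`, the naming scheme, and the config keys are assumptions for illustration.

```python
def make_run_kwargs(model_name: str, job_type: str, **hyperparams) -> dict:
    """Build keyword arguments for wandb.init (hypothetical helper)."""
    return {
        "project": "mnist-classification",
        "name": f"{model_name}_{job_type}",  # run name shown in the W&B UI
        "group": model_name,                 # groups training and evaluation runs
        "job_type": job_type,                # e.g. "training" or "evaluation"
        "config": dict(hyperparams),         # hyperparameters stored with the run
    }


baseline_kwargs = make_run_kwargs(
    "baseline", "training", lr=1e-3, batch_size=64, epochs=5
)
print(baseline_kwargs)

# With wandb installed and logged in, the run would be started with:
# wandb.init(**baseline_kwargs)
```

Logging both models under the same metric keys (for example `train_loss` in both runs) would also let W&B overlay their curves on one chart.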


## Baseline model training

### Subtask:
Train the baseline model with PyTorch and log the metrics (loss, accuracy) to W&B.


**Reasoning**:
Define the loss function and optimizer, then implement the training loop to train the baseline model and log metrics to W&B.



In [6]:
import torch.optim as optim

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Instantiate the baseline model and move it to the device
baseline_model = BaselineNN().to(device)

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(baseline_model.parameters(), lr=0.001) # Using Adam optimizer with learning rate 0.001

# Define the number of epochs
num_epochs = 5 # You can adjust the number of epochs

print("Starting baseline model training...")

# Training loop
for epoch in range(num_epochs):
    baseline_model.train()  # Set the model to training mode
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = baseline_model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Calculate accuracy
        _, predicted = torch.max(outputs.data, 1)
        total_predictions += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

    # Calculate average loss and accuracy for the epoch
    epoch_loss = running_loss / len(train_loader)
    epoch_accuracy = correct_predictions / total_predictions

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}")

    # Log training loss and accuracy to W&B
    wandb.log({"epoch": epoch + 1, "train_loss": epoch_loss, "train_accuracy": epoch_accuracy})

print("Finished baseline model training.")

# Finalize the W&B run
wandb.finish()

Using device: cpu
Starting baseline model training...
Epoch [1/5], Loss: 0.3507, Accuracy: 0.9026
Epoch [2/5], Loss: 0.1435, Accuracy: 0.9577
Epoch [3/5], Loss: 0.0982, Accuracy: 0.9701
Epoch [4/5], Loss: 0.0729, Accuracy: 0.9774
Epoch [5/5], Loss: 0.0581, Accuracy: 0.9814
Finished baseline model training.


W&B run history:
epoch: ▁▃▅▆█
train_accuracy: ▁▆▇██
train_loss: █▃▂▁▁

W&B run summary:
epoch: 5.0
train_accuracy: 0.98142
train_loss: 0.05812
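
One detail of the training loop above: `epoch_loss` averages the per-batch mean losses, so the final, smaller batch (32 samples when 60,000 examples are split into batches of 64) counts as much as a full one. The difference is usually negligible, but a sample-weighted average is exact. A sketch with made-up loss values and hypothetical helper names:

```python
def mean_of_batch_means(batch_losses):
    """Average the per-batch mean losses, ignoring batch sizes."""
    return sum(loss for loss, _ in batch_losses) / len(batch_losses)


def sample_weighted_mean(batch_losses):
    """Average the loss over samples, weighting each batch by its size."""
    total = sum(loss * n for loss, n in batch_losses)
    count = sum(n for _, n in batch_losses)
    return total / count


# (mean loss, batch size) pairs; the values are illustrative only
batches = [(0.50, 64), (0.40, 64), (0.10, 32)]

print(mean_of_batch_means(batches))
print(sample_weighted_mean(batches))
```

In the training loop this would correspond to accumulating `loss.item() * labels.size(0)` and dividing by `total_predictions`.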


## Baseline model evaluation

### Subtask:
Evaluate the baseline model on the test set and log the results to W&B.


**Reasoning**:
Initialize a new W&B run for evaluating the baseline model and then evaluate the model on the test set, calculating the accuracy and logging it to W&B before finishing the run.



In [7]:
import torch
import wandb

# Initialize a new W&B run for evaluation
wandb.init(project="mnist-classification", name="baseline_evaluation")

# Set the model to evaluation mode
baseline_model.eval()

# Deactivate gradient calculation
with torch.no_grad():
    correct_predictions = 0
    total_predictions = 0
    # Iterate over the test loader
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = baseline_model(inputs)

        # Calculate predictions
        _, predicted = torch.max(outputs.data, 1)
        total_predictions += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

    # Calculate evaluation accuracy
    test_accuracy = correct_predictions / total_predictions

    # Print evaluation accuracy
    print(f"Test Accuracy: {test_accuracy:.4f}")

    # Log evaluation accuracy to W&B
    wandb.log({"test_accuracy": test_accuracy})

# Finalize the W&B run
wandb.finish()

Test Accuracy: 0.9749


W&B run history:
test_accuracy: ▁

W&B run summary:
test_accuracy: 0.9749


## Model definition with normalization and regularization

### Subtask:
Define a new neural network with Batch Normalization, L2 regularization, and Dropout.


**Reasoning**:
Define a new neural network class with Batch Normalization, L2 regularization (handled by the optimizer's weight_decay), and Dropout layers as specified in the instructions.



In [8]:
import torch.nn as nn

class RegularizedNN(nn.Module):
    def __init__(self):
        super(RegularizedNN, self).__init__()
        self.flatten = nn.Flatten()

        # First layer with Batch Normalization and Dropout
        self.fc1 = nn.Linear(28 * 28, 128)
        self.bn1 = nn.BatchNorm1d(128)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.5)  # Zero each activation with probability 0.5 during training

        # Second layer with Batch Normalization and Dropout
        self.fc2 = nn.Linear(128, 64)
        self.bn2 = nn.BatchNorm1d(64)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(0.5)  # Zero each activation with probability 0.5 during training

        # Output layer
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.dropout1(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.dropout2(x)

        x = self.fc3(x)
        return x

print("RegularizedNN model defined with Batch Normalization and Dropout.")

RegularizedNN model defined with Batch Normalization and Dropout.
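
`nn.Dropout` behaves differently in training and evaluation mode, which is why the calls to `train()` and `eval()` matter for this model. The sketch below mimics inverted dropout in plain Python: during training each value is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation the input passes through unchanged. The `dropout` function here is a simplified stand-in, not PyTorch's implementation, and it assumes `p < 1`.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout on a list of floats (simplified illustration)."""
    if not training or p == 0.0:
        # Evaluation mode: identity.
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)  # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0]

print(dropout(activations, p=0.5, training=True, rng=random.Random(0)))
print(dropout(activations, training=False))
```

With `p=0.5`, roughly half of the 128 and 64 hidden units are silenced on every training step, which is a fairly strong constraint for a network of this size.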


## Training the model with normalization and regularization

### Subtask:
Train the model with normalization and regularization, logging the metrics to W&B.


**Reasoning**:
Import the necessary libraries for training and W&B logging, initialize a new W&B run, set the device, instantiate the regularized model, define the loss function and optimizer with L2 regularization, and then start the training loop as per the instructions.



In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
import wandb

# Initialize a new W&B run for regularized model training
wandb.init(project="mnist-classification", name="regularized_model_training")

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Instantiate the regularized model and move it to the device
regularized_model = RegularizedNN().to(device)

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer with L2 regularization (weight_decay)
optimizer = optim.Adam(regularized_model.parameters(), lr=0.001, weight_decay=1e-5) # Added weight_decay for L2 regularization

# Define the number of epochs
num_epochs = 5 # You can adjust the number of epochs

print("Starting regularized model training...")

# Training loop
for epoch in range(num_epochs):
    regularized_model.train()  # Set the model to training mode
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = regularized_model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Calculate accuracy
        _, predicted = torch.max(outputs.data, 1)
        total_predictions += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

    # Calculate average loss and accuracy for the epoch
    epoch_loss = running_loss / len(train_loader)
    epoch_accuracy = correct_predictions / total_predictions

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}")

    # Log training loss and accuracy to W&B
    wandb.log({"epoch": epoch + 1, "regularized_train_loss": epoch_loss, "regularized_train_accuracy": epoch_accuracy})

print("Finished regularized model training.")

# Finalize the W&B run
wandb.finish()

Using device: cpu
Starting regularized model training...
Epoch [1/5], Loss: 0.5932, Accuracy: 0.8356
Epoch [2/5], Loss: 0.3546, Accuracy: 0.8973
Epoch [3/5], Loss: 0.3053, Accuracy: 0.9121
Epoch [4/5], Loss: 0.2736, Accuracy: 0.9210
Epoch [5/5], Loss: 0.2541, Accuracy: 0.9268
Finished regularized model training.


W&B run history:
epoch: ▁▃▅▆█
regularized_train_accuracy: ▁▆▇██
regularized_train_loss: █▃▂▁▁

W&B run summary:
epoch: 5.0
regularized_train_accuracy: 0.92677
regularized_train_loss: 0.25408
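
The `weight_decay=1e-5` argument is what supplies the L2 term in this run: in `torch.optim.Adam` it adds `weight_decay * w` to each parameter's gradient before the update, which is the gradient of the penalty `(weight_decay / 2) * w**2`. The sketch below shows that effect with a plain gradient-descent step rather than Adam's adaptive update, so it illustrates the penalty only; the function name and the numbers are illustrative.

```python
def sgd_step_with_l2(weights, grads, lr=0.001, weight_decay=1e-5):
    """One gradient-descent step with an L2 penalty folded into the gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [2.0, -3.0]
zero_grads = [0.0, 0.0]  # no data gradient, so only the penalty acts

# Exaggerated lr and weight_decay to make the shrinkage visible
print(sgd_step_with_l2(weights, zero_grads, lr=0.1, weight_decay=0.5))
```

Each weight shrinks toward zero by a factor of `1 - lr * weight_decay`. At the notebook's settings (`lr=0.001`, `weight_decay=1e-5`) that pull is very small compared with the effect of dropout at 0.5.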


## Evaluating the model with normalization and regularization

### Subtask:
Evaluate the model with normalization and regularization on the test set and log the results to W&B.


**Reasoning**:
Import necessary libraries and initialize a new W&B run for evaluation.



In [10]:
import torch
import wandb

# Initialize a new W&B run for evaluation of the regularized model
wandb.init(project="mnist-classification", name="regularized_evaluation")

print("W&B initialized for regularized model evaluation.")

W&B initialized for regularized model evaluation.


**Reasoning**:
Evaluate the regularized model on the test dataset and log the test accuracy to W&B.



In [11]:
# Set the model to evaluation mode
regularized_model.eval()

# Deactivate gradient calculation
with torch.no_grad():
    correct_predictions = 0
    total_predictions = 0
    # Iterate over the test loader
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = regularized_model(inputs)

        # Calculate predictions
        _, predicted = torch.max(outputs.data, 1)
        total_predictions += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

    # Calculate evaluation accuracy
    test_accuracy = correct_predictions / total_predictions

    # Print evaluation accuracy
    print(f"Regularized Model Test Accuracy: {test_accuracy:.4f}")

    # Log evaluation accuracy to W&B
    wandb.log({"regularized_test_accuracy": test_accuracy})

# Finalize the W&B run
wandb.finish()

Regularized Model Test Accuracy: 0.9658


W&B run history:
regularized_test_accuracy: ▁

W&B run summary:
regularized_test_accuracy: 0.9658
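
Calling `regularized_model.eval()` matters for this model: dropout is disabled and `BatchNorm1d` switches from per-batch statistics to the running estimates accumulated during training. It also means the training accuracy logged earlier (0.9268, measured with dropout active) is not directly comparable to the test accuracy (0.9658). A plain-Python sketch of the two BatchNorm modes for a single feature, omitting the learnable scale and shift; the function names and the running statistics are illustrative.

```python
import math


def batchnorm_train(xs, eps=1e-5):
    """Normalize with the statistics of the current batch (training mode)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)  # biased variance
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def batchnorm_eval(xs, running_mean, running_var, eps=1e-5):
    """Normalize with fixed running statistics (evaluation mode)."""
    return [(x - running_mean) / math.sqrt(running_var + eps) for x in xs]


batch = [1.0, 2.0, 3.0, 4.0]

print(batchnorm_train(batch))
print(batchnorm_eval(batch, running_mean=2.0, running_var=1.0))
```

In evaluation mode the output for one sample no longer depends on which other samples share its batch, so test predictions are deterministic.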


## Comparison of results

### Subtask:
Compare the results of both models (baseline and regularized/normalized) using the W&B interface.
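
As a complement to the W&B dashboard, the accuracies reported in the runs above can be put side by side. The sketch below converts the test-accuracy gap into a number of images and adds a rough binomial standard error for each accuracy. Treating test predictions as independent trials is a simplifying assumption, and a single training run per model does not capture seed-to-seed variation.

```python
import math

TEST_SIZE = 10_000  # MNIST test examples

# Final-epoch training accuracy and test accuracy from the runs above
results = {
    "baseline": {"train_acc": 0.9814, "test_acc": 0.9749},
    "regularized": {"train_acc": 0.9268, "test_acc": 0.9658},
}


def std_error(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy estimated from n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


for name, r in results.items():
    se = std_error(r["test_acc"], TEST_SIZE)
    print(f"{name}: test={r['test_acc']:.4f} +/- {se:.4f}, train={r['train_acc']:.4f}")

gap = results["baseline"]["test_acc"] - results["regularized"]["test_acc"]
gap_images = round(gap * TEST_SIZE)
print(f"Test accuracy gap: {gap:.4f} (about {gap_images} images)")
```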


## Analysis of results

### Subtask:
Analyze the impact of normalization and regularization on model performance.


**Reasoning**:
Based on the previous steps, the baseline model achieved a test accuracy of 0.9749, and the regularized model achieved a test accuracy of 0.9658. I will now compare these values and discuss the impact of normalization and regularization as per the instructions.



In [12]:
baseline_accuracy = 0.9749
regularized_accuracy = 0.9658

print(f"Baseline Model Test Accuracy: {baseline_accuracy:.4f}")
print(f"Regularized Model Test Accuracy: {regularized_accuracy:.4f}")

print("\nComparison and Impact of Normalization and Regularization:")
if regularized_accuracy > baseline_accuracy:
    print("The regularized model performed better on the test set compared to the baseline model.")
elif regularized_accuracy < baseline_accuracy:
    print("The regularized model performed slightly worse on the test set compared to the baseline model.")
else:
    print("The regularized model performed similarly to the baseline model on the test set.")

print("\nDiscussion of Potential Reasons:")
print("Normalization (Batch Normalization) is typically used to stabilize and accelerate the training process by normalizing the inputs to layers.")
print("Regularization (L2 and Dropout) are techniques intended to prevent overfitting by adding penalties to the model complexity (L2) or randomly dropping units during training (Dropout).")
print("In this specific case, the baseline model achieved slightly higher test accuracy. This could be due to several factors:")
print("- The baseline model, without regularization, might have overfit the training data, but on this specific test set, the overfitting did not significantly degrade performance, or the test set is very similar to the training data distribution.")
print("- The chosen regularization strength (L2 weight_decay) or dropout rate might be too high for this relatively simple dataset and model, causing the model to be under-capacity or hindering its ability to learn the underlying patterns effectively.")
print("- The number of training epochs might be too small for the regularized model to fully benefit from the regularization techniques, which often require more training to converge effectively compared to unregularized models.")
print("- The dataset (MNIST) is relatively simple, and a basic model might already achieve near-optimal performance without heavy regularization. In such cases, regularization might not provide a significant boost or could even slightly hinder performance if not carefully tuned.")
print("For more complex datasets or models, normalization and regularization are generally crucial for improving generalization and preventing overfitting.")

Baseline Model Test Accuracy: 0.9749
Regularized Model Test Accuracy: 0.9658

Comparison and Impact of Normalization and Regularization:
The regularized model performed slightly worse on the test set compared to the baseline model.

Discussion of Potential Reasons:
Normalization (Batch Normalization) is typically used to stabilize and accelerate the training process by normalizing the inputs to layers.
Regularization (L2 and Dropout) are techniques intended to prevent overfitting by adding penalties to the model complexity (L2) or randomly dropping units during training (Dropout).
In this specific case, the baseline model achieved slightly higher test accuracy. This could be due to several factors:
- The baseline model, without regularization, might have overfit the training data, but on this specific test set, the overfitting did not significantly degrade performance, or the test set is very similar to the training data distribution.
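- The chosen regularization strength (L2 weight_decay) or dropout rate might be too high for this relatively simple dataset and model, causing the model to be under-capacity or hindering its ability to learn the underlying patterns effectively.
- The number of training epochs might be too small for the regularized model to fully benefit from the regularization techniques, which often require more training to converge effectively compared to unregularized models.
- The dataset (MNIST) is relatively simple, and a basic model might already achieve near-optimal performance without heavy regularization. In such cases, regularization might not provide a significant boost or could even slightly hinder performance if not carefully tuned.
For more complex datasets or models, normalization and regularization are generally crucial for improving generalization and preventing overfitting.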

## Summary:

### Data Analysis Key Findings

*   The MNIST dataset consists of 28x28 grayscale images of handwritten digits (0-9), with 60,000 training and 10,000 test examples.
*   A baseline neural network without normalization or regularization achieved a test accuracy of 0.9749 on the MNIST dataset.
*   A regularized neural network (with Batch Normalization, L2, and Dropout) achieved a test accuracy of 0.9658 on the MNIST dataset.
*   In this specific experiment on the MNIST dataset, the regularized model performed slightly worse on the test set compared to the baseline model.

### Insights or Next Steps

*   For a relatively simple dataset like MNIST, a basic model without extensive regularization might be sufficient, and careful hyperparameter tuning is crucial when adding regularization to avoid negatively impacting performance.
*   Investigate the effect of different regularization strengths (L2 weight decay and dropout rates) and potentially a higher number of training epochs for the regularized model to see if performance can be improved.
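
The tuning suggested above could start from a small grid over dropout rate, weight decay, and epoch count. The sketch below only enumerates candidate configurations; the specific values are assumptions chosen for illustration, not settings tested in this notebook.

```python
from itertools import product

# Candidate values (illustrative assumptions)
dropout_rates = [0.1, 0.2, 0.5]
weight_decays = [0.0, 1e-5, 1e-4]
epoch_counts = [5, 15]

# One dictionary per combination of the three hyperparameters
grid = [
    {"dropout": d, "weight_decay": wd, "epochs": e}
    for d, wd, e in product(dropout_rates, weight_decays, epoch_counts)
]

print(len(grid))
print(grid[0])
```

Each entry could then parameterize one training run, for example by passing `dropout` to a modified `RegularizedNN` constructor (the class as defined above hard-codes 0.5) and `weight_decay` to the optimizer.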
