## Training and Tracking an Artificial Neural Network on the MNIST Dataset Using PyTorch and MLflow

### Code Summary

This Python script builds, trains, and evaluates an Artificial Neural Network (ANN) on the MNIST dataset using PyTorch, and logs the experiment's parameters, metrics, and final model with MLflow.

Key steps:

- Hyperparameters: Sets the learning rate, batch size, and number of epochs.
- Data Preprocessing: Loads the MNIST dataset and applies normalization using torchvision.transforms.
- Model Definition: Defines a simple feedforward ANN with three fully connected layers:
  - Input Layer: 28x28 pixels (784) flattened into a vector.
  - Two hidden layers with 128 and 64 units, using ReLU activation.
  - Output Layer: 10 units for classifying digits (0-9).
- Training: The train() function iterates over the training data, computes the cross-entropy loss, and updates the model's weights using the Adam optimizer. Accuracy and loss are printed after each epoch.
- Evaluation: The test() function computes the accuracy and loss on the test dataset.
- MLflow Integration: Parameters (learning rate, batch size, epochs) are logged once per run, and metrics (loss, accuracy) are logged for each epoch of training and testing. The final trained model is saved using mlflow.pytorch.log_model().

After training, the model and results are tracked in MLflow for reproducibility and analysis. This workflow is useful for experimenting with hyperparameters, tracking performance, and storing models alongside the runs that produced them.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

import mlflow
import mlflow.pytorch

In [7]:
learning_rate = 0.001
batch_size = 64
epochs = 10


transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
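The effect of `transforms.Normalize((0.1307,), (0.3081,))` can be checked by hand. The sketch below is plain Python, not part of the original notebook, and assumes that `ToTensor()` has already scaled pixels to the range [0, 1]. The helper name `normalize_pixel` is hypothetical.

```python
# Mean and standard deviation passed to transforms.Normalize in the cell above.
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize_pixel(value: float) -> float:
    """Apply (value - mean) / std to a pixel already scaled to [0, 1]."""
    return (value - MNIST_MEAN) / MNIST_STD


black = normalize_pixel(0.0)  # darkest possible pixel
white = normalize_pixel(1.0)  # brightest possible pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Under these assumptions a black pixel maps to roughly -0.42 and a white pixel to roughly 2.82, so the network sees inputs centered near zero rather than values confined to [0, 1].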


In [8]:
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1006)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:00<00:00, 11.9MB/s]


Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1006)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 81.5kB/s]


Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1006)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:07<00:00, 228kB/s]


Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1006)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<?, ?B/s]

Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw






In [16]:
train_dataset

Dataset MNIST
    Number of datapoints: 60000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
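With 60,000 training images and a batch size of 64, the number of batches per epoch and the size of the final batch follow from simple arithmetic. The sketch below is an illustration in plain Python, assuming the `DataLoader` default of `drop_last=False`; the 10,000 test images figure is the standard MNIST test split size, and the helper names are hypothetical.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches when the final partial batch is kept (drop_last=False)."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch; equals batch_size when the split is exact."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


train_batches = batches_per_epoch(60_000, 64)
test_batches = batches_per_epoch(10_000, 64)
train_tail = last_batch_size(60_000, 64)
test_tail = last_batch_size(10_000, 64)
print(train_batches, train_tail, test_batches, test_tail)
```

These counts correspond to `len(train_loader)` and `len(test_loader)`, which the training and evaluation functions later use as divisors when averaging the loss.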

In [9]:
class ANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
    
    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten the input
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
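To get a feel for the size of this network, the number of trainable parameters can be counted by hand. The sketch below is plain Python rather than PyTorch, and assumes each `nn.Linear` layer keeps its default bias term; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights (in * out) plus one bias per output unit."""
    return in_features * out_features + out_features


# (in_features, out_features) for fc1, fc2, fc3 as defined in the ANN class.
layer_sizes = [(28 * 28, 128), (128, 64), (64, 10)]

per_layer = [linear_params(i, o) for i, o in layer_sizes]
total_params = sum(per_layer)
print(per_layer, total_params)
```

In PyTorch the same total should be obtainable with `sum(p.numel() for p in model.parameters())`. Most of the parameters sit in the first layer, which maps the 784 input pixels to 128 hidden units.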

In [10]:
def train(model, device, train_loader, optimizer, criterion, epoch):
    model.train()
    total_loss = 0
    correct = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        predicted = output.argmax(dim=1)  # index of the highest logit per sample
        correct += (predicted == target).sum().item()
    
    avg_loss = total_loss / len(train_loader)
    accuracy = 100. * correct / len(train_loader.dataset)
    print(f'Train Epoch: {epoch} Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%')
    return avg_loss, accuracy

In [11]:
def test(model, device, test_loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            total_loss += criterion(output, target).item()
            predicted = output.argmax(dim=1)  # index of the highest logit per sample
            correct += (predicted == target).sum().item()
    
    avg_loss = total_loss / len(test_loader)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'Test Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%')
    return avg_loss, accuracy
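Both `train()` and `test()` divide the summed batch losses by the number of batches. Because `nn.CrossEntropyLoss` returns a per-batch mean by default, this is a mean of batch means, which gives the smaller final batch slightly more weight per sample. The sketch below, in plain Python with made-up loss values chosen only for illustration, contrasts that with a sample-weighted mean; the function names are hypothetical.

```python
def mean_of_batch_means(batch_losses):
    """Average used in the notebook: sum of per-batch means / number of batches."""
    return sum(batch_losses) / len(batch_losses)


def sample_weighted_mean(batch_losses, batch_sizes):
    """Average over individual samples: weight each batch mean by its size."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Illustrative values only: one full batch of 64 and one partial batch of 16.
losses = [0.10, 0.50]
sizes = [64, 16]

unweighted = mean_of_batch_means(losses)
weighted = sample_weighted_mean(losses, sizes)
print(f"unweighted={unweighted:.2f}, weighted={weighted:.2f}")
```

On MNIST with a batch size of 64 only the last batch is partial, so the difference is small in practice. If an exact per-sample average is wanted, one option is to accumulate `loss.item() * data.size(0)` in the loop and divide by `len(loader.dataset)`.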

In [12]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = ANN().to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
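`nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, which is why `forward()` returns the output of `fc3` without a softmax. The sketch below reproduces the per-sample calculation in plain Python as an illustration; it is not PyTorch's implementation, and `cross_entropy` is a hypothetical helper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits.

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained model that assigns equal scores to all 10 digits.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A model that is confident and correct about class 0.
confident_loss = cross_entropy([10.0] + [0.0] * 9, target=0)

print(f"uniform={uniform_loss:.4f}, confident={confident_loss:.6f}")
```

A uniform prediction over 10 classes gives a loss of ln(10), about 2.30, which is a useful reference point: an average loss well below that value indicates the model has learned something beyond guessing.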

In [13]:
with mlflow.start_run():
    mlflow.log_param("learning_rate", learning_rate)
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("epochs", epochs)

    for epoch in range(1, epochs + 1):
        train_loss, train_acc = train(model, device, train_loader, optimizer, criterion, epoch)
        test_loss, test_acc = test(model, device, test_loader, criterion)
        
        # Log metrics to MLflow
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("test_loss", test_loss, step=epoch)
        mlflow.log_metric("test_accuracy", test_acc, step=epoch)


    mlflow.pytorch.log_model(model, "model")

print("Training and tracking with MLflow completed.")

Train Epoch: 1 Loss: 0.2738, Accuracy: 91.82%
Test Loss: 0.1309, Accuracy: 95.94%
Train Epoch: 2 Loss: 0.1151, Accuracy: 96.42%
Test Loss: 0.0979, Accuracy: 96.88%
Train Epoch: 3 Loss: 0.0802, Accuracy: 97.47%
Test Loss: 0.0929, Accuracy: 96.98%
Train Epoch: 4 Loss: 0.0618, Accuracy: 98.01%
Test Loss: 0.0846, Accuracy: 97.37%
Train Epoch: 5 Loss: 0.0508, Accuracy: 98.34%
Test Loss: 0.0894, Accuracy: 97.34%
Train Epoch: 6 Loss: 0.0395, Accuracy: 98.71%
Test Loss: 0.0780, Accuracy: 97.69%
Train Epoch: 7 Loss: 0.0355, Accuracy: 98.81%
Test Loss: 0.0875, Accuracy: 97.52%
Train Epoch: 8 Loss: 0.0287, Accuracy: 98.99%
Test Loss: 0.0848, Accuracy: 97.74%
Train Epoch: 9 Loss: 0.0242, Accuracy: 99.17%
Test Loss: 0.0984, Accuracy: 97.43%
Train Epoch: 10 Loss: 0.0264, Accuracy: 99.12%
Test Loss: 0.0872, Accuracy: 97.83%
Training and tracking with MLflow completed.
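The per-epoch numbers printed above can be summarized with a few lines of plain Python. The sketch below copies the test loss, test accuracy, and training accuracy from the output and finds the epochs with the lowest test loss and highest test accuracy; the variable names are chosen for illustration.

```python
# Values copied from the training output above (epochs 1 to 10).
test_loss = [0.1309, 0.0979, 0.0929, 0.0846, 0.0894,
             0.0780, 0.0875, 0.0848, 0.0984, 0.0872]
test_acc = [95.94, 96.88, 96.98, 97.37, 97.34,
            97.69, 97.52, 97.74, 97.43, 97.83]
train_acc = [91.82, 96.42, 97.47, 98.01, 98.34,
             98.71, 98.81, 98.99, 99.17, 99.12]

# Epochs are 1-indexed in the notebook, so add 1 to the list index.
best_loss_epoch = min(range(len(test_loss)), key=test_loss.__getitem__) + 1
best_acc_epoch = max(range(len(test_acc)), key=test_acc.__getitem__) + 1

# Gap between training and test accuracy at the final epoch.
final_gap = train_acc[-1] - test_acc[-1]

print(best_loss_epoch, best_acc_epoch, round(final_gap, 2))
```

The lowest test loss occurs at epoch 6 while the highest test accuracy occurs at epoch 10, and the final train/test accuracy gap is a little over one percentage point. The same comparison can be made visually in the MLflow UI, since all four metrics were logged with `step=epoch`.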


The Artificial Neural Network (ANN), trained on the MNIST dataset using PyTorch and tracked with MLflow, reached a test loss of 0.0872 and a test accuracy of 97.83% after 10 epochs. Training accuracy reached 99.12% at the same point, and the lowest test loss (0.0780) occurred at epoch 6, which suggests mild overfitting in the later epochs. MLflow recorded the hyperparameters, per-epoch metrics, and the final model, so later runs with different settings can be compared against this baseline.
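To reuse the tracked model in a later session, it can be loaded back from the run that logged it. The sketch below is one possible approach, assuming the model was logged under the artifact path "model" as in the cell above and that the run ID is known (for example from the MLflow UI, or from `run.info.run_id` when using `with mlflow.start_run() as run:`). The helper names `build_model_uri` and `load_logged_model` are hypothetical, and the exact URI scheme may differ between MLflow versions.

```python
def build_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a 'runs:/<run_id>/<artifact_path>' URI for a logged model."""
    return f"runs:/{run_id}/{artifact_path}"


def load_logged_model(run_id: str, artifact_path: str = "model"):
    """Load a PyTorch model logged with mlflow.pytorch.log_model.

    mlflow is imported lazily so the URI helper can be used without it installed.
    """
    import mlflow.pytorch

    model = mlflow.pytorch.load_model(build_model_uri(run_id, artifact_path))
    model.eval()  # switch to inference mode before making predictions
    return model


# Placeholder run ID for illustration; replace it with a real one from your tracking store.
example_uri = build_model_uri("YOUR_RUN_ID")
print(example_uri)
```

Running `mlflow ui` from the directory that contains the `mlruns` folder starts a local web interface where the logged parameters, metric curves, and run IDs can be browsed.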