# MNIST classifier tutorial

* The MNIST database of handwritten digits is a popular dataset for demonstrating how to train a machine learning classifier.
* In this tutorial, we train a basic neural network (NN) classifier using PyTorch.
* Once the basic NN classifier training workflow has been defined, it is easily converted to a Covalent workflow.
* This tutorial has two primary objectives:
* The first is to show how easily a "normal" workflow can be adapted to a Covalent workflow.
* The second is to showcase the browser-based Covalent workflow tracking UI.

### Getting started
This tutorial requires installing Covalent, PyTorch and Torchvision.

In [1]:
# !pip install cova
# !conda install pytorch torchvision -c pytorch -y


Once Covalent has been installed, run `covalent start` to start the dispatcher and browser-based UI servers.

In [2]:
# !covalent start   # Start the server
# !covalent status  # Check server status
# !covalent stop    # Stop the server


Import Covalent, PyTorch and other relevant libraries.

In [3]:
import covalent as ct

import torch
import torch.nn.functional as F

from pathlib import Path

from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from typing import Tuple


### Construct MNIST classifier training workflow

Construct a convolutional neural network model by inheriting from `torch.nn.Module`.

In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        # dim=1 normalizes over the 10 class scores of each sample
        return F.log_softmax(x, dim=1)
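
The `320` used by `fc1` and `x.view(-1, 320)` follows from the layer shapes. The arithmetic can be sketched in plain Python, assuming 28x28 MNIST inputs, convolutions with stride 1 and no padding, and pooling whose stride equals its window. The helpers `conv_out` and `pool_out` are hypothetical names used only for this illustration.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output width of a convolution with stride 1 and no padding."""
    return size - kernel + 1


def pool_out(size: int, window: int) -> int:
    """Output width of max pooling whose stride equals its window."""
    return size // window


side = 28                              # assumed MNIST image width and height
side = pool_out(conv_out(side, 5), 2)  # conv1: 28 -> 24, pool: 24 -> 12
side = pool_out(conv_out(side, 5), 2)  # conv2: 12 -> 8, pool: 8 -> 4
flat_features = 20 * side * side       # conv2 produces 20 channels
print(flat_features)
```

If the kernel sizes or channel counts change, the input size of `fc1` has to change accordingly.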


Construct a data loader to retrieve the classifier training and test data.

In [5]:
@ct.electron
def data_loader(
    batch_size: int,
    train: bool,
    download: bool = True,
    shuffle: bool = True,
    data_dir: str = "~/data/mnist/",
) -> torch.utils.data.dataloader.DataLoader:
    """MNIST data loader."""

    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)

    data = datasets.MNIST(data_dir, train=train, download=download, transform=ToTensor())

    return DataLoader(data, batch_size=batch_size, shuffle=shuffle)
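
What the loader yields can be pictured with a small plain-Python sketch. It is only an illustration of shuffled mini-batching, not how `DataLoader` is implemented. `iter_batches` and its `seed` parameter are hypothetical, and the 60000 training samples are taken from the log output of this tutorial.

```python
import math
import random


def iter_batches(n_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The final slice may be shorter than batch_size.
        yield indices[start:start + batch_size]


batches = list(iter_batches(n_samples=60000, batch_size=64))
print(len(batches), len(batches[0]), len(batches[-1]))
```

With a batch size of 64, the last batch is smaller than the others, which is why code that counts samples should not assume every batch is full.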


Construct a function to retrieve a Stochastic Gradient Descent optimizer.

In [6]:
@ct.electron
def get_optimizer(
    model: NeuralNetwork, learning_rate: float, momentum: float
) -> torch.optim.Optimizer:
    """Get Stochastic Gradient Descent optimizer."""

    return torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
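
To make the roles of `learning_rate` and `momentum` concrete, here is a minimal sketch of a momentum update on a single scalar parameter. It assumes the common formulation with zero dampening and no weight decay. `sgd_momentum_step` is a hypothetical helper, not a PyTorch function.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum):
    """One update: the velocity accumulates gradients, the parameter moves against it."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


x, v = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * x  # derivative of f(x) = x**2
    x, v = sgd_momentum_step(x, grad, v, lr=0.01, momentum=0.5)

print(abs(x) < 0.01)
```

The values `lr=0.01` and `momentum=0.5` mirror the defaults used by the workflow in this tutorial.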


Write a function to train the model over one epoch.

In [7]:
@ct.electron
def train_over_one_epoch(
    dataloader: torch.utils.data.dataloader.DataLoader,
    model: NeuralNetwork,
    optimizer: torch.optim.Optimizer,
    log_interval: int,
    epoch: int,
    loss_fn: callable,
    train_losses: list,
    train_counter: list,
    device: str = "cpu",
) -> Tuple[NeuralNetwork, torch.optim.Optimizer]:
    """Train neural network model over one epoch."""

    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % log_interval == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

            train_losses.append(loss)
            # Samples seen so far across all epochs (batch size taken from the loader, not hard-coded)
            train_counter.append((batch * dataloader.batch_size) + ((epoch - 1) * size))

    return model, optimizer
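
The logging bookkeeping inside the loop can be checked without PyTorch. The sketch below assumes 938 batches of 64 samples per epoch and a `log_interval` of 10. `logged_positions` is a hypothetical helper that reproduces only the counter arithmetic.

```python
def logged_positions(n_batches, batch_size, log_interval, epoch, dataset_size):
    """Return (samples seen this epoch, samples seen overall) at each log point."""
    positions = []
    for batch in range(n_batches):
        if batch % log_interval == 0:
            current = batch * batch_size
            overall = current + (epoch - 1) * dataset_size
            positions.append((current, overall))
    return positions


first_epoch = logged_positions(938, 64, 10, epoch=1, dataset_size=60000)
second_epoch = logged_positions(938, 64, 10, epoch=2, dataset_size=60000)
print(first_epoch[:3])
print(second_epoch[0])
```

The per-epoch positions 0, 640, 1280 match the bracketed sample counts in the training log, while the overall counter keeps increasing across epochs.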


Write a function to test the performance of the classifier for a given loss function.

In [8]:
@ct.electron
def test(
    dataloader: torch.utils.data.dataloader.DataLoader,
    model: NeuralNetwork,
    loss_fn: callable,
    device: str = "cpu",
) -> None:
    """Test the model performance."""

    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
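
Note that the two reported metrics are normalized differently: the loss is averaged over batches, while accuracy is averaged over samples. A plain-Python sketch with made-up toy numbers follows. `evaluate` and `toy_batches` are hypothetical and exist only for illustration.

```python
def evaluate(batches):
    """batches: list of (mean_batch_loss, predicted_labels, true_labels)."""
    total_loss, correct, size = 0.0, 0, 0
    for batch_loss, predicted, actual in batches:
        total_loss += batch_loss
        correct += sum(p == a for p, a in zip(predicted, actual))
        size += len(actual)
    # Loss: mean over batches. Accuracy: mean over samples.
    return total_loss / len(batches), correct / size


toy_batches = [
    (0.50, [7, 2, 1, 0], [7, 2, 1, 0]),  # 4 of 4 correct
    (0.30, [4, 1, 4, 9], [4, 1, 4, 8]),  # 3 of 4 correct
]
avg_loss, accuracy = evaluate(toy_batches)
print(f"Accuracy: {100 * accuracy:.1f}%, Avg loss: {avg_loss:.6f}")
```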


Write a function to train the model over several epochs and save the final state of the _optimizer_ and the _neural network_.

In [9]:
@ct.electron
def train_model(
    train_dataloader: torch.utils.data.dataloader.DataLoader,
    train_losses: list,
    train_counter: list,
    log_interval: int,
    model: NeuralNetwork,
    optimizer: torch.optim.Optimizer,
    loss_fn: callable,
    epochs: int,
    results_dir: str = "~/data/mnist/results/",
) -> Tuple[NeuralNetwork, torch.optim.Optimizer]:
    """Train neural network model."""

    results_dir = Path(results_dir).expanduser()
    results_dir.mkdir(parents=True, exist_ok=True)

    for epoch in range(1, epochs + 1):
        print(f"Epoch {epoch}\n-------------------------------")
        model, optimizer = train_over_one_epoch(
            dataloader=train_dataloader,
            model=model,
            optimizer=optimizer,
            train_losses=train_losses,
            train_counter=train_counter,
            log_interval=log_interval,
            epoch=epoch,
            loss_fn=loss_fn,
        )

    # Save model and optimizer; join with "/" because str(Path) drops the trailing slash
    torch.save(model.state_dict(), results_dir / "model.pth")
    torch.save(optimizer.state_dict(), results_dir / "optimizer.pth")
    return model, optimizer
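
Because `results_dir` is converted to a `Path`, its string form has no trailing slash, so file names are best appended with the `/` operator rather than by string interpolation. A short sketch, using a relative directory chosen only for illustration:

```python
from pathlib import Path

results_dir = Path("data/mnist/results/")  # illustrative relative path

glued = f"{results_dir}model.pth"    # separator is lost between directory and file
joined = results_dir / "model.pth"   # file is placed inside the directory

print(glued)
print(joined)
```

The first form silently writes a file named `resultsmodel.pth` next to the results directory instead of `model.pth` inside it.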


Finally, we put all these tasks together to construct the MNIST classifier training and test workflow.

In [10]:
@ct.lattice
def workflow(
    model: NeuralNetwork,
    epochs: int = 2,
    batch_size_train: int = 64,
    batch_size_test: int = 1000,
    learning_rate: float = 0.01,
    momentum: float = 0.5,
    log_interval: int = 10,
    loss_fn: callable = F.nll_loss,
):
    """MNIST classifier training workflow"""

    train_dataloader = data_loader(batch_size=batch_size_train, train=True)
    test_dataloader = data_loader(batch_size=batch_size_test, train=False)

    train_losses, train_counter, test_losses = [], [], []
    optimizer = get_optimizer(model=model, learning_rate=learning_rate, momentum=momentum)
    model, optimizer = train_model(
        train_dataloader=train_dataloader,
        train_losses=train_losses,
        train_counter=train_counter,
        log_interval=log_interval,
        model=model,
        optimizer=optimizer,
        loss_fn=loss_fn,
        epochs=epochs,
    )
    test(dataloader=test_dataloader, model=model, loss_fn=loss_fn)
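
The default `loss_fn`, `F.nll_loss`, expects log-probabilities, which is why the model's `forward` ends with `log_softmax`. The pairing can be sketched in plain Python. The two functions below are simplified stand-ins written for this illustration, not the PyTorch implementations.

```python
import math


def log_softmax(logits):
    """Log-probabilities for one sample; subtracting the max keeps exp() stable."""
    shifted = [z - max(logits) for z in logits]
    log_total = math.log(sum(math.exp(z) for z in shifted))
    return [z - log_total for z in shifted]


def nll_loss(log_probs_batch, targets):
    """Mean negative log-probability assigned to the correct class."""
    picked = [row[t] for row, t in zip(log_probs_batch, targets)]
    return -sum(picked) / len(picked)


# A model with no preference between 10 classes outputs equal scores.
uniform = [log_softmax([0.0] * 10) for _ in range(4)]
loss = nll_loss(uniform, [3, 1, 4, 1])
print(loss, math.log(10))
```

A classifier that guesses uniformly over 10 digits scores a loss of ln 10, roughly 2.30, which is the scale of the first losses logged in this tutorial's training run.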


### Run MNIST classifier workflow as a normal function

Run the MNIST classifier workflow to benchmark the performance and the time taken to train and test the model.

In [11]:
import time

start = time.time()
workflow(
    model=NeuralNetwork().to("cpu"),
)
end = time.time()
print(f"Regular workflow takes {end - start} seconds.")


Epoch 1
-------------------------------
loss: 2.314262  [    0/60000]
loss: 2.309220  [  640/60000]


loss: 2.300434  [ 1280/60000]
loss: 2.307727  [ 1920/60000]
loss: 2.285933  [ 2560/60000]
loss: 2.297280  [ 3200/60000]
loss: 2.320525  [ 3840/60000]
loss: 2.285317  [ 4480/60000]
loss: 2.291002  [ 5120/60000]
loss: 2.279628  [ 5760/60000]
loss: 2.289176  [ 6400/60000]
loss: 2.289491  [ 7040/60000]
loss: 2.291130  [ 7680/60000]
loss: 2.285216  [ 8320/60000]
loss: 2.287600  [ 8960/60000]
loss: 2.280427  [ 9600/60000]
loss: 2.255551  [10240/60000]
loss: 2.248369  [10880/60000]
loss: 2.271496  [11520/60000]
loss: 2.265409  [12160/60000]
loss: 2.270688  [12800/60000]
loss: 2.174972  [13440/60000]
loss: 2.220165  [14080/60000]
loss: 2.192686  [14720/60000]
loss: 2.218361  [15360/60000]
loss: 2.184808  [16000/60000]
loss: 2.138504  [16640/60000]
loss: 2.182030  [17280/60000]
loss: 2.179194  [17920/60000]
loss: 2.095106  [18560/60000]
loss: 2.125116  [19200/60000]
loss: 1.867359  [19840/60000]
loss: 1.864085  [20480/60000]
loss: 1.872849  [21120/60000]
loss: 1.941268  [21760/60000]
loss: 1.80

<div align="center">
<img src="././mnist_images/regular_workflow_loss.png"  width="95%" height="95%"/>
</div>

### Run workflow with Covalent

The Covalent dispatcher (`ct.dispatch`) can be used to dispatch the workflow.

In [18]:
dispatch_id = ct.dispatch(workflow)(
    model=NeuralNetwork().to("cpu"),
)
print(f"Dispatch id: {dispatch_id}")
result = ct.get_result(dispatch_id=dispatch_id, wait=True)
print(f"Covalent workflow takes {(result.end_time - result.start_time).total_seconds()} seconds.")
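
Subtracting two `datetime` values yields a `timedelta`, and `total_seconds()` converts it to a plain number. The sketch below assumes `result.start_time` and `result.end_time` are timezone-aware `datetime` objects, as the printed result of this run suggests, and reuses the two timestamps from that printout.

```python
from datetime import datetime

# Timestamps copied from the printed result of this tutorial's run.
start_time = datetime.fromisoformat("2022-02-25 23:15:38.814316+00:00")
end_time = datetime.fromisoformat("2022-02-25 23:16:20.271711+00:00")

elapsed = end_time - start_time   # a datetime.timedelta
print(elapsed)                    # clock-style formatting
print(elapsed.total_seconds())    # plain float, in seconds
```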


In [15]:
print(result)


Lattice Result
status: COMPLETED
result: None
inputs: {'args': [NeuralNetwork(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)], 'kwargs': {}}
error: None

start_time: 2022-02-25 23:15:38.814316+00:00
end_time: 2022-02-25 23:16:20.271711+00:00

results_dir: /Users/faiyaz/Code/covalent/doc/source/tutorials/machine_learning/results
dispatch_id: bd610c55-f4fc-4aa9-964f-092c23c33ab7

Node Outputs
------------
data_loader(0): <torch.utils.data.dataloader.DataLoader object at 0x7fad2282e790>
:parameter:64(1): 64
:parameter:True(2): True
data_loader(3): <torch.utils.data.dataloader.DataLoader object at 0x7fad316424c0>
:parameter:1000(4): 1000
:parameter:False(5): False
get_optimizer(6): SGD (
Parameter Group 0
    dampening: 0
    lr: 0.01
    momentum

Once the workflow has been dispatched, its progress and results can be tracked in the Covalent browser UI. Use `covalent status` (shown below) to find the UI address.

In [14]:
# !covalent status


Opening `http://0.0.0.0:47007` in a browser shows the Covalent UI, which lists the dispatch ids.

<div align="center">
<img src="././mnist_images/ui_dispatch_ids.png"  width="95%" height="95%"/>
</div>

Clicking on a dispatch id shows the details of that workflow execution. Note the task execution dependency graph.

<div align="center">
<img src="././mnist_images/ui_workflow.png"  width="95%" height="95%"/>
</div>

## Covalent concepts

* Covalent allows the user to dispatch multiple workflows without having to wait for them to finish running: `ct.dispatch` returns a dispatch id immediately, and the results are retrieved later with `ct.get_result`.

* A Covalent workflow can be _dispatched_ to take advantage of automatic parallelization, a user-friendly interface, and so on, but it can just as easily be run as a normal Python function. Hence, adding the electron and lattice decorators only enhances what can be done with the workflow, with minimal overhead.

* Execution times for Covalent workflows and subtasks are readily available without any additional code.
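
The dispatch-then-collect pattern described in the first point can be pictured with the standard library alone. The sketch below is only an analogy built on `concurrent.futures`. It does not use Covalent and says nothing about how Covalent is implemented. `toy_workflow` is a hypothetical stand-in for a long-running workflow.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def toy_workflow(epochs: int) -> str:
    """Stand-in for a long-running workflow."""
    time.sleep(0.05 * epochs)
    return f"trained for {epochs} epochs"


with ThreadPoolExecutor(max_workers=3) as pool:
    # Submitting returns immediately, much like receiving a dispatch id.
    handles = [pool.submit(toy_workflow, epochs) for epochs in (1, 2, 3)]
    # Blocking happens only here, when the results are requested.
    results = [handle.result() for handle in handles]

print(results)
```

All three toy workflows run concurrently, and the caller waits only at the point where it asks for the results.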

References:
- https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html