# PyTorch quickstart - Training

## Welcome to PrimeHub!

In this quickstart, we will perform the following actions to train a model:

1. Train a neural network that classifies images.
1. Move the trained model to <a target="_blank" href="https://docs.primehub.io/docs/quickstart/nb-data-store#phfs-storage">PHFS Storage</a>.

### Prerequisites
1. Enable <a target="_blank" href="https://docs.primehub.io/docs/quickstart/nb-data-store#phfs-storage">PHFS Storage</a>.

**Contact your admin if any prerequisite is not enabled yet.**

## 1. Train a neural network that classifies images

First, import the required libraries.

In [1]:
import os
from datetime import datetime
import torch
from torch.nn import functional as F
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

Load and prepare the MNIST dataset. Convert the samples to tensors and normalize them with the dataset's mean (0.1307) and standard deviation (0.3081).

In [2]:
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz
!rm MNIST.tar.gz

transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))])

mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
mnist_train = DataLoader(mnist_train, batch_size=64, shuffle=True)
mnist_test = MNIST(os.getcwd(), train=False, download=True, transform=transform)
mnist_test = DataLoader(mnist_test, batch_size=64)

--2021-10-29 14:45:35--  http://www.di.ens.fr/~lelarge/MNIST.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/~lelarge/MNIST.tar.gz [following]
--2021-10-29 14:45:36--  https://www.di.ens.fr/~lelarge/MNIST.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘MNIST.tar.gz’

MNIST.tar.gz            [        <=>         ]  33.20M   804KB/s    in 25s     

2021-10-29 14:46:02 (1.33 MB/s) - ‘MNIST.tar.gz’ saved [34813078]

MNIST/
MNIST/raw/
MNIST/raw/train-labels-idx1-ubyte
MNIST/raw/t10k-labels-idx1-ubyte.gz
MNIST/raw/t10k-labels-idx1-ubyte
MNIST/raw/t10k-images-idx3-ubyte.gz
MNIST/raw/train-images-idx3-ubyte
MNIST/raw/train-labels-idx1-ubyte.gz
MNIST/raw/t10k-images-idx3-ubyte
MNIST/raw/tra
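
The arithmetic behind the `transform` pipeline can be sketched in plain Python. This is an illustration only, not torchvision's implementation: `to_unit_range` and `normalize` are hypothetical helper names that mimic what `ToTensor()` and `Normalize((0.1307,), (0.3081,))` do to a single 8-bit pixel.

```python
# Sketch of the per-pixel arithmetic behind ToTensor() followed by Normalize().
# Pure Python for illustration; torchvision applies the same formula to whole tensors.

MNIST_MEAN = 0.1307  # mean passed to Normalize in the notebook
MNIST_STD = 0.3081   # standard deviation passed to Normalize in the notebook


def to_unit_range(pixel: int) -> float:
    """Mimic ToTensor: map an 8-bit pixel value (0..255) to a float in [0, 1]."""
    return pixel / 255.0


def normalize(value: float, mean: float = MNIST_MEAN, std: float = MNIST_STD) -> float:
    """Mimic Normalize: subtract the mean, then divide by the standard deviation."""
    return (value - mean) / std


black = normalize(to_unit_range(0))    # darkest possible pixel
white = normalize(to_unit_range(255))  # brightest possible pixel
print(f"black pixel -> {black:.4f}")
print(f"white pixel -> {white:.4f}")
```

After normalization the inputs are roughly centered on zero, which tends to make gradient-based training better behaved.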

Use the GPU if one is available; otherwise fall back to the CPU.

In [3]:
if torch.cuda.is_available():
    device = torch.cuda.current_device()
    print(torch.cuda.device(device))
    print('Device Count:', torch.cuda.device_count())
    print('Device Name: {}'.format(torch.cuda.get_device_name(device)))
else:
    device = 'cpu'

Build the model class.

In [4]:
class PyTorchModel(nn.Module):
    def __init__(self):
        super().__init__()

        # MNIST images are (1, 28, 28) (channels, height, width)
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)

    def forward(self, x):
        batch_size, channels, height, width = x.size()

        # (b, 1, 28, 28) -> (b, 1*28*28)
        x = x.view(batch_size, -1)
        x = self.layer_1(x)
        x = F.relu(x)
        x = self.layer_2(x)
        x = F.relu(x)
        x = self.layer_3(x)

        x = F.softmax(x, dim=1)
        return x
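
To get a feel for the size of this network, the number of trainable parameters can be worked out from the layer shapes alone. The sketch below is plain Python and assumes only the layer sizes from `PyTorchModel` (784, 128, 256, 10); the helper names are illustrative.

```python
from typing import Sequence

# Layer sizes taken from PyTorchModel: 784 -> 128 -> 256 -> 10.
LAYER_SIZES = [28 * 28, 128, 256, 10]


def linear_param_count(in_features: int, out_features: int) -> int:
    """Parameters of one fully connected layer: a weight matrix plus a bias vector."""
    return in_features * out_features + out_features


def total_param_count(sizes: Sequence[int]) -> int:
    """Sum the parameters of consecutive fully connected layers."""
    return sum(linear_param_count(a, b) for a, b in zip(sizes, sizes[1:]))


per_layer = [linear_param_count(a, b) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
total = total_param_count(LAYER_SIZES)
print("parameters per layer:", per_layer)
print("total parameters:", total)
```

Inside the notebook, the same total should be obtainable from the model itself with `sum(p.numel() for p in net.parameters())` once `net` has been created.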

Create the model instance.

In [5]:
net = PyTorchModel().to(device)

Choose a loss function and an optimizer.

In [6]:
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

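Train the model for two epochs, printing the average loss every 200 mini-batches. Because the model returns softmax probabilities while `nn.NLLLoss` expects log-probabilities, the printed value is the negative mean probability assigned to the correct class, so it is negative and moves toward -1 as the model improves.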

In [7]:
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(mnist_train, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs.to(device))
        loss = criterion(outputs, labels.to(device))
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 200 == 199:
            print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0

[1,   200] loss: -0.773
[1,   400] loss: -0.900
[1,   600] loss: -0.915
[1,   800] loss: -0.931
[2,   200] loss: -0.942
[2,   400] loss: -0.945
[2,   600] loss: -0.946
[2,   800] loss: -0.949
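
The relationship between these loss values and the model's output can be checked on a single hand-made sample. The following is a minimal sketch in plain Python, not the notebook's code: the logits and target are made-up values, and `softmax` and `nll` are illustrative helpers that mimic `F.softmax` and the per-sample computation of `nn.NLLLoss`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(scores, target):
    """Per-sample negative log-likelihood: minus the score at the target index."""
    return -scores[target]


logits = [2.0, 0.5, -1.0]  # made-up scores for a 3-class example
target = 0                 # made-up correct class

probs = softmax(logits)
log_probs = [math.log(p) for p in probs]

loss_on_probs = nll(probs, target)          # input is probabilities, as in this notebook
loss_on_log_probs = nll(log_probs, target)  # input is log-probabilities (cross-entropy)

print(f"NLL on probabilities:     {loss_on_probs:.3f}")
print(f"NLL on log-probabilities: {loss_on_log_probs:.3f}")
```

The first value always lies between -1 and 0, while the second is positive and shrinks toward 0 as the correct class gains probability. In PyTorch, the conventional pairings are `F.log_softmax` with `nn.NLLLoss`, or raw logits with `nn.CrossEntropyLoss`; either reports the second kind of value.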


Use the test data to check the model performance.

In [8]:
correct = 0
total = 0
with torch.no_grad():
    for data in mnist_test:
        images, labels = data
        outputs = net(images.to(device))
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels.to(device)).sum().item()

print("Accuracy of the network on the %d test images: %d %%" % (total, 100 * correct / total))

Accuracy of the network on the 10000 test images: 94 %
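
The evaluation loop picks, for every image, the class with the highest score and counts how often it matches the label. A minimal sketch of that logic in plain Python, using made-up scores and the hypothetical helpers `argmax` and `accuracy`, might look like this:

```python
def argmax(row):
    """Index of the largest value in a list (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(1 for row, label in zip(outputs, labels) if argmax(row) == label)
    return correct / len(labels)


# Made-up class scores for four samples and three classes.
outputs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]  # the last sample is misclassified on purpose

score = accuracy(outputs, labels)
print(f"accuracy: {100 * score:.1f} %")
```

In the notebook, `torch.max(outputs.data, 1)` plays the role of `argmax` for a whole batch, returning both the maximum values and their indices. Note that the `%d` format in the notebook's `print` truncates the percentage, so `94 %` can stand for any value from 94.0 up to just under 95.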


Save the trained model.

In [9]:
now = datetime.now()
date_time = now.strftime("%Y%m%d-%H%M%S")
SAVED_NAME = f"pytorch-mnist-{date_time}"
os.makedirs(SAVED_NAME, exist_ok=True)
torch.save(net.state_dict(), os.path.join(SAVED_NAME, "model.pt"))
print(f"We successfully saved the model in {SAVED_NAME}.")

We successfully saved the model in pytorch-mnist-20211029-144829.


Save the model class file. The class must be named `PyTorchModel`, and the file must contain the class definition together with the imports it uses.

In [10]:
model_class_file_content = """
import torch
from torch.nn import functional as F
from torch import nn

class PyTorchModel(nn.Module):
    def __init__(self):
        super().__init__()

        # MNIST images are (1, 28, 28) (channels, height, width)
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)

    def forward(self, x):
        batch_size, channels, height, width = x.size()

        # (b, 1, 28, 28) -> (b, 1*28*28)
        x = x.view(batch_size, -1)
        x = self.layer_1(x)
        x = F.relu(x)
        x = self.layer_2(x)
        x = F.relu(x)
        x = self.layer_3(x)

        x = F.softmax(x, dim=1)
        return x
"""
with open(os.path.join(SAVED_NAME, "ModelClass.py"), "w") as model_class_file:
    model_class_file.write(model_class_file_content)
print(f"We successfully saved the model class file in {SAVED_NAME}.")

We successfully saved the model class file in pytorch-mnist-20211029-144829.
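
This quickstart does not show how `ModelClass.py` is consumed later. One plausible reason for the fixed class name, stated here purely as an assumption, is that a consumer imports the file by path and looks the class up by name. The sketch below illustrates that idea with the standard library; `load_class` is a hypothetical helper, and a stand-in class file is used so that the example runs without `torch`.

```python
import importlib.util
import os
import tempfile


def load_class(file_path, class_name):
    """Import the Python file at file_path and return the named class from it."""
    spec = importlib.util.spec_from_file_location("model_class_module", file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


# Stand-in for ModelClass.py (assumption: the real file defines PyTorchModel).
stand_in_source = "class PyTorchModel:\n    layers = (784, 128, 256, 10)\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ModelClass.py")
    with open(path, "w") as handle:
        handle.write(stand_in_source)
    model_cls = load_class(path, "PyTorchModel")

print(model_cls.__name__, model_cls.layers)
```

With the real class file, a loader of this kind could instantiate the class and then restore the weights with `load_state_dict(torch.load(...))` from `model.pt`, since the notebook saves only the `state_dict` and not the class itself.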


## 2. Move the trained model to <a target="_blank" href="https://docs.primehub.io/docs/quickstart/nb-data-store#phfs-storage">PHFS Storage</a>

To deploy the model, we first need to move it to PHFS storage.

In [11]:
SAVED_DIR = "~/phfs/example-models/pytorch"

In [12]:
! mkdir -p $SAVED_DIR
! mv $SAVED_NAME $SAVED_DIR

Verify that the model is now in PHFS storage.

In [13]:
! ls -lt $SAVED_DIR

total 0
drwxr-xr-x 1 root root 0 Oct 29 14:48 pytorch-mnist-20211029-144829
drwxr-xr-x 1 root root 0 Oct 29 14:45 pytorch-mnist-20211029-144522
drwxr-xr-x 1 root root 0 Oct 29 14:40 pytorch-mnist-20211029-144006
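
The shell commands above rely on the shell to expand `~`. If the move is done from Python instead, the path has to be expanded explicitly. The following is a minimal sketch under that assumption; `move_model` is a hypothetical helper, and the demonstration uses a temporary directory rather than the real `~/phfs` mount.

```python
import os
import shutil
import tempfile


def move_model(saved_name, saved_dir):
    """Move the directory saved_name into saved_dir, creating saved_dir if needed."""
    target_dir = os.path.expanduser(saved_dir)  # Python does not expand "~" on its own
    os.makedirs(target_dir, exist_ok=True)
    return shutil.move(saved_name, target_dir)


# Demonstration in a throwaway directory instead of the real PHFS mount.
with tempfile.TemporaryDirectory() as sandbox:
    model_dir = os.path.join(sandbox, "pytorch-mnist-example")  # hypothetical name
    os.makedirs(model_dir)
    open(os.path.join(model_dir, "model.pt"), "w").close()  # empty placeholder file

    destination = os.path.join(sandbox, "phfs", "example-models", "pytorch")
    moved_to = move_model(model_dir, destination)
    moved_files = sorted(os.listdir(moved_to))
    source_still_exists = os.path.exists(model_dir)

print("moved files:", moved_files)
```

In the notebook this would correspond to calling `move_model(SAVED_NAME, SAVED_DIR)` in place of the `mkdir` and `mv` cell.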
