# **Machine Learning with PyTorch**

PyTorch is an open source framework for AI research and commercial production in machine learning. It is used to build, train, and optimize deep neural networks for applications such as image recognition, natural language processing, and speech recognition. It runs on CPUs and GPUs and supports parallel and distributed training across multiple GPUs and multiple nodes. PyTorch is also flexible and easily extensible, with specific libraries and tools available for many different domains. These qualities have made PyTorch a leading framework in machine learning.

This lab shows you how easy it is to get started with PyTorch and use it to build, train and evaluate a neural network.


# Objectives

After completing this lab you will be able to:

 - Install the necessary PyTorch libraries.
 - Use PyTorch to build, train and evaluate neural networks.
 - Save the trained model parameters and use them later for inference.

### Installing Required Libraries

The following required libraries are pre-installed in the Skills Network Labs environment. However, if you run this notebook's commands in a different Jupyter environment (e.g. Watson Studio or Anaconda), you will need to install these libraries by removing the `#` sign before `!pip` in the code cell below.


In [2]:
# All libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented out.
# !pip install torch torchvision

### Importing Required Libraries


In [14]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Download Dataset and Create Data Loader

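We use the MNIST dataset, which consists of 28x28 pixel grayscale images of handwritten digits 0 through 9, split into 60,000 training images and 10,000 test images.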


In [15]:
# Download training data from MNIST datasets.
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from MNIST datasets.
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

batch_size = 64

# Create data loaders to iterate over data
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# len(dataloader) counts batches, so use the dataset length for the sample count
print("Training data size:", len(training_data))
print("Test data size:", len(test_data))

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Training data size: 60000
Test data size: 10000
Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
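
The number of batches a data loader yields per epoch follows from the dataset size and the batch size. The sketch below works through that arithmetic for the two MNIST splits, assuming the default `drop_last=False` behavior of `DataLoader`, where the final batch holds whatever samples remain. The helper name `batch_stats` is illustrative and not part of PyTorch.

```python
import math

def batch_stats(num_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

# MNIST split sizes with the batch size used in this lab
train_batches, train_last = batch_stats(60000, 64)
test_batches, test_last = batch_stats(10000, 64)

print(train_batches, train_last)  # 938 32
print(test_batches, test_last)    # 157 16
```

Because the last batch is partial, multiplying the batch count by the batch size (938 x 64 = 60032) overstates the number of samples. `len(dataloader.dataset)` gives the exact count.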


# Define Model

We first select the best available device for training, falling back to the CPU if no GPU is available.

We then define the model as a neural network with three fully connected (linear) layers: an input layer, a hidden layer, and an output layer. A ReLU activation function is applied between the layers.

Since the input images are 1x28x28 tensors, we need to flatten the input tensors into a 784 element tensor using the Flatten module before passing the input into our neural network.
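
To make the flattening step concrete, here is a minimal sketch that applies `nn.Flatten` to a dummy batch. The all-zeros tensor is a stand-in for real image data, and the batch size of 64 is simply the value used in this lab.

```python
import torch
from torch import nn

# A dummy batch shaped like the MNIST batches in this lab: [N, C, H, W]
batch = torch.zeros(64, 1, 28, 28)

# Defaults are start_dim=1 and end_dim=-1, so the batch dimension is kept
flatten = nn.Flatten()
flat = flatten(batch)

print(flat.shape)  # torch.Size([64, 784])
```

Each image becomes a row of 1 x 28 x 28 = 784 values, which matches the `in_features` of the first linear layer.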

In [16]:
# Get device for training.
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() # Apple Silicon GPU
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes)
        )

    def forward(self, image_tensor):
        image_tensor = self.flatten(image_tensor)
        logits = self.linear_relu_stack(image_tensor)
        return logits

input_size = 28*28
hidden_size = 512
num_classes = 10

model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
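
The printed layer sizes are enough to work out how many trainable parameters the network has. The sketch below does that arithmetic by hand; the helper `linear_params` is a hypothetical name used only for illustration.

```python
def linear_params(in_features, out_features):
    """Parameters in a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features

# (in_features, out_features) for the three Linear layers printed above
layer_shapes = [(784, 512), (512, 512), (512, 10)]
counts = [linear_params(i, o) for i, o in layer_shapes]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

In PyTorch, the same total can be obtained from a model with `sum(p.numel() for p in model.parameters())`.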


# Training loop

In [17]:
# Define our learning rate, loss function and optimizer
learning_rate = 1e-3 # 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Let's define our training function 
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()

    for batch_num, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Forward pass to compute prediction
        pred = model(X)
        # Compute prediction error using loss function
        loss = loss_fn(pred, y)

        # Backward pass
        optimizer.zero_grad() # zero any previous gradient calculations
        loss.backward() # calculate gradient
        optimizer.step() # update model parameters
        
        if batch_num > 0 and batch_num % 100 == 0:
            loss, current = loss.item(), batch_num * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
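
`nn.CrossEntropyLoss` takes raw logits and integer class labels, which is why the model's `forward` method returns logits without applying a softmax. As a rough illustration, the sketch below compares the built-in loss with a manual computation on a tiny batch. The logit values and targets are made up for the example.

```python
import torch
from torch import nn

# Hypothetical logits for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # correct class index for each sample

loss_fn = nn.CrossEntropyLoss()  # default reduction is the batch mean
builtin = loss_fn(logits, targets)

# Manual version: log-softmax, pick the log-probability of the target
# class for each sample, negate, and average over the batch
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```

The two values should agree up to floating point error.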

# Test Loop

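The test function evaluates the model's predictive performance using `test_dataloader`. It calls `model.eval()` to put the model in evaluation mode, which affects layers such as dropout and batch normalization. Gradient computation is not required during testing; note that `model.eval()` does not disable it by itself, which is the job of `torch.no_grad()`.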

In [18]:
# Our test function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    # Gradients are not needed for evaluation, so disable tracking
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
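
Evaluation mode and gradient tracking are controlled separately in PyTorch. The following sketch, which uses a small stand-in `nn.Linear` layer rather than the lab's model, shows that `eval()` leaves autograd on while `torch.no_grad()` turns it off.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # small stand-in layer; sizes are arbitrary
x = torch.randn(3, 4)

# eval() only changes layer behavior (e.g. dropout, batch norm)
layer.eval()
out_eval = layer(x)
print(out_eval.requires_grad)  # True: autograd is still tracking

# no_grad() disables gradient tracking inside the block
with torch.no_grad():
    out_no_grad = layer(x)
print(out_no_grad.requires_grad)  # False
```

Wrapping an evaluation loop in `torch.no_grad()` is a common way to save memory and computation when no backward pass is needed.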

# Train the Model

Now that we have defined functions to train the model and to test its predictive performance, let's train the model for 5 epochs over the dataset.

In [19]:
# Let's run training
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 0.287140  [ 6400/60000]
loss: 0.207831  [12800/60000]
loss: 0.267115  [19200/60000]
loss: 0.131060  [25600/60000]
loss: 0.306589  [32000/60000]
loss: 0.133767  [38400/60000]
loss: 0.254461  [44800/60000]
loss: 0.225999  [51200/60000]
loss: 0.177978  [57600/60000]
Test Error: 
 Accuracy: 96.0%, Avg loss: 0.130468 

Epoch 2
-------------------------------
loss: 0.087250  [ 6400/60000]
loss: 0.087333  [12800/60000]
loss: 0.134419  [19200/60000]
loss: 0.037877  [25600/60000]
loss: 0.139846  [32000/60000]
loss: 0.066882  [38400/60000]
loss: 0.148412  [44800/60000]
loss: 0.117106  [51200/60000]
loss: 0.120266  [57600/60000]
Test Error: 
 Accuracy: 97.1%, Avg loss: 0.092219 

Epoch 3
-------------------------------
loss: 0.087777  [ 6400/60000]
loss: 0.054144  [12800/60000]
loss: 0.137608  [19200/60000]
loss: 0.099790  [25600/60000]
loss: 0.069709  [32000/60000]
loss: 0.037692  [38400/60000]
loss: 0.064657  [44800/60000]
loss: 0.075775  [51200/600

# Save the model and make predictions

Once the model is trained, we can save its parameters for later use in inference. Here we save the model's `state_dict`, which contains the trained parameters. We then create a new instance of the model and load the saved parameters into it. Finally, we run inference with the new instance.


In [20]:
# Save our model parameters
torch.save(model.state_dict(), "ml_with_pytorch_model.pth")
print("Saved PyTorch Model State to ml_with_pytorch_model.pth")

# Load the saved model parameters into a new instance of the model
model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)
# map_location places the loaded tensors on the current device
model.load_state_dict(torch.load("ml_with_pytorch_model.pth", map_location=device))

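# Inference using the new model instance
model.eval()
with torch.no_grad():  # no gradients needed for inference
    for i in range(10):
        x, y = test_data[i][0], test_data[i][1]

        # x has shape [1, 28, 28]; the leading dimension acts as a batch of one
        x = x.to(device)
        pred = model(x)
        predicted, actual = pred[0].argmax(0).item(), y
        print(f'Predicted: "{predicted}", Actual: "{actual}"')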

Saved PyTorch Model State to ml_with_pytorch_model.pth
Predicted: "7", Actual: "7"
Predicted: "2", Actual: "2"
Predicted: "1", Actual: "1"
Predicted: "0", Actual: "0"
Predicted: "4", Actual: "4"
Predicted: "1", Actual: "1"
Predicted: "4", Actual: "4"
Predicted: "9", Actual: "9"
Predicted: "5", Actual: "5"
Predicted: "9", Actual: "9"
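
To see what a `state_dict` contains and how the save and load round trip works, here is a minimal sketch on a small stand-in model. The layer sizes are arbitrary, and an in-memory `io.BytesIO` buffer replaces the `.pth` file purely to keep the example self-contained.

```python
import io
import torch
from torch import nn

# A small stand-in model; the layer sizes are arbitrary
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# The state_dict maps parameter names to tensors
keys = list(net.state_dict().keys())
print(keys)  # ['0.weight', '0.bias', '2.weight', '2.bias']

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# A fresh instance starts with different random weights until we load
restored = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
restored.load_state_dict(torch.load(buffer))
restored.eval()

x = torch.randn(5, 4)
with torch.no_grad():
    same = torch.equal(net(x), restored(x))
print(same)  # True
```

Only the parameters are stored, so the new instance must be built with the same architecture before `load_state_dict` is called.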


# Congratulations! You have completed the lab
