# **Machine Learning with PyTorch**

# Introduction to PyTorch

PyTorch is an open-source machine learning library developed by Facebook's AI Research lab. It's primarily used for building and training neural networks, but it also supports other types of machine learning models.

PyTorch's design philosophy is to be efficient, flexible, and easy to use. It's based on the Torch library, which has been used for many years in the research community, but PyTorch has added its own improvements and features.

Here are some key features of PyTorch:

- **Dynamic computation graph**: PyTorch allows you to define a computation graph dynamically during runtime, which makes it easier to debug and understand your models.

- **Automatic differentiation**: PyTorch automatically computes gradients for you, which is essential for training neural networks.

- **GPU acceleration**: PyTorch can run on GPUs, which can significantly speed up your computations.

- **Large ecosystem**: PyTorch has a large and active community, which means there are many pre-built models and tools available.

- **Pythonic syntax**: PyTorch uses Python syntax, which makes it easy to read and write code.

Overall, PyTorch is a powerful and flexible library that's well-suited for a wide range of machine learning tasks.
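
To make the first two features concrete, here is a minimal sketch of automatic differentiation on a dynamically built graph. The function `y = x**2 + 2*x` and the input value 3.0 are illustrative choices, not part of this lab.

```python
import torch

# A scalar tensor that records the operations applied to it
x = torch.tensor(3.0, requires_grad=True)

# The graph is built on the fly as ordinary Python code runs,
# so regular control flow (if/for) can shape the computation
y = x ** 2 + 2 * x

# Backpropagate: computes dy/dx = 2*x + 2 and stores it in x.grad
y.backward()

print(x.grad)  # tensor(8.)
```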

## Step-1: Importing the Required Libraries

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

### MNIST Dataset

Here, we'll be working with the MNIST dataset, which contains 70,000 grayscale images of handwritten digits, each 28×28 pixels. The dataset comes pre-split into 60,000 training images and 10,000 test images.
<!--
Here are the main steps we'll be following:

1. **Download the MNIST dataset**: We'll download the MNIST dataset and load it into our environment.

2. **Create a DataLoader for the dataset**: We'll create a DataLoader object to handle loading and preprocessing the data for our model.

3. **Define an AI model to recognize a hand-written digit**: We'll define a neural network model that can recognize hand-written digits.

4. **Train the defined AI model using training data from the MNIST dataset**: We'll train our model using the training data from the MNIST dataset.

5. **Test the trained AI model using testing data from the MNIST dataset**: We'll test our trained model using the testing data from the MNIST dataset.

6. **Evaluate the model**: We'll evaluate the performance of our model using various metrics.

Overall, the goal of this lab is to build and train a neural network model that can recognize hand-written digits using the MNIST dataset.
-->

## Step-2: Downloading Datasets and Building DataLoader

In this step, we download and load the MNIST dataset using the `datasets` module from torchvision, PyTorch's companion library for computer vision. MNIST is a popular benchmark dataset of handwritten digit images.


- Use the `MNIST` class from the `datasets` module to download the training and test data. The `root` parameter specifies the directory where the data is saved. The `train` parameter selects the training split (`True`) or the test split (`False`). The `download` parameter tells torchvision to download the data if it isn't already present in `root`. The `transform` parameter specifies the transformation applied to each image; here, `ToTensor()` converts each image to a float tensor with values scaled to the range [0, 1].

In [2]:
# Download training data from the MNIST dataset.
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from the MNIST dataset.
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

- Create `DataLoader` objects to iterate over the data. A `DataLoader` wraps a dataset and returns an iterable that yields batches of the given batch size. We pass in the `training_data` and `test_data` datasets created above and set the batch size to 64.

In [3]:

batch_size = 64

# Create data loaders to iterate over data
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

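- Estimate the size of the training and test data by multiplying the number of batches (the length of the `DataLoader` object) by the batch size. Because the final batch of each split holds fewer than 64 samples, this slightly overestimates the true sizes of 60,000 and 10,000; `len(train_dataloader.dataset)` gives the exact count.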

In [4]:
print("Training data size:", len(train_dataloader) * batch_size)
print("Test data size:", len(test_dataloader) * batch_size)

Training data size: 60032
Test data size: 10048
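
The gap between these figures and the true dataset sizes comes from the partly filled final batch in each split. The arithmetic can be sketched in plain Python; `num_batches` is a hypothetical helper that mirrors what `len()` returns for a `DataLoader` with the default `drop_last=False`.

```python
import math

def num_batches(num_samples, batch_size):
    # With drop_last=False (the default), a partial final batch still counts
    return math.ceil(num_samples / batch_size)

batch_size = 64
for name, num_samples in [("Training", 60_000), ("Test", 10_000)]:
    batches = num_batches(num_samples, batch_size)
    last_batch = num_samples - (batches - 1) * batch_size
    print(f"{name}: {batches} batches, estimate {batches * batch_size}, "
          f"last batch holds {last_batch} samples")
```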


- Iterate over the `test_dataloader` object and print the shape of the input batch `X` and the shape and data type of the target batch `y`, stopping after the first batch. `X` has the shape `[N, C, H, W]`, where `N` is the number of samples in the batch, `C` is the number of channels (1 here, since the images are grayscale), `H` is the height of the image, and `W` is the width of the image. `y` has the shape `[N]`, with one integer class label per sample.

In [5]:
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


## Step-3: Defining the Model

In this step, we select a device and define a neural network model as a `NeuralNetwork` class (defined below). The model takes in a batch of image tensors and returns a tensor of logits, one raw score per digit class.

- Determine the device for training. We first check for a CUDA-capable GPU using `torch.cuda.is_available()`; if none is found, we check for an Apple Silicon GPU using `torch.backends.mps.is_available()`. If neither is available, we fall back to the CPU.

In [6]:
# Get device for training.
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() # Apple Silicon GPU
    else "cpu"
)
print(f"Using {device} device")

# Print the name of the accelerator, if any
if device.type == "cuda":
    print(torch.cuda.get_device_name(0))
elif device.type == "mps":
    # torch.backends.mps exposes no device-name query, so print a fixed label
    print("Apple Silicon GPU (MPS)")

Using cuda device
NVIDIA GeForce GTX 1050 Ti
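
Once a device is chosen, both the model and every input batch need to be moved to it, since operations generally require their tensors to live on the same device. A minimal sketch, using a made-up 2×3 tensor and checking only for CUDA to stay short:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default
t = torch.ones(2, 3)
print(t.device)  # cpu

# .to() returns a copy on the target device (or the same tensor if already there)
t = t.to(device)
print(t.device.type == device.type)  # True
```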


- Define the `NeuralNetwork` class, which inherits from `nn.Module`. The `__init__` method takes an input size, a hidden size, and a number of classes. The `flatten` layer flattens each image in the batch into a 1D vector. The `linear_relu_stack` layer is a sequential container with three linear layers; the first two are each followed by a ReLU activation function, while the last one outputs the raw logits. The `forward` method defines how an input batch passes through these layers.

In [7]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes)
        )

    def forward(self, image_tensor):
        image_tensor = self.flatten(image_tensor)
        logits = self.linear_relu_stack(image_tensor)
        return logits
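
To check how shapes change through such a network, the sketch below rebuilds the same layer stack with `nn.Sequential` (a stand-in for the class above, with randomly initialized weights) and feeds it a random batch shaped like MNIST input.

```python
import torch
from torch import nn

# Stand-in for NeuralNetwork(28*28, 512, 10), built without a custom class
net = nn.Sequential(
    nn.Flatten(),            # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),      # raw logits, one per digit class
)

batch = torch.rand(64, 1, 28, 28)   # random values in place of real images
logits = net(batch)
print(logits.shape)                  # torch.Size([64, 10])

# Softmax turns each row of logits into probabilities that sum to 1
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1)[:3])
```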

- Set the input size, hidden size, and number of classes for the model. We then create an instance of the `NeuralNetwork` class with these parameters, move it to the device selected earlier using the `to()` method, and print the model.

In [8]:
input_size = 28*28
hidden_size = 512
num_classes = 10

model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
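
The printed layer sizes determine how many trainable parameters the model has: each `Linear` layer holds `in_features * out_features` weights plus `out_features` biases. A quick sketch of the count (the helper `linear_params` is hypothetical):

```python
def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output unit
    return in_features * out_features + out_features

# (in_features, out_features) for the three Linear layers printed above
layers = [(784, 512), (512, 512), (512, 10)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)  # 669706
```

With the model from this step in scope, `sum(p.numel() for p in model.parameters())` should report the same number.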


## Step-4: Training loop
Here, we're defining the training loop for our model. The training loop takes in a dataloader, a model, a loss function, and an optimizer, and trains the model on the data.

- Set the learning rate, loss function, and optimizer for our model. The `learning_rate` is set to 0.001. The loss function is `nn.CrossEntropyLoss()`, a common choice for classification tasks; it expects raw logits and integer class labels. The optimizer is `torch.optim.Adam`, a widely used optimizer for training neural networks.

In [9]:
learning_rate = 1e-3 # 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
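
`nn.CrossEntropyLoss` works directly on logits: it applies log-softmax internally and, by default, averages the negative log-probability of the correct class over the batch. The sketch below checks this on made-up logits and labels.

```python
import torch
from torch import nn

# Made-up logits for 2 samples and 3 classes, with their true labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: negative log-probability of the correct class, averaged
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(torch.allclose(loss, manual))  # True
```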

- Defining the `train` function that takes in a dataloader, a model, a loss function, and an optimizer. The function sets the model to training mode, iterates over the dataloader, and performs a forward pass to compute the prediction, a backward pass to compute the gradient, and an update to the model parameters using the optimizer. It also prints the loss and the progress of the training loop every 100 batches.

In [11]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()

    for batch_num, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Forward pass to compute prediction
        pred = model(X)
        # Compute prediction error using loss function
        loss = loss_fn(pred, y)

        # Backward pass
        optimizer.zero_grad() # zero any previous gradient calculations
        loss.backward() # calculate gradient
        optimizer.step() # update model parameters
        
        if batch_num > 0 and batch_num % 100 == 0:
            loss, current = loss.item(), batch_num * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
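
The call to `optimizer.zero_grad()` matters because PyTorch accumulates gradients across `backward()` calls rather than overwriting them. A minimal sketch with a single made-up parameter `w`:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()         # 2.0

# Without zeroing, the second backward pass adds to the stored gradient
(2 * w).backward()
accumulated = w.grad.item()   # 4.0

# Resetting the gradient gives a clean result again
w.grad.zero_()
(2 * w).backward()
fresh = w.grad.item()         # 2.0

print(first, accumulated, fresh)  # 2.0 4.0 2.0
```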

## Step-5: Test Loop
Here, we're defining the test loop for our model. The test loop takes in a dataloader, a model, and a loss function, and evaluates the model on the test data.

- Defining the `test` function that takes in a dataloader, a model, and a loss function. The function sets the model to evaluation mode, iterates over the dataloader, and computes the loss and accuracy of the model on the test data. It also prints the accuracy and average loss of the model on the test data.

In [12]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    # Gradients are not needed for evaluation, so disable tracking to save memory
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
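
The accuracy line in `test` compares the index of the largest logit in each row with the true label. A small sketch with made-up values:

```python
import torch

# Made-up logits for 4 samples and 3 classes
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 0.9],
                     [0.8, 0.7, 0.1]])
y = torch.tensor([1, 0, 1, 0])

predicted_labels = pred.argmax(1)          # tensor([1, 0, 2, 0])
correct = (predicted_labels == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(f"Accuracy: {100 * accuracy:>0.1f}%")  # Accuracy: 75.0%
```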

## Step-6: Model Training
Here, we train the model for 10 epochs using the training loop defined earlier, evaluating it on the test set after each epoch.

In [13]:
epochs = 10

for epoch in range(epochs):
    print(f"Starting epoch {epoch+1}/{epochs}")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)

Starting epoch 1/10
loss: 0.279117  [ 6400/60000]
loss: 0.191582  [12800/60000]
loss: 0.258322  [19200/60000]
loss: 0.153658  [25600/60000]
loss: 0.322177  [32000/60000]
loss: 0.127257  [38400/60000]
loss: 0.243816  [44800/60000]
loss: 0.277377  [51200/60000]
loss: 0.163641  [57600/60000]
Test Error: 
 Accuracy: 95.1%, Avg loss: 0.151130 

Starting epoch 2/10
loss: 0.089042  [ 6400/60000]
loss: 0.094192  [12800/60000]
loss: 0.113072  [19200/60000]
loss: 0.025151  [25600/60000]
loss: 0.150454  [32000/60000]
loss: 0.049042  [38400/60000]
loss: 0.107422  [44800/60000]
loss: 0.160705  [51200/60000]
loss: 0.101604  [57600/60000]
Test Error: 
 Accuracy: 96.4%, Avg loss: 0.111046 

Starting epoch 3/10
loss: 0.075426  [ 6400/60000]
loss: 0.039534  [12800/60000]
loss: 0.139657  [19200/60000]
loss: 0.017066  [25600/60000]
loss: 0.072302  [32000/60000]
loss: 0.044181  [38400/60000]
loss: 0.061914  [44800/60000]
loss: 0.139639  [51200/60000]
loss: 0.087648  [57600/60000]
Test Error: 
 Accuracy: 96

## Step-7: Saving the Model parameters to make predictions


- Saving the model parameters to a file named `ml_with_pytorch_model.pth` using the `torch.save()` function. We're passing in the model state dictionary as the first argument and the file name as the second argument.

In [14]:
# Save our model parameters
torch.save(model.state_dict(), "ml_with_pytorch_model.pth")
print("Saved PyTorch Model State to ml_with_pytorch_model.pth")

Saved PyTorch Model State to ml_with_pytorch_model.pth
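
A state dictionary is an ordered mapping from parameter names to tensors. The sketch below inspects one for a small stand-in model and round-trips it through an in-memory buffer instead of a file; the helper `make_model` and its layer sizes are arbitrary choices for illustration.

```python
import io
import torch
from torch import nn

def make_model():
    # Small stand-in model; sizes are arbitrary
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

source = make_model()
for name, tensor in source.state_dict().items():
    print(name, tuple(tensor.shape))

# Save to an in-memory buffer (torch.save accepts file-like objects)
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A fresh instance has its own random weights until the state is loaded
restored = make_model()
restored.load_state_dict(torch.load(buffer))

x = torch.rand(5, 4)
print(torch.equal(source(x), restored(x)))  # True
```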


- Load the saved model parameters into a new instance of the model using the `load_state_dict()` method. We first read the saved state dictionary from disk with `torch.load()`, then pass it as the argument to `load_state_dict()`, which is called on the new model instance.

In [15]:
# Load the saved model parameters into a new instance of the model
model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)
model.load_state_dict(torch.load("ml_with_pytorch_model.pth"))

<All keys matched successfully>

- Set the model to **evaluation mode** with `model.eval()`.  
- Loop through the **first 10 samples** in the test dataset and generate predictions using the model.  
- Pass each test sample to the model to obtain its **prediction tensor**.  
- Determine the **predicted class label** by applying `argmax(0).item()` to the first (and only) row of the prediction tensor.  
- Display both the **predicted** and **actual** class labels for each test sample.

In [16]:
# Inference using the new model instance
model.eval()
for i in range(10):
    x, y = test_data[i][0], test_data[i][1]

    x = x.to(device)
    pred = model(x)
    predicted, actual = pred[0].argmax(0).item(), y
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "7", Actual: "7"
Predicted: "2", Actual: "2"
Predicted: "1", Actual: "1"
Predicted: "0", Actual: "0"
Predicted: "4", Actual: "4"
Predicted: "1", Actual: "1"
Predicted: "4", Actual: "4"
Predicted: "9", Actual: "9"
Predicted: "5", Actual: "5"
Predicted: "9", Actual: "9"
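
The loop above runs one sample at a time. As a sketch of an alternative, several samples can be stacked into one batch and classified in a single forward pass under `torch.no_grad()`. Random tensors and an untrained stand-in model are used here, so the predicted labels are meaningless; only the mechanics carry over.

```python
import torch
from torch import nn

# Untrained stand-in with the same input/output shapes as the lab's model
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
net.eval()

# Ten fake "images", each shaped like one MNIST sample: [1, 28, 28]
samples = [torch.rand(1, 28, 28) for _ in range(10)]
batch = torch.stack(samples)          # shape [10, 1, 28, 28]

with torch.no_grad():                 # no gradient tracking during inference
    logits = net(batch)               # shape [10, 10]
    predicted = logits.argmax(1)      # one class index per sample

print(predicted.tolist())
```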
