**QuickStart**

This section gives a quick tour of the PyTorch API that is most commonly used for machine learning tasks. You are not expected to understand the majority of the functions and methods used here; the goal is to get familiar with what you will encounter as you work through PyTorch.

**Working With Data**

There are two fundamental building blocks when it comes to handling data in PyTorch: *torch.utils.data.DataLoader* and *torch.utils.data.Dataset*. Dataset stores the samples and their associated labels, while DataLoader wraps an iterable around the Dataset.
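To make the division of labour concrete, here is a minimal sketch of a custom Dataset wrapped by a DataLoader. The class name `ToyDataset` and its made-up data are assumptions for illustration only; they are not part of this quickstart's dataset.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset: 10 samples with 3 features each and a binary label."""

    def __init__(self):
        self.features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
        self.labels = torch.arange(10) % 2

    def __len__(self):
        # Number of samples the dataset holds.
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (sample, label) pair.
        return self.features[idx], self.labels[idx]


toy_data = ToyDataset()
toy_loader = DataLoader(toy_data, batch_size=4)

# Iterating over the loader yields batches instead of single samples.
batch_shapes = [tuple(X.shape) for X, y in toy_loader]
print(batch_shapes)
```

The Dataset only knows how to return one sample at a time; grouping samples into batches is left to the DataLoader.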

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers several convenient libraries for working with datasets: TorchText, TorchVision, and TorchAudio. Each is a library of its own that you can use depending on the area you want to work in. Here, we will be using TorchVision, as our dataset is located there.

The *torchvision.datasets* module contains Dataset objects for many real-world vision datasets, such as COCO, CIFAR, ImageNet, and Flickr8k. Every TorchVision dataset accepts two arguments, *transform* and *target_transform*, which modify the samples and the labels respectively.
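As a rough illustration of what these two arguments do, the sketch below applies a sample transform and a label transform by hand. The function names `scale_to_unit` and `to_one_hot` and the random image are assumptions made up for this example; in practice a dataset would call whatever you pass as *transform* and *target_transform* each time a sample is fetched.

```python
import torch
import torch.nn.functional as F


def scale_to_unit(image):
    # Sample transform: map 0-255 pixel values to floats in [0, 1].
    return image.to(torch.float32) / 255.0


def to_one_hot(label, num_classes=10):
    # Label transform: turn an integer class index into a one-hot vector.
    return F.one_hot(torch.tensor(label), num_classes=num_classes).to(torch.float32)


raw_image = torch.randint(0, 256, (1, 28, 28), dtype=torch.uint8)  # made-up image
raw_label = 3                                                       # made-up label

image = scale_to_unit(raw_image)
label = to_one_hot(raw_label)

print(image.dtype, image.shape)
print(label)
```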

In [2]:
# Used to download the training data from datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Used to download the test data from datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

# Notice that we passed train=True for training_data and train=False for test_data.

We pass the Dataset as an argument to DataLoader, which wraps an iterable around it. DataLoader also supports automatic batching, sampling, shuffling, and multiprocess data loading. We set the batch size to 64, which means each element of the dataloader iterable returns a batch of 64 features and labels.

In [3]:
batch_size = 64

#Create data loaders.

train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# We reuse the same batch size for both loaders for simplicity; the two are not required to match.

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
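To see how batching behaves when the dataset size is not a multiple of the batch size, here is a small sketch on made-up data. The 150-sample `TensorDataset` is an assumption standing in for a real image dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up stand-in for an image dataset: 150 samples shaped like FashionMNIST images.
images = torch.zeros(150, 1, 28, 28)
labels = torch.zeros(150, dtype=torch.int64)
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples on every pass; the batch sizes stay the same.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = [len(X) for X, y in loader]
print(len(loader))   # number of batches in one pass over the data
print(batch_sizes)   # the last batch holds whatever is left over
```

With 150 samples and a batch size of 64, one pass produces two full batches and a final batch of 22.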


**Creating Models**

Let's try making a model. There'll be a lot of scary stuff involved but once you get to understand the fundamentals in the later chapters, you can come back here and it'll all make sense. So look forward to that. For now, be scared! 

In order to define a neural network, we need to create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to an accelerator (a CUDA GPU or Apple's MPS backend) if one is available.

This pretty much describes the code below:

In [4]:
# Use CUDA if it is available, then MPS, otherwise fall back to the CPU

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else: 
    device = "cpu"
    
print(f"using {device} device")

# Defining a Model

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
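To get a feel for what the layers do to the data, the sketch below pushes a random batch through a flatten step and one linear layer and prints the shapes. The random input is an assumption standing in for a real batch of images.

```python
import torch
from torch import nn

batch = torch.rand(64, 1, 28, 28)       # random stand-in for one batch of images

flatten = nn.Flatten()                  # keeps dim 0 (the batch), flattens the rest
flat = flatten(batch)

layer = nn.Linear(28 * 28, 512)         # 784 input features -> 512 output features
hidden = nn.ReLU()(layer(flat))         # ReLU replaces negative values with zero

print(flat.shape)
print(hidden.shape)
```

Each 1x28x28 image becomes a vector of 784 values, which is why the first linear layer in the model takes 28*28 input features.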


**Optimizing The Model Parameters**

Now that we've successfully created a model and specified its inner workings, let's start using it and set up what we need in order to train it successfully.

First, we're going to need to decide on a loss function and an optimizer, e.g. nn.L1Loss and torch.optim.SGD. Below we use nn.CrossEntropyLoss together with SGD.

In a single training loop, the model makes predictions on the training dataset, in the batches that we've specified earlier, and then backpropagates the prediction error to adjust the model's parameters.

Wait, what's backpropagation?! Sounds scary? No. Not really, it's basically just retracing your steps. 

Some terms can be confusing when you first go through machine learning, especially when dealing with models.

**Forward Pass:**
1. During the forward pass, input data flows through the neural network layers.
2. Each layer performs a transformation (such as matrix multiplication, activation function, etc.) on the input.
3. The final layer produces the network’s output (predictions).
   
**Loss Calculation:**
1. After obtaining predictions, we compute a loss (error) that quantifies how far off our predictions are from the actual target values.
2. The loss function measures the discrepancy between predictions and ground truth.
   
**Backward Pass (Backpropagation):**
1. Here’s where the retracing begins!
2. We calculate the gradient of the loss with respect to each model parameter (weights and biases).
3. The gradient indicates how much the loss changes when we tweak a parameter slightly.
4. We use the chain rule to compute gradients layer by layer, starting from the output layer and moving backward.
5. Essentially, we’re figuring out how much each parameter contributed to the overall error.
   
**Parameter Updates:**
1. Armed with gradients, we update the model’s parameters using optimization algorithms (e.g., stochastic gradient descent).
2. These updates nudge the parameters in a direction that reduces the loss.
3. The process repeats for multiple epochs until convergence.
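
The four stages above can be sketched on the smallest possible "model": a single parameter `w` and one made-up training example. The numbers and the learning rate of 0.1 are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import torch

# One trainable parameter and one made-up training example.
w = torch.tensor(1.0, requires_grad=True)
x, target = torch.tensor(2.0), torch.tensor(6.0)

# Forward pass and loss: prediction = w * x, loss = (prediction - target)^2.
prediction = w * x
loss = (prediction - target) ** 2

# Backward pass: autograd applies the chain rule and stores d(loss)/dw in w.grad.
loss.backward()
analytic_grad = 2 * (prediction.item() - target.item()) * x.item()

# Parameter update: one plain gradient-descent step.
lr = 0.1
with torch.no_grad():
    w_new = w - lr * w.grad

new_loss = (w_new * x - target) ** 2
print(w.grad.item(), analytic_grad)   # autograd agrees with the hand-derived gradient
print(loss.item(), new_loss.item())   # the loss is smaller after the update
```

The gradient is negative here, so subtracting it increases `w`, which moves the prediction closer to the target.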

This pretty much describes what's happening inside a model! Still a bit too complicated? Here's an analogy.

**Retracing the Steps:**
1. Imagine you’re hiking in a dense forest (the neural network).
2. You take a path (forward pass) and reach a clearing (predictions).
3. Now, you want to find your way back to the trailhead (minimize loss).
4. Backpropagation retraces your steps, guiding you through the forest to adjust your route (parameter updates).

With this in mind, we'll do exactly that!

In [5]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        #Compute Loss
        pred = model(X)
        loss = loss_fn(pred, y)

        #Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch+1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d} {size:>5d}]")

The function above is for training one epoch.
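
One detail of the loop worth seeing in isolation is why gradients are cleared after each step. The sketch below, on a single made-up parameter, shows that repeated backward passes add to the stored gradient unless it is reset.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added to the old one.
(3 * w).backward()
accumulated = w.grad.item()

# Clearing the gradient gives a fresh start. optimizer.zero_grad() plays this
# clearing role for every parameter the optimizer manages.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

Without the reset, each batch would update the parameters using leftovers from earlier batches.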

Before going further, let's pick the loss function and the optimizer. Then let's also create a function that tests the model's performance on our test dataset, to make sure that our model is learning, and learning correctly!

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
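
As a rough sketch of what this loss function computes, the example below feeds made-up logits and class indices to nn.CrossEntropyLoss and reproduces the value by hand. The logits and targets are assumptions for illustration.

```python
import torch
from torch import nn

# Made-up logits for 2 samples and 3 classes, plus their true class indices.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
targets = torch.tensor([0, 2])

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

# Same value by hand: negative log-probability of the correct class,
# averaged over the batch.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

Note that the loss takes raw logits, not probabilities, which is why the model's last layer has no softmax.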

In [7]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
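
The accuracy line in the function above can be unpacked step by step. Here is a small sketch on made-up model outputs for four samples and three classes.

```python
import torch

# Made-up model outputs for 4 samples and 3 classes, with their true labels.
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.3, 0.4, 0.9],
                     [0.8, 0.1, 0.7]])
y = torch.tensor([1, 0, 1, 0])

predicted_classes = pred.argmax(1)            # index of the largest score per row
matches = predicted_classes == y              # True where the prediction is right
correct = matches.type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75.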

We have all the functions that we need. We've created our model, the NeuralNetwork. We've also specified how to train it and, lastly, we've created a function to check how accurate the trained model is. Now, we just need to repeat this process over and over again. Each iteration is called an epoch!

For each epoch, we want to see both the accuracy and the loss. Accuracy is the overall effectiveness of our model from our perspective. Our model doesn't actually need the accuracy; training only needs the loss! So basically, accuracy is for human consumption while the loss is what training uses to update the model. It's important that we track both: the loss shows how training is going, and accuracy gives us a general overview of the model's effectiveness.
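
To illustrate why the two numbers are tracked separately, the sketch below compares two made-up sets of logits. Both get every sample right, so their accuracy is identical, but the loss still tells them apart.

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()
y = torch.tensor([0, 1])                        # true classes for two samples

# Two made-up sets of logits that both pick the right class for every sample.
hesitant = torch.tensor([[0.6, 0.4], [0.4, 0.6]])
confident = torch.tensor([[4.0, 0.0], [0.0, 4.0]])

results = {}
for name, logits in [("hesitant", hesitant), ("confident", confident)]:
    accuracy = (logits.argmax(1) == y).float().mean().item()
    loss = loss_fn(logits, y).item()
    results[name] = (accuracy, loss)
    print(f"{name}: accuracy={accuracy:.2f}, loss={loss:.4f}")
```

The loss keeps shrinking as predictions become more confident, which gives training a signal to follow even when accuracy has stopped changing.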

In [8]:
epochs = 5
for t in range(epochs):
    print(f"Test Error: {t+1}\n---------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Epochs Completed!")

Test Error: 1
---------------------------
loss: 2.291268 [   64 60000]
loss: 2.284646 [ 6464 60000]
loss: 2.270569 [12864 60000]
loss: 2.270412 [19264 60000]
loss: 2.249402 [25664 60000]
loss: 2.230483 [32064 60000]
loss: 2.232310 [38464 60000]
loss: 2.200326 [44864 60000]
loss: 2.202946 [51264 60000]
loss: 2.172067 [57664 60000]
Test Error: 
 Accuracy: 51.8%, Avg loss: 2.163739 

Test Error: 2
---------------------------
loss: 2.172838 [   64 60000]
loss: 2.159387 [ 6464 60000]
loss: 2.111948 [12864 60000]
loss: 2.123531 [19264 60000]
loss: 2.068344 [25664 60000]
loss: 2.027146 [32064 60000]
loss: 2.042439 [38464 60000]
loss: 1.966782 [44864 60000]
loss: 1.977923 [51264 60000]
loss: 1.902617 [57664 60000]
Test Error: 
 Accuracy: 54.7%, Avg loss: 1.897839 

Test Error: 3
---------------------------
loss: 1.935545 [   64 60000]
loss: 1.898257 [ 6464 60000]
loss: 1.792604 [12864 60000]
loss: 1.821865 [19264 60000]
loss: 1.716495 [25664 60000]
loss: 1.677881 [32064 60000]
loss: 1.688919 [

66% is a respectable number. Let's see if we can do better. Let's do another 5 epochs and see if it improves.

In [9]:
epochs = 5
for t in range(epochs):
    print(f"Test Error: {t+1}\n---------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Epochs Completed!")

Test Error: 1
---------------------------
loss: 1.158862 [   64 60000]
loss: 1.152614 [ 6464 60000]
loss: 0.985317 [12864 60000]
loss: 1.115977 [19264 60000]
loss: 1.000349 [25664 60000]
loss: 1.022981 [32064 60000]
loss: 1.052574 [38464 60000]
loss: 0.995721 [44864 60000]
loss: 1.038914 [51264 60000]
loss: 0.971774 [57664 60000]
Test Error: 
 Accuracy: 65.8%, Avg loss: 0.983090 

Test Error: 2
---------------------------
loss: 1.036489 [   64 60000]
loss: 1.050346 [ 6464 60000]
loss: 0.868631 [12864 60000]
loss: 1.022363 [19264 60000]
loss: 0.908663 [25664 60000]
loss: 0.926903 [32064 60000]
loss: 0.972166 [38464 60000]
loss: 0.920909 [44864 60000]
loss: 0.959006 [51264 60000]
loss: 0.903905 [57664 60000]
Test Error: 
 Accuracy: 67.3%, Avg loss: 0.909118 

Test Error: 3
---------------------------
loss: 0.946688 [   64 60000]
loss: 0.979599 [ 6464 60000]
loss: 0.785239 [12864 60000]
loss: 0.957573 [19264 60000]
loss: 0.847088 [25664 60000]
loss: 0.856918 [32064 60000]
loss: 0.915952 [

70% is a lot better! Now that we have our model, let's try actually using it ourselves. It's a lot cooler if we can see it in action. But first we have to save it. One common way to save a model is by serializing its internal state dictionary. What does that mean? Well, it just means that we save the model's parameters (the parameters that we already optimized).

In [10]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State To - model.pth")

Saved PyTorch Model State To - model.pth


We've saved it; now it's time to load it. That means recreating the model's structure and loading the state dictionary (the optimized parameters that we just saved) into it.
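
The save-and-load round trip can be sketched on a tiny stand-in model. The nn.Linear layer and the in-memory buffer used instead of a file are assumptions for this example; the same steps apply to the NeuralNetwork class and "model.pth".

```python
import io

import torch
from torch import nn

# Small stand-in model; the same idea applies to the NeuralNetwork class.
source = nn.Linear(4, 2)
keys = list(source.state_dict().keys())
print(keys)                               # parameter names that map to tensors

# Save to an in-memory buffer instead of a file so the sketch leaves nothing on disk.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A freshly built model has its own random weights until the saved state is loaded.
restored = nn.Linear(4, 2)
result = restored.load_state_dict(torch.load(buffer))

same_weights = torch.equal(source.weight, restored.weight)
same_bias = torch.equal(source.bias, restored.bias)
print(result, same_weights, same_bias)
```

Because only the parameters are stored, the class definition must be available when loading, which is why the model is rebuilt first.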

In [11]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

In [12]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]


model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax()], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
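
The model returns raw scores (logits) rather than probabilities. If you want to see how confident a prediction is, a softmax can be applied to the scores, as in this sketch. The logits and the shortened class list are made up for illustration.

```python
import torch

classes = ["T-shirt/top", "Trouser", "Pullover"]   # first three labels only

# Made-up logits for one sample; a real model would produce one score per class.
logits = torch.tensor([[0.5, 2.5, 1.0]])

probabilities = torch.softmax(logits, dim=1)       # non-negative, each row sums to 1
best = probabilities[0].argmax().item()

print(classes[best], round(probabilities[0, best].item(), 3))
```

Softmax does not change which class has the highest score, so the predicted label is the same with or without it.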


And that's it for the quickstart. We've created a neural network!