<a href="https://colab.research.google.com/github/MaCoZu/pytorch_tutorials/blob/main/01_quickstart_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline



# Quickstart
This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.

## Working with data
PyTorch has two [primitives to work with data](https://pytorch.org/docs/stable/data.html):
``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``.
``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around
the ``Dataset``.


In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as [TorchText](https://pytorch.org/text/stable/index.html),
[TorchVision](https://pytorch.org/vision/stable/index.html), and [TorchAudio](https://pytorch.org/audio/stable/index.html),
all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision datasets such as
CIFAR and COCO ([full list here](https://pytorch.org/vision/stable/datasets.html)). In this tutorial, we
use the FashionMNIST dataset. Every TorchVision ``Dataset`` accepts two arguments, ``transform`` and
``target_transform``, to modify the samples and labels respectively.
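
To make the roles of ``transform`` and ``target_transform`` concrete, here is a minimal sketch that uses a small synthetic dataset instead of FashionMNIST. The class `ToyImageDataset`, the helpers `scale_to_unit` and `one_hot`, and the random data are all hypothetical and exist only for illustration; the one-hot label encoding is an assumed example and is not used elsewhere in this tutorial.

```python
import torch
from torch.utils.data import Dataset


class ToyImageDataset(Dataset):
    """Hypothetical dataset of random 28x28 'images' with labels 0-9."""

    def __init__(self, n_samples=8, transform=None, target_transform=None):
        generator = torch.Generator().manual_seed(0)
        # Raw integer pixel values in [0, 255], like unprocessed grayscale images.
        self.images = torch.randint(0, 256, (n_samples, 28, 28), generator=generator)
        self.labels = torch.arange(n_samples) % 10
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image, label = self.images[idx], int(self.labels[idx])
        if self.transform is not None:
            image = self.transform(image)  # applied to the sample
        if self.target_transform is not None:
            label = self.target_transform(label)  # applied to the label
        return image, label


def scale_to_unit(image):
    # Roughly mimics ToTensor: float values in [0, 1] plus a channel dimension.
    return image.float().div(255).unsqueeze(0)


def one_hot(label):
    return torch.nn.functional.one_hot(torch.tensor(label), num_classes=10).float()


dataset = ToyImageDataset(transform=scale_to_unit, target_transform=one_hot)
image, target = dataset[3]
print(image.shape, image.dtype)
print(target)
```

The sample comes back as a float tensor of shape `[1, 28, 28]`, and the label comes back as a length-10 vector with a single 1 at index 3.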



In [3]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:00<00:00, 114410098.53it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 5823997.11it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:00<00:00, 67145396.81it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 23964791.33it/s]


Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw



We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports
automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element
in the dataloader iterable will return a batch of 64 features and labels.



In [4]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
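
The relationship between dataset size, batch size, and number of batches can be checked on a small stand-in dataset. This is only a sketch: the 150 all-zero samples and the name `toy_loader` are hypothetical, chosen so that the dataset size is not a multiple of the batch size.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 150 samples shaped like FashionMNIST images.
features = torch.zeros(150, 1, 28, 28)
labels = torch.zeros(150, dtype=torch.int64)
toy_loader = DataLoader(TensorDataset(features, labels), batch_size=64)

# len(dataloader) counts batches; len(dataloader.dataset) counts samples.
print(len(toy_loader.dataset), len(toy_loader))

# Collect the size of every batch; the final batch holds the remainder.
batch_sizes = [len(X) for X, y in toy_loader]
print(batch_sizes)
print(len(toy_loader) == math.ceil(150 / 64))
```

Because 150 is not divisible by 64, the last batch contains only 22 samples. Passing `drop_last=True` to `DataLoader` would discard that partial batch instead.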


Read more about [loading data in PyTorch](data_tutorial.html).




--------------




## Creating Models
To define a neural network in PyTorch, we create a class that inherits
from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network
in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. To accelerate
operations in the neural network, we move it to the GPU or MPS if available.



In [5]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # nn.Linear layer is used for linear transformations, also known as fully connected layer or dense layer
            nn.Linear(28*28, 512),
            # nn.ReLU() applies the Rectified Linear Unit (ReLU) activation function to introduce non-linearity.
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.ReLU(),
            # The final nn.Linear(512, 10) layer takes the 512-dimensional output of the second layer and produces
            # a 10-dimensional tensor of logits, one per FashionMNIST class.
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

# .to(device) method is used to move the model to the specified device (CPU, GPU, or MPS).
model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
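
As a rough sanity check on the printed architecture, the number of learnable parameters can be counted layer by layer. The sketch below rebuilds the same layer sizes in a standalone `nn.Sequential` (the name `stack` is just for this example) so that it runs on its own.

```python
from torch import nn

# Same layer sizes as the NeuralNetwork class above, rebuilt here so the
# snippet is self-contained.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer contributes a weight matrix and a bias vector.
for name, param in stack.named_parameters():
    print(f"{name:10s} {tuple(param.shape)} -> {param.numel()} values")

total = sum(p.numel() for p in stack.parameters())
print(f"Total trainable parameters: {total}")
```

Each `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, which gives 401,920 + 262,656 + 5,130 = 669,706 parameters in total. The ReLU layers have none.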


Read more about [building neural networks in PyTorch](buildmodel_tutorial.html).




--------------




## Optimizing the Model Parameters
To train a model, we need a [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions)
and an [optimizer](https://pytorch.org/docs/stable/optim.html).
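
To give a feel for what the loss function computes, the sketch below compares `nn.CrossEntropyLoss` with a hand-written version on made-up numbers. The logits and targets are hypothetical; the point is only that the loss takes raw logits and integer class indices, and equals the average negative log-probability of the correct class.

```python
import torch
from torch import nn

# Hypothetical logits for a batch of 3 samples and 4 classes.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0],
                       [0.1, 0.2, 3.0, -0.5],
                       [1.0, 1.0, 1.0, 1.0]])
targets = torch.tensor([0, 2, 3])  # class indices, not one-hot vectors

builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate, and average over the batch.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```

The third sample has identical logits for all four classes, so its individual loss is the natural logarithm of 4, the loss of a uniform guess over four classes.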


<details>
  <summary>Explanation</summary>


  torch.optim.SGD:<br>
  An optimizer that implements Stochastic Gradient Descent (SGD). It updates the parameters of a neural network during training and is one of the most widely used optimization algorithms in machine learning and deep learning.


  model.parameters(): <br>
  Passes the model's learnable parameters to the optimizer. model.parameters() returns an iterator over all learnable parameters; for the NeuralNetwork model, these are the weights and biases of its three fully connected layers.


  lr=1e-3: <br>
  The learning rate, a hyperparameter that controls the step size of each update. Each parameter is moved against its gradient by the gradient multiplied by the learning rate, so a small learning rate gives small updates and a large one gives larger updates. The value 1e-3 is scientific notation for 0.001.


So, creating the optimizer with torch.optim.SGD(model.parameters(), lr=1e-3) means that Stochastic Gradient Descent with a learning rate of 0.001 will update the parameters of the model (an instance of the NeuralNetwork class) during training.

</details>
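
The update rule that plain SGD applies can be written out by hand. The following sketch uses a made-up three-element parameter vector and a toy sum-of-squares loss (both hypothetical), and compares one `optimizer.step()` with the manual update `w = w - lr * grad`. It assumes SGD without momentum or weight decay, which matches the default arguments.

```python
import torch

lr = 1e-3

# Two copies of the same hypothetical parameter vector.
w_optim = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
w_manual = w_optim.detach().clone().requires_grad_(True)

optimizer = torch.optim.SGD([w_optim], lr=lr)

# Toy loss: sum of squares, so the gradient is 2 * w.
(w_optim ** 2).sum().backward()
(w_manual ** 2).sum().backward()

optimizer.step()  # update performed by the library
with torch.no_grad():
    w_manual -= lr * w_manual.grad  # the same update written by hand

print(w_optim.detach(), w_manual.detach())
```

Both vectors end up at 0.998 times their starting values, since each element moves by `lr * 2 * w`.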


In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and
backpropagates the prediction error to adjust the model's parameters.

<details>
  <summary> Explanation </summary>


This code defines a function called `train`, which is used to train a neural network model using a given dataloader, loss function, and optimizer. The function performs a single training epoch, which means it processes all the batches from the dataloader once to update the model's parameters.


Let's go through the code step by step:


1. `def train(dataloader, model, loss_fn, optimizer):`
   - The function `train` is defined, which takes four arguments: `dataloader`, `model`, `loss_fn`, and `optimizer`. These are the components needed for training the model.

2. `size = len(dataloader.dataset)`
   - The `size` variable is set to the total number of samples in the dataset that the dataloader loads. It is used only in the progress message, which reports how many samples have been processed out of the total.

3. `model.train()`
   - The `model.train()` call sets the model in training mode. This is necessary because some layers, such as dropout and batch normalization, behave differently during training and evaluation.

4. `for batch, (X, y) in enumerate(dataloader):`
   - The function iterates over batches in the dataloader. It uses the `enumerate` function to get both the batch index (`batch`) and the data (`X` and `y`) from the dataloader.

5. `X, y = X.to(device), y.to(device)`
   - The data `X` and the corresponding labels `y` are moved to the specified device (`device` was determined earlier, e.g., "cuda" for GPU or "cpu").

6. `pred = model(X)`
   - The model is used to make predictions (`pred`) on the input data `X`.

7. `loss = loss_fn(pred, y)`
   - The loss function (`loss_fn`) is applied to calculate the prediction error between the predicted values (`pred`) and the actual labels (`y`).

8. `loss.backward()`
   - Backpropagation is performed to compute the gradients of the loss with respect to the model's parameters.

9. `optimizer.step()`
   - The optimizer updates the model's parameters based on the computed gradients. This is the step where the model learns from the training data.

10. `optimizer.zero_grad()`
    - The gradients of the model's parameters are reset to zero. This step is necessary before computing the gradients for the next batch to avoid accumulation of gradients from previous batches.

11. `if batch % 100 == 0: ...`
    - This block of code prints the training progress every 100 batches. It displays the current loss and the number of processed samples out of the total dataset size.

In summary, the `train` function takes care of a single training epoch for a given model using a dataloader, loss function, and optimizer. It iterates through batches, makes predictions, calculates the loss, performs backpropagation, updates the model's parameters, and prints the training progress at regular intervals. The entire process is repeated for each epoch to train the model effectively.

</details>
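
Step 10 above is easy to verify in isolation. This sketch uses a single hypothetical parameter `w` and a toy loss `3 * w` to show that `backward()` adds to any gradient already stored, and that `optimizer.zero_grad()` clears it.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d(3w)/dw = 3.
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass WITHOUT zero_grad: the new gradient is added on top.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# After zero_grad, a fresh backward pass gives the plain gradient again.
optimizer.zero_grad()
(3 * w).sum().backward()
fresh = w.grad.clone()

print(first.item(), accumulated.item(), fresh.item())
```

The three printed values are 3.0, 6.0, and 3.0: skipping `zero_grad()` would make every update use the sum of all gradients seen so far rather than the gradient of the current batch.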


In [7]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We also check the model's performance against the test dataset to ensure it is learning.

<details>
  <summary>Explanation</summary>

This code defines a function called `test`, which is used to evaluate the performance of a trained neural network model using a given dataloader and loss function. The function computes the test accuracy and average test loss over the entire test dataset.

Let's break down the code step by step:

1. `def test(dataloader, model, loss_fn):`
   - The function `test` is defined, which takes three arguments: `dataloader`, `model`, and `loss_fn`. These are the components needed for evaluating the model's performance on the test dataset.

2. `size = len(dataloader.dataset)`
   - The `size` variable is set to the total number of samples in the test dataset that the dataloader loads. This is used to calculate the test accuracy later.

3. `num_batches = len(dataloader)`
   - The `num_batches` variable is set to the total number of batches in the dataloader. This is used to calculate the average test loss later.

4. `model.eval()`
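   - The `model.eval()` call sets the model in evaluation mode. This is necessary because some layers, such as dropout and batch normalization, behave differently during training and evaluation. In evaluation mode, dropout is disabled, and batch normalization uses its stored running statistics instead of the statistics of the current batch. The `NeuralNetwork` model in this tutorial contains neither layer type, but calling `model.eval()` is still good practice.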

5. `test_loss, correct = 0, 0`
   - Two variables are initialized to keep track of the cumulative test loss and the number of correctly classified samples in the test dataset.

6. `with torch.no_grad():`
   - The code inside this block ensures that no gradients are computed during the evaluation. This is done to save memory and computation since gradients are not needed during the evaluation phase.

7. `for X, y in dataloader:`
   - The function iterates over batches in the dataloader, where `X` represents the input data, and `y` represents the corresponding labels.

8. `X, y = X.to(device), y.to(device)`
   - The data `X` and the corresponding labels `y` are moved to the specified device (`device` was determined earlier, e.g., "cuda" for GPU or "cpu").

9. `pred = model(X)`
   - The model is used to make predictions (`pred`) on the input data `X`.

10. `test_loss += loss_fn(pred, y).item()`
    - The loss function (`loss_fn`) is applied to calculate the test loss between the predicted values (`pred`) and the actual labels (`y`). The test loss is accumulated for all batches.

11. `correct += (pred.argmax(1) == y).type(torch.float).sum().item()`
    - The code compares the model's predicted class (obtained using `pred.argmax(1)`) to the true class labels (`y`). It counts the number of correctly classified samples in the batch and accumulates the count for all batches.

12. `test_loss /= num_batches`
    - The accumulated test loss is divided by the total number of batches (`num_batches`) to obtain the average test loss.

13. `correct /= size`
    - The accumulated count of correctly classified samples is divided by the total number of samples in the test dataset (`size`) to obtain the test accuracy.

14. `print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")`
    - The function prints the test accuracy and average test loss.

In summary, the `test` function evaluates the performance of a trained neural network model on the test dataset. It computes the test accuracy and average test loss by iterating over the test dataloader and making predictions using the model. The function does not update the model's parameters and is only used for evaluation purposes.

</details>
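
The effect of evaluation mode described in step 4 can be observed on standalone layers. This is a small sketch with hypothetical inputs; the layers are not part of the `NeuralNetwork` model in this tutorial.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # some entries zeroed, survivors scaled by 1 / (1 - p)

layer.eval()
eval_out = layer(x)  # dropout is inactive, so the input passes through

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))

# Batch normalization is not switched off in eval mode; it normalizes with
# its stored running statistics. A freshly created layer has running mean 0
# and running variance 1, so it leaves the input almost unchanged.
bn = nn.BatchNorm1d(3)
bn.eval()
inputs = torch.randn(4, 3) * 5 + 2
with torch.no_grad():
    outputs = bn(inputs)
print(torch.allclose(outputs, inputs, atol=1e-3))
print(outputs.requires_grad)  # no graph is recorded inside torch.no_grad()
```

In training mode the dropout output contains only the values 0.0 and 2.0, while in evaluation mode it equals the input exactly.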



In [8]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
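
The accuracy computation inside `test` can be traced on a tiny hand-made batch. The logits and labels below are hypothetical values picked so that exactly one prediction is wrong.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes.
pred = torch.tensor([[0.1, 2.0, -1.0],
                     [1.5, 0.3, 0.2],
                     [0.0, 0.1, 0.9],
                     [2.2, 2.5, 0.0]])
y = torch.tensor([1, 0, 1, 1])

predicted_classes = pred.argmax(1)  # index of the largest logit in each row
matches = predicted_classes == y    # boolean tensor, one entry per sample
correct = matches.type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), matches.tolist(), accuracy)
```

The third sample is predicted as class 2 while its label is 1, so three of four predictions match and the accuracy is 0.75.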

The training process is conducted over several iterations (*epochs*). During each epoch, the model learns
parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the
accuracy increase and the loss decrease with every epoch.



In [9]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.313956  [   64/60000]
loss: 2.285758  [ 6464/60000]
loss: 2.273672  [12864/60000]
loss: 2.264206  [19264/60000]
loss: 2.230599  [25664/60000]
loss: 2.223581  [32064/60000]
loss: 2.218231  [38464/60000]
loss: 2.192645  [44864/60000]
loss: 2.182539  [51264/60000]
loss: 2.144359  [57664/60000]
Test Error: 
 Accuracy: 50.9%, Avg loss: 2.143307 

Epoch 2
-------------------------------
loss: 2.157951  [   64/60000]
loss: 2.136428  [ 6464/60000]
loss: 2.081563  [12864/60000]
loss: 2.098812  [19264/60000]
loss: 2.039306  [25664/60000]
loss: 1.988308  [32064/60000]
loss: 2.011313  [38464/60000]
loss: 1.934308  [44864/60000]
loss: 1.932187  [51264/60000]
loss: 1.853156  [57664/60000]
Test Error: 
 Accuracy: 56.0%, Avg loss: 1.857818 

Epoch 3
-------------------------------
loss: 1.896335  [   64/60000]
loss: 1.856865  [ 6464/60000]
loss: 1.738919  [12864/60000]
loss: 1.782669  [19264/60000]
loss: 1.677593  [25664/60000]
loss: 1.630099  [32064/600

Read more about [Training your model](optimization_tutorial.html).




--------------




## Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).



In [10]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


## Loading Models

The process for loading a model includes re-creating the model structure and loading
the state dictionary into it.



In [11]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>
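
The save and load round trip can be tried without writing a file. In this sketch an in-memory buffer stands in for `model.pth`, and a small `nn.Linear` layer stands in for `NeuralNetwork`; both substitutions are assumptions made to keep the example short. It also passes `weights_only=True` to `torch.load`, an option in recent PyTorch releases that restricts loading to tensors and plain containers.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
source = nn.Linear(4, 2)    # small stand-in for NeuralNetwork
restored = nn.Linear(4, 2)  # same structure, different random weights

# An in-memory buffer stands in for "model.pth" so nothing touches disk.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# weights_only=True avoids unpickling arbitrary Python objects.
state = torch.load(buffer, weights_only=True)
result = restored.load_state_dict(state)

print(list(state.keys()))
print(result)
print(torch.equal(source.weight, restored.weight))
```

The state dictionary holds only tensors keyed by parameter name, which is why the model class must be re-created before the weights can be loaded into it.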

This model can now be used to make predictions.



In [12]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
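
One detail of the prediction cell is worth spelling out: `test_data[0][0]` is a single sample of shape `[1, 28, 28]` with no batch dimension. It works here because `nn.Flatten` keeps dimension 0 and the lone channel dimension happens to play the role of a batch of one. The sketch below, using hypothetical all-zero tensors, shows why adding an explicit batch dimension with `unsqueeze(0)` is the safer habit for images with more than one channel.

```python
import torch
from torch import nn

flatten = nn.Flatten()  # default start_dim=1 keeps dimension 0 as the batch

gray = torch.zeros(1, 28, 28)  # one grayscale sample: [C, H, W]
print(flatten(gray).shape)               # channel dimension acts as the batch
print(flatten(gray.unsqueeze(0)).shape)  # explicit batch: [N, C, H, W]

rgb = torch.zeros(3, 32, 32)   # hypothetical 3-channel sample
wrong = flatten(rgb)           # treats the 3 channels as 3 separate samples
right = flatten(rgb.unsqueeze(0))
print(wrong.shape, right.shape)
```

For the grayscale sample both calls give shape `[1, 784]`, but for the 3-channel sample the unbatched call yields `[3, 1024]` instead of the intended `[1, 3072]`.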


Read more about [Saving & Loading your model](saveloadrun_tutorial.html).


