## PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs

* Designed to cover the essential topics of the deep learning library PyTorch.
* Link to course by Sebastian Raschka: https://sebastianraschka.com/teaching/pytorch-1h/


## Components of PyTorch

PyTorch is developed by Facebook’s AI Research lab (FAIR, now called Meta AI).

* PyTorch is a tensor library that extends the concept of the array-oriented programming library NumPy.
* PyTorch is an automatic differentiation engine, also known as autograd, which enables the automatic computation of gradients.
* PyTorch is a deep learning library, meaning that it offers modular, flexible, and efficient building blocks.

## Scalars, vectors, matrices, and tensors

In [None]:
import torch

# create a 0D tensor (Scalar)
tensor0d = torch.tensor(1)

# create a 1D tensor (vector)
tensor1d = torch.tensor([1, 2, 3])

# create a 2D tensor (matrix)
tensor2d = torch.tensor([[1, 2],[3, 4]])

# create a 3D tensor (two stacked 2x2 matrices)
tensor3d = torch.tensor([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
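
A quick way to confirm the rank of a tensor is to inspect its `.ndim` and `.shape` attributes. The sketch below is only an illustration; the variable names `scalar`, `vector`, `matrix`, and `cube` are chosen here and are not part of the original notes.

```python
import torch

# One tensor of each rank, from a scalar (0D) up to a 3D tensor
scalar = torch.tensor(1)
vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[1, 2], [3, 4]])
cube = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # .ndim is the number of dimensions; .shape is the size along each one
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

Note that the nesting depth of the Python lists determines the number of dimensions, not the number of rows.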

In [None]:
torch.__version__

'2.8.0+cu126'

## Tensor data types

In [None]:
# PyTorch adopts Python's default 64-bit integer type, which we can check via `.dtype`
tensor1d = torch.tensor([1, 2, 3])
print(tensor1d.dtype)

torch.int64


In [None]:
# we can create tensors from floats, with a 32-bit precision by default
floatvec = torch.tensor([1.0, 2.0, 3.0])
print(floatvec.dtype)

torch.float32


In [None]:
# we can change a dtype with the `.to` method
floatvec = tensor1d.to(torch.float32)
print(floatvec.dtype)

torch.float32
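
As a small complement to `.to`, the dtype can also be set when the tensor is created, and mixing integer and floating-point tensors in one operation promotes the result to a floating-point type. The following is a minimal sketch; the names `int_vec`, `float_vec`, `explicit`, and `mixed` are illustrative.

```python
import torch

int_vec = torch.tensor([1, 2, 3])          # torch.int64
float_vec = torch.tensor([1.0, 2.0, 3.0])  # torch.float32

# Set the dtype directly at creation time instead of converting afterwards
explicit = torch.tensor([1, 2, 3], dtype=torch.float32)

# Adding an int64 tensor to a float32 tensor yields a float32 tensor
mixed = int_vec + float_vec

print(explicit.dtype)
print(mixed.dtype)
```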


## Common PyTorch tensor operations

In [None]:
tensor2d = torch.tensor([
    [1, 2, 3],
    [4, 5, 6]
])
print(tensor2d)

tensor([[1, 2, 3],
        [4, 5, 6]])


In [None]:
# the .shape attribute gives access to the shape of a tensor
print(tensor2d.shape)
# this returns [2, 3]: the tensor has 2 rows and 3 columns

torch.Size([2, 3])


In [None]:
# we can use .reshape to reshape a tensor
tensor2d.reshape(3, 2)

tensor([[1, 2],
        [3, 4],
        [5, 6]])

In [None]:
# we can use .T to transpose a tensor, flipping it across its diagonal
tensor2d.T

tensor([[1, 4],
        [2, 5],
        [3, 6]])

In [None]:
# to multiply two matrices, use the .matmul method
tensor2d.matmul(tensor2d.T)

tensor([[14, 32],
        [32, 77]])

In [None]:
# or use the @ operator
tensor2d @ tensor2d.T

tensor([[14, 32],
        [32, 77]])
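
To make the shape rule behind matrix multiplication concrete, the sketch below recomputes the same product entry by entry: a `(2, 3)` matrix times a `(3, 2)` matrix gives a `(2, 2)` result, where each entry is the dot product of a row and a column. The names `A`, `B`, and `manual` are illustrative.

```python
import torch

A = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
B = A.T                                   # shape (3, 2)

product = A @ B                           # shape (2, 2)

# Entry [i, j] is the dot product of row i of A with column j of B
manual = torch.zeros(2, 2, dtype=A.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = (A[i, :] * B[:, j]).sum()

print(product.shape)
print(torch.equal(product, manual))
```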

## Seeing models as computation graphs

In [None]:
import torch.nn.functional as F

# The forward pass of a logistic regression classifier (a single-layer neural network).
# It returns a score between 0 and 1 that is compared to the true label (0 or 1) when computing the loss.

y = torch.tensor([1.0])  # True Label
x1 = torch.tensor([1.1]) # Input Feature
w1 = torch.tensor([2.2]) # Weight parameter
b = torch.tensor([0.0])  # Bias unit

z = x1 * w1 + b          # net input (weighted input plus bias)
a = torch.sigmoid(z)     # activation and output

loss = F.binary_cross_entropy(a, y)
print(loss)

tensor(0.0852)
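
To see what the loss function computes, the binary cross-entropy can be written out by hand as `-(y * log(a) + (1 - y) * log(1 - a))` and compared with the library call. This is a minimal sketch that assumes the usual logistic regression net input (weight times input plus bias); `manual_loss` and `library_loss` are names introduced here.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])   # true label
x1 = torch.tensor([1.1])  # input feature
w1 = torch.tensor([2.2])  # weight parameter
b = torch.tensor([0.0])   # bias unit

z = x1 * w1 + b           # net input
a = torch.sigmoid(z)      # predicted probability of class 1

# Binary cross-entropy written out explicitly, averaged over examples
manual_loss = -(y * torch.log(a) + (1 - y) * torch.log(1 - a)).mean()
library_loss = F.binary_cross_entropy(a, y)

print(manual_loss, library_loss)
```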


**Automatic Differentiation (Autograd)**

By tracking every operation performed on tensors, PyTorch's autograd engine constructs a computational graph in the background. By calling the `grad` function, we can then compute the gradient of the loss with respect to the model parameters `w1` and `b`.

In [None]:
from torch.autograd import grad

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

z = x1 * w1 + b
a = torch.sigmoid(z)

loss = F.binary_cross_entropy(a, y)

# PyTorch normally frees the graph after computing gradients to save memory;
# retain_graph=True keeps it so that we can reuse it below.
grad_L_w1 = grad(loss, w1, retain_graph=True)
grad_L_b = grad(loss, b, retain_graph=True)

print(grad_L_w1)
print(grad_L_b)


(tensor([-0.0898]),)
(tensor([-0.0817]),)


In [None]:
# We can call `.backward()` on the loss, and PyTorch will compute the gradients of all leaf nodes in the graph.
loss.backward()
print(w1.grad)
print(b.grad)


tensor([-0.0898])
tensor([-0.0817])
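
The gradients reported by autograd can be checked against the chain rule. For a sigmoid output combined with binary cross-entropy, the derivative of the loss with respect to the net input simplifies to `a - y`, so the gradient for the weight is `(a - y) * x1` and the gradient for the bias is `a - y`. The sketch below assumes the net input `z = x1 * w1 + b`; the names `analytic_w1` and `analytic_b` are illustrative.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()  # fills w1.grad and b.grad

# Hand-derived gradients via the chain rule: dL/dz = a - y
with torch.no_grad():
    dL_dz = a - y
    analytic_w1 = dL_dz * x1  # dz/dw1 = x1
    analytic_b = dL_dz        # dz/db = 1

print(w1.grad, analytic_w1)
print(b.grad, analytic_b)
```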


## Multilayer Neural Network

The following code implements a classic multilayer perceptron with two hidden layers to illustrate typical usage of the `Module` class.

In [None]:
class NeuralNetwork(torch.nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super().__init__()

        self.layers = torch.nn.Sequential(

            # 1st hidden layer
            torch.nn.Linear(num_inputs, 30),
            torch.nn.ReLU(),

            # 2nd hidden layer
            torch.nn.Linear(30, 20),
            torch.nn.ReLU(),

            # Output layer
            torch.nn.Linear(20, num_outputs),
        )

    def forward(self, x):
        logits = self.layers(x)
        return logits

In [None]:
model = NeuralNetwork(50, 3)

In [None]:
print(model)

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)


Note that each parameter for which `requires_grad=True` counts as a trainable parameter and will be updated during training.

In [None]:
# Check the number of parameters
num_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)
print("Total number of trainable parameters:", num_params)

Total number of trainable parameters: 2213
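
The total can be reproduced by hand: each `Linear` layer holds a weight matrix with `in_features * out_features` entries plus one bias value per output. The sketch below rebuilds only the three linear layers (the ReLU layers have no parameters); `layer_sizes`, `by_hand`, and `counted` are names introduced for illustration.

```python
import torch

# (in_features, out_features) of the three Linear layers in the network
layer_sizes = [(50, 30), (30, 20), (20, 3)]

# Weights (in * out) plus biases (out) for each layer
by_hand = sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)

# The same count taken from actual PyTorch layers
layers = [torch.nn.Linear(n_in, n_out) for n_in, n_out in layer_sizes]
counted = sum(p.numel() for layer in layers for p in layer.parameters())

print(by_hand, counted)
```

Layer by layer, this is 1530 + 620 + 63 parameters.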


In [None]:
# Access the weight parameter matrix at position 0
print(model.layers[0].weight)
print(model.layers[0].weight.shape)

Parameter containing:
tensor([[-0.0444,  0.0189, -0.0280,  ..., -0.0830,  0.0253, -0.1358],
        [-0.0435, -0.0173, -0.0741,  ...,  0.1295, -0.0751,  0.0326],
        [-0.1110,  0.1210,  0.0807,  ..., -0.0691,  0.0217, -0.0458],
        ...,
        [ 0.0943,  0.1294,  0.1368,  ...,  0.0536,  0.0618, -0.0925],
        [-0.1207,  0.0586, -0.1242,  ...,  0.1047,  0.1116, -0.1241],
        [ 0.0664,  0.0722,  0.0648,  ..., -0.1105,  0.1012,  0.0596]],
       requires_grad=True)
torch.Size([30, 50])


Let's now see how the model is used via the forward pass.

In [None]:
torch.manual_seed(123)

X = torch.rand((1, 50))
out = model(X)
print(out)

tensor([[-0.1207, -0.3436, -0.1243]], grad_fn=<AddmmBackward0>)


Here, `grad_fn=<AddmmBackward0>` represents the last-used function to compute a variable in the computational graph. In this case, it means that the tensor we are inspecting was created via a matrix multiplication and addition operation. PyTorch will use this information when it computes gradients during backpropagation.

If we just want to use a network without training or backpropagation, for example, if we use it for prediction after training, constructing this computational graph for backpropagation can be wasteful as it performs unnecessary computations and consumes additional memory.

In [None]:
with torch.no_grad():
    out = model(X)
print(out)

tensor([[-0.1207, -0.3436, -0.1243]])
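
The effect of `torch.no_grad()` can be observed directly on the output tensor: inside the context, the result carries no `grad_fn` and does not require gradients. This is a minimal sketch using a small stand-in layer (`layer`, a hypothetical 4-to-2 `Linear` layer) rather than the full network above.

```python
import torch

torch.manual_seed(123)
layer = torch.nn.Linear(4, 2)  # small stand-in for a model
x = torch.rand(1, 4)

tracked = layer(x)             # graph is recorded for backpropagation

with torch.no_grad():
    untracked = layer(x)       # same values, but no graph is recorded

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)
```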


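In PyTorch, it’s common practice to code models such that they return the outputs of the last layer (logits) without passing them to a nonlinear activation function.
That’s because PyTorch’s commonly used loss functions combine the softmax (or sigmoid, for binary classification) operation with the negative log-likelihood loss in a single function, which is more efficient and numerically stable.

So, if we want to compute class-membership probabilities for our predictions, we have to call the softmax function explicitly.
The values can then be interpreted as class-membership probabilities that sum up to 1.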

In [None]:
with torch.no_grad():
    out = torch.softmax(model(X), dim=1)
print(out)

tensor([[0.3576, 0.2861, 0.3563]])
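
The reason models return raw logits can be illustrated by checking that `F.cross_entropy` gives the same value as applying `log_softmax` followed by the negative log-likelihood loss. The sketch below reuses the logits printed above; the class label `0` is a hypothetical target chosen only for the example.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1207, -0.3436, -0.1243]])
label = torch.tensor([0])  # hypothetical true class, for illustration only

# One call on raw logits
combined = F.cross_entropy(logits, label)

# The same computation split into its two steps
two_step = F.nll_loss(F.log_softmax(logits, dim=1), label)

# Explicit softmax is only needed when we want probabilities
probas = torch.softmax(logits, dim=1)

print(combined, two_step)
print(probas.sum(dim=1))
```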


## Data Loaders

Let's start by creating a simple toy dataset of five training examples with two features each. We will also create a tensor containing the corresponding class labels.

In [None]:
X_train = torch.tensor([
    [-1.2, 3.1],
    [-0.9, 2.9],
    [-0.5, 2.6],
    [2.3, -1.1],
    [2.7, -1.5]
])

Y_train = torch.tensor([0, 0, 0, 1, 1])

In [None]:
X_test = torch.tensor([
    [-0.8, 2.8],
    [2.6, -1.6]
])

Y_test = torch.tensor([0, 1])

Next, we create a custom dataset class, by subclassing from PyTorch’s Dataset parent class.

In [None]:
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, X, Y):
        self.features = X
        self.labels = Y

    def __getitem__(self, index):
        one_x = self.features[index]
        one_y = self.labels[index]
        return one_x, one_y

    def __len__(self):
        return self.labels.shape[0]

train_ds = ToyDataset(X_train, Y_train)
test_ds = ToyDataset(X_test, Y_test)
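
For data that already lives in tensors, PyTorch's built-in `TensorDataset` behaves much like the custom class above: indexing returns one `(features, label)` pair and `len` returns the number of examples. The sketch below is offered as a point of comparison, not as a replacement for the custom class; the three-row data and the names `ds`, `x0`, and `y0` are illustrative.

```python
import torch
from torch.utils.data import TensorDataset

X = torch.tensor([[-1.2, 3.1], [-0.9, 2.9], [2.3, -1.1]])
Y = torch.tensor([0, 0, 1])

ds = TensorDataset(X, Y)

# Indexing returns a tuple with one entry per wrapped tensor
x0, y0 = ds[0]

print(len(ds))
print(x0, y0)
```

A custom `Dataset` subclass remains the more flexible choice when examples need to be loaded or transformed on the fly.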

Now that we have defined a PyTorch Dataset class, we can use PyTorch's DataLoader class to sample from it.

In [None]:
from torch.utils.data import DataLoader
torch.manual_seed(123)

train_loader = DataLoader(
    dataset=train_ds,
    batch_size=2,
    shuffle=True,
    num_workers=0,
    drop_last=True
)

test_loader = DataLoader(
    dataset=test_ds,
    batch_size=2,
    shuffle=False,
    num_workers=0
)

We have five training examples, which is not evenly divisible by the batch size of 2.
Without `drop_last=True`, the third batch would contain only one example, so in this example it's recommended to drop the last batch.

In [None]:
for idx, (X, Y) in enumerate(train_loader):
    print(f"Batch {idx+1}:", X, Y)

Batch 1: tensor([[ 2.3000, -1.1000],
        [-0.9000,  2.9000]]) tensor([1, 0])
Batch 2: tensor([[-1.2000,  3.1000],
        [-0.5000,  2.6000]]) tensor([0, 0])
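
The effect of `drop_last` can be seen by comparing the batch sizes produced with and without it. The sketch below uses a stand-in dataset of five examples built with `TensorDataset`; the feature values and the names `keep_last`, `drop_last`, `keep_sizes`, and `drop_sizes` are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Five examples with two features each, plus labels
features = torch.arange(10.0).reshape(5, 2)
labels = torch.tensor([0, 0, 0, 1, 1])
ds = TensorDataset(features, labels)

keep_last = DataLoader(ds, batch_size=2, shuffle=False, drop_last=False)
drop_last = DataLoader(ds, batch_size=2, shuffle=False, drop_last=True)

# Number of examples in each batch
keep_sizes = [len(batch_labels) for _, batch_labels in keep_last]
drop_sizes = [len(batch_labels) for _, batch_labels in drop_last]

print(keep_sizes)
print(drop_sizes)
```

With shuffling enabled, a different example is left out in each epoch, so every example is still seen over the course of training.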


## Training Loop

In [None]:
import torch.nn.functional as F

torch.manual_seed(123)
model = NeuralNetwork(num_inputs=2, num_outputs=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5) # stochastic gradient descent

num_epochs = 3

for epoch in range(num_epochs):

    model.train()
    for batch_idx, (features, labels) in enumerate(train_loader):

        logits = model(features)

        loss = F.cross_entropy(logits, labels) # loss function, which applies the softmax internally

        optimizer.zero_grad() # zero the gradient so they don't accumulate
        loss.backward()
        optimizer.step() # use the gradients to update the model parameters to minimize the loss

        ### Logging
        print(f"Epoch: {epoch+1:03d}/{num_epochs:03d}"
              f" | Batch {batch_idx:03d}/{len(train_loader):03d}"
              f" | Train/Val Loss {loss:.2f}")

    model.eval()

Epoch: 001/003 | Batch 000/002 | Train/Val Loss 0.75
Epoch: 001/003 | Batch 001/002 | Train/Val Loss 0.65
Epoch: 002/003 | Batch 000/002 | Train/Val Loss 0.44
Epoch: 002/003 | Batch 001/002 | Train/Val Loss 0.13
Epoch: 003/003 | Batch 000/002 | Train/Val Loss 0.03
Epoch: 003/003 | Batch 001/002 | Train/Val Loss 0.00
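
Beyond watching the loss, a common follow-up is to measure prediction accuracy over a data loader. The helper below is a sketch and not part of the original notes: `compute_accuracy` is a hypothetical function name, and `toy_model` is a stand-in linear layer with hand-set weights so that the example runs on its own without the trained network.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def compute_accuracy(model, dataloader):
    """Return the fraction of examples whose argmax prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    for features, labels in dataloader:
        with torch.no_grad():
            logits = model(features)
        predictions = torch.argmax(logits, dim=1)
        correct += (predictions == labels).sum().item()
        total += len(labels)
    return correct / total


# Stand-in model: predicts class 1 when the first feature is positive
toy_model = torch.nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[-1.0, 0.0], [1.0, 0.0]]))
    toy_model.bias.zero_()

X = torch.tensor([[-1.2, 3.1], [-0.9, 2.9], [-0.5, 2.6], [2.3, -1.1], [2.7, -1.5]])
Y = torch.tensor([0, 0, 0, 1, 1])
loader = DataLoader(TensorDataset(X, Y), batch_size=2)

accuracy = compute_accuracy(toy_model, loader)
print(accuracy)
```

Accumulating counts batch by batch keeps the helper usable for datasets that do not fit into memory at once.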


In [None]:
# Print the model's outputs as raw logits
with torch.no_grad():
    outputs = model(X_train)
print(outputs)

tensor([[ 2.8569, -4.1618],
        [ 2.5382, -3.7548],
        [ 2.0944, -3.1820],
        [-1.4814,  1.4816],
        [-1.7176,  1.7342]])


In [None]:
# Use PyTorch's softmax function to turn them into probabilities
# torch.set_printoptions(sci_mode=False)
probas = torch.softmax(outputs, dim=1)
print(probas)

tensor([[    0.9991,     0.0009],
        [    0.9982,     0.0018],
        [    0.9949,     0.0051],
        [    0.0491,     0.9509],
        [    0.0307,     0.9693]])


The first training example has a 99.91% probability of belonging to class 0 and a 0.09% probability of belonging to class 1.

In [None]:
# We can convert them into class labels using the argmax function
predictions = torch.argmax(probas, dim=1)
print(predictions)

tensor([0, 0, 0, 1, 1])
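
Because softmax preserves the ordering of the values in each row, applying `argmax` directly to the logits gives the same class labels as applying it to the probabilities. The sketch below checks this with the logits printed earlier and counts the correct predictions against the training labels; the names `from_logits`, `from_probas`, and `num_correct` are illustrative.

```python
import torch

logits = torch.tensor([
    [2.8569, -4.1618],
    [2.5382, -3.7548],
    [2.0944, -3.1820],
    [-1.4814, 1.4816],
    [-1.7176, 1.7342],
])
labels = torch.tensor([0, 0, 0, 1, 1])

from_logits = torch.argmax(logits, dim=1)
from_probas = torch.argmax(torch.softmax(logits, dim=1), dim=1)

# Element-wise comparison with the true labels, then count the matches
num_correct = (from_logits == labels).sum().item()

print(torch.equal(from_logits, from_probas))
print(f"{num_correct} of {len(labels)} predictions are correct")
```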
