# Deep Learning With PyTorch

Using:
* [Deep Learning With PyTorch - Full Course by Patrick Loeber](https://www.youtube.com/watch?v=c36lUUr864M&t=5s)

### Imports

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [2]:
import timeit

In [3]:
%pip install torchviz

Collecting torchviz
  Downloading torchviz-0.0.2.tar.gz (4.9 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: torchviz
  Building wheel for torchviz (setup.py) ... done
  Created wheel for torchviz: filename=torchviz-0.0.2-py3-none-any.whl size=4131 sha256=0322fd83e1ba745b22ef7f20da699a7cc2ad8dc1e166cb4ff1d81ee783f4fd93
  Stored in directory: /root/.cache/pip/wheels/4c/97/88/a02973217949e0db0c9f4346d154085f4725f99c4f15a87094
Successfully built torchviz
Installing collected packages: torchviz
Successfully installed torchviz-0.0.2


In [4]:
# For NN Visualization
import torch
from torchviz import make_dot

# Chapter

## Tensors

A tensor created with `torch.empty(1)` is a 1-D tensor holding a single uninitialized element, so it is comparable to a scalar value.
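
The difference between a one-element 1-D tensor and a true 0-d scalar tensor can be sketched as follows. This is a minimal illustration; the variable names `scalar` and `vector` are not part of the course material.

```python
import torch

# A 0-d tensor is a true scalar; a 1-d tensor with one element is a vector of length 1.
scalar = torch.tensor(3.5)
vector = torch.tensor([3.5])

print(scalar.ndim, tuple(scalar.shape))  # 0 ()
print(vector.ndim, tuple(vector.shape))  # 1 (1,)

# Both can be converted to a plain Python number with item().
print(scalar.item() == vector.item())    # True
```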

### Basic tensor operations

In [14]:
# Examples of tensors.
examples = [
    torch.empty(1),
    torch.empty(4),
    torch.empty(2, 2),
    torch.ones(1, 2, 3, dtype=torch.float16),
    torch.zeros(3, 5),
]
for x in examples:
    print(x, end="\n\n")

tensor([-3.0238e-18])

tensor([-3.0238e-18,  4.3341e-41, -3.0238e-18,  4.3341e-41])

tensor([[-1.8110e+17,  3.1608e-41],
        [ 7.8268e-03,  0.0000e+00]])

tensor([[[1., 1., 1.],
         [1., 1., 1.]]], dtype=torch.float16)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])



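Whenever possible, use the torch functions to operate on tensors, since they can provide time benefits. In the timings below, `torch.add(x, y)` and `x + y` perform about the same, while the in-place variant is noticeably faster.

Here is a demonstration using addition: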

In [15]:
# Generate random samples of tuples of two 2x2 tensors.
samples = [(torch.rand(2,2), torch.rand(2,2)) for _ in range(1000)]

In [16]:
%timeit for x, y in samples: x + y

2.2 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [17]:
%timeit for x, y in samples: torch.add(x, y)

2.19 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Functions with a trailing underscore perform in-place operations. For example, `x.add_(y)` is the equivalent of `x += y`.

In [18]:
%timeit for x, y in samples: x.add_(y)

1.27 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
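
To illustrate what "in-place" means here, the sketch below contrasts `torch.add` with `add_`. It is only an illustration with made-up values; the avoided allocation of a result tensor is a plausible (not measured) reason for the faster timing above.

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 2.0)

# Out-of-place: returns a new tensor, x is unchanged.
out = torch.add(x, y)
x_unchanged = torch.equal(x, torch.ones(2, 2))

# In-place: modifies x itself; the returned tensor uses the same memory as x.
result = x.add_(y)
same_memory = result.data_ptr() == x.data_ptr()

print(x_unchanged)  # True
print(same_memory)  # True
print(x)            # every element is now 3.0
```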


Overview of basic tensor functions:
- `torch.rand()`
- `torch.add()`
- `torch.sub()`
- `torch.mul()`
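
As a quick check of the functions listed above, the following sketch compares each one with its operator form. The seed value is arbitrary and only makes the random inputs reproducible.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.rand(2, 2)
y = torch.rand(2, 2)

# Each torch function computes the same element-wise result as its operator form.
checks = {
    "add": torch.equal(torch.add(x, y), x + y),
    "sub": torch.equal(torch.sub(x, y), x - y),
    "mul": torch.equal(torch.mul(x, y), x * y),
}
print(checks)  # {'add': True, 'sub': True, 'mul': True}
```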

### Resizing tensors

In [20]:
x = torch.rand(4,4)
print(x)

tensor([[0.5909, 0.5356, 0.0767, 0.4340],
        [0.0711, 0.6423, 0.0419, 0.9317],
        [0.9207, 0.4962, 0.1566, 0.3799],
        [0.8482, 0.6867, 0.8395, 0.6020]])


In [22]:
# Concatenate all rows to one 1D-tensor.
y0 = x.view(16)
print(y0, end="\n\n")
# This is the same as using `flatten()`.
y1 = x.flatten()
print(y1)

tensor([0.5909, 0.5356, 0.0767, 0.4340, 0.0711, 0.6423, 0.0419, 0.9317, 0.9207,
        0.4962, 0.1566, 0.3799, 0.8482, 0.6867, 0.8395, 0.6020])

tensor([0.5909, 0.5356, 0.0767, 0.4340, 0.0711, 0.6423, 0.0419, 0.9317, 0.9207,
        0.4962, 0.1566, 0.3799, 0.8482, 0.6867, 0.8395, 0.6020])


In [18]:
# One can use the placeholder value -1 to let PyTorch infer that dimension (here the first one).
y = x.view(-1, 8)
print(y)
print(y.size())

tensor([[0.0173, 0.5829, 0.3781, 0.1291, 0.9226, 0.2438, 0.1266, 0.0506],
        [0.1893, 0.3049, 0.2081, 0.1130, 0.0461, 0.3080, 0.5791, 0.3032]])
torch.Size([2, 8])
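
One property worth knowing when resizing: `view()` does not copy the data. The sketch below, using an illustrative `torch.arange` tensor instead of random values, shows the inferred dimension and the shared memory.

```python
import torch

x = torch.arange(16, dtype=torch.float32).view(4, 4)

# -1 asks PyTorch to infer that dimension from the element count: 16 / 8 = 2.
y = x.view(-1, 8)
print(tuple(y.shape))  # (2, 8)

# view() does not copy data: writing through y is visible in x.
y[0, 0] = 100.0
print(x[0, 0].item())  # 100.0

# A shape that does not divide the element count raises an error.
try:
    x.view(-1, 5)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("RuntimeError:", err)
```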


### Tensor conversions

Note that if the tensor is stored on the CPU, converting it with `numpy()` or `torch.from_numpy()` yields a shallow copy (a reference to the same memory, not a true copy).

In [35]:
# Instantiate a tensor on the cpu (default place).
a = torch.ones(5)
print(f"{a = }\t{type(a)}")

# Convert the torch tensor to a numpy array by calling the numpy() function (shares memory).
b = a.numpy()
print(f"{b = }\t\t{type(b)}")
# Convert the numpy array back to a tensor with the from_numpy() function (shares memory).
c = torch.from_numpy(b)
print(f"{c = }\t{type(c)}")

print()

# Changing the values through one variable changes all of them, since these are only references.
a.add_(1)
print(f"{a = }")
print(f"{b = }")
print(f"{c = }")

a = tensor([1., 1., 1., 1., 1.])	<class 'torch.Tensor'>
b = array([1., 1., 1., 1., 1.], dtype=float32)		<class 'numpy.ndarray'>
c = tensor([1., 1., 1., 1., 1.])	<class 'torch.Tensor'>

a = tensor([2., 2., 2., 2., 2.])
b = array([2., 2., 2., 2., 2.], dtype=float32)
c = tensor([2., 2., 2., 2., 2.])
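
If an independent copy is needed on the CPU, one option is to copy explicitly. The following is a minimal sketch; the variable names are illustrative.

```python
import torch

a = torch.ones(5)

shared = a.numpy()              # shares memory with a
independent = a.numpy().copy()  # NumPy-side deep copy
cloned = a.clone()              # PyTorch-side deep copy

a.add_(1)

print(shared)       # [2. 2. 2. 2. 2.]
print(independent)  # [1. 1. 1. 1. 1.]
print(cloned)       # tensor([1., 1., 1., 1., 1.])
```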


### Using a GPU

In [29]:
assert torch.cuda.is_available()

# Initialize the GPU as cuda device.
device = torch.device("cuda")
device

device(type='cuda')

When a tensor is stored on the GPU, converting it to NumPy requires moving it to the CPU first, so the result is a deep copy (an independent instance, a true copy).

In [39]:
# Instantiate a variable on the CPU and move it to be stored on the GPU.
a = torch.ones(5).to(device)
b = a.cpu().numpy()
print(f"{a = }")
print(f"{b = }")

print()

a.add_(torch.ones(5, device=device))
# Print the variable values while on GPU.
print(f"{a = }")
print(f"{b = }")
# Print the variable values while on CPU.
print(f"{a.cpu() = }")

a = tensor([1., 1., 1., 1., 1.], device='cuda:0')
b = array([1., 1., 1., 1., 1.], dtype=float32)

a = tensor([2., 2., 2., 2., 2.], device='cuda:0')
b = array([1., 1., 1., 1., 1.], dtype=float32)
a.cpu() = tensor([2., 2., 2., 2., 2.])
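
The cell above asserts that a GPU is available. On machines without one, a common pattern is to fall back to the CPU. The sketch below assumes nothing beyond `torch` itself; note that the copy semantics then depend on which device was chosen.

```python
import torch

# Fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device.
a = torch.ones(5, device=device)

# On a GPU, .cpu() copies the data to host memory, so b is independent of a.
# On the CPU fallback, .cpu() is a no-op and b shares memory with a.
b = a.cpu().numpy()

a.add_(1)
print(device.type)
print(a.cpu())  # tensor([2., 2., 2., 2., 2.])
print(b)        # all 1.0 on a GPU, all 2.0 on the CPU fallback
```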


## Gradients with Autograd

If needed later, one can specify that a gradient will need to be computed for a specific variable:

In [22]:
x = torch.randn(5, requires_grad=True)
x

tensor([-0.6373,  0.1020,  0.0279, -0.2443,  1.7219], requires_grad=True)

In [23]:
# Note that PyTorch tracks the operations required for backpropagation.
y = x + 2
y

tensor([1.3627, 2.1020, 2.0279, 1.7557, 3.7219], grad_fn=<AddBackward0>)

In [24]:
z = y * y * 2
z

tensor([ 3.7141,  8.8370,  8.2251,  6.1648, 27.7045], grad_fn=<MulBackward0>)

In [25]:
z_scalar = z.mean()
z_scalar

tensor(10.9291, grad_fn=<MeanBackward0>)

In [26]:
print(x.grad)
# Trigger the backward pass by calling
z_scalar.backward() # dz/dx
# Now the gradient has been computed and can be used.
print(x.grad)

None
tensor([1.0902, 1.6816, 1.6224, 1.4045, 2.9775])
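
The gradient above can be checked by hand: with `z_scalar = mean(2 * (x + 2)^2)`, each component is `4 * (x_i + 2) / N`. A minimal sketch of that check, using freshly sampled values:

```python
import torch

x = torch.randn(5, requires_grad=True)
z_scalar = ((x + 2) ** 2 * 2).mean()
z_scalar.backward()

# Analytic gradient: d/dx_i mean(2 * (x_i + 2)^2) = 4 * (x_i + 2) / N
expected = 4 * (x.detach() + 2) / x.numel()
matches = torch.allclose(x.grad, expected)
print(matches)  # True
```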


In [27]:
x = torch.randn(3, requires_grad=True)
z = x.add(2)
z.mul_(3)

# Now the output is not a scalar value but a vector. To trigger the backward pass, a vector
# of the same shape as the output is needed (backward then computes a vector-Jacobian product).
v = torch.tensor([.1, 1., .001], dtype=torch.float32)
assert z.size() == v.size(), "The vector must have the same shape as the output z!"

z.backward(v) # dz/dx
print(x.grad)

tensor([0.3000, 3.0000, 0.0030])
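
For a vector output, `backward(v)` computes the vector-Jacobian product. Since `z = 3 * (x + 2)`, the Jacobian is `3 * I` and the result is `3 * v`. The sketch below reproduces this with `torch.autograd.grad`, which returns the product instead of storing it in `x.grad`.

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (x + 2) * 3
v = torch.tensor([0.1, 1.0, 0.001])

# torch.autograd.grad returns the vector-Jacobian product without writing to x.grad.
(vjp,) = torch.autograd.grad(z, x, grad_outputs=v)

# Here the Jacobian dz/dx is 3 * I, so v^T J is simply 3 * v.
print(vjp)                         # tensor([0.3000, 3.0000, 0.0030])
print(torch.allclose(vjp, 3 * v))  # True
print(x.grad)                      # None
```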


There are several options to turn off the `requires_grad` flag:
* `x.requires_grad_(False)`
* `x.detach()`
* `with torch.no_grad()`

In [28]:
x = torch.randn(3, requires_grad=True)
print(x)
x.requires_grad_(False)
print(x)

tensor([0.4007, 0.5235, 0.3513], requires_grad=True)
tensor([0.4007, 0.5235, 0.3513])


In [29]:
x = torch.randn(3, requires_grad=True)
print(x)
y = x.detach()
print(x)
print(y)

tensor([ 1.3854,  1.1474, -0.5296], requires_grad=True)
tensor([ 1.3854,  1.1474, -0.5296], requires_grad=True)
tensor([ 1.3854,  1.1474, -0.5296])


In [40]:
x = torch.randn(3, requires_grad=True)
print(x)
with torch.no_grad():
    y = x.add(2)
    print(f"torch.no_grad - x\t{x}")
    print(f"torch.no_grad - y\t{y}")

z = x.add(2)
print(f"x\t\t\t{x}")
print(f"z\t\t\t{z}")

tensor([-0.7027, -0.3157,  0.0259], requires_grad=True)
torch.no_grad - x	tensor([-0.7027, -0.3157,  0.0259], requires_grad=True)
torch.no_grad - y	tensor([1.2973, 1.6843, 2.0259])
x			tensor([-0.7027, -0.3157,  0.0259], requires_grad=True)
z			tensor([1.2973, 1.6843, 2.0259], grad_fn=<AddBackward0>)


## Tiny NN example

### Gradients by hand

In [51]:
# f = w * x
# f = 2 * x
X_test: float = 5.
Y_test: float = 2 * X_test
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = .0

# Training
learning_rate = 0.01
nr_epochs = 20

# Model prediction.
def forward(x: torch.Tensor) -> torch.Tensor:
    return torch.mul(x, w)

# Loss function: MSE
def loss_mse(y: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    return ((torch.sub(y, y_pred))**2).mean()

# Gradient.
# MSE   = 1/N * sum((w*x - y)**2)
# dJ/dw = 1/N * sum(2x * (w*x - y))
# Note: np.dot already sums over all samples and returns a scalar, so the trailing
# .mean() is a no-op and this function returns N times the mean gradient written above.
def gradient(x: torch.Tensor, y: torch.Tensor, y_predicted: torch.Tensor) -> np.float32:
    return np.dot(torch.mul(x, 2), torch.sub(y_predicted, y)).mean()

print(f"Prediction before training: f({X_test}) = {forward(X_test):.3f}")

for epoch in range(nr_epochs):
    # Prediction from the forward pass.
    y_pred = forward(X)

    # Loss computation.
    l = loss_mse(Y, y_pred)

    # Gradient computation.
    dw = gradient(X, Y, y_pred)

    # Update the weights.
    w -= torch.mul(dw, learning_rate)

    if epoch % (nr_epochs//10) == 0:
        print(f"[Epoch {epoch + 1}/{nr_epochs}]:\tw = {w:.3f},\tmse_loss = {l:.8f}")


print(f"Prediction after training: f({X_test}) = {forward(X_test):.3f} (solution f({X_test})={Y_test})")

Prediction before training: f(5.0) = 0.000
[Epoch 1/20]:	w = 1.200,	mse_loss = 30.00000000
[Epoch 3/20]:	w = 1.872,	mse_loss = 0.76800019
[Epoch 5/20]:	w = 1.980,	mse_loss = 0.01966083
[Epoch 7/20]:	w = 1.997,	mse_loss = 0.00050332
[Epoch 9/20]:	w = 1.999,	mse_loss = 0.00001288
[Epoch 11/20]:	w = 2.000,	mse_loss = 0.00000033
[Epoch 13/20]:	w = 2.000,	mse_loss = 0.00000001
[Epoch 15/20]:	w = 2.000,	mse_loss = 0.00000000
[Epoch 17/20]:	w = 2.000,	mse_loss = 0.00000000
[Epoch 19/20]:	w = 2.000,	mse_loss = 0.00000000
Prediction after training: f(5.0) = 10.000 (solution f(5.0)=10.0)
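
The analytic gradient of the MSE for `f = w * x` can be cross-checked against autograd. This is a sketch under the same data as above; `manual` is an illustrative name. At `w = 0` the mean gradient is -30, whereas the summed gradient (without the 1/N factor) is -120, which corresponds to the first update `w = 1.200` at a learning rate of 0.01 in the output above.

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)

# Autograd gradient of the MSE loss with respect to w.
loss = ((w * X - Y) ** 2).mean()
loss.backward()

# Analytic gradient: dJ/dw = 1/N * sum(2 * x_i * (w * x_i - y_i))
manual = (2 * X * (w.detach() * X - Y)).mean()

print(w.grad.item())  # -30.0
print(manual.item())  # -30.0
```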


### Gradients by Autograd

In [64]:
# f = w * x
# f = 2 * x
X_test: float = 5.
Y_test: float = 2 * X_test
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

w = torch.tensor(.0, dtype=torch.float32, requires_grad=True)

# Training
learning_rate = 0.01
nr_epochs = 100

# Model prediction.
def forward(x: torch.Tensor) -> torch.Tensor:
    return torch.mul(x, w)

# Loss function: MSE
def loss_mse(y: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    return ((torch.sub(y, y_pred))**2).mean()

print(f"Prediction before training: f({X_test}) = {forward(X_test):.3f}")

for epoch in range(nr_epochs):
    # Prediction from the forward pass.
    y_pred = forward(X)

    # Loss computation.
    l = loss_mse(Y, y_pred)

    # Gradient computation. Backward pass.
    l.backward() # dl/dw

    # Update the weights.
    with torch.no_grad():
        w -= torch.mul(w.grad, learning_rate)

    # Reset the gradients.
    w.grad.zero_()

    if epoch % (nr_epochs // 10) == 0:
        print(f"[Epoch {epoch + 1}/{nr_epochs}]:\tw = {w:.3f},\tmse-loss = {l:.8f}")


print(f"Prediction after training: f({X_test}) = {forward(X_test):.3f} (solution f({X_test})={Y_test})")

Prediction before training: f(5.0) = 0.000
[Epoch 1/100]:	w = 0.300,	mse-loss = 30.00000000
[Epoch 11/100]:	w = 1.665,	mse-loss = 1.16278565
[Epoch 21/100]:	w = 1.934,	mse-loss = 0.04506890
[Epoch 31/100]:	w = 1.987,	mse-loss = 0.00174685
[Epoch 41/100]:	w = 1.997,	mse-loss = 0.00006770
[Epoch 51/100]:	w = 1.999,	mse-loss = 0.00000262
[Epoch 61/100]:	w = 2.000,	mse-loss = 0.00000010
[Epoch 71/100]:	w = 2.000,	mse-loss = 0.00000000
[Epoch 81/100]:	w = 2.000,	mse-loss = 0.00000000
[Epoch 91/100]:	w = 2.000,	mse-loss = 0.00000000
Prediction after training: f(5.0) = 10.000 (solution f(5.0)=10.0)
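
The reset step in the loop matters because `backward()` adds to an existing gradient instead of overwriting it. A minimal sketch with a toy function `3 * w` (chosen only for illustration):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# backward() accumulates into w.grad.
(w * 3).backward()
first = w.grad.item()
(w * 3).backward()
second = w.grad.item()

# After zeroing, the next backward pass starts from a clean gradient.
w.grad.zero_()
(w * 3).backward()
after_reset = w.grad.item()

print(first, second, after_reset)  # 3.0 6.0 3.0
```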


### Using Torch.NN

In [41]:
# 1) Design the model (input, output size, forward pass)
# 2) Construct the loss and optimizer
# 3) Training loop:
# - forward pass: compute prediction
# - backward pass: gradients
# - update weights

In [65]:
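X_test: torch.Tensor = torch.tensor([5], dtype=torch.float32)
Y_test: torch.Tensor = torch.mul(X_test, 2)
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
# Note: this standalone `w` is not a parameter of the nn.Linear model below. The model keeps
# its own weight and bias, so the `w` printed in the training loop stays at 0.000.
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)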

n_samples, n_features = X.shape
input_size  = n_features
output_size = n_features

# Hyperparameters.
learning_rate = 0.01
nr_epochs = 100

# Model definition.
model = nn.Linear(input_size, output_size)

print(f"Prediction before training: f({X_test}) = {model(X_test).item():.3f}")

# Loss definition.
loss = nn.MSELoss()
# Optimizer definition.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(nr_epochs):
    # Prediction from the forward pass.
    y_pred = model(X)

    # Loss computation.
    l = loss(Y, y_pred)

    # Gradient computation. Backward pass.
    l.backward() # dl/dw

    # Update the weights.
    optimizer.step()

    # Reset the gradients.
    optimizer.zero_grad()

    if epoch % (nr_epochs // 10) == 0:
        print(f"Epoch {epoch + 1}: w = {w:.3f}, loss = {l:.8f}")

print(f"Prediction after training: f({X_test}) = {model(X_test).item():.3f} (solution f({X_test})={Y_test})")

Prediction before training: f(tensor([5.])) = -0.130
Epoch 1: w = 0.000, loss = 28.13534546
Epoch 11: w = 0.000, loss = 0.97767603
Epoch 21: w = 0.000, loss = 0.26050517
Epoch 31: w = 0.000, loss = 0.22825977
Epoch 41: w = 0.000, loss = 0.21453196
Epoch 51: w = 0.000, loss = 0.20203356
Epoch 61: w = 0.000, loss = 0.19027387
Epoch 71: w = 0.000, loss = 0.17919886
Epoch 81: w = 0.000, loss = 0.16876847
Epoch 91: w = 0.000, loss = 0.15894532
Prediction after training: f(tensor([5.])) = 9.337 (solution f(tensor([5.]))=tensor([10.]))
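
The `w = 0.000` in the log above comes from a separate variable; the trained parameters live inside the `nn.Linear` module. The sketch below shows one way to read them via `model.weight` and `model.bias`. The seed and the name `loss_fn` are assumptions for this example, and the printed values depend on the random initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

initial_loss = loss_fn(model(X), Y).item()
for _ in range(100):
    optimizer.zero_grad()
    l = loss_fn(model(X), Y)
    l.backward()
    optimizer.step()
final_loss = loss_fn(model(X), Y).item()

# The trained parameters live inside the module, not in a separate variable.
w_learned = model.weight.item()
b_learned = model.bias.item()
print(f"w = {w_learned:.3f}, b = {b_learned:.3f}")
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```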


MIN: 1:24:06

# NEXT CHAPTER ...

In [None]:
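from torchview import draw_graph         # assumption: draw_graph comes from the torchview package
from torchvision.models import resnet18  # assumption: torchvision is installed
model_graph = draw_graph(resnet18(), input_size=(1,3,224,224), expand_nested=True)
model_graph.visual_graph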

In [None]:
# CONTINUE WITH: https://youtu.be/c36lUUr864M?si=Ed3ezOjp2TDZD6Ev&t=4899
