# PyTorch Crash Course

#### Overview:

1. Tensor Basics
  - Create, Operations, NumPy, GPU Support
2. Autograd
  - Linear regression example
3. Training Loop with: Model, Loss & Optimizer
  - A typical PyTorch training pipeline
4. Neural Network
  - Also: GPU, Datasets, DataLoader, Transforms & Evaluation
5. Convolutional Neural Network
  - Also: Save/Load model

Created by [AssemblyAI](https://www.assemblyai.com)

Watch the video:

 [![Alt text](https://img.youtube.com/vi/OIenNRt2bjg/hqdefault.jpg)](https://youtu.be/mYUyaKmvu6Y)

## 1. Tensors

Everything in PyTorch is based on Tensor operations. A Tensor is a multi-dimensional matrix containing elements of a single data type:


In [None]:
import torch

# torch.empty(size): uninitialized
x = torch.empty(1) # scalar
print("empty(1):", x, "\n")
x = torch.empty(3) # vector
print("empty(3):",x,"\n")
x = torch.empty(2, 3) # matrix
print("empty(2,3):",x, "\n")
x = torch.empty(2, 2, 3) # tensor, 3 dimensions
#x = torch.empty(2,2,2,3) # tensor, 4 dimensions
print("empty(2, 2, 3):",x, "\n")

# torch.rand(size): uniform random numbers in [0, 1)
x = torch.rand(5, 3)
print("rand(5,3):", x,"\n")

# torch.zeros(size), fill with 0
# torch.ones(size), fill with 1
x = torch.zeros(5, 3)
print("zeros(5,3):", x)

empty(1): tensor([3.3631e-44]) 

empty(3): tensor([4.5300e-26, 3.0979e-41, 3.7614e-14]) 

empty(2,3): tensor([[3.8702e+29, 3.0983e-41, 3.8709e+29],
        [3.0983e-41, 0.0000e+00, 0.0000e+00]]) 

empty(2, 2, 3): tensor([[[1.8626e+27, 3.0983e-41, 3.8697e+29],
         [3.0983e-41, 4.0357e-43, 0.0000e+00]],

        [[1.5835e-43, 0.0000e+00, 3.8709e+29],
         [3.0983e-41, 3.7804e+35, 4.5625e-41]]]) 

rand(5,3): tensor([[0.9555, 0.8833, 0.5758],
        [0.7426, 0.9859, 0.1000],
        [0.1440, 0.4261, 0.2612],
        [0.1387, 0.9501, 0.7602],
        [0.2124, 0.9570, 0.2540]]) 

zeros(5,3): tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


In [None]:
# check size
print("size", x.size())  # x.size(0)
print("shape", x.shape)  # x.shape[0]

size torch.Size([5, 3])
shape torch.Size([5, 3])


In [None]:
# check data type
print(x.dtype)

# specify types, float32 default
x = torch.zeros(5, 3, dtype=torch.float16)
print(x)

# check type
print(x.dtype)

torch.float32
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float16)
torch.float16


In [None]:
# construct from data
x = torch.tensor([5.5, 3])
print(x, x.dtype)

tensor([5.5000, 3.0000]) torch.float32
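
As a small aside on the dtype handling shown above, the following is a minimal sketch of how `torch.tensor` infers the dtype from Python data and how an existing tensor can be converted. The variable names are illustrative only, and the sketch assumes the default dtype has not been changed with `torch.set_default_dtype`.

```python
import torch

# dtype is inferred from the data: Python ints give int64, floats give float32
a = torch.tensor([1, 2, 3])
b = torch.tensor([1.0, 2.0, 3.0])
print(a.dtype, b.dtype)

# .to(dtype) returns a converted tensor; the original keeps its dtype
c = a.to(torch.float32)
print(c.dtype, a.dtype)
```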


In [None]:
# requires_grad argument
# This will tell pytorch that it will need to calculate the gradients for this tensor
# later in your optimization steps
# i.e. this is a variable in your model that you want to optimize
x = torch.tensor([5.5, 3], requires_grad=True)
print(x)

tensor([5.5000, 3.0000], requires_grad=True)


#### Operations with Tensors

In [None]:
# Operations
x = torch.ones(2, 2)
y = torch.rand(2, 2)

# elementwise addition
z = x + y
# torch.add(x,y)

# in-place addition: everything with a trailing underscore is an in-place operation,
# i.e. it will modify the tensor it is called on
# y.add_(x)

print(x)
print(y)
print(z)

tensor([[1., 1.],
        [1., 1.]])
tensor([[0.7644, 0.0808],
        [0.7845, 0.3720]])
tensor([[1.7644, 1.0808],
        [1.7845, 1.3720]])


In [None]:
# subtraction
z = x - y
z = torch.sub(x, y)

# multiplication
z = x * y
z = torch.mul(x,y)

# division
z = x / y
z = torch.div(x,y)

In [None]:
# Slicing
x = torch.rand(5,3)
print(x)
print("x[:, 0]", x[:, 0]) # all rows, column 0
print("x[1, :]", x[1, :]) # row 1, all columns
print("x[1, 1]", x[1,1]) # element at 1, 1

# Get the actual value if only 1 element in your tensor
print("x[1,1].item()", x[1,1].item())

tensor([[0.7047, 0.2369, 0.7886],
        [0.9491, 0.2097, 0.8007],
        [0.2831, 0.6038, 0.0603],
        [0.9562, 0.0871, 0.5953],
        [0.8356, 0.6364, 0.6272]])
x[:, 0] tensor([0.7047, 0.9491, 0.2831, 0.9562, 0.8356])
x[1, :] tensor([0.9491, 0.2097, 0.8007])
x[1, 1] tensor(0.2097)
x[1,1].item() 0.20971983671188354
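
The `.item()` call above only works for tensors with exactly one element. The sketch below contrasts it with `.tolist()`, which handles any number of elements. The exception type caught here is an assumption: it is taken to be either `ValueError` or `RuntimeError`, depending on the PyTorch version.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# .item() works on a tensor with exactly one element
print(x[1, 1].item())

# For more than one element, .tolist() returns (nested) Python lists
print(x[1, :].tolist())

# Calling .item() on a multi-element tensor raises an error
# (assumed to be ValueError or RuntimeError, depending on the version)
try:
    x.item()
except (ValueError, RuntimeError) as err:
    print("cannot convert:", err)
```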


In [None]:
# Reshape with Tensor.view()
x = torch.randn(4, 4)

'''Reshapes x into a 1D tensor y with 16 elements. The view(16) command flattens
the 4x4 tensor into a single dimension with 16 elements.
'''
y = x.view(16)

'''Reshapes x into a 2D tensor z with 2 rows and 8 columns. The -1 tells PyTorch
to automatically infer the size of this dimension based on the other dimensions.
Here, since x has 16 elements in total, and one dimension is specified as 8, the
other dimension must be 2 (because 2 * 8 = 16).
'''
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
# with -1, PyTorch determines the necessary size automatically
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
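
Two properties of `view()` are worth a quick illustration: the result shares memory with the original tensor, and it requires a compatible memory layout. The sketch below uses `torch.arange` for readable values and falls back to `reshape()` after a transpose, where `view()` would not be applicable.

```python
import torch

x = torch.arange(16).view(4, 4)   # values 0..15 laid out as 4x4
z = x.view(-1, 8)                 # -1 is inferred as 2

# A view shares storage with the original tensor
z[0, 0] = 100
print(x[0, 0])   # the change is visible through x as well

# After a transpose the tensor is no longer contiguous;
# reshape() copies the data if needed, so it works where view() does not
t = x.t()
flat = t.reshape(16)
print(flat.shape)
```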


#### NumPy

Converting a Torch Tensor to a NumPy array and vice versa is straightforward.

In [None]:
a = torch.ones(5)
print(a)

# torch to numpy with .numpy()
b = a.numpy()
print(b)
print(type(b))

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>


In [None]:
'''Careful: If the Tensor is on the CPU (not the GPU),
both objects will share the same memory location, so changing one will also
change the other.
'''
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


In [None]:
# numpy to torch with .from_numpy(x), or torch.tensor() to copy it
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
c = torch.tensor(a)
print(a)
print(b)
print(c)

# again be careful when modifying
a += 1
print(a)
print(b)
print(c)
'''torch.from_numpy() creates a tensor that shares memory with the original NumPy array. Changes to one will reflect in the other.
torch.tensor() creates a new tensor that is independent of the original NumPy array, meaning changes to one do not affect the other.
'''

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)



#### GPU Support

By default, all tensors are created on the CPU. We can also move them to the GPU (if one is available) or create them directly on the GPU. This matters when training neural networks, because GPUs are much faster for highly parallel computations such as large matrix multiplications.

In [None]:
'''
torch.cuda.is_available(): This checks if a CUDA-capable GPU is available.
If a GPU is available, device will be set to 'cuda'. Otherwise, it will default
to 'cpu'.
'''
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.rand(2,2).to(device)  # move tensors to GPU device
#x = x.to("cpu")
#x = x.to("cuda")

x = torch.rand(2,2, device=device)  # or directly create them on GPU
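
Combining this with the NumPy section: `.numpy()` only works for tensors that live on the CPU. The following minimal sketch moves the tensor back first, and it also runs unchanged on a machine without a GPU.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.ones(3, device=device)

# .numpy() requires a CPU tensor, so move the tensor back first.
# If x is already on the CPU, .cpu() simply returns it as-is.
x_np = x.cpu().numpy()
print(x.device, type(x_np))
```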

## 2. Autograd

The autograd package provides automatic differentiation for all operations on Tensors. Generally speaking, *torch.autograd* is an engine for computing the vector-Jacobian product. It computes partial derivatives while applying the chain rule.

Set `requires_grad = True`:

In [None]:
import torch

# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True)
y = x + 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
print(x) # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)

tensor([ 1.3250, -1.6768, -1.2402], requires_grad=True)
tensor([3.3250, 0.3232, 0.7598], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7c7d2a04fd60>


In [None]:
# Do more operations on y
z = y * y * 3
print(z)
z = z.mean()
print(z)

tensor([33.1662,  0.3133,  1.7318], grad_fn=<MulBackward0>)
tensor(11.7371, grad_fn=<MeanBackward0>)


In [None]:
# Let's compute the gradients with backpropagation
# When we finish our computation we can call .backward() and have all the gradients computed automatically.
# The gradient for this tensor will be accumulated into .grad attribute.
# It is the partial derivative of the function w.r.t. the tensor

print(x.grad)
z.backward()
print(x.grad) # dz/dx

# !!! Careful!!! backward() accumulates the gradient for this tensor into .grad attribute.
# !!! We need to be careful during optimization !!! optimizer.zero_grad()

None
tensor([6.6499, 0.6463, 1.5195])
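
The printed gradient can be checked by hand. With $y_i = x_i + 2$ and the mean taken over three elements:

$$z = \frac{1}{3}\sum_{i=1}^{3} 3\,y_i^2 = \sum_{i=1}^{3} y_i^2$$

$$\frac{\partial z}{\partial x_i} = 2\,y_i = 2\,(x_i + 2)$$

For the first element, $y_1 = 3.3250$ gives $2 \cdot 3.3250 = 6.6500$, which matches the printed $6.6499$ up to rounding of the displayed values.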


In [None]:
import torch
x = torch.randn(3, requires_grad=True)
y = x + 2

z = y * y * 3
z = z.mean()

print(x.grad)
z.backward()
print(x.grad)

None
tensor([2.6932, 3.3025, 0.1415])


## More on Autograd

In [None]:
import torch
weights = torch.ones(4, requires_grad = True)
print (weights)

tensor([1., 1., 1., 1.], requires_grad=True)


In [None]:
import torch
weights = torch.ones(4, requires_grad = True)
print (weights)
for a in range(1):
  model_output = (weights*3).sum()
  print (model_output)

  model_output.backward()
  print(weights.grad)

tensor([1., 1., 1., 1.], requires_grad=True)
tensor(12., grad_fn=<SumBackward0>)
tensor([3., 3., 3., 3.])


### Stop a tensor from tracking history:
Some operations should not be part of the gradient computation, for example the weight update inside the training loop, or the forward pass during evaluation after training. To stop a tensor from tracking history, we can use:

- `x.requires_grad_(False)`
- `x.detach()`
- wrap in `with torch.no_grad():`

In [None]:
# .requires_grad_(...) changes an existing flag in-place.
a = torch.randn(2, 2)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

a.requires_grad_(True)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

False
None
True
<SumBackward0 object at 0x7b4bf582ae90>


In [None]:
# .detach(): get a new Tensor with the same content but no gradient computation:
a = torch.randn(2, 2, requires_grad=True)
b = a.detach()
print(a.requires_grad)
print(b.requires_grad)

True
False


In [None]:
# wrap in 'with torch.no_grad():'
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    b = a ** 2
    print(b.requires_grad)

True
False


## Gradient Descent with Autograd
Linear Regression example:

$f(x) = w * x + b$

here : `f(x) = 2 * x`

In [None]:
import torch

# Linear regression
# f = w * x  + b
# here : f = 2 * x

X = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8, 10, 12, 14, 16], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# model output
def forward(x):
    return w * x

# loss = MSE
def loss(y, y_pred):
    return ((y_pred - y)**2).mean()

X_test = 5.0

print(f'Prediction before training: f({X_test}) = {forward(X_test).item():.3f}')

Prediction before training: f(5.0) = 0.000


In [None]:
# Training
learning_rate = 0.01
n_epochs = 100

for epoch in range(n_epochs):
    # predict = forward pass
    y_pred = forward(X)

    # loss
    l = loss(Y, y_pred)

    # calculate gradients = backward pass
    l.backward()

    # update weights
    #w.data = w.data - learning_rate * w.grad
    with torch.no_grad():  # the update itself must not be tracked by autograd
        w -= learning_rate * w.grad

    # zero the gradients after updating
    w.grad.zero_()  # otherwise the gradients would accumulate across iterations

    if (epoch+1) % 10 == 0:
        print(f'epoch {epoch+1}: w = {w.item():.3f}, loss = {l.item():.3f}')

print(f'Prediction after training: f({X_test}) = {forward(X_test).item():.3f}')

epoch 10: w = 1.998, loss = 0.000
epoch 20: w = 2.000, loss = 0.000
epoch 30: w = 2.000, loss = 0.000
epoch 40: w = 2.000, loss = 0.000
epoch 50: w = 2.000, loss = 0.000
epoch 60: w = 2.000, loss = 0.000
epoch 70: w = 2.000, loss = 0.000
epoch 80: w = 2.000, loss = 0.000
epoch 90: w = 2.000, loss = 0.000
epoch 100: w = 2.000, loss = 0.000
Prediction after training: f(5.0) = 10.000
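
To see what `l.backward()` computes in this example, the sketch below compares the autograd gradient with the analytic gradient of the MSE loss, $\frac{dL}{dw} = \frac{1}{N}\sum_i 2\,x_i\,(w\,x_i - y_i)$, at the initial weight `w = 0`. The name `manual_grad` is introduced here for illustration.

```python
import torch

X = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.float32)
Y = 2 * X
w = torch.tensor(0.0, requires_grad=True)

# Gradient of the MSE loss computed by autograd
loss = ((w * X - Y) ** 2).mean()
loss.backward()

# Analytic gradient: dL/dw = mean(2 * x * (w * x - y))
with torch.no_grad():
    manual_grad = (2 * X * (w * X - Y)).mean()

print(w.grad, manual_grad)
```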


## 3. Model, Loss & Optimizer

A typical PyTorch pipeline looks like this:

1. Design model (input, output, forward pass with different layers)
2. Construct loss and optimizer
3. Training loop:
  - Forward = compute prediction and loss
  - Backward = compute gradients
  - Update weights

In [None]:
import torch
import torch.nn as nn

# Linear regression
# f = w * x
# here : f = 2 * x

# 0) Training samples, watch the shape!
X = torch.tensor([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8], [10], [12], [14], [16]], dtype=torch.float32)

n_samples, n_features = X.shape
print(f'n_samples = {n_samples}, n_features = {n_features}')

# 0) create a test sample
X_test = torch.tensor([5], dtype=torch.float32)

n_samples = 8, n_features = 1


In [None]:
# 1) Design Model, the model has to implement the forward pass!

# Here we could simply use a built-in model from PyTorch
# model = nn.Linear(input_size, output_size)

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegression, self).__init__()
        # In __init__ we usually define the layers,
        # and in forward() we apply them to the input.
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)


input_size, output_size = n_features, n_features

model = LinearRegression(input_size, output_size)

print(f'Prediction before training: f({X_test.item()}) = {model(X_test).item():.3f}')

# 2) Define loss and optimizer
learning_rate = 0.01
n_epochs = 100

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 3) Training loop
for epoch in range(n_epochs):
    # predict = forward pass with our model
    y_predicted = model(X)

    # loss
    l = loss(y_predicted, Y)  # nn.MSELoss expects (input, target)

    # calculate gradients = backward pass
    l.backward()

    # update weights
    optimizer.step()

    # zero the gradients after updating
    optimizer.zero_grad()

    if (epoch+1) % 10 == 0:
        w, b = model.parameters() # unpack parameters
        print('epoch ', epoch+1, ': w = ', w[0][0].item(), ' loss = ', l.item())

print(f'Prediction after training: f({X_test.item()}) = {model(X_test).item():.3f}')

Prediction before training: f(5.0) = 2.953
epoch  10 : w =  1.8603134155273438  loss =  0.12667028605937958
epoch  20 : w =  1.8665486574172974  loss =  0.1168612539768219
epoch  30 : w =  1.8717821836471558  loss =  0.10787571966648102
epoch  40 : w =  1.876810073852539  loss =  0.09958120435476303
epoch  50 : w =  1.8816407918930054  loss =  0.09192441403865814
epoch  60 : w =  1.8862820863723755  loss =  0.08485632389783859
epoch  70 : w =  1.8907414674758911  loss =  0.0783318355679512
epoch  80 : w =  1.8950258493423462  loss =  0.0723089724779129
epoch  90 : w =  1.8991422653198242  loss =  0.06674908846616745
epoch  100 : w =  1.9030972719192505  loss =  0.061616718769073486
Prediction after training: f(5.0) = 10.060
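
As noted in the comment in the cell above, the custom class can be replaced by the built-in `nn.Linear`. The following is a minimal sketch of the same pipeline under that assumption. The seed value and the names `criterion`, `initial_loss` and `final_loss` are choices made here for illustration, and the exact loss values depend on the random initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatable runs

X = torch.tensor([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=torch.float32)
Y = 2 * X

# 1) Model: a single linear layer with one input and one output feature
model = nn.Linear(in_features=1, out_features=1)

# 2) Loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3) Training loop
initial_loss = criterion(model(X), Y).item()
for epoch in range(100):
    loss = criterion(model(X), Y)   # forward pass
    loss.backward()                 # backward pass
    optimizer.step()                # update weights
    optimizer.zero_grad()           # reset gradients

final_loss = criterion(model(X), Y).item()
print(f'loss: {initial_loss:.3f} -> {final_loss:.3f}')
```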
