# 00. PyTorch Fundamentals

Resource notebook: https://learnpytorch.io/00_pytorch_fundamentals

Github: https://github.com/mrdbourke/pytorch-deep-learning



In [17]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(torch.__version__)
torch.cuda.is_available()

2.0.1+cpu


False

In [18]:
# scalar
scalar = torch.tensor(7)
scalar


tensor(7)

In [19]:
# number of dimensions
scalar.ndim

0

In [20]:
scalar.item()

7

In [21]:
# vector
vector = torch.tensor([7,7])
vector.ndim

1

In [22]:
vector.shape

torch.Size([2])

In [23]:
# Matrix

MATRIX = torch.tensor([[7,8],
                       [9,10]])
MATRIX.ndim

2

In [24]:
MATRIX.shape

torch.Size([2, 2])

In [25]:
MATRIX[1, 0]

tensor(9)

In [26]:
# Tensor
TENSOR = torch.tensor([[[1,2,3],
                        [3,6,9],
                        [2,4,5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [27]:
TENSOR.shape

torch.Size([1, 3, 3])

In [28]:
TENSOR[0]

tensor([[1, 2, 3],
        [3, 6, 9],
        [2, 4, 5]])

In [29]:
TENSOR[0,:,2]

tensor([3, 9, 5])

## Introduction to tensors

### Random tensors

Why random tensors?

Neural networks start with random numbers and then update them to better represent the data.

`Start with random -> look at the data -> update -> etc.`
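A minimal sketch of that loop, assuming a single learnable number `w` and made-up data where `y = 3 * x` (both hypothetical, purely for illustration). It starts `w` at a random value and repeatedly nudges it toward the data with plain gradient descent via autograd:

```python
import torch

torch.manual_seed(0)  # make the random starting point reproducible

# hypothetical data: y is exactly 3 * x
x = torch.arange(1.0, 6.0)
y = 3 * x

# start with a random number
w = torch.rand(1, requires_grad=True)

learning_rate = 0.01
for step in range(200):
    loss = ((w * x - y) ** 2).mean()  # look at the data: how wrong is w?
    loss.backward()                   # compute d(loss)/dw
    with torch.no_grad():
        w -= learning_rate * w.grad   # update w to better represent the data
    w.grad.zero_()                    # reset the gradient before the next step

print(w.item())
```

After training, `w` should end up very close to 3.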

In [30]:
# create a random tensor

random_tensor = torch.rand(3,4)
random_tensor

tensor([[0.3677, 0.1042, 0.0132, 0.6440],
        [0.4458, 0.6237, 0.9436, 0.5116],
        [0.0805, 0.9739, 0.2929, 0.3834]])

In [31]:
random_tensor.shape

torch.Size([3, 4])

In [32]:
random_tensor.ndim

2

In [33]:
# create a random tensor with similar shape to an image tensor
random_image_size_tensor = torch.rand(size=(224,224,3)) # height, width, color channels
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

### Zeros and ones

In [34]:
# create a tensor of zeros
zeros = torch.zeros(3,4)
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [35]:
zeros*random_tensor

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [36]:
# create a tensor full of ones
ones = torch.ones(3,4)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [37]:
ones.dtype, type(ones)

(torch.float32, torch.Tensor)

### Create a range of tensors and tensors-like

In [38]:
# using torch.arange (torch.range is deprecated); the end value 10 is excluded
one_to_ten = torch.arange(1,10, 0.25)
one_to_ten

tensor([1.0000, 1.2500, 1.5000, 1.7500, 2.0000, 2.2500, 2.5000, 2.7500, 3.0000,
        3.2500, 3.5000, 3.7500, 4.0000, 4.2500, 4.5000, 4.7500, 5.0000, 5.2500,
        5.5000, 5.7500, 6.0000, 6.2500, 6.5000, 6.7500, 7.0000, 7.2500, 7.5000,
        7.7500, 8.0000, 8.2500, 8.5000, 8.7500, 9.0000, 9.2500, 9.5000, 9.7500])

In [39]:
# create a tensor of zeros with the same shape as one_to_ten
ten_zeros = torch.zeros_like(one_to_ten)
ten_zeros

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
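`torch.arange` excludes its end value, so `one_to_ten` above stops at 9.75. If the end point should be included, `torch.linspace`, which takes a number of points instead of a step size, is one alternative. A quick comparison:

```python
import torch

stepped = torch.arange(1, 10, 0.25)       # step of 0.25, end value 10 excluded
spaced = torch.linspace(1, 10, steps=37)  # 37 evenly spaced points, end value included

print(stepped[-1].item(), len(stepped))   # 9.75 36
print(spaced[-1].item(), len(spaced))     # 10.0 37
```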

### Tensor Data types

**Note:** Tensor data types are one of the three most common sources of errors in PyTorch:
 1. Tensors not of the right datatype
 2. Tensors not of the right shape
 3. Tensors not on the right device

In [40]:
# float 32 tensor
float_32_tensor = torch.tensor([3.0, 6.0, 9.0])
float_32_tensor, float_32_tensor.dtype

(tensor([3., 6., 9.]), torch.float32)

In [41]:
# float 64 tensor
float_64_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float64, # data type - type of data stored within the tensor
                               device="cpu",        # device in which PyTorch is going to perform the calculations ("None"/"cpu"/"cuda")
                               requires_grad=False) # whether PyTorch should track the gradients of this tensor, when it goes through certain operations
float_64_tensor

tensor([3., 6., 9.], dtype=torch.float64)

In [42]:
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [43]:
# mixing dtypes: no error here, the result is promoted to the wider type (float32)
float_16_tensor*float_32_tensor

tensor([ 9., 36., 81.])

In [44]:
int_32_tensor = torch.tensor([3,6,9], dtype=torch.int32)
int_32_tensor

tensor([3, 6, 9], dtype=torch.int32)

In [45]:
float_32_tensor*int_32_tensor

tensor([ 9., 36., 81.])
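The two cells above show that element-wise operations promote mixed dtypes to a common, wider type (`torch.promote_types` reports which one). Matrix multiplication does not promote, so mixing dtypes there raises an error. A small sketch with illustrative tensors:

```python
import torch

float_16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
float_32 = torch.tensor([3.0, 6.0, 9.0])
int_32 = torch.tensor([3, 6, 9], dtype=torch.int32)

# element-wise ops promote to the wider type
print(torch.promote_types(torch.float16, torch.float32))  # torch.float32
print((float_16 * float_32).dtype)                        # torch.float32
print((float_32 * int_32).dtype)                          # torch.float32

# matrix multiplication does not promote: both dtypes must match
a = torch.rand(2, 3)                      # float32
b = torch.ones(3, 2, dtype=torch.int64)   # int64
matmul_failed = False
try:
    a @ b
except RuntimeError as err:
    matmul_failed = True
    print("RuntimeError:", err)

# casting one operand fixes it
print((a @ b.type(torch.float32)).shape)  # torch.Size([2, 2])
```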

## Getting information from tensors

1. datatype: use `tensor.dtype`
2. shape: use `tensor.shape`
3. device: use `tensor.device`

In [46]:
# create a tensor
some_tensor = torch.rand(3,4)
some_tensor

tensor([[0.9269, 0.7815, 0.0232, 0.1609],
        [0.4165, 0.3051, 0.6323, 0.3991],
        [0.9281, 0.7704, 0.7306, 0.3784]])

In [47]:
# find details about some tensor
print(some_tensor)
print(f"Datatype: {some_tensor.dtype}")
print(f"Shape: {some_tensor.shape}")
print(f"Device: {some_tensor.device}")

tensor([[0.9269, 0.7815, 0.0232, 0.1609],
        [0.4165, 0.3051, 0.6323, 0.3991],
        [0.9281, 0.7704, 0.7306, 0.3784]])
Datatype: torch.float32
Shape: torch.Size([3, 4])
Device: cpu


## Manipulating tensors (tensor operations)

Tensor operations include
 - Addition
 - Subtraction
 - Multiplication (element-wise)
 - Division
 - Matrix multiplication

In [48]:
# Tensor addition 
tensor = torch.tensor([1.,2.,3.])
tensor + 5

tensor([6., 7., 8.])

In [49]:
tensor * 10

tensor([10., 20., 30.])

In [50]:
tensor /= 10  # in-place division works because tensor is a float tensor; on an integer tensor it raises an error
tensor

tensor([0.1000, 0.2000, 0.3000])
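The float requirement exists because an in-place operation cannot change a tensor's dtype, and true division of integers produces floats. A sketch of what happens with an integer tensor, plus two ways around it:

```python
import torch

int_tensor = torch.tensor([1, 2, 3])   # int64 by default

inplace_failed = False
try:
    int_tensor /= 10                   # the float result cannot be stored in an int64 tensor
except RuntimeError as err:
    inplace_failed = True
    print("RuntimeError:", err)

print(int_tensor / 10)                 # option 1: out-of-place division returns a new float tensor
float_tensor = int_tensor.float()      # option 2: cast first, then divide in place
float_tensor /= 10
print(float_tensor)
```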

## Matrix multiplication

Besides element-wise multiplication, PyTorch also supports matrix multiplication.

**Two main rules for matrix multiplication**
1. The inner dimensions must match
   * `(3, 2) @ (3, 2)` won't work
   * `(3, 2) @ (2, 3)` will work
   * `(2, 3) @ (3, 2)` will work
2. The resulting matrix has the shape of the outer dimensions
   * `(3, 2) @ (2, 3)` results in shape `(3, 3)`
   * `(2, 3) @ (3, 2)` results in shape `(2, 2)`
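To make the two rules concrete, here is a minimal pure-loop sketch of matrix multiplication. The helper name `matmul_loops` is hypothetical and for illustration only; in practice `torch.matmul` or `@` is far faster.

```python
import torch

def matmul_loops(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Naive matrix multiplication of 2-D tensors A (n, k) and B (k, m)."""
    n, k = A.shape
    k2, m = B.shape
    # rule 1: the inner dimensions must match
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} != {k2}")
    # rule 2: the result has the outer dimensions (n, m)
    C = torch.zeros(n, m, dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            # each entry is the dot product of row i of A and column j of B
            C[i, j] = (A[i, :] * B[:, j]).sum()
    return C

A = torch.rand(3, 2)
B = torch.rand(2, 3)
print(matmul_loops(A, B).shape)                   # torch.Size([3, 3])
print(torch.allclose(matmul_loops(A, B), A @ B))  # True
```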

In [51]:
tensor = torch.tensor([1,2,3])
print(f"tensor = {tensor}\n")

# element-wise multiplication
print(f"tensor*tensor = {tensor*tensor}\n")

# matrix multiplication: for two 1-D tensors, torch.matmul computes the dot product
print(f"torch.matmul(tensor, tensor.T) = {torch.matmul(tensor, tensor.T)}") # <- .T is a no-op on 1-D tensors (and deprecated)
print(f"torch.matmul(tensor, tensor) = {torch.matmul(tensor, tensor)}")     # <- same result: dot product of two vectors
print(f"tensor @ tensor.T = {tensor @ tensor.T}")                           # <- @ is the operator form of torch.matmul
print(f"tensor @ tensor = {tensor @ tensor}\n")                             # <- same dot product

## torch.mm
# torch.mm only multiplies 2-D matrices (no 1-D inputs, no broadcasting), unlike torch.matmul
# print(f"torch.mm(tensor, tensor.T) = {torch.mm(tensor, tensor.T)}")       # <- fails: 'tensor' is a 1-D vector, not a matrix
tensor = torch.tensor([[1,2,3]])                                            # define as a (1, 3) matrix
print(f"torch.mm(tensor, tensor.T) = {torch.mm(tensor, tensor.T)}")         # <- works: (1, 3) @ (3, 1) -> (1, 1)
# print(f"torch.mm(tensor, tensor) = {torch.mm(tensor, tensor)}")           # <- fails: (1, 3) @ (1, 3), inner dimensions differ

tensor = tensor([1, 2, 3])

tensor*tensor = tensor([1, 4, 9])

torch.matmul(tensor, tensor.T) = 14
torch.matmul(tensor, tensor) = 14
tensor @ tensor.T = 14
tensor @ tensor = 14

torch.mm(tensor, tensor.T) = tensor([[14]])
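For 1-D tensors, `torch.matmul` (and `@`) computes the dot product, which is why all four versions above return 14. To get a genuine row-times-column or column-times-row product, the vector can be given explicit 2-D shapes with `unsqueeze`. A sketch:

```python
import torch

v = torch.tensor([1, 2, 3])

print(torch.dot(v, v))          # dot product, same as v @ v for 1-D tensors
row = v.unsqueeze(0)            # shape (1, 3)
col = v.unsqueeze(1)            # shape (3, 1)
print(row @ col)                # (1, 3) @ (3, 1) -> shape (1, 1)
print((col @ row).shape)        # (3, 1) @ (1, 3) -> torch.Size([3, 3])
print(torch.equal(col @ row, torch.outer(v, v)))  # True: same as the outer product
```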




In [52]:
### one of the most common errors in deep learning: shape errors
(torch.rand(2,3) @ torch.rand(3,2)).shape

torch.Size([2, 2])

In [53]:
# shapes for matrix multiplications
A = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])
B = torch.tensor([[7, 10],
                  [8, 11], 
                  [9, 12]])

print(f"A.shape = {A.shape}, B.shape = {B.shape}")
print("A @ B does not work due to wrong shapes")
print(f"A @ B.T = \n{A @ B.T}")

A.shape = torch.Size([3, 2]), B.shape = torch.Size([3, 2])
A @ B does not work due to wrong shapes
A @ B.T = 
tensor([[ 27,  30,  33],
        [ 61,  68,  75],
        [ 95, 106, 117]])
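Transposing either operand resolves the mismatch, but which one you transpose determines the result's shape. A sketch reusing the same `A` and `B`, this time catching the error that `A @ B` raises:

```python
import torch

A = torch.tensor([[1, 2], [3, 4], [5, 6]])
B = torch.tensor([[7, 10], [8, 11], [9, 12]])

shape_error = False
try:
    A @ B                                    # (3, 2) @ (3, 2): inner dims 2 and 3 differ
except RuntimeError as err:
    shape_error = True
    print("RuntimeError:", err)

print((A @ B.T).shape)                       # (3, 2) @ (2, 3) -> torch.Size([3, 3])
print((A.T @ B).shape)                       # (2, 3) @ (3, 2) -> torch.Size([2, 2])
print(torch.equal(B.T, B.transpose(0, 1)))   # .T on a 2-D tensor swaps dims 0 and 1
```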


## Tensor aggregation

In [54]:
x = torch.arange(0,100,10)
x, x.dtype

(tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]), torch.int64)

In [55]:
torch.min(x), x.min()

(tensor(0), tensor(0))

In [56]:
torch.max(x), x.max()

(tensor(90), tensor(90))

In [57]:
# example of a datatype error: torch.mean only works on floating-point (or complex) tensors, so x is cast first
torch.mean(x.type(torch.float32)), x.type(torch.float32).mean()

(tensor(45.), tensor(45.))

In [58]:
torch.sum(x), x.sum()

(tensor(450), tensor(450))

In [59]:
# find the indices of the minimum and maximum values
torch.argmin(x), x.argmin(), torch.argmax(x), x.argmax()

(tensor(0), tensor(0), tensor(9), tensor(9))
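Two follow-ups to the cells above: `torch.mean` rejects integer tensors (hence the cast), and aggregations also accept a `dim` argument to reduce along one dimension only. A sketch reusing the same `x`:

```python
import torch

x = torch.arange(0, 100, 10)   # int64

mean_failed = False
try:
    torch.mean(x)              # mean needs a floating-point (or complex) dtype
except RuntimeError as err:
    mean_failed = True
    print("RuntimeError:", err)

print(x.float().mean())        # cast first: tensor(45.)

# aggregate along a single dimension
grid = x.reshape(2, 5)         # [[0, 10, 20, 30, 40], [50, 60, 70, 80, 90]]
print(grid.sum(dim=0))         # column sums: 50, 70, 90, 110, 130
print(grid.argmax(dim=1))      # index of the max in each row: 4, 4
```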

## Reshaping, stacking, squeezing, and unsqueezing tensors

- `reshape` - change the shape of a tensor (copying the data only if needed)
- `view` - return a view of the tensor with a different shape that shares the same memory
- `stack` - join a sequence of tensors along a new dimension (`vstack` and `hstack` are related helpers for vertical and horizontal stacking)
- `squeeze` - remove all dimensions of size 1 (singleton dimensions)
- `unsqueeze` - add a dimension of size 1 to a target tensor at a given position
- `permute` - return a view of the tensor with its dimensions reordered
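`view` and `reshape` often return the same result, but `view` never copies, so it fails when the requested shape is incompatible with how the data is laid out in memory; `reshape` then quietly makes a copy instead. A small sketch with a transposed (non-contiguous) tensor:

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
xt = x.T                            # transposed view: [[0, 3], [1, 4], [2, 5]]
print(xt.is_contiguous())           # False

view_failed = False
try:
    xt.view(6)                      # view requires a compatible memory layout
except RuntimeError as err:
    view_failed = True
    print("RuntimeError:", err)

flat = xt.reshape(6)                # reshape falls back to copying
print(flat)                         # tensor([0, 3, 1, 4, 2, 5])
```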

In [60]:
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [61]:
x.reshape(3, 3), x.view(3, 3), x.unsqueeze(0), x.unsqueeze(1)

(tensor([[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]),
 tensor([[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]),
 tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([[1.],
         [2.],
         [3.],
         [4.],
         [5.],
         [6.],
         [7.],
         [8.],
         [9.]]))

In [62]:
torch.stack((x, x)), torch.stack((x, x), dim=1)

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
         [1., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([[1., 1.],
         [2., 2.],
         [3., 3.],
         [4., 4.],
         [5., 5.],
         [6., 6.],
         [7., 7.],
         [8., 8.],
         [9., 9.]]))
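`torch.stack` always creates a new dimension, whereas `torch.cat` joins tensors along an existing one. A quick comparison on the same `x`, including how `vstack` and `hstack` relate to them for 1-D inputs:

```python
import torch

x = torch.arange(1., 10.)

stacked = torch.stack((x, x))        # new dim 0 -> shape (2, 9)
concatenated = torch.cat((x, x))     # existing dim 0 -> shape (18,)
print(stacked.shape, concatenated.shape)

# for 1-D inputs, vstack matches stack along dim 0 and hstack matches cat along dim 0
print(torch.equal(torch.vstack((x, x)), stacked))
print(torch.equal(torch.hstack((x, x)), concatenated))
```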

In [63]:
print(x)
z = x.view(x.shape)     # not a copy: z is a view that shares x's memory, so changing z also changes x
z += 1
print(x)

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])


In [64]:
print(z)
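y = z                   # not a copy either: y and z are two names for the same tensor
z += 1                  # in-place: the change is visible through y (and x)
print(y)        
z = z + 1               # out-of-place: creates a new tensor, breaking the link between z and y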
print(y)            
z += 1
print(y)

tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
tensor([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])
tensor([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])
tensor([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])


In [65]:
print(x)
xv = x.view(3,3)    # xv is a view of x (shared memory)
xv += 1             # modify the view in place
x, xv               # both show the change

tensor([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])


(tensor([ 4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]),
 tensor([[ 4.,  5.,  6.],
         [ 7.,  8.,  9.],
         [10., 11., 12.]]))
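Since `view` (and plain assignment) never copy, an independent copy needs `clone()`. Comparing `data_ptr()`, the memory address of a tensor's first element, is one way to check whether two tensors share memory. A sketch:

```python
import torch

x = torch.arange(1., 10.)

v = x.view(3, 3)    # view: shares memory with x
c = x.clone()       # clone: independent copy with its own memory
print(v.data_ptr() == x.data_ptr())   # True
print(c.data_ptr() == x.data_ptr())   # False

c += 100            # modifying the clone leaves x untouched
v += 1              # modifying the view changes x as well
print(x)
print(c)
```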

In [66]:
# squeeze example 1
v = torch.tensor([[1,2,3]])
v, v.size(), v.squeeze()

(tensor([[1, 2, 3]]), torch.Size([1, 3]), tensor([1, 2, 3]))

In [67]:
# squeeze example 2
w = torch.tensor([[1],
                  [2],
                  [3]])
w, w.size(), w.squeeze()

(tensor([[1],
         [2],
         [3]]),
 torch.Size([3, 1]),
 tensor([1, 2, 3]))
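`squeeze()` without arguments removes every dimension of size 1, which can be surprising, for example when a batch containing a single sample loses its batch dimension. Passing `dim` limits it to one dimension:

```python
import torch

t = torch.zeros(1, 3, 1)

print(t.squeeze().shape)    # torch.Size([3]): every size-1 dim removed
print(t.squeeze(0).shape)   # torch.Size([3, 1]): only dim 0 removed
print(t.squeeze(1).shape)   # torch.Size([1, 3, 1]): dim 1 has size 3, so nothing changes
```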

In [68]:
# unsqueeze - adds a dimension of size 1 to a target tensor at a specific dim
vs = v.squeeze()
print(f"vs = {vs}, shape = {vs.shape}")
print(f"vs.unsqueeze(dim=0) = {vs.unsqueeze(dim=0)}, shape = {vs.unsqueeze(dim=0).shape}")
print(f"vs.unsqueeze(dim=1) = \n{vs.unsqueeze(dim=1)}, shape = {vs.unsqueeze(dim=1).shape}")

vs = tensor([1, 2, 3]), shape = torch.Size([3])
vs.unsqueeze(dim=0) = tensor([[1, 2, 3]]), shape = torch.Size([1, 3])
vs.unsqueeze(dim=1) = 
tensor([[1],
        [2],
        [3]]), shape = torch.Size([3, 1])


In [69]:
# torch.permute rearranges the dimensions of a target tensor in a specified order
y = torch.rand(size=[2,3,4])
print(f"y.shape = \n{y.shape}")
print(f"torch.permute(y, [2,0,1]).shape = \n{torch.permute(y, [2,0,1]).shape}")

y.shape = 
torch.Size([2, 3, 4])
torch.permute(y, [2,0,1]).shape = 
torch.Size([4, 2, 3])
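A common use of `permute` is converting an image from the height, width, color channels layout used earlier to the channels-first layout that many PyTorch vision models (such as those in torchvision) expect, then adding a batch dimension with `unsqueeze`. A sketch with a random tensor standing in for a real image:

```python
import torch

image_hwc = torch.rand(size=(224, 224, 3))   # height, width, color channels
image_chw = image_hwc.permute(2, 0, 1)       # color channels, height, width
batch = image_chw.unsqueeze(dim=0)           # add a batch dim: (1, 3, 224, 224)

print(image_chw.shape, batch.shape)
# permute returns a view: the same pixel value is reachable through both layouts
print(image_hwc[10, 20, 1] == image_chw[1, 10, 20])
```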


## Indexing (similar to NumPy)

Indexing PyTorch tensors works in a very similar way to indexing NumPy arrays.

In [70]:
y_p = torch.permute(y, [2, 0, 1])   # permute returns a view: writing to y_p also writes to y
print(f"y[0, 0, 0] = {y[0, 0, 0]}")
y_p[0, 0, 0] = -1
print(f"y[0, 0, 0] = {y[0, 0, 0]}")

y[0, 0, 0] = 0.7797583341598511
y[0, 0, 0] = -1.0


In [71]:
p = torch.arange(1,10).reshape([1,3,3])
p

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [72]:
# index one dimension at a time
p[0], p[0, 0], p[0, 0, 0]

(tensor([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]),
 tensor([1, 2, 3]),
 tensor(1))

In [73]:
# chained indexing selects the same element as comma-separated indexing
p[0, 0, 0] == p[0][0][0]

tensor(True)

In [74]:
# colon (:) selects every element along a dimension
p[0,:,1]

tensor([2, 5, 8])

In [75]:
# beware: equal values, but different shapes
p[:, 0, 0], p[0, 0, 0], p[:, 0, 0] == p[0, 0, 0], p[:, 0, 0].shape, p[0, 0, 0].shape, p[:, 0, 0].shape == p[0, 0, 0].shape

(tensor([1]),
 tensor(1),
 tensor([True]),
 torch.Size([1]),
 torch.Size([]),
 False)
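To see the NumPy resemblance directly, the same index expressions can be applied to `p` and to a NumPy copy of it. In this sketch, negative indices, step slicing (positive steps only on the PyTorch side) and boolean masks all behave the same way:

```python
import numpy as np
import torch

p = torch.arange(1, 10).reshape([1, 3, 3])
p_np = p.numpy()

print(p[0, -1], p_np[0, -1])               # last row
print(p[0, ::2, ::2], p_np[0, ::2, ::2])   # the four corners
print(p[p > 5], p_np[p_np > 5])            # elements greater than 5
print(np.array_equal(p[p > 5].numpy(), p_np[p_np > 5]))  # True
```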

## PyTorch and NumPy

PyTorch provides functions to convert data between NumPy and PyTorch:

- Convert from the datatype of one library to another
  - convert from tensor to numpy: `torch.Tensor.numpy()`
  - convert from numpy to tensor: `torch.from_numpy()`

In [76]:
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [77]:
# torch.from_numpy keeps NumPy's default dtype (float64); tensors created directly in PyTorch default to float32 unless specified otherwise
another_tensor = torch.arange(1.0, 8.0)
yet_another = another_tensor.type(torch.float32)
array.dtype, tensor.dtype, another_tensor.dtype, yet_another.dtype

(dtype('float64'), torch.float64, torch.float32, torch.float32)

In [78]:
# tensor to NumPy: the dtype is preserved (float32 stays float32)
tensor = torch.ones(7)
n_tensor = tensor.numpy()
tensor, n_tensor.dtype

(tensor([1., 1., 1., 1., 1., 1., 1.]), dtype('float32'))

## Reproducibility (trying to take randomness out of random)

In short, neural networks start with random numbers and use tensor operations to update those numbers so that they better represent the data.
This is repeated over and over until the data representation becomes meaningful.

To reduce the randomness in neural networks and PyTorch, we use a **random seed**.

The random seed "flavours" the randomness: the numbers still look random, but the same seed always produces the same sequence. This is called pseudo-randomness.

In [79]:
A = torch.rand(3, 4)
B = torch.rand_like(A)

A, B, A == B

(tensor([[0.4681, 0.2717, 0.9445, 0.3037],
         [0.4831, 0.8655, 0.5243, 0.7733],
         [0.9188, 0.8339, 0.1196, 0.1071]]),
 tensor([[0.6141, 0.3936, 0.0040, 0.5842],
         [0.4539, 0.3378, 0.8914, 0.5789],
         [0.9347, 0.7731, 0.2527, 0.5470]]),
 tensor([[False, False, False, False],
         [False, False, False, False],
         [False, False, False, False]]))

In [80]:
# random but reproducible tensors
# set the random seed
torch.manual_seed(42)
C = torch.rand(3, 4)
D = torch.rand_like(C)

C, D, C == D


(tensor([[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]]),
 tensor([[0.8694, 0.5677, 0.7411, 0.4294],
         [0.8854, 0.5739, 0.2666, 0.6274],
         [0.2696, 0.4414, 0.2969, 0.8317]]),
 tensor([[False, False, False, False],
         [False, False, False, False],
         [False, False, False, False]]))

In [81]:
torch.manual_seed(42)   # set the manual seed
C = torch.rand(3, 4)
torch.manual_seed(42)   # set it again
D = torch.rand_like(C)

C, D, C == D

# seeding resets the random number generator: C == D only because the seed is set again right before creating D

(tensor([[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]]),
 tensor([[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]]),
 tensor([[True, True, True, True],
         [True, True, True, True],
         [True, True, True, True]]))
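`torch.manual_seed` sets the global random state. If only some calls should be reproducible, an alternative is to pass an explicit `torch.Generator` through the `generator` argument that `torch.rand` accepts. A sketch, where the helper name `seeded_rand` is hypothetical:

```python
import torch

def seeded_rand(seed: int, *shape: int) -> torch.Tensor:
    """Hypothetical helper: random numbers from a private, freshly seeded generator."""
    g = torch.Generator().manual_seed(seed)
    return torch.rand(*shape, generator=g)

C = seeded_rand(42, 3, 4)
D = seeded_rand(42, 3, 4)
print(torch.equal(C, D))   # True: same seed, same generator state
```

Because the generator is private, these calls leave the global random state untouched.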

## Running tensors and PyTorch objects on GPUs and making faster computations

GPUs = faster computation on numbers, thanks to CUDA + NVIDIA hardware + PyTorch working behind the scenes.

### Getting GPUs (in order of preference / size of requirements)

1. Use Google Colab to get GPUs for free
2. Use your own GPU, which needs to be set up (hardware and software)
   1. Hardware: see Tim Dettmers' blog post on choosing GPUs for deep learning
3. Use cloud computing (computer rental), such as
   1. GCP (Google)
   2. AWS (Amazon)
   3. Azure (Microsoft)

For options 2 and 3, setting up PyTorch + GPU drivers (CUDA) takes a little bit of work; see the PyTorch documentation.
- It's not so complicated, but *you do* need the resources, which I currently don't :'(

In [82]:
# check for GPU access with PyTorch
torch.cuda.is_available()

False

In [83]:
# setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

# check out the PyTorch documentation on device-agnostic code

'cpu'

In [84]:
# count the number of GPUs available
torch.cuda.device_count()

0

## Putting tensors (and models) on the GPU

(because using a GPU results in faster calculations)

In [85]:
# creating a tensor in default device

t = torch.tensor([1, 2, 3])

t, t.device

(tensor([1, 2, 3]), device(type='cpu'))

In [86]:
# move tensor to the target device
device = "cuda" if torch.cuda.is_available() else "cpu"
t_on_device = t.to(device)

t_on_device, t_on_device.device

# the tensor moves to the GPU only if one is available; otherwise it stays on the CPU

(tensor([1, 2, 3]), device(type='cpu'))
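Instead of creating a tensor on the CPU and moving it, tensors can also be created directly on the target device with the `device` argument. A device-agnostic sketch that falls back to the CPU when no GPU is available:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.rand(2, 3, device=device)   # created directly on the device
b = torch.ones(3, 2).to(device)       # created on the CPU, then moved
c = a @ b                             # both operands must be on the same device
print(c.device.type == device)        # True
```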

## Moving a tensor to the CPU

In [87]:
# to convert a tensor to NumPy, it must first be on the CPU (NumPy cannot hold GPU tensors)
t_on_cpu = t.to("cpu")  # or t.cpu(); the GPU counterpart is t.cuda() (there is no t.gpu())

a = t_on_cpu.numpy()

t_on_cpu, a

(tensor([1, 2, 3]), array([1, 2, 3], dtype=int64))
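One more thing can block `.numpy()`: a tensor that tracks gradients (`requires_grad=True`) raises an error, so a common pattern is `.detach().cpu().numpy()`. A sketch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.rand(3, device=device, requires_grad=True)

numpy_failed = False
try:
    weights.numpy()                      # fails: requires grad (and may live on a GPU)
except (RuntimeError, TypeError) as err:
    numpy_failed = True
    print(type(err).__name__, err)

array = weights.detach().cpu().numpy()   # stop tracking gradients, move to CPU, convert
print(array.shape, array.dtype)          # (3,) float32
```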