## 00. PyTorch Fundamentals

These are my notes from following PyTorch Zero To Mastery. My goal is to become comfortable with using PyTorch!
Since the course runs on Google Colab, I'll try it out; I've been working on Kaggle so far.


Under Runtime > Change runtime type you can pick a GPU.
Running nvidia-smi then shows the GPU specs and details such as the CUDA version.

In [None]:
!nvidia-smi
### For TPU Info...
# !tpu-info

Sun Sep  1 16:28:03 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

Common libraries that you can import right away:

In [1]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.4.0+cu121


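CUDA 12.1, it seems: the "+cu121" suffix means this PyTorch build targets CUDA 12.1, while nvidia-smi above reports the highest CUDA version the driver supports (12.2).
If you go to the PyTorch website, the install page tells you which compute platforms (CUDA versions) are available for the latest release.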

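CUDA is NVIDIA's platform and programming interface for running general-purpose code on the GPU!
A TPU (Tensor Processing Unit) is Google's accelerator chip for machine-learning workloads.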
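
To check from Python what the notes above describe, here is a minimal sketch. It only reads values PyTorch already exposes; the variable names are mine, and the printed values depend on your machine.

```python
import torch

# CUDA version this PyTorch build was compiled against (None on CPU-only builds).
build_cuda = torch.version.cuda
# True only if a CUDA-capable GPU and a working driver are visible at runtime.
has_gpu = torch.cuda.is_available()

print(f"PyTorch {torch.__version__}, built for CUDA: {build_cuda}, GPU visible: {has_gpu}")
if has_gpu:
    # Name of the first visible GPU, e.g. the Tesla T4 that Colab hands out.
    print("GPU name:", torch.cuda.get_device_name(0))
```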

### Tensors
Now let's play with PyTorch tensors.
Tensors are the main data structure we handle in neural networks: multi-dimensional arrays of numbers that represent some piece of data.

In [None]:
# scalar
scalar = torch.tensor(7)  # a rank-0 (zero-dimensional) tensor
scalar

tensor(7)

Let's see tensor attributes

In [None]:
# Check dimensions
scalar.ndim

0

In [None]:
# Get tensor as python int
scalar.item()

7

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [None]:
vector.ndim

1

In [None]:
vector.shape

torch.Size([2])

In [None]:
# MATRIX
MATRIX = torch.tensor([[7, 8]])
MATRIX

tensor([[7, 8]])

In [None]:
MATRIX_2 = torch.tensor([[7,8], [5, 6]])
MATRIX_2

tensor([[7, 8],
        [5, 6]])

In [None]:
MATRIX.ndim

2

In [None]:
MATRIX.shape

torch.Size([1, 2])

In [None]:
MATRIX_2.ndim

2

In [None]:
MATRIX_2.shape

torch.Size([2, 2])

In [None]:
# TENSOR
TENSOR = torch.tensor([[[1,2,3],
                        [4,5,6],
                        [7,8,9]],
                       [[10,11,12],
                        [13,14,15],
                        [16,17,18]]])
TENSOR

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]])

In [None]:
TENSOR.ndim

3

In [None]:
TENSOR.shape

torch.Size([2, 3, 3])

You'll notice in the examples below that tensors must be rectangular: every nested list at the same depth needs the same length. I can't just create any tensor; the shape has to be consistent.

In [None]:
T2 = torch.tensor([[1, 2, 3], [[4, 5, 6], [7, 8, 9]]])
T2

ValueError: expected sequence of length 3 at dim 1 (got 2)

Same below, we can't just have any shape

In [None]:
T3 = torch.tensor([[[4, 5, 6]], [[7, 8]]])
T3

ValueError: expected sequence of length 3 at dim 2 (got 2)
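
A small sketch of how the two errors above could be handled in code. The helper name try_tensor is hypothetical (it is not a PyTorch function); it just wraps torch.tensor and reports ragged input instead of crashing.

```python
import torch

def try_tensor(data):
    """Hypothetical helper: return a tensor, or None if the nested lists are ragged."""
    try:
        return torch.tensor(data)
    except ValueError as err:
        print("Cannot build tensor:", err)
        return None

ragged = try_tensor([[[4, 5, 6]], [[7, 8]]])     # inner lists have lengths 3 and 2
rect = try_tensor([[[4, 5, 6]], [[7, 8, 9]]])    # every inner list has length 3
print(ragged, None if rect is None else rect.shape)
```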

In [None]:
T4 = torch.tensor([[1, 2, 3], [4, 5, 6]])
T5 = torch.tensor([[[1, 2, 3], [4, 5, 6]]])

In [None]:
display(T5)
display(T5.shape)
display(T5.ndim)

tensor([[[1, 2, 3],
         [4, 5, 6]]])

torch.Size([1, 2, 3])

3

In [None]:
T6 = torch.tensor([[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]])

In [None]:
display(T6)
display(T6.ndim)
display(T6.shape)

tensor([[[ 1,  2,  3,  4,  5,  6],
         [ 7,  8,  9, 10, 11, 12]]])

3

torch.Size([1, 2, 6])

### Random Tensors
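So just to mention a convention: matrix and tensor variables go uppercase, scalars and vectors go lowercase.

We use random tensors at initialization because identical starting values (all zeros, for example) would make every unit learn the same thing. Note that torch.rand draws uniform values in [0, 1); real weight initializers usually scale the values to be small, which helps avoid vanishing or exploding gradients.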
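
To see the difference between the random functions, here is a minimal sketch. The 0.01 scale factor is just an illustrative assumption, not the initializer any particular layer uses.

```python
import torch

torch.manual_seed(0)
uniform = torch.rand(1000)         # uniform samples in [0, 1)
normal = torch.randn(1000)         # standard normal: mean 0, std 1
small = torch.randn(1000) * 0.01   # scaled down so values sit close to 0

print("rand range: ", uniform.min().item(), uniform.max().item())
print("randn range:", normal.min().item(), normal.max().item())
print("scaled max: ", small.abs().max().item())
```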

In [None]:
# Create a random tensor of size / shape (3, 4)
random_tensor = torch.rand((3, 4))
display(random_tensor)
display(random_tensor.ndim)
display(random_tensor.shape)

rt2 = torch.rand(3, 4)
display(rt2)
display(rt2.ndim)
display(rt2.shape)

tensor([[0.3799, 0.9738, 0.2588, 0.1450],
        [0.7659, 0.1005, 0.7388, 0.8655],
        [0.2948, 0.9965, 0.8677, 0.0887]])

2

torch.Size([3, 4])

tensor([[0.1988, 0.6420, 0.5964, 0.0162],
        [0.6406, 0.6896, 0.1900, 0.1341],
        [0.6431, 0.9740, 0.1846, 0.1727]])

2

torch.Size([3, 4])

In [None]:
%%capture
rt3 = torch.rand(10, 10, 10)
rt3

In [None]:
random_image_size_tensor = torch.rand(size=(3, 224, 224))
random_image_size_tensor.shape

torch.Size([3, 224, 224])

### Zeros and ones

In [None]:
zeros = torch.zeros(size=(3,4))
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [None]:
ones = torch.ones(size=(3, 4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [None]:
#Default is float32, unless you explicitly define one!
ones.dtype

torch.float32

### Torch Range
torch.range is deprecated and replaced by "arange". Note that arange excludes the end value.

In [None]:
torch.__version__

'2.4.0+cu121'

In [None]:
torch.arange(1,11)

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [None]:
torch.arange(0, 10)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
torch.arange(20, 30)

tensor([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [None]:
one_to_ten = torch.arange(start = 1, end = 10, step=1)  # end is exclusive, so this actually holds 1..9
one_to_ten

tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
something_else = torch.arange(start = 0, end = 100, step = 10)
something_else

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
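
Since arange stops before the end value, here is a short sketch of how to include the endpoint. The variable names are mine.

```python
import torch

one_to_nine = torch.arange(start=1, end=10, step=1)   # end is exclusive
up_to_ten = torch.arange(start=1, end=11, step=1)     # bump end by one step to include 10
evens = torch.arange(0, 10, 2)                        # step of 2

print(one_to_nine)
print(up_to_ten)
print(evens)
```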

In [None]:
# Creating tensors like another tensor
#   zeros_like builds a tensor with the same shape (and dtype) as its input, so you don't have to specify the shape explicitly.
some_tensor_copying_shape = torch.zeros_like(input=something_else)
display(some_tensor_copying_shape)
display(some_tensor_copying_shape.shape)
display(some_tensor_copying_shape.ndim)

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

torch.Size([10])

1

In [None]:
one_to_ten_reshape = one_to_ten.reshape(shape=(1, 9))
display(one_to_ten_reshape)
display(one_to_ten_reshape.shape)
display(one_to_ten_reshape.ndim)

tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

torch.Size([1, 9])

2

### Tensor Data Types


In [None]:
float_32_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=None)
display(float_32_tensor)
display(float_32_tensor.dtype)
display(float_32_tensor.shape)
display(float_32_tensor.ndim)

tensor([3., 6., 9.])

torch.float32

torch.Size([3])

1

Passing dtype=None lets PyTorch infer the type from the data. Python floats map to the default floating-point type, which is float32.
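
A quick sketch of that inference at work when no dtype is given. The variable names are mine.

```python
import torch

floats = torch.tensor([3.0, 6.0, 9.0])   # Python floats -> default float dtype
ints = torch.tensor([3, 6, 9])           # Python ints   -> torch.int64
bools = torch.tensor([True, False])      # Python bools  -> torch.bool

print(torch.get_default_dtype())
print(floats.dtype, ints.dtype, bools.dtype)
```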

In [None]:
tensorx = torch.tensor([3.0, 6.0, 9.0],
                       dtype=torch.float16, # Tensor DataType
                       device=None,         # Device the tensor is stored in (active memory)
                       requires_grad=False) # Track gradients through this tensor's operations
tensorx

tensor([3., 6., 9.], dtype=torch.float16)

These types have to do with precision. I once saw that you can train models using less RAM by using lower precision. I think it was at a LLaMA conference. The model may lose some accuracy, but it's possible.

- Single precision: 32 bits (float32)
- Half precision: 16 bits (float16)

Be mindful of datatypes, since they are one of the 3 main problems you'll run into:
- Tensors are not the right datatype
- Tensors are not the right shape
- Tensors are not on the right device


### Device
By default it's the CPU.
For an NVIDIA GPU you choose "cuda". TPUs are not addressed through "cuda"; as far as I know they go through a separate XLA backend.
If you try to do operations with tensors that are not on the same device, you'll usually get an error.
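
A minimal sketch of keeping tensors on the same device. It falls back to the CPU when no GPU is visible, so it should run anywhere.

```python
import torch

# Pick the GPU if one is visible, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.tensor([1.0, 2.0, 3.0], device=device)  # created directly on `device`
b = torch.tensor([4.0, 5.0, 6.0])                 # created on the CPU by default
b = b.to(device)                                  # move b next to a before combining them

total = a + b
print(total, total.device)
```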

### Require Grad
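requires_grad tells PyTorch's autograd to record the operations done on a tensor, so that gradients can be computed for it later.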
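
A tiny sketch of what requires_grad=True switches on, using y = x^2 so the gradient is easy to check by hand (dy/dx = 2x).

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # autograd records operations on x
y = x ** 2                                 # y = x^2
y.backward()                               # compute dy/dx and store it in x.grad

print(x.grad)  # dy/dx = 2 * 3 = 6
```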

In [None]:
### Convert dtype
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [None]:
float_mult = float_16_tensor * float_32_tensor
float_mult

tensor([ 9., 36., 81.])

### Multiplying data types
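Mixing dtypes results in no error here: PyTorch promotes the result to the wider type (float16 * float32 gives float32, int32 * float32 gives float32). The float16 operand has already lost precision, though, so results may differ slightly.
It's important to know those differences may add up over time.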
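
A sketch that makes the resulting dtypes visible. torch.promote_types reports which type a pair of dtypes would be promoted to; the tensor names are mine.

```python
import torch

half = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
single = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
ints = torch.tensor([3, 6, 9], dtype=torch.int32)

print((half * single).dtype)   # result takes the wider float type
print((ints * single).dtype)   # integer * float gives a float result
print(torch.promote_types(torch.float16, torch.float32))
print(torch.promote_types(torch.int32, torch.float32))
```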

In [None]:
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.int32)
int_32_tensor

tensor([3, 6, 9], dtype=torch.int32)

In [None]:
float_int_mult = int_32_tensor * float_32_tensor
float_int_mult

tensor([ 9., 36., 81.])

In [None]:
attempt_tensor = torch.tensor([3, 6, 9], dtype=torch.long)

In [None]:
display(tensorx.dtype)
display(tensorx.shape)
display(tensorx.device)

torch.float16

torch.Size([3])

device(type='cpu')

In [None]:
def tensorprint(tensor):
    print(f"tensor: {tensor}, shape: {tensor.shape}, device: {tensor.device}")

tensorprint(tensorx)

tensor: tensor([3., 6., 9.], dtype=torch.float16), shape: torch.Size([3]), device: cpu


## Manipulate Tensors

Tensor operations.
- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

In [None]:
# Addition (element-wise)
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiplication (element-wise)
tensor = torch.tensor([1, 2, 3])
tensor = tensor * 10
tensor

tensor([10, 20, 30])

In [None]:
# Subtraction
tensor = torch.tensor([1, 2, 3])
tensor -= 10
tensor

tensor([-9, -8, -7])

In [None]:
tensor = torch.tensor([1, 2, 3])
tensor = torch.mul(tensor, 10)
tensor

tensor([10, 20, 30])

### Matrix Multiplication
There are two common ways to multiply tensors:
1. Element-wise
2. Matrix Multiplication

Matrix Multiplication is perhaps the most common operation used in DL.

The dot product, also called the scalar product...
I remember this one. The rule is that the number of columns of the first must match the number of rows of the second to be able to multiply them.

(m x n) . (n x k) -> (m x k)
so "n" matches in both.
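
The shape rule above can be sketched with tensors of ones, where m = 2, n = 3 and k = 4 are arbitrary sizes picked for the example.

```python
import torch

A = torch.ones(2, 3)   # m x n
B = torch.ones(3, 4)   # n x k
C = torch.matmul(A, B) # inner dimensions (3 and 3) match, result is m x k

print(C.shape)
print(C)  # every entry is the sum of three 1 * 1 products
```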

In [None]:
# Matrix Multiplication
tensor1 = torch.tensor([1, 2, 3])
tensor2 = torch.matmul(tensor1, tensor1)
tensor2

tensor(14)

Because these two are rank-1 tensors, matmul treats them as vectors and returns their dot product, so there's no problem multiplying them. With rank-2 tensors, however, the shape rule applies, and the example below fails: in (1 x 3) . (1 x 3) the inner dimensions are 3 and 1.

In [None]:
tensor3 = torch.tensor([[1, 2, 3]])
display(tensor3.shape)
tensor4 = torch.matmul(tensor3, tensor3)

torch.Size([1, 3])

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3 and 1x3)
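
One way to make that multiplication valid is to transpose one side so the inner dimensions line up. A minimal sketch (the names row, inner and outer are mine):

```python
import torch

row = torch.tensor([[1, 2, 3]])        # shape (1, 3)

inner = torch.matmul(row, row.T)       # (1, 3) . (3, 1) -> (1, 1)
outer = torch.matmul(row.T, row)       # (3, 1) . (1, 3) -> (3, 3)

print(inner)  # tensor([[14]])
print(outer)
```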

Note that matmul is not commutative: the result depends on the order of the operands.

In [None]:
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])
tensor2 = torch.tensor([[10, 11, 12],
                        [13, 14, 15],
                        [16, 17, 18]])

tensorMatMul1 = torch.matmul(tensor1, tensor2)
tensorMatMul2 = torch.matmul(tensor2, tensor1)
display(tensorMatMul1)
display(tensorMatMul2)

tensor([[ 84,  90,  96],
        [201, 216, 231],
        [318, 342, 366]])

tensor([[138, 171, 204],
        [174, 216, 258],
        [210, 261, 312]])

PyTorch and similar libraries use vectorization to make these operations much faster than Python for loops.

In [None]:
%%time
value = 0
tensor = torch.tensor([1, 2, 3])
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
print(value)

tensor(14)
CPU times: user 2.69 ms, sys: 979 µs, total: 3.67 ms
Wall time: 3.54 ms


In [None]:
%%time
tensor = torch.tensor([1, 2, 3])
print(torch.matmul(tensor, tensor))
# Another form is tensor @ tensor, which is equivalent to torch.matmul.

tensor(14)
CPU times: user 828 µs, sys: 0 ns, total: 828 µs
Wall time: 838 µs


Do note that "print" also adds time. At least it is so on LeetCode, where I'm assuming it takes quite a toll.
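
With only three elements the timings above are mostly overhead. Here is a sketch that compares both approaches on a larger vector without printing inside the timed code. The size 2000 and the helper name loop_dot are my own choices, and the actual timings will vary by machine.

```python
import timeit
import torch

x = torch.rand(2000)

def loop_dot(t):
    """Dot product of t with itself using a plain Python loop."""
    total = torch.tensor(0.0)
    for i in range(len(t)):
        total += t[i] * t[i]
    return total

# Average over a few runs; nothing is printed inside the timed code.
loop_s = timeit.timeit(lambda: loop_dot(x), number=3) / 3
vec_s = timeit.timeit(lambda: torch.matmul(x, x), number=3) / 3

print(f"loop:   {loop_s * 1e3:.3f} ms")
print(f"matmul: {vec_s * 1e3:.3f} ms")
```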

### MATRIX MULTIPLICATION RULES

1. Shape Rule
What must match are the INNER DIMENSIONS. That's right, plural.

So I asked ChatGPT about this. Here's the reasoning it provided:
"The last dimension of the first tensor must match the second-to-last dimension of the second tensor".

A.shape = (3, 2, 3)
B.shape = (2, 3, 5)

The last two dimensions of Tensor A are 2 x 3 (where 3 is the last dimension of A)
The last two dimensions of Tensor B are 3 x 5 (where 3 is the second-to-last dimension of B)

To multiply these tensors, the INNER DIMENSIONS that must match are
1. The last dimension of A
2. The second-to-last dimension of B.

The resulting shape is based on the outer dimensions:
1. The outer dimensions of A: (3, 2)
2. The outer dimension of B: (5)


So going back to the rules specified by the course:
1. The inner dimensions must match
2. The resulting matrix has the shape of the outer dimensions


However, there's still one more trick up the sleeve. Look at this:

In [None]:
tensor1 = torch.rand(size=(4, 2, 3))
tensor2 = torch.rand(size=(2, 3, 5))
display(tensor1.shape)
display(tensor2.shape)
tensor3 = torch.matmul(tensor1, tensor2)
tensor3.shape

torch.Size([4, 2, 3])

torch.Size([2, 3, 5])

RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 0

4 and 2 don't match. But didn't we just say that what mattered were the inner dimensions? Well, they lied to me too.

ChatGPT said that the batch size matters too.
The batch size is the number of examples, and the first dimension of a tensor is usually reserved for it. matmul treats every dimension before the last two as a batch dimension, and the batch dimensions of both tensors must either match or be broadcastable (one of them is 1 or missing) in order to multiply them together. Quite interesting.

Batch Dimension Issue:

"The remaining dimensions of Tensor A and Tensor B need to be aligned for batch operations.
Tensor A has the shape (3,2,3), so the first dimension (3) should match the corresponding batch dimension in Tensor B.
Tensor B has the shape (2,3,5), so the first dimension here is 2, which doesn’t match Tensor A’s first dimension."
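
A sketch of the batch rule as I understand it: batch dimensions work when they are equal, when one of them is 1, or when one tensor has no batch dimension at all. The variable names are mine.

```python
import torch

a = torch.rand(4, 2, 3)

same_batch = torch.rand(4, 3, 5)   # batch sizes match (4 and 4)
one_batch = torch.rand(1, 3, 5)    # batch size 1 is broadcast to 4
no_batch = torch.rand(3, 5)        # a plain matrix is reused for every batch item

for b in (same_batch, one_batch, no_batch):
    print(tuple(b.shape), "->", tuple(torch.matmul(a, b).shape))

# Batch sizes 4 and 2 cannot be broadcast, so this raises a RuntimeError.
try:
    torch.matmul(a, torch.rand(2, 3, 5))
    mismatch_failed = False
except RuntimeError as err:
    mismatch_failed = True
    print("RuntimeError:", err)
```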

In [None]:
tensor1 = torch.rand(size=(4, 2, 3))
tensor2 = torch.rand(size=(4, 3, 5))
display(tensor1.shape)
display(tensor2.shape)
tensor3 = torch.matmul(tensor1, tensor2)
display(tensor3.shape)


torch.Size([4, 2, 3])

torch.Size([4, 3, 5])

torch.Size([4, 2, 5])

### Beware torch.mm()

Do note torch.mm() exists, but it's not exactly the same as Matmul.
It's for matrices only. It can't broadcast. [See here](https://stackoverflow.com/a/73941114).
The code below gives an error, for example.

In [None]:
tensor4 = torch.mm(tensor1, tensor2)
display(tensor4.shape)

RuntimeError: self must be a matrix

However the code below does work.

In [None]:
tensor5 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor6 = torch.tensor([[7, 8],
                        [9, 10],
                        [11, 12]])
tensor7 = torch.mm(tensor5, tensor6)
tensor7.shape

torch.Size([2, 2])
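
For batches of matrices there is also torch.bmm, which expects two 3-D tensors with the same batch size and, like torch.mm, does not broadcast. A minimal sketch comparing it with matmul (the variable names are mine):

```python
import torch

batch_a = torch.rand(4, 2, 3)
batch_b = torch.rand(4, 3, 5)

out_bmm = torch.bmm(batch_a, batch_b)        # strict: 3-D inputs, same batch size
out_matmul = torch.matmul(batch_a, batch_b)  # general: also handles broadcasting

print(out_bmm.shape)
print(torch.allclose(out_bmm, out_matmul))
```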

## Transpose
You can manipulate shape with this. reshape() also exists!
The way transpose changes the shape is quite unintuitive at first, so it requires some practice. Though it will relieve the reader to know that most of it is already done for you in the built-in models and APIs.

In [None]:
tensor1 = torch.rand(size=(4, 2, 3))
tensor2 = torch.rand(size=(2, 3, 5))

tensor1T = tensor1.T
display(tensor1.shape)
display(tensor1T.shape)
display(tensor1)
display(tensor1T)


torch.Size([4, 2, 3])

torch.Size([3, 2, 4])

tensor([[[0.3667, 0.0947, 0.9330],
         [0.4262, 0.0107, 0.4054]],

        [[0.6728, 0.0431, 0.4432],
         [0.6445, 0.0532, 0.2225]],

        [[0.6918, 0.9444, 0.7496],
         [0.2251, 0.8825, 0.6611]],

        [[0.9980, 0.0872, 0.4365],
         [0.2493, 0.1969, 0.9101]]])

tensor([[[0.3667, 0.6728, 0.6918, 0.9980],
         [0.4262, 0.6445, 0.2251, 0.2493]],

        [[0.0947, 0.0431, 0.9444, 0.0872],
         [0.0107, 0.0532, 0.8825, 0.1969]],

        [[0.9330, 0.4432, 0.7496, 0.4365],
         [0.4054, 0.2225, 0.6611, 0.9101]]])

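Wow, there's even a warning that using .T on tensors that are not 2-dimensional is going to be deprecated. For batches of matrices, .mT or transpose(dim0, dim1) does the job instead.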
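
A sketch of the alternatives to .T for a 3-D tensor: .mT swaps only the last two dimensions, while permute can reverse all of them (which is what .T did above).

```python
import torch

x = torch.rand(4, 2, 3)

batch_transposed = x.mT            # swap the last two dims: (4, 3, 2)
reversed_dims = x.permute(2, 1, 0) # reverse every dim:      (3, 2, 4)

print(batch_transposed.shape)
print(reversed_dims.shape)
print(torch.equal(batch_transposed, x.transpose(-2, -1)))
```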

In [None]:
tensor1 = torch.tensor([[[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]],
                      [[10, 11, 12],
                       [13, 14, 15],
                       [16, 17, 18]]])


tensor1T = tensor1.transpose(1, -1)
display(tensor1)
display(tensor1.shape)
display(tensor1T.shape)
display(tensor1T)

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]])

torch.Size([2, 3, 3])

torch.Size([2, 3, 3])

tensor([[[ 1,  4,  7],
         [ 2,  5,  8],
         [ 3,  6,  9]],

        [[10, 13, 16],
         [11, 14, 17],
         [12, 15, 18]]])

In [None]:
tensor2 = torch.tensor([[[1, 2],
                         [3, 4],
                         [5, 6]],
                        [[7, 8],
                         [9, 10],
                         [11, 12]]])
tensor2T = tensor2.transpose(-1, 1)
display(tensor2)
display(tensor2.shape)
display(tensor2T.shape)
display(tensor2T)

tensor([[[ 1,  2],
         [ 3,  4],
         [ 5,  6]],

        [[ 7,  8],
         [ 9, 10],
         [11, 12]]])

torch.Size([2, 3, 2])

torch.Size([2, 2, 3])

tensor([[[ 1,  3,  5],
         [ 2,  4,  6]],

        [[ 7,  9, 11],
         [ 8, 10, 12]]])

## Tensor Aggregation
Find the max, min, mean, sum, and others...

In [None]:
x = torch.arange(1, 123, 11)
x

tensor([  1,  12,  23,  34,  45,  56,  67,  78,  89, 100, 111, 122])

In [None]:
torch.min(x), x.min()

(tensor(1), tensor(1))

In [None]:
torch.max(x), x.max()

(tensor(122), tensor(122))

In [None]:
torch.mean(x)

RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long

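We've now found our first error based on datatype.
What's happening? The tensor "x" has dtype int64 (long), and mean() is only defined for floating-point or complex inputs.
The mean of integers is generally not an integer, so the OUTPUT can't be int64 like the input. We must either convert x to a float type or specify which output dtype we want for the mean.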

In [None]:
torch.mean(x, dtype=torch.float32)

tensor(61.5000)
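
Converting the tensor first gives the same result as passing dtype. A short sketch of the three spellings I know of:

```python
import torch

x = torch.arange(1, 123, 11)                    # int64 tensor

mean_a = torch.mean(x, dtype=torch.float32)     # cast as part of the mean
mean_b = x.type(torch.float32).mean()           # convert first, then average
mean_c = x.float().mean()                       # .float() is shorthand for float32

print(mean_a, mean_b, mean_c)
```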

In [None]:
torch.sum(x), x.sum()

(tensor(738), tensor(738))

### Positional min and max
argmin() and argmax() return the index position of the min/max value.

In [None]:
x.argmin()

tensor(0)

In [None]:
x.argmax()

tensor(11)

In [None]:
x[x.argmin().item()]

tensor(1)

In [None]:
x[x.argmax().item()]

tensor(122)
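
On tensors with more than one dimension, argmax() without arguments indexes into the flattened tensor, while the dim argument gives one index per row or column. A small sketch with made-up values:

```python
import torch

grid = torch.tensor([[1, 9, 3],
                     [7, 2, 8]])

flat_idx = grid.argmax()        # index into the flattened tensor
per_column = grid.argmax(dim=0) # row index of the max in each column
per_row = grid.argmax(dim=1)    # column index of the max in each row

print(flat_idx)    # tensor(1)
print(per_column)  # tensor([1, 0, 1])
print(per_row)     # tensor([1, 2])
```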

### Reshaping, Viewing, and Stacking


In [None]:
tensor = torch.arange(1, 123, 11)
tensor

tensor([  1,  12,  23,  34,  45,  56,  67,  78,  89, 100, 111, 122])

Reshaping is a common source of problems.
- Reshaping - reshapes an input tensor to a defined shape
- View - returns a view of an input tensor with a certain shape, sharing memory with the original tensor
- Stacking - combines multiple tensors on top of each other.
- Squeeze - removes all dimensions of size 1 from a tensor
- Unsqueeze - adds a dimension of size 1 to a target tensor
- Permute - Return a view of the input with dimensions permuted (swapped) in a certain way.

### Stack types
torch.stack concatenates a sequence of tensors ALONG A NEW DIMENSION
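
For contrast, torch.cat joins tensors along a dimension that already exists. A minimal sketch of both (the variable names are mine):

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

stacked_rows = torch.stack([a, b], dim=0)  # new dim in front: shape (2, 3)
stacked_cols = torch.stack([a, b], dim=1)  # new dim at the end: shape (3, 2)
joined = torch.cat([a, b], dim=0)          # no new dim: shape (6,)

print(stacked_rows.shape, stacked_cols.shape, joined.shape)
```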

In [None]:
#
x = torch.arange(1., 11., 1)
x, x.shape

(tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]), torch.Size([10]))

In [None]:
# Add an extra dimension
# The dimensions have to be compatible with the original dimensions.
x_reshaped = x.reshape(1, 10)
x_reshaped, x_reshaped.shape

(tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 torch.Size([1, 10]))

In [None]:
x_reshaped = x.reshape(1,8) #This will tell us that this is not possible

RuntimeError: shape '[1, 8]' is invalid for input of size 10

In [None]:
x_reshaped_2 = x.reshape(2, 5)
x_reshaped_2

tensor([[ 1.,  2.,  3.,  4.,  5.],
        [ 6.,  7.,  8.,  9., 10.]])

In [None]:
xreshape3 = x.reshape(10, 1)
xreshape3, xreshape3.shape

(tensor([[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.],
         [ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]]),
 torch.Size([10, 1]))

In [None]:
xreshape4 = x.reshape(10, -1)
xreshape4, xreshape4.shape, xreshape4.ndim

(tensor([[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.],
         [ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]]),
 torch.Size([10, 1]),
 2)

In [None]:
xreshape5 = x.reshape(10, )
xreshape5, xreshape5.shape, xreshape5.ndim

(tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 torch.Size([10]),
 1)

In [None]:
xreshape6 = x.reshape(10)
xreshape6, xreshape6.shape, xreshape6.ndim

(tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 torch.Size([10]),
 1)

In [None]:
# View
display(x)
display(x.shape)
display(x.ndim)
z = x.view(1, 10)
z, z.shape, z.ndim

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

torch.Size([10])

1

(tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 torch.Size([1, 10]),
 2)

In [None]:
# Changing z changes x, as a view is not a new instance
z[:, 0] = 5
z, x

(tensor([[ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 tensor([ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]))
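
If you want a reshaped tensor that does not write through to the original, clone it first. A sketch showing both behaviours; data_ptr() returns the address of the first element, so equal values mean shared memory.

```python
import torch

x = torch.arange(1., 11.)

shared = x.view(2, 5)               # same memory as x
independent = x.clone().view(2, 5)  # copy first, then reshape

shared[0, 0] = 99.                  # writes through to x

print(x[0])               # tensor(99.)
print(independent[0, 0])  # tensor(1.)
print(shared.data_ptr() == x.data_ptr())
```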

In [None]:
# Stack tensors
x_stacked = torch.stack([x, x, x, x], dim=0) #We define the dimension, but by default it's dim 0
x_stacked, x_stacked.shape, x_stacked.ndim

(tensor([[ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
         [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
         [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
         [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 torch.Size([4, 10]),
 2)

In [None]:
xs = torch.stack([x, x, x, x], dim=1)
xs, xs.shape, xs.ndim

(tensor([[ 5.,  5.,  5.,  5.],
         [ 2.,  2.,  2.,  2.],
         [ 3.,  3.,  3.,  3.],
         [ 4.,  4.,  4.,  4.],
         [ 5.,  5.,  5.,  5.],
         [ 6.,  6.,  6.,  6.],
         [ 7.,  7.,  7.,  7.],
         [ 8.,  8.,  8.,  8.],
         [ 9.,  9.,  9.,  9.],
         [10., 10., 10., 10.]]),
 torch.Size([10, 4]),
 2)

In [None]:
xs2 = torch.stack([x, x, x, x], dim=2) # x is 1-D, so the stacked result is 2-D and the only valid dims are 0 and 1 (or -2 and -1).
xs2, xs2.shape, xs2.ndim

IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

## Squeeze, unsqueeze, Permute
Squeeze removes all dimensions of size 1 from a target. If you have a tensor of shape (1, 20) that you want to treat as a flat vector, you can see how this can be useful.
Dimensions which have a "1" size will be "dissolved", so to speak. No data is lost, since a size-1 dimension holds no extra elements.
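
squeeze also takes an optional dim argument to remove only one specific size-1 dimension. A sketch that also checks the element count stays the same (the shape is made up for the example):

```python
import torch

x = torch.zeros(1, 3, 1, 2)

all_removed = torch.squeeze(x)         # drops every size-1 dim: (3, 2)
first_only = torch.squeeze(x, dim=0)   # drops only dim 0:       (3, 1, 2)
unchanged = torch.squeeze(x, dim=1)    # dim 1 has size 3, so nothing happens

print(all_removed.shape, first_only.shape, unchanged.shape)
print(x.numel(), all_removed.numel())  # same number of elements before and after
```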

In [None]:
x = torch.rand((2, 1))
y = torch.squeeze(x)
x, y, x.shape, y.shape, x.ndim, y.ndim

(tensor([[0.1501],
         [0.0598]]),
 tensor([0.1501, 0.0598]),
 torch.Size([2, 1]),
 torch.Size([2]),
 2,
 1)

In [None]:
x = torch.rand((1, 20))
y = torch.squeeze(x)
x, y, x.shape, y.shape, x.ndim, y.ndim

(tensor([[0.1413, 0.9571, 0.0811, 0.7496, 0.9587, 0.2009, 0.1154, 0.1904, 0.6180,
          0.4428, 0.9342, 0.2858, 0.5183, 0.1647, 0.8905, 0.8437, 0.1531, 0.8048,
          0.6827, 0.0692]]),
 tensor([0.1413, 0.9571, 0.0811, 0.7496, 0.9587, 0.2009, 0.1154, 0.1904, 0.6180,
         0.4428, 0.9342, 0.2858, 0.5183, 0.1647, 0.8905, 0.8437, 0.1531, 0.8048,
         0.6827, 0.0692]),
 torch.Size([1, 20]),
 torch.Size([20]),
 2,
 1)

In [None]:
# Unsqueeze: Adds a single dimension to a target tensor at a specific dim (dimension)
x = torch.randn((3, 2))
y = torch.unsqueeze(x, dim=0)
x, y, x.shape, y.shape, x.ndim, y.ndim

(tensor([[-0.3700, -0.8110],
         [ 1.1649,  0.7109],
         [-0.2753, -0.2158]]),
 tensor([[[-0.3700, -0.8110],
          [ 1.1649,  0.7109],
          [-0.2753, -0.2158]]]),
 torch.Size([3, 2]),
 torch.Size([1, 3, 2]),
 2,
 3)

In [None]:
x = torch.randn((1, 1))
y = torch.unsqueeze(x, dim=1)
x, y, x.shape, y.shape, x.ndim, y.ndim

(tensor([[0.1340]]),
 tensor([[[0.1340]]]),
 torch.Size([1, 1]),
 torch.Size([1, 1, 1]),
 2,
 3)

In [None]:
x = torch.randn((2, 7))
y = torch.unsqueeze(x, dim=1)
x, y, x.shape, y.shape, x.ndim, y.ndim

(tensor([[-0.7359, -1.3037,  0.1634,  0.3618, -0.5923,  2.2682,  0.3230],
         [-3.4899,  1.1723, -0.9126,  0.2069, -1.5390, -1.0569,  0.7860]]),
 tensor([[[-0.7359, -1.3037,  0.1634,  0.3618, -0.5923,  2.2682,  0.3230]],
 
         [[-3.4899,  1.1723, -0.9126,  0.2069, -1.5390, -1.0569,  0.7860]]]),
 torch.Size([2, 7]),
 torch.Size([2, 1, 7]),
 2,
 3)

In [None]:
#Permute - rearranges the dimensions of a target tensor in a specified order
# Note that it returns a VIEW of the original tensor input. That means modifying it modifies the original.
# It's common to see this in images.

x = torch.randn(2, 3, 5)
display(x.size())
y = torch.permute(x, (2, 0, 1))
display(y.size())
display(x)
display(y)

torch.Size([2, 3, 5])

torch.Size([5, 2, 3])

tensor([[[ 8.3887e-02,  2.8687e-03, -2.7764e-01,  6.0835e-01, -2.8858e-02],
         [ 1.4438e-01,  2.4628e+00,  3.0026e+00, -2.8761e-01,  7.5003e-01],
         [-7.4273e-01,  1.1583e+00, -1.9774e-01, -1.0895e+00, -1.7543e+00]],

        [[ 3.9556e-01,  7.0030e-01, -3.1054e-01, -5.1727e-01, -2.4754e-01],
         [ 7.5251e-01, -7.1377e-02, -3.3968e-03,  6.3299e-01,  7.6840e-01],
         [ 2.1363e+00, -4.0524e-02,  3.1125e-01,  6.1325e-01, -7.0543e-02]]])

tensor([[[ 8.3887e-02,  1.4438e-01, -7.4273e-01],
         [ 3.9556e-01,  7.5251e-01,  2.1363e+00]],

        [[ 2.8687e-03,  2.4628e+00,  1.1583e+00],
         [ 7.0030e-01, -7.1377e-02, -4.0524e-02]],

        [[-2.7764e-01,  3.0026e+00, -1.9774e-01],
         [-3.1054e-01, -3.3968e-03,  3.1125e-01]],

        [[ 6.0835e-01, -2.8761e-01, -1.0895e+00],
         [-5.1727e-01,  6.3299e-01,  6.1325e-01]],

        [[-2.8858e-02,  7.5003e-01, -1.7543e+00],
         [-2.4754e-01,  7.6840e-01, -7.0543e-02]]])
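
A small sketch of the "permute returns a view" point, using a tiny fake image in channels-first layout. The sizes are arbitrary; contiguous() is what you would call if you need an independent, densely laid-out copy.

```python
import torch

img = torch.zeros(3, 4, 5)      # (channels, height, width)
hwc = img.permute(1, 2, 0)      # (height, width, channels), same memory

hwc[0, 0, 0] = 1.0              # hwc[h, w, c] is img[c, h, w]
print(img[0, 0, 0])             # tensor(1.)

print(hwc.is_contiguous())      # the view just re-reads memory with new strides
copy = hwc.contiguous()         # materialize a real copy in the new layout
copy[0, 0, 0] = 5.0
print(img[0, 0, 0])             # still tensor(1.)
```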

In [None]:
x = torch.randn((3, 224, 224))
y = torch.permute(x, (1, 2, 0)) # Move the channel dimension to the end: (C, H, W) -> (H, W, C).
display(y.shape)
display(x.shape)
display(x)
display(y)

torch.Size([224, 224, 3])

torch.Size([3, 224, 224])

tensor([[[-2.3946,  1.2970, -0.5927,  ...,  0.1899, -0.6041,  0.4061],
         [ 0.4394,  1.0205,  0.2047,  ..., -0.6674,  0.3413, -1.4071],
         [-0.3512, -1.4653, -0.5636,  ...,  0.3621, -0.9611, -0.3425],
         ...,
         [ 0.4298,  0.3118,  0.1490,  ...,  2.3798, -0.9221, -0.5985],
         [ 1.0519,  1.0841,  1.2245,  ...,  0.0734,  0.3956, -0.0513],
         [ 1.5514,  0.5091,  0.5157,  ...,  0.2007,  0.9563, -0.8880]],

        [[ 0.3922,  0.8031,  0.5586,  ...,  0.5706,  1.1159, -0.1373],
         [-0.2979,  1.2947, -0.9456,  ...,  2.4974,  0.3412, -0.5854],
         [-0.1415, -0.5968,  0.3512,  ..., -0.2781, -0.1609, -0.3078],
         ...,
         [ 0.9989,  0.1504, -1.4251,  ...,  1.2049,  1.3794,  0.4523],
         [ 0.9639,  1.6826,  1.0754,  ...,  0.9314, -0.0096, -0.1676],
         [-0.8348,  0.4865,  0.6066,  ...,  1.3751, -0.5372, -0.2740]],

        [[-0.2232, -0.4847,  1.1859,  ...,  1.3367,  0.2214,  0.2265],
         [-1.2972, -0.5633,  0.5234,  ..., -2

tensor([[[-2.3946,  0.3922, -0.2232],
         [ 1.2970,  0.8031, -0.4847],
         [-0.5927,  0.5586,  1.1859],
         ...,
         [ 0.1899,  0.5706,  1.3367],
         [-0.6041,  1.1159,  0.2214],
         [ 0.4061, -0.1373,  0.2265]],

        [[ 0.4394, -0.2979, -1.2972],
         [ 1.0205,  1.2947, -0.5633],
         [ 0.2047, -0.9456,  0.5234],
         ...,
         [-0.6674,  2.4974, -2.3734],
         [ 0.3413,  0.3412, -0.1284],
         [-1.4071, -0.5854,  0.7480]],

        [[-0.3512, -0.1415,  1.0942],
         [-1.4653, -0.5968, -0.5249],
         [-0.5636,  0.3512, -0.1862],
         ...,
         [ 0.3621, -0.2781, -1.6743],
         [-0.9611, -0.1609,  1.8947],
         [-0.3425, -0.3078, -0.2329]],

        ...,

        [[ 0.4298,  0.9989,  1.1082],
         [ 0.3118,  0.1504,  1.5492],
         [ 0.1490, -1.4251, -0.5784],
         ...,
         [ 2.3798,  1.2049,  0.1407],
         [-0.9221,  1.3794, -0.6532],
         [-0.5985,  0.4523, -0.6498]],

        [[

In [None]:
x[0, 0, :] = 0.37
y[:, 0, 0] = 0.37
x, y

(tensor([[[ 0.3700,  0.3700,  0.3700,  ...,  0.3700,  0.3700,  0.3700],
          [ 0.3700,  1.0205,  0.2047,  ..., -0.6674,  0.3413, -1.4071],
          [ 0.3700, -1.4653, -0.5636,  ...,  0.3621, -0.9611, -0.3425],
          ...,
          [ 0.3700,  0.3118,  0.1490,  ...,  2.3798, -0.9221, -0.5985],
          [ 0.3700,  1.0841,  1.2245,  ...,  0.0734,  0.3956, -0.0513],
          [ 0.3700,  0.5091,  0.5157,  ...,  0.2007,  0.9563, -0.8880]],
 
         [[ 0.3922,  0.8031,  0.5586,  ...,  0.5706,  1.1159, -0.1373],
          [-0.2979,  1.2947, -0.9456,  ...,  2.4974,  0.3412, -0.5854],
          [-0.1415, -0.5968,  0.3512,  ..., -0.2781, -0.1609, -0.3078],
          ...,
          [ 0.9989,  0.1504, -1.4251,  ...,  1.2049,  1.3794,  0.4523],
          [ 0.9639,  1.6826,  1.0754,  ...,  0.9314, -0.0096, -0.1676],
          [-0.8348,  0.4865,  0.6066,  ...,  1.3751, -0.5372, -0.2740]],
 
         [[-0.2232, -0.4847,  1.1859,  ...,  1.3367,  0.2214,  0.2265],
          [-1.2972, -0.5633,

### Indexing
Similar to NumPy indexing.

In [5]:
tensor = torch.arange(1, 10).reshape(1, 3, 3)
tensor, tensor.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [7]:
tensor2 = torch.arange(1, 10).reshape(3, 3)
tensor2, tensor2.shape

(tensor([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]),
 torch.Size([3, 3]))

In [8]:
tensor[0], tensor2[0]

(tensor([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]),
 tensor([1, 2, 3]))

In [9]:
tensor[0, 1], tensor2[0, 1]

(tensor([4, 5, 6]), tensor(2))

In [10]:
tensor[0, 1, :], tensor2[0, :]

(tensor([4, 5, 6]), tensor([1, 2, 3]))

In [17]:
tensor[0, 0, 1], tensor2[0, 1] # The first tensor has an extra leading dimension (think of it as the example/batch index), so it needs one more index.

(tensor(2), tensor(2))

In [18]:
tensor[:, :, 0]

tensor([[1, 4, 7]])

In [20]:
# Get all values of dim 0, but only the 1 index value of 1st and 2nd dim
display(tensor[0, 1, 1])
display(tensor[:, 1, 1])

tensor(5)

tensor([5])

In [24]:
display(tensor[:][0][1])
display(tensor[0,0,1])

tensor([4, 5, 6])

tensor(2)

In [26]:
display(tensor[0][2][2])
display(tensor[:, 2, 2])

tensor(9)

tensor([9])
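
A few more indexing patterns that carry over from NumPy, sketched on the same 1 x 3 x 3 tensor: negative indices, slices, and boolean masks.

```python
import torch

t = torch.arange(1, 10).reshape(1, 3, 3)

last_row = t[0, -1]        # negative index: last row
right_cols = t[0, :, 1:]   # slice: every row, columns 1 and 2
big_values = t[t > 6]      # boolean mask: flat tensor of matching values

print(last_row)    # tensor([7, 8, 9])
print(right_cols)
print(big_values)  # tensor([7, 8, 9])
```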

## PyTorch tensors and NumPy

PyTorch can interact with NumPy

- NumPy to PyTorch: torch.from_numpy(ndarray)
- PyTorch Tensor to NumPy: torch.Tensor.numpy()

In [28]:
# NumPy array to Tensor
import torch
import numpy as np

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array) #pytorch reflects numpy's default float64 type unless you specify otherwise.
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [29]:
array.dtype, tensor.dtype

(dtype('float64'), torch.float64)

In [31]:
# Let's see what we can do with types
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array).type(torch.float32)
array.dtype, tensor.dtype

(dtype('float64'), torch.float32)

In [32]:
# If we change array, will tensor change?
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]), tensor([1., 2., 3., 4., 5., 6., 7.]))

In [33]:
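# Not here: .type(torch.float32) already made a copy, and "array + 1" builds a new array instead of modifying the old one.
# torch.from_numpy() itself shares memory, so an in-place change (array += 1) to the original float64 array would show up in its tensor.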

In [35]:
#Tensor to NumPy array
tensor = torch.ones(7)
tensor

tensor([1., 1., 1., 1., 1., 1., 1.])

In [36]:
numpy_tensor = tensor.numpy()
display(tensor)
display(numpy_tensor)
tensor.dtype, numpy_tensor.dtype

tensor([1., 1., 1., 1., 1., 1., 1.])

array([1., 1., 1., 1., 1., 1., 1.], dtype=float32)

(torch.float32, dtype('float32'))

In [37]:
# Change the tensor, what happens to NumPy Tensor?
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [38]:
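# Again nothing happens, but only because "tensor + 1" creates a new tensor and rebinds the name.
# tensor.numpy() does share memory with the tensor, so an in-place change (tensor.add_(1)) would show up in the array.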
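
A minimal sketch of when NumPy arrays and tensors do and do not affect each other. The key difference is in-place modification versus creating a new object; torch.tensor(array) is used here as the copying alternative.

```python
import numpy as np
import torch

array = np.ones(3)
shared = torch.from_numpy(array)  # shares memory with `array`
copied = torch.tensor(array)      # makes an independent copy

array += 1                        # in-place change: visible through `shared`
print(shared)                     # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)                     # tensor([1., 1., 1.], dtype=torch.float64)

tensor = torch.ones(3)
as_numpy = tensor.numpy()         # shares memory with `tensor` (CPU tensors)
tensor.add_(1)                    # in-place change: visible through `as_numpy`
print(as_numpy)                   # [2. 2. 2.]
```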

### REPRODUCIBILITY
Let's take randomness out of it. It does warn you it might be a bit slower

"Deterministic operations are often slower than nondeterministic operations, so single-run performance may decrease for your model. However, determinism may save time in development by facilitating experimentation, debugging, and regression testing."
[Randomness](https://pytorch.org/docs/stable/notes/randomness.html)

Make the randomness repeatable by choosing a random seed.

In [44]:
torch.rand(3,3)

tensor([[0.0666, 0.4437, 0.3970],
        [0.3873, 0.8852, 0.3361],
        [0.9485, 0.6775, 0.5284]])

In [46]:
import torch

# Create 2 random tensors
random_tensor_a = torch.rand(3, 4)
random_tensor_b = torch.rand(3, 4)
random_tensor_a, random_tensor_b

(tensor([[2.7432e-01, 5.0127e-01, 3.7443e-01, 7.5925e-01],
         [7.4464e-01, 9.3932e-02, 8.3673e-01, 6.6380e-01],
         [8.8637e-01, 4.8061e-01, 5.1773e-04, 5.9743e-01]]),
 tensor([[0.3761, 0.8186, 0.5187, 0.2612],
         [0.7272, 0.7018, 0.3157, 0.6203],
         [0.8976, 0.0164, 0.6051, 0.4654]]))

In [47]:
random_tensor_a == random_tensor_b

tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

In [48]:
# Set a random seed
RANDOM_SEED = 2
torch.manual_seed(RANDOM_SEED)

<torch._C.Generator at 0x7d137af84c30>

In [50]:
random_tensor_c = torch.rand(3, 4)
random_tensor_d = torch.rand(3, 4)
random_tensor_c, random_tensor_d

(tensor([[0.4525, 0.6317, 0.4760, 0.2200],
         [0.2166, 0.2571, 0.0458, 0.1755],
         [0.6177, 0.8291, 0.5246, 0.2708]]),
 tensor([[0.7197, 0.3081, 0.3892, 0.2259],
         [0.3430, 0.0367, 0.7133, 0.6944],
         [0.5993, 0.7455, 0.7119, 0.5221]]))

In [51]:
# Reset the seed right before each call, and both calls return the same values
torch.manual_seed(RANDOM_SEED)
random_tensor_e = torch.rand(3, 4)
torch.manual_seed(RANDOM_SEED)
random_tensor_f = torch.rand(3, 4)
display(random_tensor_e)
display(random_tensor_f)
random_tensor_e == random_tensor_f

tensor([[0.6147, 0.3810, 0.6371, 0.4745],
        [0.7136, 0.6190, 0.4425, 0.0958],
        [0.6142, 0.0573, 0.5657, 0.5332]])

tensor([[0.6147, 0.3810, 0.6371, 0.4745],
        [0.7136, 0.6190, 0.4425, 0.0958],
        [0.6142, 0.0573, 0.5657, 0.5332]])

tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
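
An alternative to resetting the global seed each time is to pass a dedicated `torch.Generator` to the call. This is a sketch rather than what the course does; `gen_a` and `gen_b` are hypothetical names.

```python
import torch

SEED = 2  # same value as RANDOM_SEED above, redefined so the block is self-contained

# Two independent generators started from the same seed.
gen_a = torch.Generator().manual_seed(SEED)
gen_b = torch.Generator().manual_seed(SEED)

a = torch.rand(3, 4, generator=gen_a)
b = torch.rand(3, 4, generator=gen_b)
c = torch.rand(3, 4, generator=gen_a)  # second draw from gen_a: its state has advanced

print(torch.equal(a, b))  # same seed, same first draw
print(torch.equal(a, c))  # later draw from the same generator differs
```

This keeps reproducible draws from depending on whatever else touched the global generator in between.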

### GPU ACCESS
CUDA, NVIDIA hardware, and PyTorch work together behind the scenes! There are a few ways to get access to a GPU:

1. Use Google Colab free or pro versions
2. Use your own GPU
3. Rent a cloud GPU on Google Cloud Platform, AWS, Azure, etc.

PyTorch plus the GPU drivers (CUDA) takes a bit of setting up. Refer to the PyTorch website for further instructions.

In [1]:
## Check for GPU Access
!nvidia-smi

Tue Sep  3 04:47:51 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   63C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
import torch
torch.cuda.is_available()

True

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

The PyTorch documentation has a "Device-agnostic code" section about choosing between torch.device("cuda") and torch.device("cpu").

PyTorch can run computations on either, so it's best practice to set up device-agnostic code.

So let's see some device-agnostic code now...

In [5]:
tensor = torch.tensor([1, 2, 3])
print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [6]:
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')

This is why the docs have a whole section on device-agnostic code: the same code runs whether or not a GPU is present. It's a bit like "debug" and "release" build configurations, which let us move from one environment to another easily.
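
A slightly fuller sketch of the pattern: pick the device once, then create tensors on it and move the model to it. The tiny `nn.Linear` model and its shapes are placeholders chosen for this example.

```python
import torch
from torch import nn

# Pick the device once; the rest of the code never mentions "cuda" or "cpu".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create data directly on the target device instead of moving it afterwards.
x = torch.rand(8, 3, device=device)

# Placeholder model: parameters are moved to the same device as the data.
model = nn.Linear(in_features=3, out_features=1).to(device)

y = model(x)
print(y.shape, y.device)
```

If the model and its inputs end up on different devices, PyTorch raises a runtime error, so it helps to route everything through the single `device` variable.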

In [8]:
tensor_on_cpu = tensor_on_gpu.to("cpu")
tensor_on_cpu, tensor_on_cpu.device

(tensor([1, 2, 3]), device(type='cpu'))

In [9]:
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [13]:
# NumPy only works with CPU memory, so copy the tensor back to the CPU first.
numpy_array_back = tensor_on_gpu.cpu().numpy()
numpy_array_back  # this is a plain ndarray now, not a tensor, and it lives in CPU memory

array([1, 2, 3])
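
A common way to wrap this up is a small helper that works for any tensor. `to_numpy` is a hypothetical name, not a PyTorch function. The extra `.detach()` is there because `.numpy()` also refuses tensors that require gradients.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: return the tensor's values as a NumPy array."""
    # detach(): drop autograd tracking; cpu(): copy to host memory if needed.
    return t.detach().cpu().numpy()


device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

result = to_numpy(weights * 2)
print(type(result), result)
```

On a tensor that is already on the CPU, `.cpu()` returns the tensor itself, so the helper is safe to call everywhere.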