<a href="https://colab.research.google.com/github/capabledjay/Deep_Learning_With_Pytorch/blob/main/01_PyTorch_Fundamental.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PyTorch Fundamentals

Resource notebook: https://www.learnpytorch.io/00_pytorch_fundamentals/


In [None]:
!nvidia-smi


/bin/bash: nvidia-smi: command not found


In [None]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.0.1+cu118


## Introduction to Tensors
Tensors are the fundamental building block of machine learning and deep learning.
Their job is to represent data in a numerical way. For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean [colour_channels, height, width], as in the image has 3 colour channels (red, green, blue), a height of 224 pixels and a width of 224 pixels.

### Creating tensors


##### Scalar
A scalar is a single number and in tensor-speak it's a zero dimension tensor.

In [None]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [None]:
# to check for the dimension of the tensor
scalar.ndim

0

In [None]:
# Get tensor back as a Py int
scalar.item()

7

##### Vector
A vector is a single dimension tensor but can contain many numbers.
As in, you could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms, bathrooms, car_parks] in your house.
The important trend here is that a vector is flexible in what it can represent (the same with tensors).

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [None]:
# Check the number of dimensions of vector
vector.ndim

1

You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside ([) and you only need to count one side.

Another important concept for tensors is their shape attribute. The shape tells you how the elements inside them are arranged.

In [None]:
# Check shape of vector
vector.shape

torch.Size([2])

In [None]:
# Matrix
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

Matrices are as flexible as vectors, except they've got an extra dimension.

In [None]:
# Check number of dimensions
MATRIX.ndim

2

In [None]:
MATRIX.shape

torch.Size([2, 2])

In [None]:
 # Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [None]:
# Check number of dimensions for TENSOR
TENSOR.ndim

3

In [None]:
# Check shape of TENSOR
TENSOR.shape

torch.Size([1, 3, 3])

Alright, it outputs torch.Size([1, 3, 3]).

The dimensions go outer to inner.

That means there's 1 dimension of 3 by 3.

In [None]:
ex = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [23,45,67],
                        [2, 4, 5]]])
ex.shape

torch.Size([1, 4, 3])

### Random tensor

When building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).
Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

- In essence:

Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.
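To make the loop above concrete, here is a minimal sketch of "start with random numbers -> look at data -> update random numbers". It is not how PyTorch models are normally trained (that uses autograd and optimizers); the toy data, the learning rate and the hand-written gradient are all assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the sketch is repeatable

# Toy data following y = 2 * x (the "pattern" we want the number to capture)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

# Initialization: start with a random number
weight = torch.rand(1)

learning_rate = 0.1  # hypothetical value that happens to suit this toy problem
for step in range(100):
    prediction = weight * x                     # look at data
    error = prediction - y
    gradient = (2 * error * x).mean()           # slope of the mean squared error w.r.t. weight
    weight = weight - learning_rate * gradient  # update the (once random) number

print(weight)  # ends up very close to 2.0
```

The random starting value barely matters here: each pass nudges the number towards the value that best describes the data.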


https://pytorch.org/docs/stable/generated/torch.rand.html

In [None]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.7111, 0.8073, 0.5544, 0.0411],
         [0.0853, 0.3423, 0.3698, 0.9162],
         [0.5859, 0.1308, 0.3887, 0.7044]]),
 torch.float32)


The flexibility of torch.rand() is that we can adjust the size to be whatever we want.

For example, say you wanted a random tensor in the common image shape of [3, 224, 224] ([colour_channels, height, width]).

In [None]:
# Create a random tensor of size (3, 224, 224)
random_image_size_tensor = torch.rand(size=(3, 224, 224))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([3, 224, 224]), 3)

In [None]:
br = torch.rand(2,8,2)
br

tensor([[[0.6864, 0.4334],
         [0.5526, 0.5496],
         [0.8339, 0.8593],
         [0.3090, 0.2148],
         [0.8744, 0.6257],
         [0.0743, 0.8579],
         [0.5279, 0.7529],
         [0.6846, 0.3628]],

        [[0.6420, 0.7663],
         [0.6816, 0.5441],
         [0.1423, 0.4845],
         [0.9353, 0.7832],
         [0.9244, 0.3467],
         [0.1676, 0.5647],
         [0.2798, 0.9789],
         [0.5698, 0.6920]]])

### Zeros and ones
Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with torch.zeros()

Again, the size parameter comes into play.

In [None]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype


(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [None]:

# We can do the same to create a tensor of all ones except using
#torch.ones() instead.

# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
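The masking idea mentioned earlier can be sketched in a few lines. The values and the choice of which column to hide are made up for illustration; the only PyTorch calls used are torch.ones_like() and element-wise multiplication.

```python
import torch

values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Hypothetical mask: 1 keeps a value, 0 hides it
mask = torch.ones_like(values)
mask[:, 1] = 0  # hide the middle column

# Element-wise multiplication zeroes out the masked positions
masked_values = values * mask
print(masked_values)
```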

### Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use torch.arange(start, end, step) to do so.

Where:

* start = start of range (e.g. 0)
* end = end of range, not included in the output (e.g. 10)
* step = the difference between consecutive values (e.g. 1)

In [None]:

# Create a range of values from 0 up to (but not including) 10
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
evens = torch.arange(0, 100, 2)  # named evens to avoid shadowing Python's built-in range()
evens

tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
        36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
        72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])


Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use torch.zeros_like(input) or torch.ones_like(input), which return a tensor filled with zeros or ones, respectively, in the same shape as the input.

In [None]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [None]:
evens_ones = torch.ones_like(evens)
evens_ones

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1])

### Tensor datatypes
There are many different tensor datatypes available in [PyTorch](https://pytorch.org/docs/stable/tensors.html#data-types).
Some are specific for CPU and some are better for GPU.

Getting to know which is which can take some time.
Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is torch.float32 or torch.float.
This is referred to as "32-bit floating point".

But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

Plus more!

- Note: An integer is a whole number like 7, whereas a float has a decimal point, like 7.0.

In [None]:
# Default datatype for floating point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the default device (CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded for gradient computation

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

The reason for all of these is to do with precision in computing.

Precision is the amount of detail used to describe a number.

The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

This matters in deep learning and numerical computing because you're performing so many operations: the more detail you have to calculate on, the more compute you have to use.

So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
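As a rough illustration of the precision trade-off, the sketch below stores the same number (one third, an arbitrary choice) in three floating point datatypes and compares the storage size and the rounding error. The errors dictionary is just a helper for this example.

```python
import torch

one_third = 1 / 3  # Python floats are 64-bit, so this is our reference value

errors = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(one_third, dtype=dtype)
    # element_size() is the number of bytes used to store each element
    errors[dtype] = abs(t.item() - one_third)
    print(f"{dtype}: {t.element_size()} bytes per element, stored value {t.item():.10f}")
```

Fewer bytes per element means less memory and usually faster compute, but the stored value drifts further from the number you asked for.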

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).

We'll see more of this device talk later on.
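Here is a small sketch of a datatype issue and one way to fix it. It uses torch.float32 and torch.float64 (rather than float16) as an assumption, so that it runs on a CPU-only machine; matrix multiplication between mismatched datatypes typically raises a RuntimeError, while some other operations quietly convert instead.

```python
import torch

a = torch.rand(2, 3, dtype=torch.float32)
b = torch.rand(3, 2, dtype=torch.float64)

try:
    torch.matmul(a, b)  # mixed datatypes: this typically raises a RuntimeError
except RuntimeError as error:
    print(f"Datatype issue: {error}")

# Fix: convert one tensor so both share the same datatype
result = torch.matmul(a, b.type(torch.float32))
print(result.dtype, result.shape)

# Both tensors live on the CPU here, so there is no device issue
print(a.device == b.device)
```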

In [None]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16,
) # torch.half would also work

print(float_16_tensor.dtype)
float_16_tensor.device

torch.float16


device(type='cpu')

In [None]:
# Convert the float16 tensor to float32
float_16_as_float_32 = float_16_tensor.type(torch.float32)
float_16_as_float_32.dtype

torch.float32

### Getting information from tensors
Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:

- shape - what shape is the tensor? (some operations require specific shape rules)
- dtype - what datatype are the elements within the tensor stored in?
- device - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(some_tensor.size())
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}")
# will default to CPU

tensor([[0.0778, 0.9481, 0.3571, 0.8013],
        [0.7830, 0.6475, 0.9675, 0.4556],
        [0.0968, 0.2390, 0.5183, 0.3644]])
torch.Size([3, 4])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Matrix multiplication (is all you need)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication functionality in the torch.matmul() method.

The main two rules for matrix multiplication to remember are:

The inner dimensions must match:
* (3, 2) @ (3, 2) won't work
* (2, 3) @ (3, 2) will work
* (3, 2) @ (2, 3) will work

The resulting matrix has the shape of the outer dimensions:

* (2, 3) @ (3,2) --> (2 ,2 )
* (3, 2) @ (2, 3) --> (3, 3)
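The two rules above can be checked with a short sketch. The helper matmul_output_shape() is hypothetical (it is not part of PyTorch) and only covers 2-D shapes; the actual multiplication uses the @ operator on random tensors.

```python
import torch

def matmul_output_shape(shape_a, shape_b):
    """Hypothetical helper: predicted shape of a 2-D matmul, or None if the inner dimensions differ."""
    if shape_a[1] != shape_b[0]:  # rule 1: inner dimensions must match
        return None
    return (shape_a[0], shape_b[1])  # rule 2: result takes the outer dimensions

results = {}
for shape_a, shape_b in [((3, 2), (3, 2)), ((2, 3), (3, 2)), ((3, 2), (2, 3))]:
    predicted = matmul_output_shape(shape_a, shape_b)
    if predicted is None:
        print(f"{shape_a} @ {shape_b} won't work (inner dimensions differ)")
    else:
        actual = tuple((torch.rand(shape_a) @ torch.rand(shape_b)).shape)
        results[(shape_a, shape_b)] = (predicted, actual)
        print(f"{shape_a} @ {shape_b} -> {actual}")
```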

Note: "@" in Python is the symbol for matrix multiplication.

http://matrixmultiplication.xyz/

Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.

In [None]:
import torch
tensor = torch.tensor([1, 2, 3])
print(tensor)
print(tensor.shape)

tensor([1, 2, 3])
torch.Size([3])



The difference between element-wise multiplication and matrix multiplication is the addition of values.

For our tensor variable with values [1, 2, 3]:

| Operation | Calculation | Code |
| ----- | ----- | ----- |
| Element-wise multiplication | [1*1, 2*2, 3*3] = [1, 4, 9] | tensor * tensor |
| Matrix multiplication | [1*1 + 2*2 + 3*3] = [14] | tensor.matmul(tensor) |

In [None]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [None]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

In [None]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor

tensor(14)

You can do matrix multiplication by hand but it's not recommended.

The in-built torch.matmul() method is faster.

In [None]:
tensor

tensor([1, 2, 3])

In [None]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops where possible, they are computationally expensive)
value = 0
for element in tensor:  # iterate over the elements directly
  value += element * element
value

tensor(14)

In [None]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 60 µs, sys: 8 µs, total: 68 µs
Wall time: 72.7 µs


tensor(14)
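With only three elements the difference is hard to see, so here is a sketch that times both approaches on a larger vector. The vector length (10,000) and the use of time.perf_counter() are choices made for this example; the exact timings depend on your machine, so none are shown.

```python
import time
import torch

torch.manual_seed(0)
big = torch.rand(10_000, dtype=torch.float64)

# By hand: multiply and add one element at a time
start = time.perf_counter()
loop_value = torch.tensor(0.0, dtype=torch.float64)
for element in big:
    loop_value += element * element
loop_seconds = time.perf_counter() - start

# Built-in: a single call to torch.matmul()
start = time.perf_counter()
matmul_value = torch.matmul(big, big)
matmul_seconds = time.perf_counter() - start

print(f"Loop:   {loop_value.item():.4f} in {loop_seconds:.6f} s")
print(f"matmul: {matmul_value.item():.4f} in {matmul_seconds:.6f} s")
```

Both approaches compute the same value (up to floating point rounding); the built-in call avoids the per-element Python overhead.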


#### One of the most common errors in deep learning (shape errors)
Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

In [None]:
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # this will error because it doesn't
# meet the first rule of matrix multiplication: the inner dimensions must match

RuntimeError: ignored

To fix this error, we need to change the shapes of the tensors so that the inner dimensions match.
One of the ways to do this is with a transpose (switching the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

* torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
* tensor.T - where tensor is the desired tensor to transpose.
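A quick sketch showing that the two options give the same result for a 2-D tensor. The values are copied from tensor_B so the block runs on its own.

```python
import torch

tensor_B = torch.tensor([[7., 10.],
                         [8., 11.],
                         [9., 12.]])

via_function = torch.transpose(tensor_B, 0, 1)  # swap dimension 0 and dimension 1
via_attribute = tensor_B.T

print(tensor_B.shape, "->", via_function.shape)
print(torch.equal(via_function, via_attribute))
```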

In [None]:
 # View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [None]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [None]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])



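You can also use torch.mm(), which performs the same matrix multiplication for 2-D tensors (unlike torch.matmul(), it does not broadcast or handle batched inputs).

In [None]:
# torch.mm works like matmul for 2-D matrices
torch.mm(tensor_A,tensor_B.T)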

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
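One difference worth knowing: torch.mm() only accepts 2-D matrices, whereas torch.matmul() can also handle batches of matrices. The sketch below uses made-up random tensors to show this; the flag mm_accepted_batch is just a helper for the example.

```python
import torch

batch = torch.rand(10, 3, 2)  # a batch of 10 matrices, each of shape (3, 2)
matrix = torch.rand(2, 3)

# torch.matmul() multiplies every matrix in the batch by `matrix`
batched_output = torch.matmul(batch, matrix)
print(batched_output.shape)

# torch.mm() expects exactly two 2-D tensors, so the 3-D batch is rejected
try:
    torch.mm(batch, matrix)
    mm_accepted_batch = True
except RuntimeError as error:
    mm_accepted_batch = False
    print(f"torch.mm failed: {error}")
```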


Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like above.

[How about a visual?](http://matrixmultiplication.xyz/)

You can create your own matrix multiplication visuals at http://matrixmultiplication.xyz/.

Note: A matrix multiplication like this is also referred to as the dot product of two matrices.

### Manipulating tensors (tensor operations)

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

By stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like Lego!).

#### Basic operations
Let's start with a few of the fundamental operations: addition (+), subtraction (-) and multiplication (*).

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])


Notice how the tensor values above didn't end up being tensor([110, 120, 130]). This is because the values inside the tensor don't change unless they're reassigned.

In [None]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

In [None]:
# Subtract 10 (the result is not reassigned, so tensor itself stays the same)
tensor - 10


tensor([-9, -8, -7])

In [None]:
# PyTorch also has a bunch of built-in functions like torch.mul()
# (short for multiplication; torch.multiply() is an alias) and torch.add() to perform basic operations.

# Can also use torch functions
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [None]:
# However, it's more common to use the operator symbols like * instead of torch.mul()

# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


#### Tensor aggregation

Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values), such as finding the min, max, mean and sum.

First we'll create a tensor and then find the max, min, mean and sum of it.

In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Sum: 450


In [None]:
print(f"Mean: {x.mean()}")  # this will error


RuntimeError: ignored

You may find some methods such as torch.mean() require tensors to be in torch.float32 (the most common) or another specific datatype, otherwise the operation will fail. This is one of the most common errors you'll come across when working with neural networks.

The datatype of x is torch.int64 (a long tensor), and torch.mean() can't work with int64, so we have to convert x to a float datatype first.

In [None]:
print(f"Mean: {x.type(torch.float64).mean()}") # won't work without float datatype


Mean: 45.0
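Besides .type(), there are a couple of other ways to get a float mean from an integer tensor. This sketch shows two that I'm aware of: the .float() shorthand, and the dtype argument of mean(), which converts the input before the operation is performed.

```python
import torch

x = torch.arange(0, 100, 10)
print(x.dtype)  # an integer datatype, so x.mean() on its own would fail

mean_via_float = x.float().mean()             # shorthand for converting to torch.float32
mean_via_dtype = x.mean(dtype=torch.float32)  # convert as part of the mean() call

print(mean_via_float, mean_via_dtype)
```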


In [None]:
# You can also do the same as above with torch functions.
print(torch.max(x))
print(torch.min(x))
print(torch.mean(x.type(torch.float32)))
print(torch.sum(x))

tensor(90)
tensor(0)
tensor(45.)
tensor(450)


#### Positional min/max
You can also find the index of a tensor where the max or minimum occurs with torch.argmax() and torch.argmin() respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).

In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
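Two common follow-ups are using the returned index to look the value back up, and finding the position of the largest value in each row of a 2-D tensor. The scores tensor below is made up for illustration (think of each row as a model's scores for three classes).

```python
import torch

tensor = torch.arange(10, 100, 10)
max_index = tensor.argmax()
print(f"Index {max_index} holds the value {tensor[max_index]}")

# Hypothetical scores: 2 samples, 3 classes each
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1]])
predicted = scores.argmax(dim=1)  # position of the highest score in each row
print(predicted)
```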


#### Change tensor datatype
As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in torch.float64 and another is in torch.float32, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using torch.Tensor.type(dtype=None) where the dtype parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is torch.float32).

In [None]:
# Create a tensor and check its datatype
dt_x = torch.arange(10., 100., 10.)
dt_x.dtype

torch.float32

In [None]:
# Now we'll create another tensor the same as before but change
# its datatype to torch.float16.

dt_x_float16 = dt_x.type(torch.float16)
dt_x_float16


tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [None]:
# Create a int8 tensor
dt_x_int8 = dt_x.type(torch.int8)
dt_x_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)


#### Reshaping, stacking, squeezing and unsqueezing
Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

To do so, some popular methods are:

* torch.reshape(input, shape)
Reshapes input to a defined shape (if compatible), can also use torch.Tensor.reshape().

* torch.Tensor.view(shape)
Returns a view of the original tensor in a different shape but shares the same data as the original tensor.

* torch.stack(tensors, dim=0)
Concatenates a sequence of tensors along a new dimension (on top of each other or side by side, depending on dim); all tensors must be the same size.

* torch.squeeze(input)
Squeezes input to remove all the dimensions of size 1.

* torch.unsqueeze(input, dim)
Returns input with a dimension of size 1 added at dim.

* torch.permute(input, dims)
Returns a view of the original input with its dimensions permuted (rearranged) to dims.

In [None]:
# Create a tensor
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [None]:
# Try to reshape 9 elements into a shape that holds 7
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

# This raises an error because 9 elements can't fit into a shape with 7 elements

RuntimeError: ignored

In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1,9)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [None]:
# Add an extra dimension
xreshaped = x.reshape(9,1)
xreshaped, xreshaped.shape

(tensor([[1.],
         [2.],
         [3.],
         [4.],
         [5.],
         [6.],
         [7.],
         [8.],
         [9.]]),
 torch.Size([9, 1]))

In [None]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = torch.arange(1,8)
z, z.shape

(tensor([1, 2, 3, 4, 5, 6, 7]), torch.Size([7]))

In [None]:
# change the view
z = x.view(1, 9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

Remember though, changing the view of a tensor with torch.Tensor.view() really only creates a new view of the same tensor.

So changing the values in the view changes the original tensor too.

In [None]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.]))
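If you want a reshaped tensor that does not affect the original, one option (my suggestion, not something covered above) is to copy the data first with clone(). The sketch also uses data_ptr() to confirm that a view points at the same memory.

```python
import torch

x = torch.arange(1., 10.)

# A view points at the same underlying memory as the original
z = x.view(1, 9)
print(z.data_ptr() == x.data_ptr())

# clone() copies the data first, so changes no longer reach x
independent = x.clone().view(1, 9)
independent[:, 0] = 5
print(independent)
print(x)  # still starts with 1.
```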


If we wanted to stack our new tensor on top of itself four times, we could do so with torch.stack().

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.]])
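To see what the dim argument does, this sketch compares the output shapes for dim=0 and dim=1, and contrasts them with torch.cat(), which joins tensors along an existing dimension instead of creating a new one. It uses a fresh x so the block runs on its own.

```python
import torch

x = torch.arange(1., 10.)

stacked_dim0 = torch.stack([x, x, x, x], dim=0)  # new dimension first: one row per tensor
stacked_dim1 = torch.stack([x, x, x, x], dim=1)  # new dimension second: one column per tensor
concatenated = torch.cat([x, x, x, x], dim=0)    # no new dimension, just one longer vector

print(stacked_dim0.shape, stacked_dim1.shape, concatenated.shape)
```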

How about removing all single dimensions from a tensor?

To do so you can use torch.squeeze() (I remember this as squeezing the tensor to only have dimensions over 1).

In [None]:
# Remove all single dimensions from a target tensor
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
Previous shape: torch.Size([1, 9])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
New shape: torch.Size([9])


And to do the reverse of torch.squeeze() you can use torch.unsqueeze() to add a dimension value of 1 at a specific index.

In [None]:
# add a single dimension to a target tensor
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
Previous shape: torch.Size([9])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
New shape: torch.Size([1, 9])


You can also rearrange the order of axes values with torch.permute(input, dims), where the input gets turned into a view with new dims.

https://pytorch.org/docs/stable/generated/torch.permute.html


Note: Because permuting returns a view (shares the same data as the original), the values in the permuted tensor will be the same as the original tensor, and if you change the values in the view, it will change the values of the original.

In [None]:
x = torch.randn(2, 3, 5)
x


tensor([[[ 0.3544,  1.5348,  2.2011, -0.7174,  1.3705],
         [ 1.0776, -0.5504, -0.1082, -1.2674, -0.4907],
         [ 1.0082, -0.1155,  0.0454,  1.4390, -1.0486]],

        [[ 0.1140, -0.5333, -1.5091,  0.0754,  1.6487],
         [ 0.7953, -0.6653,  0.2446, -1.2251,  0.1467],
         [-1.9990,  0.5144, -0.9029,  1.4824,  1.1535]]])

In [None]:
print(x.size())
# Permute the original tensor to rearrange the axis order using the indices of the dimensions
torch.permute(x, (2,1,0)).size()

torch.Size([2, 3, 5])


torch.Size([5, 3, 2])

In [None]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))
# x_original
# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1)
# shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


#### Indexing (selecting data from tensors)
Sometimes you'll want to select specific data from tensors (for example, only the first column or second row).

To do so, you can use indexing.

If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

In [None]:
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [None]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}")
print(f"Second square bracket: {x[0][0]}")
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


You can also use : to specify "all values in this dimension" and then use a comma (,) to add another dimension.

In [None]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [None]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [None]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [None]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])

Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple times to get it right). But with a bit of practice and following the data explorer's motto (*visualize, visualize, visualize*), you'll start to get the hang of it.
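For a bit more practice, here is a sketch that pulls a few specific pieces out of the same x. The variable names are just labels for this example.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

bottom_right = x[0, 2, 2]  # index 0 of dim 0, index 2 of dim 1, index 2 of dim 2
last_column = x[:, :, 2]   # all of dims 0 and 1, index 2 of dim 2
middle_row = x[0, 1, :]    # index 0 of dim 0, index 1 of dim 1, all of dim 2

print(bottom_right)  # tensor(9)
print(last_column)   # tensor([[3, 6, 9]])
print(middle_row)    # tensor([4, 5, 6])
```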

### PyTorch tensors & NumPy
NumPy is a popular Python numerical computing library, so PyTorch has functionality to interact with it nicely.

If your data is in NumPy, the two main methods you'll want to use for NumPy to PyTorch (and back again) are:

* ***torch.from_numpy***(ndarray)
NumPy array to PyTorch tensor.
* ***torch.Tensor.numpy***()
PyTorch tensor to NumPy array.

Let's try them out.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, array.dtype, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 dtype('float64'),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))


Note: By default, NumPy arrays are created with the datatype float64 and if you convert it to a PyTorch tensor, it'll keep the same datatype (as above).

However, many PyTorch calculations default to using float32.

So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use tensor = torch.from_numpy(array).type(torch.float32).

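Because .type(torch.float32) returns a new tensor with its own copy of the data, changing the array afterwards leaves the tensor unchanged. (On its own, torch.from_numpy(array) shares memory with the array.)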

In [None]:
tensor = torch.from_numpy(array).type(torch.float32)
tensor
# tensor.dtype

tensor([1., 2., 3., 4., 5., 6., 7.])

In [None]:
# Change the array, keep the tensor
array = array + 1
array,tensor

(array([2., 3., 4., 5., 6., 7., 8.]), tensor([1., 2., 3., 4., 5., 6., 7.]))

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))


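And a similar rule applies here: tensor = tensor + 1 creates a new tensor and reassigns the name, so the existing numpy_tensor stays the same. (An in-place change such as tensor.add_(1) would show up in numpy_tensor too, because torch.Tensor.numpy() shares memory with the tensor.)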

In [None]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
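The cells above stay independent only because they create new objects (array + 1 and tensor + 1 both build a new array or tensor). The sketch below shows the other side: for CPU tensors, torch.from_numpy() and torch.Tensor.numpy() share memory with their source, so in-place changes are visible on both sides. The variable names are chosen for this example.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
shared_tensor = torch.from_numpy(array)                      # shares memory with array
copied_tensor = torch.from_numpy(array).type(torch.float32)  # dtype conversion makes a copy

array += 1  # in-place change to the array
print(shared_tensor)  # follows the array
print(copied_tensor)  # keeps the old values

tensor = torch.ones(7)
numpy_view = tensor.numpy()  # shares memory with tensor
tensor.add_(1)               # in-place change to the tensor
print(numpy_view)            # now all 2s
```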

### Reproducibility (trying to take the random out of random)

As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.
Well, pseudorandomness that is. Because after all, a computer as designed is fundamentally deterministic (each step is predictable), so the randomness it creates is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?

We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:

*start with random numbers* -> *tensor operations* -> *try to make better* (again and again and again)

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.

Why?

So you can perform repeatable experiments.

For example, you create an algorithm capable of achieving X performance.

And then your friend tries it out to verify you're not crazy.

How could they do such a thing?

That's where reproducibility comes in.

In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

Let's see a brief example of reproducibility in PyTorch.

We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [None]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.9279, 0.5251, 0.2649, 0.0012],
        [0.9150, 0.7956, 0.7839, 0.8019],
        [0.2431, 0.6610, 0.5384, 0.0024]])

Tensor B:
tensor([[0.4098, 0.9367, 0.7978, 0.4399],
        [0.3663, 0.4419, 0.9110, 0.5213],
        [0.0453, 0.1829, 0.2517, 0.4548]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

Just as you might've expected, the tensors come out with different values.

But what if you wanted to create two random tensors with the same values?
As in, the tensors would still contain random values but they would be of the same flavour (i.e. the randomness is reduced).
To reduce randomness in neural networks and PyTorch, there's the concept of a **random seed**.

The seed is an integer (like 42, but it could be anything). Essentially, what the random seed does is "flavour" the randomness: the same seed gives the same sequence of pseudorandom numbers.

In [None]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(random_tensor_A)
print(random_tensor_B)

print("\n Does Tensor A equal Tensor B? (anywhere)")

print(random_tensor_A == random_tensor_B)

tensor([[0.2071, 0.4555, 0.1133, 0.2157],
        [0.1925, 0.9753, 0.4528, 0.7645],
        [0.8878, 0.1839, 0.3233, 0.4757]])
tensor([[0.0596, 0.6914, 0.6497, 0.5110],
        [0.0972, 0.9576, 0.7176, 0.2409],
        [0.4836, 0.5331, 0.2669, 0.2871]])

 Does Tensor A equal Tensor B? (anywhere)
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


In [1]:
# To make the above code reproducible, let's set the random seed
import torch

RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below

torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Reset the seed before each rand() call that should reproduce the same values
# Without this, tensor_D would be different to tensor_C

torch.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
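
To make the seed's role a little more concrete, here is a small sketch (not from the original notebook) showing that a seed fixes the whole sequence of random numbers, not just the next call. Two `rand()` calls after one seed give two different tensors, but replaying from the same seed gives the same pair again. The variable names are made up for illustration.

```python
import torch

torch.manual_seed(42)
first_a = torch.rand(3, 4)
first_b = torch.rand(3, 4)  # second draw from the same seeded sequence

torch.manual_seed(42)       # restart the sequence
second_a = torch.rand(3, 4)
second_b = torch.rand(3, 4)

# torch.equal returns a single bool: same shape and same values
print(torch.equal(first_a, second_a))  # first draws match
print(torch.equal(first_b, second_b))  # second draws match
print(torch.equal(first_a, first_b))   # draws within one sequence differ
```

This is why setting the seed once at the top of a script is usually enough to repeat a whole run, as long as the code then makes the same random calls in the same order.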

Resource: What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general and random seeds, I'd check out:
https://pytorch.org/docs/stable/notes/randomness.html

https://en.wikipedia.org/wiki/Random_seed
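
In practice, a project often draws random numbers from more than one library. The sketch below assumes you are also using Python's `random` module and NumPy. The helper name `seed_everything` is hypothetical (it is not a PyTorch function), and this is only one possible way to do it.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Hypothetical helper: seed the common sources of randomness."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's global RNG
    torch.manual_seed(seed)  # PyTorch's RNG


seed_everything(42)
run_1 = (random.random(), np.random.rand(2), torch.rand(2))

seed_everything(42)
run_2 = (random.random(), np.random.rand(2), torch.rand(2))

print(run_1[0] == run_2[0])
print(np.array_equal(run_1[1], run_2[1]))
print(torch.equal(run_1[2], run_2[2]))
```

Seeding alone may not make every GPU operation deterministic. The PyTorch randomness notes linked above cover those extra settings.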

### Running tensors on GPUs (and making faster computations)

Deep learning algorithms require a lot of numerical operations.

And by default these operations are often done on a CPU (central processing unit).

However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the specific types of operations neural networks need (matrix multiplications) than CPUs.

Your computer might have one.

If so, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.

There are a few ways to first get access to a GPU and secondly get PyTorch to use the GPU.

Note: When I reference "GPU" throughout this course, I'm referencing an NVIDIA GPU with CUDA enabled (CUDA is a computing platform and API that lets GPUs be used for general-purpose computing and not just graphics) unless otherwise specified.

GPUs = faster computation on numbers, thanks to CUDA + NVIDIA hardware + PyTorch working behind the scenes to make everything hunky dory (good).
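
If you'd like to see the difference for yourself, the following is a rough timing sketch, not a rigorous benchmark. The function name `time_matmul` and the matrix size are arbitrary choices for illustration, and the numbers you get will depend entirely on your hardware.

```python
import time

import torch


def time_matmul(device: str, size: int = 1024, repeats: int = 10) -> float:
    """Return the average seconds per (size x size) matrix multiplication."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    torch.matmul(a, b)  # warm-up run so one-off setup cost isn't timed
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


print(f"cpu:  {time_matmul('cpu'):.6f} s per matmul")
if torch.cuda.is_available():
    print(f"cuda: {time_matmul('cuda'):.6f} s per matmul")
```

The `torch.cuda.synchronize()` calls matter because GPU operations are launched asynchronously. Without them the timer could stop before the GPU has actually finished.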

#### Getting a GPU

* Easiest - use Google Colab for a free GPU (with options to upgrade as well).
* Use your own GPU - takes a little bit of setup and requires the investment of buying a GPU; there are a lot of options.
* Use cloud computing - GCP, AWS, Azure; these services allow you to rent computers in the cloud and access them.

Personally, I use a combination of Google Colab and my own personal computer for small-scale experiments and go to cloud resources when I need more compute power.

To check if you've got access to an NVIDIA GPU, you can run !nvidia-smi where the ! (also called bang) means "run this on the command line".

In [None]:
!nvidia-smi

Mon Jul 10 12:48:38 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
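
Outside a notebook, the `!` shortcut isn't available. As a sketch of one alternative (assuming plain Python and the standard library only), you can check whether the `nvidia-smi` executable exists before calling it. The function name `nvidia_smi_available` is made up for this example.

```python
import shutil
import subprocess


def nvidia_smi_available() -> bool:
    """Return True if the nvidia-smi executable is on the PATH."""
    return shutil.which("nvidia-smi") is not None


if nvidia_smi_available():
    # Roughly equivalent to running `!nvidia-smi` in a notebook cell
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    print(result.stdout)
else:
    print("nvidia-smi not found: no NVIDIA driver is visible on this machine")
```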

#### Check for GPU access with PyTorch

You can test if PyTorch has access to a GPU using *torch.cuda.is_available()*.

In [4]:
# Check for GPU
import torch
torch.cuda.is_available()

True

If the above outputs True, PyTorch can see and use the GPU. If it outputs False, it can't see the GPU and in that case, you'll have to go back through the installation steps.

Now, let's say you wanted to set up your code so it ran on the GPU if one was available and on the CPU otherwise.

That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.

Let's create a device variable to store what kind of device is available.

In [5]:
# Set device type (device-agnostic code)

device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [None]:
# Count number of devices
torch.cuda.device_count()

1
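
Beyond the count, you might also want to know which GPUs PyTorch can see. A minimal sketch, with `list_cuda_devices` as a made-up helper name, that also works (and returns an empty list) on a CPU-only machine:

```python
import torch


def list_cuda_devices() -> list:
    """Return the names of all CUDA devices PyTorch can see (may be empty)."""
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


names = list_cuda_devices()
print(names if names else "No CUDA devices visible; running on CPU")
```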

### Putting tensors (and models) on the GPU

Why do this?

GPUs offer far faster numerical computing than CPUs do and, if a GPU isn't available, our device-agnostic code (see above) means everything will still run on the CPU.

In [2]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)


tensor([1, 2, 3]) cpu


You can put tensors (and models, we'll see this later) on a specific device by calling to(device) on them, where device is the target device you'd like the tensor (or model) to go to.

Let's try moving the tensor to the GPU (if it's available).
Because of our device-agnostic code (see above), this works whether or not a GPU is present.

In [6]:

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')
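
As a side note, you don't always have to create a tensor on the CPU and move it afterwards. Tensor creation functions such as `torch.tensor()` and `torch.rand()` accept a `device` argument. The sketch below uses the same device-agnostic setup as above, and its variable names are illustrative.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Allocate directly on the target device instead of creating on CPU and moving
tensor_direct = torch.tensor([1, 2, 3], device=device)
random_direct = torch.rand(3, 4, device=device)

print(tensor_direct.device, random_direct.device)
```

Creating tensors where they will be used avoids an extra copy between CPU and GPU memory.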

Note: Putting a tensor on GPU using to(device) (e.g. some_tensor.to(device)) returns a copy of that tensor, i.e. the same tensor will be on both CPU and GPU. To overwrite the original, reassign it: some_tensor = some_tensor.to(device).

In [11]:
print(tensor_on_gpu)
print(tensor, tensor.device)

tensor([1, 2, 3], device='cuda:0')
tensor([1, 2, 3]) cpu
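
A small sketch of the copy behaviour, with illustrative variable names. One subtlety: if the tensor is already on the requested device (and no dtype change is requested), `to()` hands back the same tensor object rather than a copy.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
cpu_tensor = torch.tensor([1, 2, 3])  # lives on the CPU

# Already on the CPU, so to("cpu") returns the very same object
same_device = cpu_tensor.to("cpu")
returned_self = same_device is cpu_tensor

# With a GPU present this is a new tensor in GPU memory; the original stays put
moved = cpu_tensor.to(device)

print(returned_self)
print(cpu_tensor.device, moved.device)
```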


### Moving tensors back to the CPU
What if we wanted to move the tensor back to CPU?

For example, you'll want to do this if you want to interact with your tensors with NumPy (NumPy does not leverage the GPU).

Let's try using the torch.Tensor.numpy() method on our tensor_on_gpu.

In [7]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

TypeError: ignored


Instead, to get a tensor back to CPU and usable with NumPy we can use Tensor.cpu().



In [8]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

The above returns a copy of the GPU tensor in CPU memory so the original tensor is still on GPU.

In [9]:
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')
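
To avoid remembering where each tensor lives, you could wrap the conversion in a small function. The sketch below is one way to do it; `to_numpy` is a hypothetical helper name. The `detach()` call is an assumption about later use: it only matters for tensors that track gradients, which this notebook hasn't covered yet.

```python
import numpy as np
import torch


def to_numpy(tensor: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: return a NumPy array from a tensor on any device."""
    # detach() drops gradient tracking; cpu() returns the tensor as-is if already on CPU
    return tensor.detach().cpu().numpy()


device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = torch.tensor([1, 2, 3], device=device)

array = to_numpy(tensor_on_device)
print(array, type(array))
```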