<a href="https://colab.research.google.com/github/capabledjay/Deep_Learning_With_Pytorch/blob/main/pytorch_fundamental.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PyTorch Fundamentals

Resource notebook: https://www.learnpytorch.io/00_pytorch_fundamentals/


In [None]:
!nvidia-smi


/bin/bash: nvidia-smi: command not found


In [None]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.0.1+cu118


## Introduction to Tensors
Tensors are the fundamental building block of machine and deep learning.
Their job is to represent data in a numerical way. For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean [colour_channels, height, width], as in the image has 3 colour channels (red, green, blue), a height of 224 pixels and a width of 224 pixels.

### Creating tensors


##### Scalar
A scalar is a single number and in tensor-speak it's a zero dimension tensor.

In [None]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [None]:
# to check for the dimension of the tensor
scalar.ndim

0

In [None]:
# Get tensor back as a Py int
scalar.item()

7

##### Vector
A vector is a single dimension tensor but can contain many numbers.
As in, you could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms, bathrooms, car_parks] in your house.
The important trend here is that a vector is flexible in what it can represent (the same with tensors).

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [None]:
# Check the number of dimensions of vector
vector.ndim

1

You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside ([) and you only need to count one side.

Another important concept for tensors is their shape attribute. The shape tells you how the elements inside them are arranged.

In [None]:
# Check shape of vector
vector.shape

torch.Size([2])

In [None]:
# Matrix
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

Matrices are as flexible as vectors, except they've got an extra dimension.

In [None]:
# Check number of dimensions
MATRIX.ndim

2

In [None]:
MATRIX.shape

torch.Size([2, 2])

In [None]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [None]:
# Check number of dimensions for TENSOR
TENSOR.ndim

3

In [None]:
# Check shape of TENSOR
TENSOR.shape

torch.Size([1, 3, 3])

Alright, it outputs torch.Size([1, 3, 3]).

The dimensions go outer to inner.

That means there's 1 dimension of 3 by 3.
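One way to see the outer-to-inner ordering is to index into the tensor one level at a time. The sketch below rebuilds the same values under a new name (`nested`, chosen here purely for illustration) and checks the shape at each level.

```python
import torch

# Same values as TENSOR above, under an illustrative name
nested = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])

outer = nested.shape[0]  # size of the outermost dimension
block = nested[0]        # the single 3 by 3 block
row = nested[0][0]       # first row of that block

print(nested.shape)  # torch.Size([1, 3, 3])
print(block.shape)   # torch.Size([3, 3])
print(row.shape)     # torch.Size([3])
```

Each index removes the outermost remaining dimension, which is why the shape shrinks from the left.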

In [None]:
ex = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [23,45,67],
                        [2, 4, 5]]])
ex.shape

torch.Size([1, 4, 3])

### Random tensor

When building machine learning models with PyTorch, it's rare you'll create tensors by hand (as we've been doing).
Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

- In essence:

Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.
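The cycle above can be sketched with a single random number. This is a toy illustration, not how a real model is built: the data (`y = 2 * x`), the learning rate and the number of steps are all made-up assumptions.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Toy data following y = 2 * x (made up for illustration)
inputs = torch.tensor([1.0, 2.0, 3.0, 4.0])
targets = 2.0 * inputs

# Initialization: start with a random number
weight = torch.rand(1, requires_grad=True)
initial_weight = weight.item()

learning_rate = 0.01  # assumed value, small enough for this toy problem
for step in range(200):
    predictions = weight * inputs                 # look at data
    loss = ((predictions - targets) ** 2).mean()  # measure how wrong we are
    loss.backward()                               # compute the gradient
    with torch.no_grad():
        weight -= learning_rate * weight.grad     # update the random number
    weight.grad.zero_()                           # reset before the next step

final_weight = weight.item()
print(f"Started at {initial_weight:.4f}, ended at {final_weight:.4f}")
```

The weight starts somewhere between 0 and 1 and ends up close to 2, the value that best represents the toy data.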


https://pytorch.org/docs/stable/generated/torch.rand.html

In [None]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.8735, 0.3544, 0.3589, 0.0126],
         [0.5254, 0.5635, 0.0786, 0.4689],
         [0.4702, 0.8953, 0.2103, 0.1890]]),
 torch.float32)


The flexibility of torch.rand() is that we can adjust the size to be whatever we want.

For example, say you wanted a random tensor in the common image shape of [3, 224, 224] ([colour_channels, height, width]).

In [None]:
# Create a random tensor of size (3, 224, 224)
random_image_size_tensor = torch.rand(size=(3, 224, 224))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([3, 224, 224]), 3)

In [None]:
br = torch.rand(2,8,2)
br

tensor([[[0.6983, 0.4422],
         [0.7733, 0.1523],
         [0.7349, 0.8948],
         [0.1700, 0.9402],
         [0.0172, 0.4256],
         [0.8873, 0.2292],
         [0.7505, 0.5905],
         [0.9046, 0.4819]],

        [[0.4433, 0.1452],
         [0.7549, 0.1878],
         [0.5214, 0.5514],
         [0.0203, 0.3691],
         [0.9530, 0.8714],
         [0.3053, 0.2304],
         [0.6446, 0.7426],
         [0.9741, 0.3512]]])

### Zeros and ones
Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with torch.zeros()

Again, the size parameter comes into play.

In [None]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype


(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [None]:

# We can do the same to create a tensor of all ones except using
#torch.ones() instead.

# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
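As a rough sketch of the masking idea mentioned earlier, a tensor of ones can act as a "keep everything" mask, with zeros written into the positions to hide. The values and the choice of hiding the last column are assumptions made for illustration.

```python
import torch

values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Start from all ones (keep everything), then zero out the positions to hide
mask = torch.ones(size=(2, 3))
mask[:, 2] = 0.0  # hide the last column

masked_values = values * mask  # element-wise: hidden positions become 0
print(masked_values)
```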

### Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use torch.arange(start, end, step) to do so.

Where:

* start = start of range (e.g. 0)
* end = end of range, not included in the output (e.g. 10)
* step = the gap between consecutive values (e.g. 1)

In [None]:

# Create a range of values from 0 up to (but not including) 10
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
even_numbers = torch.arange(0, 100, 2)  # named even_numbers so Python's built-in range() is not shadowed
even_numbers

tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
        36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
        72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])


Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use torch.zeros_like(input) or torch.ones_like(input) which return a tensor filled with zeros or ones in the same shape as the input respectively.

In [None]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [None]:
all_ones = torch.ones_like(even_numbers)
all_ones

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1])
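Note that the outputs above are integers rather than floats: the `_like` functions copy the datatype of the input as well as its shape. A small sketch, using illustrative variable names, of how that plays out and how the `dtype` argument overrides it:

```python
import torch

float_source = torch.tensor([1.5, 2.5, 3.5])  # float32
int_source = torch.arange(0, 3)               # int64

float_zeros = torch.zeros_like(float_source)  # same shape, float32
int_ones = torch.ones_like(int_source)        # same shape, int64

# Pass dtype to keep the shape but choose a different datatype
float_ones = torch.ones_like(int_source, dtype=torch.float32)

print(float_zeros.dtype, int_ones.dtype, float_ones.dtype)
```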

### Tensor datatypes
There are many different tensor datatypes available in [PyTorch](https://pytorch.org/docs/stable/tensors.html#data-types).
Some are specific for CPU and some are better for GPU.

Getting to know which is which can take some time.
Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is torch.float32 or torch.float.
This is referred to as "32-bit floating point".

But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

Plus more!

- Note: An integer is a whole number like 7, whereas a float has a decimal point, like 7.0.

In [None]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the current default device (CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

The reason for all of these is to do with precision in computing.

Precision is the amount of detail used to describe a number.

The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

This matters in deep learning and numerical computing because you're performing so many operations: the more detail you have to calculate on, the more compute you have to use.

So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
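To make the trade-off concrete, the sketch below stores the same number (one third, chosen arbitrarily) at three precisions and records how many bytes each element takes and how far the stored value is from the Python float.

```python
import torch

one_third = 1 / 3  # Python float (64-bit)

sizes = {}   # bytes used per element
errors = {}  # distance between the stored value and one_third
for dtype in (torch.float16, torch.float32, torch.float64):
    value = torch.tensor(one_third, dtype=dtype)
    sizes[dtype] = value.element_size()
    errors[dtype] = abs(value.item() - one_third)
    print(dtype, sizes[dtype], "bytes, stored as", f"{value.item():.10f}")
```

Fewer bytes per element means less memory and less data to move, at the cost of a larger rounding error.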

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).

We'll see more of this device talk later on.
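A common way to sidestep both issues is to move tensors to one datatype and one device before combining them. The following is a minimal sketch, assuming you want to use a GPU when one is available and fall back to the CPU otherwise; the tensor names are illustrative.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tensor_a = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)
tensor_b = torch.tensor([4.0, 5.0, 6.0], dtype=torch.float32)

# Bring both tensors to the same datatype and the same device
tensor_a = tensor_a.to(device=device, dtype=torch.float32)
tensor_b = tensor_b.to(device)

result = tensor_a + tensor_b
print(result.dtype, result.device)
```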

In [None]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16,
) # torch.half would also work

print(float_16_tensor.dtype)
float_16_tensor.device

torch.float16


device(type='cpu')

In [None]:
# Convert the float16 tensor to float32

float_16_as_32 = float_16_tensor.type(torch.float32)
float_16_as_32.dtype

torch.float32

### Getting information from tensors
Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:

- shape - what shape is the tensor? (some operations require specific shape rules)
- dtype - what datatype are the elements within the tensor stored in?
- device - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(some_tensor.size())  # .size() returns the same value as .shape
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}")  # will default to CPU

tensor([[0.4453, 0.3855, 0.3359, 0.8075],
        [0.5886, 0.3414, 0.7554, 0.6098],
        [0.4715, 0.0811, 0.6481, 0.0993]])
torch.Size([3, 4])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Matrix multiplication (is all you need)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication functionality in the torch.matmul() function.

The two main rules for matrix multiplication to remember are:

The inner dimensions must match:
* (3, 2) @ (3, 2) won't work
* (2, 3) @ (3, 2) will work
* (3, 2) @ (2, 3) will work

The resulting matrix has the shape of the outer dimensions:

* (2, 3) @ (3, 2) --> (2, 2)
* (3, 2) @ (2, 3) --> (3, 3)

Note: "@" in Python is the symbol for matrix multiplication.
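Both rules can be checked directly on random tensors. In the sketch below, `can_matmul` is a hypothetical helper written for this example (it is not part of PyTorch) and only handles 2-D tensors.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

def can_matmul(left, right):
    """Return True if the inner dimensions of two 2-D tensors match."""
    return left.shape[1] == right.shape[0]

print(can_matmul(a, b))  # True: (2, 3) @ (3, 2)
print(can_matmul(b, b))  # False: (3, 2) @ (3, 2)

# The result takes the outer dimensions
print((a @ b).shape)  # torch.Size([2, 2])
print((b @ a).shape)  # torch.Size([3, 3])
```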

http://matrixmultiplication.xyz/

Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.

In [15]:
import torch
tensor = torch.tensor([1, 2, 3])
print(tensor)
print(tensor.shape)

tensor([1, 2, 3])
torch.Size([3])



The difference between element-wise multiplication and matrix multiplication is the addition of values.

For our tensor variable with values [1, 2, 3]:

| Operation | Calculation | Code |
| --- | --- | --- |
| Element-wise multiplication | `[1*1, 2*2, 3*3] = [1, 4, 9]` | `tensor * tensor` |
| Matrix multiplication | `[1*1 + 2*2 + 3*3] = [14]` | `tensor.matmul(tensor)` |
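Put another way, for a 1-D tensor the matrix multiplication is the element-wise product followed by a sum. A short sketch (using the name `values` to avoid clashing with the `tensor` variable above):

```python
import torch

values = torch.tensor([1, 2, 3])

elementwise = values * values       # multiply matching positions
summed = elementwise.sum()          # add the products together
dot = torch.matmul(values, values)  # multiply and add in one step

print(elementwise)
print(summed, dot)
```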

In [3]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [4]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

In [5]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor

tensor(14)

You can do matrix multiplication by hand but it's not recommended.

The in-built torch.matmul() method is faster.

In [6]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 1.78 ms, sys: 0 ns, total: 1.78 ms
Wall time: 7.91 ms


tensor(14)

In [7]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 52 µs, sys: 10 µs, total: 62 µs
Wall time: 63.7 µs


tensor(14)


#### One of the most common errors in deep learning (shape errors)
Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

In [9]:
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # this will error because it doesn't
# meet the first rule of matrix multiplication: the inner dimensions must match

RuntimeError: ignored

To fix this error, we need to fix the shapes of the tensors so that the inner dimension can match.
One of the ways to do this is with a transpose (switching the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

* torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
* tensor.T - where tensor is the desired tensor to transpose.
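For a 2-D tensor the two options give the same result. A small sketch, using an illustrative `matrix` with the same values as tensor_B:

```python
import torch

matrix = torch.tensor([[7., 10.],
                       [8., 11.],
                       [9., 12.]])  # shape (3, 2)

via_function = torch.transpose(matrix, 0, 1)  # swap dimension 0 and dimension 1
via_attribute = matrix.T                      # shorthand for the same swap

print(via_function.shape)                        # torch.Size([2, 3])
print(torch.equal(via_function, via_attribute))  # True
```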

In [10]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [11]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [12]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])



You can also use torch.mm(), which performs the same multiplication for 2-D matrices (unlike torch.matmul(), it does not broadcast).

In [13]:
# torch.mm multiplies two 2-D matrices (no broadcasting)
torch.mm(tensor_A,tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
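torch.mm() and torch.matmul() agree for 2-D matrices, but they are not identical: torch.matmul() also accepts batched inputs and broadcasts, while torch.mm() only accepts 2-D matrices. The shapes below are arbitrary choices for illustration.

```python
import torch

left = torch.rand(3, 2)
right = torch.rand(2, 3)

# Same answer for plain 2-D matrices
same_result = torch.allclose(torch.mm(left, right), torch.matmul(left, right))

# matmul treats the leading dimension as a batch and reuses `right` for each item
batch = torch.rand(4, 3, 2)
batched = torch.matmul(batch, right)  # shape (4, 3, 3)

# mm rejects anything that is not 2-D
try:
    torch.mm(batch, right)
    mm_accepts_batch = True
except RuntimeError:
    mm_accepts_batch = False

print(same_result, batched.shape, mm_accepts_batch)
```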


Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like the one above.

You can create your own matrix multiplication visuals at http://matrixmultiplication.xyz/.

Note: A matrix multiplication like this is also referred to as the dot product of two matrices.

### Manipulating tensors (tensor operations)

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

By stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like Lego!).

#### Basic operations
Let's start with a few of the fundamental operations: addition (+), subtraction (-) and multiplication (*).

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])


Notice how the tensor values above didn't end up being tensor([110, 120, 130]). This is because the values inside the tensor don't change unless they're reassigned.

In [None]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

In [None]:
# Subtract 10 (the result is not reassigned, so tensor itself is unchanged)
tensor - 10


tensor([-9, -8, -7])
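To actually change the stored values, assign the result back to the variable or use an in-place method (PyTorch marks these with a trailing underscore, such as `add_()`). A brief sketch with an illustrative `values` tensor:

```python
import torch

values = torch.tensor([1, 2, 3])

values + 10                  # result is discarded, values is unchanged
unchanged = values.clone()   # snapshot: [1, 2, 3]

values = values + 10         # reassign the result to the same name
reassigned = values.clone()  # snapshot: [11, 12, 13]

values.add_(10)              # in-place addition modifies values directly
print(unchanged, reassigned, values)
```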

In [None]:
# PyTorch also has a bunch of built-in functions like torch.mul()
# (short for multiplication) and torch.add() to perform basic operations.

# Can also use torch functions (torch.multiply() is an alias for torch.mul())
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [None]:
# However, it's more common to use the operator symbols like * instead of torch.mul()

# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


#### Tensor aggregation

Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values), such as finding the min, max, mean and sum.

First we'll create a tensor and then find the max, min, mean and sum of it.

In [16]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [20]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Sum: 450


In [18]:
print(f"Mean: {x.mean()}")  # this will error


RuntimeError: ignored

You may find that some methods, such as torch.mean(), require tensors to be in torch.float32 (the most common) or another specific datatype; otherwise the operation fails. This is one of the most likely errors you'll come across when working with neural networks.

Here the datatype of x is int64 (a long tensor), and torch.mean() can't work with int64, so we have to change the datatype to a float type first.

In [22]:
print(f"Mean: {x.type(torch.float64).mean()}") # won't work without float datatype


Mean: 45.0


In [24]:
# You can also do the same as above with torch methods.
print(torch.max(x))
print(torch.min(x))
print(torch.mean(x.type(torch.float32)))
print(torch.sum(x))

tensor(90)
tensor(0)
tensor(45.)
tensor(450)


#### Positional min/max
You can also find the index of a tensor where the max or minimum occurs with torch.argmax() and torch.argmin() respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).

In [25]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
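As a small preview of the softmax use case, the sketch below turns some made-up scores into probabilities and uses argmax to pick the position of the largest one. The scores are invented for illustration and do not come from a real model.

```python
import torch

# Hypothetical raw scores for three classes
scores = torch.tensor([1.0, 3.0, 0.5])

probabilities = torch.softmax(scores, dim=0)  # non-negative values that sum to 1
predicted_index = probabilities.argmax()      # position of the largest probability
highest_probability = probabilities[predicted_index]

print(f"Predicted class index: {predicted_index.item()}")
print(f"Probability at that index: {highest_probability.item():.4f}")
```

The index can also be used to look the value back up, as the last line of the block does.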


#### Change tensor datatype
As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in torch.float64 and another is in torch.float32, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using torch.Tensor.type(dtype=None) where the dtype parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is torch.float32).

In [27]:
# Create a tensor and check its datatype
dt_x = torch.arange(10., 100., 10.)
dt_x.dtype

torch.float32

In [33]:
# Now we'll create another tensor the same as before but change
# its datatype to torch.float16.

dt_x_float16 = dt_x.type(torch.float16)
dt_x_float16


tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [34]:
# Create an int8 tensor
dt_x_int8 = dt_x.type(torch.int8)
dt_x_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)


#### Reshaping, stacking, squeezing and unsqueezing
Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

To do so, some popular methods are:

* torch.reshape(input, shape)
Reshapes input to the given shape (if compatible); can also use torch.Tensor.reshape().

* torch.Tensor.view(shape)
Returns a view of the original tensor in a different shape that shares the same data as the original tensor.

* torch.stack(tensors, dim=0)
Joins a sequence of tensors along a new dimension (for example side by side or on top of each other, similar to torch.vstack() for 1-D tensors); all tensors must be the same size.

* torch.squeeze(input)
Squeezes input to remove all the dimensions of size 1.

* torch.unsqueeze(input, dim)
Returns input with a dimension of size 1 added at dim.

* torch.permute(input, dims)
Returns a view of the original input with its dimensions permuted (rearranged) to dims.
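The methods above can be sketched in one short block. The shapes used here are arbitrary choices for illustration, and the random `image` tensor stands in for real image data.

```python
import torch

x = torch.arange(1., 10.)             # shape (9,)

reshaped = x.reshape(1, 9)            # shape (1, 9)
viewed = x.view(3, 3)                 # shape (3, 3), shares memory with x
stacked = torch.stack([x, x], dim=0)  # shape (2, 9), a new dimension at the front
squeezed = reshaped.squeeze()         # shape (9,), the size-1 dimension is removed
unsqueezed = x.unsqueeze(dim=0)       # shape (1, 9), a size-1 dimension is added

image = torch.rand(224, 224, 3)       # [height, width, colour_channels]
permuted = image.permute(2, 0, 1)     # [colour_channels, height, width]

# Because a view shares data, changing the view changes the original too
viewed[0, 0] = 100.
print(x[0])  # tensor(100.)
```

The last two lines show why views need care: `viewed` and `x` are two shapes over the same underlying data.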

In [36]:
# Create a tensor
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))