# **PyTorch Fundamentals**

[PyTorch](https://pytorch.org/) is an open source machine learning and deep learning framework.

## **What for?**

PyTorch allows us to **manipulate and process data** and **write machine learning algorithms** using Python code.

## **Who uses it?**

Many of the world's largest technology companies, such as [Meta](https://ai.facebook.com/blog/pytorch-builds-the-future-of-ai-and-machine-learning-at-facebook/) (Facebook), Tesla and Microsoft, as well as artificial intelligence research companies such as [OpenAI](https://openai.com/blog/openai-pytorch/), use PyTorch to power research and bring machine learning to their products. PyTorch is also used in other industries, such as agriculture, to power computer vision on [tractors](https://medium.com/pytorch/ai-for-ag-production-machine-learning-for-agriculture-e8cfdb9849a1).

![pytorch being used across industry and research](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-being-used-across-research-and-industry.png)

## **Why?**

As of June 2023, PyTorch is the most used deep learning framework on [Papers With Code](https://paperswithcode.com/trends), a website for tracking machine learning research papers and the code repositories attached to them.

PyTorch also takes care of many things behind the scenes, such as GPU acceleration (making code run faster).



## **What we'll cover in this module**


| **Topic** | **Contents** |
| ----- | ----- |
| **Introduction to tensors** | Tensors are the basic building block of all of machine learning and deep learning. |
| **Creating tensors** | Tensors can represent almost any kind of data (images, words, tables of numbers). |
| **Information from tensors** | If you can put information into a tensor, you'll want to get it out too. |
| **Manipulating tensors** | Machine learning algorithms (like neural networks) involve manipulating tensors in many different ways such as adding, multiplying, combining. |
| **Dealing with tensor shapes** | One of the most common issues in machine learning is dealing with shape mismatches (trying to mix wrongly shaped tensors with other tensors). |
| **Indexing on tensors** | If you've indexed on a Python list or NumPy array, it's very similar with tensors, except they can have far more dimensions. |
| **Mixing tensors and arrays** | PyTorch plays with tensors ([`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html)) and NumPy likes arrays ([`np.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)); sometimes you'll want to mix and match these. |
| **Reproducibility** | Machine learning is very experimental and since it uses a lot of *randomness* to work, sometimes you'll want that *randomness* to not be so random. |
| **Running tensors on GPU** | GPUs (Graphics Processing Units) can make your code faster, and PyTorch makes it easy to run your code on GPUs. |


## **Importing PyTorch**

> **Note:** Before running any of the code in this notebook, you should have gone through the [PyTorch setup](https://pytorch.org/get-started/locally/) steps. However, **if you're running on Google Colab**, everything should work (Google Colab comes with PyTorch and other libraries installed).

Let's start by importing PyTorch and checking the version we're using.

In [None]:
import torch
torch.__version__

'2.0.1+cu118'

Wonderful, it looks like we've got PyTorch 2.0.0+.

# **1. Introduction to tensors**

Tensors are the fundamental building block of machine learning.
Their job is to **represent data in a numerical way**.


![example of going from an input image to a tensor representation of the image, image gets broken down into 3 colour channels as well as numbers to represent the height and width](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-tensor-shape-example-of-image.png)

For example, we could represent an image as a tensor with shape `[3, 224, 224]`, which would mean `[colour_channels, height, width]`: the image has `3` colour channels (red, green, blue), a height of `224` pixels and a width of `224` pixels. In tensor-speak (the language used to describe tensors), the tensor would have three dimensions, one each for `colour_channels`, `height` and `width`.
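Here is a minimal sketch of a tensor with that shape. It is filled with random numbers from `torch.rand()` as a stand-in for real pixel values, so only the shape matters here.

```python
import torch

# A stand-in "image" laid out as [colour_channels, height, width]
image = torch.rand(3, 224, 224)

print(image.shape)     # torch.Size([3, 224, 224])
print(image.ndim)      # 3 -> colour_channels, height and width
print(image[0].shape)  # one colour channel: torch.Size([224, 224])
```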

## **1.1. Creating tensors**

PyTorch loves tensors. Read through the documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html).

#### **Scalar**
A **scalar** is a single number. In tensor-speak, it's a zero-dimensional tensor.

In [None]:
# Scalar
scalar = torch.tensor(7)
scalar

tensor(7)

See how the above printed out `tensor(7)`? That means although `scalar` is a single number, it's of type `torch.Tensor`. We can check the dimensions of a tensor using the `ndim` attribute.

In [None]:
scalar.ndim # a scalar has no dimensions


0

What if we wanted to retrieve the number from the tensor? As in, turn it from a `torch.Tensor` into a Python integer? To do so, we can use the `item()` method.

In [None]:
# Get tensor back as Python number (only works with one-element tensors)
scalar.item()

7

#### **Vector**

A vector is a **single-dimension** tensor, but it can contain **many numbers**. The important point here is that a vector is flexible in what it can represent (the same goes for tensors).

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

Wonderful, `vector` now contains two 7's. How many dimensions do you think it'll have?

In [None]:
# Check the number of dimensions of vector
vector.ndim

1

The `vector` contains **two numbers** but only has a **single dimension**. The number of dimensions of a tensor is often referred to as its **rank** or **order**. It indicates the **number of indices (axes)** required to access an individual element within the tensor.

Another important concept for tensors is their `shape` attribute. The shape tells us how the elements inside them are arranged. Let's check out the shape of `vector`.

In [None]:
# Check shape of vector
vector.shape

torch.Size([2])

The **shape** of a tensor describes the **number of elements along each of its dimensions or axes**. It specifies how many elements are present in each dimension.

The above returns `torch.Size([2])`, which means the vector has two elements along its single dimension.

In [None]:
scalar.shape

torch.Size([])
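A scalar's shape is empty because it has no dimensions, so there is nothing to count along. In general, `ndim` equals the number of entries in `shape`, and reaching a single element takes one index per dimension. Here is a quick check. The lowercase `matrix` name is only for this sketch; matrices are covered next.

```python
import torch

scalar = torch.tensor(7)
vector = torch.tensor([7, 7])
matrix = torch.tensor([[7, 8], [9, 10]])

for t in (scalar, vector, matrix):
    # ndim is always the number of entries in shape
    print(t.ndim, t.shape, len(t.shape) == t.ndim)

# Reaching a single element takes one index per dimension
print(vector[1])          # tensor(7)
print(matrix[1, 0])       # tensor(9)
print(matrix[1, 0].ndim)  # 0 -> the result is a scalar tensor
```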

#### **Matrix**

In [None]:
# Matrix
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

Matrices are as flexible as vectors, except they've got an **extra dimension**.



In [None]:
# Check number of dimensions
MATRIX.ndim

2

`MATRIX` has **two dimensions**. What `shape` do you think it will have?

In [None]:
MATRIX.shape

torch.Size([2, 2])

We get the output `torch.Size([2, 2])` because the `MATRIX` is two elements deep and two elements wide. How about we create a **tensor**?

In [None]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [None]:
TENSOR.ndim

3

In [None]:
TENSOR.shape

torch.Size([1, 3, 3])

## **1.2. Tensor datatypes**

There are many different tensor [datatypes](https://pytorch.org/docs/stable/tensors.html#data-types) available in PyTorch. Some are **specific to CPU** and some are **better for GPU**. Generally, if you see `torch.cuda` anywhere, the tensor is being used on a GPU, since Nvidia GPUs use a computing toolkit called CUDA.

The most common (and default) type is `torch.float32` or `torch.float`. This is referred to as "**32-bit floating point**".
But there's also **16-bit floating point** (`torch.float16` or `torch.half`) and **64-bit floating point** (`torch.float64` or `torch.double`).

There are also **8-bit**, **16-bit**, **32-bit** and **64-bit** integers.
An integer is a whole number like `7`, whereas a float has a decimal part, like `7.0`.
The reason for all of these comes down to **precision** in computing: the amount of detail used to describe a number. The higher the precision value (8, 16, 32), the **more detail** used to express a number. So **lower precision datatypes are generally faster to compute** but **sacrifice some performance on evaluation metrics** like accuracy.
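One way to see precision in action is `torch.finfo()`, which reports details of a floating point datatype. This includes `eps`, the gap between `1.0` and the next representable number. The sketch below also stores `0.1`, which has no exact binary representation, in two datatypes and measures the rounding error. The printed errors are approximate.

```python
import torch

# eps is the gap between 1.0 and the next number the datatype can represent
for dtype in (torch.float16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    print(dtype, "bits:", info.bits, "eps:", info.eps)

# 0.1 can't be stored exactly in binary; fewer bits means a larger rounding error
err16 = abs(torch.tensor(0.1, dtype=torch.float16).item() - 0.1)
err32 = abs(torch.tensor(0.1, dtype=torch.float32).item() - 0.1)
print(f"float16 error: {err16:.2e}")  # about 2.44e-05
print(f"float32 error: {err32:.2e}")  # about 1.49e-09
```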

Let's see how to create some tensors with specific datatypes. We can do so using the `dtype` parameter.

In [None]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the default device (the CPU unless changed)
                               requires_grad=False) # if True, operations performed on this tensor are recorded for gradient calculation

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

> **Note:** Aside from **shape** issues (tensor shapes don't match up), two of the other most common issues are **datatype** issues (e.g. during a dot product) and **device** issues. For example, one of your tensors is `torch.float32` and the other is `torch.float16` (**PyTorch often likes tensors to be the *same* format**). Or one of your tensors is on the CPU and the other is on the GPU (**PyTorch likes calculations between tensors to be on the *same* device**).
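Whether a datatype mismatch actually causes an error depends on the operation. In recent PyTorch versions (this notebook uses 2.0), element-wise arithmetic between `float32` and `float16` tensors promotes the result to `float32`. `torch.matmul()`, on the other hand, refuses to mix them. Here is a small sketch; the exact error message may differ between versions.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])                       # float32
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)  # float16

# Element-wise operations promote to a common datatype
print((a + b).dtype)  # torch.float32

# Matrix multiplication does not promote; it raises an error instead
try:
    torch.matmul(a, b)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# Casting one tensor first makes the datatypes match
print(torch.matmul(a, b.type(torch.float32)))  # tensor(14.)
```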

For now let's create a tensor with `dtype=torch.float16`.

In [None]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16

In [None]:
float_32_tensor2 = float_16_tensor.type(torch.float32)

float_32_tensor2, float_32_tensor2.dtype

(tensor([3., 6., 9.]), torch.float32)

You can change the datatypes of tensors using [`torch.Tensor.type(dtype=None)`](https://pytorch.org/docs/stable/generated/torch.Tensor.type.html) where the `dtype` parameter is the datatype you'd like to use.
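`.type()` isn't the only option. `torch.Tensor.to()` also accepts a datatype, and it is widely used because the same method can move tensors between devices. Here is a sketch comparing the two. Both return a new tensor and leave the original unchanged.

```python
import torch

float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)

# Both return a new float32 tensor; the original keeps its dtype
via_type = float_16_tensor.type(torch.float32)
via_to = float_16_tensor.to(torch.float32)

print(via_type.dtype, via_to.dtype)   # torch.float32 torch.float32
print(float_16_tensor.dtype)          # torch.float16
print(torch.equal(via_type, via_to))  # True
```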

In [None]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

Now we'll create another tensor the same as before but change its datatype to `torch.float16`.



In [None]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

And we can do something similar to make a `torch.int8` tensor.

In [None]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

> **Note:** Different datatypes can be confusing to begin with. But the **lower the number** (e.g. 32, 16, 8), the **less precisely** a computer stores the value. With less storage per value, you generally get **faster computation** and a **smaller overall model**. Mobile-based neural networks often operate with **8-bit integers**, which are smaller and faster to run but less accurate than their `float32` counterparts. For more on this, read up on [precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)).
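To make the storage point concrete, the sketch below converts the same nine values to several datatypes. For each one, it multiplies `element_size()` (bytes per element) by `numel()` (number of elements). The `total_bytes` dictionary exists only for this illustration.

```python
import torch

values = torch.arange(10., 100., 10.)  # 9 values, float32 by default

total_bytes = {}
for dtype in (torch.float64, torch.float32, torch.float16, torch.int8):
    t = values.type(dtype)
    # element_size() is the number of bytes used per element
    total_bytes[str(dtype)] = t.element_size() * t.numel()
    print(dtype, t.element_size(), "bytes each,", total_bytes[str(dtype)], "bytes total")
```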

> **Exercise:** So far we've covered a fair few tensor methods, but there are a bunch more in the [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) documentation. It's worth spending some time scrolling through the documentation and looking into any that catch your eye.

## **1.3. Tensor Attributes**

Once you've created tensors, you might want to get some information from them. We've seen these before, but three of the most common attributes you'll want to find out about tensors are:
* `shape` - what shape is the tensor? (some operations require specific shape rules)
* `dtype` - what datatype are the elements within the tensor stored in?
* `device` - what device is the tensor stored on? (usually GPU or CPU)


Both **attributes and functions are used to represent properties and behaviors of objects**, but they have distinct characteristics and purposes:

> **Attributes**: Attributes are variables that belong to an object or a class. They store information or state about the object or class. Attributes can be accessed and modified using **dot** notation (`object.attribute` or `class.attribute`). They are usually accessed directly **without parentheses**.

> **Functions**: They define a set of instructions to be executed when called or invoked. Functions can take **input** parameters (arguments) and may **return** a value. They are invoked by using parentheses (`object.method()` or `function()`).
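Applied to tensors, `shape` and `dtype` are attributes, whereas `size()` and `numel()` are methods. `size()` returns the same `torch.Size` as `shape`, but as a method it can also take an argument such as `dim`. A short sketch:

```python
import torch

some_tensor = torch.rand(3, 4)

# Attributes: accessed without parentheses
print(some_tensor.shape)        # torch.Size([3, 4])
print(some_tensor.dtype)        # torch.float32

# Methods: called with parentheses, optionally with arguments
print(some_tensor.size())       # torch.Size([3, 4]), same as .shape
print(some_tensor.size(dim=1))  # 4, the size of dimension 1 only
print(some_tensor.numel())      # 12, the total number of elements
```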

Let's create a random tensor and find out details about it.

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)
print(some_tensor)

tensor([[0.6748, 0.1052, 0.7866, 0.1365],
        [0.9178, 0.6855, 0.8798, 0.8456],
        [0.1475, 0.3835, 0.2620, 0.7551]])


In [None]:
# Find out details about it
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


> **Note:** When you run into issues in PyTorch, they're very often to do with one of the three attributes above:
> * "*What shape are my tensors? What datatype are they and where are they stored?*"

## **1.4. Indexing**

If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

In [None]:
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [None]:
# Check number of dimensions
x.ndim

3

In [None]:
# Check also the shape
x.shape

torch.Size([1, 3, 3])

In [None]:
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

Alright, `x.shape` outputs `torch.Size([1, 3, 3])`.
The dimensions go outer to inner.
That means the outermost dimension has size 1, and it holds a 3 by 3 matrix (which is what `x[0]` returned).

![example of different tensor dimensions](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-different-tensor-dimensions.png)

> **Note:** You might've noticed me using lowercase letters for `scalar` and `vector` and uppercase letters for `MATRIX` and `TENSOR`. This was on purpose. In practice, you'll often see scalars and vectors denoted as lowercase letters such as `y` or `a`. And matrices and tensors denoted as uppercase letters such as `X` or `W`.
>
> You also might notice the names matrix and tensor used interchangeably. This is common. In PyTorch you're often dealing with `torch.Tensor`s (hence the tensor name), but the shape and dimensions of what's inside dictate what it actually is.

Let's summarise.

| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | 0 | Lower (`a`) |
| **vector** | a set of numbers in a single dimension| 1 | Lower (`y`) |
| **matrix** | a 2-dimensional array of numbers | 2 | Upper (`Q`) |
| **tensor** | an n-dimensional array of numbers | n | Upper (`X`) |

![scalar vector matrix tensor and what they look like](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-scalar-vector-matrix-tensor.png)

Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [None]:
# Let's index bracket by bracket
print(f"The given tensor is: \n{x}")
print(f"First square bracket:\n{x[0]}") # same as x[0,:,:]
print(f"Second square bracket: {x[0][0]}") # same as x[0,0], x[0,0,:]
print(f"Third square bracket: {x[0][0][0]}") # same as x[0,0,0]
print(f"Third square bracket second item: {x[0][0][1]}") # same as x[0,0,1]

The given tensor is: 
tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])
First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1
Third square bracket second item: 2


You can also use `:` to specify "**all values in this dimension**" and then use a comma (`,`) to add another dimension.

In [None]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [None]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [None]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [None]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])

Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple times to get it right).
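One rule that helps: indexing with an integer removes that dimension from the result, while a slice (`:`) keeps it. Checking the shapes of the results from above makes this visible. The tensor is recreated so the cell runs on its own.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

# An integer index drops that dimension; a slice (:) keeps it
print(x[0].shape)        # torch.Size([3, 3])
print(x[:, 0].shape)     # torch.Size([1, 3])
print(x[0, 0].shape)     # torch.Size([3])
print(x[:, :, 1].shape)  # torch.Size([1, 3])
print(x[:, 1, 1].shape)  # torch.Size([1])
print(x[0, 0, 0].shape)  # torch.Size([]) -> a scalar
```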

In [None]:
# Slicing
x = torch.rand(5,3)
print(x)
print(x[:, 0]) # all rows, column 0
print(x[1, :]) # row 1, all columns
print(x[1,1]) # element at 1, 1

## **1.5. Random tensors**

A machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`

As data scientists, we can define how the machine learning model starts (**initialization**), looks at data (**representation**) and updates (**optimization**) its random numbers.

For now, let's see how to create a tensor of random numbers using [`torch.rand()`](https://pytorch.org/docs/stable/generated/torch.rand.html) and passing in the `size` parameter.

In [None]:
x = torch.rand(5,3) # random uniform in the range [0, 1)
print(x)
x = torch.randn(5,3) # random normal (mean 0, standard deviation 1)
print(x)
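The two functions draw from different distributions. `torch.rand()` samples uniformly from `[0, 1)`, while `torch.randn()` samples from a standard normal distribution (mean 0, standard deviation 1). Below is a rough check on 10,000 samples. The seed is fixed only so the numbers repeat, and the mean and standard deviation come out close to 0 and 1, not exactly equal to them.

```python
import torch

torch.manual_seed(0)  # fixed seed so the numbers are repeatable
uniform = torch.rand(10_000)
normal = torch.randn(10_000)

# torch.rand: every value lies in [0, 1)
print(uniform.min().item() >= 0, uniform.max().item() < 1)  # True True

# torch.randn: sample mean close to 0, sample standard deviation close to 1
print(normal.mean().item(), normal.std().item())
```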

In [None]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.5239, 0.8177, 0.3471, 0.6224],
         [0.5464, 0.7673, 0.1950, 0.1994],
         [0.2721, 0.9299, 0.0354, 0.6212]]),
 torch.float32)

The flexibility of `torch.rand()` is that we can adjust the `size` to be whatever we want. For example, say you wanted a random tensor in the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`).

In [None]:
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)
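This layout puts the colour channels last, unlike the `[colour_channels, height, width]` example at the start of the notebook. Both layouts are common. `torch.Tensor.permute()`, which is not otherwise covered in this section, reorders dimensions to convert between them. A sketch:

```python
import torch

# [height, width, colour_channels] layout, as above
hwc_image = torch.rand(size=(224, 224, 3))

# Reorder the dimensions to [colour_channels, height, width]
chw_image = hwc_image.permute(2, 0, 1)
print(chw_image.shape)  # torch.Size([3, 224, 224])

# The values are only rearranged, not changed
print(torch.equal(chw_image[0], hwc_image[:, :, 0]))  # True
```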

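**Uninitialized Tensors (`torch.empty(size)`)**

`torch.empty()` allocates memory for a tensor of the given size without initializing it, so the values are whatever happened to be in that memory already. Don't rely on them; your output will likely differ from the outputs below.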

In [None]:
x = torch.empty(1) # scalar
print(x)

tensor([4.6975e-35])


In [None]:
x = torch.empty(3) # vector, 1D
print(x)

tensor([4.6977e-35, 0.0000e+00, 1.5975e-43])


In [None]:
x = torch.empty(2,3) # matrix, 2D
print(x)

tensor([[4.6977e-35, 0.0000e+00, 1.5975e-43],
        [1.3873e-43, 1.4574e-43, 6.4460e-44]])


In [None]:
x = torch.empty(2,2,3) # tensor, 3 dimensions
#x = torch.empty(2,2,2,3) # tensor, 4 dimensions
print(x)

tensor([[[4.6976e-35, 0.0000e+00, 2.3694e-38],
         [2.3694e-38, 2.3694e-38, 2.3694e-38]],

        [[3.7293e-08, 1.4838e-41, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00]]])


## **1.6. Reproducibility**

Neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things) to better describe patterns in data.

In short:

**start with random numbers -> tensor operations -> try to make better (again and again and again)**

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.
Why?
So that you can get the same (or very similar) results each time you run the same code, which makes your experiments repeatable. That's where **reproducibility** comes in.

Let's see a brief example of reproducibility in PyTorch. We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [None]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])

Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

Just as you might've expected, the tensors come out with different values.
But what if you wanted to create two random tensors with the *same* values?
As in, the tensors would still contain random values but they would be of the same flavour.
That's where [`torch.manual_seed(seed)`](https://pytorch.org/docs/stable/generated/torch.manual_seed.html) comes in, where `seed` is an integer (like `42` but it could be anything) that flavours the randomness.
Let's try it out by creating some more *flavoured* random tensors.

In [None]:
import torch

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Have to reset the seed every time a new rand() is called
# Without this, tensor_D would be different to tensor_C
torch.random.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

It looks like setting the seed worked.
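A seed doesn't fix a single value. It fixes the starting point of the whole sequence of random numbers, which is why the seed was reset before creating `random_tensor_D`. The hypothetical helper `draw_two()` below sets the seed once and then draws twice. It shows that the two draws differ within a run, but the whole sequence repeats across runs.

```python
import torch

def draw_two(seed):
    """Hypothetical helper: set the seed once, then draw two random tensors."""
    torch.manual_seed(seed)
    first = torch.rand(2, 2)
    second = torch.rand(2, 2)
    return first, second

run_1 = draw_two(42)
run_2 = draw_two(42)

# Within a run, the two draws differ...
print(torch.equal(run_1[0], run_1[1]))  # False
# ...but the whole sequence repeats when the same seed is set again
print(torch.equal(run_1[0], run_2[0]), torch.equal(run_1[1], run_2[1]))  # True True
```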

Check out more on reproducibility in the PyTorch [documentation](https://pytorch.org/docs/stable/notes/randomness.html).

## **1.7. Zeros and ones**

Sometimes you'll just want to fill tensors with zeros or ones.
This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them). Let's create a tensor full of zeros with [`torch.zeros()`](https://pytorch.org/docs/stable/generated/torch.zeros.html). Again, the `size` parameter comes into play.

In [None]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [None]:
random_tensor * zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
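Multiplying by a tensor of zeros wipes out every value. Masking, mentioned above, uses the same element-wise multiplication, but with a mix of ones and zeros so that only selected positions are zeroed out. Here is a minimal sketch with a made-up mask; in practice, masks are usually computed from the data.

```python
import torch

torch.manual_seed(42)
values = torch.rand(3, 4)

# A made-up mask: 1 keeps a value, 0 hides it
mask = torch.tensor([[1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=torch.float32)

masked = values * mask  # element-wise multiplication
print(masked)
```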

We can do the same to create a tensor of all ones, except using [`torch.ones()`](https://pytorch.org/docs/stable/generated/torch.ones.html) instead.

In [None]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

In [None]:
x = torch.zeros(2, 3) # fill with 0
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [None]:
x = torch.ones(2, 3) # fill with 1
print(x)

tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [None]:
x = torch.ones(2, 3)
print(x.dtype) # check data type

torch.float32


In [None]:
x = torch.ones(2, 3, dtype=torch.int)
print(x.dtype)

torch.int32


In [None]:
x = torch.ones(2, 3, dtype=torch.double)
print(x.dtype)

torch.float64


In [None]:
x = torch.ones(2, 3, dtype=torch.float16)
print(x.dtype)

torch.float16


In [None]:
x = torch.ones(2, 3, dtype=torch.float16)
print(x.size())

torch.Size([2, 3])


## **1.8. Creating a range and tensors like**

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100. You can use `torch.arange(start, end, step)` to do so, where:
* `start` = start of range (e.g. 0)
* `end` = end of range, which is not included in the result (e.g. 10)
* `step` = the difference between consecutive values (e.g. 1)

> **Note:** In Python, you can use `range()` to create a range. However, in PyTorch, `torch.range()` is deprecated and may show an error in the future. Unlike `range()` and `torch.arange()`, it also includes the end value.

In [None]:
# Use torch.arange(), torch.range() is deprecated
zero_to_ten_deprecated = torch.range(0, 10) # Note: this may return an error in the future

# Create a range of values 0 to 10
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten


tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
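A few more `torch.arange()` calls illustrate the `step` parameter, the excluded `end` value and the float case:

```python
import torch

# end is excluded, just like Python's range()
evens = torch.arange(0, 10, 2)
print(evens)                  # tensor([0, 2, 4, 6, 8])
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]

# A float start, end or step gives a float32 tensor
quarters = torch.arange(0.0, 1.0, 0.25)
print(quarters)               # tensor([0.0000, 0.2500, 0.5000, 0.7500])

# With one argument, start defaults to 0 and step to 1
first_five = torch.arange(5)
print(first_five)             # tensor([0, 1, 2, 3, 4])
```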

Sometimes you might want one tensor of a certain type with the same shape as another tensor. For example, a tensor of all zeros with the same shape as a previous tensor. To do so, you can use [`torch.zeros_like(input)`](https://pytorch.org/docs/stable/generated/torch.zeros_like.html) or [`torch.ones_like(input)`](https://pytorch.org/docs/1.9.1/generated/torch.ones_like.html), which return a tensor filled with zeros or ones, respectively, with the same shape as `input`.

In [None]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
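The `*_like` functions copy the `dtype` of `input` as well as its shape, so `zero_to_ten` (an integer tensor) produces integer results. The sketch below also uses `torch.full_like()`, which fills with any value you choose, and shows how to override the `dtype`.

```python
import torch

zero_to_ten = torch.arange(start=0, end=10, step=1)  # int64 tensor

ten_ones = torch.ones_like(zero_to_ten)
ten_sevens = torch.full_like(zero_to_ten, 7)

# Shape and dtype are both copied from the input
print(ten_ones, ten_ones.dtype)  # ... torch.int64
print(ten_sevens)                # tensor([7, 7, 7, 7, 7, 7, 7, 7, 7, 7])

# A different dtype can be requested explicitly
float_zeros = torch.zeros_like(zero_to_ten, dtype=torch.float32)
print(float_zeros.dtype)         # torch.float32
```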

**Iterable vs Iterator**


> **Note:** In Python, an iterable and an iterator are related concepts but have distinct roles:
> * An **iterable** is any object that can be looped over (iterated). It is the more general concept. Examples of iterables include **lists**, **tuples**, **strings**, **dictionaries** and **sets**.
> * An **iterator** is an object that represents a stream of data and provides a way to access its elements sequentially. It implements two essential methods: `__iter__()` returns the iterator object itself, and `__next__()` returns the next element from the stream. The built-in functions `iter()` and `next()` call these methods. When there are no more elements, `__next__()` raises the `StopIteration` exception. Iterators maintain state and remember their position in the iteration. They are used behind the scenes by `for` loops, or explicitly when you want to step through a sequence of items one by one.

In [None]:
my_list = [1, 2, 3]
print(f"list: {my_list}")
print(f"Iterable:")
for item in my_list:
    print(item)

In [None]:
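iter_obj = iter(my_list) # get an iterator from the iterable
print(f"iterator: {iter_obj}")
print(next(iter_obj))
print(next(iter_obj))
print(next(iter_obj))
# The iterator is now exhausted, so another next() raises StopIteration
try:
    print(next(iter_obj))
except StopIteration:
    print("StopIteration: no more elements")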
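To see the iterator protocol from the note above in one place, here is a minimal, hypothetical `CountUpTo` class that implements `__iter__()` and `__next__()` itself. It is a sketch for illustration, not something PyTorch provides.

```python
class CountUpTo:
    """Hypothetical iterator that yields 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0  # the iterator remembers its position

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # signals the end of the stream
        self.current += 1
        return self.current


counter = CountUpTo(3)
print(list(counter))  # [1, 2, 3]
print(list(counter))  # [] (the iterator is now exhausted)
```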

# **2. PyTorch Tensors and NumPy Arrays**

Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.
The two main methods you'll want to use for NumPy to PyTorch (and back again) are:
* [`torch.from_numpy(ndarray)`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html) - NumPy array -> PyTorch tensor.
* [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) - PyTorch tensor -> NumPy array.

Let's try them out.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

> **Note:** NumPy creates floating point arrays with the datatype `float64` by default, and if you convert one to a PyTorch tensor, it keeps the same datatype (as above). However, many PyTorch calculations default to using `float32`.
> So if you want to convert your **NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32)**, you can use:
> * `tensor = torch.from_numpy(array).type(torch.float32)`

In [None]:
array.dtype, tensor.dtype

(dtype('float64'), torch.float64)

But `torch.float64` is **not** PyTorch's default datatype; the default is `torch.float32`. For example:

In [None]:
torch.arange(1.0,8.0).dtype

torch.float32

Because `array + 1` creates a *new* array and reassigns the name `array` to it, the tensor (which still shares memory with the *original* array) stays the same.

In [None]:
# Change the array, keep the tensor
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

And if you want to go from PyTorch tensor to NumPy array, you can call `tensor.numpy()`.

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

The same applies in this direction: `tensor + 1` creates a *new* tensor and reassigns the name `tensor`, so `numpy_tensor` (which shares memory with the original tensor) stays the same.

In [None]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [None]:
# Numpy
# Converting a Torch Tensor to a NumPy array and vice versa is very easy
a = torch.ones(5)
print(a)
# torch to numpy with .numpy()
b = a.numpy()
print(b)
print(type(b))

> **Careful:** If the tensor is on the CPU (not the GPU), both objects share the same memory location, so **changing one in place will also change the other**.

In [None]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


In [None]:
# numpy to torch with .from_numpy(x)
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
print(a)
print(b)

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)


In [None]:
# again be careful when modifying
a += 1
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
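If you want a tensor and an array that *don't* share memory, make a copy. `torch.tensor()` always copies its input data, and `torch.Tensor.clone()` copies a tensor before you convert it with `.numpy()`. A sketch:

```python
import numpy as np
import torch

array = np.ones(3)

shared = torch.from_numpy(array)  # shares memory with array
copied = torch.tensor(array)      # torch.tensor() always copies the data

array += 1  # in-place change to the NumPy array

print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)

# Going the other way: clone the tensor before converting it
t = torch.ones(3)
independent = t.clone().numpy()
t.add_(1)
print(independent)  # [1. 1. 1.]
```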


# **3. Manipulating tensors (tensor operations)**

In deep learning, data (**images**, **text**, **video**, **audio**, **protein structures**, etc) gets represented as tensors. A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a **representation of the patterns** in the input data.


To perform operations on tensors in PyTorch, there are a few key points to consider:

> **Compatible shapes**: For element-wise operations (e.g., addition, subtraction, multiplication), the tensors involved in the operation must have the **same shape** or be broadcastable to the **same shape**.

> **Data type compatibility**: Tensors involved in operations should have **compatible data types**. Mixing incompatible data types in operations can lead to **errors** or **unexpected behavior**.

> **Device compatibility**: Tensors must reside on the **same device** (either CPU or GPU) in order to perform operations together. PyTorch provides mechanisms to move tensors between devices using methods like `.to()` or `.cuda()`.
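Broadcasting is what lets tensors with different shapes still work together. Comparing shapes from the last dimension backwards, each pair of sizes must be equal, or one of them must be `1` or missing. Here is a small sketch of a broadcastable pair and a non-broadcastable one; the error message may vary by version.

```python
import torch

column = torch.tensor([[1], [2], [3]])  # shape (3, 1)
row = torch.tensor([[10, 20, 30, 40]])  # shape (1, 4)

# Dimensions of size 1 are stretched to match, giving a (3, 4) result
result = column + row
print(result.shape)  # torch.Size([3, 4])
print(result)

# Shapes (3,) and (4,) are not broadcastable
try:
    torch.ones(3) + torch.ones(4)
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```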

## **3.1. Basic operations**

Let's start with a few of the fundamental operations, addition (`+`), subtraction (`-`), multiplication (`*`). They work just as you think they would.

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because the *values inside the tensor don't change unless they're reassigned*.

In [None]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

Let's subtract a number and this time we'll reassign the `tensor` variable.

In [None]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [None]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])

PyTorch also has a bunch of built-in functions like [`torch.mul()`](https://pytorch.org/docs/stable/generated/torch.mul.html#torch.mul) (short for multiplication) and [`torch.add()`](https://pytorch.org/docs/stable/generated/torch.add.html) to perform basic operations.

In [None]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)

In [None]:
z = x + y # elementwise addition
print(z)
z = torch.add(x, y)
print(z)

In [None]:
# subtraction
z = x - y
print(z)
z = torch.sub(x, y)
print(z)

In [None]:
# multiplication
z = x * y
print(z)
z = torch.mul(x, y)
print(z)

In [None]:
# division
z = x / y
print(z)
z = torch.div(x, y)
print(z)

In [None]:
# Can also use torch functions
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [None]:
# Original tensor is still unchanged
tensor

tensor([1, 2, 3])

However, it's more common to use operator symbols like `*` instead of `torch.mul()`.

## **3.2. In-Place Operations**

PyTorch supports in-place operations that **modify the tensor in-place**, indicated by a trailing **underscore** (e.g., `x.add_(y)`).

In [None]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)

**Note**:

> In-place operations have some restrictions and may **not** be allowed in certain situations, such as when the tensor is involved in **autograd (automatic differentiation)** or when the tensor is **part of a computational graph**.
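The cell above only creates `x` and `y`, so here is a small sketch of what the trailing underscore changes. It contrasts `add()` (returns a new tensor) with `add_()` (modifies the tensor itself), and then shows the autograd restriction mentioned in the note: an in-place update of a leaf tensor that requires gradients raises a `RuntimeError`. The tensor values are arbitrary examples.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([10.0, 20.0, 30.0])

# Out-of-place: returns a new tensor, x is unchanged
z = x.add(y)
print(x)  # tensor([1., 2., 3.])

# In-place: the trailing underscore modifies x directly
x.add_(y)
print(x)  # tensor([11., 22., 33.])

# A leaf tensor that requires grad cannot be modified in place
w = torch.ones(3, requires_grad=True)
try:
    w.add_(1)
    inplace_error = False
except RuntimeError as err:
    inplace_error = True
    print("RuntimeError:", err)
```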

## **3.3. Matrix Multiplication**
PyTorch implements matrix multiplication functionality in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) method. The two main rules to remember for matrix multiplication are:
1. The **inner dimensions must match**:
  * `(3, 2) @ (3, 2)` won't work
  * `(2, 3) @ (3, 2)` will work
  * `(3, 2) @ (2, 3)` will work
2. The **resulting matrix has the shape of the outer dimensions**:
  * `(2, 3) @ (3, 2)` -> `(2, 2)`
  * `(3, 2) @ (2, 3)` -> `(3, 3)`

> **Note:** "`@`" in Python is the symbol for matrix multiplication. You can find the full rules for matrix multiplication (including the 1D and batched cases) in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) documentation.
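The two rules can be checked directly against PyTorch. This sketch uses a hypothetical helper, `matmul_output_shape()`, that applies the rules by hand for 2D shapes only, and compares its answer with the shape `torch.matmul()` actually returns for random tensors.

```python
import torch

def matmul_output_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two 2D matmul rules by hand."""
    (rows_a, inner_a), (inner_b, cols_b) = shape_a, shape_b
    if inner_a != inner_b:  # rule 1: inner dimensions must match
        raise ValueError(f"Inner dimensions differ: {inner_a} vs {inner_b}")
    return (rows_a, cols_b)  # rule 2: result has the outer dimensions

# Compare the rule with what torch.matmul actually returns
for shape_a, shape_b in [((2, 3), (3, 2)), ((3, 2), (2, 3))]:
    expected = matmul_output_shape(shape_a, shape_b)
    actual = torch.matmul(torch.rand(shape_a), torch.rand(shape_b)).shape
    print(shape_a, "@", shape_b, "->", tuple(actual), "| rule says", expected)

# (3, 2) @ (3, 2) breaks rule 1, so PyTorch raises an error
try:
    torch.matmul(torch.rand(3, 2), torch.rand(3, 2))
except RuntimeError:
    print("(3, 2) @ (3, 2) -> RuntimeError")
```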

In [None]:
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [None]:
# Matrix multiplication (for two 1D tensors this is the dot product)
torch.matmul(tensor, tensor)

tensor(14)

In [None]:
# Can also use the "@" operator, which calls torch.matmul() under the hood
tensor @ tensor

tensor(14)

In [None]:
# Shapes need to be in the right way
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.float32)

B = torch.tensor([[7, 10],
                  [8, 11],
                  [9, 12]], dtype=torch.float32)

torch.matmul(A, B)

tensor([[ 50.,  68.],
        [122., 167.]])

You can do matrix multiplication by hand, but it's not recommended: the built-in `torch.matmul()` method is much faster.
The `%%time` magic command measures and displays the execution time of a code cell.

In [None]:
tensor = torch.tensor([1, 2, 3])

In [None]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 392 µs, sys: 0 ns, total: 392 µs
Wall time: 401 µs


tensor(14)

In [None]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 150 µs, sys: 19 µs, total: 169 µs
Wall time: 174 µs


tensor(14)

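Results from a separate run (exact timings vary between runs and machines):

| Method | CPU user | CPU sys | CPU total | Wall time | Result |
| ----- | ----- | ----- | ----- | ----- | ----- |
| Manual (`for` loop) | 762 µs | 1.03 ms | 1.8 ms | **1.67 ms** (milliseconds) | `tensor(14)` |
| Built-in `torch.matmul()` | 103 µs | 0 ns | 103 µs | **107 µs** (microseconds) | `tensor(14)` |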
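As a sketch of why the loop is unnecessary, the same dot product can be computed several ways, all giving `tensor(14)`. The `timeit` module repeats each call many times, which usually gives steadier numbers than a single `%%time` run; the printed timings are not asserted here because they depend on the machine. The helper name `dot_by_hand` is our own.

```python
import timeit
import torch

tensor = torch.tensor([1, 2, 3])

def dot_by_hand(t):
    # Python loop: one small tensor operation per element (slow)
    value = 0
    for i in range(len(t)):
        value += t[i] * t[i]
    return value

loop_result = dot_by_hand(tensor)
matmul_result = torch.matmul(tensor, tensor)
sum_result = (tensor * tensor).sum()    # element-wise multiply, then add up
dot_result = torch.dot(tensor, tensor)  # dedicated 1D dot product

print(loop_result, matmul_result, sum_result, dot_result)  # all tensor(14)

# Average time per call over n repetitions
n = 1000
print("loop:  ", timeit.timeit(lambda: dot_by_hand(tensor), number=n) / n, "s per call")
print("matmul:", timeit.timeit(lambda: torch.matmul(tensor, tensor), number=n) / n, "s per call")
```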


## **3.4. Element-Wise Multiplication**

In [None]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
tensor = torch.tensor([1, 2, 3])
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


In [None]:
X = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.float32)
Y = torch.tensor([[7, 8, 9],
                  [10, 11, 12]], dtype=torch.float32)
X * Y

tensor([[ 7., 16., 27.],
        [40., 55., 72.]])

The difference between **element-wise** multiplication and **matrix** multiplication is the **addition of values**: matrix multiplication multiplies matching elements *and then sums the products*. For our `tensor` variable with values `[1, 2, 3]`:

| Operation | Calculation | Code |
| ----- | ----- | ----- |
| **Element-wise multiplication** | `[1*1, 2*2, 3*3]` = `[1, 4, 9]` | `tensor * tensor` |
| **Matrix multiplication** | `[1*1 + 2*2 + 3*3]` = `[14]` | `tensor.matmul(tensor)` |


In [None]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

## **3.5. Matrix Transpose**

One of the most common errors in deep learning is a **shape error** (**shape mismatch**).

In [None]:
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: ignored

We can make matrix multiplication work between `tensor_A` and `tensor_B` by making their inner dimensions match. One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor). You can perform transposes in PyTorch using either:
* `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
* `tensor.T` - where `tensor` is the desired tensor to transpose.

Let's try the latter.

In [None]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [None]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [None]:
# The operation works when one of the two tensors is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])
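The first option from the list above, `torch.transpose(input, dim0, dim1)`, gives the same result as `.T` for a 2D tensor. A short sketch:

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

transposed_fn = torch.transpose(tensor_B, 0, 1)  # swap dim 0 and dim 1
print(transposed_fn.shape)                       # torch.Size([2, 3])
print(torch.equal(transposed_fn, tensor_B.T))    # True
```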


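You can also use [`torch.mm()`](https://pytorch.org/docs/stable/generated/torch.mm.html), which performs matrix multiplication of two 2D tensors only. Unlike `torch.matmul()`, it does not broadcast and does not accept 1D or batched inputs.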

In [None]:
# torch.mm multiplies two 2D matrices (same result as torch.matmul here)
torch.mm(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

**Note:** Each element of this result is the **dot product** of a row of `tensor_A` with a column of `tensor_B.T`, which is why matrix multiplication is often described in terms of dot products.
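To make the `torch.mm()` vs `torch.matmul()` difference concrete, here is a sketch with random tensors (the shapes are arbitrary examples): both agree on 2D inputs, but only `torch.matmul()` broadcasts over a leading batch dimension.

```python
import torch

A = torch.rand(3, 2)
B = torch.rand(2, 4)

# For two 2D tensors, torch.mm and torch.matmul agree
print(torch.allclose(torch.mm(A, B), torch.matmul(A, B)))  # True

# torch.matmul broadcasts over a leading batch dimension ...
batch = torch.rand(10, 3, 2)
batched_out = torch.matmul(batch, B)
print(batched_out.shape)  # torch.Size([10, 3, 4])

# ... whereas torch.mm only accepts 2D tensors
try:
    torch.mm(batch, B)
    mm_error = False
except RuntimeError:
    mm_error = True
print("torch.mm raised RuntimeError:", mm_error)
```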



**Mismatched Data Types**

In [None]:
a = torch.tensor([1, 2, 3])  # integer tensor
b = torch.tensor([1.0, 2.0, 3.0])  # float tensor

c = torch.matmul(a, b)  # Error: Mismatched data types

RuntimeError: ignored
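One way to fix the error is to cast the integer tensor to the float tensor's dtype before multiplying, using `.to()` or the shorthand `.float()`. The sketch also shows that a plain element-wise `*` does not fail here, because element-wise operations promote the result to float.

```python
import torch

a = torch.tensor([1, 2, 3])        # torch.int64
b = torch.tensor([1.0, 2.0, 3.0])  # torch.float32

# Option 1: cast the integer tensor to b's dtype
c = torch.matmul(a.to(b.dtype), b)
print(c, c.dtype)  # tensor(14.) torch.float32

# Option 2: the shorthand .float() (same as .to(torch.float32))
c2 = torch.matmul(a.float(), b)
print(c2)

# Element-wise multiplication promotes the dtype instead of failing
print((a * b).dtype)  # torch.float32
```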

## **3.6. Linear Transformation in NN**

**Neural networks** are full of matrix multiplications and dot products. The [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a **feed-forward** layer or **fully connected** layer, implements a matrix multiplication between an input `x` and a weights matrix `W`, followed by the addition of a bias `b`:

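$$
\mathbf{y} = \mathbf{x} W^{T} + \mathbf{b}
$$

For a batch of $n$ inputs with $J$ = `in_features` and $m$ = `out_features`, the shapes are:

$$
Y_{n\times m} = X_{n\times J}\, W_{m\times J}^{T} + \mathbf{b}_{1\times m}
$$

and, row by row (the same bias is added to every row):

$$
\mathbf{y}_i = \mathbf{x}_i W^{T} + \mathbf{b}, \quad i = 1, 2, \dots, n
$$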

Where:
* `x` is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
* `W` is the weights matrix created by the layer, with shape `(out_features, in_features)`. It starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "`T`": the weights matrix gets transposed so that its inner dimension matches the input).
* `b` is the bias term, a learnable offset added to the result of the matrix multiplication.
* `y` is the output (a manipulation of the input in the hope of discovering patterns in it).

Let's play around with a linear layer. Try changing the values of `in_features` and `out_features` below and see what happens. Do you notice anything to do with the shapes?

In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=1) # out_features = size of the output's last dimension
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)
print(f"Input shape: {x.shape}\n")
output = linear(x)
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[1.5488],
        [3.8038],
        [6.0588]], grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 1])


In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=2) # out_features = size of the output's last dimension
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)
print(f"Input shape: {x.shape}\n")
output = linear(x)
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[1.5595, 1.2761],
        [3.8145, 2.2439],
        [6.0695, 3.2117]], grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 2])


In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=6) # out_features = size of the output's last dimension
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)
print(f"Input shape: {x.shape}\n")
output = linear(x)
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])


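For the last example ($n = 3$ rows, `in_features` $= 2$, `out_features` $= 6$), the shapes work out as:

$$
Y_{3\times 6} = X_{3\times 2}\, W_{6\times 2}^{T} + \mathbf{b}_{1\times 6}
$$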
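We can check that the layer really computes `x @ W.T + b` by recomputing its output by hand from the layer's own `weight` and `bias` attributes. This sketch reuses the seed and input from the last cell. `allclose` is used rather than exact equality because the layer's fused kernel may round slightly differently.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)

print(linear.weight.shape)  # torch.Size([6, 2]) -> (out_features, in_features)
print(linear.bias.shape)    # torch.Size([6])

# Recompute the layer's output by hand: x @ W^T + b (b is broadcast over the rows)
manual = x @ linear.weight.T + linear.bias
layer_output = linear(x)
print(torch.allclose(manual, layer_output))  # True
```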

> **Question:** What happens if you change `in_features` from 2 to 3 above? Does it error? How could you change the shape of the input (`x`) to fix the error? Hint: what did we have to do to `tensor_B` above?

## **3.7. Tensor Aggregation**

Let's create a tensor and find its **max**, **min**, **mean** and **sum**. Note that some methods, such as `tensor.mean()`, require a floating point datatype such as `torch.float32` (the most common); calling them on an integer tensor raises an error.




In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450




You can also do the same as above with `torch` methods.

In [None]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
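Aggregations can also be applied along a single dimension with the `dim` argument, and `keepdim=True` keeps the reduced dimension with size 1. A sketch on a small 2D example tensor:

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

print(m.sum())                           # tensor(21.) - all elements
print(m.sum(dim=0))                      # tensor([5., 7., 9.]) - down each column
print(m.mean(dim=1))                     # tensor([2., 5.]) - across each row
print(m.max(dim=1))                      # values and indices of each row's max
print(m.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1])
```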

## **3.8. Positional Min/Max**

You can also find the location index of a tensor **where the max or minimum occurs** with [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html) and [`torch.argmin()`](https://pytorch.org/docs/stable/generated/torch.argmin.html) respectively.
This is helpful in case we just want the **position where the highest (or lowest) value is** and **not the actual value itself** (we'll see this in a later section when using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)).

In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0


In [None]:
tensor[0], tensor[8]

(tensor(10), tensor(90))
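A preview of the softmax use case, as a sketch: the `logits` tensor below is a made-up stand-in for a model's raw outputs (2 samples, 3 classes). `argmax(dim=1)` picks the predicted class per row. Because softmax preserves the ordering within each row, the argmax is the same before and after it.

```python
import torch

# Hypothetical raw model outputs (logits) for 2 samples and 3 classes
logits = torch.tensor([[1.0, 3.0, 0.5],
                       [2.0, 0.1, 0.2]])

probs = torch.softmax(logits, dim=1)  # each row now sums to 1
preds = probs.argmax(dim=1)           # index of the most likely class per row
print(probs)
print(preds)  # tensor([1, 0])
```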

## **3.9. Reshaping, stacking, squeezing and unsqueezing**

Often you'll want to reshape or change the dimensions of your tensors without changing the values inside them. Some popular methods for doing so are:

> **Reshaping**: Reshaping refers to **changing the shape of a tensor** while maintaining the same number of elements. In PyTorch, you can reshape a tensor with `reshape()` or `view()`, specifying the desired shape. For example, you can reshape a tensor of shape (2, 3) to (3, 2) using `tensor.view(3, 2)`.

> **Stacking**: Stacking involves **combining multiple tensors** along a new dimension. PyTorch provides the `torch.stack()` function, which stacks tensors along a specified dimension. For example, if you have two tensors of shape (2, 3), you can stack them along the 0th dimension using `torch.stack([tensor1, tensor2], dim=0)` to get a resulting tensor of shape (2, 2, 3). There are also the additional functions `torch.vstack()`, which joins tensors on top of each other, and `torch.hstack()`, which joins them side by side.

> **Squeezing**: Squeezing is the operation of **removing dimensions with size 1** from a tensor. It reduces the rank of the tensor. The `torch.squeeze()` function is used to remove dimensions of size 1. For example, if you have a tensor of shape (1, 3, 1, 5), you can remove the dimensions of size 1 using `torch.squeeze(tensor)` to get a resulting tensor of shape (3, 5).

> **Unsqueezing**: Unsqueezing is the opposite operation of squeezing. It **adds dimensions with size 1** to a tensor. The `torch.unsqueeze()` function is used to add dimensions of size 1 at a specified position. For example, if you have a tensor of shape (3, 5), you can unsqueeze it along dimension 0 using `torch.unsqueeze(tensor, dim=0)` to get a resulting tensor of shape (1, 3, 5).

| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`torch.Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`), all `tensors` must be same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions with size `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension value of `1` added at `dim`. |
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. |

Why do any of these?

Because deep learning models (neural networks) are all about **manipulating tensors** in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

In [None]:
# Create a tensor
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

Now let's add an extra dimension with `torch.reshape()`.

In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1, 9)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [None]:
x_reshaped = x.reshape(9, 1)  # 9 x 1 = 9 elements, same as the original
x_reshaped, x_reshaped.shape

(tensor([[1.],
         [2.],
         [3.],
         [4.],
         [5.],
         [6.],
         [7.],
         [8.],
         [9.]]),
 torch.Size([9, 1]))

In [None]:
x_reshaped = x.reshape(3, 3)  # 3 x 3 = 9 elements, same as the original
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]),
 torch.Size([3, 3]))

We can also change the view with `torch.Tensor.view()`.

In [None]:
# Change view (keeps same data as original but changes view)
z = x.view(1, 9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

Remember though, changing the view of a tensor with `view()` really only creates a new view of the *same* tensor. So changing the values in the view changes the original tensor too, because the view shares the same memory as the original.

In [None]:
# Changing z changes x
z[:, 0] = 5 # first element is changed to 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.]))

In [None]:
# Reshape with view() (new names, so the x used in the cells below is not overwritten)
a = torch.randn(4, 4)
print(a)
b = a.view(16)  # 1D tensor: 16 = 4 * 4, the number of elements must stay the same
print(b)
c = a.view(-1, 8)  # -1 lets PyTorch infer this dimension (16 / 8 = 2)
print(c)
print(a.size(), b.size(), c.size())

If we wanted to stack our tensor `x` on top of itself four times, we could do so with `torch.stack()`.

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # stacked vertically
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=1) # stacked horizontally
x_stacked

tensor([[5., 5., 5., 5.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.],
        [4., 4., 4., 4.],
        [5., 5., 5., 5.],
        [6., 6., 6., 6.],
        [7., 7., 7., 7.],
        [8., 8., 8., 8.],
        [9., 9., 9., 9.]])
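`torch.stack()` always creates a new dimension, while `torch.cat()` joins tensors along an existing one; `torch.vstack()` and `torch.hstack()` are convenience wrappers for the vertical and horizontal cases. A sketch comparing the resulting shapes on two small example tensors:

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])

stacked = torch.stack([a, b], dim=0)  # new dim -> shape (2, 3)
catted = torch.cat([a, b], dim=0)     # existing dim -> shape (6,)
v = torch.vstack([a, b])              # rows on top of each other -> (2, 3)
h = torch.hstack([a, b])              # 1D tensors joined end to end -> (6,)

for name, t in [("stack", stacked), ("cat", catted), ("vstack", v), ("hstack", h)]:
    print(f"{name:7s}{tuple(t.shape)}")
```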

How about removing all single dimensions from a tensor?
To do so you can use `torch.squeeze()` (I remember this as **squeezing** the tensor to only have dimensions over 1).

In [None]:
y = torch.arange(1., 10.)
y_reshaped = y.reshape(1, 9)
print(f"Previous tensor: {y_reshaped}")
print(f"Previous shape: {y_reshaped.shape}")

# Remove extra dimension from y_reshaped
y_squeezed = y_reshaped.squeeze() # one square bracket is removed
print(f"\nNew tensor: {y_squeezed}")
print(f"New shape: {y_squeezed.shape}")

Previous tensor: tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]])
Previous shape: torch.Size([1, 9])

New tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
New shape: torch.Size([9])


And to do the reverse of `torch.squeeze()`, you can use `torch.unsqueeze()` to add a dimension value of 1 at a specific index.

In [None]:
print(f"Previous tensor: {y_squeezed}")
print(f"Previous shape: {y_squeezed.shape}")

## Add an extra dimension with unsqueeze
y_unsqueezed = y_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {y_unsqueezed}")
print(f"New shape: {y_unsqueezed.shape}")

Previous tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
Previous shape: torch.Size([9])

New tensor: tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]])
New shape: torch.Size([1, 9])


In [None]:
print(f"Previous tensor: {y_squeezed}")
print(f"Previous shape: {y_squeezed.shape}")

## Add an extra dimension with unsqueeze
y_unsqueezed = y_squeezed.unsqueeze(dim=1) # dim = 1
print(f"\nNew tensor: {y_unsqueezed}")
print(f"New shape: {y_unsqueezed.shape}")

Previous tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
Previous shape: torch.Size([9])

New tensor: tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])
New shape: torch.Size([9, 1])


In [None]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# squeeze() has no effect here: x_reshaped has no dimensions of size 1
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
Previous shape: torch.Size([3, 3])

New tensor: tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
New shape: torch.Size([3, 3])


The `unsqueeze` method in PyTorch does **not have a default dimension**. It requires you to explicitly specify the dimension along which you want to add a singleton dimension (dimension of size 1).

In [None]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Add an extra dimension to x_reshaped
x_unsqueezed = x_reshaped.unsqueeze(dim=0) # dim = 0
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
Previous shape: torch.Size([3, 3])

New tensor: tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]])
New shape: torch.Size([1, 3, 3])


In [None]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Add an extra dimension to x_reshaped
x_unsqueezed = x_reshaped.unsqueeze(dim=1) # dim = 1
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
Previous shape: torch.Size([3, 3])

New tensor: tensor([[[1., 2., 3.]],

        [[4., 5., 6.]],

        [[7., 8., 9.]]])
New shape: torch.Size([3, 1, 3])
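`squeeze()` also accepts a `dim` argument, in which case it only removes that dimension (and does nothing if its size is not 1), and `unsqueeze()` accepts negative dimensions counted from the end. A sketch using the (1, 3, 1, 5) shape from the description earlier:

```python
import torch

t = torch.zeros(1, 3, 1, 5)

print(t.squeeze().shape)       # torch.Size([3, 5]) - every size-1 dim removed
print(t.squeeze(dim=0).shape)  # torch.Size([3, 1, 5]) - only dim 0 removed
print(t.squeeze(dim=1).shape)  # torch.Size([1, 3, 1, 5]) - dim 1 has size 3, no change

u = torch.zeros(3, 5)
print(u.unsqueeze(dim=-1).shape)  # torch.Size([3, 5, 1]) - new last dimension
```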


You can also rearrange the order of axes values with `torch.permute(input, dims)`, where the `input` gets turned into a *view* with new `dims`.

In [None]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3)) # [height, width, color_channels]

# Permute the original tensor to rearrange the axis (or dim) order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0
                                         # [color_channels, height, width]

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


> **Note**: Because permuting returns a **view** (**shares the same data as the original**), the values in the permuted tensor will be the same as the original tensor and if you change the values in the view, it will change the values of the original.
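A sketch of that shared-memory behaviour: writing through the permuted view changes the original tensor. The permuted tensor is also no longer contiguous in memory; `view()` can fail on such tensors, in which case `reshape()` or calling `.contiguous()` first is the usual workaround.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Both index the same element: x_permuted[c, h, w] == x_original[h, w, c]
print(x_original[0, 0, 0], x_permuted[0, 0, 0])

# Writing through the view changes the original tensor
x_permuted[0, 0, 0] = 123.0
print(x_original[0, 0, 0])  # tensor(123.)

# Same underlying memory, but a non-contiguous layout
print(x_permuted.data_ptr() == x_original.data_ptr())  # True
print(x_permuted.is_contiguous())                      # False
```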

# **4. Gradient Computation**

PyTorch tensors have a `requires_grad` attribute that enables **automatic differentiation** and **gradient computation** during backpropagation. It determines whether gradients need to be computed for a particular tensor as part of the **computational graph**.
When you create a tensor in PyTorch, by default, the `requires_grad` property is set to `False`. This means that the tensor does **not track operations** on it and does **not participate in gradient computation**. However, if you want to compute gradients with respect to that tensor, you can set `requires_grad` to `True`.

Below are examples to illustrate the concept:

In [None]:
x = torch.rand(3, requires_grad=True)
print(x)
y = x + 2
print(y)

tensor([0.9979, 0.0458, 0.9733], requires_grad=True)
tensor([2.9979, 2.0458, 2.9733], grad_fn=<AddBackward0>)


The `grad_fn` attribute references the function that created the tensor; by following these references back through the graph, autograd keeps track of the operations applied to the tensor.
In the example, `grad_fn=<AddBackward0>` indicates that the tensor was created by an **addition** (the `torch.add()` function or the `+` operator). `AddBackward0` is the backward function autograd uses for this addition; the trailing `0` is part of the internally generated name (it distinguishes between variants of the same operation) rather than a count of how often the operation occurred.

In [None]:
x = torch.rand(3, requires_grad=True)
print(x)
print(x.requires_grad)
print(x.grad_fn)
y = x - 2
print(y)
print(y.requires_grad)
print(y.grad_fn)

tensor([0.7659, 0.8687, 0.4430], requires_grad=True)
True
None
tensor([-1.2341, -1.1313, -1.5570], grad_fn=<SubBackward0>)
True
<SubBackward0 object at 0x7fba8eaf99f0>
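The `None` printed for `x.grad_fn` above is expected: tensors you create directly are *leaf* tensors and have no creating function, while every result of an operation records one. A short sketch using `is_leaf` and the class name of each `grad_fn`:

```python
import torch

x = torch.rand(3, requires_grad=True)
y = x - 2
z = (y * 3).sum()

# Leaf tensors are created by the user, so they have no grad_fn
print(x.is_leaf, x.grad_fn)                 # True None
# Every result records the operation that produced it
print(y.is_leaf, type(y.grad_fn).__name__)  # False SubBackward0
print(type(z.grad_fn).__name__)             # SumBackward0
```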


In [None]:
x = torch.rand(3, requires_grad=True)
print(x)
y = x * 2
print(y)

tensor([0.1603, 0.3280, 0.1625], requires_grad=True)
tensor([0.3205, 0.6561, 0.3250], grad_fn=<MulBackward0>)


In [None]:
x = torch.rand(3, requires_grad=True)
print(x)
y = x / 2
print(y)

tensor([0.9482, 0.7802, 0.2670], requires_grad=True)
tensor([0.4741, 0.3901, 0.1335], grad_fn=<DivBackward0>)


In [None]:
x = torch.rand(3, requires_grad=True)
print(x)
y = x**2
print(y)

tensor([0.5760, 0.9867, 0.5467], requires_grad=True)
tensor([0.3318, 0.9735, 0.2989], grad_fn=<PowBackward0>)


In [None]:
z = y*y*2
print(z)
z = z.mean()
print(z)

tensor([0.2201, 1.8954, 0.1786], grad_fn=<MulBackward0>)
tensor(0.7647, grad_fn=<MeanBackward0>)


Let's compute the gradients with **backpropagation**. When we finish our computation, we can call `.backward()` to have all the gradients computed automatically. The gradient for each tensor that requires gradients is accumulated into its `.grad` attribute: it is the **partial derivative** of the output with respect to that tensor.

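$$x \rightarrow y = f(x) = x^2 \rightarrow z = g(y) = \frac{1}{3}\sum_{i=1}^{3} 2y_i^2$$

where the $\frac{1}{3}\sum$ comes from `z.mean()` over the three elements. By the chain rule:

$$
\frac{\partial z}{\partial x_i} = \frac{\partial z}{\partial y_i}\frac{\partial y_i}{\partial x_i} = \frac{4y_i}{3}\,(2x_i) = \frac{8}{3}x_i^3
$$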


In [None]:
z.backward() # Since z is a scalar, no need to put an argument in ()
print(x.grad) # dz/dx

tensor([0.5096, 2.5614, 0.4357])
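The printed gradient can be checked against the chain-rule result. Because `z` averages over three elements, each partial derivative is $\frac{8}{3}x_i^3$. This sketch repeats the computation on fresh random values and compares autograd's answer with that formula:

```python
import torch

x = torch.rand(3, requires_grad=True)
y = x**2
z = (y * y * 2).mean()  # same computation as the cells above
z.backward()

analytic = 8 / 3 * x.detach()**3  # dz/dx_i = (8/3) x_i^3
print(x.grad)
print(analytic)
print(torch.allclose(x.grad, analytic))  # True
```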


Generally speaking, `torch.autograd` is an engine for computing **vector-Jacobian products**: it computes **partial derivatives by applying the chain rule**.

In [None]:
import torch
# Create a scalar tensor with requires_grad=True
x = torch.tensor(3.0, requires_grad=True)

# Perform computations
y = 2 * x + 1

# Calculate gradients
y.backward()

# Access the gradient
print(x.grad)  # prints: tensor(2.)

tensor(2.)


In [None]:
# Create a scalar tensor with requires_grad=True
x = torch.tensor(3.0, requires_grad=True)

# Perform computations
y = 2 * x + 1 + x**2 + 1

# Calculate gradients
y.backward()

# Access the gradient
print(x.grad)  # prints: tensor(8.)

tensor(8.)


In [None]:
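# Create a tensor without gradient tracking (requires_grad defaults to False)
x = torch.tensor([1.0, 2.0, 3.0])

# Set requires_grad=True to enable gradient tracking
# (only floating point and complex tensors can require gradients)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Perform some computations
y = x * 2
z = y.mean()

# Perform backpropagation
z.backward()

# Access gradients: dz/dx = 2/3 for every element
print(x.grad)  # prints: tensor([0.6667, 0.6667, 0.6667])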

In [None]:
# Create a tensor and enable gradient tracking
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Perform some computations
y = x.pow(3).sum()
y

tensor(36., grad_fn=<SumBackward0>)

In [None]:
# Calculate gradients
y.backward()
y

tensor(36., grad_fn=<SumBackward0>)

In [None]:
# Access the gradients
print(x.grad)  # prints: tensor([3., 12., 27.])

tensor([ 3., 12., 27.])


In [None]:
# Create a tensor and enable gradient tracking
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Perform some computations
y = x.pow(2).sum()

# Calculate gradients
y.backward()

# Access the gradients
print(x.grad)  # prints: tensor([2., 4., 6.])

tensor([2., 4., 6.])


If the output tensor is non-scalar (has more than one element), `backward()` needs a `gradient` argument: a tensor with the same shape as the output. This tensor is the vector $v$ in the vector-Jacobian product $v^T J$ that autograd computes.

In [None]:
x = torch.randn(3, requires_grad=True)
print(x)
y = x + 2
print(y)
z = y*y*2
print(z)
v= torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)
print(v)
z.backward(v)
print(x.grad) # dz/dx
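To see what `v` does, this sketch repeats the computation and compares `x.grad` with the hand-derived result. Here $z_i = 2(x_i + 2)^2$, so the Jacobian is diagonal with entries $4(x_i + 2)$, and $v^T J$ reduces to an element-wise product. It also shows that calling `backward()` without a gradient argument on a non-scalar output raises a `RuntimeError`.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Without a gradient argument, backward() on a non-scalar output fails
try:
    (x * 2).backward()
    needs_v = False
except RuntimeError:
    needs_v = True
print("RuntimeError without v:", needs_v)

z = (x + 2) * (x + 2) * 2  # z_i = 2(x_i + 2)^2, so dz_i/dx_i = 4(x_i + 2)
v = torch.tensor([0.1, 1.0, 0.001])
z.backward(v)

# Diagonal Jacobian: v^T J is just v * 4(x + 2)
expected = v * 4 * (x.detach() + 2)
print(x.grad)
print(torch.allclose(x.grad, expected))  # True
```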

In [None]:
x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2
print(y)
print(y.shape)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)

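**Gradient tracking compatibility**: Note that **not all operations provide useful gradients**. Operations that return integer results, such as `argmax()`, cannot propagate gradients, and piecewise-constant operations, such as `torch.round()`, have a gradient of zero. Indexing and slicing, on the other hand, are differentiable: gradients flow back to the selected elements.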
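A sketch of the three cases on a small example tensor. Indexing passes a gradient of 1 to the selected element only, `argmax()` returns an integer tensor with no gradient tracking, and `round()` contributes a zero gradient.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Indexing is differentiable: gradient flows only to the selected element
x[1].backward()
first_grad = x.grad.clone()
print(first_grad)  # tensor([0., 1., 0.])

# argmax returns integer indices, which carry no gradient information
idx = x.argmax()
print(idx.dtype, idx.requires_grad)  # torch.int64 False

# round() is piecewise constant, so its gradient is zero
x.grad = None  # reset the accumulated gradient
torch.round(x * 1.3).sum().backward()
print(x.grad)  # tensor([0., 0., 0.])
```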

## **4.1. Disabling Gradient Tracking**

PyTorch restricts **in-place operations** on tensors that take part in **gradient calculation**, because they can lead to **incorrect or inconsistent gradient calculations**: modifying a leaf tensor that requires gradients in place raises an error, and modifying a tensor whose value is needed for the backward pass raises an error when `backward()` is called. If you specifically want to perform in-place operations on a tensor that requires gradients, you can detach the tensor before the operation. **Detaching a tensor removes it from the computational graph and disables gradient tracking.** However, keep in mind that detached tensors will not have gradients, so you won't be able to backpropagate through them.

There are three options:
1. `x.requires_grad_(False)`
2. `x.detach()`
3. wrapping the computation in the context manager `with torch.no_grad():` or `with torch.inference_mode():`

By detaching a tensor, you can effectively prevent it from participating in subsequent gradient computations, **reducing memory consumption** and **computation overhead** during backpropagation.

In [None]:
x = torch.randn(3, requires_grad=True)
print(x)
x = x.requires_grad_(False)
print(x)

tensor([ 1.0670, -1.6403,  0.1405], requires_grad=True)
tensor([ 1.0670, -1.6403,  0.1405])


In [None]:
a = torch.randn(2, 2)
print(a)
print(a.requires_grad)

b = ((a * 3) / (a - 1))
print(b)
print(b.grad_fn)

a.requires_grad_(True)
print(a.requires_grad)

b = (a * a).sum()
print(b)
print(b.grad_fn)

tensor([[ 1.7842,  0.2817],
        [-1.9815, -0.6158]])
False
tensor([[ 6.8256, -1.1764],
        [ 1.9938,  1.1433]])
None
True
tensor(7.5680, grad_fn=<SumBackward0>)
<SumBackward0 object at 0x7fba8eaf8af0>


`x.detach()` returns a new tensor with the same content (it shares the same underlying data) but without gradient tracking:

In [None]:
a = torch.randn(2, 2, requires_grad=True)
print(a)
print(a.requires_grad)
b = a.detach()
print(b)
print(b.requires_grad)

tensor([[ 0.2120, -0.1929],
        [-0.0765, -1.0866]], requires_grad=True)
True
tensor([[ 0.2120, -0.1929],
        [-0.0765, -1.0866]])
False


Wrap the computation in `with torch.no_grad():` or `with torch.inference_mode():`

In [None]:
a = torch.randn(2, 2, requires_grad=True)
print(a)
print(a.requires_grad)
with torch.no_grad():
  y = a**2
  print(y)
  print(y.requires_grad)

tensor([[-0.4048, -0.5917],
        [ 0.3134, -0.4440]], requires_grad=True)
True
tensor([[0.1638, 0.3501],
        [0.0982, 0.1971]])
False


In [None]:
a = torch.randn(2, 2, requires_grad=True)
print(a)
print(a.requires_grad)
with torch.inference_mode():
  y = a**2
  print(y)
  print(y.requires_grad)

tensor([[2.4532, 2.4012],
        [0.9637, 1.3640]], requires_grad=True)
True
tensor([[6.0181, 5.7656],
        [0.9287, 1.8604]])
False
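Both context managers turn off gradient tracking, but `torch.inference_mode()` is the stricter of the two: tensors created inside it are marked as *inference tensors*, which per the PyTorch docs cannot later be used in operations recorded by autograd. The sketch below checks both behaviours with `torch.is_grad_enabled()` and `Tensor.is_inference()`.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

with torch.no_grad():
    print(torch.is_grad_enabled())  # False inside the block
    y_no_grad = a * 2

with torch.inference_mode():
    y_inference = a * 2

print(torch.is_grad_enabled())      # True again outside

# Only tensors created under inference_mode are flagged as inference tensors
print(y_no_grad.is_inference(), y_inference.is_inference())  # False True
```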


## **4.2. Setting Gradients to Zero**

`backward()` accumulates (adds) gradients into the `.grad` attribute instead of overwriting it, so we need to be careful during optimization: use `.grad.zero_()` to reset the gradients before each new optimization step.

Suppose $y$ is a function to be minimized in terms of $w$.

In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(1):
    # just a dummy operation
    y = (w*x).sum()   # forward pass
    y.backward()      # backward pass: dy/dw
    print(w.grad)

tensor([3., 3., 3., 3.])


In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(2):
    y = (w*x).sum()
    y.backward()
    print(w.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])


In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(5):
    y = (w*x).sum()
    y.backward()
    print(w.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])
tensor([12., 12., 12., 12.])
tensor([15., 15., 15., 15.])


Hence, at each iteration the new gradient is added to the value already stored in `w.grad`, so the gradients accumulate.

In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(5):
    y = (w*x).sum()
    y.backward()
    print(w.grad)
    # this is important!
    w.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])


With the gradient zeroed after every step, each iteration now reports the correct gradient.

Now, let's add a weight-update step to the dummy operation above.

In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(5):
    y = (w*x).sum()
    y.backward()
    print(w.grad)

    # optimize model, i.e. adjust weights...
    w -= 0.1 * w.grad

print(w)
print(y)

tensor([3., 3., 3., 3.])


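RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

This fails because `w` is a leaf tensor that requires grad. PyTorch does not allow such a tensor to be modified in place while autograd is tracking operations. Performing the update inside `torch.no_grad()` avoids this: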

In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(5):
    y = (w*x).sum()
    y.backward()
    print(w.grad)

    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        w -= 0.1 * w.grad

print(w)
print(y)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])
tensor([12., 12., 12., 12.])
tensor([15., 15., 15., 15.])
tensor([-3.5000, -3.5000, -3.5000, -3.5000], requires_grad=True)
tensor(-24., grad_fn=<SumBackward0>)


In [None]:
x = 3
w = torch.ones(4, requires_grad=True)

for step in range(5):
    y = (w*x).sum()
    y.backward()
    print(w.grad)

    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        w -= 0.1 * w.grad

    # this is important! It affects the final weights w & output y
    w.grad.zero_()

print(w)
print(y)

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([-0.5000, -0.5000, -0.5000, -0.5000], requires_grad=True)
tensor(-2.4000, grad_fn=<SumBackward0>)


**Built-in optimizers**: Optimizers in `torch.optim` provide a `.zero_grad()` method. It is commonly used in deep learning to clear the gradients of all the model's parameters before each backward pass and parameter update.

The `.zero_grad()` method is different from `grad.zero_()`. `grad.zero_()` zeroes out the gradient of a specific tensor. It is called on the `.grad` attribute of a tensor and resets the gradient of that **particular tensor** to zero. It is typically used when you want to manage gradients manually or update **specific gradients independently**.

The `.zero_grad()` method, by contrast, clears the gradients of **all parameters** being optimized by an optimizer. It is **called on an optimizer object and resets the gradients of all the parameters that the optimizer is tracking**. This makes it a convenient way to clear the gradients of all of a model's parameters at once. Note that since PyTorch 2.0, `zero_grad()` sets the gradients to `None` by default (`set_to_none=True`) rather than filling them with zeros. This is why the output below prints `None`.
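
To make the distinction concrete, here is a small sketch using two hypothetical parameters, `p` and `q`. The gradient of `p` is cleared with `p.grad.zero_()`. `q` is registered with an SGD optimizer and cleared through `optimizer.zero_grad()`, first with `set_to_none=False` and then with the default setting.

```python
import torch

# Two hypothetical parameters so each way of clearing gradients can be seen in isolation
p = torch.tensor([1.0, 2.0], requires_grad=True)
q = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([q], lr=0.1)  # the optimizer only tracks q

(p * 3).sum().backward()  # p.grad == [3., 3.]
(q * 3).sum().backward()  # q.grad == [3., 3.]

# Tensor level: zero p's gradient in place; p.grad is still a tensor
p.grad.zero_()

# Optimizer level with set_to_none=False: q.grad is filled with zeros
optimizer.zero_grad(set_to_none=False)
q_grad_after_zeroing = q.grad.clone()

# Optimizer level with the default setting (set_to_none=True since PyTorch 2.0)
(q * 3).sum().backward()
optimizer.zero_grad()
q_grad_after_default = q.grad

print(p.grad)                # tensor([0., 0.])
print(q_grad_after_zeroing)  # tensor([0., 0.])
print(q_grad_after_default)  # None on PyTorch 2.0 and later
```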

In [None]:
w = {"w_0": torch.tensor(1.0, requires_grad=True),
     "w_1": torch.tensor(1.0, requires_grad=True),
     "w_2": torch.tensor(1.0, requires_grad=True),
     "w_3": torch.tensor(1.0, requires_grad=True)}
optimizer = torch.optim.SGD(w.values(), lr=0.01)
print(optimizer)

loss = (w["w_0"] + 2 * w["w_1"] + 3 * w["w_2"] + 4 * w["w_3"]) ** 2  # Forward pass
loss.backward()                                                      # Backward pass

print("Gradients before zeroing:")
for name, param in w.items():
    print(name, param.grad)

optimizer.step()       # Update the parameters using their gradients
optimizer.zero_grad()  # Clear the gradients

print("\nGradients after zeroing:")
for name, param in w.items():
    print(name, param.grad)

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    lr: 0.01
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)
Gradients before zeroing:
w_0 tensor(20.)
w_1 tensor(40.)
w_2 tensor(60.)
w_3 tensor(80.)

Gradients after zeroing:
w_0 None
w_1 None
w_2 None
w_3 None
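
For comparison, the manual loop from section 4.2 can be sketched with `torch.optim.SGD`. This assumes plain SGD (no momentum or weight decay), which performs the same update `w <- w - lr * w.grad`. `optimizer.step()` applies the update without tracking gradients, so no explicit `torch.no_grad()` block is needed. Calling `optimizer.zero_grad()` at the start of each step replaces the manual `w.grad.zero_()`.

```python
import torch

x = 3
w = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)  # plain SGD: w <- w - lr * w.grad

for step in range(5):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    y = (w * x).sum()      # forward pass
    y.backward()           # backward pass: fills w.grad with dy/dw = 3
    optimizer.step()       # parameter update (done without tracking gradients)

print(w)  # tensor([-0.5000, -0.5000, -0.5000, -0.5000], requires_grad=True)
print(y)  # tensor(-2.4000, grad_fn=<SumBackward0>)
```

The final `w` and `y` match the manual loop that zeroes the gradient after every update.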


# **5. Utilizing GPUs**

Deep learning algorithms require a **lot of numerical operations**.
By default, these operations are done on a **CPU** (**central processing unit**).
However, there's another common piece of hardware called a **GPU** (**graphics processing unit**). It is **often much faster** than a CPU at the specific types of operations neural networks need (such as matrix multiplications).
Below we look at how to first get access to a GPU and then how to get PyTorch to use it.

> **Note:** When I reference "GPU" throughout this course, I'm referring to an [Nvidia GPU with CUDA](https://developer.nvidia.com/cuda-gpus) enabled unless otherwise specified. CUDA is a computing platform and API that allows GPUs to be used for general-purpose computing, not just graphics.



## **5.1. Getting a GPU**

You may already know what's going on when I say GPU. But if not, there are a few ways to get access to one.

| **Method** | **Difficulty to set up** | **Pros** | **Cons** | **How to set up** |
| ----- | ----- | ----- | ----- | ----- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easily as sharing a link | Doesn't save your data outputs, limited compute, subject to timeouts | [Follow the Google Colab Guide](https://colab.research.google.com/notebooks/gpu.ipynb) |
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free and require an upfront cost | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/locally/) |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to set up correctly | Follow the [PyTorch cloud installation guidelines](https://pytorch.org/get-started/cloud-partners/) |

There are more options for using GPUs, but the three above will suffice for now. If you're looking to purchase a GPU of your own but aren't sure what to get, [Tim Dettmers has an excellent guide](https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/).


To check whether you've got access to an Nvidia GPU, you can run `!nvidia-smi`, where the `!` (also called bang) means "run this on the command line". The `nvidia-smi` command reports the status and details of NVIDIA GPUs, including GPU utilization, memory usage, temperature, and driver version. This makes it helpful for monitoring GPU usage, troubleshooting, or verifying that the GPU drivers are installed.

In [None]:
!nvidia-smi

Fri Jun  9 02:23:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

If you don't have an Nvidia GPU accessible, the command above will output something like:

```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

In that case, go back up and follow the install steps.

If you do have a GPU, the line above will output something like:

```
Wed Jan 19 22:09:08 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

## **5.2. Getting PyTorch to run on GPU**

Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).
To do so, you can use the [`torch.cuda`](https://pytorch.org/docs/stable/cuda.html) package.
Rather than talk about it, let's try it out.
You can test if PyTorch has access to a GPU using [`torch.cuda.is_available()`](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html#torch.cuda.is_available).


In [None]:
# Check for GPU
import torch
torch.cuda.is_available()

True

If the above outputs `True`, PyTorch can see and use the GPU. If it outputs `False`, PyTorch can't see the GPU, and you'll have to go back through the installation steps.

Now, let's say you want to set up your code so it runs on the CPU, *or* on the GPU if one is available.
That way, if you or someone else runs your code, it'll work regardless of the computing device they're using.
Let's create a `device` variable to store what kind of device is available.

In [None]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

If the above outputs `"cuda"`, we can set all of our PyTorch code to use the available CUDA device (a GPU). If it outputs `"cpu"`, our PyTorch code will stick with the CPU. In PyTorch, it's best practice to write [**device agnostic code**](https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code), meaning code that runs on the CPU (always available) or the GPU (if available).

If you want to do faster computing you can use a GPU but if you want to do *much* faster computing, you can use multiple GPUs.
You can count the number of GPUs PyTorch has access to using [`torch.cuda.device_count()`](https://pytorch.org/docs/stable/generated/torch.cuda.device_count.html#torch.cuda.device_count).

In [None]:
# Count number of devices
torch.cuda.device_count()

1
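
If more than one GPU is visible, it can help to see which index maps to which card. The minimal sketch below uses `torch.cuda.get_device_name()`. On a machine without CUDA, `device_count()` returns 0 and the loop simply does nothing.

```python
import torch

num_gpus = torch.cuda.device_count()  # 0 on a CPU-only machine
gpu_names = [torch.cuda.get_device_name(i) for i in range(num_gpus)]

for i, name in enumerate(gpu_names):
    # The index i is what goes after "cuda:" when choosing a device, e.g. "cuda:0"
    print(f"cuda:{i} -> {name}")

if num_gpus == 0:
    print("No CUDA GPUs visible to PyTorch")
```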

Knowing the number of GPUs PyTorch has access to is helpful in case you want to run a specific process on one GPU and another process on another. PyTorch also has features that let you run a process across *all* GPUs.

By default, all tensors are created on the CPU, but you can also create them on, or move them to, the GPU (if one is available):

In [None]:
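if torch.cuda.is_available():  # note the parentheses: without them the check is always truthy
  device = torch.device("cuda")
  x = torch.ones(5, device=device)  # create a tensor directly on the GPU
  print(x)
  y = torch.ones(5)                 # created on the CPU
  print(y)
  y = y.to(device)                  # move it to the GPU
  z = x + y                         # both operands must be on the same device
  print(z)
  # z.numpy() would raise an error here: NumPy only handles CPU tensors, not GPU tensors
  # move to CPU again
  z = z.to("cpu")  # `.to()` can also change the dtype at the same time
  print(z)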

tensor([1., 1., 1., 1., 1.], device='cuda:0')
tensor([1., 1., 1., 1., 1.])
tensor([2., 2., 2., 2., 2.], device='cuda:0')
tensor([2., 2., 2., 2., 2.])
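
As the last comment in the cell notes, a single `.to()` call can change the device and the dtype together. The sketch below is device agnostic: it uses a GPU if one is available and otherwise stays on the CPU. The choice of `torch.float64` is just an example.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
z = torch.ones(5, device=device) * 2  # float32 tensor on the chosen device

# One .to() call moves the tensor to the CPU and converts it to float64
z64_cpu = z.to("cpu", torch.float64)

print(z.dtype, z.device)
print(z64_cpu.dtype, z64_cpu.device)  # torch.float64 cpu
```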


## **5.3. Putting Tensors (and Models) on GPU**

You can put tensors (and models, as we'll see later) on a specific device by calling [`to(device)`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html) on them, where `device` is the target device you'd like the tensor (or model) to go to.
Why do this?
GPUs offer far faster numerical computing than CPUs. If a GPU isn't available, our **device agnostic code** (see above) will simply run on the CPU.

**Note:** Putting a tensor on the GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor, i.e. the same data will then exist on both the CPU and the GPU. To overwrite the original tensor, reassign it:
* `some_tensor = some_tensor.to(device)`
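
A small sketch of when `.to()` copies and when it does not. The PyTorch docs state that if the tensor already has the requested dtype and device, `.to()` returns the tensor itself. Otherwise it returns a new tensor and leaves the original untouched.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.tensor([1, 2, 3])  # created on the CPU

same = t.to("cpu")               # already on the CPU with the same dtype: t itself is returned
converted = t.to(torch.float32)  # the dtype changes: a new tensor is returned
moved = t.to(device)             # a new tensor on the GPU (or t itself on a CPU-only machine)

print(same is t)       # True
print(converted is t)  # False
print(t.device, moved.device)  # t stays on the CPU unless you reassign it
```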

Let's try creating a tensor and putting it on the GPU (if it's available).

In [None]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')

If you have a GPU available, the above code will output something like:

```
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
```

Notice the second tensor has `device='cuda:0'`. This means it's stored on the 0th available GPU. GPUs are 0-indexed: if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, and so on up to `'cuda:n'`.
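
The `'cuda:0'` string corresponds to a `torch.device` object with a type and an index. The sketch below inspects these attributes. Constructing a device object does not require a GPU, so it runs anywhere. The second GPU index is purely illustrative.

```python
import torch

d0 = torch.device("cuda:0")   # parsed from a string
d1 = torch.device("cuda", 1)  # the same idea, written as (type, index)

print(d0.type, d0.index)  # cuda 0
print(d1)                 # cuda:1

cpu_tensor = torch.tensor([1, 2, 3])
print(cpu_tensor.device.type, cpu_tensor.device.index)  # cpu None
```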



## **5.4. Moving tensors back to CPU**

What if we wanted to move the tensor back to CPU?

For example, you'll want to do this if you want to use your tensors with NumPy (NumPy does not leverage the GPU).

Let's try using the [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) method on our `tensor_on_gpu`.

In [None]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

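TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.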

Instead, to get a tensor back to CPU and usable with NumPy we can use [`Tensor.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html).
This copies the tensor to CPU memory so it's usable with CPUs.

In [None]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

The above returns a copy of the GPU tensor in CPU memory so the original tensor is still on GPU.

In [None]:
tensor_on_gpu, tensor_back_on_cpu

(tensor([1, 2, 3], device='cuda:0'), array([1, 2, 3]))
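
A related case: `.numpy()` also refuses to convert a tensor that requires grad, such as model parameters or outputs computed from them. The sketch below shows the common `detach().cpu().numpy()` pattern, which handles both situations. It assumes nothing beyond the standard `Tensor` methods used in this section.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

# Calling .numpy() directly fails for a tensor that requires grad
# (and, on a GPU, also for a tensor that lives in GPU memory)
try:
    t.numpy()
    direct_conversion_worked = True
except (RuntimeError, TypeError) as err:
    direct_conversion_worked = False
    print(type(err).__name__, err)

# detach() drops autograd tracking; cpu() copies to host memory if needed
arr = t.detach().cpu().numpy()
print(arr)  # [1. 2. 3.]
```

On a CPU-only machine, `.cpu()` returns the same tensor, so `arr` shares memory with `t`. On a GPU, the array is backed by a separate host copy.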

## Exercises

All of the exercises are focused on practicing the code above.

You should be able to complete them by referencing each section or by following the resource(s) linked.

**Resources:**

* [Exercise template notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/00_pytorch_fundamentals_exercises.ipynb).
* [Example solutions notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/00_pytorch_fundamentals_exercise_solutions.ipynb) (try the exercises *before* looking at this).

1. Documentation reading - A big part of deep learning (and learning to code in general) is getting familiar with the documentation of the framework you're using. We'll be using the PyTorch documentation a lot throughout the rest of this course, so I'd recommend spending 10 minutes reading the following. It's okay if you don't understand everything for now; the focus is not yet full understanding, it's awareness. See the documentation for [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor) and for [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics).
2. Create a random tensor with shape `(7, 7)`.
3. Perform a matrix multiplication on the tensor from 2 with another random tensor with shape `(1, 7)` (hint: you may have to transpose the second tensor).
4. Set the random seed to `0` and do exercises 2 & 3 over again.
5. Speaking of random seeds, we saw how to set it with `torch.manual_seed()` but is there a GPU equivalent? (hint: you'll need to look into the documentation for `torch.cuda` for this one). If there is, set the GPU random seed to `1234`.
6. Create two random tensors of shape `(2, 3)` and send them both to the GPU (you'll need access to a GPU for this). Set `torch.manual_seed(1234)` when creating the tensors (this doesn't have to be the GPU random seed).
7. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of one of the tensors).
8. Find the maximum and minimum values of the output of 7.
9. Find the maximum and minimum index values of the output of 7.
10. Make a random tensor with shape `(1, 1, 1, 10)` and then create a new tensor with all the `1` dimensions removed to be left with a tensor of shape `(10)`. Set the seed to `7` when you create it and print out the first tensor and its shape as well as the second tensor and its shape.

## Extra-curriculum

* Spend 1 hour going through the [PyTorch basics tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html) (I'd recommend the [Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) and [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html) sections).
* To learn more on how a tensor can represent data, see this video: [What's a tensor?](https://youtu.be/f5liqUk0ZTw)