# **PyTorch Fundamentals**

## **Importing PyTorch**

In [104]:
import torch
torch.__version__

'2.8.0'

## **Introduction to Tensors**
### **Creating tensors**

PyTorch loves tensors. So much so that there's a whole documentation page dedicated to the [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) class, and it's worth reading through. The first thing we're going to create is a **scalar**. A scalar is a single number, and in tensor-speak it's a zero-dimensional tensor.

In [105]:
scalar = torch.tensor(7)
scalar

tensor(7)

See how the above printed out `tensor(7)`? That means although `scalar` is a single number, it's of type `torch.Tensor`. We can check the dimensions of a tensor using the `ndim` attribute.

In [106]:
scalar.ndim

0

What if we wanted to retrieve the number from the tensor? As in, turn it from a `torch.Tensor` into a Python integer? To do so, we can use the `item()` method.

In [107]:
scalar.item()

7
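One caveat worth knowing: `item()` only works on tensors that hold exactly one element. The sketch below (reusing the `scalar` and `vector` names from this section) shows the error we'd get on a two-element tensor and uses `tolist()`, which converts a tensor of any size into (nested) Python lists, as the alternative.

```python
import torch

scalar = torch.tensor(7)
vector = torch.tensor([7, 7])

# .item() returns a plain Python number, but only for one-element tensors
print(scalar.item(), type(scalar.item()))

# For tensors with more than one element, .item() raises an error...
item_failed = False
try:
    vector.item()
except RuntimeError as e:
    item_failed = True
    print(f"RuntimeError: {e}")

# ...so use .tolist() to get (nested) Python lists instead
print(vector.tolist())
```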

A **vector** is a one-dimensional tensor but can contain many numbers. For example, we could have a vector `[3, 2]` to describe `[bedrooms, bathrooms]` in our house. Or we could have `[3, 2, 2]` to describe `[bedrooms, bathrooms, car_parks]`. The important trend here is that a vector is flexible in what it can represent (the same goes for tensors).

In [108]:
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [109]:
vector.ndim

1

The `vector` contains two numbers but only has a single dimension. We can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside (`[`), and we only need to count one side. Another important concept for tensors is their `shape` attribute. The shape tells us how the elements inside a tensor are arranged.

In [110]:
vector.shape

torch.Size([2])

The above returns `torch.Size([2])` which means our vector has a shape of `[2]`. This is because of the two elements we placed inside the square brackets (`[7, 7]`). Let's now see a **matrix**.

In [111]:
MATRIX = torch.tensor([[7, 8], 
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [112]:
MATRIX.ndim

2

In [113]:
MATRIX.shape

torch.Size([2, 2])

We get the output `torch.Size([2, 2])` because `MATRIX` is two elements deep and two elements wide. Let's create a **tensor** now.

In [114]:
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [115]:
TENSOR.ndim

3

In [116]:
TENSOR.shape

torch.Size([1, 3, 3])

Alright, it outputs `torch.Size([1, 3, 3])`. The dimensions are listed from outer to inner, and each number is the count of elements in that dimension: `TENSOR` holds $1$ matrix of $3$ rows, each containing $3$ elements.

$$
\begin{array}{cc}
\textbf{Scalar} & \textbf{Vector} \\[6pt]
7 &
\left[\begin{array}{c} 7 \\ 4 \end{array}\right]
\quad \text{or} \quad
\left[\begin{array}{cc} 7 & 4 \end{array}\right] \\[16pt]
\textbf{Matrix} & \textbf{Tensor} \\[6pt]
\left[\begin{array}{cc}
7 & 10 \\
4 & 3 \\
5 & 1
\end{array}\right]
&
\left[\begin{array}{cc}
\left[\begin{array}{cc} 7 & 4 \end{array}\right] &
\left[\begin{array}{cc} 0 & 1 \end{array}\right] \\
\left[\begin{array}{cc} 1 & 9 \end{array}\right] &
\left[\begin{array}{cc} 2 & 3 \end{array}\right] \\
\left[\begin{array}{cc} 5 & 6 \end{array}\right] &
\left[\begin{array}{cc} 8 & 8 \end{array}\right]
\end{array}\right]
\end{array}
$$

> **Note:** We use lowercase letters for `scalar` and `vector` and uppercase letters for `MATRIX` and `TENSOR`. This was on purpose. In practice, we'll often see scalars and vectors denoted as lowercase letters such as `y` or `a`. And matrices and tensors denoted as uppercase letters such as `X` or `W`.
>
> We might also notice the names matrix and tensor used interchangeably. This is common: in PyTorch we're usually dealing with `torch.Tensor` objects (hence the name tensor), but the shape and number of dimensions of what's inside dictate what it actually is.

Let's summarise.

| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | $0$ | Lower (`a`) | 
| **vector** | a number with direction (e.g. wind speed with direction) but can also have many other numbers | $1$ | Lower (`y`) |
| **matrix** | a $2$-dimensional array of numbers | $2$ | Upper (`Q`) |
| **tensor** | an $n$-dimensional array of numbers | can be any number, a $0$-dimension tensor is a scalar, a $1$-dimension tensor is a vector | Upper (`X`) | 
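To tie the table back to code, here is a small sketch that builds one example of each (the matrix is the $3 \times 2$ one from the figure above; the dictionary name `examples` is just illustrative) and prints `ndim`, `shape` and `numel()`, the total number of elements, which equals the product of the shape.

```python
import torch

examples = {
    "scalar": torch.tensor(7),
    "vector": torch.tensor([7, 4]),
    "matrix": torch.tensor([[7, 10], [4, 3], [5, 1]]),
    "tensor": torch.tensor([[[1, 2, 3], [3, 6, 9], [2, 4, 5]]]),
}

for name, t in examples.items():
    # ndim counts the dimensions; shape lists the size of each one, outer to inner;
    # numel() is the total number of elements (the product of the shape)
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}, numel={t.numel()}")
```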

### **Random tensors**

We've established tensors represent some form of data. Machine learning models such as neural networks manipulate and seek patterns within tensors. But when building machine learning models with PyTorch, it's rare we'll create tensors by hand (like what we've been doing). Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

> In essence:
>
> **Start with random numbers $\implies$ look at data $\implies$ update random numbers $\implies$ look at data $\implies$ update random numbers...**

As data scientists, we can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers. Let's see how to create a tensor of random numbers. We can do so using [`torch.rand()`](https://pytorch.org/docs/stable/generated/torch.rand.html) and passing in the `size` parameter.

In [117]:
random_tensor = torch.rand(size=(2,2,2))
random_tensor, random_tensor.dtype

(tensor([[[0.8694, 0.5677],
          [0.7411, 0.4294]],
 
         [[0.8854, 0.5739],
          [0.2666, 0.6274]]]),
 torch.float32)

The flexibility of `torch.rand()` is that we can adjust the `size` to be whatever we want. For example, say we wanted a random tensor in the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`).

In [118]:
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)
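`torch.rand()` draws values uniformly from the interval $[0, 1)$. As a sketch of a couple of related creation functions (not used elsewhere in this section), `torch.randn()` draws from a standard normal distribution and `torch.randint(low, high, size)` draws integers from $[low, high)$:

```python
import torch

uniform = torch.rand(size=(1000,))                       # uniform on [0, 1)
normal = torch.randn(size=(1000,))                       # standard normal: mean 0, std 1
integers = torch.randint(low=0, high=10, size=(1000,))   # integers in [0, 10)

print(uniform.min() >= 0, uniform.max() < 1)
print(normal.mean(), normal.std())  # close to 0 and 1, but not exactly
print(integers.dtype, integers.min(), integers.max())
```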

### **Zeros and ones**

Sometimes we'll just want to fill tensors with zeros or ones. This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them). Let's create a tensor full of zeros with [`torch.zeros()`](https://pytorch.org/docs/stable/generated/torch.zeros.html). Again, the `size` parameter comes into play.

In [119]:
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

We can do the same to create a tensor of all ones except using [`torch.ones()`](https://pytorch.org/docs/stable/generated/torch.ones.html) instead.

In [120]:
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
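To make the masking idea from above concrete, here is a minimal sketch: a tensor of ones with the same shape as some hypothetical `values`, with the entries we want to hide set to zero, then multiplied element-wise. Real models often use other tools for this (e.g. `torch.where()` or `Tensor.masked_fill()`); this just shows the zeros-and-ones idea.

```python
import torch

values = torch.tensor([[0.5, 1.2, 0.3, 2.0],
                       [1.1, -0.7, 0.4, 0.9]])

# Hypothetical mask: 1 keeps a value, 0 hides it
mask = torch.ones(size=values.shape)
mask[:, 2:] = 0  # hide the last two columns

masked = values * mask  # element-wise multiplication
print(masked)
```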

### **Creating a range and tensors like**

Sometimes we might want a range of numbers, such as $1 \to 10$ or $0 \to 100$. We can use `torch.arange(start, end, step)` to do so, where:
- `start` = start of range (e.g. $0$)
- `end` = end of range (e.g. $10$); it is *exclusive*, so the range stops just before it
- `step` = the spacing between consecutive values (e.g. $1$)

> **Note:** In Python, we can use `range()` to create a range. In PyTorch, however, `torch.range()` is deprecated and may be removed in the future, so use `torch.arange()` instead.

In [121]:
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
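A short sketch of how `end` and `step` behave in practice, plus `torch.linspace()` (not covered in the text above), which takes a number of points rather than a step and *includes* its end value:

```python
import torch

# `end` is exclusive: this stops at 8, not 10
evens = torch.arange(start=0, end=10, step=2)
print(evens)  # tensor([0, 2, 4, 6, 8])

# A float step produces a float tensor: 0.0, 0.5, 1.0, 1.5
halves = torch.arange(0, 2, 0.5)
print(halves)

# torch.linspace() takes a number of points and includes `end`: 0.0, 0.25, 0.5, 0.75, 1.0
points = torch.linspace(start=0, end=1, steps=5)
print(points)
```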

Sometimes we might want one tensor of a certain type with the same shape as another tensor. For example, a tensor of all zeros with the same shape as a previous tensor.  To do so we can use [`torch.zeros_like(input)`](https://pytorch.org/docs/stable/generated/torch.zeros_like.html) or [`torch.ones_like(input)`](https://pytorch.org/docs/1.9.1/generated/torch.ones_like.html) which return a tensor filled with zeros or ones in the same shape as the `input` respectively.

In [122]:
ten_zeros = torch.zeros_like(input=zero_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [123]:
ten_ones = torch.ones_like(input=ten_zeros)
ten_ones

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
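Besides the shape, the `*_like` functions also copy the input's datatype (and device), which is why `ten_zeros` above holds integers. A sketch of overriding that with the `dtype` argument, and of `torch.full_like()` for filling with an arbitrary value (the names `float_zeros` and `sevens` are illustrative):

```python
import torch

zero_to_ten = torch.arange(start=0, end=10, step=1)  # int64 by default

# *_like functions copy shape, dtype and device from the input...
ten_zeros = torch.zeros_like(zero_to_ten)
print(ten_zeros.dtype)  # torch.int64

# ...unless we override them, e.g. with the `dtype` argument
float_zeros = torch.zeros_like(zero_to_ten, dtype=torch.float32)
print(float_zeros.dtype)  # torch.float32

# torch.full_like() fills with any value we choose
sevens = torch.full_like(zero_to_ten, fill_value=7)
print(sevens)
```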

### **Tensor datatypes**

There are many different [tensor datatypes](https://pytorch.org/docs/stable/tensors.html#data-types) available in PyTorch. Some are specific to the CPU and some are better suited to the GPU, and getting to know which is which can take some time. Generally, if we see `torch.cuda` anywhere, the tensor is being used on a GPU (since Nvidia GPUs use a computing toolkit called CUDA). The most common type (and generally the default) is `torch.float32` or `torch.float`. This is referred to as $32$-bit floating point. But there's also $16$-bit floating point (`torch.float16` or `torch.half`) and $64$-bit floating point (`torch.float64` or `torch.double`). And to confuse things even more, there are also $8$-bit, $16$-bit, $32$-bit and $64$-bit integers. Plus more!

The reason for all of these is to do with **precision in computing**. Precision is the amount of detail used to describe a number. The higher the precision value ($8, 16, 32$), the more detail, and hence data, used to express a number. This matters in deep learning and numerical computing because we perform so many operations: the more detail each number carries, the more compute every operation needs. So lower-precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).

**Resources:** 
  * [PyTorch documentation for a list of all available tensor datatypes](https://pytorch.org/docs/stable/tensors.html#data-types).
  * [Wikipedia page for an overview of what precision in computing is](https://en.wikipedia.org/wiki/Precision_(computer_science)).

Let's see how to create some tensors with specific datatypes. We can do so using the `dtype` parameter. The default datatype for tensors is `float32`.

In [124]:
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the default device (usually the CPU)
                               requires_grad=False) # if True, operations performed on the tensor are recorded (for computing gradients)
float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

Aside from shape issues (tensor shapes don't match up), two of the other most common issues we'll come across in PyTorch are datatype and device issues. For example, one of our tensors is `torch.float32` and the other is `torch.float16` (many PyTorch operations expect tensors to have the same datatype). Or one of our tensors is on the CPU and the other is on the GPU (PyTorch requires the tensors in a calculation to be on the same device). We'll see more of this device talk later on. For now, let's create a tensor with `dtype=torch.float16`.

In [125]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work
float_16_tensor.dtype

torch.float16
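To see what a datatype mismatch looks like, here is a sketch with the two tensors from this section. Element-wise operations follow PyTorch's type-promotion rules and upcast to the wider type, while matrix multiplication (at least on the CPU) refuses mixed datatypes and raises a `RuntimeError`; the exact message may vary between PyTorch versions.

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])                       # torch.float32
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)

# Element-wise operations promote to the wider dtype
product = float_16_tensor * float_32_tensor
print(product.dtype)  # torch.float32

# Matrix multiplication does not promote and raises an error instead
matmul_failed = False
try:
    torch.matmul(float_32_tensor.reshape(1, 3), float_16_tensor.reshape(3, 1))
except RuntimeError as e:
    matmul_failed = True
    print(f"RuntimeError: {e}")
```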

## **Getting information from tensors**

Once we've created tensors (or someone else or a PyTorch module has created them for us), we might want to get some information from them.

We've seen these before but three of the most common attributes we'll want to find out about tensors are:
* `shape` - what shape is the tensor? (some operations require specific shape rules)
* `dtype` - what datatype are the elements within the tensor stored in?
* `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [126]:
some_tensor = torch.rand(3, 4)
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}")

tensor([[0.6286, 0.9663, 0.7687, 0.4566],
        [0.5745, 0.9200, 0.3230, 0.8613],
        [0.0919, 0.3102, 0.9536, 0.6002]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


## **Tensor Operations**

In deep learning, data (images, text, video, audio, protein structures, etc.) gets represented as tensors. A model learns by investigating those tensors and performing a series of operations (often millions or more) on them to create a representation of the patterns in the input data. These operations are often a wonderful dance between:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks. Stacking these building blocks in the right way, we can create the most sophisticated of neural networks (just like lego!).

### **Basic operations**

In [127]:
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [128]:
tensor * 10

tensor([10, 20, 30])

In [129]:
tensor

tensor([1, 2, 3])

In [130]:
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [131]:
tensor = tensor + 10
tensor

tensor([1, 2, 3])

In [132]:
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [133]:
tensor

tensor([1, 2, 3])

In [134]:
tensor * tensor

tensor([1, 4, 9])
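A few extra details about these element-wise operations, as a sketch: each operator has a function form (`torch.add()`, `torch.sub()`, alongside the `torch.multiply()` used above), `/` is true division so an integer tensor becomes a float tensor, and none of these modify the original tensor unless we use an in-place method (the ones ending in an underscore, such as `add_()`).

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Function forms of the operators above
print(torch.add(tensor, 10))  # tensor([11, 12, 13])
print(torch.sub(tensor, 10))  # tensor([-9, -8, -7])

# `/` is true division: the int64 tensor becomes a float32 tensor (0.5, 1.0, 1.5)
halved = tensor / 2
print(halved)

# None of the above changed `tensor`; in-place methods (trailing underscore) do
tensor.add_(10)
print(tensor)  # tensor([11, 12, 13])
```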

### **Matrix multiplication** 

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html). PyTorch implements matrix multiplication in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) function. The two main rules for matrix multiplication to remember are:

1. The **inner dimensions** must match:
   * $(3, 2) \times (3, 2)$ won't work
   * $(2, 3) \times (3, 2)$ will work
   * $(3, 2) \times (2, 3)$ will work
2. The resulting matrix has the shape of the **outer dimensions**:
   * $(2, 3) \times (3, 2) \implies (2, 2)$
   * $(3, 2) \times (2, 3) \implies (3, 3)$

**Note:** `@` in Python is the symbol for matrix multiplication.

**Resource:** We can see all of the rules for matrix multiplication using `torch.matmul()` in the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html).
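The two rules can be written as a tiny function and checked against PyTorch. This is a sketch for 2-D shapes only; `matmul_output_shape` is a hypothetical helper, not part of PyTorch, and `torch.matmul()` has additional rules for 1-D and batched inputs.

```python
import torch

def matmul_output_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two rules above to 2-D shapes."""
    (rows_a, inner_a), (inner_b, cols_b) = shape_a, shape_b
    if inner_a != inner_b:  # rule 1: inner dimensions must match
        raise ValueError(f"inner dimensions {inner_a} and {inner_b} don't match")
    return (rows_a, cols_b)  # rule 2: result has the outer dimensions

for shape_a, shape_b in [((2, 3), (3, 2)), ((3, 2), (2, 3)), ((3, 2), (3, 2))]:
    try:
        predicted = matmul_output_shape(shape_a, shape_b)
        actual = torch.matmul(torch.rand(shape_a), torch.rand(shape_b)).shape
        print(f"{shape_a} @ {shape_b} -> {predicted} (torch gives {tuple(actual)})")
    except ValueError as e:
        print(f"{shape_a} @ {shape_b} -> error: {e}")
```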

In [135]:
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

The difference between element-wise multiplication and matrix multiplication is the addition of values: matrix multiplication multiplies matching elements and then sums the products. For our `tensor` variable with values $[1, 2, 3]$:

| Operation | Calculation | Code |
| ----- | ----- | ----- |
| **Element-wise multiplication** | $[1\times1, 2\times2, 3\times3] = [1, 4, 9]$ | `tensor * tensor` |
| **Matrix multiplication** | $1\times1 + 2\times2 + 3\times3 = 14$ | `tensor.matmul(tensor)` |


In [136]:
tensor * tensor

tensor([1, 4, 9])

In [137]:
torch.matmul(tensor, tensor)

tensor(14)

In [138]:
tensor @ tensor

tensor(14)
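The "multiply then add" idea extends to matrices: each element of the output is the dot product of a row of the first matrix with a column of the second. The sketch below spells that out with explicit loops (`naive_matmul` is a hypothetical, deliberately slow helper for illustration) and checks it against `@`.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# For two 1-D tensors, matmul is the dot product: multiply element-wise, then sum
print((tensor * tensor).sum())  # tensor(14)

def naive_matmul(a, b):
    """Loop-based matrix multiplication for 2-D tensors (illustration only, slow)."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "inner dimensions must match"
    out = torch.zeros(rows, cols, dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            # each output element is the dot product of row i of a and column j of b
            out[i, j] = (a[i, :] * b[:, j]).sum()
    return out

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])
print(naive_matmul(a, b))
print(torch.equal(naive_matmul(a, b), a @ b))  # True
```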

## **Common Errors**

Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors we'll run into in deep learning is shape mismatches.

In [139]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

try:
    print(torch.matmul(tensor_A, tensor_B)) # (this will give an error)
except RuntimeError as e:
    print(e)

mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)


We can make matrix multiplication work between `tensor_A` and `tensor_B` by making their inner dimensions match. One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor). We can perform transposes in PyTorch using either:
- `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
- `tensor.T` - where `tensor` is the desired tensor to transpose.

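Matrix multiplication is also loosely referred to as the [**dot product**](https://www.mathsisfun.com/algebra/vectors-dot-product.html) of two matrices, since each element of the output is the dot product of a row of the first matrix with a column of the second.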
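As a quick sketch of the two transpose options listed above: for a 2-D tensor, `torch.transpose(tensor, 0, 1)` and `tensor.T` give the same result, while for tensors with more dimensions `torch.transpose()` lets us pick which two dimensions to swap (the 3-D tensor `x3` is just an illustrative example).

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# For a 2-D tensor, swapping dims 0 and 1 is the same as .T
transposed = torch.transpose(tensor_B, 0, 1)
print(transposed.shape)                     # torch.Size([2, 3])
print(torch.equal(transposed, tensor_B.T))  # True

# With more dimensions, we choose which two to swap
x3 = torch.rand(2, 3, 4)
print(torch.transpose(x3, 1, 2).shape)      # torch.Size([2, 4, 3])
```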

In [140]:
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [141]:
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [142]:
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}")
print(f"Multiplying: {tensor_A.shape} @ {tensor_B.T.shape} : Inner Dimensions Match.\n")
print("Output:")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])
New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])
Multiplying: torch.Size([3, 2]) @ torch.Size([2, 3]) : Inner Dimensions Match.

Output:
tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


In [143]:
torch.mm(tensor_A, tensor_B.T) # torch.mm() is matrix multiplication for 2-D tensors only (no broadcasting)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Neural networks are full of matrix multiplications and dot products. The [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input $x$ and a weights matrix $A$.
$$
y = x\cdot{A^T} + b
$$
- $x$ is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
- $A$ is the weights matrix created by the layer, this starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the $T$, that's because the weights matrix gets transposed).
- We might also often see $W$ or another letter like $X$ used to denote the weights matrix.
- $b$ is the bias term used to slightly offset the weights and inputs.
- $y$ is the output (a manipulation of the input in the hopes to discover patterns in it).

This is a linear function (similar to something like $y = mx+b$ in high school or elsewhere), and can be used to draw a straight line! Let's play around with a linear layer. Try changing the values of `in_features` and `out_features` below and see what happens.

In [144]:
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, # in_features = must match the last (inner) dimension of the input
                         out_features=6) # out_features = size of the last dimension of the output
x = tensor_A
output = linear(x)
print(f"Input:\n{x}\n\nInput shape: {x.shape}")

Input:
tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

Input shape: torch.Size([3, 2])


In [145]:
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
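To confirm that the layer really computes $y = x\cdot{A^T} + b$, the sketch below recomputes the output by hand from the layer's own `weight` (that's $A$, stored with shape `(out_features, in_features)`) and `bias` (that's $b$) and compares the two with `torch.allclose()`.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

print(linear.weight.shape)  # torch.Size([6, 2]) -> (out_features, in_features)
print(linear.bias.shape)    # torch.Size([6])

# y = x @ A^T + b, computed by hand with the layer's own parameters
manual_output = x @ linear.weight.T + linear.bias
print(torch.allclose(manual_output, linear(x)))  # True
```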


### **Finding Aggregations**

Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values). First we'll create a tensor and then find its max, min, mean and sum. Some methods, such as `torch.mean()`, require tensors to be in `torch.float32` (the most common) or another floating-point datatype, otherwise the operation will fail. Each of these is available both as a tensor method (e.g. `x.min()`) and as a `torch` function (e.g. `torch.min(x)`).

In [146]:
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [147]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


In [148]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
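Two follow-ups, sketched below: what actually happens if we call `mean()` on an integer tensor (a `RuntimeError`; the message wording may differ by version), and how the `dim` argument aggregates along one axis of a multi-dimensional tensor instead of over every element (`grid` is an illustrative example).

```python
import torch

x = torch.arange(0, 100, 10)  # int64

# mean() needs a floating-point dtype, so calling it on an integer tensor fails
mean_failed = False
try:
    x.mean()
except RuntimeError as e:
    mean_failed = True
    print(f"RuntimeError: {e}")

# On tensors with more than one dimension, `dim` picks the axis to aggregate over
grid = torch.tensor([[1., 2., 3.],
                     [4., 5., 6.]])
print(grid.sum(dim=0))   # collapses the rows: tensor([5., 7., 9.])
print(grid.mean(dim=1))  # collapses the columns: tensor([2., 5.])
```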

### **Positional Min/Max**

We can also find the index of a tensor where the max or minimum occurs with [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html) and [`torch.argmin()`](https://pytorch.org/docs/stable/generated/torch.argmin.html) respectively. This is helpful in case we just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)).

In [149]:
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
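On a 2-D tensor, such as a batch of per-class scores (the `scores` values below are made up), we usually want the position of the maximum in each row, which is what `dim=1` gives. `torch.max()` with `dim` returns both the values and their indices, and without `dim`, `argmax()` indexes into the flattened tensor.

```python
import torch

scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.6, 0.3, 0.1]])

# Index of the largest value in each row
print(scores.argmax(dim=1))  # tensor([1, 0])

# torch.max() with `dim` returns both the values and their indices
values, indices = torch.max(scores, dim=1)
print(values, indices)

# Without `dim`, argmax() indexes into the flattened tensor
print(scores.argmax())  # tensor(1)
```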


### **Changing Datatype**

As mentioned, a common issue with deep learning operations is having our tensors in different datatypes. If one tensor is in `torch.float64` and another is in `torch.float32`, we might run into some errors. But there's a fix: we can change the datatype of a tensor using [`torch.Tensor.type(dtype=None)`](https://pytorch.org/docs/stable/generated/torch.Tensor.type.html), where the `dtype` parameter is the datatype we'd like to use. First we'll create a tensor and check its datatype (the default is `torch.float32`). The lower the number (e.g. $32, 16, 8$), the less precisely a computer stores the value. Using less storage generally results in faster computation and a smaller overall model. Mobile-based neural networks often operate with $8$-bit integers, which are smaller and faster to run but less accurate than their `float32` counterparts. For more on this, read up about [precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)).

In [150]:
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [151]:
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [152]:
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
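The values above converted cleanly because they're whole numbers in range. A sketch of what lower precision costs in less tidy cases, using `Tensor.to(dtype)` (an alternative to `.type()`): converting floats to integers truncates toward zero, and `float16` cannot represent every integer above $2048$ exactly, so $2049$ is stored as $2048$.

```python
import torch

tensor = torch.tensor([2.7, -2.7, 100.5])

# .to(dtype) is an alternative to .type(dtype); float -> int truncates toward zero
as_int8 = tensor.to(torch.int8)
print(as_int8)  # values 2, -2, 100

# float16 has too few bits to store 2049 exactly; it rounds to 2048
half = torch.tensor(2049.0, dtype=torch.float16)
print(half)
```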

### **Reshaping, Stacking, Squeezing and Unsqueezing**

Often we'll want to reshape or change the dimensions of our tensors without actually changing the values inside them. Deep learning models (neural networks) are all about manipulating tensors in some way, and because of the rules of matrix multiplication, if we've got shape mismatches, we'll run into errors. These methods help us make sure the right elements of our tensors are mixing with the right elements of other tensors. Some popular methods are:

| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`); all `tensors` must be the same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions of size `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension of size `1` added at `dim`. |
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. |

In [153]:
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [154]:
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [155]:
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [156]:
z[:, 0] = 5 # z is a view of x, so changing z also changes x
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

In [157]:
x_stacked = torch.stack([x, x, x, x], dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
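The `dim` argument controls where `torch.stack()` puts the new dimension. For comparison, the sketch also uses `torch.cat()` (not in the table above), which joins tensors along an *existing* dimension instead of adding one.

```python
import torch

x = torch.tensor([5., 2., 3., 4., 5., 6., 7.])

# stack adds a NEW dimension; `dim` chooses where it goes
print(torch.stack([x, x, x, x], dim=0).shape)  # torch.Size([4, 7])
print(torch.stack([x, x, x, x], dim=1).shape)  # torch.Size([7, 4])

# torch.cat joins along an EXISTING dimension, so no new dimension is added
print(torch.cat([x, x, x, x], dim=0).shape)    # torch.Size([28])
```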

In [158]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])


In [159]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
Previous shape: torch.Size([7])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
New shape: torch.Size([1, 7])


In [160]:
x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0
print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])
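Because `permute()` returns a view, it shares memory with the original tensor, but its elements are no longer laid out contiguously in memory. A sketch of one practical consequence: `.view()` cannot flatten such a tensor and raises a `RuntimeError`, while `.reshape()` works because it copies the data when it needs to (`-1` asks PyTorch to infer that dimension's size).

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# permute() returns a view that shares memory with the original...
x_original[0, 0, 0] = 1234.
print(x_permuted[0, 0, 0])  # tensor(1234.)

# ...but the view is no longer contiguous, so .view() can't flatten it
print(x_permuted.is_contiguous())  # False
view_failed = False
try:
    x_permuted.view(-1)
except RuntimeError as e:
    view_failed = True
    print(f"RuntimeError: {e}")

# .reshape() copies the data when it has to; -1 means "infer this dimension"
flat = x_permuted.reshape(-1)
print(flat.shape)  # torch.Size([150528])
```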


## **Indexing**

Sometimes we'll want to select specific data from tensors (for example, only the first column or second row). To do so, we can use indexing. Indexing in PyTorch with tensors is very similar to indexing on Python lists or NumPy arrays. We can use `:` to specify "all values in this dimension" and then use a comma (`,`) to add another dimension.

In [161]:
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [162]:
print(f"First square bracket:\n{x[0]}") 
print(f"Second square bracket: {x[0][0]}") 
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


In [163]:
x[:, 0]

tensor([[1, 2, 3]])

In [164]:
x[:, :, 1]

tensor([[2, 5, 8]])

In [165]:
x[:, 1, 1]

tensor([5])

In [166]:
x[0, 0, :]

tensor([1, 2, 3])

In [167]:
x[0, :, 0]

tensor([1, 4, 7])

In [168]:
x[0, :, 1]

tensor([2, 5, 8])
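A few more indexing patterns that work the same way as with Python lists or NumPy arrays, sketched on the same `x`: negative indices count from the end, `::2` takes every second element, and a boolean condition selects the matching elements (returned as a 1-D tensor).

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

print(x[0, -1, -1])  # negative indices count from the end: tensor(9)
print(x[0, ::2, :])  # every second row: rows 0 and 2

# Boolean indexing returns the elements where the condition is True
print(x[x > 5])      # tensor([6, 7, 8, 9])
```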

## **PyTorch Tensors & NumPy**

Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely. The two main methods we'll want to use for NumPy to PyTorch (and back again) are: 
- [`torch.from_numpy(ndarray)`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html): NumPy array $\implies$ PyTorch tensor. 
- [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html): PyTorch tensor $\implies$ NumPy array.

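By default, NumPy creates floating-point arrays with the datatype `float64`, and if we convert one to a PyTorch tensor with `torch.from_numpy()`, the tensor keeps the same datatype (as the first cell below shows). However, many PyTorch calculations default to using `float32`. So if we want to convert our NumPy array (`float64`) to a PyTorch tensor (`float32`), we can use `tensor = torch.from_numpy(array).type(torch.float32)`. Note that `torch.from_numpy()` and `Tensor.numpy()` share memory between the array and the tensor. In the cells below, the array and tensor stay independent only because `array = array + 1` and `tensor = tensor + 1` create *new* objects and reassign the variable names; an in-place operation such as `array += 1` would change both.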

In [169]:
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [170]:
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [171]:
tensor = torch.ones(7) 
numpy_tensor = tensor.numpy() 
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [172]:
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
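Here is a sketch of the shared-memory behaviour with in-place operations instead of reassignment: changing the array in place changes the tensor, and vice versa. `torch.tensor()` always copies its input, so it is one way to get an independent tensor (the names `ones_array` and `independent` are illustrative).

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)  # shares memory with `array`

# An in-place NumPy operation changes the tensor too
array += 1
print(tensor)  # values 2. to 8., dtype=torch.float64

# The same goes the other way with Tensor.numpy()
ones = torch.ones(7)
ones_array = ones.numpy()
ones.add_(1)
print(ones_array)  # all 2.

# torch.tensor() copies, so this tensor is unaffected by later changes to `array`
independent = torch.tensor(array)
array[0] = 100.
print(independent[0], tensor[0])  # 2. vs 100.
```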

## **Reproducibility** 

As we learn more about neural networks and machine learning, we'll start to discover how big a part randomness plays. Well, pseudorandomness that is. Because after all, as they're designed, computers are fundamentally deterministic (each step is predictable), so the randomness they create is simulated randomness (though there is debate on this too). How does this relate to neural networks and deep learning then? We've discussed that neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:  `Start with random numbers` $\implies$ `Tensor operations` $\implies$ `Try to make better (again and again and again).` Although randomness is nice and powerful, sometimes we'd like there to be a little less randomness. Why? So we can perform repeatable experiments. For example, we create an algorithm capable of achieving X performance. And then our friend tries it out to verify we're not crazy. How could they do such a thing? That's where **reproducibility** comes in. In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

In [173]:
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)
print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])

Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

Just as we might've expected, the tensors come out with different values. But what if we wanted to create two random tensors with the *same* values? As in, the tensors would still contain random values, but they would be of the same flavour. That's where [`torch.manual_seed(seed)`](https://pytorch.org/docs/stable/generated/torch.manual_seed.html) comes in, where `seed` is an integer (like `42`, but it could be anything) that flavours the randomness.

In [174]:
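RANDOM_SEED = 42
torch.manual_seed(seed=RANDOM_SEED) # set the seed of the global random number generator
random_tensor_C = torch.rand(3, 4)
torch.random.manual_seed(seed=RANDOM_SEED) # reset the seed, otherwise D would continue C's random sequence
random_tensor_D = torch.rand(3, 4)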

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
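Note that `torch.manual_seed()` resets the global random number generator, which is why the cell above calls it again before creating `random_tensor_D`. Seeding once and then calling `torch.rand()` twice gives two *different* tensors, because the generator's state advances after every draw. The minimal sketch below shows both behaviours; its last part uses local [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) objects (passed through the `generator=` argument of `torch.rand()`) as an alternative that keeps results reproducible without touching the global seed. The variable names are illustrative only.

```python
import torch

RANDOM_SEED = 42

# Seed once, then draw twice: the generator state advances between draws
torch.manual_seed(RANDOM_SEED)
first_draw = torch.rand(3, 4)
second_draw = torch.rand(3, 4)
print(torch.equal(first_draw, second_draw))  # False

# Re-seeding before a draw reproduces the first draw's values
torch.manual_seed(RANDOM_SEED)
reseeded_draw = torch.rand(3, 4)
print(torch.equal(first_draw, reseeded_draw))  # True

# Alternative: two local generators with the same seed, independent of the global state
gen_a = torch.Generator().manual_seed(RANDOM_SEED)
gen_b = torch.Generator().manual_seed(RANDOM_SEED)
local_a = torch.rand(3, 4, generator=gen_a)
local_b = torch.rand(3, 4, generator=gen_b)
print(torch.equal(local_a, local_b))  # True
```

Local generators are handy when one part of a program needs repeatable randomness but we don't want to reset the randomness everything else relies on.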

## **Running Tensors on GPUs** 

Deep learning algorithms require a lot of numerical operations, and by default these operations are often done on a CPU (central processing unit). However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster than a CPU at the specific types of operations neural networks need (such as matrix multiplications). If our computer has one, we should use it to train neural networks whenever we can, because chances are it'll speed up training dramatically. There are two steps: first, getting access to a GPU, and second, getting PyTorch to use it.

**Note:** When we reference "GPU" throughout this course, we're referencing an [Nvidia GPU with CUDA](https://developer.nvidia.com/cuda-gpus) enabled (CUDA is a computing platform and API that allows GPUs to be used for general-purpose computing, not just graphics) unless otherwise specified. The outputs in this notebook come from a Mac, where PyTorch uses Apple's `mps` (Metal Performance Shaders) backend instead of CUDA.

### 1. **Getting a GPU**

| **Method** | **Difficulty to set up** | **Pros** | **Cons** | **How to set up** |
| ----- | ----- | ----- | ----- | ----- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easily as sharing a link | Doesn't save your data outputs, limited compute, subject to timeouts | [Follow the Google Colab Guide](https://colab.research.google.com/notebooks/gpu.ipynb) |
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free and require an upfront cost | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/locally/) |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to set up correctly | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/cloud-partners/) |

### 2. **Getting PyTorch to run on the GPU**

Once we've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors). To do so, we can use the [`torch.cuda`](https://pytorch.org/docs/stable/cuda.html) package (Nvidia GPUs) or the [`torch.mps`](https://docs.pytorch.org/docs/stable/mps.html) package (Mac GPUs).

In [175]:
torch.cuda.is_available()

False

In [176]:
torch.mps.is_available()

True

If either of the above outputs `True`, PyTorch can see and use a GPU. If both output `False`, PyTorch can't see a GPU, and (if the machine does have one) we'll have to go back through the installation steps. Now, let's say we wanted to set up our code so it runs on the GPU if one is available and falls back to the CPU otherwise. That way, if we or anyone else runs our code, it'll work regardless of the computing device being used. Let's create a `device` variable to store what kind of device is available.

In [177]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.mps.is_available():
    device = "mps"
else:
    device = "cpu"
device

'mps'
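The `if`/`elif` chain above can also be wrapped in a small reusable function. The sketch below assumes a hypothetical helper name, `pick_device`, returns a [`torch.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) object rather than a plain string, and uses `torch.backends.mps.is_available()` (the MPS availability check documented under `torch.backends`) for the Mac case. It's one way to organise device-agnostic code, not the only one.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(device)

# A torch.device works anywhere a device string is accepted, e.g. the device= argument
x = torch.ones(2, 2, device=device)
print(x.device.type == device.type)  # True
```

Returning a `torch.device` instead of a string means `device.type` and `device.index` are available for later checks, while `.to(device)` and `device=` arguments accept it just like `"cuda"` or `"mps"`.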

In [178]:
torch.mps.device_count()

1

### 3. **Putting Tensors (and models) on the GPU**

We can put tensors (and models, we'll see this later) on a specific device by calling [`to(device)`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html) on them, where `device` is the target device we'd like the tensor (or model) to go to. Why do this? GPUs can perform the kinds of numerical computations neural networks need (such as matrix multiplications) far faster than CPUs, and if a GPU isn't available, our **device-agnostic code** (see above) means everything will still run on the CPU.

**Note:** Putting a tensor on the GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor, i.e. the original stays on the CPU and the copy lives on the GPU. To keep only the moved version, reassign it: `some_tensor = some_tensor.to(device)`.
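One subtlety about the note above: according to the [`Tensor.to()` documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html), if the tensor already has the requested device and dtype, `to()` returns the tensor itself rather than a copy. The sketch below runs on the CPU only (so it works anywhere) and uses a dtype change as a stand-in for a device move, to show that a conversion creates a new tensor and leaves the original unchanged until we reassign the name.

```python
import torch

cpu_tensor = torch.tensor([1, 2, 3])

# Already on the requested device (and dtype): .to() returns the very same object
same_tensor = cpu_tensor.to("cpu")
print(same_tensor is cpu_tensor)  # True

# A real conversion creates a new tensor; the original keeps its dtype
float_tensor = cpu_tensor.to(torch.float32)
print(float_tensor is cpu_tensor, cpu_tensor.dtype)  # False torch.int64

# Reassign the name to keep only the converted version
cpu_tensor = cpu_tensor.to(torch.float32)
print(cpu_tensor.dtype)  # torch.float32
```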

In [179]:
tensor = torch.tensor([1, 2, 3])
print(tensor, tensor.device)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='mps:0')

The second tensor has `device='mps:0'` (or `device='cuda:0'` on an Nvidia GPU), which means it's stored on the 0th available GPU. GPUs are zero-indexed: if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, and so on up to `'cuda:n'`.
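Device strings follow a `type:index` pattern. A small sketch using [`torch.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) to split such strings into their `type` and `index` attributes; constructing a `torch.device` only parses the string, so this runs even on machines without the listed hardware. The device names are examples, not devices this notebook assumes exist.

```python
import torch

# Parse a few device strings; no GPU is needed just to build the objects
for name in ["cpu", "cuda:0", "cuda:1", "mps:0"]:
    dev = torch.device(name)
    print(name, "->", dev.type, dev.index)
# cpu -> cpu None
# cuda:0 -> cuda 0
# cuda:1 -> cuda 1
# mps:0 -> mps 0
```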

### 4. **Moving tensors back to the CPU**

What if we wanted to move the tensor back to the CPU? For example, we'll want to do this if we want to use our tensors with NumPy (NumPy doesn't use the GPU). Let's try using the [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) method on our `tensor_on_gpu`.

In [180]:
try:
    tensor_on_gpu.numpy()  # raises a TypeError when the tensor lives on a GPU
except TypeError as e:
    print(e)

can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.


Instead, to get a tensor back to the CPU and usable with NumPy, we can use [`Tensor.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html). This copies the tensor to CPU memory so NumPy can work with it.

In [181]:
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

In [182]:
tensor_on_gpu

tensor([1, 2, 3], device='mps:0')
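As the output shows, `tensor_on_gpu` is still on the GPU: `.cpu()` returned a CPU copy and left the original untouched. A device-agnostic way to get a NumPy array is to chain `.detach().cpu().numpy()`; `.detach()` only matters for tensors that track gradients, and on a CPU tensor `.cpu()` returns the tensor itself. One consequence worth knowing is that `Tensor.numpy()` on a CPU tensor *shares memory* with it, so changing one changes the other, and `torch.from_numpy()` shares memory in the opposite direction. A minimal sketch on the CPU, with illustrative variable names:

```python
import numpy as np
import torch

cpu_tensor = torch.tensor([1, 2, 3])

# Works for CPU or GPU tensors; for a CPU tensor no copy is made
array = cpu_tensor.detach().cpu().numpy()

# The array and the tensor share memory: editing one changes the other
array[0] = 100
print(cpu_tensor.tolist())  # [100, 2, 3]

# torch.from_numpy() also shares memory, in the NumPy -> PyTorch direction
np_array = np.array([4.0, 5.0, 6.0])
from_np = torch.from_numpy(np_array)
np_array[1] = -1.0
print(from_np.tolist())  # [4.0, -1.0, 6.0]
```

If an independent copy is needed instead, calling `.copy()` on the NumPy array (or `.clone()` on the tensor) breaks the link.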

## **Exercises**

All of the exercises are focused on practicing the code above. You should be able to complete them by referencing each section or by following the resource(s) linked.

**Resources:**

* [Exercise template notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/00_pytorch_fundamentals_exercises.ipynb).
* [Example solutions notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/00_pytorch_fundamentals_exercise_solutions.ipynb) (try the exercises *before* looking at this).

1. Documentation reading: a big part of deep learning (and learning to code in general) is getting familiar with the documentation of the framework you're using. We'll be using the PyTorch documentation a lot throughout the rest of this course, so I'd recommend spending $10$ minutes reading the following (it's okay if you don't get some things for now; the focus is not yet full understanding, it's awareness). See the documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor) and for [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics).
2. Create a random tensor with shape `(7, 7)`.
3. Perform a matrix multiplication on the tensor from $2$ with another random tensor with shape `(1, 7)` (hint: you may have to transpose the second tensor).
4. Set the random seed to `0` and do exercises $2$ & $3$ over again.
5. Speaking of random seeds, we saw how to set it with `torch.manual_seed()` but is there a GPU equivalent? (hint: you'll need to look into the documentation for `torch.cuda` for this one). If there is, set the GPU random seed to `1234`.
6. Create two random tensors of shape `(2, 3)` and send them both to the GPU (you'll need access to a GPU for this). Set `torch.manual_seed(1234)` when creating the tensors (this doesn't have to be the GPU random seed).
7. Perform a matrix multiplication on the tensors you created in $6$ (again, you may have to adjust the shapes of one of the tensors).
8. Find the maximum and minimum values of the output of $7$.
9. Find the maximum and minimum index values of the output of $7$.
10. Make a random tensor with shape `(1, 1, 1, 10)` and then create a new tensor with all the `1` dimensions removed, leaving a tensor of shape `(10,)`. Set the seed to `7` when you create it, and print out the first tensor and its shape as well as the second tensor and its shape.

## **Extra-curriculum**

- Spend an hour going through the [PyTorch basics tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html) (I'd recommend the [Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) and [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html) sections).
- To learn more on how a tensor can represent data, see this video: [What's a tensor?](https://youtu.be/f5liqUk0ZTw)

---