# 00. Tensors

This Jupyter Notebook is authored by [awdev](https://github.com/AWeirdScratcher) and contains summaries and partial contents from [learnpytorch.io](https://learnpytorch.io). To learn more about this notebook, see [🐙 AWeirdScratcher/models](https://github.com/AWeirdScratcher/models).

Note: Click on the first icon on the left navigation to explore the outlines of this notebook.

<br />

[![Learn PyTorch](https://img.shields.io/badge/Learn-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white)](https://www.learnpytorch.io/00_pytorch_fundamentals/)
[![Ramptix](https://img.shields.io/badge/%E2%AC%9C%20%E2%94%82%20Ramptix-%23202020?style=for-the-badge)](https://github.com/ramptix)

In [1]:
import torch
from IPython.display import HTML, display

device = "cuda" if torch.cuda.is_available() else "cpu"

## Basics
Tensors are one of the core building blocks of PyTorch, and they come in several kinds, including:

- Scalar
- Vector
- Matrix
- Tensor

Let's go through them one by one.

<br /><hr /><br />

**Scalar**: A single number; in "tensor-speak," it's a zero-dimensional tensor.

- Dimensions: `0`
- Example usage: `a` (lower)

An example of a scalar would be like:

```python
a = torch.tensor(69)
```

In [2]:
# Example scalar
scalar = torch.tensor(4.2)

print(scalar)
print("shape:", scalar.shape)
print("dimensions:", scalar.ndim)

tensor(4.2000)
shape: torch.Size([])
dimensions: 0


**Vector**: A single-dimensional tensor that can contain many numbers.
- Dimensions: `1`
- Example usage: `y` (lower)

An example of a vector would be like:

```python
y = torch.tensor([1, 2, 3, 4, ...])
```

There's only one dimension, sort of like a 1d-line.

In [3]:
vector = torch.tensor([6, 9, 4, 2, 0])

print(vector)
print("shape:", vector.shape)
print("dimensions:", vector.ndim)

tensor([6, 9, 4, 2, 0])
shape: torch.Size([5])
dimensions: 1


> **Counting Dimensions**
>
> To count the number of dimensions, check how many open square brackets (`[`) there are at the beginning of the tensor's `repr`. For example, a tensor like
> ```python
> tensor([1, 2, 3])
> ```
> ...would have 1 dimension, because there is one open square bracket.
>
> (Summarization)
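
The bracket-counting rule can be sketched in plain Python. The helper name `count_leading_brackets` is made up for this illustration, and it only inspects the printed text, not a real tensor.

```python
def count_leading_brackets(text):
    """Count the '[' characters that open a tensor-style repr."""
    # Skip the "tensor(" prefix that PyTorch prints before the data
    if text.startswith("tensor("):
        text = text[len("tensor("):]
    count = 0
    for character in text:
        if character != "[":
            break
        count += 1
    return count


print(count_leading_brackets("tensor([1, 2, 3])"))           # 1
print(count_leading_brackets("tensor([[[1, 2], [3, 4]]])"))  # 3
```

For a real tensor, the `ndim` attribute reports the same number without any string handling.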

In [4]:
# @title Exercise { display-mode: "form"}
# @markdown How many dimensions:
# @markdown ```python
# @markdown tensor([[1, 9], [8, 9]])
# @markdown ```
answer = -1 # @param { type: "number" }

In [5]:
# @title Exercise: Check Answer { display-mode: "form" }
# @markdown Run this cell to check your answer.
if answer == -1:
  print("Please enter your answer.")
elif answer == 2:
  print("Correct!")
else:
  print("Incorrect")

Please enter your answer.


**Matrix**: Has two dimensions (can contain "sub-lists" in the concept of vanilla Python).

- Dimensions: `2`
- Example usage: `Q` (upper)

An example of a matrix would be like:

```python
Q = torch.tensor([
  [1, 9, 8, 9],
  [0, 6, 0, 4]
])
```

In [6]:
MATRIX = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
])

MATRIX

tensor([[1, 2, 3],
        [4, 5, 6]])

In [7]:
# Check its shape & number of dimensions
MATRIX.shape, MATRIX.ndim

(torch.Size([2, 3]), 2)

The shape implies that the first dimension has `2` elements, and the second one has `3` elements.

Alternatively, you can think of the shape `torch.Size([2, 3])` as a 2x3 grid (`2` in height, `3` in width).

**Tensor**: An $n$-dimensional array of numbers.

- Dimensions: $n$
- Example usage: `X` (upper)

An example of an $n$-dimensional tensor would be:

```python
X = torch.tensor([
  [
    [0, 1, 2],
    [3, 4, 5]
  ]
])
```

In [8]:
TENSOR = torch.tensor([
    [
        [1, 9, 8, 9],
        [0, 6, 0, 4]
    ]
])

TENSOR

tensor([[[1, 9, 8, 9],
         [0, 6, 0, 4]]])

In [9]:
TENSOR.shape, TENSOR.ndim

(torch.Size([1, 2, 4]), 3)

This implies that there are three dimensions.

## Tensors on GPU

You can store tensors on GPU as well.

In [10]:
if device != 'cuda':
  print("You don't have CUDA.")
else:
  # put a tensor on cuda gpu
  a = torch.tensor([1, 2, 3], device="cuda")
  print(a)

tensor([1, 2, 3], device='cuda:0')


If you have an existing tensor to put on CUDA, use `tensor.to(device)`.

In [11]:
if device != 'cuda':
  print("You don't have CUDA.")
else:
  a = torch.tensor([1, 2, 3])
  a = a.to("cuda")
  print(a)

tensor([1, 2, 3], device='cuda:0')


In [12]:
# Or, create the tensor on the CPU directly
a = torch.tensor([1, 2, 3], device='cpu')
# For an existing tensor, use: a = a.to('cpu')

print(a)

tensor([1, 2, 3])
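
A minimal device-agnostic sketch that runs with or without a GPU. The variable names `b` and `back` are illustrative only. The point it shows is that `.to()` returns a tensor on the requested device and leaves the variable it was called on untouched.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.tensor([1, 2, 3])  # created on the CPU by default
b = a.to(device)             # a tensor on `device`; `a` itself is unchanged
back = b.to("cpu")           # bring the data back to (or keep it on) the CPU

print(a.device, b.device, back.device)
```

Because `.to()` does not modify `a`, the usual pattern is to reassign: `a = a.to(device)`.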


## GPU Device

In [13]:
!nvidia-smi

Mon Jan 29 10:41:33 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P0              26W /  70W |    105MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

## Random Tensors

A machine learning model often starts out with large random tensors of numbers and adjusts them as it works through data to better represent it.

In essence:

```markdown
1. Start with random numbers
2. Look at data
3. Update random numbers
4. Look at data
5. Update random numbers
...continue
```

<br />

<hr />

<br />

To create a random tensor in PyTorch, use the `torch.rand(size)` function, where `size` is the shape of the tensor. The values are drawn uniformly from the interval $[0, 1)$.

In [14]:
# Create a random tensor
tensor_a = torch.rand(3, 5) # create a 3x5
tensor_a

tensor([[0.1796, 0.7897, 0.8344, 0.9858, 0.3416],
        [0.6639, 0.2644, 0.1054, 0.4837, 0.2422],
        [0.7584, 0.2750, 0.1260, 0.2989, 0.8472]])

There's also `torch.randint(low=0, high, size)` for creating tensors filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive).

In [15]:
# Generate ten integers from one to four in a vector (`high=5` is exclusive)
torch.randint(1, 5, size=(10,))

tensor([4, 1, 3, 4, 3, 1, 2, 2, 1, 1])
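
A quick check of the bounds, as a sketch: with `low=1` and `high=5`, the value `5` itself should never appear. The sample size of 1000 is an arbitrary choice.

```python
import torch

# low=1 is inclusive, high=5 is exclusive
samples = torch.randint(1, 5, size=(1000,))

print("smallest value:", samples.min().item())
print("largest value:", samples.max().item())
print("distinct values:", sorted(samples.unique().tolist()))
```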

### Manual Seed

To make the result more reproducible, use `torch.manual_seed(seed)`.

> <sub><b><a href="https://pytorch.org/docs/stable/generated/torch.manual_seed.html#torch-manual-seed" target="_blank">TORCH.MANUAL_SEED</a></b></sub>
>
> **`torch.manual_seed(seed: int)`** [\[SOURCE\]](https://pytorch.org/docs/stable/_modules/torch/random.html#manual_seed)
>
> Sets the seed for generating random numbers.
> - seed (<a href="https://docs.python.org/3/library/functions.html#int"><i>int</i></a>) - The desired seed.

In [16]:
torch.manual_seed(42)

<torch._C.Generator at 0x7fe03878d5b0>

Now create a random tensor. The first call after seeding should give:

```python
tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009]])
```

In [17]:
random_tensor = torch.rand(2, 3)
random_tensor

tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009]])
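
The seed only fixes where the generator starts. Each call to `torch.rand` advances it, so a second call gives different numbers unless the seed is set again. A small sketch (the names `first`, `second` and `again` are illustrative):

```python
import torch

torch.manual_seed(42)
first = torch.rand(2, 3)
second = torch.rand(2, 3)  # the generator has advanced, so these values differ

torch.manual_seed(42)      # reset the generator to the same starting point
again = torch.rand(2, 3)

print("first == again: ", torch.equal(first, again))
print("first == second:", torch.equal(first, second))
```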

### Manual Seed for CUDA

To set a seed for CUDA (GPU), use `torch.cuda.manual_seed(seed)`.

In [18]:
if device == 'cuda':
  torch.cuda.manual_seed(42)
  cuda_random_tensor = torch.rand(2, 3, device=device)
  print(cuda_random_tensor)
else:
  print("cuda not available, got cpu.")

tensor([[0.6130, 0.0101, 0.3984],
        [0.0403, 0.1563, 0.4825]], device='cuda:0')


## Zeros & Ones

Create an $n$-dimensional tensor filled with zeros or ones with a given `size` (desired output shape) using `torch.zeros` or `torch.ones`.

In [19]:
zeros = torch.zeros(3, 4)
ones = torch.ones(3, 4)

print(zeros)
print(ones)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


## Zeros-like & Ones-like

Sometimes you might want to create a tensor filled with zeros or ones based on the shape of another tensor.

You can achieve this by using `torch.zeros_like(input)` or `torch.ones_like(input)`.

In [20]:
# Create a random tensor (5x5)
x = torch.rand(5, 5)

zeros = torch.zeros_like(x)
ones = torch.ones_like(x)

print(zeros)
print(ones)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])


In [21]:
# @title Equivalent (using `x.shape`)
shape = x.shape

zeros = torch.zeros(shape)
ones = torch.ones(shape)

print(zeros)
print(ones)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])


## Filling Tensors

Sometimes you might want to fill the whole tensor with a number.

You can achieve this using `torch.fill()` or `x.fill_()`.

In [22]:
# Create a random matrix (3x3)
x = torch.rand(3, 3)
x

tensor([[0.0753, 0.8860, 0.5832],
        [0.3376, 0.8090, 0.5779],
        [0.9040, 0.5547, 0.3423]])

In [23]:
# Fill the tensor with 7
torch.fill(x, 7)

tensor([[7., 7., 7.],
        [7., 7., 7.],
        [7., 7., 7.]])

In [24]:
# Or use tensor.fill_()
x.fill_(7)

tensor([[7., 7., 7.],
        [7., 7., 7.],
        [7., 7., 7.]])

You can also fill the tensor with zeros in one move!

In [25]:
x.zero_()

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

> **Warning**
>
> Using `tensor.fill_()` or `tensor.zero_()` will *directly* affect the original tensor. If you don't want that to happen, just use `torch.fill()` or [`torch.zeros_like()`](#scrollTo=Zeros_like_Ones_like).

In [26]:
tensor = torch.rand(2, 3)

In [27]:
# torch.fill()
print("torch.fill()")
print(torch.fill(tensor, 7))
print("tensor:")
print(tensor)

torch.fill()
tensor([[7., 7., 7.],
        [7., 7., 7.]])
tensor:
tensor([[0.6343, 0.3644, 0.7104],
        [0.9464, 0.7890, 0.2814]])


In [28]:
# torch.zeros_like()
print("torch.zeros_like()")
print(torch.zeros_like(tensor))
print("tensor:")
print(tensor)

torch.zeros_like()
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor:
tensor([[0.6343, 0.3644, 0.7104],
        [0.9464, 0.7890, 0.2814]])


## Range

Use `torch.arange(start, end, step)` to create a tensor of a range of numbers like 0 to 100.

> <sub><b><a href="https://pytorch.org/docs/stable/generated/torch.arange.html#torch-arange" target="_blank">TORCH.ARANGE</a></b></sub>
>
> `torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)`
>
> Returns a 1-D tensor of size $\lceil\frac{end-start}{step}\rceil$ with values from the interval `[start, end)` taken with common difference `step` beginning from `start`.
> - start (Number) - the starting value for the set of points. Default: `0`.
> - end (Number) - the ending value for the set of points
> - step (Number) - the gap between each pair of adjacent points. Default: `1`.

In [29]:
range_ = torch.arange(start=0, end=20, step=1)
range_

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19])

In [30]:
# Try changing the step
range_ = torch.arange(start=0, end=20, step=5)
range_

tensor([ 0,  5, 10, 15])
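
The size formula from the documentation can be checked by hand. The sketch below uses plain Python, where the built-in `range` follows the same half-open convention for integers; `arange_size` is a made-up helper name.

```python
import math


def arange_size(start, end, step):
    """Number of values in the half-open interval [start, end) for a given step."""
    return math.ceil((end - start) / step)


print(arange_size(0, 20, 1))  # 20
print(arange_size(0, 20, 5))  # 4 -> 0, 5, 10, 15 (20 itself is excluded)
print(arange_size(0, 21, 5))  # 5 -> 0, 5, 10, 15, 20
```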

## Tensor Datatypes

The most common datatype in PyTorch (which is the default) is `torch.float32` (also known as `torch.float`). This is also referred to as "32-bit floating point."

Additionally, there's also a 16-bit floating point (`torch.float16` aka. `torch.half`) and a 64-bit floating point (`torch.float64` aka. `torch.double`).

These datatypes differ in **precision**, which is the amount of detail used to describe a number. The more bits a datatype has, the more detail (and hence data) is used to describe each number. However, higher precision usually costs more memory and computation time; lower precision is generally faster but might sacrifice the model's accuracy.
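
To make "precision" concrete, here is a small sketch that stores the same number, `1 / 3`, in each floating point type and measures how far the stored value is from the original. The choice of `1 / 3` and the dictionary name `errors` are assumptions for illustration.

```python
import torch

value = 1 / 3  # cannot be represented exactly in binary floating point
errors = {}

for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(value, dtype=dtype)
    errors[dtype] = abs(t.item() - value)  # rounding error after storing
    print(dtype, "| bytes per element:", t.element_size(),
          "| rounding error:", errors[dtype])
```

Fewer bits mean less memory per element but a larger rounding error.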

Let's create a tensor with a specific datatype (`float64`).

In [31]:
# float64 datatype
x = torch.rand(3, 3, dtype=torch.float64)
x

tensor([[0.3091, 0.0313, 0.0404],
        [0.9319, 0.1521, 0.2650],
        [0.1304, 0.2519, 0.2334]], dtype=torch.float64)

In [32]:
# Check datatype of a tensor
x.dtype

torch.float64

To change the datatype of a tensor, use `tensor.type()`.

In [33]:
# Convert a tensor to a specific datatype (say, float16)
x.type(torch.half)

tensor([[0.3091, 0.0313, 0.0404],
        [0.9321, 0.1521, 0.2651],
        [0.1305, 0.2520, 0.2334]], dtype=torch.float16)

There's also one datatype I find interesting — `torch.bool`.

It acts like a boolean in Python!

In [34]:
# Create a 5x5 tensor filled with zeros and ones randomly
x = torch.randint(0, 2, size=(5, 5))
x

tensor([[0, 1, 1, 0, 1],
        [1, 1, 1, 0, 1],
        [0, 1, 1, 1, 0],
        [1, 0, 1, 0, 1],
        [0, 0, 1, 0, 1]])

In [35]:
# Convert it into a torch.bool type
# Both x.bool() and x.to(torch.bool) work; they may just differ in run time
%%time
x.bool()

CPU times: user 44 µs, sys: 10 µs, total: 54 µs
Wall time: 58.9 µs


tensor([[False,  True,  True, False,  True],
        [ True,  True,  True, False,  True],
        [False,  True,  True,  True, False],
        [ True, False,  True, False,  True],
        [False, False,  True, False,  True]])

In [36]:
%%time
x.to(torch.bool)

CPU times: user 103 µs, sys: 22 µs, total: 125 µs
Wall time: 131 µs


tensor([[False,  True,  True, False,  True],
        [ True,  True,  True, False,  True],
        [False,  True,  True,  True, False],
        [ True, False,  True, False,  True],
        [False, False,  True, False,  True]])

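In this run, `x.bool()` finished faster than `x.to(torch.bool)` (about 59 µs vs. 131 µs of wall time). Keep in mind that a single `%%time` measurement at the microsecond scale is noisy, so the gap may vary between runs.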
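
Timings this small vary from run to run, so a steadier comparison repeats each call many times. The sketch below uses Python's standard `timeit` module; the repeat count of 10,000 is an arbitrary choice, and the numbers will differ between machines.

```python
import timeit

import torch

x = torch.randint(0, 2, size=(5, 5))
repeats = 10_000

# Total time for `repeats` calls of each conversion
bool_time = timeit.timeit(lambda: x.bool(), number=repeats)
to_time = timeit.timeit(lambda: x.to(torch.bool), number=repeats)

print(f"x.bool():         {bool_time / repeats * 1e6:.2f} µs per call")
print(f"x.to(torch.bool): {to_time / repeats * 1e6:.2f} µs per call")
```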

Neat!

## Getting Information from Tensors

Here are the most common attributes we use:

- `shape` - shape of the tensor (some operations require specific shape rules)
- `ndim` - number of dimensions
- `dtype` - the datatype in which the elements of the tensor are stored
- `device` - the device the tensor is stored on (usually `cuda` for GPU or `cpu` for Central Processing Unit)

In [37]:
# Create a random tensor
x = torch.rand(1, 3, 5)

print("shape:", x.shape)
print("dimensions:", x.ndim)
print("datatype:", x.dtype)
print("device:", x.device)

shape: torch.Size([1, 3, 5])
dimensions: 3
datatype: torch.float32
device: cpu


These common attributes come in handy when it comes to debugging/troubleshooting.

When you run into issues in PyTorch, it very often has to do with one of the attributes above. So when an error message shows up, ask yourself:

- **what** shape are my tensors?
- **how** many dimensions are there?
- **what** datatype are they?
- **where** are they stored, cpu or gpu?
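
These questions can be bundled into a small debugging helper. `describe` is a hypothetical name used for this sketch, not a PyTorch function.

```python
import torch


def describe(tensor):
    """Collect the attributes worth checking first when something breaks."""
    return {
        "shape": tuple(tensor.shape),
        "ndim": tensor.ndim,
        "dtype": tensor.dtype,
        "device": tensor.device.type,
    }


info = describe(torch.rand(1, 3, 5))
print(info)
```

Printing this for both operands of a failing operation usually shows which attribute is mismatched.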

## Manipulating Tensors

Also known as "tensor operations," these operations are the most important ones:

- Addition
- Subtraction
- Division
- Multiplication (element-wise)
- Matrix multiplication

In [38]:
tensor_a = torch.tensor([1, 2, 3])
tensor_b = torch.tensor([4, 5, 6])

### Basic Operations

Let's start with the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`, element-wise).

In [39]:
# @title Addition (`+`)
tensor_a + tensor_b

tensor([5, 7, 9])

This results in a new tensor (`[5, 7, 9]`).

```python
tensor([1, 2, 3])
tensor([4, 5, 6])
-------------------
tensor([5, 7, 9])
```

You can also use `torch.add(a, b)` to perform this action; it returns the same result.

In [40]:
# Use torch.add()
torch.add(tensor_a, tensor_b)

tensor([5, 7, 9])

In [41]:
# @title Subtraction (`-`)
tensor_a - tensor_b

tensor([-3, -3, -3])

In [42]:
# Use torch.subtract()
torch.subtract(tensor_a, tensor_b)

tensor([-3, -3, -3])

In [43]:
# @title Multiplication
tensor_a * tensor_b

tensor([ 4, 10, 18])

In [44]:
# Use torch.multiply() or torch.mul(), they're aliases
torch.multiply(tensor_a, tensor_b), torch.mul(tensor_a, tensor_b)

(tensor([ 4, 10, 18]), tensor([ 4, 10, 18]))

#### Division

[learnpytorch.io](https://learnpytorch.io) did not mention this, but we'll cover it anyway.

In [45]:
tensor_a / tensor_b

tensor([0.2500, 0.4000, 0.5000])

In [46]:
# Use torch.divide() to divide tensor_a by tensor_b
torch.divide(tensor_a, tensor_b)

tensor([0.2500, 0.4000, 0.5000])

All the basic operations we saw above ($+$, $-$, $*$, $\div$) are element-wise: each element is combined with the element at the same position in the other tensor.

Now onto something different 😏.

### Matrix Multiplication

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

The main two rules of matrix multiplication are:

1. The **inner dimensions** must match (the following are shapes):
   - ❌ `(1, 2) @ (3, 4)` - Inner dimension does not match (`2 != 3`)
   - ❌ `(2, 3) @ (2, 3)` - Inner dimension does not match (`3 != 2`)
   - ✅ `(2, 3) @ (3, 2)` - Inner dimension matches (`3 == 3`)
2. The resulting matrix has the shape of the **outer dimensions**:
   - `(2, 3) @ (3, 2)` → `(2, 2)`
   - `(6, 9) @ (9, 6)` → `(6, 6)`
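
Both rules can be written down as a tiny shape checker in plain Python. `matmul_shape` is a made-up helper for this sketch and only handles two-dimensional shapes.

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product, or raise if rule 1 fails."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_rows:  # rule 1: the inner dimensions must match
        raise ValueError(f"inner dimensions do not match: {a_cols} != {b_rows}")
    return (a_rows, b_cols)  # rule 2: the outer dimensions form the result


print(matmul_shape((2, 3), (3, 2)))  # (2, 2)
print(matmul_shape((6, 9), (9, 6)))  # (6, 6)

try:
    matmul_shape((2, 3), (2, 3))
except ValueError as error:
    print("error:", error)
```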

> **Note**
>
> `@` is Python's matrix multiplication operator.

<br /><hr /><br />

But why those two rules? Let's go through them one by one.

Let's first create two matrices:

```python
tensor([
  [1, 2, 3],
  [4, 5, 6]
])
shape: torch.Size([2, 3])

tensor([
  [1, 2],
  [3, 4],
  [5, 6]
])
shape: torch.Size([3, 2])
```

<br />

Let's take a look at the visualization of this matrix multiplication `(2, 3) @ (3, 2)`:

<img alt="Matrix Multiplication Visualized"
  src="https://github.com/AWeirdScratcher/models/assets/90096971/e0c710ef-cafa-4b8f-a70f-5ba9f27c7ab5" width="740" />

We get:

```python
tensor([[22, 28], [49, 64]])
```

Matrix multiplication, unlike the *element-wise* multiplication from the basic operations we saw earlier (which multiplies `a` by `b` element-by-element), pairs each row of the first matrix with each column of the second and sums the products. This is the "flipping" of matrices you saw in the visualization image.
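
To see where each number comes from, the same product can be worked out by hand with plain Python lists. This is a sketch for illustration, not how PyTorch computes it internally.

```python
a = [[1, 2, 3],
     [4, 5, 6]]
b = [[1, 2],
     [3, 4],
     [5, 6]]

# Entry (i, j) is row i of `a` multiplied element-by-element with
# column j of `b`, then summed
result = [
    [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
    for i in range(len(a))
]

print(result)  # [[22, 28], [49, 64]]
```

For example, the top-left entry is `1*1 + 2*3 + 3*5 = 22`.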

Let's achieve this in code with PyTorch.

> **Note**
>
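> The result (`tensor([[22, 28], [49, 64]])`) is also loosely referred to as the *dot product* of the two matrices, because each entry is the dot product of a row of the first matrix with a column of the second.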


In [47]:
# Define tensor A and B
tensor_a = torch.tensor([
  [1, 2, 3],
  [4, 5, 6]
])
tensor_b = torch.tensor([
  [1, 2],
  [3, 4],
  [5, 6]
])

In [48]:
result = tensor_a @ tensor_b
result

tensor([[22, 28],
        [49, 64]])

Now try `torch.matmul` or `torch.mm`:

In [49]:
result_matmul = torch.matmul(tensor_a, tensor_b)
result_mm = torch.mm(tensor_a, tensor_b)

print("matmul:")
print(result_matmul)
print("mm:")
print(result_mm)

matmul:
tensor([[22, 28],
        [49, 64]])
mm:
tensor([[22, 28],
        [49, 64]])


Both functions perform matrix multiplication. The `@` operator is a shorter way to write the same thing, but using it in code is not recommended.

Now, let's check if `tensor_a` and `tensor_b`'s inner dimensions match.

In [50]:
a_inner = tensor_a.shape[1] # (2x"3")
b_inner = tensor_b.shape[0] # ("3"x2)

a_inner == b_inner

True

They do match!

If we take a look at the shape of the tensor after performing matrix multiplication (named `result`), it should be `2x2`, because according to **Rule #2**:

> **RULE 2**
>
> The resulting matrix has the shape of the outer dimensions:
> - `(2, 3) @ (3, 2)` → `(2, 2)`
> - `(6, 9) @ (9, 6)` → `(6, 6)`

So in this case:

`(2, 3) @ (3, 2)` → `torch.Size([2, 2])`

Let's check whether our theory is true or not.

In [51]:
print("matrix:")
print(result)

print("shape:")
print(result.shape)

matrix:
tensor([[22, 28],
        [49, 64]])
shape:
torch.Size([2, 2])


But what if we have tensors in different shapes? Can we still perform matrix multiplication in this case?

## Dealing with Shapes

One of the most common errors you'll run into in deep learning or when manipulating (operating) tensors is "shape errors."

Let's say we want to do matrix multiplication between `tensor_p` and `tensor_q`, but both have exactly the same shape.

In [52]:
tensor_p = torch.tensor([
    [1, 2, 3],
    [4, 5, 6]
])
tensor_q = torch.tensor([
    [7, 8, 9],
    [0, 1, 2]
])

tensor_p.shape, tensor_q.shape

(torch.Size([2, 3]), torch.Size([2, 3]))

...they're both a 2x3.
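
Multiplying them as they are fails, because the inner dimensions (`3` and `2`) differ. A short sketch that catches the error; the flag name `failed` is illustrative, and the exact wording of the message may vary between PyTorch versions.

```python
import torch

tensor_p = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor_q = torch.tensor([[7, 8, 9], [0, 1, 2]])

try:
    tensor_p @ tensor_q  # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
    failed = False
except RuntimeError as error:
    failed = True
    print("RuntimeError:", error)
```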

We can make the matrix multiplication work between tensor $p$ and $q$ by making their inner dimensions match.

One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

- `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
- `tensor.T` - where `tensor` is the desired tensor to transpose.

If we take a look at the two tensors ($p$ and $q$), we need to swap the dimensions of one of them.

We'll take `tensor_p` as an example.


### torch.transpose

Since our shape is `torch.Size([2, 3])`, it means our first (index `0`) dimension has two elements, and the second (index `1`) one has three elements.

We can swap its dimensions using `torch.transpose()`.

In [53]:
transposed_p = torch.transpose(
    input=tensor_p,
    dim0=0, # first dimension
    dim1=1  # second dimension
)
transposed_p, transposed_p.shape

(tensor([[1, 4],
         [2, 5],
         [3, 6]]),
 torch.Size([3, 2]))

In [54]:
tensor_p, tensor_p.shape

(tensor([[1, 2, 3],
         [4, 5, 6]]),
 torch.Size([2, 3]))

In [55]:
# Now you can do matrix multiplication w/out errors
transposed_p @ tensor_q

tensor([[ 7, 12, 17],
        [14, 21, 28],
        [21, 30, 39]])

Comparing the transposed tensor with the original one, we can clearly see it has been sort of "flipped," as well as its shape.

Now we can do matrix multiplication between the transposed $p$ tensor (`3x2`) and the $q$ tensor (`2x3`) since their inner dimensions match.

<details>
  <summary>tensor.transpose</summary>
  <p>

You can also use `tensor.transpose()`.

```python
tensor_p.transpose(0, 1)
```

  </p>
</details>

### tensor.T, tensor.mT

There's also a simpler approach: `tensor.T`.

> **[tensor.T](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.T)**
>
> Returns a view of this tensor with its dimensions reversed.

In [56]:
transposed_p = tensor_p.T # tensor.T
transposed_p, transposed_p.shape

(tensor([[1, 4],
         [2, 5],
         [3, 6]]),
 torch.Size([3, 2]))

In [57]:
# Now do matrix multiplication
transposed_p @ tensor_q

tensor([[ 7, 12, 17],
        [14, 21, 28],
        [21, 30, 39]])

Sidenote, you can transpose the last two dimensions with `tensor.mT`.

> **[tensor.mT](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mT)**
>
> Returns a view of this tensor with the last two dimensions transposed.
> `x.mT` is equivalent to `x.transpose(-2, -1)`.

In [58]:
tensor_3d = torch.rand(2, 3, 4)
tensor_3d.shape

torch.Size([2, 3, 4])

In [59]:
tensor_3d.mT.shape

torch.Size([2, 4, 3])

### torch.permute

`tensor.mT` only transposes the last two dimensions. If you want to completely reverse the tensor's (`tensor_3d`) dimensions, use `torch.permute()` to permute (rearrange) its dimensions.

> <sub><b><a href="https://pytorch.org/docs/stable/generated/torch.permute.html#torch-permute" target="_blank">TORCH.PERMUTE</a></b></sub>
>
> `torch.permute(input, dims)` → [Tensor](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)
>
> Returns a view of the original tensor input with its dimensions permuted.
>
> - input (<a href="https://pytorch.org/docs/stable/tensors.html#torch.Tensor" target="_blank"><i>Tensor</i></a>) - the input tensor.
> - dims (<i>tuple of int</i>) - The desired ordering of dimensions.

Since there are 3 dimensions, the original indices are:

- dimension 1 (index `0`)
- dimension 2 (index `1`)
- dimension 3 (index `2`)

How do we reverse all the dimensions? You've guessed it! Just flip the dimension indices around:

- **new** dimension 1 (feed dimension 3, index `2`)
- **new** dimension 2 (feed dimension 2, index `1`)
- **new** dimension 3 (feed dimension 1, index `0`)

In [60]:
print("original shape:", tensor_3d.shape)
print("permuted shape:", torch.permute(tensor_3d, (2, 1, 0)).shape)

original shape: torch.Size([2, 3, 4])
permuted shape: torch.Size([4, 3, 2])


Working as expected! But what if we have more than three dimensions?

Let's say we have 5 dimensions. Then the code above (`(2, 1, 0)`) would raise an error, because `dims` must list every dimension of the tensor exactly once.

In [61]:
tensor_5d = torch.rand(1, 2, 3, 4, 5)

try:
  torch.permute(tensor_5d, (2, 1, 0))
except RuntimeError as error:
  # fancy
  display(
      HTML("""\
      <p style="font-family: monospace">
        <span style="color: red">RuntimeError</span>: {}
      </p>""".format(error))
  )

To fix this, we can use [`torch.arange`](#scrollTo=Range) to create a range of numbers, starting from $n-1$ (where $n$ is the number of dimensions) down to $0$, and convert it into a list.

It generates an arithmetic sequence like the following, with a common difference of $-1$ and $0$ as the last item.

$$
n-1,\ n-2,\ n-3,\ \cdots,\ 0\\
% leave a space after a backslash to create spacing
$$

> **Note**
>
> $n = 5$, which is the number of dimensions (`ndim`) the five-dimensional tensor (`tensor_5d`) has.

In [62]:
reversed_indices = (
    torch.arange((tensor_5d.ndim - 1), -1, step=-1) # n-1, n-2, n-3, ..., 0
    .tolist() # convert to list
)
reversed_indices

[4, 3, 2, 1, 0]

We first start from $n-1$ (index of the last dimension) to `0` (index of the first dimension). The step is `-1` to ensure we're going backwards, in order to fully reverse the tensor's dimensions.

Next, pass the value to the argument `dims`.

In [63]:
print("original shape:", tensor_5d.shape)
print("permuted shape:", torch.permute(tensor_5d, reversed_indices).shape)

original shape: torch.Size([1, 2, 3, 4, 5])
permuted shape: torch.Size([5, 4, 3, 2, 1])


<details>
  <summary>tensor.permute</summary>
  <p>

You can also use `tensor.permute()`.

```python
tensor_5d.permute(*reversed_indices)
```

  </p>
</details>

Note that the `torch.permute` description above in this section is not from [learnpytorch.io](https://learnpytorch.io), but rather the PyTorch documentation.

### Reshaping Tensors

In this section, we'll talk about reshaping tensors with `tensor.reshape()` and `tensor.view()`.


#### tensor.reshape

As the name suggests, `tensor.reshape(shape)` (or the equivalent `torch.reshape(input, shape)`) returns a reshaped version of the input with the given shape.

This is slightly different from `tensor.permute()`, which is used for rearranging the dimensions.

First, let's create a grid of ones.

In [64]:
# Create a bunch of ones
ones = torch.ones(4, 3)
ones

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [65]:
# Reshape to a vector
ones.reshape(12)

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [66]:
# Reshape to another matrix
ones.reshape(2, 6)

tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]])

#### tensor.view

`tensor.view()` is also a way of reshaping tensors, but it's still slightly different from `tensor.reshape`, and we'll talk about this later.

In [67]:
# Create a bunch of zeros
zeros = torch.zeros(4, 3)
zeros

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

In [68]:
zeros.view(12)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [69]:
zeros.view(2, 6)

tensor([[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])

This is quite similar to `tensor.reshape()`, but there's still a slight difference.

#### Comparison

From the documentation of `torch.reshape()` and a [StackOverflow answer](https://stackoverflow.com/a/49644300):

> <sub><b><a href="https://pytorch.org/docs/master/generated/torch.reshape.html#torch-reshape" target="_blank">TORCH.RESHAPE</a></b></sub>
>
> `torch.reshape(input, shape)` → [Tensor](https://pytorch.org/docs/master/tensors.html#torch.Tensor)
>
> Returns a tensor with the same data and number of elements as input, but with the specified shape. **When possible, the returned tensor will be a view of input. Otherwise, it will be a copy.** Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.

This means that `torch.reshape()` may return either **a copy** or **a view** of the original tensor, and you cannot count on getting one or the other. According to the developer:

> If you need a copy, use `clone()`; if you need the same storage, use `view()`. The semantics of `reshape()` are that **it may or may not share the storage and you don't know beforehand.**

In conclusion, `reshape()` may or may not create a copy, and you don't know which beforehand. So if you'd like to create a copy, just use `clone()`.
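
A small sketch of both outcomes. It assumes a transposed tensor as the non-contiguous case; comparing `data_ptr()` values is one way to tell whether two tensors start at the same memory address.

```python
import torch

x = torch.arange(6).view(2, 3)  # contiguous tensor
t = x.T                         # transposed view: same storage, non-contiguous

# Contiguous input: reshape can return a view, so the address is unchanged
same_storage = x.reshape(3, 2).data_ptr() == x.data_ptr()

# Non-contiguous input: flattening needs a new layout, so reshape copies
copied = t.reshape(6).data_ptr() != t.data_ptr()

try:
    t.view(6)  # view never copies, so it refuses instead
    view_failed = False
except RuntimeError:
    view_failed = True

print(same_storage, copied, view_failed)
```

`reshape()` silently switched from a view to a copy, while `view()` raised an error rather than copy.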

According to [learnpytorch.io](https://learnpytorch.io):

> ```python
> # Change view (keeps same data as original but changes view)
> # See more: https://stackoverflow.com/a/54507446/7900723
> z = x.view(1, 7)
> z, z.shape
> ```
>
> Remember though, changing the view of a tensor with `torch.view()` really only creates a new view of the same tensor.
>
> So changing the view changes the original tensor too.

I personally would recommend using `tensor.view()` if you want to share the same storage and `tensor.clone().view()` if not. Avoid `tensor.reshape()` to prevent unexpected results.

In [70]:
# Create a random tensor
x = torch.rand(4, 3)
x

tensor([[0.3514, 0.8087, 0.3396],
        [0.1332, 0.4118, 0.2576],
        [0.3470, 0.0240, 0.7797],
        [0.1519, 0.7513, 0.7269]])

In [71]:
# Create a view
x_view = x.view(2, 6)
x_view

tensor([[0.3514, 0.8087, 0.3396, 0.1332, 0.4118, 0.2576],
        [0.3470, 0.0240, 0.7797, 0.1519, 0.7513, 0.7269]])

In [72]:
# Create a clone (copy) and then reshape it
x_copy = x.clone().view(2, 6)
x_copy

tensor([[0.3514, 0.8087, 0.3396, 0.1332, 0.4118, 0.2576],
        [0.3470, 0.0240, 0.7797, 0.1519, 0.7513, 0.7269]])

In [73]:
# @markdown You don't need to understand what this code means.
# @markdown [PyTorch Discussion](https://discuss.pytorch.org/t/any-way-to-check-if-two-tensors-have-the-same-base/44310/2)

# Check their address
x_pointer = x.data_ptr()
xv_pointer = x_view.data_ptr()
xc_pointer = x_copy.data_ptr()

# Check if they're in the same storage as x
print("x & x_view share the same storage?", x_pointer == xv_pointer)
print("x & x_copy share the same storage?", x_pointer == xc_pointer)

x & x_view share the same storage? True
x & x_copy share the same storage? False


As expected, `x_view` shares its storage with `x`, while `x_copy` does not. Now let's try changing the value of `x` and see if it affects `x_view` and `x_copy`.

In [74]:
# Changing the value of x affects x_view, too
x.fill_(69)
print("x:")
print(x)

print("\nx_view:")
print(x_view)
print("\nx_copy:")
print(x_copy)

x:
tensor([[69., 69., 69.],
        [69., 69., 69.],
        [69., 69., 69.],
        [69., 69., 69.]])

x_view:
tensor([[69., 69., 69., 69., 69., 69.],
        [69., 69., 69., 69., 69., 69.]])

x_copy:
tensor([[0.3514, 0.8087, 0.3396, 0.1332, 0.4118, 0.2576],
        [0.3470, 0.0240, 0.7797, 0.1519, 0.7513, 0.7269]])


`x_copy` stays as it was and is not affected by the changes to `x` at all.

In conclusion, `x.view()` is kind of like saying, "I want to look at the same data in a different shape," while `x.clone().view()` is like saying, "give me a complete copy of it and view it this way so I won't mistakenly change the original tensor."

**TL;DR:** `tensor.reshape()` is not recommended.

### torch.stack

You can stack tensors on top of each other with `torch.stack(tensors, dim=0)`.

In [75]:
# Create a vector
x = torch.arange(0, 51, step=5)
x

tensor([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [76]:
torch.stack([x, x, x])

tensor([[ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
        [ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
        [ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50]])

You can also specify the index at which the new dimension is inserted.

In [77]:
# Create a tensor with three dimensions
Q = torch.rand(1, 2, 3)
Q

tensor([[[0.8572, 0.1165, 0.8596],
         [0.2636, 0.6855, 0.9696]]])

In [78]:
# Stack them along the second dimension (index `1`)
torch.stack([Q, Q], dim=1)

tensor([[[[0.8572, 0.1165, 0.8596],
          [0.2636, 0.6855, 0.9696]],

         [[0.8572, 0.1165, 0.8596],
          [0.2636, 0.6855, 0.9696]]]])
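To see how `dim` controls where the new dimension lands, here is a minimal sketch that stacks two copies of a `(1, 2, 3)` tensor at every valid index and records the resulting shapes. The names `sample` and `shapes` are hypothetical stand-ins, not variables from the cells above.

```python
import torch

sample = torch.rand(1, 2, 3)  # same shape as Q above

# Stacking two tensors inserts a new dimension of size 2 at index `dim`
shapes = {
    dim: tuple(torch.stack([sample, sample], dim=dim).shape)
    for dim in range(4)
}

for dim, shape in shapes.items():
    print(f"dim={dim} -> {shape}")
# dim=0 -> (2, 1, 2, 3)
# dim=1 -> (1, 2, 2, 3)
# dim=2 -> (1, 2, 2, 3)
# dim=3 -> (1, 2, 3, 2)
```

Note that `dim=1` and `dim=2` produce the same shape here only because the original second dimension also happens to have size 2; the values are arranged differently.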

### torch.squeeze

Squeezes the input by removing all dimensions of size 1.

> Returns a tensor with all specified dimensions of input of size 1 removed.
>
> (Documentation)

In [79]:
# Create a random tensor with shape (1, 2, 1, 2)
x = torch.rand(1, 2, 1, 2)
x.shape

torch.Size([1, 2, 1, 2])

In [80]:
x_sq = x.squeeze()
x_sq.shape

torch.Size([2, 2])

In [81]:
# Remove dimension #3 (index 2)
# It will be squeezed because it only has one element (size is 1).
x.squeeze(2).shape

torch.Size([1, 2, 2])
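What happens when the chosen dimension is not of size 1? The sketch below suggests the tensor is simply returned with its shape unchanged, and that a squeezed tensor still shares memory with its source. The names `t`, `unchanged`, and `squeezed` are hypothetical.

```python
import torch

t = torch.rand(1, 2, 1, 2)

# Dimension 1 has size 2, so there is nothing to squeeze there
unchanged = t.squeeze(1)
print(unchanged.shape)  # torch.Size([1, 2, 1, 2])

# Dimension 0 has size 1, so it is removed
squeezed = t.squeeze(0)
print(squeezed.shape)  # torch.Size([2, 1, 2])

# The squeezed tensor points at the same memory as the original
same_memory = squeezed.data_ptr() == t.data_ptr()
print(same_memory)  # True
```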

### torch.unsqueeze

This is the opposite of `torch.squeeze()`. This function inserts a dimension of size `1` at a specific (dimension) index.

In [82]:
# Create a random 2x3 matrix
a = torch.rand(2, 3)
a

tensor([[0.7400, 0.0036, 0.8104],
        [0.8741, 0.9729, 0.3821]])

In [83]:
# At the first dimension (index 0), insert a dimension of size 1.
torch.unsqueeze(a, dim=0).shape

torch.Size([1, 2, 3])

In [84]:
# At the second dimension (index 1), insert a dimension of size 1.
torch.unsqueeze(a, dim=1).shape

torch.Size([2, 1, 3])

This is kind of like doing `list.insert(index, value)` in Python.

```python
shape = [2, 3]
shape.insert(1, 1) # at pos 1, insert a 1
print(shape)  # => [2, 1, 3]
```

You can think of `torch.squeeze()` as making your tensor a bit tighter, and `torch.unsqueeze()` as loosening your tensor.
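A few more variations, as a minimal sketch: a negative index counts from the end, indexing with `None` is a common shorthand for `unsqueeze`, and squeezing right after unsqueezing gets you back where you started. The names `m`, `last`, `via_none`, and `round_trip` are hypothetical.

```python
import torch

m = torch.rand(2, 3)

# A negative index inserts the new dimension counting from the end
last = m.unsqueeze(-1)
print(last.shape)  # torch.Size([2, 3, 1])

# Indexing with None is shorthand for unsqueeze(0)
via_none = m[None]
print(via_none.shape)  # torch.Size([1, 2, 3])

# unsqueeze followed by squeeze on the same index restores the original shape
round_trip = m.unsqueeze(0).squeeze(0)
print(round_trip.shape)  # torch.Size([2, 3])
```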

## Indexing

Given a list like this:

```python
In [1]: list_ = ["chocolate", "pytorch", "apple", "banana"]
```

If I want to get the text `"pytorch"` from the list (`list_`), I can use list indexing, which involves a little bit of "square bracket magic."

```python
In [2]: list_[1] # second item ("pytorch")
Out[2]: 'pytorch'
```

The same approach applies to PyTorch tensors.

In [85]:
# Create a vector
vector = torch.arange(1, 11, dtype=torch.float)
vector

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [86]:
# Get the 7.0
vector[6]

tensor(7.)
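Other list-style indexing tricks carry over as well. The sketch below uses a hypothetical `values` tensor with the same contents as `vector` to show negative indices, slices, and `.item()`, which converts a single-element tensor into a plain Python number.

```python
import torch

values = torch.arange(1, 11, dtype=torch.float)

# Negative indices count from the end, just like Python lists
last = values[-1]
print(last)  # tensor(10.)

# Slices return a sub-vector
middle = values[2:5]
print(middle)  # tensor([3., 4., 5.])

# .item() turns a single-element tensor into a Python number
as_float = values[6].item()
print(as_float)  # 7.0
```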

However, if you have a matrix (grid), say 3x5, how do you get the exact item you're looking for, such as the first column in the second row?

You can think of matrices (grids) as lists that contain sub-lists:

```python
matrix = [
  [1, 9, 8, 9],
  [0, 6, 0, 4]
]
```

To get the number "8," which is located at the third column (index `2`) in the first row (index `0`):

```python
eight = matrix[0][2]
# => 8
```

Again, the same approach applies to PyTorch tensors.

In [87]:
# Create a matrix
MATRIX = torch.arange(0, 100).view(4, 25)
MATRIX

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19, 20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
         43, 44, 45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
         68, 69, 70, 71, 72, 73, 74],
        [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
         93, 94, 95, 96, 97, 98, 99]])

In [88]:
# Get the number 25 (a scalar)
MATRIX[1][0]

tensor(25)

You can also use `[1, 0]` in PyTorch (but not with vanilla Python lists).

In [89]:
MATRIX[1, 0]

tensor(25)
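To make the difference from plain Python concrete, here is a minimal sketch using a hypothetical `nested` list: comma-style indexing raises a `TypeError` on a list, while the same data wrapped in a tensor accepts it.

```python
import torch

nested = [
    [1, 9, 8, 9],
    [0, 6, 0, 4],
]

# Plain Python lists do not accept comma-separated indices
message = None
try:
    nested[0, 2]
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# The same data as a tensor supports [row, column]
as_tensor = torch.tensor(nested)
print(as_tensor[0, 2])  # tensor(8)
```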

You can also use `:` (a colon, or [slice](https://docs.python.org/3/glossary.html#term-slice) in Python) in place of the row index, followed by a comma and the desired column index, to specify "all values in this column."

In [90]:
# Get the items in the first column
MATRIX[:, 0]

tensor([ 0, 25, 50, 75])

Let's visualize `MATRIX[:, 0]`:

<img alt="matrix-column-0 visualization" src="https://github.com/AWeirdScratcher/models/assets/90096971/b58ddd92-936f-4f2a-833e-f5242d22d961" width="740" />

And that's why it returns `tensor([0, 25, 50, 75])`.

With the same approach, what do you think `MATRIX[:, 2]` would return?

In [91]:
MATRIX[:, 2]

tensor([ 2, 27, 52, 77])
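The colon works on either side of the comma. As a minimal sketch, using a hypothetical `grid` tensor built the same way as `MATRIX`, you can select a whole row, a small block, or the last column:

```python
import torch

grid = torch.arange(0, 100).view(4, 25)

# All columns of the second row
second_row = grid[1, :]
print(second_row[:3])  # tensor([25, 26, 27])

# First two rows, first three columns
block = grid[:2, :3]
print(block)
# tensor([[ 0,  1,  2],
#         [25, 26, 27]])

# Last column, using a negative index
last_col = grid[:, -1]
print(last_col)  # tensor([24, 49, 74, 99])
```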

Let's also try a three-dimensional tensor.

In [92]:
TENSOR = torch.arange(1, 11).view(1, 2, 5)
TENSOR

tensor([[[ 1,  2,  3,  4,  5],
         [ 6,  7,  8,  9, 10]]])

In [93]:
# Get the first number
TENSOR[0, 0, 0]

tensor(1)
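Slices combine with integer indices in three dimensions too. The following sketch uses a hypothetical `cube` tensor with the same contents as `TENSOR`; each integer index removes a dimension, while each `:` keeps one.

```python
import torch

cube = torch.arange(1, 11).view(1, 2, 5)

# Second row of the only "sheet": two integer indices leave one dimension
second_row = cube[0, 1, :]
print(second_row)  # tensor([ 6,  7,  8,  9, 10])

# First item of every row: two slices leave two dimensions
first_col = cube[:, :, 0]
print(first_col)  # tensor([[1, 6]])

# Third item of each row in the first sheet
third = cube[0, :, 2]
print(third)  # tensor([3, 8])
```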