# PyTorch Fundamentals

## Introduction to Tensors

### The Tensor object

At the core of PyTorch lies the `Tensor` object, which serves as the fundamental data structure for all computations. Tensors are multidimensional arrays, conceptually similar to NumPy arrays, but with added capabilities tailored for deep learning and high-performance computing.

PyTorch tensors support a wide variety of operations, including arithmetic, indexing, reshaping, and broadcasting. More importantly, they can be transferred seamlessly between the CPU and GPU, allowing for accelerated computation with minimal code changes.

In practice, tensors represent everything from scalar values to high-dimensional data such as images, sequences, and model parameters. Understanding tensors and how to manipulate them efficiently is essential for working with PyTorch and developing neural network models.

PyTorch tensors are created using [`torch.tensor`](https://pytorch.org/docs/stable/tensors.html). For example, we may create the simplest tensor – a scalar – in the following way:

In [None]:
import torch
# scalar
scalar = torch.tensor(2)
scalar

tensor(2)

A scalar is a $0$-dimensional tensor. We can check the number of dimensions with the `ndim` attribute:

In [None]:
scalar.ndim

0

To extract the value of a single-element tensor as a plain Python number, we use the `item()` method:

In [None]:
scalar.item()

2

The next structure is a vector, i.e. a tensor of dimension $1$. We can create it with `torch.tensor()` from a simple Python list:

In [None]:
# vector
vector = torch.tensor([2, 7])
vector

tensor([2, 7])

Note: be careful using `tensor()` instead of `Tensor()`. The latter is a lower-level class constructor. When given shape arguments (e.g., `torch.Tensor(2, 3)`), it creates an *uninitialized* tensor. This can lead to unintended behavior. So, always use `torch.tensor()` when starting from data.
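
The difference is easy to check directly. The following is a minimal sketch (the variable names `from_data` and `from_shape` are illustrative, not part of the original notebook) that passes similar-looking arguments to both constructors:

```python
import torch

# torch.tensor() builds a tensor from the data you pass in
from_data = torch.tensor([2, 3])
print(from_data.shape, from_data.dtype)    # 1-D tensor holding the values 2 and 3

# torch.Tensor() with integer arguments interprets them as a shape;
# the contents are uninitialized memory, so the printed values are arbitrary
from_shape = torch.Tensor(2, 3)
print(from_shape.shape, from_shape.dtype)  # 2x3 tensor of the default float dtype
```

Only the first call stores the numbers `2` and `3`; the second one only uses them to size the tensor.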

The number of dimensions and the shape of a tensor are very important in PyTorch. When manipulating tensors, we must pay close attention to their shapes, or we may run into errors or miscalculations. Our vector has one dimension:


In [None]:
vector.ndim

1

This is different from `vector.shape`, which gives the number of elements along each dimension:

In [None]:
vector.shape

torch.Size([2])

We can step things up and create a matrix:

In [None]:
matrix = torch.tensor([[1, 2], [7, 8]])
matrix

tensor([[1, 2],
        [7, 8]])

In [None]:
print(f"matrix dims: {matrix.ndim}")
print(f"matrix shape: {matrix.shape}")

matrix dims: 2
matrix shape: torch.Size([2, 2])


This tells us that our matrix is 2-dimensional, and in fact it's a 2×2 square matrix. How about a rectangular matrix? Very simple:

In [None]:
rect_matrix = torch.tensor([[1, 2], [3, 4], [4, 5]])
print(f"matrix dims: {rect_matrix.ndim}")
print(f"matrix shape: {rect_matrix.shape}")

matrix dims: 2
matrix shape: torch.Size([3, 2])


The number of dimensions is still $2$, but the shape is now `[3, 2]`: the matrix has $3$ rows and $2$ columns, so it's a $3\times2$ matrix.  
Let's see how it works with higher-dimensional tensors:

In [None]:
# TENSOR
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5],
                        [6, 7, 8]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5],
         [6, 7, 8]]])

In [None]:
print(f"Tensor dim: {TENSOR.ndim}")

Tensor dim: 3


In [None]:
print(f"Tensor shape: {TENSOR.shape}")

Tensor shape: torch.Size([1, 4, 3])


Why this shape? We can think of tensors as multidimensional matrices. In this case, we have 1 matrix containing 4 vectors of dimension 3.
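
One way to build intuition is to index into the tensor one level at a time and watch the shape shrink. This is a small sketch that reuses the `TENSOR` defined above:

```python
import torch

TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5],
                        [6, 7, 8]]])

# Each level of indexing strips the leading dimension from the shape
print(TENSOR.shape)           # the full tensor: 1 matrix
print(TENSOR[0].shape)        # that matrix: 4 vectors
print(TENSOR[0][0].shape)     # one vector: 3 elements
print(TENSOR[0][0][0].shape)  # one element: a 0-dimensional scalar
```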

Understanding shapes takes some practice, but once you get used to it, it becomes second nature.

As another example, a tensor of shape `[2, 3, 3, 3]` could represent 2 images, each of size 3×3 pixels, with 3 channels (RGB).


### Random Tensors

Random tensors are very important in PyTorch workflows. The reason is the way neural networks work: they start from randomly initialized weights and adapt these values through training (by minimizing a loss function). So the first step is always to initialize the weights randomly.

In order to create random tensors we can use Torch's [torch.rand()](https://docs.pytorch.org/docs/main/generated/torch.rand.html).

In [None]:
random_tensor = torch.rand(3,4)
random_tensor
print(f"random tensor dimension: {random_tensor.ndim}") # dims will be 2
print(f"random tensor shape: {random_tensor.shape}") # shape will be torch.Size([3,4])

random tensor dimension: 2
random tensor shape: torch.Size([3, 4])


A common example of tensor encoding is an image. Usually, to encode an image we use a tensor of shape `[colour_channels, height, width]`. For an RGB image of size `224 x 224` we will use something like this:

![Example of encoding an RGB image](https://github.com/MatteoFalcioni/PyTorch_basics/blob/main/imgs/00-tensor-shape-example-of-image.png?raw=1)

> **Note on REPRODUCIBILITY:** `torch.manual_seed()` allows you to set the seed for random number generation, so that your code becomes reproducible (the numbers are still pseudo-random, but you get the same sequence every time, unless you change the seed).
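
As a quick illustration of seeding, the sketch below (the seed value `42` and the variable names are arbitrary choices) resets the seed before each call to `torch.rand()` and compares the results:

```python
import torch

torch.manual_seed(42)      # fix the seed
first = torch.rand(3, 4)

torch.manual_seed(42)      # reset to the same seed
second = torch.rand(3, 4)

third = torch.rand(3, 4)   # no reset: the generator simply continues

print(torch.equal(first, second))  # same seed, same numbers
print(torch.equal(second, third))  # different position in the random stream
```

Note that the seed has to be reset before every call you want to reproduce; setting it once only fixes the sequence as a whole.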

### 0's and 1's Tensors
These are useful for creating masks.


In [None]:
# create a tensor of all zeros
zeros = torch.zeros(3,4)
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [None]:
# create a tensor of all ones
ones = torch.ones(3,4)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

Note: when you let PyTorch create a tensor with functions such as `torch.zeros()`, `torch.ones()` or `torch.rand()`, its default type is `torch.float32`. To get the type we can use the `.dtype` attribute. We will see more about types later.

In [None]:
ones.dtype

torch.float32

### Creating ranges and tensors-like

In [None]:
# use torch.arange() (torch.range() is deprecated)
one_to_Ten = torch.arange(1,11)
one_to_Ten

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Its parameters are `start`, `end` (exclusive: the sequence stops before reaching `end`) and `step`. By default the step is $1$, but we can modify it:

In [None]:
zero_to_hundred = torch.arange(start=0, end=100, step=2)
zero_to_hundred

tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
        36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
        72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])

Another useful family of functions is the `*_like()` one. For example, `torch.zeros_like()` creates a tensor with the same shape (and dtype) as an existing tensor, filled with zeros. There are several variants: you can fill the new tensor with a chosen value (`torch.full_like()`), with random numbers (`torch.rand_like()`), and so on.  

In [None]:
ten_zeros = torch.zeros_like(one_to_Ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
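
The other variants mentioned above follow the same pattern. A short sketch, assuming an integer base tensor like `one_to_Ten` (the names `base`, `sevens` and `noise` are illustrative):

```python
import torch

base = torch.arange(1, 11)           # int64 tensor of shape (10,)

# Same shape and dtype as `base`, filled with a chosen value
sevens = torch.full_like(base, 7)

# rand_like() draws floats in [0, 1), so for an integer base tensor
# we ask for a floating-point dtype explicitly
noise = torch.rand_like(base, dtype=torch.float32)

print(sevens)
print(noise.shape, noise.dtype)
```

The explicit `dtype` is needed because uniform random floats cannot be stored in an integer tensor.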

### Tensor datatypes

The default floating-point type is `float32`, but we can change it by setting `dtype` explicitly. See [torch.dtype](https://docs.pytorch.org/docs/stable/tensor_attributes.html#torch-dtype).

In [None]:
# Float 32 tensor
float_32_tensor = torch.rand(2 ,6)
float_32_tensor

tensor([[0.3513, 0.6349, 0.8420, 0.9925, 0.9101, 0.7914],
        [0.1295, 0.1268, 0.9806, 0.1038, 0.3814, 0.0941]])

In [None]:
float_32_tensor.dtype

torch.float32

In [None]:
# Float 16 tensor
float_16_tensor = torch.rand((2 ,6), dtype=torch.float16)
float_16_tensor.dtype

torch.float16

Or, if you are creating a tensor from data, you can specify it in the same way:

In [None]:
data = [3, 6, 9, 10]
tensor = torch.tensor(data, dtype=torch.float16)
tensor

tensor([ 3.,  6.,  9., 10.], dtype=torch.float16)

The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

This matters in deep learning and numerical computing because so many operations are performed: the more detail each number carries, the more compute and memory every operation needs.

So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
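
The trade-off can be made concrete with a small sketch (the sizes and the test value `4097.0` are arbitrary choices for illustration): halving the precision halves the memory, but some numbers can no longer be represented exactly.

```python
import torch

x32 = torch.ones(1000, dtype=torch.float32)
x16 = x32.to(torch.float16)

# Memory used by the data: number of elements * bytes per element
bytes32 = x32.numel() * x32.element_size()
bytes16 = x16.numel() * x16.element_size()
print(bytes32, bytes16)

# float16 cannot represent every integer above 2048:
# 4097 is rounded to the nearest representable value
rounded = torch.tensor(4097.0, dtype=torch.float16).item()
exact = torch.tensor(4097.0, dtype=torch.float32).item()
print(rounded, exact)
```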



`dtype` is one of the three main arguments of the `torch.tensor()` function. The other two are just as important, if not more, and we will see more about them later on. They are `device`, which specifies on which device (CPU or GPU) the tensor should be created, and `requires_grad`, which specifies whether PyTorch should keep track of the operations performed on the tensor in order to perform backpropagation.

In [None]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # What datatype the tensor has; None infers it from the data (float32 for Python floats)
                               device='cpu', # What device is your tensor on
                               requires_grad=False) # whether or not to track gradients for backpropagation (default False if not wrapped in nn.Parameter)

float_32_tensor.dtype, float_32_tensor.device, float_32_tensor.requires_grad

(torch.float32, device(type='cpu'), False)

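Operations between types in PyTorch are fairly permissive: many element-wise operations between tensors of different types do not raise an error, and the result is instead *promoted* to a type that can hold both operands (for example, `int64 + float32` gives `float32`). Other operations, such as matrix multiplication, do raise an error on mismatched types. In any case, you should be careful to *perform operations between tensors of the same type*, to avoid miscalculations and problems.  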
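
To see which type an element-wise operation produces, you can inspect the result's `dtype`, or ask `torch.result_type()` without performing the operation. A minimal sketch with illustrative tensors:

```python
import torch

ints = torch.tensor([1, 2, 3])                                 # int64
floats = torch.tensor([0.5, 0.5, 0.5])                         # float32
doubles = torch.tensor([0.5, 0.5, 0.5], dtype=torch.float64)   # float64

# Element-wise operations between different dtypes
mixed_int_float = ints + floats
mixed_float_double = floats + doubles

print(mixed_int_float.dtype)      # dtype of int64 + float32
print(mixed_float_double.dtype)   # dtype of float32 + float64

# Ask for the resulting dtype without computing anything
print(torch.result_type(ints, floats))
```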

### Getting Information from Tensors

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:
- `shape` - what shape is the tensor? (some operations require specific shape rules)
- `dtype` - what datatype are the elements within the tensor stored in?
- `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.3923, 0.4521, 0.5415, 0.3870],
        [0.9300, 0.6538, 0.9350, 0.3785],
        [0.0193, 0.2008, 0.8574, 0.0714]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Basic Operations and Broadcasting

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (possibly millions of them) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`).

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because these operations return a new tensor: the values inside the original tensor don't change unless the variable is reassigned.

In [None]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

Let's subtract a number and this time we'll reassign the tensor variable.

In [None]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [None]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])

### *Note on Broadcasting*

Broadcasting allows PyTorch to perform operations between tensors of different shapes by automatically expanding the smaller tensor to match the shape of the larger one.

For example, in what we saw above:
```python
tensor = torch.tensor([1, 2, 3])
tensor + 10
```

PyTorch is *broadcasting* the scalar `10` across all elements of the tensor. Internally, it's as if it expanded `10` to match the shape of tensor:

In [None]:
# Equivalent to:
torch.tensor([1, 2, 3]) + torch.tensor([10, 10, 10])

tensor([11, 12, 13])

#### *Broadcasting a Column Vector to a Matrix*
Let's see another example:


In [None]:
# A 3x3 matrix
matrix = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# A column vector (3 rows, 1 column)
col_vector = torch.tensor([
    [10],
    [20],
    [30]
])

# Broadcasting: adds col_vector to each column of the matrix
result = matrix + col_vector
print(result)


tensor([[11, 12, 13],
        [24, 25, 26],
        [37, 38, 39]])


In this example, we add a 3×1 column vector to a 3×3 matrix. PyTorch automatically broadcasts the column vector along the second dimension (columns), so it gets added to each column of the matrix: every element of row `i` is increased by `col_vector[i]`.

- Matrix shape: `(3, 3)`
- Column vector shape: `(3, 1)`

PyTorch expands the column vector to match the shape of the matrix:

```
[[10, 10, 10],
 [20, 20, 20],
 [30, 30, 30]]
```


And then adds it element-wise to the matrix.
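
The expansion can also be written out explicitly. The sketch below uses `Tensor.expand()` and `torch.broadcast_shapes()` to show that the explicit version gives the same result as the implicit broadcast (the variable names are illustrative):

```python
import torch

matrix = torch.arange(1, 10).reshape(3, 3)      # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col_vector = torch.tensor([[10], [20], [30]])   # shape (3, 1)

# The shape both operands are broadcast to
print(torch.broadcast_shapes(matrix.shape, col_vector.shape))

# Explicit expansion of the column vector to shape (3, 3), without copying data
expanded = col_vector.expand(3, 3)
print(expanded)

# Implicit and explicit broadcasting give the same result
implicit = matrix + col_vector
explicit = matrix + expanded
print(torch.equal(implicit, explicit))
```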

### Matrix Multiplication (*dot product*)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication. For two 1-D vectors it reduces to the *dot product*.

PyTorch implements matrix multiplication in the [torch.matmul()](https://docs.pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul) function.

The two main rules to remember for matrix multiplication are:

The inner dimensions must match:
```
(3, 2) @ (3, 2) # won't work
(2, 3) @ (3, 2) # will work
(3, 2) @ (2, 3) # will work
```
The resulting matrix has the shape of the outer dimensions:
```
(2, 3) @ (3, 2) -> (2, 2)
(3, 2) @ (2, 3) -> (3, 3)
```
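
Both rules can be verified on random matrices. A minimal sketch (the tensors `a` and `b` are illustrative):

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match: the result takes the outer dimensions
print(torch.matmul(a, b).shape)   # (2, 3) @ (3, 2)
print(torch.matmul(b, a).shape)   # (3, 2) @ (2, 3)

# Inner dimensions do not match: (3, 2) @ (3, 2) raises an error
try:
    torch.matmul(b, b)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")
```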

> **Note:** `@` is Python's matrix multiplication operator. On tensors it calls `torch.matmul()`, so the two are equivalent; which one to use is a matter of readability.

> **Note:** Sometimes you'll find people using `.mm()` as if it were an alias for `.matmul()`. It's not. `.mm()` works *only for 2D matrices*, while `.matmul()` works with tensors of any dimension. Basically, `.mm()` does not broadcast, while `.matmul()` does.
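
The difference shows up as soon as one operand has more than two dimensions. In this sketch (shapes chosen only for illustration), a batch of 10 matrices is multiplied by a single matrix:

```python
import torch

batch = torch.rand(10, 3, 4)   # 10 matrices of shape (3, 4)
weights = torch.rand(4, 5)     # a single (4, 5) matrix

# matmul() broadcasts `weights` over the batch dimension
out = torch.matmul(batch, weights)
print(out.shape)

# The @ operator gives the same result
print(torch.allclose(out, batch @ weights))

# mm() only accepts 2D inputs, so the same call fails
try:
    torch.mm(batch, weights)
    mm_failed = False
except RuntimeError as err:
    mm_failed = True
    print(f"RuntimeError: {err}")
```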



Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.

In [None]:
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [None]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [None]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

In [None]:
# Can also use the "@" operator for matrix multiplication (equivalent to torch.matmul)
tensor @ tensor

tensor(14)

In [None]:
torch.mm(tensor, tensor)    # this won't work

RuntimeError: self must be a matrix

In [None]:
# to use .mm() with vectors we should manually change shapes (what .matmul() does automatically)
tensor1 = tensor.reshape(1,3)
tensor2 = tensor.reshape(3,1)
result = torch.mm(tensor1, tensor2) # shape (1,1)

print(f"result: {result}")  #tensor([[14]])
print(f"result value: {result.item()}") # 14

result: tensor([[14]])
result value: 14


To summarize, for our `tensor` variable with values `[1, 2, 3]`:

| Operation                   | Calculation                   | Code                           |
|-----------------------------|-------------------------------|--------------------------------|
| Element-wise multiplication | `[1*1, 2*2, 3*3] = [1, 4, 9]` | `tensor * tensor`              |
| Matrix multiplication       | `1*1 + 2*2 + 3*3 = 14`        | `torch.matmul(tensor, tensor)` |


### Transposition

Much of deep learning consists of multiplying and operating on matrices, and matrices have strict rules about which shapes can be combined. As a result, shape mismatches are among the most common errors you'll run into.

Of course if we are creating tensors "on the fly" we could just create them with matching shapes. But what if these tensors already exist and we need to multiply them? For example:

In [None]:
# Shapes need to be compatible for matrix multiplication
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We want the inner dimensions to match. In this case we can simply transpose the second matrix. We can do this with
- `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
- `tensor.T` - where tensor is the desired tensor to transpose.

Let's try the latter.

In [None]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [None]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [None]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])
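
The other option listed above, `torch.transpose(input, dim0, dim1)`, gives the same result for a 2D tensor when dimensions `0` and `1` are swapped. A short sketch reusing `tensor_A` and `tensor_B`:

```python
import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# Swap dimensions 0 and 1 explicitly instead of using .T
tensor_B_swapped = torch.transpose(tensor_B, 0, 1)
print(tensor_B_swapped.shape)

output_T = torch.matmul(tensor_A, tensor_B.T)
output_transpose = torch.matmul(tensor_A, tensor_B_swapped)
print(torch.equal(output_T, output_transpose))
```

For tensors with more than two dimensions, `torch.transpose()` is the more explicit choice because you state exactly which two dimensions are swapped.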


Remember, matrix multiplication is the building block of neural networks (that's why the mathematics of deep learning isn't usually that hard). So matrix multiplication is all you need.

![Matrix multiplication is all you need](https://github.com/MatteoFalcioni/PyTorch_basics/blob/main/imgs/00_matrix_multiplication_is_all_you_need.jpeg?raw=1)

### Finding the min, max, mean, sum, etc (aggregation)

Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values).

First we'll create a tensor and then find the max, min, mean and sum of it.

In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Now let's perform some aggregation.

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error because of the datatype (int64)!
print(f"Mean: {x.type(torch.float32).mean()}") # mean() requires a floating-point datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


You may find some methods such as `torch.mean()` require tensors to be in `torch.float32` (the most common) or another specific datatype, otherwise the operation will fail.

You can also do the same as above with the corresponding `torch` functions.

In [None]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
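
The datatype requirement of `mean()` mentioned above can be checked with a guarded call. This is a small sketch: the `try`/`except` is only there so the cell keeps running after the error.

```python
import torch

x = torch.arange(0, 100, 10)   # int64 tensor

# mean() on an integer tensor raises an error
try:
    x.mean()
    mean_raised = False
except RuntimeError as err:
    mean_raised = True
    print(f"RuntimeError: {err}")

# Converting to a floating-point dtype first makes it work
mean_value = x.float().mean()
print(mean_value)
```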

### Changing datatype

You can change the datatypes of tensors in different ways. This is the most flexible and recommended way:

In [None]:
x = torch.tensor([1, 2, 3])        # int64 by default
x = x.to(dtype=torch.float32)     # converts to float32

This is equivalent, *at creation*, to

In [None]:
x = torch.tensor([1, 2, 3], dtype=torch.float32)

Also, you can use these quick and readable shortcuts:

In [None]:
x = torch.tensor([1, 2, 3])

x.float()   # to torch.float32
x.double()  # to torch.float64
x.int()     # to torch.int32
x.long()    # to torch.int64 (default for integers)
x.bool()    # to boolean

tensor([True, True, True])

### Reshaping, stacking, (un)squeezing and permuting

In PyTorch, tensor shape manipulation is an essential skill. Whether you're preparing data for a neural network or aligning tensor shapes for operations, these five tools are foundational:

- `reshape()`: Change the shape of a tensor without changing its data.
- `stack()`: Combine multiple tensors along a new dimension.
- `squeeze()`: Remove dimensions of size 1.
- `unsqueeze()`: Add a dimension of size 1 at a specified position.
- `permute()`: Rearrange the dimensions of a tensor in a chosen order.

The `reshape()` method changes the tensor's shape as long as the total number of elements stays the same.

For example, a tensor with shape `(9,)` can be reshaped to `(3, 3)`, `(1, 9)`, `(9, 1)`, etc.

In [None]:
x = torch.arange(1, 10)  # tensor([1, 2, ..., 9])
print("Original shape:", x.shape)

reshaped = x.reshape(3, 3)
print("Reshaped (3x3):")
print(reshaped)

Original shape: torch.Size([9])
Reshaped (3x3):
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


The `view()` function is another way to change the shape of a tensor, like `reshape()`. It returns a tensor with the same data but a different shape — **as long as the tensor is stored in a contiguous chunk of memory**.

If it's not contiguous (e.g. after a `transpose()`), you need to call `.contiguous()` before using `view()`.

```python
x = torch.arange(1, 9)
x = x.view(2, 4)
```

This is equivalent to:

In [None]:
x = torch.arange(1, 9).reshape(2, 4)

But `view()` may raise an error if the original tensor is not contiguous:

In [None]:
x = torch.randn(2, 3)
x_t = x.T  # transpose
x_t.view(-1)  # ❌ raises the RuntimeError shown below: x_t is not contiguous
x_t.contiguous().view(-1)  # ✅ safe


RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
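
Whether a tensor is contiguous can be checked with `is_contiguous()` before calling `view()`. A minimal sketch with a small integer tensor, so that the flattened order is easy to read:

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
x_t = x.T                           # transposed, shape (3, 2)

print(x.is_contiguous())     # the original layout is contiguous
print(x_t.is_contiguous())   # the transpose only changes strides

# Two ways to flatten the transposed tensor safely
flat_reshape = x_t.reshape(-1)
flat_view = x_t.contiguous().view(-1)
print(flat_reshape)
print(torch.equal(flat_reshape, flat_view))
```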

Also notice that, since `view()` doesn't copy the data stored in memory, if we create a new tensor as a `view()` of an existing one and then modify it, the original tensor changes as well. For example:

In [None]:
x = torch.rand(1,9)
z = x.view(3,3)
z[2] = 100
print(z)
print(x)

tensor([[4.4159e-02, 2.0593e-01, 7.1661e-02],
        [2.2882e-01, 2.1666e-01, 2.7942e-02],
        [1.0000e+02, 1.0000e+02, 1.0000e+02]])
tensor([[4.4159e-02, 2.0593e-01, 7.1661e-02, 2.2882e-01, 2.1666e-01, 2.7942e-02,
         1.0000e+02, 1.0000e+02, 1.0000e+02]])


`reshape()` returns a view when the memory layout allows it and falls back to copying the data otherwise, so it also works on non-contiguous tensors. You should usually prefer it to `view()`, as long as you don't rely on whether the result shares memory with the original tensor.

The main advantage of `view()` is that it is explicit about memory: it never allocates new space, and either gives you a different view of the already existing tensor data or raises an error. This is very useful when working with tensors occupying a lot of space.
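
Whether a reshaped tensor shares memory with its source can be checked by comparing `data_ptr()`, the address of the first element. This is a sketch with illustrative variable names:

```python
import torch

x = torch.arange(6)

# Contiguous source: reshape() can reuse the same memory
as_view = x.reshape(2, 3)
shares_memory = as_view.data_ptr() == x.data_ptr()
print(shares_memory)

# Non-contiguous source (a transpose): flattening it requires a copy
copied = as_view.T.reshape(-1)
was_copied = copied.data_ptr() != x.data_ptr()
print(was_copied)
```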

The `stack()` function joins a sequence of tensors along a new dimension. All tensors must have the same shape.

- `dim=0`: the new dimension comes first, so each input tensor becomes a row of the result
- `dim=1`: the new dimension comes second, so each input tensor becomes a column of the result

In [None]:
# Create two tensors of the same shape
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(f"Original shape: {a.shape}")

# Stack along a new dimension (dim=0)
stacked = torch.stack((a, b), dim=0)
print("Stacked along dim=0 (inputs become rows):")
print(stacked)
print("Shape:", stacked.shape)

# Stack along another dimension (dim=1)
stacked_dim1 = torch.stack((a, b), dim=1)
print("\nStacked along dim=1 (inputs become columns):")
print(stacked_dim1)
print("Shape:", stacked_dim1.shape)


Original shape: torch.Size([3])
Stacked along dim=0 (inputs become rows):
tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])

Stacked along dim=1 (inputs become columns):
tensor([[1, 4],
        [2, 5],
        [3, 6]])
Shape: torch.Size([3, 2])


The `squeeze()` method removes all dimensions with size 1. It's useful when a model or operation adds unnecessary singleton dimensions.

You can also specify a dimension to squeeze with `squeeze(dim=...)`.

In [None]:
x = torch.zeros(1, 3, 1, 5)
print("Original shape:", x.shape)

squeezed = x.squeeze()
print("Squeezed shape:", squeezed.shape)


Original shape: torch.Size([1, 3, 1, 5])
Squeezed shape: torch.Size([3, 5])
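
The `dim` argument mentioned above removes only the chosen dimension, and only if its size is 1. A short sketch on the same shape:

```python
import torch

x = torch.zeros(1, 3, 1, 5)

# Remove only the selected singleton dimension
print(x.squeeze(dim=0).shape)   # drops the leading 1
print(x.squeeze(dim=2).shape)   # drops the 1 in position 2

# Dimension 1 has size 3, so squeezing it leaves the shape unchanged
print(x.squeeze(dim=1).shape)
```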


The `unsqueeze()` method adds a dimension of size 1 at a given position. This is useful for aligning shapes or preparing inputs for batch operations.

Common uses:
- Converting shape `(N,)` → `(1, N)` or `(N, 1)`
- Making tensors compatible with broadcasting

In [None]:
y = torch.tensor([1.0, 2.0, 3.0])
print("Original shape:", y.shape)

# Add dimension at position 0
y_unsq0 = y.unsqueeze(0)
print("Shape after unsqueeze(0):", y_unsq0.shape)

# Add dimension at position 1
y_unsq1 = y.unsqueeze(1)
print("Shape after unsqueeze(1):", y_unsq1.shape)

Original shape: torch.Size([3])
Shape after unsqueeze(0): torch.Size([1, 3])
Shape after unsqueeze(1): torch.Size([3, 1])
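
As an example of the broadcasting use case, the sketch below combines a column version and a row version of the same vector to build a table of all pairwise sums (the name `pairwise_sums` is illustrative):

```python
import torch

y = torch.tensor([1.0, 2.0, 3.0])

# (3, 1) + (1, 3) broadcasts to (3, 3): entry [i, j] is y[i] + y[j]
pairwise_sums = y.unsqueeze(1) + y.unsqueeze(0)
print(pairwise_sums.shape)
print(pairwise_sums)
```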


The `permute()` function allows you to **rearrange the dimensions** of a tensor in any order you want. It does not change the data — only how the data is viewed in memory.

This is especially useful when working with image data, where different libraries may expect different dimension orders (e.g., channels-first vs channels-last).

---

**Example:**

Suppose you have an image tensor with shape `(height, width, channels)` — common in libraries like NumPy or PIL — and you want to convert it to PyTorch’s expected format `(channels, height, width)`.

You can use:

In [None]:
image = torch.rand(64, 64, 3)  # HWC format
image_permuted = image.permute(2, 0, 1)  # CHW format

So it works like this:
```
Original shape: (D0, D1, D2)
tensor.permute(1, 2, 0) → shape becomes (D1, D2, D0)
```


Careful: `permute()` returns a **view** of our tensor: it doesn't copy data. Also, you generally cannot use `.view()` after `permute()` unless you call `.contiguous()` first, because permuting may create a non-contiguous tensor.
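
Both points can be observed on a small tensor. In this sketch (the shape and the value `7.0` are arbitrary), writing through the permuted tensor changes the original, and flattening requires `.contiguous()`:

```python
import torch

image = torch.zeros(4, 5, 3)        # HWC: height 4, width 5, 3 channels
chw = image.permute(2, 0, 1)        # CHW view of the same data
print(chw.shape)
print(chw.is_contiguous())

# chw[c, h, w] refers to the same element as image[h, w, c]
chw[2, 1, 0] = 7.0
print(image[1, 0, 2])               # the original tensor sees the change

# Flattening with view() needs a contiguous copy first
flat = chw.contiguous().view(-1)
print(flat.shape)
```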

### Selecting Data (Indexing)

Sometimes you'll want to select specific data from tensors (for example, only the first column or second row).

To do so, you can use indexing.

If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

In [None]:
# Create a tensor
x = torch.arange(1, 28).reshape(3, 3, 3)    # 3 matrices of size 3x3
x, x.shape

(tensor([[[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9]],
 
         [[10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]],
 
         [[19, 20, 21],
          [22, 23, 24],
          [25, 26, 27]]]),
 torch.Size([3, 3, 3]))


Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [None]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}") # first matrix
print(f"Second square bracket: {x[0][0]}") # first row (vector) of the first matrix
print(f"Third square bracket: {x[0][0][0]}") # first element of the first row of the first matrix

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


Indexing in PyTorch is consistent with how tensors are indexed mathematically. For example, a tensor with three dimensions can be represented as a quantity with indices $T_{ijk}$.  

However, remember that **Python uses 0-based indexing**, so:
- $T_{0jk}$ refers to the **first matrix**,
- $T_{00k}$ refers to the **first row (vector) of the first matrix**,
- and so on.

This correspondence makes tensor indexing intuitive once you're familiar with the dimensional structure.

You can also use `:` to specify "all values in this dimension" and then use a comma `(,)` to add another dimension.

In [None]:
x[:, 0]  # all of dimension 0, index 0 of dimension 1. In other words: from every matrix, select its first row

tensor([[ 1,  2,  3],
        [10, 11, 12],
        [19, 20, 21]])

In [None]:
x[:, :, 1]  # Get all values indexed 1 in the third dimension. The third dimension indexes
            # the elements within each row. So we are selecting the middle value from each row

tensor([[ 2,  5,  8],
        [11, 14, 17],
        [20, 23, 26]])

In other words, from all matrices (first `:`) and all rows (second `:`) select the element indexed `1` (the middle one in this case, because each row has $3$ elements).

Another example:

In [None]:
x[:, 1, 1] # from all matrices, select the index 1 vector and from each of these select the index 1 element

tensor([ 5, 14, 23])

In [None]:
x[0, 0, :] # from matrix 0, select the 0th row (horizontal vector), and take all elements

tensor([1, 2, 3])
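
A few more combinations on the same tensor, as a sketch. Slices with ranges (`start:end`) follow the usual Python convention, and comma-separated indices are equivalent to chained brackets:

```python
import torch

x = torch.arange(1, 28).reshape(3, 3, 3)

# First column of the first matrix: matrix 0, all rows, element 0 of each row
first_column = x[0, :, 0]
print(first_column)

# From matrices 1 and 2, take row 0 and its first two elements
sub_block = x[1:, 0, :2]
print(sub_block)

# A single element: the two notations select the same value
print(x[1, 2, 0], x[1][2][0])
```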

### PyTorch and NumPy

PyTorch has functionality to interact with NumPy nicely.

The two main methods you'll want to use for NumPy to PyTorch (and back again) are:
- `torch.from_numpy(ndarray)` : NumPy array -> PyTorch tensor.
- `torch.Tensor.numpy()`: PyTorch tensor -> NumPy array.

Let's try them out.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

> **Note:** By default, NumPy creates floating-point arrays with the datatype `float64` if not explicitly set to another type. If you convert such an array to a PyTorch tensor, it'll keep the same datatype (as above). However, many PyTorch calculations default to using `float32`. So if you want to convert your NumPy array (`float64`) -> PyTorch tensor (`float64`) -> PyTorch tensor (`float32`), you can use `tensor = torch.from_numpy(array).to(dtype=torch.float32)`.

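A tensor created with `torch.from_numpy()` shares its memory with the source array. In the cell below, however, `array = array + 1` creates a *new* array and rebinds the name `array` to it, so the tensor keeps the original values. An in-place update such as `array += 1` would change the tensor as well.

In [None]:
# Rebind `array` to a new array; the tensor still holds the original values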
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))
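
The distinction between in-place updates and rebinding can be seen side by side. A minimal sketch with a shorter array (the values are illustrative):

```python
import numpy as np
import torch

array = np.arange(1.0, 4.0)         # [1., 2., 3.]
tensor = torch.from_numpy(array)    # shares memory with `array`

# In-place update: the tensor sees the change
array += 1
after_in_place = tensor.tolist()
print(after_in_place)

# Rebinding: `array` now names a new array, the tensor is unaffected
array = array + 1
after_rebinding = tensor.tolist()
print(after_rebinding)
```

If you need an independent tensor, copying the data explicitly (for example with `torch.tensor(array)`) avoids the shared memory.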

And if you want to go from PyTorch tensor to NumPy array, you can call `tensor.numpy()`

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

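As with `torch.from_numpy()`, the array returned by `tensor.numpy()` shares memory with the (CPU) tensor: in-place changes to one are visible in the other. Reassigning one of the variables to a new object, as in `tensor = tensor + 1`, leaves the other unchanged.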
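
The relationship between a CPU tensor and the array returned by `numpy()` can be checked with an in-place operation. In this sketch, `clone()` is used to obtain an independent copy for comparison (variable names are illustrative):

```python
import torch

tensor = torch.ones(3)
shared = tensor.numpy()                 # NumPy array backed by the tensor's memory
independent = tensor.clone().numpy()    # array backed by a separate copy

# In-place update of the tensor
tensor.add_(1)

print(shared)        # follows the tensor
print(independent)   # keeps the old values
```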

## Running Torch on GPU

Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).

To do so, you can use the `torch.cuda` package.

Rather than talk about it, let's try it out.

You can test if PyTorch has access to a GPU using `torch.cuda.is_available()`.

In [1]:
# Check for GPU
import torch
torch.cuda.is_available()

True

If the above outputs `True`, PyTorch can see and use the GPU. If it outputs `False`, it can't see the GPU, and in that case you'll have to go back through the installation steps.

Now, let's say you wanted to set up your code so that it runs on the GPU if one is available and on the CPU otherwise.

That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.

Let's create a `device` variable to store what kind of device is available.

In [4]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

If the above output `"cuda"` it means we can set all of our PyTorch code to use the available CUDA device (a GPU) and if it output `"cpu"`, our PyTorch code will stick with the CPU.

> **Note:** In PyTorch, it's best practice to write device agnostic code. This means code that'll run on CPU (always available) or GPU (if available).
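
One common way to write device-agnostic code is to pass the `device` variable to the creation functions, so tensors are created directly where they will be used. A minimal sketch (tensor names are illustrative):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create tensors directly on the selected device
x = torch.rand(3, 4, device=device)
y = torch.ones(3, 4, device=device)

# Operands on the same device produce a result on that device
z = x + y
print(z.device)
```

The same code runs unchanged on a machine with or without a GPU; only the value of `device` differs.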

If you want to do faster computing you can use a GPU but if you want to do much faster computing, you can use multiple GPUs.

You can count the number of GPUs PyTorch has access to using `torch.cuda.device_count()`.

In [5]:
# Count number of devices
torch.cuda.device_count()

1


Knowing the number of GPUs PyTorch has access to is helpful in case you want to run a specific process on one GPU and another process on another (PyTorch also has features to let you run a process across all GPUs).

### Putting tensors (and models) on the GPU

You can put tensors (and models, we'll see this later) on a specific device by calling `to(device)` on them. Where device is the target device you'd like the tensor (or model) to go to.

Why do this?

GPUs typically offer far faster numerical computing than CPUs for large workloads, and if a GPU isn't available, our device-agnostic code (see above) will still run on the CPU.

> **Note:** Putting a tensor on GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor on the target device, i.e. the same data will then exist on both CPU and GPU. If the tensor is already on the target device, the tensor itself is returned. To overwrite tensors, reassign them: `some_tensor = some_tensor.to(device)`
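
The return value of `to()` can be inspected on the CPU alone, so the sketch below runs without a GPU. It uses the `copy=True` argument of `Tensor.to()` to force a new tensor:

```python
import torch

tensor = torch.tensor([1, 2, 3])   # created on the CPU

# Already on the target device: to() returns the very same object
same = tensor.to("cpu")
print(same is tensor)

# copy=True forces a new tensor even when nothing needs to change
forced_copy = tensor.to("cpu", copy=True)
print(forced_copy is tensor)

# A dtype conversion also produces a new tensor
as_float = tensor.to(torch.float32)
print(as_float.dtype, as_float is tensor)
```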

Let's try creating a tensor and putting it on the GPU (if it's available).

In [6]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')

If you have a GPU available, the above code will output something like:
```
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
```

Notice the second tensor has `device='cuda:0'`, this means it's stored on the 0th GPU available (GPUs are 0 indexed, if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, up to `'cuda:n'`).

### Moving tensors back to CPU

What if we wanted to move the tensor back to CPU?

For example, you'll want to do this if you want to interact with your tensors with NumPy (NumPy does not leverage the GPU).

Let's try using the `torch.Tensor.numpy()` method on our `tensor_on_gpu`:

In [7]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Instead, to get a tensor back to CPU and usable with NumPy we can use `Tensor.cpu()`.

This copies the tensor to CPU memory so it's usable with CPUs.

In [8]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

The above returns a copy of the GPU tensor in CPU memory so the original tensor is still on GPU.

In [9]:
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')