# <span style="color:#0b486b">  FIT3181/5215: Deep Learning (2025)</span>
***
*CE/Lecturer (Clayton):*  **Dr Trung Le** | trunglm@monash.edu <br/>
*Lecturer (Clayton):* **A/Prof Zongyuan Ge** | zongyuan.ge@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Arghya Pal** | arghya.pal@monash.edu <br/>
<br/>
*Head Tutor 3181:*  **Ms Ruda Nie H** |  \[RudaNie.H@monash.edu \] <br/>
*Head Tutor 5215:*  **Ms Leila Mahmoodi** |  \[leila.mahmoodi@monash.edu \]

<br/> <br/>
Faculty of Information Technology, Monash University, Australia
***

# Tutorial 1a: Introduction to PyTorch


`Acknowledgement:` This tutorial is developed based on [this Google Colab tutorial](https://colab.research.google.com/github/mrdbourke/pytorch-deep-learning/blob/main/00_pytorch_fundamentals.ipynb#scrollTo=XFVEgrKhTGfD).

## Importing PyTorch

Let's start by importing PyTorch and checking its version.

In [None]:
import torch
torch.__version__

'2.8.0+cu126'

## Introduction to Tensors

First, we create a `scalar` with the value `7`.

In [None]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [None]:
print(scalar.item())

7


We now create a 1D tensor, i.e., a vector.

In [None]:
vector = torch.tensor([1.0,2,3,4])
vector

tensor([1., 2., 3., 4.])

In [None]:
vector.dtype

torch.float32
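`torch.tensor()` infers the dtype from the data. In the vector above, the single float literal `1.0` was enough to make the whole tensor `torch.float32`. Here is a minimal check, which assumes the default dtype has not been changed with `torch.set_default_dtype()`:

```python
import torch

# All-integer data is stored as 64-bit integers (torch.int64)
ints = torch.tensor([1, 2, 3, 4])
# A single float literal promotes the whole tensor to the default float dtype
floats = torch.tensor([1.0, 2, 3, 4])
# An explicit dtype overrides inference
explicit = torch.tensor([1, 2, 3, 4], dtype=torch.float64)

print(ints.dtype, floats.dtype, explicit.dtype)
```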

Next, we create a 2D tensor, i.e., a matrix.

In [None]:
matrix = torch.tensor([[1,2,3],[4,5,6]])
matrix

tensor([[1, 2, 3],
        [4, 5, 6]])

In [None]:
print(f"dim={matrix.ndim}, shape={matrix.shape}")

dim=2, shape=torch.Size([2, 3])
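The number of dimensions (`ndim`) always equals the number of entries in `shape`. A scalar has `ndim` 0 and an empty shape, which is why `.item()` worked on it earlier. `.item()` only works on tensors that hold exactly one element. A short sketch comparing the three tensors created so far:

```python
import torch

scalar = torch.tensor(7)
vector = torch.tensor([1.0, 2, 3, 4])
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim == len(shape); numel() is the total number of elements
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}, numel={t.numel()}")

# .item() only works on tensors holding exactly one element
try:
    vector.item()
except RuntimeError as err:
    print("vector.item() failed:", err)
```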


In [None]:
print(matrix.numpy())

[[1 2 3]
 [4 5 6]]


In [None]:
print(vector.dtype)

torch.float32


## Create Tensors in PyTorch

### Initialize Random Tensors

In [None]:
randImage = torch.rand(size=(3,4,2))
randImage

tensor([[[0.6273, 0.2020],
         [0.6826, 0.6025],
         [0.4144, 0.1575],
         [0.6356, 0.5551]],

        [[0.1151, 0.9065],
         [0.1521, 0.3052],
         [0.4901, 0.7904],
         [0.2815, 0.1881]],

        [[0.4170, 0.2158],
         [0.5183, 0.2882],
         [0.5080, 0.3973],
         [0.3764, 0.9321]]])

In [None]:
print(f"Shape: {randImage.shape}, ndim = {randImage.ndim}, height= {randImage.shape[0]}, width = {randImage.shape[1]}, depth = {randImage.shape[2]}")

Shape: torch.Size([3, 4, 2]), ndim = 3, height= 3, width = 4, depth = 2
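`torch.rand()` samples uniformly from [0, 1), so the values above change on every run. Setting a seed with `torch.manual_seed()` (also used in the linear-layer section) makes random tensors reproducible. A minimal sketch:

```python
import torch

torch.manual_seed(42)
first = torch.rand(size=(3, 4, 2))

torch.manual_seed(42)               # reset the generator to the same state
second = torch.rand(size=(3, 4, 2))

third = torch.rand(size=(3, 4, 2))  # no reset: the generator has moved on

print(torch.equal(first, second))   # same seed, same values
print(torch.equal(first, third))
```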


### Create One, Zero Tensors

In [None]:
ones = torch.ones(size= (2,3))
ones

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [None]:
zeros = torch.zeros(size = (2,3))
zeros

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [None]:
ones_like = torch.ones_like(zeros)
ones_like

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [None]:
zeros_like = torch.zeros_like(ones)
zeros_like

tensor([[0., 0., 0.],
        [0., 0., 0.]])
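The `*_like` functions copy the shape *and* the dtype (and device) of their argument unless you override them. `torch.full()` is a related constructor for any constant value. A short sketch:

```python
import torch

int_matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])             # dtype torch.int64

zeros_int = torch.zeros_like(int_matrix)                       # inherits shape and dtype
ones_float = torch.ones_like(int_matrix, dtype=torch.float32)  # dtype overridden
sevens = torch.full(size=(2, 3), fill_value=7.0)               # any constant value

print(zeros_int.dtype, zeros_int.shape)
print(ones_float.dtype)
print(sevens)
```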

### Create a Range and Reshape

In [None]:
aRange = torch.arange(start = 0.1, end = 5.0, step = 0.2)
aRange

tensor([0.1000, 0.3000, 0.5000, 0.7000, 0.9000, 1.1000, 1.3000, 1.5000, 1.7000,
        1.9000, 2.1000, 2.3000, 2.5000, 2.7000, 2.9000, 3.1000, 3.3000, 3.5000,
        3.7000, 3.9000, 4.1000, 4.3000, 4.5000, 4.7000, 4.9000])

In [None]:
matrix = aRange.reshape(shape = (5,5))
matrix

tensor([[0.1000, 0.3000, 0.5000, 0.7000, 0.9000],
        [1.1000, 1.3000, 1.5000, 1.7000, 1.9000],
        [2.1000, 2.3000, 2.5000, 2.7000, 2.9000],
        [3.1000, 3.3000, 3.5000, 3.7000, 3.9000],
        [4.1000, 4.3000, 4.5000, 4.7000, 4.9000]])
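`torch.arange()` excludes `end`. With a non-integer step, floating-point rounding can affect whether the last value is included. `torch.linspace()` instead takes the number of points and includes both endpoints, which is often safer for float ranges. The sketch below also shows `-1` in `reshape`, which asks PyTorch to infer that dimension:

```python
import torch

a = torch.arange(start=0.1, end=5.0, step=0.2)     # end is excluded
b = torch.linspace(start=0.1, end=4.9, steps=25)   # both endpoints included

print(a.numel(), b.numel())
print(torch.allclose(a, b))   # equal up to floating-point tolerance

# -1 lets reshape infer one dimension from the total number of elements
print(a.reshape(5, -1).shape)
```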

### From NumPy Arrays

We first import `numpy` and then create a numpy array.

In [None]:
import numpy as np

In [None]:
npArray = np.arange(1,21, dtype = np.float32)
npArray

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20.], dtype=float32)

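We reshape the NumPy array to shape `(5, 4)`. Passing `-1` for the second dimension lets NumPy infer it from the total number of elements (20 / 5 = 4).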

In [None]:
npArray = npArray.reshape((5,-1))
print(npArray.shape)
npArray

(5, 4)


array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.],
       [17., 18., 19., 20.]], dtype=float32)

We create a PyTorch tensor from the NumPy array with `torch.from_numpy()`. The tensor keeps the array's dtype (`float32` here).

In [None]:
torchArray = torch.from_numpy(npArray)
torchArray

tensor([[ 1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.],
        [ 9., 10., 11., 12.],
        [13., 14., 15., 16.],
        [17., 18., 19., 20.]])
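`torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` makes an independent copy. The following sketch modifies the NumPy array in place and shows which tensor sees the change:

```python
import numpy as np
import torch

npArray = np.arange(1, 21, dtype=np.float32).reshape((5, -1))

shared = torch.from_numpy(npArray)   # shares memory with npArray
copied = torch.tensor(npArray)       # independent copy of the data

npArray[0, 0] = 100.0                # modify the NumPy array in place

print(shared[0, 0].item())   # reflects the change
print(copied[0, 0].item())   # unchanged
```

The same sharing applies in the other direction: calling `.numpy()` on a CPU tensor returns an array backed by the tensor's memory.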

We cast the float tensor to a 16-bit integer tensor with `.type(torch.int16)`, which returns a new tensor.

In [None]:
torchArray = torchArray.type(torch.int16)
torchArray

tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12],
        [13, 14, 15, 16],
        [17, 18, 19, 20]], dtype=torch.int16)

In [None]:
print(torchArray.numpy())

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]
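Casting from float to integer drops the fractional part, truncating toward zero, rather than rounding. `.to(dtype)` is an equivalent way to write the cast. A small sketch with non-integer values:

```python
import torch

vals = torch.tensor([1.7, -1.7, 2.5])

# .type(dtype) and .to(dtype) both return a new tensor with the requested dtype
as_int = vals.type(torch.int16)
also_int = vals.to(torch.int16)

print(as_int)                         # fractional parts are dropped
print(torch.equal(as_int, also_int))
print(vals.round().to(torch.int16))   # round first if rounding is what you want
```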


## Tensor datatypes

There are many different [tensor datatypes available in PyTorch](https://pytorch.org/docs/stable/tensors.html#data-types). Some are specific to the CPU and some are better suited to the GPU. Getting to know which is which can take some time.

Generally, if you see `torch.cuda` anywhere, the tensor is being used on a GPU (NVIDIA GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is `torch.float32` or `torch.float`. This is referred to as "32-bit floating point". But there are also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`). And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.
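The bit width determines how much memory each element takes. `Tensor.element_size()` reports the bytes per element. A quick sketch comparing several dtypes:

```python
import torch

sizes = {}
for dtype in [torch.float16, torch.float32, torch.float64, torch.int8, torch.int64]:
    t = torch.zeros(1000, dtype=dtype)
    sizes[dtype] = t.element_size()   # bytes per element
    # total storage for the elements is numel() * element_size()
    print(f"{dtype}: {t.element_size()} bytes/element, "
          f"{t.numel() * t.element_size()} bytes for {t.numel()} elements")
```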


In [None]:
# Default datatype for float tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # None: infer the dtype from the data (float32 for Python floats by default)
                               device=None, # None: use the default device (the CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded for autograd

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [None]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16
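When tensors of different dtypes meet in an arithmetic operation, PyTorch promotes them to a common dtype instead of failing. `torch.result_type()` tells you which dtype will be used. A sketch:

```python
import torch

float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0])
int_tensor = torch.tensor([1, 2, 3])

print((float_16_tensor * float_32_tensor).dtype)  # promoted to float32
print((int_tensor * float_32_tensor).dtype)       # int64 with float32 -> float32
print(torch.result_type(float_16_tensor, float_32_tensor))
```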

## Getting information from tensors

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before, but three of the most common attributes you'll want to find out about tensors are:
* `shape` - what shape is the tensor? (some operations require specific shape rules)
* `dtype` - what datatype are the elements within the tensor stored in?
* `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.6577, 0.3406, 0.6845, 0.0493],
        [0.0033, 0.3951, 0.7220, 0.0098],
        [0.6388, 0.4054, 0.1279, 0.8850]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
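Tensors are created on the CPU by default. A common pattern picks the GPU when one is available and moves tensors with `.to()`. The sketch below runs on a CPU-only machine too; it simply stays on `"cpu"` there:

```python
import torch

# Pick the GPU if one is visible to PyTorch, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

some_tensor = torch.rand(3, 4)
on_device = some_tensor.to(device)   # copy on `device` (or the same tensor if it is already there)

print(on_device.device)

# NumPy only works with CPU memory, so move back before calling .numpy()
back_on_cpu = on_device.cpu().numpy()
print(back_on_cpu.shape)
```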


## Manipulating tensors (tensor operations)

In deep learning, data (images, text, video, audio, protein structures, etc.) gets represented as tensors. A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.
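Division appears in the list above but is not shown in the cells that follow. On integer tensors, `/` is true division and returns floats, while `//` is floor division and keeps integers. A short sketch:

```python
import torch

tensor = torch.tensor([1, 2, 3])

print(tensor / 2)    # true division: float result even for integer inputs
print(tensor // 2)   # floor division: result stays integer
print(torch.div(tensor, 2, rounding_mode="floor"))  # function form of //
```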

### Basic operations

Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`).

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor1 = tensor + 10
tensor1

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor2 = tensor * 10
tensor2

tensor([10, 20, 30])

In [None]:
# Subtract 10 (stored in a new variable; `tensor` itself is unchanged)
tensor3 = tensor - 10
tensor3

tensor([-9, -8, -7])

PyTorch also has a bunch of built-in functions like [`torch.mul()`](https://pytorch.org/docs/stable/generated/torch.mul.html#torch.mul) (short for multiplication) and [`torch.add()`](https://pytorch.org/docs/stable/generated/torch.add.html) to perform basic operations.

In [None]:
# Can also use torch functions: torch.mul() (alias torch.multiply()) is the function form of `*`
torch.mul(tensor, 10)

tensor([10, 20, 30])

However, it's more common to use operator symbols such as `*` instead of `torch.mul()`.

In [None]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])
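Element-wise operations also work between tensors of different but compatible shapes through broadcasting. Dimensions of size 1, or missing leading dimensions, are stretched to match. The exercise at the end relies on this when it adds a `(1, 2)` bias to a `(32, 2)` matrix. A small sketch:

```python
import torch

col = torch.tensor([[1], [2], [3]])   # shape (3, 1)
row = torch.tensor([10, 20, 30])      # shape (3,), treated as (1, 3)

# both operands are stretched to (3, 3) before multiplying element-wise
table = col * row
print(table.shape)
print(table)
```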


### Matrix multiplication (is all you need)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

PyTorch implements matrix multiplication in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) function, which is also available as the tensor method `.matmul()` and as the `@` operator.



In [None]:
A = torch.arange(1,7, dtype = torch.float32).reshape((3,-1))
print(f"A={A}\n shape = {A.shape}")

A=tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
 shape = torch.Size([3, 2])


In [None]:
B = torch.rand((2,4))
print(f"B={B}\n shape = {B.shape}")

B=tensor([[0.2553, 0.8564, 0.4074, 0.2563],
        [0.5789, 0.2519, 0.2289, 0.4719]])
 shape = torch.Size([2, 4])


Matrix `A` has shape $3 \times 2$ and `B` has shape $2 \times 4$. The inner dimensions match (2 and 2), so we can multiply them, and the result takes the outer dimensions, $3 \times 4$.

In [None]:
C1 = torch.matmul(A,B)
C1

tensor([[1.4131, 1.3602, 0.8652, 1.2001],
        [3.0814, 3.5768, 2.1378, 2.6564],
        [4.7498, 5.7934, 3.4103, 4.1128]])

In [None]:
C2 = A.matmul(B)
C2

tensor([[1.4131, 1.3602, 0.8652, 1.2001],
        [3.0814, 3.5768, 2.1378, 2.6564],
        [4.7498, 5.7934, 3.4103, 4.1128]])

In [None]:
C3 = A@B
C3

tensor([[1.4131, 1.3602, 0.8652, 1.2001],
        [3.0814, 3.5768, 2.1378, 2.6564],
        [4.7498, 5.7934, 3.4103, 4.1128]])

All three forms compute the same product, so `C1`, `C2`, and `C3` should be identical; we check this with assertions.

In [None]:
assert torch.equal(C1,C2)
assert torch.equal(C1,C3)
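Concretely, entry `(i, j)` of the product is the dot product of row `i` of `A` with column `j` of `B`. This is why an `(n, k)` matrix times a `(k, m)` matrix gives an `(n, m)` matrix. Below is a deliberately slow loop version, for illustration only. `naive_matmul` is a hypothetical helper, not a PyTorch function:

```python
import torch

def naive_matmul(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Illustrative triple-loop matrix product; use torch.matmul in practice."""
    n, k = A.shape
    k2, m = B.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    C = torch.zeros(n, m, dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            # dot product of row i of A and column j of B
            C[i, j] = (A[i, :] * B[:, j]).sum()
    return C

A = torch.arange(1, 7, dtype=torch.float32).reshape((3, -1))
B = torch.rand((2, 4))
C = naive_matmul(A, B)
print(C.shape)
print(torch.allclose(C, A @ B))
```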

### One of the most common errors in deep learning (shape errors)

Much of deep learning consists of multiplying matrices and performing other operations on them, and matrix multiplication has a strict rule about which shapes can be combined. As a result, shape mismatches are among the most common errors you'll run into.

In [None]:
# For matrix multiplication the inner dimensions must match
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

To fix this, we can transpose `tensor_B` before multiplying. `tensor_B.T` has shape `(2, 3)`, so `(3, 2) @ (2, 3)` is valid and gives a `(3, 3)` result.

In [None]:
torch.matmul(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
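Transposing the other operand also satisfies the rule but produces a different result shape. Which one is correct depends on what the computation means. The sketch below checks shapes before multiplying. `can_matmul` is a hypothetical helper for 2D tensors, not a PyTorch function:

```python
import torch

tensor_A = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 10], [8, 11], [9, 12]], dtype=torch.float32)

def can_matmul(a: torch.Tensor, b: torch.Tensor) -> bool:
    # For 2D tensors the inner dimensions must match: (n, k) @ (k, m)
    return a.shape[-1] == b.shape[0]

print(can_matmul(tensor_A, tensor_B))   # (3, 2) @ (3, 2)
print((tensor_A @ tensor_B.T).shape)    # (3, 2) @ (2, 3) -> (3, 3)
print((tensor_A.T @ tensor_B).shape)    # (2, 3) @ (3, 2) -> (2, 2)

try:
    tensor_A @ tensor_B
except RuntimeError as err:
    print("RuntimeError:", err)
```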

### Linear Layer

Neural networks are full of matrix multiplications and dot products.

The [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weight matrix `W`, followed by adding a bias `b`. PyTorch stores `W` with shape `(out_features, in_features)`, so the computation is

$$
y = x W^\top + b
$$


In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = size of the last dimension of the input
                         out_features=6) # out_features = size of the last dimension of the output
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
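`nn.Linear` keeps its parameters in `linear.weight` and `linear.bias`. Because the weight has shape `(out_features, in_features)`, the layer's output should match `x @ weight.T + bias`. The following check re-creates the layer above with the same seed:

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)

# weight is (out_features, in_features), bias is (out_features,)
print(linear.weight.shape, linear.bias.shape)

manual = x @ linear.weight.T + linear.bias   # y = x W^T + b
print(torch.allclose(linear(x), manual))
```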


### Finding the min, max, mean, sum, etc (aggregation)

Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values).

First we'll create a tensor and then find the max, min, mean and sum of it.

In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this would error: x has an integer dtype (torch.int64)
print(f"Mean: {x.type(torch.float32).mean()}") # works after casting to a float dtype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450

**Note:** Some methods, such as `torch.mean()`, require tensors to be in `torch.float32` (the most common) or another floating-point (or complex) datatype. Calling `x.mean()` on the integer tensor above fails with `RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long`.

You can also do the same as above with `torch` functions.

In [None]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
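All of these aggregations also accept a `dim` argument, which reduces along one dimension only instead of over the whole tensor. With `dim`, `max()` and `min()` return both the values and their indices. A sketch on a small matrix:

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

print(m.sum())          # all elements -> tensor(21.)
print(m.sum(dim=0))     # collapse rows: one value per column, shape (3,)
print(m.mean(dim=1))    # collapse columns: one value per row, shape (2,)
values, indices = m.max(dim=1)   # with dim, max returns (values, indices)
print(values, indices)
```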

You can also find the index of a tensor where the max or minimum occurs with [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html) and [`torch.argmin()`](https://pytorch.org/docs/stable/generated/torch.argmin.html) respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)).

In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
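Softmax preserves the ordering of its inputs. Because of this, the index of the largest probability is the same as the index of the largest raw score (logit). A small sketch using `dim=1` to get one prediction per row:

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 3.0, 0.2]])

probs = torch.softmax(logits, dim=1)   # each row now sums to 1

# softmax is monotonic, so the largest logit is also the largest probability
print(probs.argmax(dim=1))
print(torch.equal(probs.argmax(dim=1), logits.argmax(dim=1)))
```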


### Reshaping, stacking, squeezing and unsqueezing

Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

To do so, some popular methods are:

| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`); all `tensors` must be the same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions with value `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension value of `1` added at `dim`. |
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. |


In [None]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

Let us add an extra dimension with reshape.

In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [None]:
x_reshaped[0][0] =100
print(x_reshaped)
print(x)


tensor([[100.,   2.,   3.,   4.,   5.,   6.,   7.]])
tensor([100.,   2.,   3.,   4.,   5.,   6.,   7.])


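Because `x` is contiguous, `reshape()` returned a view here, so modifying `x_reshaped` also modified `x`. In general, `reshape()` may return either a view or a copy, so code should not rely on either behavior.

We can also change the view with `Tensor.view()`.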

In [None]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape

(tensor([[100.,   2.,   3.,   4.,   5.,   6.,   7.]]), torch.Size([1, 7]))

Remember, though, that changing the view of a tensor with `Tensor.view()` only creates a new view of the *same* tensor.

So modifying the view modifies the original tensor too.

In [None]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
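The difference between `view()` and `reshape()` shows up with non-contiguous tensors, such as a transpose. `view()` never copies, so it fails when the requested shape is incompatible with the memory layout. `reshape()` falls back to a copy in that case. A sketch:

```python
import torch

m = torch.arange(6).reshape(2, 3)
t = m.T                       # transpose: same data, non-contiguous layout
print(t.is_contiguous())      # False

flat = t.reshape(6)           # reshape copies when a view is impossible
print(flat)

try:
    t.view(6)                 # view never copies, so this fails
except RuntimeError as err:
    print("view failed:", err)
```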

If we wanted to stack our tensor on top of itself four times, we could do so with `torch.stack()`.

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
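The `dim` argument controls where the new dimension goes. By contrast, `torch.cat()` joins tensors along an *existing* dimension without adding a new one. A shape-only sketch:

```python
import torch

x = torch.tensor([5., 2., 3., 4., 5., 6., 7.])

print(torch.stack([x, x, x, x], dim=0).shape)  # new leading dim -> (4, 7)
print(torch.stack([x, x, x, x], dim=1).shape)  # new trailing dim -> (7, 4)
print(torch.cat([x, x, x, x], dim=0).shape)    # no new dim, joined end to end -> (28,)
```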

How about removing all single dimensions from a tensor?

To do so you can use `torch.squeeze()`, (you can remember this as *squeezing* the tensor to only have dimensions over 1).

In [None]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped

x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])
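`unsqueeze()` from the table above does the opposite of `squeeze()`: it adds a size-1 dimension at a chosen position. Passing `dim` to `squeeze()` restricts it to that one dimension. A sketch:

```python
import torch

x = torch.tensor([5., 2., 3., 4., 5., 6., 7.])

row = x.unsqueeze(dim=0)     # add a size-1 dim at the front -> (1, 7)
col = x.unsqueeze(dim=1)     # add a size-1 dim at the end   -> (7, 1)
print(row.shape, col.shape)

# squeeze(dim) only removes that dim, and only if it has size 1
batch = torch.zeros(1, 3, 1)
print(batch.squeeze().shape)       # all size-1 dims removed -> (3,)
print(batch.squeeze(dim=0).shape)  # only dim 0 removed -> (3, 1)
```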


You can also rearrange the order of the axes with `torch.permute(input, dims)`, which returns a view of `input` with its dimensions in the new order `dims`.

In [None]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])
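Because `permute()` returns a view, the permuted tensor shares storage with the original. Only the order in which the dimensions are traversed changes. A sketch:

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# element [h=0, w=0, c=0] of the original is element [c=0, h=0, w=0] of the view
x_original[0, 0, 0] = 728218.0
print(x_permuted[0, 0, 0].item())   # the change is visible through the view
print(x_permuted.is_contiguous())   # False; call .contiguous() for a copy in the new layout
```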


## Indexing (selecting data from tensors)

Sometimes you'll want to select specific data from tensors (for example, only the first column or second row).

To do so, you can use indexing. If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

In [None]:
# Create a tensor
import torch
x = torch.arange(1, 19).reshape(2, 3, 3)
x, x.shape

(tensor([[[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9]],
 
         [[10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]]]),
 torch.Size([2, 3, 3]))

Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [None]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}")
print(f"Second square bracket: {x[0][0]}")
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


In [None]:
# Get index 0 of the 0th and 1st dimensions and the first two values of the 2nd dimension
x[0,0,:2]

tensor([1, 2])

In [None]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[ 2,  5,  8],
        [11, 14, 17]])

In [None]:
# Get all values of the 0th dimension but only index 1 of the 1st and 2nd dimensions
x[:, 1, 1]

tensor([ 5, 14])

In [None]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])
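As with Python lists and NumPy arrays, negative indices count from the end and slices can take a step. A boolean mask (as in NumPy) selects the elements where the condition holds and returns them as a 1D tensor. A sketch on the same `x`:

```python
import torch

x = torch.arange(1, 19).reshape(2, 3, 3)

print(x[-1, -1, -1])     # negative indices count from the end -> last element
print(x[1, :, ::2])      # every second column of the second 3x3 block
print(x[x > 15])         # boolean mask selects matching elements (1D result)
```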

**<span style="color:red">Exercise 1</span>:** Write the code to
- Create a random 2D tensor X with dimension [32,10] (i.e., batch size = 32 and #features = 10).
- Create a random weight matrix W with dimension [10,2] and random bias b with dimension [1,2].
- Do the computation y = XW + b and pred_probs = softmax(y).

In [None]:
#Your answer here
X = torch.rand(size = (32,10))
W = torch.rand(size = (10,2))
b = torch.rand(size = (1,2))

y = X@W + b
print(f"y= {y.numpy()}")
pred_probs = torch.softmax(y, dim = 1)
print("After applying softmax")
print(f"pred_probs= {pred_probs.numpy()}")



y= [[2.9189672 2.7134569]
 [3.7784767 2.9404783]
 [3.9150856 3.1120372]
 [1.640368  1.7490172]
 [2.5118155 2.3188438]
 [2.7548802 2.001188 ]
 [3.0449328 2.696517 ]
 [2.9017928 3.047488 ]
 [2.8464084 2.8868349]
 [3.4406192 2.6447623]
 [2.750129  2.1783385]
 [2.0372329 2.0123913]
 [2.557089  2.4492266]
 [2.409139  2.4995751]
 [2.8411162 1.8130183]
 [3.6155918 3.0054197]
 [2.5423052 2.5432036]
 [3.0963573 3.0764377]
 [3.325526  3.1045728]
 [3.3856661 2.81844  ]
 [3.5100179 3.0037894]
 [3.1010141 3.0228896]
 [3.1659684 2.788439 ]
 [3.4248302 3.0028336]
 [2.7364833 3.0551102]
 [2.4372964 2.4860277]
 [2.378605  1.6766052]
 [3.3089933 2.6664968]
 [2.6150877 2.498794 ]
 [2.4560242 2.2900624]
 [2.038861  1.9990573]
 [3.331908  3.3023894]]
After applying softmax
pred_probs= [[0.5511975  0.44880247]
 [0.6980435  0.30195653]
 [0.6906262  0.30937383]
 [0.47286436 0.5271356 ]
 [0.5480938  0.45190626]
 [0.67998266 0.32001734]
 [0.5862334  0.41376662]
 [0.46364048 0.5363595 ]
 [0.48989478 0.51010525]
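The answer above prints the tensors but does not check them. Below are a few sanity checks one might add: the output shapes, and that each softmax row is a valid probability distribution. This sketch re-creates the tensors so it runs on its own, which means the random values will differ from the printout above:

```python
import torch

X = torch.rand(size=(32, 10))
W = torch.rand(size=(10, 2))
b = torch.rand(size=(1, 2))

y = X @ W + b                        # (32, 10) @ (10, 2) -> (32, 2); b broadcasts over rows
pred_probs = torch.softmax(y, dim=1)

assert y.shape == (32, 2)
# every row of pred_probs is non-negative and sums to 1
assert torch.all(pred_probs >= 0)
assert torch.allclose(pred_probs.sum(dim=1), torch.ones(32))
print(pred_probs.argmax(dim=1)[:5])  # predicted class for the first five samples
```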


----

**The end**