# PyTorch Tutorial

This tutorial is mostly based on:

* https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
* https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

In [1]:
import numpy as np
import torch

print(torch.__version__)

2.6.0+cu124


## What is PyTorch?
PyTorch is a framework for working with deep neural networks.

## Why use PyTorch?
- Autograd functionality computes gradients automatically
- PyTorch integrates well with the Python data science stack, including NumPy, SciPy and Pandas for efficient data loading and processing pipelines
- GPU acceleration allows for fast training and inference, utilizing the parallel processing power of GPUs
- A large library of useful deep learning functions and modules is already built in, enabling faster development and deployment of models


## Overview
- `torch.Tensor` basic tensor operation
- `torch.Tensor.grad` auto-differentiation
- `torch.cuda` devices other than CPU
- `torch.nn` neural network blocks
- `torch.utils.data` dataset and dataloader

## PyTorch Tensors

PyTorch tensors are just like NumPy arrays, and they include many of the same operations you are used to from NumPy.

Construct a tensor of size $5 \times 3$ with random values:



In [2]:
x = torch.rand(5, 3)
print(x)

tensor([[0.4120, 0.6185, 0.7979],
        [0.3137, 0.9068, 0.4318],
        [0.4224, 0.8480, 0.2113],
        [0.5459, 0.9087, 0.9181],
        [0.2700, 0.7042, 0.5308]])


Construct a matrix filled with zeros and of dtype int:



In [3]:
# Create a 5x3 tensor filled with zeros using int32 datatype
# torch.int32 provides a balance between memory usage and range
x = torch.zeros(5, 3, dtype=torch.int32)  # torch.int = torch.int32 = 32-bit signed integer, range of (-2^31) to (2^31 - 1)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]], dtype=torch.int32)


Construct a matrix filled with zeros and of dtype long:


In [4]:
# Create a 5x3 tensor filled with zeros using int64 (long) datatype
# Use when working with large integers or when indexing large tensors
x = torch.zeros(5, 3, dtype=torch.long)  # torch.long = torch.int64 = 64-bit signed integer, range of (-2^63) to (2^63 - 1)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Make a tensor from a list of values:

In [5]:
# PyTorch automatically infers appropriate datatypes based on input values
# Integer literals (no decimal point) create tensors with torch.int64 (long) datatype
x = torch.tensor([1, 2, 3])
print(x)
print(x.dtype)

# Floating point literals with decimal points create tensors with torch.float32 datatype
# float32 is the default floating-point precision in PyTorch
x = torch.tensor([1., 2., 3.])
print(x)
print(x.dtype)

tensor([1, 2, 3])
torch.int64
tensor([1., 2., 3.])
torch.float32


Create a tensor based on another tensor (inherit size and dtype, unless otherwise specified):

By default, the returned Tensor of `new_ones` has the same torch.dtype and torch.device as input tensor.

In [6]:
# Create a new tensor with the same dtype and device as x, but filled with ones
# new_* methods (like new_ones) inherit properties (dtype and device) from the source tensor
# new_* methods take output matrix size as input parameters
x = x.new_ones(5, 3)
print(x)

# Create a tensor with same shape as x but filled with random values from normal distribution
# Note: randn samples from standard normal distribution (mean=0, std=1)
x = torch.randn_like(x, dtype=torch.float)  # override dtype!
print(x)                                    # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[ 0.0668, -0.3701,  1.0264],
        [ 1.0402, -1.9680,  0.8407],
        [-2.0102, -0.0924,  0.2377],
        [ 0.0984,  0.1048, -0.1059],
        [ 0.4637,  1.1193, -1.7230]])


Get the size object of a tensor, an object which supports tuple operations:

In [7]:
# size() is a method that returns a torch.Size object
# It shows the dimensions of the tensor in each axis
print(x.size())

# shape is an attribute that provides the same information as size()
# This is equivalent to size() but follows NumPy's style convention
x.shape

torch.Size([5, 3])


torch.Size([5, 3])

Operations on tensors use similar syntax to NumPy:

In [8]:
# Create two 5x3 tensors filled with ones
x = torch.ones(5, 3)
y = torch.ones(5, 3)

# Element-wise addition using operator syntax (creates a new tensor)
print("x + y:", x + y)

# Element-wise addition using function syntax (equivalent to operator syntax)
print("torch.add(x, y):", torch.add(x, y))

# Original tensors remain unchanged
print("x:", x)
print("y:", y)

x + y: tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
torch.add(x, y): tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
x: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
y: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])


PyTorch also supports in-place operations (method names end in '_'):

In [9]:
# In-place operations modify the tensor directly and are denoted by a trailing underscore (_)
# add_ adds x to y and stores the result in y (like y += x; no new tensor is created)
y.add_(x)
print(y)

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])


#### Qn: What happens if you run y.add_(x) again?

In [10]:
# What happens if you run y.add_(x) again?
# y.add_(x) will add x to y again, but this time it will use the updated y value
# This is because in-place operations modify the tensor in place
# So if you run y.add_(x) again, it will use the new value of y after the first addition
y.add_(x)
print(y)  # the addition was applied in place to the already-updated y, so y is now filled with 3s


tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])


#### Qn: What function does "Run before" command serve in Google colab?

In [11]:
# NOTE: For jupyter notebooks, do not run cells that update values of certain variables,
# tensors or have inplace operations multiple times!
# This is because the variable values will be updated and the next cells that use those variables will give incorrect results.

# Solution for Google colab:
# Step 1: Edit >> Clear all outputs without restarting the kernel
# Step 2: Runtime >> Interrupt execution (this will interrupt the execution of the current cell)
# Step 3: Runtime >> Restart session (this will restart the kernel and clear all variables)
# Step 4: Runtime >> Run before (this will run all the cells before the current cell)

# These steps ensure that the variable values are not changed unexpectedly!

Example of broadcasting:

To learn more:
- Broadcasting basics using Numpy: https://numpy.org/doc/stable/user/basics.broadcasting.html
- Visual Example of Broadcasting using Numpy:  https://numpy.org/doc/stable/_images/broadcasting_2.png
- <a href="https://numpy.org/doc/stable/user/basics.broadcasting.html">
  <img src="https://numpy.org/doc/stable/_images/broadcasting_2.png"
       alt="Broadcasting example" width="500"/>
</a>
- Broadcasting basics using PyTorch: https://pytorch.org/docs/stable/notes/broadcasting.html

In [12]:
import numpy as np
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])  # b is broadcast to match the shape of a

print(f"Shape of a: {a.shape}")
print(f"Shape of b: {b.shape}")
a + b


Shape of a: (4, 3)
Shape of b: (3,)


array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

#### Qn: What happens if a has shape (4,3) and b has shape (4,)?

In [None]:
# b = np.array([1.0, 2.0, 3.0, 4.0])
# a + b
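
A minimal sketch of what the question above runs into, assuming the same `(4, 3)` and `(4,)` shapes. NumPy compares shapes starting from the trailing dimension, so `3` is matched against `4` and the addition fails. The names `failed` and `result` are only illustrative.

```python
import numpy as np

a = np.zeros((4, 3))
b = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,)

# Trailing dimensions are compared first: 3 (from a) vs 4 (from b) do not match.
try:
    a + b
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Giving b an explicit trailing axis of size 1 turns it into a (4, 1) column,
# which can be broadcast across the 3 columns of a.
result = a + b[:, np.newaxis]
print(result.shape)
```

If the intent is to add one value per row, reshaping `b` into a column as shown is one way to do it.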

In [14]:
import numpy as np

# NumPy broadcasting: scalar to array
# The scalar value 3 is broadcast to match the (shape of x) = (5,3)
x = np.ones((5,3))  # 5x3 np.array filled with ones
y = 3  # y is broadcast to a 5x3 array filled with 3s to match x
print("x + y:", x + y)


x + y: [[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]


In [15]:
# PyTorch broadcasting: scalar to tensor
# Similar to NumPy, PyTorch broadcasts the scalar value 2 to match x's shape
x = torch.ones(5, 3)
y = 2
print("x + y:", x + y)


x + y: tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])


In [16]:
# PyTorch broadcasting: tensor to tensor with different shapes
x = torch.ones(5, 3)
y = torch.ones(5, 1)

print(f"x is {x}, \n\nShape of x is {x.shape}\n")
print(f"y is {y}, \n\nShape of y is {y.shape}")


x is tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]), 

Shape of x is torch.Size([5, 3])

y is tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.]]), 

Shape of y is torch.Size([5, 1])


#### Qn: Which dimension will y be broadcast along?


In [17]:
# Here, y has shape (5,1) and is broadcast across the second dimension to match x's shape (5,3)
# The value from each row in y is added to each element in the corresponding row of x
print("x + y:", x + y)  # y is broadcast to shape (5, 3)


x + y: tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])


In [19]:
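# Failing case: conflicting dimensions
# x = torch.ones(5, 3)
# y = torch.ones(3, 5)
# print("x + y:", x + y)

# Shapes are compared from the trailing dimension backwards; each pair of sizes
# must be equal, or one of them must be 1. Here 3 vs 5 and 5 vs 3 both conflict.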
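
The broadcasting rule can be checked without building large tensors. The sketch below uses `torch.broadcast_shapes` for a compatible pair and a plain addition for an incompatible pair; the variable names `shape_ok` and `raised` are only illustrative.

```python
import torch

x = torch.ones(5, 3)

# Compatible: sizes are compared right to left; 3 vs 1 is fine, 5 vs 5 is fine.
shape_ok = torch.broadcast_shapes(x.shape, (5, 1))
print(shape_ok)

# Incompatible: 3 vs 5 and 5 vs 3, and neither size in a pair is 1.
try:
    x + torch.ones(3, 5)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```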

Indexing works as you would expect:

In [20]:
x = torch.randn(5, 3)
print(x)
print(x[1:4, :])

tensor([[-1.2331, -1.6004,  2.1488],
        [-0.2056, -0.4587,  1.4655],
        [-0.7613,  0.3486,  1.1519],
        [ 1.0689,  1.8702, -0.5663],
        [-1.2982, -0.3522, -0.3863]])
tensor([[-0.2056, -0.4587,  1.4655],
        [-0.7613,  0.3486,  1.1519],
        [ 1.0689,  1.8702, -0.5663]])


In [23]:
# Boolean-mask indexing: set every negative entry to 0
x[x < 0] = 0
print(x)

tensor([[0.0000, 0.0000, 2.1488],
        [0.0000, 0.0000, 1.4655],
        [0.0000, 0.3486, 1.1519],
        [1.0689, 1.8702, 0.0000],
        [0.0000, 0.0000, 0.0000]])


You can change the order of the dimensions of a tensor with `torch.permute()`:

In [24]:
# Create a 3D tensor with random values from a normal distribution
# Shape (5, 3, 2) means:
#   - 5 elements in the first dimension (think of as "layers") == Index 0
#   - 3 elements in the second dimension (think of as "rows") == Index 1
#   - 2 elements in the third dimension (think of as "columns") == Index 2
x = torch.randn(5, 3, 2)  # five 3x2 matrices stacked into one 3-D tensor
print(f"x is {x}, \n\nShape of x is {x.shape}")


x is tensor([[[-5.0827e-02,  6.0196e-01],
         [-8.8637e-01,  1.3100e+00],
         [ 8.6924e-01,  1.0600e+00]],

        [[-6.7277e-01, -7.4261e-01],
         [ 1.3902e-03,  7.7815e-01],
         [ 4.8914e-01,  2.0327e+00]],

        [[-3.1641e-01, -1.2154e+00],
         [-4.2746e-01,  1.6590e+00],
         [ 6.0726e-02,  2.3989e-01]],

        [[-9.1434e-01,  3.0674e-01],
         [-1.2861e+00, -7.9680e-01],
         [-5.5386e-01, -1.3207e+00]],

        [[-2.8182e-01, -1.0545e-01],
         [ 1.6844e-01, -2.3084e-01],
         [-1.2874e+00, -4.2593e-01]]]), 

Shape of x is torch.Size([5, 3, 2])


#### Qn: What would permute(1,2,0) achieve? Is the data still the same?

In [27]:
# Ans: no values are changed or copied; only the order in which the dimensions are indexed changes.
# permute returns a view of the same data with its dimensions reordered; it does not alter the stored values.
# permute(1,2,0) changes the order of dimensions from (0,1,2) to (1,2,0)
# Above, x reads as 5 blocks of 3x2 matrices;
# below, the permuted view reads as 3 blocks of 2x5 matrices.
# This is like transposing a matrix, but generalized to higher dimensions

print(f"x.permute(1,2,0) is:\n{x.permute(1,2,0)}\n")
print(f"Shape of x.permute(1,2,0) is: {x.permute(1,2,0).shape}")

x.permute(1,2,0) is:
tensor([[[-5.0827e-02, -6.7277e-01, -3.1641e-01, -9.1434e-01, -2.8182e-01],
         [ 6.0196e-01, -7.4261e-01, -1.2154e+00,  3.0674e-01, -1.0545e-01]],

        [[-8.8637e-01,  1.3902e-03, -4.2746e-01, -1.2861e+00,  1.6844e-01],
         [ 1.3100e+00,  7.7815e-01,  1.6590e+00, -7.9680e-01, -2.3084e-01]],

        [[ 8.6924e-01,  4.8914e-01,  6.0726e-02, -5.5386e-01, -1.2874e+00],
         [ 1.0600e+00,  2.0327e+00,  2.3989e-01, -1.3207e+00, -4.2593e-01]]])

Shape of x.permute(1,2,0) is: torch.Size([3, 2, 5])
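
To make the "same data, different indexing" point concrete, here is a small sketch that uses `torch.arange` instead of random values so individual elements are easy to follow. It checks that the permuted tensor shares storage with the original and that only the strides differ; the variable names are illustrative.

```python
import torch

x = torch.arange(30).reshape(5, 3, 2)
p = x.permute(1, 2, 0)

print(p.shape)                       # dimensions reordered
print(x.data_ptr() == p.data_ptr())  # both tensors start at the same memory address
print(x.stride(), p.stride())        # only the strides differ
print(p.is_contiguous())             # the permuted view is not contiguous

# Element (i, j, k) of x is element (j, k, i) of the permuted view.
same = x[4, 2, 1] == p[2, 1, 4]
print(bool(same))
```

Because the result is a view, writing into `p` would also change `x`; calling `.contiguous()` on the view is one way to get a separately laid out copy.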


Tensor data types and casting:

In [28]:
# torch.double is equivalent to torch.float64 (64-bit floating point)
a = torch.ones(3, 3, dtype=torch.double)
print(a)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)


In [29]:
# Converting tensor to long (int64) data type
# .long() is a convenience method equivalent to a.to(torch.int64)
b = a.long()
print(f"b is {b}, \n\nType of b is {b.type()}")

# Converting tensor to int (int32) data type
# .int() is a convenience method equivalent to a.to(torch.int32)
c = a.int()
print(f"c is {c}, \n\nType of c is {c.type()}")

# Check the value of a again! Casting to a different dtype returns a new tensor with its own memory, so a is unchanged and does not share memory with b or c
print(f"a is {a}, \n\nType of a is {a.type()}")


b is tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]), 

Type of b is torch.LongTensor
c is tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int32), 

Type of c is torch.IntTensor
a is tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64), 

Type of a is torch.DoubleTensor


Documentation on various dtypes: https://pytorch.org/docs/stable/tensors.html

### More useful PyTorch Tensor operations

To see the complete API check here: https://pytorch.org/docs/stable/tensors.html

`.view()` can be used to resize/reshape tensors:


In [32]:
x = torch.randn(4, 4)
y = x.view(16)  # flatten into a 1-D tensor of 16 elements
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
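
One detail worth seeing in code: `.view()` never copies, so the result shares memory with the original, and it only works when the requested shape is compatible with the tensor's memory layout. The sketch below illustrates both points, using `.reshape()` as a fallback; the names `view_failed` and `flat` are illustrative.

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)
y[0] = 7.0       # writing through the view...
print(x[0, 0])   # ...changes x as well

t = x.t()        # transpose: a non-contiguous view of x
try:
    t.view(16)   # cannot be expressed as a view of this layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(16)  # reshape falls back to copying when a view is impossible
print(view_failed, flat.shape)
```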


If you have a one element tensor, use `.item()` to get the value as a Python number:


In [35]:
x = torch.ones(4,5)
x = x.sum()
print(x)
print(x.item())  # item() returns the value as a plain Python number

tensor(20.)
20.0


Concatenating two matrices together:

In [38]:
x = torch.ones(5, 3)
y = torch.zeros(5, 2)
print(torch.cat([x, y], dim=1))


tensor([[1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.]])


#### Qn: What happens if you try to concatenate along dim=0?

In [41]:
# Ans: RuntimeError: Sizes of tensors must match except in dimension 0
try:
    print(torch.cat([x, y], dim=0))
except RuntimeError:
    print("RuntimeError: all dimensions except dim 0 must match")
# x is 5x3 and y is 5x2.
# dim=0 concatenates along the first axis (rows), so every dimension other than dim 0 must match.
# Here the second axis (columns) differs (3 vs 2), so an error is raised.
print(torch.cat([x, y], dim=1))  # concatenating along dim=1 works

RuntimeError: all dimensions except dim 0 must match
tensor([[1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 0., 0.]])


### Converting between NumPy arrays and PyTorch Tensors

Important: a CPU PyTorch Tensor and the NumPy array it is converted to or from share the same underlying memory. If you change the values of one, the values of the other change too.

Convert PyTorch Tensor to NumPy array:

This conversion requires the source tensor to be on the CPU.

In [44]:
a = torch.ones(5)
print(f"a is {a}")

# Convert PyTorch tensor to NumPy array
# Note: This conversion only works for source tensors on CPU, not for GPU tensors
# Use .numpy() to convert a tensor to a NumPy array
b = a.numpy()  # no data is copied: b refers to the same memory as a
print(f"b is {b}")

# Perform an in-place addition of 1 to the tensor
a.add_(1)

a is tensor([1., 1., 1., 1., 1.])
b is [1. 1. 1. 1. 1.]


tensor([2., 2., 2., 2., 2.])

In [45]:
# Both the tensor and array now contain 2's instead of 1's
# This demonstrates that the tensor and array share the same underlying memory
print(f"a is {a}")  # PyTorch tensor is modified
print(f"b is {b}")  # NumPy array is also modified because of shared memory!

a is tensor([2., 2., 2., 2., 2.])
b is [2. 2. 2. 2. 2.]


Convert NumPy array to PyTorch Tensor:

In [51]:
# Conversely, to convert a NumPy array to a tensor:
a = np.ones(5)
b = torch.from_numpy(a)  # like .numpy(), from_numpy() shares memory instead of copying
print(type(b))
print(a)
print(b)

<class 'torch.Tensor'>
[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)


#### Qn: What happens if I run the cell below twice?

In [52]:
# Perform in-place addition on the NumPy array
# The 'out=a' parameter means the result is stored back in 'a' (in-place operation)
np.add(a, 1, out=a)  # add_ is a PyTorch method; in NumPy, pass out= to np.add() to write the result in place

# Both the NumPy array and PyTorch tensor now contain 2's instead of 1's
# This demonstrates the shared memory between PyTorch tensors and NumPy arrays
print(f"a is {a}")
print(f"b is {b}")

a is [2. 2. 2. 2. 2.]
b is tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
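
When the shared memory is not wanted, an explicit copy breaks the link. The sketch below contrasts `torch.from_numpy` (shared) with `torch.tensor` (copy), and shows `.clone()` before `.numpy()` for the other direction; the variable names are illustrative.

```python
import numpy as np
import torch

arr = np.ones(3)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # copies the data into a new tensor

arr += 1                        # in-place update of the NumPy array
print(shared)                   # follows the change
print(copied)                   # unaffected

t = torch.ones(3)
independent = t.clone().numpy() # clone first, so the array has its own memory
t.add_(1)
print(independent)              # unaffected by the in-place add on t
```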


## CUDA Tensors (On GPU)

PyTorch tensors have the added benefit that they can easily be placed on a GPU to speed up computations.

Query information about the GPU (if CUDA is available):

In [53]:
if torch.cuda.is_available():
    !nvidia-smi

Sun May 11 15:28:59 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   39C    P8              9W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

You can use `torch.device` objects to move tensors to and from the GPU:

In [55]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on the GPU
    x = x.to(device)                       # or just use strings `.to("cuda")`
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # `.to` can also change dtype

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]], device='cuda:0')
tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]], dtype=torch.float64)


The current CUDA device (the GPU that a bare `"cuda"` refers to) can be set with [`torch.cuda.set_device(device)`](https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html).
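
A common pattern, sketched here under the assumption that the code should also run on machines without a GPU, is to pick the device once and pass it around. The tensors and names below are illustrative only.

```python
import torch

# Fall back to the CPU when CUDA is not available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5, 3, device=device)  # created directly on the chosen device
y = torch.ones_like(x)               # *_like functions inherit the device of x
z = (x + y).to("cpu")                # move the result back to the CPU

print(device, z.device)
```

Written this way, the same cell runs unchanged on both CPU-only and GPU runtimes.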


## Autograd: Automatic Differentiation

From: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

Now that you have learned how to use PyTorch Tensors you will learn how we can use PyTorch for automatic differentiation.

The `autograd` package in PyTorch provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

To allow PyTorch to keep track of operations for automatic differentiation, we need to set `requires_grad` as `True` for a Tensor. Autograd will then start to track all operations on the Tensor. When you finish your computation you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the `.grad` attribute.

To stop a tensor from tracking history, you can call `.detach()` to detach
it from the computation history, and to prevent future computation from being
tracked.

To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This can be particularly helpful when evaluating a model because the model may have trainable parameters with `requires_grad=True`, but for which we don't need the gradients.

There’s one more class which is very important for autograd implementation - a `Function`.

`Tensor` and `Function` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a `.grad_fn` attribute that references a `Function` that has created
the `Tensor` (except for Tensors created by the user - their `grad_fn` is `None`).

If you want to compute the derivatives, you can call `.backward()` on a `Tensor`. If the `Tensor` is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to `backward()`; however, if it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape.

Create a tensor and set `requires_grad=True` to track computation with it:

In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

Perform a tensor operation:

In [None]:
y = x + 2
print(y)

`y` was created as a result of an operation, so it has a `grad_fn`:

In [None]:
print(y.grad_fn)

Do more operations on `y`:

In [None]:
z = y * y * 3
print(z)
out = z.mean()
print(out)

`.requires_grad_()` changes an existing Tensor's `requires_grad` flag in-place. A tensor created without the flag has `requires_grad=False`; calling `.requires_grad_()` with no argument sets it to `True`:




#### Qn: What if you initialize a tensor and later want to track its gradients?

In [None]:
a = torch.randn(2, 2)

# element-wise operations
a = ((a * 3) / (a - 1))

# Check if tensor 'a' is set up to track gradients (for automatic differentiation)
# By default, tensors don't track gradients to save memory and computation
print(f"Tensor 'a' requires_grad: {a.requires_grad}")

In [None]:
# Enable gradient tracking for tensor 'a'
# This is necessary when you want to compute derivatives with respect to this tensor
# The underscore in requires_grad_() indicates this method modifies the tensor in-place
a.requires_grad_(True)
print(f"Tensor 'a' requires_grad: {a.requires_grad}")

# Create a new tensor 'b' by performing element-wise operations on 'a'
# When operations are performed on tensors with requires_grad=True,
# PyTorch builds a computational graph to track operations for backpropagation
b = (a * a).sum()

# grad_fn shows the last operation used to create this tensor
print(f"b.grad_fn: {b.grad_fn}")

In [None]:
b.backward()  # Compute gradients

# Now 'a' would have its gradient populated
print(a.grad)  # Shows db/da - the gradient of b with respect to a

### Gradients

Let's backprop now. Because `out` contains a single scalar, `out.backward()` is equivalent to `out.backward(torch.tensor(1.))`:



In [None]:
out.backward()

Print gradients $\frac{d(\texttt{out})}{d\texttt{x}}$:

In [None]:
print(x.grad)

You should get a matrix of `4.5`. Let's call the `out` *Tensor* "$o$". We find that $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence $\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
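
The derivation above can be checked numerically. This is a self-contained sketch that rebuilds the same computation, $o = \frac{1}{4}\sum_i 3(x_i+2)^2$, and compares autograd's result against the analytic gradient $\frac{3}{2}(x_i+2)$; the name `analytic` is illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()  # same computation as y = x + 2; z = y * y * 3; out = z.mean()
out.backward()

# Analytic gradient from the derivation: d(out)/dx_i = 1.5 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(out.item())
print(x.grad)
print(torch.allclose(x.grad, analytic))
```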

You can do many crazy things with autograd!



In [None]:
# requires_grad=True enables automatic differentiation for tensor x
x = torch.randn(3, requires_grad=True)

# This operation is tracked in the computational graph since x requires gradients
y = x * 2

while y.data.norm() < 1000:   # y.data exposes y's values without gradient tracking (it shares memory with y; it is not a copy)
    y = y * 2

print(y)

#### Quick recap: When calling .backward() on a non-scalar tensor, we must provide an external gradient that represents "how some scalar value changes with respect to each element of y" (∂L/∂y)

In [None]:
# Create a gradient tensor that represents ∂L/∂y for some hypothetical scalar function L
# The values [0.1, 1.0, 0.0001] represent ∂L/∂y₁, ∂L/∂y₂, and ∂L/∂y₃ respectively
# if y contributes to some scalar value L, these are the ∂L/∂y values
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)

# Perform backpropagation through the computational graph
# PyTorch uses the chain rule: ∂L/∂x = (∂L/∂y)(∂y/∂x)
# where ∂L/∂y is our provided gradient tensor
y.backward(gradients)

# Print the gradient of x
# This shows ∂L/∂x - the gradient of our hypothetical scalar function with respect to x
# The values combine: (how our scalar output changes with y) x (how y changes with x)
# This is the chain rule in action: ∂L/∂x = (∂L/∂y)(∂y/∂x)
print(x.grad)

# THINK: multi-dimensional output but need gradients for optimization of a single
# scalar quantity such as a loss function!
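
One way to see what the `gradient` argument does: passing a vector `v` to `y.backward(v)` gives the same result as calling `.backward()` on the scalar `(v * y).sum()`. The sketch below uses fixed values instead of random ones so the result is easy to verify by hand; the names `v`, `grad_vjp`, and `grad_scalar` are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])  # plays the role of dL/dy

# Route 1: non-scalar backward with an explicit upstream gradient.
y = x * 2
y.backward(v)
grad_vjp = x.grad.clone()

# Route 2: build the scalar L = sum(v * y) and call backward() with no arguments.
x.grad = None                         # reset the accumulated gradient
L = (v * (x * 2)).sum()
L.backward()
grad_scalar = x.grad.clone()

print(grad_vjp)     # 2 * v, since dy_i/dx_i = 2
print(grad_scalar)  # same values
```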

You can also stop autograd from tracking history on Tensors with `requires_grad=True` by wrapping the code block in a `with torch.no_grad():` block:

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

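The `.detach()` method returns a new tensor that shares the same data but is cut off from the computation graph (`requires_grad=False`). Use it when you want to operate on a tensor's values without affecting the gradient computation of the original tensor.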

In [None]:
x = torch.randn(3, requires_grad=True)
print(x.requires_grad)

y = x.detach()
print(y.requires_grad)

z = y * 2
print(z.requires_grad)

#### Qn: Which scenarios would benefit from .detach()?

In [None]:
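# Ans: Any place where you need a tensor's values but not its gradient history,
# e.g. logging or plotting a loss, or reusing model outputs as fixed targets.
# For validation and testing, .detach() (or, more commonly, torch.no_grad()) saves memory
# by not tracking gradients for those computations.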
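
A small sketch of how `.detach()` changes the gradient that flows back. Here `y = x ** 2` is treated as a constant once detached, so the gradient of `sum(y * x)` is `x ** 2` rather than `3 * x ** 2`. The example values and names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x ** 2
z = (y.detach() * x).sum()  # y.detach() is treated as a constant by autograd
z.backward()
print(x.grad)               # x**2, not 3 * x**2

y_values = y.detach()
print(y_values.requires_grad)                # the detached tensor does not track gradients
print(y_values.data_ptr() == y.data_ptr())   # but it shares memory with y
```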

**Read Later:**

Documentation of `torch.autograd` and `Function` is at
https://pytorch.org/docs/stable/autograd.html

## Neural Networks

The `torch.nn` package in PyTorch provides higher level building blocks for neural networks like fully connected or convolutional layers. The `nn` package makes use of the `autograd` functionality to define these model building blocks and differentiate them. This allows us to quickly and easily implement neural networks by putting together layers and using PyTorch to help us update learnable parameters with the gradient.

An `nn.Module` contains layers, and a method `forward(input)` that
returns the `output`.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  `weight = weight - learning_rate * gradient`
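
The steps above can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: the synthetic linear data, the single `nn.Linear` layer standing in for "the network", the learning rate, and the number of steps. The update is written out by hand to mirror `weight = weight - learning_rate * gradient`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: y = X @ w_true + 0.5
X = torch.randn(64, 3)
w_true = torch.tensor([[1.0], [-2.0], [0.5]])
y = X @ w_true + 0.5

model = nn.Linear(3, 1)      # network with learnable weight and bias
criterion = nn.MSELoss()
learning_rate = 0.1          # assumed value for this sketch
losses = []

for step in range(100):
    pred = model(X)                   # process input through the network
    loss = criterion(pred, y)         # compute the loss
    model.zero_grad()                 # clear gradients left over from the previous step
    loss.backward()                   # propagate gradients back to the parameters
    with torch.no_grad():             # the update itself must not be tracked
        for param in model.parameters():
            param -= learning_rate * param.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
```

In practice the manual update is usually replaced by an optimizer from `torch.optim`, but the loop structure stays the same.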

### Define the network

In the cell below we define a simple convolutional neural network. Notice that we use the `nn.Conv2d` and `nn.Linear` Modules as building blocks for the network.

There are plenty of other types of layers and tools available in the [torch.nn](https://pytorch.org/docs/stable/nn.html) package such as pooling layers, dropout, and batchnorm.

Conveniently, PyTorch is completely open source, so you can check out exactly how each of these Modules is implemented:

* https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py
* https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/conv.py

**Important:** Whenever you extend the `nn.Module` class (e.g. with the `Net` class below), you must call the superclass constructor, or an error will be thrown. In the example below, this line is `super().__init__()`.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
print(net)

You just have to define the ``forward`` function, and the ``backward``
function (where gradients are computed) is automatically defined for you
using ``autograd``.
You can use any of the Tensor operations in the ``forward`` function.

The learnable parameters of a model are returned by ``net.parameters()``



In [None]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

In [None]:
# Let us look at all of the conv1 weights
# Note: Conv1 layer has 6 filters, each of size 5x5
print(f"conv 1 layer weights are {params[0]}")
# Let us look at all of the conv1 biases
# Note: Conv1 layer has 6 filters, each with a bias
print(f"conv 1 layer biases are {params[1]}")

Continuing, let's try a random 32x32 input

In [None]:
# [1 sample, 1 channel, 32 height, 32 width]
input = torch.randn(1, 1, 32, 32)

# Preferred: calling the module goes through __call__, which runs any registered hooks and then forward()
out = net(input)
print(out)


In [None]:
# Calling forward() directly bypasses __call__, so registered hooks are skipped (dropout and batch norm depend on net.train()/net.eval(), not on how forward is invoked)
net.forward(input)

- Zero the gradient buffers of all parameters
- Backpropagate from `out` with a random upstream gradient, which fills `.grad` for every parameter that was used to compute `out`



In [None]:
net.zero_grad()  # important, since gradient is accumulated
out.backward(torch.randn(1, 10))

In [None]:
# to check gradient buffer:
# net.conv1.bias.grad.shape
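
Why zeroing matters: `.backward()` adds to whatever is already stored in `.grad`. The sketch below shows this on a single tensor rather than the full network; the tensor `w` and the names `first`, `second`, and `third` are illustrative.

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()   # gradient of sum(2 * w) is 2 for each element

(w * 2).sum().backward()
second = w.grad.clone()  # accumulated: 2 + 2 = 4

w.grad.zero_()           # reset the buffer, as net.zero_grad() does for every parameter
(w * 2).sum().backward()
third = w.grad.clone()   # back to 2

print(first, second, third)
```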

The [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) module can sometimes be helpful to define blocks succinctly or avoid creating a new `nn.Module` class for a small network. The `.forward()` function will be automatically defined by running modules in the order they are passed in to `nn.Sequential`.

For example, you can define a block of convolutional layers below:

In [None]:
conv_layers = nn.Sequential(
                nn.Conv2d(1, 6, 5),
                nn.ReLU(),
                nn.Conv2d(6, 16, 5),
                nn.ReLU()
            )

In [None]:
# Example
# Complete LeNet-like CNN model using our conv_layers block
class MnistCNN(nn.Module):
    def __init__(self):
        super(MnistCNN, self).__init__()
        # Reuse our predefined convolutional layers
        self.features = conv_layers

        # Add a pooling layer and fully connected layers
        self.pool = nn.MaxPool2d(2, 2)
        self.classifier = nn.Sequential(
            nn.Linear(16 * 10 * 10, 120),  # 16 channels, 10x10 feature maps after pooling a 28x28 input
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        # First convolutional block (changes size from 28x28 to 24x24, then to 20x20)
        x = self.features(x)

        # Pooling (reduces size from 20x20 to 10x10)
        x = self.pool(x)

        # Flatten for the fully connected layers
        x = torch.flatten(x, 1)

        # Classifier
        x = self.classifier(x)
        return x
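
A quick way to find the input size of the first fully connected layer is to push a dummy input through the convolution and pooling layers and print the shape after each step. The sketch below does this for a 28x28 single-channel input (an assumption based on the comments in `forward`) using freshly created layers with the same settings; the flattened size printed at the end is what the first `nn.Linear` layer's `in_features` has to equal for that input resolution.

```python
import torch
import torch.nn as nn

# Same layer settings as the conv block and pooling layer above, created fresh for this check.
layers = [
    ("conv 5x5", nn.Conv2d(1, 6, 5)),
    ("conv 5x5", nn.Conv2d(6, 16, 5)),
    ("max pool 2x2", nn.MaxPool2d(2, 2)),
]

x = torch.randn(1, 1, 28, 28)  # assumed input: one 28x28 grayscale image
shapes = [tuple(x.shape)]
with torch.no_grad():
    for name, layer in layers:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{name:>12}: {tuple(x.shape)}")

flat_features = x.flatten(start_dim=1).shape[1]
print(flat_features)
```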

**Note:**

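`torch.nn` modules are designed around mini-batches: they expect inputs that are a mini-batch of samples rather than a single sample. (Many layers in recent PyTorch releases also accept unbatched input, but adding an explicit batch dimension is the convention followed here.)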

For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.

If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.
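
A short sketch of adding and removing the batch dimension, assuming a single 1-channel 32x32 image like the one used earlier. The convolution layer here is a fresh, untrained `nn.Conv2d` used only to show the resulting shape.

```python
import torch
import torch.nn as nn

image = torch.randn(1, 32, 32)  # one sample: (channels, height, width)
batch = image.unsqueeze(0)      # add a batch dimension: (1, 1, 32, 32)

conv = nn.Conv2d(1, 6, 5)
features = conv(batch)          # output keeps the batch dimension
print(batch.shape, features.shape)

restored = batch.squeeze(0)     # drop the batch dimension again
print(restored.shape)
```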

Before proceeding further, let's recap all the classes you’ve seen so far.

**Recap:**
- `torch.Tensor` - A *multi-dimensional array* with support for autograd operations like `backward()`. Also *holds the gradient* w.r.t. the tensor.
- `nn.Module` - Neural network module. *Convenient way of encapsulating parameters*, with helpers for moving them to GPU, exporting, loading, etc.
- `nn.Parameter` - A kind of Tensor that is *automatically registered as a parameter when assigned as an attribute to a* `Module`.
- `autograd.Function` - Implements *forward and backward definitions of an autograd operation*. Every `Tensor` operation creates at least a single `Function` node that connects to the functions that created a `Tensor` and *encodes its history*.

**At this point, we covered:**
- Defining a neural network
- Processing inputs and calling backward

**Still Left:**
- Computing the loss
- Updating the weights of the network

### Loss Function

A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) under the `nn` package. A simple loss is `nn.MSELoss`, which computes the mean-squared error between the input and the target.

For example:

In [None]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
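
To make the definition concrete, the sketch below checks `nn.MSELoss` against the mean of squared differences computed by hand. The tensors `pred` and `tgt` are made-up stand-ins for a network output and its target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(1, 10)  # stand-in for a network output
tgt = torch.randn(1, 10)   # stand-in for the target

mse = nn.MSELoss()         # the default reduction is 'mean'
loss_builtin = mse(pred, tgt)

# Mean of the element-wise squared differences
loss_manual = ((pred - tgt) ** 2).mean()

print(torch.allclose(loss_builtin, loss_manual))
```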

Now, if you follow `loss` in the backward direction, using its `.grad_fn` attribute, you will see a graph of computations that looks like this:

    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss

So, when we call `loss.backward()`, the loss is differentiated w.r.t. every Tensor in the graph that has `requires_grad=True`, and each of those Tensors has the resulting gradient accumulated into its `.grad` Tensor.

For illustration, let us follow a few steps backward:

In [None]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[1][0])  # ReLU

### Backprop

To backpropagate the error, all we have to do is call `loss.backward()`. You need to clear the existing gradients first, though; otherwise the new gradients will be added to the ones already stored.

Now we'll call `loss.backward()`, and have a look at conv1's bias
gradients before and after the backward step.

In [None]:
net.zero_grad()  # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Now, we have seen how to use loss functions.

**The only thing left to learn is:**

- Updating the weights of the network

### Update the weights

The simplest update rule used in practice is the Stochastic Gradient
Descent (SGD):

     weight = weight - learning_rate * gradient

We can implement this using simple Python code:

```python
learning_rate = 0.01
with torch.no_grad():  # the update itself should not be tracked by autograd
    for p in net.parameters():
        p -= p.grad * learning_rate
```

However, as you use neural networks, you'll want to use various different
update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
To enable this, PyTorch has a small package: `torch.optim` that
implements all these methods. Using it is very simple:

In [None]:
import torch.optim as optim

# Create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# In your training loop:
optimizer.zero_grad()             # zero the gradient buffers
output = net(input)               # compute the forward pass
loss = criterion(output, target)  # compute the loss
loss.backward()                   # compute the gradients
optimizer.step()                  # update the parameters

print(loss)
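
As a rough check that `optim.SGD` (without momentum) performs the plain update rule `weight = weight - learning_rate * gradient`, the sketch below applies one manual step and one optimizer step to two identical copies of a tiny model. The model, data, and names (`model_a`, `model_b`, `loss_fn`) are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lr = 0.01

# Two identical copies of a tiny stand-in model
model_a = nn.Linear(4, 2)
model_b = copy.deepcopy(model_a)

x = torch.randn(8, 4)
y = torch.randn(8, 2)
loss_fn = nn.MSELoss()

# (a) Manual update: weight = weight - learning_rate * gradient
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# (b) The same step through torch.optim
opt = optim.SGD(model_b.parameters(), lr=lr)
opt.zero_grad()
loss_fn(model_b(x), y).backward()
opt.step()

# Both copies should now hold the same parameters
same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```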

**Important:** Note how gradient buffers had to be manually set to zero using `optimizer.zero_grad()`. This is because gradients are accumulated, so if you don't zero gradients before each `backward()` call, you will begin accumulating gradients from previous forward/backward passes.
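
The accumulation is easy to observe on a single tensor. In this small sketch (the tensor `w` and the function `3 * w` are arbitrary choices), calling `backward()` twice without zeroing doubles the stored gradient:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw = [3, 3]
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added on top
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then run backward again: back to [3, 3]
w.grad.zero_()
(3 * w).sum().backward()
after_zero = w.grad.clone()

print(first)        # tensor([3., 3.])
print(accumulated)  # tensor([6., 6.])
print(after_zero)   # tensor([3., 3.])
```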

#### Note on eval and train modes

**Important**: If you use layers in your network like `torch.nn.Dropout` or `torch.nn.BatchNorm2d` which have different behavior during training and evaluation, you will need to make sure the modules in your network are appropriately set. PyTorch makes this easy with `eval` and `train` methods for any network extending `nn.Module`. Before beginning training you will call `net.train()` to set all modules in the network to train mode, and equivalently before evaluating you should call `net.eval()`.
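
A small sketch of the difference, using a standalone `nn.Dropout` layer (the layer, its probability, and the all-ones input are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()          # training mode: elements are randomly zeroed
out_train = layer(x)   # surviving elements are scaled by 1 / (1 - p) = 2

layer.eval()           # evaluation mode: dropout passes the input through
out_eval = layer(x)

print(layer.training)                  # False
print((out_train == 0).any().item())   # some elements were dropped in train mode
print(torch.equal(out_eval, x))        # True
```

Calling `.train()` or `.eval()` on a network sets the same `training` flag on every submodule it contains.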

## Training a Classifier

Now that you have seen the basics of how to define neural networks, compute losses, and make training updates, you will see how a simple classifier is trained in PyTorch on CIFAR-10.

### What about data?

Generally, when you have to deal with image, text, audio, or video data,
you can use standard Python packages that load data into a NumPy array.
Then you can convert this array into a `torch.*Tensor`.

-  For images, packages such as Pillow and OpenCV are useful
-  For audio, packages such as SciPy and librosa are useful
-  For text, either raw Python or Cython-based loading, or NLTK and
   spaCy are useful

Specifically for vision, we have created a package called
`torchvision`, that has data loaders for common datasets such as
Imagenet, CIFAR10, MNIST, etc., models for common architectures, and data transformers for images.

This provides a huge convenience and avoids writing boilerplate code.

For this tutorial, we will use the CIFAR10 dataset.
It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of
size $3 \times 32 \times 32$, i.e. 3-channel color images of $32 \times 32$ pixels in size.

### Training an image classifier

We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using `torchvision`
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

### 1) Loading and normalizing CIFAR10

Using `torchvision`, it’s extremely easy to load CIFAR10.



In [None]:
import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets return `PIL` images. `transforms.ToTensor` converts them to Tensors with values in the range [0, 1], and `transforms.Normalize` (here with mean 0.5 and standard deviation 0.5 for each channel) then rescales them to the range [-1, 1].

The [transforms package](https://pytorch.org/vision/stable/transforms.html) has other functions that you might use for **data augmentation**. For example, `torchvision.transforms.RandomResizedCrop` and `torchvision.transforms.RandomHorizontalFlip`.

In [None]:
# Transforms
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Datasets
trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=train_transform)
testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=test_transform)

# Data loaders
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=4, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=4, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
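
To see why `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` maps values from [0, 1] to [-1, 1], here is a sketch of the per-channel arithmetic `(x - mean) / std` written out by hand. The random tensor `img` stands in for a `ToTensor()` output and is an assumption for illustration.

```python
import torch

# Stand-in for a ToTensor() output: 3 channels, values in [0, 1)
img = torch.rand(3, 32, 32)

# Shape (3, 1, 1) so that each channel gets its own mean and std
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# What Normalize computes, channel by channel
normalized = (img - mean) / std

# Inverting the formula recovers the original values
restored = normalized * std + mean

print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
print(torch.allclose(restored, img))
```

With a mean and standard deviation of 0.5, the inverse simplifies to `x / 2 + 0.5`.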

Let us show some of the training images, for fun.



In [None]:
import matplotlib.pyplot as plt
import numpy as np


def imshow(img):
    """Function to display an image."""
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.axis('off')


# Get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)


# Show images
imshow(torchvision.utils.make_grid(images))
# Print labels
print('      '.join('%5s' % classes[labels[j]] for j in range(4)))

#### Training on GPU

Just like how you transfer a Tensor onto the GPU, you transfer the neural
net onto the GPU.

Let's first define our device as the first visible cuda device if we have
CUDA available:

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

If `device` is in fact set to a CUDA device, then the following call will recursively go over all modules and convert their parameters and buffers to CUDA tensors:

```python
net = net.to(device)
```

Remember that you will have to send the inputs and targets at every step
to the GPU too:

```python
inputs, labels = inputs.to(device), labels.to(device)
```
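
A minimal sketch for checking where things live, using a tiny stand-in model (`model` and `batch` are hypothetical names; on a machine without CUDA everything simply stays on the CPU):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Module.to() moves the module's parameters and returns the module itself
model = nn.Linear(4, 2).to(device)

# Tensor.to() returns a tensor on the target device, so keep the result
batch = torch.randn(8, 4)
batch = batch.to(device)

print(next(model.parameters()).device)
print(batch.device)

# The forward pass works because model and input are on the same device
out = model(batch)
print(out.device)
```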

Why don't I notice a MASSIVE speedup compared to CPU? Because your network
is *realllly* small.

**Exercise:** Try increasing the width of your network (argument 2 of
the first `nn.Conv2d`, and argument 1 of the second `nn.Conv2d` –
they need to be the same number), see what kind of speedup you get.

### 2) Define a Convolutional Neural Network

Copy the neural network from the Neural Networks section above and modify it to
take 3-channel images (instead of the 1-channel images it was defined for).



In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.out_channels1 = 6
        self.out_channels2 = 16
        self.conv1 = nn.Conv2d(3, self.out_channels1, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(self.out_channels1, self.out_channels2, 5)
        self.fc1 = nn.Linear(self.out_channels2 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net().to(device)
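
The `5 * 5` in `fc1` follows from the layer sizes. As a sketch, the standard output-size formula for unpadded, undilated convolution and pooling layers, `(size + 2 * padding - kernel) // stride + 1`, can be applied step by step. The helper `out_size` is a hypothetical name introduced only for this illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32x32
size = out_size(size, kernel=5)            # conv1: 32 -> 28
size = out_size(size, kernel=2, stride=2)  # pool:  28 -> 14
size = out_size(size, kernel=5)            # conv2: 14 -> 10
size = out_size(size, kernel=2, stride=2)  # pool:  10 -> 5

print(size)              # 5
print(16 * size * size)  # 400 input features for fc1
```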

### 3) Define a Loss function and optimizer

Let's use a Classification Cross-Entropy loss and SGD with momentum.



In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
#optimizer = optim.Adam(net.parameters(), lr=0.001)
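
Note that `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class indices, which is why `Net.forward` ends with a plain linear layer and no softmax. The sketch below, with made-up logits and labels, shows that the loss equals a log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # raw scores: batch of 4, 10 classes
class_ids = torch.tensor([3, 0, 9, 1])   # target class indices (dtype int64)

loss = nn.CrossEntropyLoss()(logits, class_ids)

# Equivalent: log-softmax over the class dimension, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), class_ids)

print(torch.allclose(loss, manual))
```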

### 4) Train the network


This is when things start to get interesting.
We simply have to loop over our data iterator, and feed the inputs to the
network and optimize.



In [None]:
import time


net = net.train()

# Loop over the dataset for multiple epochs
for epoch in range(1, 3):
    running_loss = 0.0
    t_s = time.time()

    # For each mini-batch...
    for i, data in enumerate(trainloader, 1):
        # Get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 0:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch, i, running_loss / 2000))
            running_loss = 0.0
            print('iters time:', time.time() - t_s)
            t_s = time.time()

print('Finished Training')

Let’s quickly save our trained model:

In [None]:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

We can load a saved model back in with the following (note: saving and re-loading the model wasn’t necessary here; we only did it to illustrate how to do so):

In [None]:
net = Net().to(device)
# map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine
net.load_state_dict(torch.load(PATH, map_location=device))

### 5) Test the network on the test data

We have trained the network for 2 passes over the training dataset.
But we need to check if the network has learned anything at all.

We will check this by predicting the class label that the neural network
outputs, and checking it against the ground-truth. If the prediction is
correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display an image from the test set to get familiar.

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('Ground truth:')
print('      '.join('%5s' % classes[labels[j]] for j in range(4)))

Okay, now let us see what the neural network thinks these examples above are:



In [None]:
net = net.eval()

outputs = net(images.to(device))

The outputs are energies for the 10 classes.
The higher the energy for a class, the more the network
thinks that the image is of that particular class.
So, let's get the index of the highest energy:



In [None]:
_, predicted = torch.max(outputs, 1)

imshow(torchvision.utils.make_grid(images))
print('Predicted:')
print('      '.join('%5s' % classes[predicted[j]] for j in range(4)))
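
As a small sketch of what `torch.max(outputs, 1)` returns, here it is applied to made-up scores for 2 images and 3 classes. The values are illustrative, and the softmax step is an optional addition for turning energies into probabilities.

```python
import torch
import torch.nn.functional as F

# Made-up scores for 2 images and 3 classes
scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# torch.max along dim 1 returns the max value and its index for each row
values, indices = torch.max(scores, 1)
print(indices)                        # tensor([0, 2])
print(torch.argmax(scores, dim=1))    # tensor([0, 2])

# Softmax turns the energies into probabilities that sum to 1 per row
probs = F.softmax(scores, dim=1)
print(probs.sum(dim=1))

# Softmax preserves the ordering, so the predicted classes are unchanged
print(torch.argmax(probs, dim=1))     # tensor([0, 2])
```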

The results seem pretty good.

Let us look at how the network performs on the whole dataset.



In [None]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d%%' % (
    100 * correct / total))

That looks waaay better than chance, which is 10% accuracy (randomly picking
a class out of 10 classes).
Seems like the network learned something.

Hmmm, which classes performed well, and which classes did
not perform well?

In [None]:
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(labels.size(0)):  # works for any batch size
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s: %2d%%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

### Other information

How to write data loading code in PyTorch: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

More details on saving and loading models: https://pytorch.org/tutorials/beginner/saving_loading_models.html

## Other Tips and Helpful Functions


### Tips for debugging

Checklist for common PyTorch mistakes:

* Did you set `shuffle=True` in your train dataloader?
* Did you properly set `net.train()` and `net.eval()` in your training and evaluation code?
* Did you call `zero_grad()` in your training loop before `.backward()` to prevent gradients from accumulating?

Other tips:
* Have you visualized your loaded images? This is the best way to catch data loader issues.
* If you are getting a CUDA out of memory error, first try decreasing the batch size. If you are still getting the same error, your network may simply be too large, or you could be accidentally allocating a large array in memory.
* If the GPU memory is full, first try clearing the outputs and restarting the kernel. If that does not work, you can release cached GPU memory that is no longer in use with `torch.cuda.empty_cache()`.
* Getting CUDA errors that are hard to understand? Error messages are sometimes simpler if you move your network to the CPU to debug the forward and backward passes.



### Pretrained models

PyTorch makes it easy to load many pretrained models. You can find a wide variety of vision models pretrained for different tasks in the `torchvision` package: https://pytorch.org/vision/stable/models.html

To load a ResNet50 model pretrained on ImageNet:

In [None]:
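from torchvision.models import resnet50, ResNet50_Weights

# `weights=` replaces the deprecated `pretrained=True` argument (torchvision >= 0.13)
resnet = resnet50(weights=ResNet50_Weights.DEFAULT)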

It's common that you may want to finetune some or all of the weights in a pretrained model. You can check here for more details on how to do this: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
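
One common pattern, sketched here under simplifying assumptions, is to freeze the pretrained weights and train only a new final layer. To keep the snippet self-contained and download-free it uses a small stand-in model; `backbone` and `head` are hypothetical names. With the ResNet50 loaded above, the final layer would be the `fc` attribute instead.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for a pretrained network: a "backbone" plus a final classification layer
backbone = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
head = nn.Linear(16, 1000)
model = nn.Sequential(backbone, head)

# 1) Freeze all existing weights
for p in model.parameters():
    p.requires_grad = False

# 2) Replace the final layer; parameters of a new module require gradients by default
model[1] = nn.Linear(16, 10)  # e.g. 10 target classes

# 3) Give the optimizer only the trainable parameters
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.001, momentum=0.9)

print(len(trainable))  # 2: weight and bias of the new layer
```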


There are even more pretrained models available on the PyTorch Hub: https://pytorch.org/hub/

### More Tensor operations

The `torch.einsum` function offers a compact way to express various matrix transformations and products. Many common matrix and vector computations can be expressed elegantly with a single call to this function.

Some simple examples are below, but you can find many more example einsum operations in this helpful blog post: https://rockt.github.io/2018/04/30/einsum


In [None]:
x = torch.arange(6).reshape(2, 3)
print('x: ', x)

# matrix transpose
out = torch.einsum('ij->ji', [x])
print(out)

# sum of each row (sums over the column index j)
out = torch.einsum('ij->i', [x])
print(out)

# sum all the values in a matrix
out = torch.einsum('ij->', [x])
print(out)

Operations on two matrices:

In [None]:
x = torch.arange(9).reshape(3, 3)
y = torch.arange(9).reshape(3, 3)
print('x: ', x)
print('y: ', y)

# element-wise multiplication
out = torch.einsum('ij,ij->ij', [x, y])
print(out)

# matrix multiplication
out = torch.einsum('ik,kj->ij', [x, y])
print(out)
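
Two further patterns in the same spirit, offered as a sketch with made-up shapes and values: a batched matrix multiplication, checked against `torch.bmm`, and a vector dot product.

```python
import torch

torch.manual_seed(0)
a = torch.randn(4, 2, 3)  # batch of 4 matrices, each 2x3
b = torch.randn(4, 3, 5)  # batch of 4 matrices, each 3x5

# batched matrix multiplication: sum over j, keep the batch index b
out = torch.einsum('bij,bjk->bik', a, b)
print(out.shape)  # torch.Size([4, 2, 5])
print(torch.allclose(out, torch.bmm(a, b), atol=1e-6))

# dot product of two vectors: 0*0 + 1*1 + 2*2
u = torch.arange(3.)
v = torch.arange(3.)
dot = torch.einsum('i,i->', u, v)
print(dot)  # tensor(5.)
```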