# PyTorch

What is PyTorch?

It’s a Python-based scientific computing package aimed at two audiences:

- those who want a replacement for NumPy that can use the power of GPUs
- those who want a deep learning research platform that provides maximum flexibility and speed


## Preparation


In [116]:
import numpy as np
import matplotlib.pyplot as plt
import sys
import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as transforms

In [117]:
def set_default(figsize=(8, 5), dpi=100):
    plt.style.use(["dark_background", "bmh"])
    plt.rc("axes", facecolor="k")
    plt.rc("figure", facecolor="k")
    plt.rc("figure", figsize=figsize, dpi=dpi)


set_default()

### Check Package Versions

In [118]:
print("__Python VERSION:", sys.version)
print("__PyTorch VERSION:", torch.__version__)
print("__CUDNN VERSION:", torch.backends.cudnn.version())
print("__Number CUDA Devices:", torch.cuda.device_count())

__Python VERSION: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ]
__PyTorch VERSION: 2.1.2
__CUDNN VERSION: None
__Number CUDA Devices: 0


## Tensors

Tensors are similar to NumPy’s ndarrays, except that Tensors can also be placed on a GPU to accelerate computation.

Construct a 5x3 matrix, uninitialized. The values are whatever happened to be in the allocated memory, so the zeros printed below are not guaranteed.


In [119]:
x = torch.empty(5, 3)  # allocates memory without initializing it
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


In [120]:
# create a random 5x3 tensor and add it to x
y = torch.rand(5, 3)
print(y)

print("------")
print(x + y)

tensor([[0.6759, 0.3309, 0.4508],
        [0.1773, 0.3992, 0.8487],
        [0.8971, 0.2025, 0.7822],
        [0.9403, 0.4309, 0.5986],
        [0.4874, 0.6713, 0.1971]])
------
tensor([[0.6759, 0.3309, 0.4508],
        [0.1773, 0.3992, 0.8487],
        [0.8971, 0.2025, 0.7822],
        [0.9403, 0.4309, 0.5986],
        [0.4874, 0.6713, 0.1971]])


In [121]:
# Addition: in-place
y.add_(x)

tensor([[0.6759, 0.3309, 0.4508],
        [0.1773, 0.3992, 0.8487],
        [0.8971, 0.2025, 0.7822],
        [0.9403, 0.4309, 0.5986],
        [0.4874, 0.6713, 0.1971]])
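
The trailing underscore in `add_` marks an in-place operation. As a minimal sketch of the difference (the tensor names `a`, `b`, and `c` are illustrative and not part of the cells above), the out-of-place `add` returns a new tensor while `add_` overwrites its receiver:

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 2.0)

c = a.add(b)   # out-of-place: returns a new tensor, a still holds ones
a_unchanged = torch.equal(a, torch.ones(2, 2))

a.add_(b)      # in-place: a itself now holds a + b
print(a_unchanged, torch.equal(a, c))
```

In-place operations save memory, but they overwrite values that autograd may need later, so they are best used with care on tensors that require gradients.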

### Create tensors

In [122]:
# random
v = torch.rand(2, 3)  # Initialize with random number (uniform distribution)
v = torch.randn(2, 3)  # With normal distribution (SD=1, mean=0)
v = torch.randperm(4)

# ones
eye = torch.eye(3)  # Create an identity 3x3 tensor
v = torch.ones(10)  # A tensor of size 10 containing all ones
v = torch.ones(2, 1, 2, 1)  # Size 2x1x2x1
v = torch.ones_like(eye)  # A tensor with same shape as eye. Fill it with 1.

# zeros
v = torch.zeros(10)

# range of values
v = torch.arange(5)  # similar to range(5) but creating a Tensor
v = torch.arange(0, 5, step=1)  # Size 5. Similar to range(0, 5, 1)

# linear or log scale
v = torch.linspace(
    1, 10, steps=10
)  # Create a Tensor with 10 linear points for (1, 10) inclusively
print(v)

v = torch.logspace(
    start=-10, end=10, steps=5
)  # Size 5: 1.0e-10 1.0e-05 1.0e+00, 1.0e+05, 1.0e+10
print(v)

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
tensor([1.0000e-10, 1.0000e-05, 1.0000e+00, 1.0000e+05, 1.0000e+10])


### Dot product, component-wise product, matrix multiplication

In [123]:
# Dot product of 2 tensors
r = torch.dot(torch.Tensor([4, 2]), torch.Tensor([3, 1]))  # 14

In [124]:
# component-wise product
torch.Tensor([4, 2]) * torch.Tensor([3, 1])

tensor([12.,  2.])

In [125]:
# Matrix x Matrix
mat1 = torch.randn(2, 3)  # Size 2x3
mat2 = torch.randn(3, 4)  # Size 3x4
r = torch.mm(mat1, mat2)  # Size 2x4

In [126]:
# Batch Matrix x Matrix
# Result size: 10x3x5
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
r = torch.bmm(batch1, batch2)
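
The three products above can also be written with the `@` operator (`torch.matmul`), which picks the behavior from the number of dimensions of its operands. The following is a small sketch that reuses the shapes from the cells above; the variable names are illustrative:

```python
import torch

u = torch.tensor([4.0, 2.0])
w = torch.tensor([3.0, 1.0])
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 4)
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)

dot = u @ w                    # 1-D @ 1-D: dot product (a 0-dim tensor)
mm = mat1 @ mat2               # 2-D @ 2-D: matrix product, size 2x4
batched = batch1 @ batch2      # 3-D @ 3-D: batched matrix product, size 10x3x5

print(dot.item(), mm.shape, batched.shape)
```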

### Squeeze and unsqueeze

In [127]:
t = torch.ones(2, 1, 2, 1)  # Size 2x1x2x1
print(t)
r = torch.squeeze(t)  # Size 2x2
print(r)
r = torch.squeeze(t, 1)  # Squeeze dimension 1: Size 2x2x1
print(r)

# Un-squeeze a dimension
x = torch.Tensor([1, 2, 3])
r = torch.unsqueeze(x, 0)  # Size: 1x3
r = torch.unsqueeze(x, 1)  # Size: 3x1

tensor([[[[1.],
          [1.]]],


        [[[1.],
          [1.]]]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[[1.],
         [1.]],

        [[1.],
         [1.]]])


### Transpose


In [128]:
# Transpose dim 0 and 1
v = torch.randn(3, 2)
r = torch.transpose(v, 0, 1)
print(r, r.shape)

print(v.T, v.T.shape)

tensor([[-0.9570,  0.6828, -0.7806],
        [-1.6305, -0.3655,  0.7060]]) torch.Size([2, 3])
tensor([[-0.9570,  0.6828, -0.7806],
        [-1.6305, -0.3655,  0.7060]]) torch.Size([2, 3])


### Numpy Bridge

Converting a torch Tensor to a NumPy array and vice versa is a breeze.

A CPU torch Tensor and the NumPy array it is converted to or from share their underlying memory, so changing one changes the other.

Converting a NumPy array to a torch Tensor and back


In [129]:
# Create a numpy array.
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array.
z = y.numpy()

In [130]:
# Conversion
a = np.array([1, 2, 3])
v = torch.from_numpy(a)  # Convert a numpy array to a Tensor

b = v.numpy()  # Tensor to numpy
b[1] = -1  # The NumPy arrays and the Tensor share the same memory
assert a[1] == b[1]  # Changing b also changed a (and the Tensor v)
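
Memory sharing applies to `torch.from_numpy`. If an independent copy is wanted, `torch.tensor` copies the data instead. A short sketch contrasting the two (the names `shared` and `copied` are illustrative):

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data out of a

a[0] = 100                     # modify the NumPy array in place
print(shared)                  # reflects the change
print(copied)                  # still holds the original values
```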

### Reshape tensor

In [131]:
# Tensor resizing
x = torch.randn(2, 3)  # Size 2x3
y = x.view(6)  # Resize x to size 6
z = x.view(-1, 2)  # Size 3x2
print(y.shape, z.shape)

torch.Size([6]) torch.Size([3, 2])
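
`view` never copies data, so it only works when the requested shape is compatible with the tensor's memory layout. The sketch below, which assumes a transposed (non-contiguous) tensor as the problem case, shows `view` failing and two ways around it:

```python
import torch

x = torch.randn(2, 3)
xt = x.t()                     # transposed 3x2 tensor, not contiguous in memory

try:
    xt.view(6)                 # view cannot reinterpret this layout
    view_ok = True
except RuntimeError:
    view_ok = False

flat = xt.reshape(6)               # reshape copies when a view is not possible
flat2 = xt.contiguous().view(6)    # explicit copy followed by view

print(view_ok, flat.shape, torch.equal(flat, flat2))
```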


### CUDA Tensors

All Tensors on the CPU except a CharTensor support converting to NumPy and back.

Tensors can be moved onto the GPU with the `.cuda()` method.


In [132]:
# move the tensors to the GPU only if CUDA is available

x = torch.rand(3, 2)
y = torch.rand(3, 2)
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y

In [133]:
x

tensor([[0.2534, 0.5432],
        [0.8201, 0.1815],
        [0.6040, 0.7312]])

In [134]:
y

tensor([[0.4071, 0.4429],
        [0.6041, 0.1087],
        [0.7630, 0.7572]])
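
A common alternative to calling `.cuda()` behind an `if` is to pick a `torch.device` once and pass it around, so the same code runs with or without a GPU. A minimal sketch, assuming only that CUDA may or may not be present:

```python
import torch

# Choose the device once; fall back to the CPU when CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 2, device=device)   # create directly on the device
y = torch.rand(3, 2).to(device)       # or move an existing tensor
z = x + y                             # both operands live on the same device

z_cpu = z.cpu().numpy()               # move back to the CPU before converting to NumPy
print(z.device.type, z_cpu.shape)
```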

## Autograd: automatic differentiation

Central to all neural networks in PyTorch is autograd, a core torch package for automatic differentiation.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework: the backward pass is defined by how your code runs, so every iteration can be different.

Let us see this in simpler terms with some examples.


In [135]:
# create a tensor that tracks gradients
x = torch.ones((2, 2), requires_grad=True)

# Do an operation on the tensor:
y = x + 2

# Do more operations on y
z = y * y * 3
out = z.mean()

In [136]:
# Gradients
# ---------
# let's backprop now
# ``out.backward()`` is equivalent to ``out.backward(torch.tensor(1.0))`` because ``out`` is a scalar
out.backward()

In [137]:
###############################################################
# print gradients d(out)/dx
#

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix of `4.5`. Let’s call the `out` tensor $o$.
We have $o = \frac{1}{4}\sum_i z_i$, with
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.

Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$ hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$
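
The derivation can be checked numerically. The sketch below uses a random input instead of ones (an assumption made here so the check is not tied to a single point) and compares autograd's result with the closed form $\frac{3}{2}(x_i+2)$:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)

# Same computation as above: o = mean(3 * (x + 2)^2) over 4 elements.
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Closed-form gradient from the derivation: d(o)/d(x_i) = 1.5 * (x_i + 2).
analytic = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, analytic))
```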


In [138]:
# You can do many crazy things with autograd!
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.detach().norm() < 1000:
    y = y * 2

print(y)

tensor([ 200.5000, -397.1070, -951.7963], grad_fn=<MulBackward0>)


In [139]:
# y is not a scalar, so backward() needs a vector of the same shape as y
gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
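
When `backward` is called on a non-scalar tensor, the argument is the vector in a vector-Jacobian product. As a sketch, the loop above is replaced here by a fixed factor of 512 (an assumption chosen to match the run shown, where the result is 512 times the `gradients` vector), so the Jacobian is simply 512 times the identity:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 512                          # Jacobian dy/dx is 512 * I

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                        # computes v^T J and stores it in x.grad

print(x.grad)
print(torch.allclose(x.grad, 512 * v))
```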


### Basic autograd example 1 

In [140]:
# Create tensors.
x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

In [141]:
# Build a computational graph.
y = w * x + b  # y = 2 * x + 3

In [142]:
# Compute gradients.
y.backward()

In [143]:
# Print out the gradients.
print(x.grad)  # x.grad = 2
print(w.grad)  # w.grad = 1
print(b.grad)  # b.grad = 1

tensor(2.)
tensor(1.)
tensor(1.)


In [144]:
y.detach().numpy()

array(5., dtype=float32)

### Basic autograd example 2  

In [145]:
# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)
print(x.shape, y.shape)

torch.Size([10, 3]) torch.Size([10, 2])


In [146]:
# Build a fully connected layer.
linear = nn.Linear(3, 2)
print("w: ", linear.weight)
print("b: ", linear.bias)

w:  Parameter containing:
tensor([[ 0.3042,  0.3011,  0.3361],
        [-0.5711,  0.4192,  0.2788]], requires_grad=True)
b:  Parameter containing:
tensor([-0.1238, -0.5274], requires_grad=True)


In [147]:
loss = torch.sum((linear(x) - y) ** 2) / y.shape[0]
print("loss: ", loss.detach().numpy())

loss:  3.4196022


In [148]:
loss.backward()

In [149]:
print("w grad: ", linear.weight.grad)
print("b grad: ", linear.bias.grad)

w grad:  tensor([[ 0.2617,  1.1267,  1.6712],
        [-1.3647,  1.7426,  0.9916]])
b grad:  tensor([-0.4629, -1.3016])


In [150]:
# check grad
print("w grad:", (linear(x) - y).transpose(0, 1).mm(x) / y.shape[0] * 2)
print("b grad:", 2 * torch.mean(linear(x) - y, dim=0))

w grad: tensor([[ 0.2617,  1.1267,  1.6712],
        [-1.3647,  1.7426,  0.9916]], grad_fn=<MulBackward0>)
b grad: tensor([-0.4629, -1.3016], grad_fn=<MulBackward0>)


### Basic autograd example 3

In [151]:
# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)

In [152]:
# Build loss function and optimizer.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print("loss: ", loss.item())

# Backward pass.
loss.backward()

# Print out the gradients.
print("dL/dw: ", linear.weight.grad)
print("dL/db: ", linear.bias.grad)

# 1-step gradient descent.
optimizer.step()

# You can also perform gradient descent at the low level.
# with torch.no_grad():
#     linear.weight.sub_(0.01 * linear.weight.grad)
#     linear.bias.sub_(0.01 * linear.bias.grad)

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print("loss after 1 step optimization: ", loss.item())

loss:  1.9841773509979248
dL/dw:  tensor([[-0.7695,  0.5312,  0.2570],
        [ 0.6282,  0.3737,  0.8947]])
dL/db:  tensor([ 1.1803, -0.1063])
loss after 1 step optimization:  1.9476324319839478
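
The single step above extends naturally to a loop. One detail matters once there is more than one step: gradients accumulate in `.grad`, so they have to be cleared with `optimizer.zero_grad()` before each backward pass. A minimal sketch, with the seed and the number of steps (100) chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # arbitrary seed, only for repeatability
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()            # clear gradients left from the previous step
    loss = criterion(linear(x), y)   # forward pass
    loss.backward()                  # backward pass
    optimizer.step()                 # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```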


## Input pipeline, Data loader

``` python
# Download and construct CIFAR-10 dataset.
train_dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, transform=transforms.ToTensor(), download=True
)

# Fetch one data pair (read data from disk).
image, label = train_dataset[0]
print(image.size())
print(label)

# Data loader (handles batching and shuffling; set num_workers > 0 to load in worker processes).
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset, batch_size=64, shuffle=True
)

# Iterating over the loader reads the data from files batch by batch.
data_iter = iter(train_loader)

# Mini-batch images and labels.
images, labels = next(data_iter)

# Actual usage of the data loader is as below.
for images, labels in train_loader:
    # Training code should be written here.
    pass
```
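
To try the data loader without downloading CIFAR-10, the same pattern can be run on synthetic data. The sketch below is an illustration only: it assumes 200 random tensors shaped like CIFAR-10 images and wraps them in a `TensorDataset`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 200 random 3x32x32 "images" with labels 0..9.
images = torch.randn(200, 3, 32, 32)
labels = torch.randint(0, 10, (200,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Fetch a single mini-batch.
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, batch_labels.shape)

# One full pass: 200 = 3 * 64 + 8, so the last batch is smaller.
batch_sizes = [len(lbl) for _, lbl in loader]
print(batch_sizes)
```

Passing `drop_last=True` to `DataLoader` would discard the final, smaller batch.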

### Input pipeline for custom dataset

``` python
# ================================================================== #
#                  Input pipeline for custom dataset                 #
# ================================================================== #


# You should build your custom dataset as below.
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # TODO
        # 1. Initialize file paths or a list of file names.
        pass

    def __getitem__(self, index):
        # TODO
        # 1. Read one data from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass

    def __len__(self):
        # You should change 0 to the total size of your dataset.
        return 0


# You can then use the prebuilt data loader.
custom_dataset = CustomDataset()
train_loader = torch.utils.data.DataLoader(
    dataset=custom_dataset, batch_size=64, shuffle=True
)
```
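
As a concrete instance of the template above, here is a toy dataset that generates its samples in memory instead of reading files. `SquaresDataset` is a hypothetical name invented for this sketch; a real dataset would load and preprocess data in `__getitem__`:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        value = torch.tensor([float(index)])
        target = torch.tensor([float(index) ** 2])
        return value, target

    def __len__(self):
        return self.n


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The default collate function stacks the samples into batch tensors.
first_inputs, first_targets = next(iter(loader))
print(first_inputs.shape, first_targets.squeeze(1).tolist())
```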

### Pretrained model

``` python
# ================================================================== #
#                           Pretrained model                         #
# ================================================================== #

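# Download and load the pretrained ResNet-18.
# (torchvision >= 0.13 uses the weights argument; pretrained=True is deprecated.)
resnet = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT
)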

# If you want to finetune only the top layer of the model, set as below.
for param in resnet.parameters():
    param.requires_grad = False

# Replace the top layer for finetuning.
resnet.fc = nn.Linear(resnet.fc.in_features, 100)  # 100 is an example.

# Forward pass.
images = torch.randn(64, 3, 224, 224)
outputs = resnet(images)
print(outputs.size())  # (64, 100)
```
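
The effect of freezing can be inspected without downloading any weights. In the sketch below a tiny `nn.Sequential` stands in for the pretrained backbone (a hypothetical substitute, not ResNet-18), and only the parameters that still require gradients are handed to the optimizer:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False      # freeze the backbone

head = nn.Linear(16, 100)            # new top layer, trainable by default
model = nn.Sequential(backbone, head)

# Give the optimizer only the parameters that will be updated.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

loss = model(torch.randn(4, 8)).sum()
loss.backward()

print(sum(p.numel() for p in trainable))                      # 16 * 100 + 100
print(all(p.grad is None for p in backbone.parameters()))     # frozen: no gradients
```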

### Save and load the model    

``` python
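# Save and load the entire model.
# (On PyTorch 2.6 and later, torch.load defaults to weights_only=True,
# so loading a whole pickled model needs weights_only=False.)
torch.save(resnet, "model.ckpt")
model = torch.load("model.ckpt", weights_only=False)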

# Save and load only the model parameters (recommended).
torch.save(resnet.state_dict(), "params.ckpt")
resnet.load_state_dict(torch.load("params.ckpt"))
```
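
A round trip through `state_dict` can be verified on a small model. This sketch assumes a plain `nn.Linear` and a temporary directory instead of the ResNet and file names used above:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 2)
restored = nn.Linear(3, 2)           # same architecture, different random weights

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.ckpt")
    torch.save(model.state_dict(), path)
    restored.load_state_dict(torch.load(path))

restored.eval()                      # switch to evaluation mode before inference

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Saving only the parameters requires re-creating the model class before loading, but the file does not depend on the exact module path of the class definition, which is one reason it is the recommended route.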