TODO:
- [ ] More Tensor Ops from d2l and my note

In [1]:
import os
import numpy as np
import time
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
from matplotlib_inline.backend_inline import set_matplotlib_formats  # IPython.display version is deprecated
import seaborn as sns
from tqdm.notebook import tqdm

%matplotlib inline 
set_matplotlib_formats('svg', 'pdf')  # For export

import torch
from boring_utils.utils import *


# Assign Values to Tensor

There are several ways to create a tensor with its values assigned at initialization, including:

* `torch.zeros`: Creates a tensor filled with zeros
* `torch.ones`: Creates a tensor filled with ones
* `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1
* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1
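* `torch.arange`: Creates a tensor containing the values $N,N+1,N+2,...,M-1$ for `torch.arange(N, M)` (the end value $M$ is excluded)
* `torch.Tensor` (input list): Creates a tensor from the list elements you provide. Called with integer sizes instead, as in `torch.Tensor(2, 3, 4)`, it allocates an uninitialized tensor of that shape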
* `torch.from_numpy`: Converts a numpy array into a PyTorch tensor
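
The constructors listed above can be tried side by side. The following is a minimal sketch, not part of the original notebook; the variable names are illustrative, and it uses the lowercase `torch.tensor` factory for list input because it infers the dtype from the data.

```python
import numpy as np
import torch

zeros = torch.zeros(2, 3)     # all zeros, shape (2, 3)
ones = torch.ones(2, 3)       # all ones, shape (2, 3)
uniform = torch.rand(2, 3)    # uniform samples in [0, 1)
normal = torch.randn(2, 3)    # standard normal samples
steps = torch.arange(2, 6)    # 2, 3, 4, 5 (the end value 6 is excluded)
from_list = torch.tensor([[1, 2], [3, 4]])  # dtype inferred from the data (int64)

# torch.from_numpy shares memory with the source array:
# a write to the array is visible through the tensor.
arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)
arr[0] = 10.0
print(shared[0].item())       # first element now reflects the write to arr
```

Because `torch.from_numpy` does not copy, call `.clone()` on the result if the tensor should be independent of the array.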

In [2]:
# Called with sizes, torch.Tensor allocates uninitialized memory, so the values below are arbitrary
x = torch.Tensor(2, 3, 4)
cprint(x)

<module> -> x:
tensor([[[ 1.4013e-45,  0.0000e+00,  1.4013e-45,  0.0000e+00],
         [ 1.2728e-05,  4.0604e-41,  1.4013e-45,  7.1435e+31],
         [ 1.3205e-05,  4.0604e-41,  1.3206e-05,  4.0604e-41]],

        [[ 0.0000e+00,  0.0000e+00,  7.0625e-43,  0.0000e+00],
         [ 2.8026e-45,  0.0000e+00,  1.3206e-05,  4.0604e-41],
         [-2.2104e-22,  3.0712e-41,  4.2039e-45,  0.0000e+00]]])


In [3]:
# x = torch.tensor([[1, 2], [3, 4]])
x = torch.randn([2, 3, 4])
cprint(x, x.shape, x.size())

<module> -> x:
tensor([[[-2.0993e+00,  2.2594e+00, -8.8424e-01,  7.5328e-01],
         [-7.3936e-01,  1.4778e+00,  9.5404e-01,  3.7945e-01],
         [ 8.3714e-01,  1.0330e+00,  3.3841e-01,  2.4910e+00]],

        [[-1.6248e+00, -4.9452e-01, -1.6559e+00, -1.3338e+00],
         [-9.0200e-04, -2.0273e-01, -2.6124e-02, -2.3574e+00],
         [ 2.5105e+00, -9.1110e-01, -4.6658e-01,  8.9686e-02]]])
<module> -> x.shape:
torch.Size([2, 3, 4])
<module> -> x.size():
torch.Size([2, 3, 4])


Calling `x1 + x2` creates a new tensor containing the sum of the two inputs. However, we can also use in-place operations, which write their result directly into the memory of an existing tensor and are marked by a trailing underscore (e.g. `add_`). An in-place operation on `x2` overwrites its values, so the values `x2` held before the operation can no longer be accessed.

In [4]:
x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)

y = x1 + x2
x2.add_(x1)

tensor([[1.4396, 1.6564, 1.4042],
        [1.4909, 0.9187, 1.3233]])

In [5]:
# Python object ids: y, x1 and x2 are three distinct tensor objects
cprint(id(y), id(x1), id(x2))
# cprint(y, x1, x2)
# ... but after the in-place add, x2 holds the same values as y
cprint(y == x2)

<module> -> id(y):
124451900923776
<module> -> id(x1):
124451900932016
<module> -> id(x2):
124451900918656
<module> -> y == x2:
tensor([[True, True, True],
        [True, True, True]])
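
`id()` only identifies the Python wrapper object. To see that an in-place operation really reuses the tensor's memory, one option is to compare `data_ptr()`, which returns the address of the first data element. This is a small sketch under that assumption; the variable names are illustrative.

```python
import torch

x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)

ptr_before = x2.data_ptr()   # address of the first element of x2's data
expected = x1 + x2           # out-of-place: allocates a new tensor

x2.add_(x1)                  # in-place: writes into x2's existing memory
same_memory = x2.data_ptr() == ptr_before
matches = torch.equal(x2, expected)

x2 = x2 + x1                 # out-of-place: rebinds the name x2 to a new tensor
new_memory = x2.data_ptr() != ptr_before

print(same_memory, matches, new_memory)
```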


# Reshape

In [6]:
# original shape
x = torch.arange(0, 6)
cprint(x.shape)

x = x.view(2, 3)
cprint(x.shape)

# swap dim 0 and 1
x = x.permute(1, 0)
cprint(x.shape)

<module> -> x.shape:
torch.Size([6])
<module> -> x.shape:
torch.Size([2, 3])
<module> -> x.shape:
torch.Size([3, 2])
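
Both `view` and `permute` return tensors that share memory with the original, but `permute` only changes the strides, which leaves the result non-contiguous. The sketch below (not from the original notebook) shows the practical consequence: `view` may then fail, while `reshape` falls back to a copy when it has to.

```python
import torch

x = torch.arange(6).view(2, 3)
xt = x.permute(1, 0)            # shape (3, 2); no data is copied

print(xt.is_contiguous())       # only the strides changed, so this is False
try:
    xt.view(6)                  # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = xt.reshape(6)            # reshape copies the data when it has to
print(flat)
```

Calling `xt.contiguous().view(6)` would give the same result as `reshape` here.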


# Matrix Multiplications
Quite often, we have an input vector $\mathbf{x}$, which is transformed using a learned weight matrix $\mathbf{W}$. There are multiple ways and functions to perform matrix multiplication, some of which we list below:

* `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions. If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product. For higher dimensional inputs, the function supports broadcasting (for details see the [documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html?highlight=matmul#torch.matmul)). Can also be written as `a @ b`, as in NumPy.
* `torch.mm`: Performs the matrix product over two matrices, but doesn't support broadcasting (see [documentation](https://pytorch.org/docs/stable/generated/torch.mm.html?highlight=torch%20mm#torch.mm))
* `torch.bmm`: Performs a batched matrix product over a leading batch dimension. If the first tensor $T$ is of shape ($b\times n\times m$), and the second tensor $R$ ($b\times m\times p$), the output $O$ is of shape ($b\times n\times p$), and has been calculated by performing $b$ matrix multiplications of the submatrices of $T$ and $R$: $O_i = T_i @ R_i$
* `torch.einsum`: Performs matrix multiplications and more (i.e. sums of products) using the Einstein summation convention. Explanation of the Einstein sum can be found in assignment 1.

Usually, we use `torch.matmul` or `torch.bmm`. We can try a matrix multiplication with `torch.matmul` below.

In [7]:
x = torch.arange(6)
x = x.view(2, 3)

W = torch.arange(12).view(3, 4)

h = x @ W
cprint(h.shape)

<module> -> h.shape:
torch.Size([2, 4])
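
To make the relationship between the four functions concrete, the following sketch computes the same batched product in several ways. The tensor names follow the $T$, $R$ notation used above; the shapes and the seed are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
T = torch.randn(5, 2, 3)   # batch of 5 matrices of shape (2, 3)
R = torch.randn(5, 3, 4)   # batch of 5 matrices of shape (3, 4)
W = torch.randn(3, 4)      # a single matrix, shared by the whole batch

out_bmm = torch.bmm(T, R)                          # shape (5, 2, 4)
out_einsum = torch.einsum("bnm,bmp->bnp", T, R)    # same contraction, written as an einsum
out_loop = torch.stack([T[i] @ R[i] for i in range(5)])  # O_i = T_i @ R_i

out_broadcast = torch.matmul(T, W)   # W is broadcast over the batch, shape (5, 2, 4)
out_mm = torch.mm(T[0], W)           # mm accepts strictly 2-D inputs

print(out_bmm.shape, out_broadcast.shape, out_mm.shape)
```

Passing the 3-D tensor `T` to `torch.mm` raises an error, which is the "no broadcasting" restriction mentioned in the list.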


# Indexing

In [8]:
x = torch.arange(12).view(3, 4)
cprint(x)

<module> -> x:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


In [9]:
# 2nd col
cprint(x[:, 1])

# 1st row
cprint(x[0])

# elements of the tensor x that are both in first two rows and the last column 
cprint(x[:2, -1])

# 2nd half of the rows of x (the row count is len(x), not len(x[0]))
cprint(x[len(x)//2:, :])

<module> -> x[:, 1]:
tensor([1, 5, 9])
<module> -> x[0]:
tensor([0, 1, 2, 3])
<module> -> x[:2, -1]:
tensor([3, 7])
<module> -> x[len(x)//2:, :]:
tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
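
Beyond the slices above, tensors also accept integer lists and boolean masks as indices. The sketch below is an illustrative addition; it also shows that a basic slice is a view on the same memory, so writing to it changes the original tensor.

```python
import torch

x = torch.arange(12).view(3, 4)

rows = x[[0, 2]]        # integer-list indexing: rows 0 and 2 (a copy)
big = x[x > 8]          # boolean mask: 1-D tensor of the matching elements

col = x[:, 1]           # basic slicing returns a view on the same memory
col[0] = 100            # ... so this write also changes x[0, 1]

print(rows.shape, big, x[0, 1])
```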


# Dynamic Graph

In [10]:
x = torch.ones((3,))
cprint(x)

<module> -> x:
tensor([1., 1., 1.])


In [11]:
x.requires_grad_(True)
cprint(x.requires_grad)

<module> -> x.requires_grad:
True


In order to get familiar with the concept of a computation graph, we will create one for the following function:

$$y = \frac{1}{|x|}\sum_i \left[(x_i + 2)^2 + 3\right]$$

Here $|x|$ denotes the number of elements of $\mathbf{x}$. You can think of $\mathbf{x}$ as our parameters, and we want to optimize (either maximize or minimize) the output $y$. For this, we need the gradients $\partial y / \partial \mathbf{x}$. In our example, we use $\mathbf{x}=[0,1,2]$ as input.
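
Because the function is simple, the gradient can also be worked out by hand, which gives a reference value for checking what autograd returns. Differentiating the sum term by term, with $|x| = 3$ for the example input:

$$\frac{\partial y}{\partial x_i} = \frac{1}{|x|}\cdot 2(x_i + 2) = \frac{2}{3}(x_i + 2), \qquad \left.\frac{\partial y}{\partial \mathbf{x}}\right|_{\mathbf{x}=[0,1,2]} = \left[\tfrac{4}{3},\; 2,\; \tfrac{8}{3}\right]$$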

In [12]:
# NOTE: only floating-point (and complex) tensors can require gradients
x = torch.arange(3, dtype=torch.float32, requires_grad=True)
cprint(x)

# Ops: (add -> a, square -> b, add -> c, mean -> y)
a = x + 2
b = a ** 2
c = b + 3
y = c.mean()

cprint(y)

<module> -> x:
tensor([0., 1., 2.], requires_grad=True)
<module> -> y:
tensor(12.6667, grad_fn=<MeanBackward0>)


In [13]:
cprint(x.grad)

y.backward()

cprint(x.grad)

<module> -> x.grad:
None
<module> -> x.grad:
tensor([1.3333, 2.0000, 2.6667])
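
The values in `x.grad` can be checked against the closed-form gradient $2(x_i + 2)/|x|$ of the function above. The sketch below does that and also illustrates a detail that is easy to miss: gradients accumulate across `backward()` calls, so they usually need to be reset between passes. It rebuilds the graph from scratch and is an illustrative addition, not part of the original cells.

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)

y = ((x + 2) ** 2 + 3).mean()
y.backward()

# Closed-form gradient of the mean: 2 * (x_i + 2) / |x|
analytic = 2 * (x.detach() + 2) / x.numel()
first_pass_ok = torch.allclose(x.grad, analytic)

# Gradients accumulate: a second backward pass adds to x.grad
y2 = ((x + 2) ** 2 + 3).mean()
y2.backward()
accumulated_ok = torch.allclose(x.grad, 2 * analytic)

x.grad.zero_()   # reset before the next backward pass
print(first_pass_ok, accumulated_ok)
```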


# GPU

In [14]:
cprint(torch.cuda.is_available())
device = 'cuda' if torch.cuda.is_available() else 'cpu'

<module> -> torch.cuda.is_available():
True


In [15]:
# x still requires gradients, so the copy to the GPU is recorded in the graph (see grad_fn below)
x = x.to(device)
cprint(x)

<module> -> x:
tensor([0., 1., 2.], device='cuda:0', grad_fn=<ToCopyBackward0>)
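
Instead of creating a tensor on the CPU and moving it afterwards, most factory functions accept a `device` argument. The following device-agnostic sketch assumes nothing about the machine: it falls back to the CPU when no GPU is available, and the tensor names are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device instead of creating on the CPU and copying
a = torch.ones(2, 3, device=device)
b = torch.arange(6, dtype=torch.float32, device=device).view(2, 3)

c = a + b               # operands must live on the same device
result = c.cpu()        # copy back to host memory, e.g. before a NumPy conversion
print(c.device, result.device)
```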
