This notebook covers some of the key operations used in PyTorch. We start with a quick overview of axes (NumPy and PyTorch use the same representation).

In [3]:
import torch
import numpy as np

## Axes

### 2D matrices

For both NumPy and PyTorch, axis 0 = row, axis 1 = column.

When we specify axis=0 for sum, we collapse the row axis: the rows are added together, leaving one total per column.

In [6]:
np.sum([[1, 0], [3, 5]], axis=0) #->[1+3, 0+5]

array([4, 5])

With axis=1 we collapse the column axis instead: the entries within each row are added together, leaving one total per row.

In [8]:
np.sum([[1, 0], [3, 5]], axis=1) #->[1+0, 3+5]

array([1, 8])

### 3D arrays / tensors

In [10]:
a = np.random.randint(10, size=(2,2,2))

In [30]:
a, a.shape

(array([[[9, 4],
         [5, 4]],
 
        [[4, 3],
         [9, 0]]]),
 (2, 2, 2))

Axis 0 refers to the arrays inside the outermost bracket: [  [[]],[[]]  ]

In [14]:
a[0], a[1]

(array([[9, 4],
        [5, 4]]),
 array([[4, 3],
        [9, 0]]))

Axis 1 refers to the elements of each axis 0 item.

In [28]:
a[0][0], a[0][1], a[1][0], a[1][1]

(array([9, 4]), array([5, 4]), array([4, 3]), array([9, 0]))

Axis 2 refers to the elements of each axis 1 item,

i.e. we peel away one level of brackets each time we go one axis deeper.

In [32]:
a[0][0][0], a[0][0][1], a[0][1][0],  a[0][1][1] #etc

(9, 4, 5, 4)

Summing along axis 0 works like the 2D case: the axis is collapsed, which reduces our array to 2 dimensions.

In [22]:
b = np.sum(a, axis=0) #-> [[9+4, 4+3],[5+9, 4+0]]

In [23]:
b, b.shape

(array([[13,  7],
        [14,  4]]),
 (2, 2))

In [25]:
np.sum(a, axis=1) #->[[9+5, 4+4],[4+9, 3+0]]

array([[14,  8],
       [13,  3]])

In [31]:
np.sum(a, axis=2) #-> [[9+4, 5+4], [4+3, 9+0]]

array([[13,  9],
       [ 7,  9]])
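Because every axis of the array above has length 2, the output shapes do not show which axis was removed. The sketch below uses an illustrative array of shape (2, 3, 4), which is an assumption and not part of the notebook, to make the rule visible: summing along axis k drops entry k from the shape, unless keepdims=True is passed.

```python
import numpy as np

# Illustrative array with three different axis lengths.
a = np.arange(24).reshape(2, 3, 4)

# Shape of the result after summing along each axis in turn.
shape_after_sum = {axis: np.sum(a, axis=axis).shape for axis in range(a.ndim)}
print(shape_after_sum)  # axis 0 -> (3, 4), axis 1 -> (2, 4), axis 2 -> (2, 3)

# keepdims=True keeps the collapsed axis with length 1 instead of dropping it.
kept = np.sum(a, axis=0, keepdims=True)
print(kept.shape)  # (1, 3, 4)
```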

### argmax

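Returns the index of the maximum value of all elements in the input tensor. Without a dim argument, this is an index into the flattened tensor.

When a dim is given, this is the second value (indices) returned by torch.max(input, dim).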

In [2]:
a = torch.randn(4, 3)
a

tensor([[ 2.0149,  1.0420, -1.3816],
        [-1.0265, -0.5212, -0.7570],
        [-0.5141,  0.5674,  0.1039],
        [-0.1549, -0.3003, -0.1086]])

In [3]:
torch.argmax(a)

tensor(0)

In [4]:
b = torch.randn(4)
b

tensor([0.6022, 1.1465, 0.3250, 1.0555])

In [5]:
torch.argmax(b)

tensor(1)
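The flat index returned by argmax can be turned back into row/column coordinates, and argmax also accepts a dim argument. A minimal sketch, using a small hand-written tensor (an assumption, chosen so the results are easy to check by eye):

```python
import torch

a = torch.tensor([[ 2.0, 1.0, -1.0],
                  [-1.0, 3.0,  0.5]])

# Without dim: index into the flattened tensor [2, 1, -1, -1, 3, 0.5].
flat_idx = torch.argmax(a)

# Recover (row, col) from the flat index using the number of columns.
row, col = divmod(flat_idx.item(), a.shape[1])

# With dim: one index per column (dim=0) or per row (dim=1).
per_col = torch.argmax(a, dim=0)  # row index of the max in each column
per_row = torch.argmax(a, dim=1)  # column index of the max in each row

print(flat_idx, (row, col), per_col, per_row)
```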

### max

##### torch.max(input)

Returns the maximum value of all elements in the input tensor.

##### torch.max(input, dim, keepdim=False, out=None) 

Returns a namedtuple (values, indices), where values holds the maximum value of each row of the input tensor in the given dimension dim, and indices holds the index location of each maximum value found (argmax).

If keepdim is True, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.

In [6]:
torch.max(a)

tensor(2.0149)

dim 0 ->  max over all rows (i.e. for each column): 

In [7]:
torch.max(a, 0)

torch.return_types.max(
values=tensor([2.0149, 1.0420, 0.1039]),
indices=tensor([0, 0, 2]))

dim 1 -> max over all columns (i.e. for each row)

In [8]:
torch.max(a, 1)

torch.return_types.max(
values=tensor([ 2.0149, -0.5212,  0.5674, -0.1086]),
indices=tensor([0, 1, 1, 2]))
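The namedtuple can be unpacked directly or accessed by field name, and keepdim=True keeps the reduced dimension so the result broadcasts against the input. A short sketch on a hand-written tensor (values are illustrative):

```python
import torch

a = torch.tensor([[1.0, 5.0, 2.0],
                  [7.0, 0.0, 3.0]])

# Unpack the namedtuple: one max value and one column index per row.
values, indices = torch.max(a, dim=1)

# keepdim=True leaves a size-1 dimension where dim 1 used to be.
result = torch.max(a, dim=1, keepdim=True)
print(result.values.shape)  # torch.Size([2, 1])

# The (2, 1) shape broadcasts against (2, 3), e.g. to scale each row by its max.
scaled = a / result.values
print(values, indices, scaled)
```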

### view

Returns a new tensor with the same data as the self tensor but of a different shape.

The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions $d, d+1, \dots, d+k$ that satisfy the following contiguity-like condition that $\forall i = 0, \dots, k-1$,

$$\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]$$

Otherwise, contiguous() needs to be called before the tensor can be viewed. See also: reshape(), which returns a view if the shapes are compatible, and copies (equivalent to calling contiguous()) otherwise.



In [9]:
x = torch.randn(4, 3)
x.size()

torch.Size([4, 3])

In [10]:
y = x.view(12)
y.size()

torch.Size([12])

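Without a -1, the requested sizes must multiply to exactly the number of elements (12 here). The commented-out call below asks for 10 elements, so it would raise a RuntimeError.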

In [11]:
#y = x.view(10)
#y.size()

In [12]:
z = x.view(-1, 2)  # the size -1 is inferred from other dimensions
z.size()

torch.Size([6, 2])

In [13]:
w = x.view(6, -1)  # the size -1 is inferred from other dimensions
w.size()

torch.Size([6, 2])
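Two points from the description above can be checked directly: a view shares memory with the original tensor, and a non-contiguous tensor (here a transpose) cannot be viewed until it is made contiguous. This is a minimal sketch; the tensor values are illustrative.

```python
import torch

x = torch.arange(12).reshape(4, 3)

# A view shares storage: writing through y also changes x.
y = x.view(12)
y[0] = 100
print(x[0, 0])  # tensor(100)

# Transposing swaps the strides, so the result is no longer contiguous.
xt = x.t()
print(xt.is_contiguous())  # False

try:
    xt.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True  # size and stride are not compatible with a flat view

# reshape() falls back to a copy; contiguous().view() does the same explicitly.
flat_reshape = xt.reshape(12)
flat_view = xt.contiguous().view(12)
```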

### sum

#### torch.sum(input, dtype=None)
    
Returns the sum of all elements in the input tensor.

#### torch.sum(input, dim, keepdim=False, dtype=None) 

Returns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.

If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).



In [14]:
a = torch.randn(1, 3)
a

tensor([[ 1.7224, -1.3243,  0.3586]])

In [15]:
torch.sum(a)

tensor(0.7567)

In [16]:
a = torch.randn(4, 3)
a

tensor([[-1.1065,  0.5816,  1.1932],
        [ 0.3565,  1.9991,  0.2112],
        [ 0.9671, -0.3203, -1.0331],
        [-2.0222, -0.4018, -1.8219]])

In [17]:
#sum over all columns (i.e. for each row)
torch.sum(a, 1)

tensor([ 0.6683,  2.5669, -0.3863, -4.2459])
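The dim argument may also be a tuple, and keepdim controls whether the reduced dimensions survive with size 1. A small sketch with deterministic values (an illustrative tensor, not from the cells above):

```python
import torch

a = torch.arange(24.0).reshape(2, 3, 4)

total = torch.sum(a)                       # scalar: 0 + 1 + ... + 23
over_last_two = torch.sum(a, dim=(1, 2))   # one value per leading index
kept = torch.sum(a, dim=1, keepdim=True)   # dim 1 kept with size 1

print(total)                # tensor(276.)
print(over_last_two)        # tensor([ 66., 210.])
print(kept.shape)           # torch.Size([2, 1, 4])
```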

### mean

#### torch.mean(input)

Returns the mean value of all elements in the input tensor.

#### torch.mean(input, dim, keepdim=False, out=None) 

Returns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.

If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).

In [18]:
a = torch.randn(4, 3)
a

tensor([[ 1.1041, -0.4993,  1.8628],
        [-0.6035,  0.6425, -1.3106],
        [-0.6543, -0.4198, -0.4286],
        [ 2.0873,  0.4965, -0.7824]])

In [19]:
a.mean()

tensor(0.1245)

dim = 1 -> over all columns (for each row)

In [20]:
torch.mean(a, 1)

tensor([ 0.8225, -0.4239, -0.5009,  0.6004])

In [21]:
torch.mean(a, 1, True)

tensor([[ 0.8225],
        [-0.4239],
        [-0.5009],
        [ 0.6004]])

In [22]:
torch.mean(a, 0)

tensor([ 0.4834,  0.0549, -0.1647])

In [23]:
a = torch.randn(4, 3, 2)
a

tensor([[[-0.5509, -0.8295],
         [-0.1816,  0.8299],
         [-0.7890,  0.0698]],

        [[-0.3103, -1.1878],
         [-1.2422, -1.8429],
         [-0.8061, -0.2843]],

        [[ 0.3603, -1.9474],
         [-0.2442, -0.8164],
         [ 1.2880,  0.1848]],

        [[-0.2814, -1.2271],
         [ 0.2662,  0.3517],
         [ 0.0496,  0.0306]]])

In [24]:
b=torch.mean(a,0)
b

tensor([[-1.9556e-01, -1.2980e+00],
        [-3.5046e-01, -3.6942e-01],
        [-6.4377e-02,  2.2051e-04]])

In [25]:
b.shape

torch.Size([3, 2])

In [26]:
c=torch.mean(a,1)
c

tensor([[-0.5072,  0.0234],
        [-0.7862, -1.1050],
        [ 0.4680, -0.8597],
        [ 0.0115, -0.2816]])

In [27]:
c.shape

torch.Size([4, 2])
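A common use of keepdim=True is centering: the mean keeps a size-1 dimension, so it broadcasts back against the input. The sketch below also converts an integer tensor to float first, since torch.mean expects a floating point (or complex) dtype. Values are illustrative.

```python
import torch

a = torch.tensor([[ 1.0,  2.0,  3.0],
                  [10.0, 20.0, 30.0]])

# Shape (2, 1) so that it broadcasts across the columns of a.
row_means = a.mean(dim=1, keepdim=True)
centered = a - row_means
print(centered)  # each row now has mean 0

# Integer tensors need an explicit conversion before taking the mean.
ints = torch.arange(6).reshape(2, 3)
col_means = ints.float().mean(dim=0)
print(col_means)  # tensor([1.5000, 2.5000, 3.5000])
```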

### Random

Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).

torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

In [28]:
torch.randn(2, 3)

tensor([[-0.5951,  1.4342, -1.2732],
        [ 0.1727,  1.2753, -0.8301]])
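Outputs of torch.randn change on every run unless the generator is seeded. A minimal sketch of reproducible sampling, plus shifting and scaling to get a different mean and standard deviation (the seed and the values 5.0 and 2.0 are arbitrary choices for illustration):

```python
import torch

# Seeding the global generator makes the draws repeatable.
torch.manual_seed(0)
first = torch.randn(2, 3)

torch.manual_seed(0)
second = torch.randn(2, 3)

print(torch.equal(first, second))  # True: same seed, same numbers

# Standard normal samples can be shifted and scaled: mean 5, std 2.
scaled = 5.0 + 2.0 * torch.randn(1000)
print(scaled.shape)
```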

### Flatten

Flattens a contiguous range of dims in a tensor.

torch.flatten(input, start_dim=0, end_dim=-1) → Tensor



        input (Tensor) – the input tensor.

        start_dim (int) – the first dim to flatten

        end_dim (int) – the last dim to flatten




In [29]:
t = torch.tensor([[[1, 2],
                    [3, 4]],
                    [[5, 6],
                    [7, 8]]])

In [30]:
t.shape

torch.Size([2, 2, 2])

In [31]:
torch.flatten(t)

tensor([1, 2, 3, 4, 5, 6, 7, 8])

We can use start_dim to keep the leading dimensions and flatten only from that dimension onwards. Combined with end_dim, this is handy for merging away a dimension of size 1.

In [32]:
torch.flatten(t, start_dim=1)

tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])

In [33]:
a = torch.randn(4, 1, 3, 3)
a.shape

torch.Size([4, 1, 3, 3])

In [34]:
b = torch.flatten(a, start_dim=1, end_dim=2)
b.shape

torch.Size([4, 3, 3])

In [35]:
b

tensor([[[ 1.0668,  0.6964, -2.1182],
         [-1.0142,  0.5931, -1.3457],
         [ 0.7723, -0.5258,  1.3341]],

        [[-0.1119,  1.6734, -1.6325],
         [ 0.5137, -0.7176, -0.5566],
         [-0.5263, -0.3947,  1.7352]],

        [[-1.3183,  1.1556,  0.5092],
         [-1.2826, -0.4203, -1.0321],
         [-0.3116, -0.1535, -0.6810]],

        [[-0.8669,  0.4939,  1.1409],
         [ 0.2214,  0.0935, -0.2618],
         [ 0.4363, -0.9791,  1.2344]]])

In [36]:
a

tensor([[[[ 1.0668,  0.6964, -2.1182],
          [-1.0142,  0.5931, -1.3457],
          [ 0.7723, -0.5258,  1.3341]]],


        [[[-0.1119,  1.6734, -1.6325],
          [ 0.5137, -0.7176, -0.5566],
          [-0.5263, -0.3947,  1.7352]]],


        [[[-1.3183,  1.1556,  0.5092],
          [-1.2826, -0.4203, -1.0321],
          [-0.3116, -0.1535, -0.6810]]],


        [[[-0.8669,  0.4939,  1.1409],
          [ 0.2214,  0.0935, -0.2618],
          [ 0.4363, -0.9791,  1.2344]]]])
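For a batch shaped (4, 1, 3, 3), two patterns come up often: flattening everything except the batch dimension, and dropping the singleton dimension. The sketch below compares flatten with squeeze for the second case; it is an illustration under the assumption that dim 0 is a batch dimension.

```python
import torch

a = torch.randn(4, 1, 3, 3)

# Keep dim 0 (the batch) and flatten the rest: one row of 9 values per sample.
per_sample = torch.flatten(a, start_dim=1)
print(per_sample.shape)  # torch.Size([4, 9])

# Merging dims 1 and 2 removes the size-1 dimension, like squeeze(1) does here.
merged = torch.flatten(a, start_dim=1, end_dim=2)
squeezed = a.squeeze(1)
print(merged.shape, torch.equal(merged, squeezed))
```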

### Eye

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.



        n (int) – the number of rows

        m (int, optional) – the number of columns with default being n

        out (Tensor, optional) – the output tensor.

        dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

        layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.

        device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

        requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.



In [37]:
torch.eye(3)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
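Passing both n and m gives a rectangular result, and indexing the rows of an identity matrix is one simple way to build one-hot vectors. A short sketch; the labels tensor is a hypothetical example.

```python
import torch

# 2 rows, 3 columns: ones on the main diagonal only.
rect = torch.eye(2, 3)
print(rect)

# Row k of the identity matrix is the one-hot vector for class k.
labels = torch.tensor([2, 0, 1])  # hypothetical class labels
one_hot = torch.eye(3)[labels]
print(one_hot)
```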

### Range

Returns a 1-D tensor of size ⌈(end−start) / step⌉ with values from the half-open interval [start, end) taken with common difference step beginning from start.

torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor


In [38]:
torch.arange(5)

tensor([0, 1, 2, 3, 4])

In [39]:
torch.arange(1, 2.5, 0.5)

tensor([1.0000, 1.5000, 2.0000])
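The size formula can be checked against the actual output. In this sketch, expected_len is a hypothetical helper written for illustration; it is not part of PyTorch.

```python
import math
import torch

def expected_len(start, end, step):
    """Size predicted by the formula ceil((end - start) / step)."""
    return math.ceil((end - start) / step)

# end is excluded: 10 is not reached, the last value is 9.
r1 = torch.arange(0, 10, 3)
print(r1, len(r1), expected_len(0, 10, 3))

# The float example from the cell above.
r2 = torch.arange(1, 2.5, 0.5)
print(r2, len(r2), expected_len(1, 2.5, 0.5))
```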

### Einsum

after https://stackoverflow.com/questions/55894693/understanding-pytorch-einsum

In [40]:
vec=torch.tensor([0, 1, 2, 3])

aten=torch.tensor([[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34],
        [41, 42, 43, 44]])

bten=torch.tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3],
        [4, 4, 4, 4]])

#### 1) Matrix multiplication

PyTorch: torch.matmul(aten, bten) ; aten.mm(bten)

NumPy : np.einsum("ij, jk -> ik", arr1, arr2) 

#### Dot Product

In [41]:
torch.einsum('ij, jk -> ik', aten, bten)

tensor([[130, 130, 130, 130],
        [230, 230, 230, 230],
        [330, 330, 330, 330],
        [430, 430, 430, 430]])

In [42]:
#### Elementwise multiplication

torch.einsum('ij, ij -> ij', aten, bten)

tensor([[ 11,  12,  13,  14],
        [ 42,  44,  46,  48],
        [ 93,  96,  99, 102],
        [164, 168, 172, 176]])

#### 2) Extract elements along the main-diagonal

PyTorch: torch.diag(aten)

NumPy : np.einsum("ii -> i", arr) 

In [43]:
torch.einsum('ii -> i', aten)

tensor([11, 22, 33, 44])

#### 3) Hadamard product (i.e. element-wise product of two tensors)

PyTorch: aten * bten
    
NumPy : np.einsum("ij, ij -> ij", arr1, arr2) 

In [44]:
torch.einsum('ij, ij -> ij', aten, bten)

tensor([[ 11,  12,  13,  14],
        [ 42,  44,  46,  48],
        [ 93,  96,  99, 102],
        [164, 168, 172, 176]])

#### 4) Element-wise squaring

PyTorch: aten ** 2
    
NumPy : np.einsum("ij, ij -> ij", arr, arr) 

In [45]:
torch.einsum('ij, ij -> ij', aten, aten)

tensor([[ 121,  144,  169,  196],
        [ 441,  484,  529,  576],
        [ 961, 1024, 1089, 1156],
        [1681, 1764, 1849, 1936]])

General: the element-wise nth power can be implemented by repeating the subscript string and the tensor n times. For example, the element-wise 4th power of a tensor can be computed using:

In [46]:
torch.einsum('ij, ij, ij, ij -> ij', aten, aten, aten, aten)

tensor([[  14641,   20736,   28561,   38416],
        [ 194481,  234256,  279841,  331776],
        [ 923521, 1048576, 1185921, 1336336],
        [2825761, 3111696, 3418801, 3748096]])

#### 5) Trace (i.e. sum of main-diagonal elements)

PyTorch: torch.trace(aten)

NumPy einsum: np.einsum("ii -> ", arr) 

In [47]:
torch.einsum('ii -> ', aten)

tensor(110)

#### 6) Matrix transpose

PyTorch: torch.transpose(aten, 1, 0)
    
NumPy einsum: np.einsum("ij -> ji", arr) 

In [48]:
torch.einsum('ij -> ji', aten)

tensor([[11, 21, 31, 41],
        [12, 22, 32, 42],
        [13, 23, 33, 43],
        [14, 24, 34, 44]])

#### 7) Outer Product (of vectors)

PyTorch: torch.outer(vec, vec) (torch.ger is the older, deprecated name for the same operation)

NumPy einsum: np.einsum("i, j -> ij", vec, vec) 

In [49]:
torch.einsum('i, j -> ij', vec, vec)

tensor([[0, 0, 0, 0],
        [0, 1, 2, 3],
        [0, 2, 4, 6],
        [0, 3, 6, 9]])

#### 8) Inner Product (of vectors)

PyTorch: torch.dot(vec1, vec2)

NumPy einsum: np.einsum("i, i -> ", vec1, vec2) 

In [50]:
torch.einsum('i, i -> ', vec, vec)

tensor(14)

#### 9) Sum along axis 0

PyTorch: torch.sum(aten, 0)

NumPy einsum: np.einsum("ij -> j", arr) 

In [51]:
torch.einsum('ij -> j', aten)

tensor([104, 108, 112, 116])

#### 10) Sum along axis 1

PyTorch: torch.sum(aten, 1)
    
NumPy einsum: np.einsum("ij -> i", arr) 

In [52]:
torch.einsum('ij -> i', aten)

tensor([ 50,  90, 130, 170])

#### 11) Batch Matrix Multiplication

PyTorch: torch.bmm(batch_tensor_1, batch_tensor_2)
    
NumPy : np.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2) 

In [53]:
batch_tensor_1 = torch.arange(2 * 4 * 3).reshape(2, 4, 3)
batch_tensor_2 = torch.arange(2 * 3 * 4).reshape(2, 3, 4) 

torch.bmm(batch_tensor_1, batch_tensor_2) 

tensor([[[  20,   23,   26,   29],
         [  56,   68,   80,   92],
         [  92,  113,  134,  155],
         [ 128,  158,  188,  218]],

        [[ 632,  671,  710,  749],
         [ 776,  824,  872,  920],
         [ 920,  977, 1034, 1091],
         [1064, 1130, 1196, 1262]]])

In [54]:
# sanity check with the shapes
torch.bmm(batch_tensor_1, batch_tensor_2).shape 

torch.Size([2, 4, 4])

In [55]:
# batch matrix multiply using einsum
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2)

tensor([[[  20,   23,   26,   29],
         [  56,   68,   80,   92],
         [  92,  113,  134,  155],
         [ 128,  158,  188,  218]],

        [[ 632,  671,  710,  749],
         [ 776,  824,  872,  920],
         [ 920,  977, 1034, 1091],
         [1064, 1130, 1196, 1262]]])

In [56]:
# sanity check with the shapes
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2).shape

torch.Size([2, 4, 4])

#### 12) Sum along axis 2

PyTorch: torch.sum(batch_ten, 2)
    
NumPy einsum: np.einsum("ijk -> ij", arr3D) 

In [57]:
torch.einsum("ijk -> ij", batch_tensor_1)

tensor([[ 3, 12, 21, 30],
        [39, 48, 57, 66]])

#### 13) Sum all the elements in an nD tensor

PyTorch: torch.sum(batch_ten)
    
NumPy einsum: np.einsum("ijk -> ", arr3D) 

In [58]:
torch.einsum("ijk -> ", batch_tensor_1)

tensor(276)

#### 14) Sum over multiple axes (i.e. marginalization)

PyTorch: torch.sum(arr, dim=(dim0, dim1, dim2, dim3, dim4, dim6, dim7))
    
NumPy: np.einsum("ijklmnop -> n", nDarr) 

In [59]:
# 8D tensor
nDten = torch.randn((3,5,4,6,8,2,7,9))
nDten.shape


torch.Size([3, 5, 4, 6, 8, 2, 7, 9])

In [60]:
# keep axis 5 (the "n" subscript) and marginalize out every other axis
esum = torch.einsum("ijklmnop -> n", nDten)
esum

tensor([-111.1110, -263.9169])

In [61]:
# same reduction with torch.sum: sum over every axis except axis 5
tsum = torch.sum(nDten, dim=(0, 1, 2, 3, 4, 6, 7))

torch.allclose(tsum, esum)

False
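The two reductions are mathematically identical, so the False above most likely comes from float32 rounding: each output element adds up 181,440 values, and einsum and sum may accumulate them in different orders, which can exceed the default tolerances of allclose. A sketch of the same check in float64 follows; the seed and the dtype change are assumptions added for illustration.

```python
import torch

torch.manual_seed(0)

# Same 8D shape as above, but in double precision.
nd64 = torch.randn((3, 5, 4, 6, 8, 2, 7, 9), dtype=torch.float64)

# Keep axis 5 ("n") and sum over all other axes, in two ways.
esum64 = torch.einsum("ijklmnop -> n", nd64)
tsum64 = torch.sum(nd64, dim=(0, 1, 2, 3, 4, 6, 7))

# In float64 the accumulated rounding error is far below allclose's tolerances.
match64 = torch.allclose(tsum64, esum64)
print(match64)
```

With float32 inputs, passing looser rtol/atol values to torch.allclose is the other usual option.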

#### 15) Double Dot Products (same as: torch.sum(hadamard-product) cf. 3)

PyTorch: torch.sum(aten * bten)
    
NumPy : np.einsum("ij, ij -> ", arr1, arr2) 

In [62]:
torch.einsum("ij, ij -> ", aten, bten)

tensor(1300)
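The equivalences listed in this section can be spot-checked in one go. The sketch below redefines vec, aten and bten so that it is self-contained, and compares each einsum expression with its dedicated PyTorch function; the integer inputs make exact comparison with torch.equal possible.

```python
import torch

vec = torch.tensor([0, 1, 2, 3])
aten = torch.tensor([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34],
                     [41, 42, 43, 44]])
bten = torch.tensor([[1, 1, 1, 1],
                     [2, 2, 2, 2],
                     [3, 3, 3, 3],
                     [4, 4, 4, 4]])

# name -> (einsum result, reference result from the dedicated function)
pairs = {
    "matmul":     (torch.einsum('ij, jk -> ik', aten, bten), torch.matmul(aten, bten)),
    "diagonal":   (torch.einsum('ii -> i', aten), torch.diag(aten)),
    "hadamard":   (torch.einsum('ij, ij -> ij', aten, bten), aten * bten),
    "trace":      (torch.einsum('ii -> ', aten), torch.trace(aten)),
    "transpose":  (torch.einsum('ij -> ji', aten), torch.transpose(aten, 1, 0)),
    "outer":      (torch.einsum('i, j -> ij', vec, vec), torch.outer(vec, vec)),
    "inner":      (torch.einsum('i, i -> ', vec, vec), torch.dot(vec, vec)),
    "sum_axis0":  (torch.einsum('ij -> j', aten), torch.sum(aten, 0)),
    "sum_axis1":  (torch.einsum('ij -> i', aten), torch.sum(aten, 1)),
    "double_dot": (torch.einsum('ij, ij -> ', aten, bten), torch.sum(aten * bten)),
}

checks = {name: torch.equal(e, ref) for name, (e, ref) in pairs.items()}
print(checks)
```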

In [63]:
## NumPy Ellipsis

In [64]:
from numpy import arange
a = arange(16).reshape(2,2,2,2)

In [65]:
a

array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]]])

In [66]:
a[..., 0].flatten()

array([ 0,  2,  4,  6,  8, 10, 12, 14])

Equivalent to:

In [67]:
a[:,:,:,0].flatten()

array([ 0,  2,  4,  6,  8, 10, 12, 14])
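PyTorch tensors accept the same ellipsis syntax. A minimal sketch on a tensor with the same values as the NumPy array above:

```python
import torch

a = torch.arange(16).reshape(2, 2, 2, 2)

# "..." stands for as many full slices as needed.
last0 = a[..., 0]      # same as a[:, :, :, 0]
first0 = a[0, ...]     # same as a[0, :, :, :]
middle = a[0, ..., 1]  # same as a[0, :, :, 1]

print(last0.flatten())
print(first0.shape, middle.shape)
```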

### Expand size of tensor along non singleton dimension

e.g. extend dim 2 from size 2 to size 3 by concatenating a block of zeros

In [111]:
cuda0 = torch.device('cuda:0')
a=torch.ones([2, 1, 2, 2]).to(cuda0)
a.shape

torch.Size([2, 1, 2, 2])

In [115]:
a

tensor([[[[1., 1.],
          [1., 1.]]],


        [[[1., 1.],
          [1., 1.]]]], device='cuda:0')

In [112]:
out = torch.cat([a, torch.zeros(2,1,1,2).to(cuda0)], 2)

In [113]:
out.shape

torch.Size([2, 1, 3, 2])

In [114]:
out

tensor([[[[1., 1.],
          [1., 1.],
          [0., 0.]]],


        [[[1., 1.],
          [1., 1.],
          [0., 0.]]]], device='cuda:0')
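The same extension can be written with torch.nn.functional.pad, which avoids building the block of zeros by hand. This sketch runs on the CPU so that it does not need a GPU; the pad tuple is read from the last dimension backwards, as (left, right) for dim 3 followed by (before, after) for dim 2.

```python
import torch
import torch.nn.functional as F

a = torch.ones(2, 1, 2, 2)

# Concatenate one row of zeros along dim 2, as in the cells above.
via_cat = torch.cat([a, torch.zeros(2, 1, 1, 2)], dim=2)

# Pad dim 3 by (0, 0) and dim 2 by (0, 1): one zero row appended at the end.
via_pad = F.pad(a, (0, 0, 0, 1))

print(via_pad.shape)                   # torch.Size([2, 1, 3, 2])
print(torch.equal(via_cat, via_pad))   # True
```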

### Device

In [93]:
cuda0 = torch.device('cuda:0')
a = torch.randn((2,3), device=cuda0)

In [94]:
a.device

device(type='cuda', index=0)

In [95]:
b=torch.zeros(2,4).to(a.device)

In [96]:
b.device

device(type='cuda', index=0)
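The cells above assume a CUDA device is present. A common, more portable pattern is to pick the device once and fall back to the CPU; the sketch below shows that pattern and creates the second tensor directly on the first tensor's device.

```python
import torch

# Use the first GPU if one is available, otherwise the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

a = torch.randn((2, 3), device=device)

# Creating b on a.device directly avoids a separate .to() copy.
b = torch.zeros(2, 4, device=a.device)

print(a.device, b.device)
```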

### Scatter

Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.

For a 3-D tensor, self is updated as:
    
<pre>
self[index[i][j][k]][j][k] = src[i][j][k]  # if dim == 0
self[i][index[i][j][k]][k] = src[i][j][k]  # if dim == 1
self[i][j][index[i][j][k]] = src[i][j][k]  # if dim == 2
</pre>

In [9]:
x = torch.rand(2, 5)

In [10]:
x

tensor([[0.4183, 0.0121, 0.0719, 0.2705, 0.7525],
        [0.1310, 0.4384, 0.3306, 0.8629, 0.6674]])

In [11]:
y = torch.zeros(3, 5)
scatter_pattern = torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]])

In [12]:
y.scatter_(0, scatter_pattern, x)

tensor([[0.4183, 0.4384, 0.3306, 0.2705, 0.7525],
        [0.0000, 0.0121, 0.0000, 0.8629, 0.0000],
        [0.1310, 0.0000, 0.0719, 0.0000, 0.6674]])

The scatter call says: "send each element of x to a row of y chosen by scatter_pattern, keeping its column" (dim 0 is the row dimension).

i.e. for each element of the original x tensor, scatter_pattern gives the row index (0, 1 or 2) it is written to in the tensor we are scattering into (y). For example, x[1][1] = 0.4384 has pattern value 0, so it lands at y[0][1].
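A frequent use of scatter_ is one-hot encoding: scattering along dim 1 writes a 1 into the column named by each label. A minimal sketch, with a hypothetical labels tensor and a scalar used in place of src:

```python
import torch

labels = torch.tensor([2, 0, 1, 2])  # hypothetical class labels, 3 classes

one_hot = torch.zeros(4, 3)

# index must have the same number of dims as the target, hence unsqueeze(1).
# For dim=1: one_hot[i][labels[i]] = 1.0
one_hot.scatter_(1, labels.unsqueeze(1), 1.0)

print(one_hot)
```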

### Permute


In [4]:
x = torch.randn(2, 3, 5)
x.size()

torch.Size([2, 3, 5])

In [5]:
x.permute(2, 0, 1).size()

torch.Size([5, 2, 3])
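permute reorders the dimensions without copying data: entry [k, i, j] of the result is entry [i, j, k] of the input, and the result is a (usually non-contiguous) view. A small sketch with illustrative values:

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)

# New dim 0 is old dim 2, new dim 1 is old dim 0, new dim 2 is old dim 1.
p = x.permute(2, 0, 1)
print(p.shape)              # torch.Size([4, 2, 3])

# p[k, i, j] == x[i, j, k]
print(p[1, 0, 2], x[0, 2, 1])

# The result is a view on the same storage with reordered strides.
print(p.is_contiguous())    # False
x[0, 0, 0] = -1
print(p[0, 0, 0])           # tensor(-1)
```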