The tensor object holds one pointer to the start of that contiguous block. The elements themselves are accessed via simple pointer arithmetic based on shape/stride, not through additional pointers per element.
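A minimal sketch of that arithmetic: the position of `view[i, j]` in the underlying buffer is `storage_offset() + i * stride(0) + j * stride(1)`. The helper `element_via_strides` is hypothetical and exists only for this illustration; `stride()`, `storage_offset()` and `data_ptr()` are real tensor methods. Using the 1-D tensor `base` as a stand-in for the flat storage is an assumption that holds here because `base` is contiguous and every view is taken from it.

```python
import torch

# A contiguous buffer of 12 floats; every tensor below is a view into it.
base = torch.arange(12.0)
t = base.view(3, 4)   # strides (4, 1): next row is 4 elements away
tt = t.t()            # transposed view, strides (1, 4), no data copied
row1 = t[1]           # view that starts 4 elements into the buffer

def element_via_strides(view, idx, flat):
    """Hypothetical helper: locate view[idx] in `flat` as offset + sum(i * stride)."""
    offset = view.storage_offset() + sum(i * s for i, s in zip(idx, view.stride()))
    return flat[offset]

print(t.stride(), tt.stride(), row1.storage_offset())  # (4, 1) (1, 4) 4
print(element_via_strides(tt, (2, 1), base), tt[2, 1])  # tensor(6.) tensor(6.)
print(tt.data_ptr() == base.data_ptr())                 # True: same start address
```

Transposing only swaps the strides, which is why `t.t()` is cheap: no element moves.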

In [1]:
import torch

points = torch.zeros(4)
print(points, points.shape)

points = points[None] # Adds a dimension of size 1, just like unsqueeze
print(points, points.shape)

points = torch.zeros(4)
points = torch.unsqueeze(points, dim=0)
print(points, points.shape)

tensor([0., 0., 0., 0.]) torch.Size([4])
tensor([[0., 0., 0., 0.]]) torch.Size([1, 4])
tensor([[0., 0., 0., 0.]]) torch.Size([1, 4])
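`None` is not limited to the front: wherever it appears in an index, a size-1 dimension is inserted at that position, matching `unsqueeze` with the corresponding `dim`. A small sketch:

```python
import torch

x = torch.zeros(4)

a = x[None]        # same as x.unsqueeze(0)
b = x[:, None]     # same as x.unsqueeze(1)
c = x[..., None]   # same as x.unsqueeze(-1)

print(a.shape, b.shape, c.shape)  # torch.Size([1, 4]) torch.Size([4, 1]) torch.Size([4, 1])
```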


``torch.unsqueeze`` is a **non-mutating operation**: it returns a new tensor rather than modifying the original one. To perform it in place, use the method with a trailing underscore, ``unsqueeze_``:

In [2]:
points = torch.zeros(4)
points.unsqueeze_(dim=0)
print(points, points.shape)

tensor([[0., 0., 0., 0.]]) torch.Size([1, 4])
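A short check of the difference: `unsqueeze` leaves the original's shape alone, while `unsqueeze_` changes it. The sketch also shows a detail worth knowing: the tensor returned by `unsqueeze` is a view, so it shares data with the original and writes through it are visible there.

```python
import torch

points = torch.zeros(4)
expanded = points.unsqueeze(0)       # new tensor object; points keeps shape (4,)
print(points.shape, expanded.shape)  # torch.Size([4]) torch.Size([1, 4])

# expanded is a view over the same data, so this write shows up in points
expanded[0, 0] = 1.0
print(points)                        # tensor([1., 0., 0., 0.])

points.unsqueeze_(0)                 # in place: points itself now has shape (1, 4)
print(points.shape)                  # torch.Size([1, 4])
```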


## Named tensors

Indexing correctly requires remembering the order of a tensor's dimensions, and keeping track of that order as tensors pass through many operations is error-prone.

In [None]:
img_t = torch.rand(3, 5, 5) # random RGB image, shape (channels, rows, columns)
weights = torch.tensor([0.2126, 0.7152, 0.0722]) # weights for each channel, shape (channels,)
print(img_t)

tensor([[[0.5323, 0.3062, 0.4296, 0.5832, 0.7785],
         [0.9456, 0.3301, 0.3890, 0.3865, 0.4743],
         [0.1099, 0.7306, 0.3771, 0.8148, 0.8417],
         [0.0143, 0.9922, 0.7060, 0.4197, 0.2523],
         [0.0888, 0.8428, 0.1962, 0.9237, 0.9444]],

        [[0.6308, 0.7288, 0.0802, 0.0920, 0.4818],
         [0.0716, 0.6737, 0.4062, 0.0832, 0.7225],
         [0.4502, 0.6662, 0.9211, 0.9130, 0.7539],
         [0.6602, 0.0678, 0.4497, 0.5164, 0.6099],
         [0.2338, 0.6252, 0.4737, 0.9767, 0.3831]],

        [[0.3814, 0.1284, 0.5853, 0.1698, 0.0966],
         [0.2337, 0.1087, 0.2341, 0.1857, 0.3446],
         [0.1534, 0.1133, 0.6370, 0.0355, 0.6777],
         [0.4399, 0.0371, 0.8363, 0.3337, 0.5920],
         [0.6145, 0.9361, 0.2971, 0.8338, 0.8945]]])


In [4]:
img_gray_naive = img_t.mean(-3)
print(img_gray_naive)

tensor([[0.5148, 0.3878, 0.3650, 0.2817, 0.4523],
        [0.4169, 0.3708, 0.3431, 0.2185, 0.5138],
        [0.2378, 0.5034, 0.6451, 0.5878, 0.7578],
        [0.3714, 0.3657, 0.6640, 0.4233, 0.4847],
        [0.3124, 0.8014, 0.3223, 0.9114, 0.7407]])
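One likely reason for writing `mean(-3)` rather than `mean(0)`: counting from the right, the channel dimension is `-3` both for a single image `(C, H, W)` and for a batch `(B, C, H, W)`, so the same call works with or without a leading batch dimension. The batch below is made up for illustration.

```python
import torch

img = torch.rand(3, 5, 5)        # (channels, rows, columns)
batch = torch.rand(2, 3, 5, 5)   # (batch, channels, rows, columns), illustrative

print(img.mean(-3).shape)        # torch.Size([5, 5])
print(batch.mean(-3).shape)      # torch.Size([2, 5, 5])
```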


"PyTorch will allow us to multiply things that are the same shape, as well as shapes where one operand is of size 1 in a given dimension". This means:

PyTorch can automatically "stretch" tensors along dimensions where one of the tensors has size 1, so that their shapes match for elementwise operations like multiplication. This is called **broadcasting**.

In [5]:
a = torch.tensor([[1, 2], [3, 4]])      # shape: (2, 2)
b = torch.tensor([[10], [20]])          # shape: (2, 1)

# b has size 1 in its second dimension, so PyTorch stretches it along that dimension to match a:
# [[10, 10],
#  [20, 20]]

a*b

tensor([[10, 20],
        [60, 80]])
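The stretch can be made explicit with `expand_as`, which returns a view of `b` with stride 0 along the stretched dimension, so no data is copied. This is a way to visualize broadcasting, not a claim about how PyTorch's elementwise kernels are implemented; the products are identical either way.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])   # shape (2, 2)
b = torch.tensor([[10], [20]])       # shape (2, 1)

b_expanded = b.expand_as(a)          # values [[10, 10], [20, 20]]
print(b_expanded.stride())           # (1, 0): stride 0 reuses one element along dim 1
print(torch.equal(a * b, a * b_expanded))  # True
```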

Broadcasting also prepends dimensions of size 1 automatically. Leading dimensions are the ones on the left of a shape tuple. PyTorch (like NumPy) aligns shapes from the right: if two tensors have different numbers of dimensions, it pads the shorter shape with 1s on the left (leading 1s) so the broadcasting rules can apply.
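The rule is short enough to write by hand. The helper `broadcast_shape` below is hypothetical, an illustration of the rule rather than a PyTorch function; the last line cross-checks it against the real `torch.broadcast_shapes` (available in PyTorch 1.8 and later). It also shows why the channel weights need reshaping before multiplying the image: a `(3,)` tensor is padded to `(1, 1, 3)` and clashes with the column dimension.

```python
import torch

def broadcast_shape(s1, s2):
    """Hypothetical helper: pad with leading 1s, then match dims from the right."""
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for d1, d2 in zip(s1, s2):
        if d1 != d2 and d1 != 1 and d2 != 1:
            raise ValueError(f"cannot broadcast {d1} with {d2}")
        out.append(d2 if d1 == 1 else d1)  # a size-1 dim takes the other size
    return tuple(out)

print(broadcast_shape((3, 5, 5), (5,)))       # (3, 5, 5)
print(broadcast_shape((3, 5, 5), (3, 1, 1)))  # (3, 5, 5)
try:
    broadcast_shape((3, 5, 5), (3,))          # (3,) -> (1, 1, 3); 5 vs 3 fails
except ValueError as err:
    print("error:", err)                      # error: cannot broadcast 5 with 3

print(torch.broadcast_shapes((3, 5, 5), (5,)))  # torch.Size([3, 5, 5])
```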

In [22]:
print(img_t.shape, weights.shape)

# weights has shape (3,). Shapes align from the right, so unsqueeze twice to get
# (3, 1, 1), which lines the channel dim up with img_t's leading (channel) dim
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze(-1)
print(unsqueezed_weights.shape)

weighted_img = img_t*unsqueezed_weights
print(weighted_img.shape)

img_gray_weighted = weighted_img.sum(-3)
print(img_gray_weighted)

torch.Size([3, 5, 5]) torch.Size([3])
torch.Size([3, 1, 1])
torch.Size([3, 5, 5])
tensor([[0.5919, 0.5956, 0.1909, 0.2020, 0.5171],
        [0.2691, 0.5598, 0.3901, 0.1551, 0.6425],
        [0.3564, 0.6400, 0.7849, 0.8288, 0.7671],
        [0.5069, 0.2621, 0.5321, 0.4827, 0.5325],
        [0.2305, 0.6939, 0.4019, 0.9551, 0.5393]])
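The double `unsqueeze` can also be written with `None` indexing or with `view`; all three give a `(3, 1, 1)` tensor and the same grayscale result. This sketch draws fresh random data, so its values differ from the output above.

```python
import torch

img_t = torch.rand(3, 5, 5)
weights = torch.tensor([0.2126, 0.7152, 0.0722])

w1 = weights.unsqueeze(-1).unsqueeze(-1)  # as in the cell above
w2 = weights[:, None, None]               # None indexing
w3 = weights.view(-1, 1, 1)               # explicit reshape

print(w1.shape, w2.shape, w3.shape)       # torch.Size([3, 1, 1]) three times
gray = (img_t * w2).sum(-3)
print(torch.allclose(gray, (img_t * w1).sum(-3)))  # True
```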


This gets messy very quickly, so PyTorch provides ``einsum`` to express such operations more compactly. ``torch.einsum('notation', tensors...)`` uses Einstein summation notation to describe how the dimensions of several tensors are combined through multiplication, summation, or both.

In `torch.einsum('input1_labels,input2_labels->output_labels', tensor1, tensor2)`:
* The part of the string before `->` contains one label per dimension of each input tensor, in order, with the inputs separated by commas.
* The part after `->` lists the labels of the output's dimensions, in order.
* Dimensions that share a label are multiplied elementwise; any label that does not appear in the output is summed over.
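A few small cases that exercise these rules, each checked against the equivalent non-einsum expression. The tensors are random and only their shapes matter.

```python
import torch

A = torch.rand(2, 3)
B = torch.rand(3, 4)
M = torch.rand(3, 3)
v = torch.rand(3)

# j is shared and absent from the output: multiply, then sum over j (matmul)
matmul = torch.einsum('ij,jk->ik', A, B)
# j is shared but kept in the output: elementwise multiply, no summation
scaled = torch.einsum('ij,j->ij', A, v)
# i repeated within one input selects the diagonal; dropping it sums (trace)
trace = torch.einsum('ii->', M)
# reordering labels in the output permutes dimensions (transpose)
transposed = torch.einsum('ij->ji', A)

print(torch.allclose(matmul, A @ B))       # True
print(torch.allclose(scaled, A * v))       # True
print(torch.allclose(trace, M.trace()))    # True
print(torch.equal(transposed, A.t()))      # True
```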

In [32]:
# c: channels (dim 0 of img_t), h: height (dim 1), w: width (dim 2); the same c labels the only dim of weights. c is absent from the output, so it is summed over.
img_gray_weighted_einsum = torch.einsum('chw,c->hw', img_t, weights)
img_gray_weighted_einsum

tensor([[0.5919, 0.5956, 0.1909, 0.2020, 0.5171],
        [0.2691, 0.5598, 0.3901, 0.1551, 0.6425],
        [0.3564, 0.6400, 0.7849, 0.8288, 0.7671],
        [0.5069, 0.2621, 0.5321, 0.4827, 0.5325],
        [0.2305, 0.6939, 0.4019, 0.9551, 0.5393]])

So, in ``torch.einsum('chw,c->hw', img_t, weights)`` we are saying:
* `chw` applies to `img_t` → shape: `(3, 5, 5)`
* `c` applies to `weights` → shape: `(3,)`
* `->hw` says: output should have dimensions `h` and `w`, meaning **sum over** `c`.
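`einsum` also accepts `...` for "any leading dimensions", so one expression can cover both a single image and a batch of images. The batch below is made up for illustration; the last line confirms the result matches the broadcast-then-sum approach.

```python
import torch

weights = torch.tensor([0.2126, 0.7152, 0.0722])
img_t = torch.rand(3, 5, 5)
batch_t = torch.rand(2, 3, 5, 5)   # illustrative batch of 2 images

gray_single = torch.einsum('...chw,c->...hw', img_t, weights)
gray_batch = torch.einsum('...chw,c->...hw', batch_t, weights)
print(gray_single.shape, gray_batch.shape)  # torch.Size([5, 5]) torch.Size([2, 5, 5])

print(torch.allclose(gray_single, (img_t * weights[:, None, None]).sum(-3)))  # True
```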

More on `einsum` can be found [here](https://rockt.ai/2018/04/30/einsum).

This still involves a lot of bookkeeping, which is error-prone, especially when the places where tensors are created and where they are manipulated are far apart in the code. To address this, PyTorch 1.3 added **named tensors** as an experimental feature, giving factory functions such as `tensor` and `rand` a `names` argument:

In [33]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])
weights_named



tensor([0.2126, 0.7152, 0.0722], names=('channels',))
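Once dimensions carry names, the `.names` attribute reports them and reductions can refer to a dimension by name instead of by position. A small sketch using a random image; PyTorch may print a warning that named tensors are experimental.

```python
import torch

img = torch.randn(3, 5, 5, names=('channels', 'rows', 'columns'))
print(img.names)              # ('channels', 'rows', 'columns')

by_name = img.sum('rows')     # reduce over 'rows' without knowing it is dim 1
print(by_name.names)          # ('channels', 'columns')

# Same values as reducing by position on the unnamed data
print(torch.allclose(by_name.rename(None), img.rename(None).sum(1)))  # True
```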

In [35]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
weights_named_aligned = weights_named.align_as(img_named)
print(img_named)
print(weights_named_aligned)

tensor([[[0.5323, 0.3062, 0.4296, 0.5832, 0.7785],
         [0.9456, 0.3301, 0.3890, 0.3865, 0.4743],
         [0.1099, 0.7306, 0.3771, 0.8148, 0.8417],
         [0.0143, 0.9922, 0.7060, 0.4197, 0.2523],
         [0.0888, 0.8428, 0.1962, 0.9237, 0.9444]],

        [[0.6308, 0.7288, 0.0802, 0.0920, 0.4818],
         [0.0716, 0.6737, 0.4062, 0.0832, 0.7225],
         [0.4502, 0.6662, 0.9211, 0.9130, 0.7539],
         [0.6602, 0.0678, 0.4497, 0.5164, 0.6099],
         [0.2338, 0.6252, 0.4737, 0.9767, 0.3831]],

        [[0.3814, 0.1284, 0.5853, 0.1698, 0.0966],
         [0.2337, 0.1087, 0.2341, 0.1857, 0.3446],
         [0.1534, 0.1133, 0.6370, 0.0355, 0.6777],
         [0.4399, 0.0371, 0.8363, 0.3337, 0.5920],
         [0.6145, 0.9361, 0.2971, 0.8338, 0.8945]]],
       names=('channels', 'rows', 'columns'))
tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]], names=('channels', 'rows', 'columns'))
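With the weights aligned, the names match position by position, so the multiplication broadcasts and the sum can target `'channels'` by name. A sketch on fresh random data, checked against the `einsum` version from earlier:

```python
import torch

img_t = torch.rand(3, 5, 5)
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])

img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
weights_aligned = weights_named.align_as(img_named)   # shape (3, 1, 1), named

gray_named = (img_named * weights_aligned).sum('channels')
print(gray_named.shape, gray_named.names)  # torch.Size([5, 5]) ('rows', 'columns')
```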


If we try to combine tensors whose dimension names don't match (broadcasting still aligns them from the right), we get an error:

In [37]:
gray_named = (img_named[2]*weights_named).sum('channels')

RuntimeError: Error when attempting to broadcast dims ['rows', 'columns'] and dims ['channels']: dim 'columns' and dim 'channels' are at the same position from the right but do not match.

To use named tensors with functions that don't operate on named tensors, we need to drop the names by renaming them to `None`.
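A minimal sketch of dropping names with `rename(None)`: the result has the same data (no copy) but no names, so it can be passed to any operation that expects ordinary tensors.

```python
import torch

img_named = torch.rand(3, 5, 5).refine_names('channels', 'rows', 'columns')
plain = img_named.rename(None)     # same data, names removed
print(plain.names)                 # (None, None, None)

weights = torch.tensor([0.2126, 0.7152, 0.0722])
gray = torch.einsum('chw,c->hw', plain, weights)
print(gray.shape)                  # torch.Size([5, 5])
```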