# Do you even broadcast?
PyTorch and NumPy have this amazing magic trick called [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) that lets you write fast array code without explicit Python loops. I have used it many times without even realising, but I never took the time to properly learn its fundamentals. The purpose of this blog post is to serve as a basic introduction to the topic, and for me to finally learn how broadcasting works.


## What is broadcasting?

Broadcasting defines what happens when you do arithmetic operations on two tensors or NumPy arrays with different shapes. Broadcasted operations generally run in efficient compiled code (C/C++), so they are usually much faster than looping over the elements in plain Python.

That's enough general jargon; broadcasting is best understood through examples. From now on I will use PyTorch tensors to illustrate the different broadcasting operations, but the same principles apply to NumPy arrays.

In [6]:
import torch

Let's first create two dummy tensors to experiment with.

In [25]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([2, 2, 2])

In [26]:
x.shape, y.shape

(torch.Size([3]), torch.Size([3]))

It is generally intuitive what happens when two tensors with the same shape are multiplied.

In [27]:
x * y

tensor([2, 4, 6])

Corresponding elements are simply multiplied together.


But what happens when the shapes do not match? This is where the broadcasting rules come into play.

In [18]:
z = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]])

In [20]:
z * x

tensor([[ 1,  4,  9],
        [ 4, 10, 18]])

Hmm, what happened here? Let's first try to deduce it from the result. 

It looks like *x* multiplied each row of *z* elementwise.

This is essentially what broadcasting is: the smaller tensor somehow deduced how to apply itself over the larger tensor.
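One way to picture what happened, as a rough sketch rather than a description of PyTorch's internals: the result is the same as if *x* had been stretched to the shape of *z* before multiplying. The snippet below uses `expand_as`, which returns a view with the larger shape without copying the data, and also compares against an explicit loop over the rows of *z*.

```python
import torch

z = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
x = torch.tensor([1, 2, 3])

# The broadcasted product from above.
broadcasted = z * x

# Stretch x to z's shape explicitly; expand_as returns a view, not a copy.
x_expanded = x.expand_as(z)  # shape [2, 3], every row is [1, 2, 3]
explicit = z * x_expanded

# The same computation written as a Python loop over the rows of z.
looped = torch.stack([row * x for row in z])

print(torch.equal(broadcasted, explicit))  # True
print(torch.equal(broadcasted, looped))    # True
```

All three give the same tensor; broadcasting just saves you from writing the expansion or the loop yourself.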

Luckily there are simple rules that make this work.

In [21]:
z.shape, x.shape

(torch.Size([2, 3]), torch.Size([3]))

Let's now introduce the two rules that define which tensors can be broadcast together.

The following rules are from the PyTorch broadcasting [documentation](https://pytorch.org/docs/stable/notes/broadcasting.html). I suggest you check that out as well. 

Two tensors can be broadcast if:
  - Each tensor has at least one dimension.
  - When iterating over the dimensions of the two tensors, starting from the trailing (rightmost) dimension, one of the following holds for each pair of dimensions:
    - The two dimension sizes are equal
    - One of the dimension sizes is 1
    - One of the dimensions does not exist (PyTorch treats it as size 1)
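
The dimension-by-dimension check can be sketched in a few lines of plain Python. `can_broadcast` is a hypothetical helper written for this post, not a PyTorch function, and it only covers the second rule (it does not check that each tensor has at least one dimension). Recent PyTorch versions also ship `torch.broadcast_shapes`, which performs a similar check on real shapes.

```python
from itertools import zip_longest


def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the trailing-dimension rule to two shapes."""
    # Walk both shapes from the rightmost dimension; a missing dimension
    # is filled in as 1, mirroring the last sub-rule in the list.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


print(can_broadcast((2, 3), (3,)))          # True
print(can_broadcast((3, 4, 5), (3, 4)))     # False: 5 vs 4 in the last dimension
print(can_broadcast((3, 4, 5), (3, 4, 1)))  # True
```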
    
Let's now go through some examples of how this works.

In [11]:
x = torch.ones(3,4,5)
y = torch.ones(3,4,5)*2

In [13]:
x.shape, y.shape

(torch.Size([3, 4, 5]), torch.Size([3, 4, 5]))

In [12]:
x * y 

tensor([[[2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.]],

        [[2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.]],

        [[2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.],
         [2., 2., 2., 2., 2.]]])

This operation works because all the dimensions match.

In [14]:
x = torch.ones(3,4)
y = torch.ones(3,4,5)

What do you think will happen if we try an arithmetic operation on these two tensors?

Stop and think for a minute or two.

In [15]:
x * y

RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 2

We get an error that indicates that the tensor sizes do not match.

Why is this? Remember the broadcasting rules: you start comparing from the trailing dimension, that is, the rightmost one.

In [16]:
x.shape, y.shape

(torch.Size([3, 4]), torch.Size([3, 4, 5]))

So in this case we compare 4 and 5 and find that the numbers are not equal and neither of them is 1.

This means that the broadcasting rules are violated, so PyTorch cannot operate on these two shapes.
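
To see the comparison explicitly, the sketch below right-aligns the two shapes the way the rules do and catches the error instead of letting it stop the notebook. The `-` padding and the `failed` flag are only there for illustration.

```python
import torch

x = torch.ones(3, 4)
y = torch.ones(3, 4, 5)

# Right-align the shapes: the missing leading dimension of x is shown as "-".
x_dims = ["-"] * (y.dim() - x.dim()) + list(x.shape)
print("x:", x_dims)          # x: ['-', 3, 4]
print("y:", list(y.shape))   # y: [3, 4, 5]

# The rightmost pair is (4, 5): not equal and neither is 1, so this fails.
try:
    x * y
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)
```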

How can we fix this?

One thing that comes to mind is inserting a dimension of size 1 as the rightmost dimension of tensor x. That would give x the shape [3, 4, 1], which lines up nicely with [3, 4, 5].

The question is how to do this in PyTorch. After some Googling, there are basically two ways:
  - x.unsqueeze(dim=-1)
  - x[:, :, None]

In [20]:
x.unsqueeze(dim=-1).shape

torch.Size([3, 4, 1])

In [25]:
x[:, :, None].shape

torch.Size([3, 4, 1])
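
With the extra dimension in place, the shapes [3, 4, 1] and [3, 4, 5] satisfy the rules, so the multiplication should now go through. A quick check, assuming the same *x* and *y* as above (the names `a`, `b` and `result` are just for this sketch):

```python
import torch

x = torch.ones(3, 4)
y = torch.ones(3, 4, 5)

# Both spellings add a trailing dimension of size 1 to x.
a = x.unsqueeze(dim=-1)
b = x[:, :, None]
print(torch.equal(a, b))   # True

# Trailing dims are now (1, 5), (4, 4), (3, 3), so broadcasting succeeds.
result = a * y
print(result.shape)        # torch.Size([3, 4, 5])
```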