<a href="https://colab.research.google.com/github/axel-sirota/operations-arrays-numpy/blob/main/module3/OperationsNumpy_Mod3Demo4_Broadcasting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Broadcasting in NumPy

## What is Broadcasting?

Let's see how operations work on NumPy arrays, both when their dimensions are equal and when they are not.

In [3]:
import numpy as np

In [11]:
arr = np.arange(6)
scalar = np.array([2.0] * 6)  # despite its name, an array of six 2.0s (same shape as arr)

In [12]:
arr * scalar

array([ 0.,  2.,  4.,  6.,  8., 10.])

When the shapes are the same, the operation is element-wise: each element of `arr` is multiplied by the element at the same position in `scalar`. We already relied on this earlier for Boolean indexing.
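To make "element-wise" concrete, here is a small sketch (the names `twos`, `manual` and `mask` are ours, not the notebook's) that checks the vectorized product against an explicit Python loop. It also shows that comparisons are element-wise too, which is what produces the Boolean masks used in Boolean indexing.

```python
import numpy as np

arr = np.arange(6)
twos = np.array([2.0] * 6)  # same shape as arr: (6,)

# Element-wise: result[i] = arr[i] * twos[i] for every index i
vectorized = arr * twos
manual = np.array([arr[i] * twos[i] for i in range(len(arr))])
print(np.array_equal(vectorized, manual))  # True

# Comparisons are element-wise as well and return a Boolean array
mask = arr > 2
print(mask)       # [False False False  True  True  True]
print(arr[mask])  # [3 4 5]
```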

However, we can also get the same vector of 2s like this:

In [13]:
scalar2 = np.zeros(6) + 2
scalar2

array([2., 2., 2., 2., 2., 2.])

In [14]:
arr * scalar2

array([ 0.,  2.,  4.,  6.,  8., 10.])

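What happened? We added a single number (2) to an array of zeros, and NumPy *stretched* that number internally to match the array's shape so the element-wise addition could be performed. This is known as **broadcasting**. The stretching is only conceptual: NumPy reuses the original value instead of actually copying it for every element.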

In [15]:
# It allows for the following:
arr*2.0

array([ 0.,  2.,  4.,  6.,  8., 10.])
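A sketch of what the `2.0` in `arr*2.0` looks like once stretched, using `np.broadcast_to` (a NumPy function that this notebook does not otherwise use). The result is a read-only view with a stride of 0, so all six positions read the same value from memory.

```python
import numpy as np

arr = np.arange(6)

# The scalar stretched to arr's shape, as broadcasting conceptually does
stretched = np.broadcast_to(np.float64(2.0), arr.shape)
print(stretched)                  # [2. 2. 2. 2. 2. 2.]
print(stretched.strides)          # (0,): every position points at the same memory
print(stretched.flags.writeable)  # False: broadcast views are read-only

# Multiplying by the stretched view gives the same result as arr * 2.0
print(np.array_equal(arr * 2.0, arr * stretched))  # True
```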

Why do we do it? Is it only syntactic sugar?

In [16]:
# Code from SciPy and NumPy by Eli Bressert

# Create an array with 10^7 elements.
arr = np.arange(1e7)

# Converting ndarray to list
larr = arr.tolist()

# Lists cannot by default broadcast,
# so a function is coded to emulate
# what an ndarray can do.

def list_times(alist, scalar):
    for i, val in enumerate(alist): 
        alist[i] = val * scalar
    return alist

In [17]:
%timeit arr * 1.1

26.5 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [18]:
%timeit list_times(larr, 1.1)

1.19 s ± 372 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


We can easily see that broadcasting makes the operation far more efficient: in this run it is roughly 45 times faster (26.5 ms vs 1.19 s). Why? Because broadcasting lets the whole operation run in NumPy's compiled C loops instead of element by element in pure Python.
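`%timeit` is an IPython/Jupyter magic. Outside a notebook, a rough equivalent with the standard-library `timeit` module might look like the sketch below. It assumes a smaller array (10^6 elements, our choice) and replaces `list_times` with a hypothetical non-mutating `scale_list`, so repeated runs don't keep growing the values. The speedup depends on your machine, so no numbers are claimed here.

```python
import timeit
import numpy as np

n = 1_000_000  # our choice: smaller than the notebook's 10**7 to keep it quick
arr = np.arange(n, dtype=float)
larr = arr.tolist()

def scale_list(values, scalar):
    # Hypothetical non-mutating variant of list_times: one Python-level
    # multiplication per element, returning a new list
    return [v * scalar for v in values]

# Sanity check: both approaches produce the same numbers
same = np.allclose(arr * 1.1, scale_list(larr, 1.1))

# Best of 3 repeats; the NumPy timing is averaged over 10 calls per repeat
t_numpy = min(timeit.repeat(lambda: arr * 1.1, number=10, repeat=3)) / 10
t_list = min(timeit.repeat(lambda: scale_list(larr, 1.1), number=1, repeat=3))

print(f"NumPy: {t_numpy * 1e3:.2f} ms per call")
print(f"list:  {t_list * 1e3:.2f} ms per call")
print(f"speedup: ~{t_list / t_numpy:.0f}x")
```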

## Broadcasting rules

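Two arrays are broadcastable if, comparing their shapes axis by axis starting from the last (rightmost) axis, every pair of dimensions satisfies one of these rules:

1.   One of the dimensions is 1
2.   The dimensions are equal

If one array has fewer dimensions than the other, it is treated as having extra dimensions of size 1 on the left.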
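The rules can be written out as a small pure-Python function. `broadcast_shape` and `try_shape` below are hypothetical helpers for illustration only; NumPy 1.20 and later provides the real computation as `np.broadcast_shapes`, which the sketch uses as a cross-check on the shapes from this notebook.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes.

    Raises ValueError when the shapes are incompatible.
    """
    # Pad the shorter shape with 1s on the left; this aligns both shapes
    # at the right so zip pairs up the matching axes
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:   # dimensions are equal, or b's is 1
            result.append(da)
        elif da == 1:             # a's dimension is 1: stretch it to db
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)


def try_shape(func, sa, sb):
    # Return the broadcast shape, or the string "error" if it fails
    try:
        return func(sa, sb)
    except ValueError:
        return "error"


results = {}
for sa, sb in [((1, 7), (7, 7)), ((2, 5), (5, 5)), ((4,), (4, 5)),
               ((7, 1, 6, 1), (8, 7, 6, 1, 5))]:
    mine = try_shape(broadcast_shape, sa, sb)
    reference = try_shape(np.broadcast_shapes, sa, sb)  # NumPy >= 1.20
    results[(sa, sb)] = (mine, reference)
    print(sa, sb, "->", mine, "| NumPy:", reference)
```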



For example, if `a` is 1x7 and `b` is 7x7, the rules say they should be broadcastable. Let's check:

In [19]:
a = np.arange(7).reshape(1,7)
b = np.arange(49).reshape(7,7)

In [20]:
a*b

array([[  0,   1,   4,   9,  16,  25,  36],
       [  0,   8,  18,  30,  44,  60,  78],
       [  0,  15,  32,  51,  72,  95, 120],
       [  0,  22,  46,  72, 100, 130, 162],
       [  0,  29,  60,  93, 128, 165, 204],
       [  0,  36,  74, 114, 156, 200, 246],
       [  0,  43,  88, 135, 184, 235, 288]])

In [21]:
(a*b).shape

(7, 7)

It was! The single row `[0, 1, 2, 3, 4, 5, 6]` of `a` was repeated for every row of `b`, producing the corresponding 7x7 result.
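One way to check the "repeated at every row" picture is to build the repeated array explicitly with `np.repeat` and compare. This is only a mental model: the sketch allocates the 7x7 copy, whereas broadcasting itself does not need to.

```python
import numpy as np

a = np.arange(7).reshape(1, 7)
b = np.arange(49).reshape(7, 7)

# Explicitly copy the single row of a 7 times along axis 0
a_repeated = np.repeat(a, 7, axis=0)
print(a_repeated.shape)                       # (7, 7)
print(np.array_equal(a * b, a_repeated * b))  # True
```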

Conversely, if the dimensions on some axis differ and neither of them is 1, it won't work: broadcasting fails and NumPy raises an exception.

In [22]:
a = np.arange(10).reshape(2,5)
b = np.arange(25).reshape(5,5)

In [23]:
a*b

ValueError: operands could not be broadcast together with shapes (2,5) (5,5)
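The failure is an ordinary `ValueError`, so it can be caught like any other exception. In this sketch (the `failed` flag and `row` variable are ours), taking only the first row of `a` turns the mismatched first axis into a 1, after which the rules are satisfied.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = np.arange(25).reshape(5, 5)

# The failure is an ordinary ValueError, so it can be caught
try:
    a * b
    failed = False
except ValueError as err:
    failed = True
    print("broadcast failed:", err)  # the message names both shapes

# A single row of a has shape (1, 5): the first axis is now a 1
row = a[:1]
print((row * b).shape)  # (5, 5)
```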

There is a caveat, though. Consider the following example:

In [32]:
a = np.array([1,2,3,4])
b = np.arange(20).reshape(4,5)


In [33]:
a+b

ValueError: operands could not be broadcast together with shapes (4,) (4,5)

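Shapes are compared starting from the rightmost axis. `a` has shape `(4,)`, so NumPy treats it as a single row of shape `(1, 4)` and tries to add it to every row of `b`. The rows of `b` have 5 elements, though, and since 4 and 5 differ and neither is 1, broadcasting fails. It does not help that `a` has exactly as many elements as `b` has rows: a 1-D array is always lined up with the *last* axis.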

**Lesson: beware of 1-D arrays. Use `reshape` to give them the shape you actually mean (here `(4, 1)`) and broadcasting will just work.**
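Here is the reshape fix applied to the failing example, as a sketch. Turning `a` into a column of shape `(4, 1)` (with `reshape`, or equivalently by indexing with `np.newaxis`) lines it up with the rows of `b`, so `a[i]` is added to every element of row `i`. For contrast, a 1-D array of length 5 matches the last axis of `b` and is added to every row.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.arange(20).reshape(4, 5)

col = a.reshape(4, 1)   # shape (4, 1): one value per row of b
print((col + b).shape)  # (4, 5)
print(np.array_equal(col + b, a[:, np.newaxis] + b))  # True: same column

# A 1-D array lines up with the LAST axis, so length 5 works as a row
row = np.arange(5)      # shape (5,), treated as (1, 5)
print((row + b).shape)  # (4, 5)
```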

## Dimension expansion

As a final example, let's talk about dimension expansion: broadcasting arrays that don't even have the same number of dimensions.

In [34]:
a = np.arange(42).reshape(7,1,6,1)
b = np.arange(1680).reshape(8,7,6,1,5)

**Will they broadcast? What will the final shape be?**

In [38]:
(a+b).shape

(8, 7, 6, 6, 5)

Yes! Shapes are compared from right to left (`a` is `(7, 1, 6, 1)`, `b` is `(8, 7, 6, 1, 5)`):

1.   Last axis: `a` has a 1, so it is broadcast 5 times to match `b`
2.   Second axis from the right: `b` has a 1, so it is broadcast to match the 6 on `a`
3.   Third axis from the right: `a` has a 1, so it is broadcast to match the 6 on `b`
4.   Fourth axis from the right: both dimensions are 7, nothing to do
5.   Fifth axis from the right: `a` has no dimension here, so it is treated as 1 and *all* of `a` is repeated 8 times to match `b`
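To see which axes each operand is stretched along, here is a sketch using `np.broadcast_arrays`, which returns both operands as views with the common shape. It relies on the fact that a stretched axis is represented with a stride of 0 (the same memory is reused along it), so the zero strides should mirror steps 1 to 5 without revealing any values.

```python
import numpy as np

a = np.arange(42).reshape(7, 1, 6, 1)
b = np.arange(1680).reshape(8, 7, 6, 1, 5)

a_view, b_view = np.broadcast_arrays(a, b)
print(a_view.shape, b_view.shape)  # both (8, 7, 6, 6, 5)

# True where the operand was stretched (stride 0) along that axis
a_stretched = [s == 0 for s in a_view.strides]
b_stretched = [s == 0 for s in b_view.strides]
print("a stretched on axes:", a_stretched)  # [True, False, True, False, True]
print("b stretched on axes:", b_stretched)  # [False, False, False, True, False]
```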

That explains the resulting shape. Checking the values is left to you as homework!

