# 6. Vectorized operations

NumPy is optimized for scientific computing, so common mathematical operations can be applied to a whole array in a single expression.

In [1]:
import numpy as np

In [2]:
a = np.arange(12).reshape(3, 4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [3]:
a * 2

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22]])

We have already seen that we can use broadcasting to perform operations between arrays with compatible shapes.

In [4]:
a + np.array([[1], [2], [3]])

array([[ 1,  2,  3,  4],
       [ 6,  7,  8,  9],
       [11, 12, 13, 14]])
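Broadcasting works along the other axis as well. As a small sketch (the names `row` and `shifted` are only illustrative and not part of the workshop material), a 1-D array of length 4 is added to every row of `a`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A 1-D array of shape (4,) is broadcast across the 3 rows of `a`.
row = np.array([10, 20, 30, 40])
shifted = a + row
print(shifted)
```

The shapes `(3, 4)` and `(4,)` are compatible because their trailing dimensions match.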

NumPy also implements mathematical functions that are applied element by element.
The complete list is in the [NumPy documentation](https://numpy.org/doc/stable/reference/routines.math.html).

In [5]:
np.exp(a)

array([[1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01],
       [5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03],
       [2.98095799e+03, 8.10308393e+03, 2.20264658e+04, 5.98741417e+04]])
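These functions can be combined in a single expression, still without any explicit loop. The sketch below is only an illustration (the names `roots` and `combined` are made up for this example) and uses `np.sqrt`, `np.sin` and `np.cos`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# The function is applied to each element; the shape is preserved.
roots = np.sqrt(a)
print(roots.shape)

# Several functions combined in one expression: sin^2 + cos^2 is 1 everywhere.
combined = np.sin(a) ** 2 + np.cos(a) ** 2
print(np.allclose(combined, 1.0))
```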

## Aggregation
Some functions aggregate several elements: `sum`, `mean`, `max`, `std`...

By default, NumPy will aggregate on the whole array, but you can provide an axis.

In [6]:
# sum of all elements of the array
np.sum(a)

66

In [7]:
# sum per column
np.sum(a, axis=0)

array([12, 15, 18, 21])

In [8]:
# max per row
np.max(a, axis=1)

array([ 3,  7, 11])
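One way to remember the `axis` argument is that it names the dimension that disappears from the result. The following sketch, with illustrative variable names, checks the resulting shapes and also shows the `keepdims` option, which keeps the reduced axis with length 1 so that the result can be broadcast against the original array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

col_sums = np.sum(a, axis=0)   # axis 0 (rows) is collapsed: one value per column
row_sums = np.sum(a, axis=1)   # axis 1 (columns) is collapsed: one value per row
print(col_sums.shape, row_sums.shape)

# keepdims=True gives shape (3, 1) instead of (3,).
row_totals = np.sum(a, axis=1, keepdims=True)

# Divide each element by the total of its row.
fractions = a / row_totals
print(fractions.sum(axis=1))
```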

### Exercise
Load the array stored in `data/simple_array.csv` and compute, for each column, the minimum, the maximum, the mean and the standard deviation.

In [9]:
# uncomment and execute the following line if you want to load the solution
# %load ../solutions/exercise10.py

## Gotchas
When working with NumPy arrays, you will often handle Python lists as well, for example as indices.
Unlike arrays, Python lists do not support element-wise mathematical operations.

`+` is used to concatenate two Python lists, and `* n` will repeat a list `n` times.

In [10]:
np.array([1, 2, 3]) + np.array([4, 5, 6])

array([5, 7, 9])

In [11]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [12]:
np.ones(3) * 4

array([4., 4., 4.])

In [13]:
[1, 1, 1] * 4

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
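If you are not sure whether a value is a list or an array, converting it with `np.asarray` before doing arithmetic avoids the surprise. A minimal sketch, with illustrative variable names:

```python
import numpy as np

values = [1, 2, 3]                  # plain Python list

repeated = values * 2               # list repetition
doubled = np.asarray(values) * 2    # element-wise multiplication

# When one operand is already an array, the list is converted automatically.
mixed = np.array([1, 2, 3]) + [4, 5, 6]

print(repeated)
print(doubled)
print(mixed)
```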

## Performance
NumPy arrays are optimized to perform vectorized operations: operations that
are applied to all elements of an array at once, without using a `for` loop.

We are going to use the Jupyter `%%timeit` magic to estimate how much time an
operation takes. Note: in order to get a reliable estimate, `timeit` runs the
operation many times. Consequently, even if executing the cell takes a few
seconds, a single run of the operation may be much quicker.
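Outside Jupyter, a comparable measurement can be made with the standard-library `timeit` module. This is only a rough sketch: the loop and repeat counts (`n_loops`, 5 repeats) are arbitrary choices for this example, and the reported value will differ from what `%%timeit` prints.

```python
import timeit

import numpy as np

a = np.random.rand(10000)

# Run `a * a` n_loops times per measurement, and take 5 measurements.
n_loops = 1000
timings = timeit.repeat(lambda: a * a, number=n_loops, repeat=5)

# Each entry is the total time for n_loops calls; divide to get a per-call time.
best = min(timings) / n_loops
print(f"{best * 1e6:.2f} µs per loop (best of 5)")
```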

In [14]:
a = np.random.rand(10000)

In [15]:
%%timeit
a * a

1.62 μs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [16]:
import random
l = [random.random() for _ in range(10000)]

In [17]:
%%timeit
[e * e for e in l]

176 μs ± 365 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


The exact values depend on your machine, but NumPy is typically about two orders of magnitude faster than pure Python here (roughly 100 times in the run above), for an operation as simple as squaring the values.

Using a `for` loop instead of a list comprehension is even slower.

In [18]:
%%timeit
result = []
for e in l:
    result.append(e * e)
result

206 μs ± 352 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Because of this, you should always try to use vectorized operations when
possible. The rule of thumb is: if you're using a `for` loop to iterate over
all of your data, you may not be using NumPy to its full potential.
It may be acceptable if you're in a hurry and can't find a better solution, but
it's usually worth the effort to vectorize your operations.
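As an illustration of replacing a loop, the sketch below sets every value that is not above 0.5 to zero, first with a `for` loop and then with `np.where`. The data, the threshold and the variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(10000)

# Loop version: keep values above 0.5, replace the others with 0.
looped = []
for value in data:
    looped.append(value if value > 0.5 else 0.0)

# Vectorized version: np.where picks `data` or 0.0 element by element.
vectorized = np.where(data > 0.5, data, 0.0)

# Both versions produce the same values.
print(np.array_equal(np.array(looped), vectorized))
```

Both variants can be timed with `%%timeit` in the same way as the squaring example.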

Note: some libraries, such as Numba, can speed up loop-based Python code by
compiling it to machine code. We won't cover them in this workshop.

### Exercise

Perform a linear transformation on a series of values:
`y1 = a1 * x11 + a2 * x12 + ... + b1`;
`y2 = a1 * x21 + a2 * x22 + ... + b2`;
...

You can first write the operation in whatever way seems most natural to you, then try to vectorize it.
Time both variants.

In [19]:
X = np.random.randint(0, 100, (3, 4))
a = np.array([4, 2, 8, 5])
b = np.array([245, 398, 546])

In [20]:
# uncomment and execute the following line if you want to load the solution
# %load ../solutions/exercise11.py