# Vectorization and Array Operations with NumPy

### Written for the CBC Workshop (May 2024)

#### John Stachurski

This notebook contains a very quick introduction to NumPy.

We use the following imports:

In [1]:
import numpy as np
import matplotlib.pyplot as plt

## NumPy arrays

Let's review the basics of NumPy arrays.

### Creating arrays

Here are a few ways to create arrays:

In [2]:
a = np.array((10.0, 20.0))
a

array([10., 20.])

In [3]:
a = np.array((10, 20), dtype='float64')
a

array([10., 20.])

In [4]:
a = np.linspace(0, 10, 5)
a

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [5]:
a = np.ones(3)
a

array([1., 1., 1.])

In [6]:
a = np.zeros(3)

In [7]:
a = np.random.randn(4)
a

array([ 2.64176219, -0.62023134, -1.41202575,  0.96664629])

In [8]:
a = np.random.randn(2, 2)
a

array([[-1.27709828,  1.72774515],
       [-2.1484121 , -0.28084423]])

In [9]:
b = np.zeros_like(a)
b

array([[0., 0.],
       [0., 0.]])

### Reshaping

In [10]:
a = np.random.randn(2, 2)
a

array([[-2.39614363,  1.55054188],
       [-1.12911018, -0.67755033]])

In [11]:
a.shape

(2, 2)

In [12]:
np.reshape(a, (1, 4))

array([[-2.39614363,  1.55054188, -1.12911018, -0.67755033]])

In [13]:
np.reshape(a, (4, 1))

array([[-2.39614363],
       [ 1.55054188],
       [-1.12911018],
       [-0.67755033]])
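
As a small supplementary sketch, the cell below shows two facts about reshaping that are easy to miss: passing `-1` asks NumPy to infer one dimension, and reshaping a contiguous array returns a view on the same data rather than a copy. The names `c` and `d` are chosen here purely for illustration.

```python
import numpy as np

c = np.arange(6.0)             # six floats: 0., 1., ..., 5.
d = np.reshape(c, (2, -1))     # -1 lets NumPy infer the second dimension
print(d.shape)                 # (2, 3)

# c is contiguous, so the reshaped array shares its memory
print(np.shares_memory(c, d))  # True
```

Because `d` is a view here, writing to `d` would also change `c`.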

### Array operations

Standard arithmetic operators are pointwise:

In [14]:
a + b

array([[-2.39614363,  1.55054188],
       [-1.12911018, -0.67755033]])

In [15]:
a * b  # pointwise multiplication

array([[-0.,  0.],
       [-0., -0.]])

To do matrix multiplication we use `@`, as in

In [16]:
a @ b

array([[0., 0.],
       [0., 0.]])
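
Since `b` is all zeros above, both products look the same. A minimal sketch with hand-picked matrices makes the difference visible (the names `A` and `E` are illustrative only):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
E = np.eye(2)    # 2 x 2 identity matrix

print(A * E)     # pointwise product: only the diagonal of A survives
print(A @ E)     # matrix product: returns A unchanged
```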

There are various functions for acting on arrays, such as

In [17]:
np.mean(a)

-0.663065565795695

These operations have an equivalent OOP syntax, as in

In [18]:
a.mean()

-0.663065565795695
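
Reductions such as `mean` also accept an `axis` argument. The following sketch, using a small illustrative array `A`, shows the mean taken over all elements, down each column and along each row:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.mean())         # 3.5, mean over all elements
print(A.mean(axis=0))   # [2.5 3.5 4.5], one mean per column
print(A.mean(axis=1))   # [2. 5.], one mean per row
```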

### Broadcasting

When possible, arrays are "stretched" across missing dimensions to perform array operations.

For example,

In [19]:
a = np.zeros((3, 3))
a

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [20]:
b = np.array((1.0, 2.0, 3.0))
b = np.reshape(b, (1, 3))
b

array([[1., 2., 3.]])

In [21]:
a + b

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [22]:
b = np.reshape(b, (3, 1))
b

array([[1.],
       [2.],
       [3.]])

In [23]:
a + b

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.]])

For more on broadcasting see [this tutorial](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html).
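
Broadcasting can stretch both operands at once. In this sketch (the names `col`, `row` and `grid` are illustrative), a column of shape `(3, 1)` and a row of shape `(1, 2)` combine into a `(3, 2)` array:

```python
import numpy as np

col = np.array([[0.0], [10.0], [20.0]])   # shape (3, 1)
row = np.array([[1.0, 2.0]])              # shape (1, 2)

grid = col + row      # both operands are stretched to shape (3, 2)
print(grid.shape)     # (3, 2)
print(grid)           # entry (i, j) equals col[i, 0] + row[0, j]
```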

### Ufuncs

Many NumPy functions can act on either scalars or arrays.

When they act on arrays, they act pointwise.

These kinds of functions are called `universal functions` or `ufuncs`.

In [24]:
np.cos(1.0)

0.5403023058681398

In [25]:
np.cos(np.pi)

-1.0

In [26]:
a = np.random.randn(3, 3)

In [27]:
np.cos(a)

array([[0.97808486, 0.99387898, 0.9895217 ],
       [0.17583142, 0.30397161, 0.9453401 ],
       [0.99903351, 0.94482136, 0.91977907]])

Some user-defined functions will be ufuncs in this sense, such as

In [28]:
def f(x):
    return np.cos(np.sin(x))

In [29]:
f(a)

array([[0.97840318, 0.99390392, 0.9895947 ],
       [0.55334608, 0.57950016, 0.94730339],
       [0.99903414, 0.94682182, 0.92397996]])

But some are not:

In [30]:
def f(x):
    if x < 0:
        return np.cos(x)
    else:
        return np.sin(x)

In [31]:
f(a)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

If we want to turn this into a vectorized function we can use `np.vectorize`:

In [32]:
f_vec = np.vectorize(f)

Let's test it, and also time it.

In [33]:
a = np.linspace(0, 1, 10_000_000)
%time f_vec(a)

CPU times: user 4.98 s, sys: 143 ms, total: 5.12 s
Wall time: 5.12 s


array([0.00000000e+00, 1.00000010e-07, 2.00000020e-07, ...,
       8.41470877e-01, 8.41470931e-01, 8.41470985e-01])

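This is pretty slow, because `np.vectorize` is provided for convenience rather than performance: it is essentially a Python loop over the elements.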

Here's a version of `f` that uses NumPy functions to create a more efficient ufunc.

In [34]:
def f(x):
    return np.where(x < 0, np.cos(x), np.sin(x))

In [35]:
%time f(a)

CPU times: user 118 ms, sys: 20 ms, total: 138 ms
Wall time: 138 ms


array([0.00000000e+00, 1.00000010e-07, 2.00000020e-07, ...,
       8.41470877e-01, 8.41470931e-01, 8.41470985e-01])
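
As a sanity check, here is a minimal sketch confirming that the branching version and the `np.where` version agree, using a grid that includes negative values so both branches are exercised. The names `f_scalar` and `f_where` are introduced here only to keep the two definitions apart.

```python
import numpy as np

def f_scalar(x):
    # scalar version with an explicit branch
    if x < 0:
        return np.cos(x)
    else:
        return np.sin(x)

def f_where(x):
    # array version: both branches are computed on the whole array,
    # then np.where picks one value per element
    return np.where(x < 0, np.cos(x), np.sin(x))

x = np.linspace(-1, 1, 11)
f_vec = np.vectorize(f_scalar)
print(np.allclose(f_vec(x), f_where(x)))   # True
```

Note that `np.where` evaluates both `np.cos(x)` and `np.sin(x)` everywhere before selecting, which is harmless here but matters when one branch is expensive or undefined on part of the domain.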

### Mutability

NumPy arrays are mutable (can be altered in memory by any name bound to them).

In [36]:
a = np.array((10.0, 20.0))
a

array([10., 20.])

In [37]:
a[0] = 1

In [38]:
a

array([ 1., 20.])

In [39]:
a[:] = 42

In [40]:
a

array([42., 42.])

Note that any name bound to an array can be used to mutate it.

In [41]:
a

array([42., 42.])

In [42]:
b = a  # bind the name b to the same array object

In [43]:
id(a)

140414831797008

In [44]:
id(b)

140414831797008

In [45]:
b[0] = 1_000

In [46]:
b

array([1000.,   42.])

In [47]:
a

array([1000.,   42.])
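
If an independent array is wanted instead, one option is to copy explicitly. A minimal sketch:

```python
import numpy as np

a = np.array((10.0, 20.0))
b = a.copy()     # b is bound to a new array with its own data
b[0] = 1_000     # mutate the copy only

print(a)         # a still holds 10 and 20
print(b)         # first entry of b is now 1000
print(a is b)    # False
```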

## Vectorizing loops

### Accelerating slow loops

In scripting languages, native loops are slow:

In [48]:
n = 10_000_000
x_vec = np.linspace(0.1, 1.1, n)

Let's say we want to compute the sum of $\cos(2\pi / x)$ over $x$ in `x_vec`.

In [49]:
%%time
current_sum = 0.0
for x in x_vec:
    current_sum += np.cos(2 * np.pi / x)

CPU times: user 6.51 s, sys: 0 ns, total: 6.51 s
Wall time: 6.51 s


The loop is slow because Python, like most high-level languages, is dynamically typed.

This means that the type of a variable can freely change.

Moreover, the interpreter doesn't compile the whole program at once, so it doesn't know when types will change.

So the interpreter has to check the type of variables before any operation like addition, comparison, etc.

Hence there's a lot of fixed cost for each such operation.
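
A tiny sketch of what dynamic typing means in practice: the same name can be rebound to objects of different types, and the meaning of `+` depends on the types found at run time. The names `x`, `int_sum` and `str_sum` are illustrative.

```python
x = 1
print(type(x))     # <class 'int'>

x = 1.0            # the same name is now bound to a float
print(type(x))     # <class 'float'>

x = "one"          # ... and now to a string
print(type(x))     # <class 'str'>

# The interpreter must inspect operand types before it can apply +
int_sum = 1 + 2        # integer addition
str_sum = "a" + "b"    # string concatenation
print(int_sum, str_sum)
```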

The code runs much faster if we use **vectorized** expressions to avoid explicit loops.

In [50]:
%%time
np.sum(np.cos(2 * np.pi / x_vec))

CPU times: user 90.7 ms, sys: 15.8 ms, total: 106 ms
Wall time: 105 ms


1352487.12437786

Now high-level overheads are paid per *array* rather than per float or integer.
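
To confirm that the two approaches compute the same quantity, here is a minimal sketch on a much smaller grid (the name `x_small` and the grid size are chosen here only so that the loop finishes quickly):

```python
import numpy as np

x_small = np.linspace(0.1, 1.1, 1_000)

# explicit loop
loop_sum = 0.0
for x in x_small:
    loop_sum += np.cos(2 * np.pi / x)

# vectorized version
vec_sum = np.sum(np.cos(2 * np.pi / x_small))

# The results can differ in the last few digits because the
# summation order differs, so compare with a tolerance
print(np.isclose(loop_sum, vec_sum))   # True
```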

### Implicit multithreading


Recent versions of Anaconda ship NumPy builds linked against Intel MKL, which accelerates many NumPy operations.

Watch system resources when you run this code.  

(For example, install `htop` (Linux / Mac), `perfmon` (Windows) or another system load monitor and set it running in another window.)

In [51]:
n = 20
m = 1000
for i in range(n):
    X = np.random.randn(m, m)
    λ = np.linalg.eigvals(X)

You should see all your cores light up.  With MKL, many matrix operations are automatically parallelized.
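
If you are unsure which linear algebra library your installation uses, one way to check is `np.show_config()`, which prints build information including the BLAS / LAPACK libraries NumPy was linked against. The exact output format depends on your NumPy version and build, so no sample output is shown here.

```python
import numpy as np

# Print build details; look for the BLAS / LAPACK entries
# (for example MKL or OpenBLAS) in the output
np.show_config()
```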