# Accelerating numpy with NumExpr

If the bottleneck in your code is operations on large arrays, NumExpr may be a straightforward way to accelerate them.

We begin with some imports...

In [105]:
from pathlib import Path

import pandas as pd
import numpy as np
import numexpr as ne
import dask.array as da

from IPython.display import Markdown, display

Create a print function that lets us use markdown formatting

In [106]:
def printmd(string):
    display(Markdown(string))

# Long array computation

This time we look at an array computation with many components. This is a modified version of a real calculation for the speed of sound in water set out here: http://resource.npl.co.uk/acoustics/techguides/soundseawater/underlying-phys.html#up_mackenzie 

We start by defining a single constant and a function that creates a random temperature array. The pressure array is taken from `-z` when the functions are called.

In [114]:
A00 = 1.0
np.random.seed(3)
def generateArray(z:np.ndarray,N=10):
    T = np.random.standard_normal((len(z),N))
    return T

Now we create one function for the NumPy version and one for the NumExpr version.

In [115]:
def getATPNumpy(T,P):
    A00 = 1
    A_t_p = ((A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)) + (A00*(T**4)))    +
             (A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)) + (A00*(T**4)))*P + 
             (A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)))*(P**2) + 
             (A00 + (A00*T) + (A00*(T**2)))*(P**3))
    return A_t_p

def getATPNumExpr(T,P):
    A00 = 1
    A_t_p = ne.evaluate("((A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)) + (A00*(T**4))) +(A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)) + (A00*(T**4)))*P + (A00 + (A00*T) + (A00*(T**2)) + (A00*(T**3)))*(P**2) +(A00 + (A00*T) + (A00*(T**2)))*(P**3))")
    return A_t_p

In [116]:
T = generateArray(z=z,N=1000)

We test to see if the output arrays are equal

In [117]:
np.testing.assert_array_equal(getATPNumExpr(T=T,P=-z[:,np.newaxis]),getATPNumpy(T=T,P=-z[:,np.newaxis]))

AssertionError: 
Arrays are not equal

Mismatched elements: 343 / 199000 (0.172%)
Max absolute difference: 2.38418579e-07
Max relative difference: 3.91785661e-16
 x: array([[3.548403e+02, 6.630850e+01, 4.424440e+01, ..., 5.934262e+01,
        2.727742e+02, 2.860142e+01],
       [2.682958e+02, 9.579535e+02, 8.703086e+02, ..., 1.998691e+02,...
 y: array([[3.548403e+02, 6.630850e+01, 4.424440e+01, ..., 5.934262e+01,
        2.727742e+02, 2.860142e+01],
       [2.682958e+02, 9.579535e+02, 8.703086e+02, ..., 1.998691e+02,...

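No: the arrays are not exactly equal. The maximum absolute difference is about 2.4e-7 and the maximum relative difference is about 3.9e-16, which is on the order of float64 rounding error. You have to decide if this is important for your use case.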
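If differences at the level of floating-point rounding are acceptable, a tolerance-based comparison may be a better fit than exact equality. The sketch below uses NumPy only. As a stand-in for the NumPy/NumExpr pair it compares two algebraically equivalent ways of evaluating the same polynomial; this is an assumption made for illustration and not a description of how NumExpr evaluates the expression. The tolerance values are also illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((199, 1000))

# Two algebraically identical evaluations of 1 + T + T^2 + T^3 + T^4.
# The order of the floating-point operations differs, so the results
# can differ in the last few bits.
naive = 1.0 + T + T**2 + T**3 + T**4
horner = 1.0 + T * (1.0 + T * (1.0 + T * (1.0 + T)))

# Passes as long as |horner - naive| <= atol + rtol * |naive| everywhere.
# The tolerances here are illustrative; choose them for your use case.
np.testing.assert_allclose(horner, naive, rtol=1e-12, atol=1e-12)

# Largest relative difference between the two evaluations.
max_rel = np.max(np.abs(horner - naive) / np.abs(naive))
print(f"max relative difference: {max_rel:.3e}")
```

The same `np.testing.assert_allclose` call can be applied to the outputs of `getATPNumExpr` and `getATPNumpy` in place of `assert_array_equal`.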

Now we compare the timings

In [124]:
T = generateArray(z=z,N=100000)

In [125]:
printmd("**Numpy version**")
%timeit -n 1 -r 1 getATPNumpy(T=T,P=-z[:,np.newaxis])
printmd("**NumExpr version**")
%timeit -n 1 -r 1 getATPNumExpr(T=T,P=-z[:,np.newaxis])

**Numpy version**

4.64 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


**NumExpr version**

108 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


## Dask comparison
We can also compare these results with a Dask array.

In [120]:
daskT = da.from_array(T)
daskT
# np.testing.assert_array_equal(getATPNumpy(T=T,P=-z[:,np.newaxis]),getATPNumpy(T=daskT,P=-z[:,np.newaxis]).compute())

| | Array | Chunk |
|---|---|---|
| Bytes | 15.18 MiB | 15.18 MiB |
| Shape | (199, 10000) | (199, 10000) |
| Count | 1 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |


In [121]:
%timeit -n 1 -r 1 getATPNumpy(T=daskT,P=-z[:,np.newaxis]).compute()

360 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


# Activity
1. Vary the array size by a few orders of magnitude to see how it affects the relative performance.
2. Does it make a difference to relative performance if you use 32-bit floats instead of 64-bit floats?
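One possible way to set up these experiments outside of the `%timeit` magic is sketched below. It uses a shortened version of the expression, a hypothetical stand-in for `z` (`np.linspace(0.0, 1.0, 199)`), and helper names (`poly_numpy`, `poly_numexpr`, `best_time`) that are not part of the notebook. Taking the minimum of several repeats is one common way to reduce timing noise. NumExpr is imported optionally so that the sketch still runs with NumPy alone.

```python
import timeit

import numpy as np

try:
    import numexpr as ne
except ImportError:  # the sketch still runs, NumPy only, without NumExpr
    ne = None


def poly_numpy(T, P):
    A00 = 1
    return (A00 + A00 * T + A00 * T**2) + (A00 + A00 * T) * P


def poly_numexpr(T, P):
    A00 = 1  # picked up by ne.evaluate from the local scope, like T and P
    return ne.evaluate("(A00 + A00*T + A00*T**2) + (A00 + A00*T)*P")


def best_time(func, T, P, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    return min(timeit.repeat(lambda: func(T, P), number=1, repeat=repeat))


implementations = {"numpy": poly_numpy}
if ne is not None:
    implementations["numexpr"] = poly_numexpr

rng = np.random.default_rng(3)
z = np.linspace(0.0, 1.0, 199)  # hypothetical stand-in for the notebook's z

results = {}
for dtype in (np.float64, np.float32):
    for n in (100, 1_000, 10_000):
        T = rng.standard_normal((len(z), n)).astype(dtype)
        P = -z[:, np.newaxis].astype(dtype)
        for name, func in implementations.items():
            results[(name, np.dtype(dtype).name, n)] = best_time(func, T, P)

for (name, dtype_name, n), seconds in results.items():
    print(f"{name:8s} {dtype_name:8s} N={n:>6d}  {seconds * 1e3:8.3f} ms")
```

Larger values of `n` can be added to the inner loop to cover more orders of magnitude, at the cost of longer run times and more memory.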