# Explore numba for series
Anne Katrine Falk, 26FEB2021

[numba](https://numba.pydata.org/) is a just-in-time (JIT) compiler for Python: it translates a subset of Python and NumPy code into machine code.

This notebook contains exploratory tests of ways to make calculations on series benefit from numba.

In [None]:
import numba
from numba import jit
import numpy as np
import time
print(f'numba version: {numba.__version__}')
print(f'numpy version: {np.__version__}') 

In [None]:
# Verify if CUDA toolkit is found and print the version
!nvcc -V

# numpy functions

The numba [documentation](https://numba.readthedocs.io/en/stable/reference/numpysupported.html#calculation) lists the supported numpy functions.

A few of them are investigated below.

## np.mean

In [None]:
# create random array
x = np.random.rand(1000)

In [None]:
@jit(nopython=True)
def numba_mean(x):
    return x.mean()

Time NumPy's mean (`x.mean()`) called directly.

In [None]:
%%timeit
x.mean()

Time the numba-compiled version, numba_mean.

In [None]:
%%timeit
numba_mean(x)

If this is the first time numba_mean is called, you may get a message saying that the slowest run took XX times longer than the fastest. This is because numba compiles the function to machine code on its first call, which takes some time. After the first call, the compiled machine code is kept in memory, so subsequent calls of numba_mean skip the compilation.

Now time numba_mean AFTER the initial compilation to machine code to see its real speed. On my PC, this is a little more than three times faster than calling `x.mean()` directly.

In [None]:
%%timeit
numba_mean(x)

## np.argmin

In [None]:
# create random array
x = np.random.rand(1000000)

In [None]:
%%timeit -n 10 -r 5
x.argmin()

In [None]:
@jit(nopython=True)
def numba_argmin(x):
    return x.argmin()

In [None]:
%%timeit -n 10 -r 5
numba_argmin(x)

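What is going on? np.argmin is SLOWER when run through numba??? Note that numba_argmin is called for the first time inside this %%timeit cell, so, as described for numba_mean above, the measured time includes the compilation to machine code; with only 10 loops and 5 runs, that one-off cost can weigh heavily. Calling numba_argmin(x) once before timing excludes it.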

## np.diff

In [None]:
# create random array
x = np.random.rand(10000000)  # 10 million

In [None]:
%%timeit
np.diff(x)

In [None]:
@jit(nopython=True)
def numba_diff(x):
    return np.diff(x)

In [None]:
%%timeit
numba_diff(x)

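np.diff is also SLOWER with numba??? (A factor 2 slower on my PC.) As with numba_argmin, this timing includes the first call of numba_diff, and therefore its compilation.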

# Loops

In [None]:
def sum_all_up_to(a):
    """Sum of the integers 0, 1, ..., a - 1, computed in a plain Python loop."""
    x = 0
    for i in range(a):
        x = x + i
    return x

In [None]:
@jit(nopython=True)
def numba_sum_all_up_to(a):
    """Same loop as sum_all_up_to, compiled by numba."""
    x = 0
    for i in range(a):
        x = x + i
    return x

In [None]:
size = int(1e6)

In [None]:
%%timeit -r 5 -n 100
sum_all_up_to(size)

In [None]:
%%timeit -r 5 -n 100
numba_sum_all_up_to(size)

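Here we see a speedup of orders of magnitude (200-400 times on my PC). For a loop body this simple, the compiler can optimise the whole loop very aggressively, so loops doing more realistic work will probably see a smaller speedup.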

# Mixing loops with numpy functions

## Compare run time of numba loop to ordinary loop

In [None]:
@jit(nopython=True)
def numba_loop_over_diff(x, times):
    """Execute np.diff on x `times` times, just to include a loop."""
    for i in range(times):
        np.diff(x)

In [None]:
def loop_over_diff(x, times):
    """Same loop as numba_loop_over_diff, without numba."""
    for i in range(times):
        np.diff(x)

In [None]:
# create random array
x = np.random.rand(1000000)

In [None]:
%%timeit
numba_loop_over_diff(x, 100)

In [None]:
%%timeit
loop_over_diff(x, 100)

The numba version is SLOWER than the plain Python loop???

## All functions called by a @jit-decorated function must be @jit-decorated themselves

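This won't work, because diff_wrapper_1 is not decorated with @jit: in nopython mode, numba cannot compile a call to a plain Python function, so calling numba_loop_over_diff_1 fails with a TypingError.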

In [None]:
def diff_wrapper_1(x):
    return np.diff(x)

In [None]:
@jit(nopython=True)
def numba_loop_over_diff_1(x, times):
    for i in range(times):
        diff_wrapper_1(x)

In [None]:
numba_loop_over_diff_1(x, 100)