## Beyond numpy
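Some calculations cannot be performed efficiently with numpy:
* numpy needs a lot of memory (vectorized expressions allocate full-size temporary arrays)
* the operation is not implemented as a vectorized numpy function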

Examples:
* Calculation of $\pi$ (with a very, very slowly converging formula!)
$$ \frac\pi4 = \sum_{i=0}^{\infty} \frac{(-1)^i}{2i+1} = 1 - \frac13 + \frac15 - \frac17 + \ldots $$

* Recurrences similar to cumsum, where each term depends on the previous one:
$$ y_n = f(y_{n-1}, x_n) $$
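A minimal sketch of such a recurrence written as a plain Python loop, assuming `f` is any Python callable taking `(y_prev, x_n)`; the helper name `recurrence` and the starting value `y0` are hypothetical. With `f = operator.add` it reproduces `np.cumsum`, but numpy offers no ready-made function for a general `f`, so every step runs in the interpreter.

```python
import operator
import numpy as np

def recurrence(f, x, y0=0.0):
    """Compute y_n = f(y_{n-1}, x_n) for every n with an explicit loop."""
    y = np.empty(len(x))
    prev = y0
    for n, xn in enumerate(x):
        prev = f(prev, xn)   # each step needs the previous result
        y[n] = prev
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
print(recurrence(operator.add, x))   # same values as np.cumsum(x)

# A non-linear f: running sum clipped at zero
print(recurrence(lambda y, xn: max(0.0, y + xn), np.array([1.0, -3.0, 2.0])))
```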

```python
# Implementation with numpy
import numpy as np

def calculate_pi_numpy(N):
    k = np.arange(N)
    # +1 for even k, -1 for odd k (cheaper than computing (-1)**k)
    sign = 1 - 2*(k % 2)
    return 4*np.sum(sign/(2*k + 1))

%timeit calculate_pi_numpy(1000000)
```

20.7 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
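The vectorized version above allocates several temporary arrays of length N (`k`, `k % 2`, `sign`, `2*k + 1` and the quotient), each about 8 MB for N = 10**6 with 64-bit elements. One way to bound that memory, sketched here under the assumption that a fixed block size is acceptable, is to sum the series block by block; `calculate_pi_numpy_chunked` and its `chunk` parameter are hypothetical names.

```python
import numpy as np

def calculate_pi_numpy_chunked(N, chunk=100_000):
    """Leibniz sum computed block by block: temporaries never exceed `chunk` elements."""
    total = 0.0
    for start in range(0, N, chunk):
        k = np.arange(start, min(start + chunk, N))
        sign = 1 - 2 * (k % 2)          # +1 for even k, -1 for odd k
        total += np.sum(sign / (2 * k + 1))
    return 4 * total

print(calculate_pi_numpy_chunked(10**6))
```

The price is a Python-level loop over the blocks, which stays cheap as long as each block is large.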


```python
# Implementation in pure python
def calculate_pi_python(N):
    res = 0
    sign = 1
    for i in range(N):
        res += sign/(2*i + 1)
        sign = -sign          # alternate +1, -1, +1, ...
    return 4*res

%timeit calculate_pi_python(1000000)
```

115 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## ctypes
* Interface between Python and shared libraries (.dll, .so)
* Accelerate your code (this method is not recommended)
* Reuse existing code!
* Use closed-source libraries

No magic: you have to know C and deal with pointers, memory allocation, ...
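The out-parameter pattern used below with `calc_pi` (a C variable allocated from Python and passed by address with `ctypes.byref`) can be tried without compiling anything, using `frexp` from the C math library, whose prototype `double frexp(double x, int *exp)` is standard C. This is a sketch assuming a Linux or macOS system where `ctypes.util.find_library("m")` locates libm; the `libm.so.6` fallback assumes glibc. Declaring `argtypes` and `restype` lets ctypes convert values correctly: without `restype`, ctypes assumes the function returns a C `int`.

```python
import ctypes
import ctypes.util

# Load the C math library (assumption: Linux/macOS; "libm.so.6" is the glibc name)
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C prototype: double frexp(double x, int *exp);
libm.frexp.argtypes = [ctypes.c_double, ctypes.POINTER(ctypes.c_int)]
libm.frexp.restype = ctypes.c_double

def frexp(x):
    exponent = ctypes.c_int(0)                          # C int owned by Python
    mantissa = libm.frexp(x, ctypes.byref(exponent))    # pass its address (int *)
    return mantissa, exponent.value

print(frexp(8.0))   # (0.5, 4) since 8 = 0.5 * 2**4
```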

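```python
import ctypes

# Load the shared library (libpi.so must have been compiled beforehand)
lib = ctypes.cdll.LoadLibrary('./libpi.so')

# Raw C function; assumed prototype: void calc_pi(int N, double *out)
_calc_pi = lib.calc_pi
_calc_pi.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_double)]
_calc_pi.restype = None

# Wrapper to be Python friendly
def calc_pi_ctypes(N):
    out = ctypes.c_double(0)            # C double owned by Python
    _calc_pi(N, ctypes.byref(out))      # pass its address (double *)
    return out.value*4

calc_pi_ctypes(10**6)
```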

3.1415916535897743
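The value above already differs from $\pi$ in the sixth decimal place. Because the Leibniz series alternates with decreasing terms, the error after N terms is roughly 1/N in magnitude, so every extra correct digit costs ten times more terms. A short pure-Python check (the function name `leibniz_pi` is hypothetical) prints the error and N times the error, which should stay close to 1 for even N.

```python
import math

def leibniz_pi(N):
    """4 times the partial sum of the first N Leibniz terms (pure Python)."""
    return 4 * sum((-1) ** i / (2 * i + 1) for i in range(N))

for N in (10, 100, 1000, 10_000):
    err = math.pi - leibniz_pi(N)
    print(f"N={N:>6}  error={err:.3e}  N*error={N * err:.4f}")
```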

```python
%timeit calc_pi_ctypes(10**6)
```

4.95 ms ± 50.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Numba
Compile your Python code to machine code (just-in-time, via LLVM) for free

```python
import numba

def calculate_pi_python(N):
    res = 0
    sign = 1
    for i in range(N):
        res += sign/(2*i + 1)
        sign = -sign
    return 4*res

# Eager compilation with an explicit signature: returns float64, takes one int32
calculate_pi_numba = numba.jit(numba.float64(numba.int32), nopython=True)(calculate_pi_python)

%timeit calculate_pi_numba(1000000)
```

4.8 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
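Numba also handles the cumsum-like recurrences from the beginning of this section. This is a minimal sketch assuming numba is installed; otherwise the fallback decorator runs the same code as plain, slow Python. `deadband` is a hypothetical filter that keeps its previous output until the input moves by more than `threshold`, a branchy recurrence that does not map onto `np.cumsum`. Unlike the cell above, `@njit` without a signature compiles lazily on the first call, for the argument types it receives.

```python
import numpy as np

try:
    from numba import njit
except ImportError:            # numba not available: keep the function as plain Python
    def njit(func):
        return func

@njit
def deadband(x, threshold):
    """y_n = x_n if |x_n - y_{n-1}| > threshold, else y_{n-1}."""
    y = np.empty_like(x)
    if x.shape[0] == 0:
        return y
    prev = x[0]
    for n in range(x.shape[0]):
        if abs(x[n] - prev) > threshold:
            prev = x[n]        # jump to the new input value
        y[n] = prev
    return y

print(deadband(np.array([0.0, 0.3, 1.5, 1.2, 3.0]), 1.0))
```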

