## Beyond numpy
Some calculations cannot be performed efficiently with numpy:
* numpy can need a lot of memory (every intermediate result is a full temporary array)
* the operation is not implemented in numpy

Examples:
* Calculation of $\pi$ (with a very, very slow formula!)
$$ \frac\pi4 = \sum_i \frac{(-1)^i}{2i+1} = 1 - \frac13 + \frac 15 - \frac17 + \ldots $$

* Recursive operations similar to cumsum:
$$ y_n = f(y_{n-1}, x_n) $$
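In pure Python such a recurrence is a plain loop. The sketch below uses a hypothetical helper name, `scan`, that is not from the original. With $f(y, x) = y + x$ it reproduces `np.cumsum`; for an arbitrary Python $f$ numpy offers no fast vectorized equivalent (only binary ufuncs have `ufunc.accumulate`), so the loop runs at interpreter speed.

```python
import numpy as np

def scan(f, x, y0):
    """Return [y_1, ..., y_n] with y_n = f(y_{n-1}, x_n), starting from y_0 = y0."""
    out = []
    y = y0
    for xn in x:
        y = f(y, xn)
        out.append(y)
    return out

x = [1, 2, 3, 4]

# f(y, x) = y + x reproduces cumsum
print(scan(lambda y, xn: y + xn, x, 0))   # [1, 3, 6, 10]
print(np.cumsum(x).tolist())              # [1, 3, 6, 10]

# A non-linear f with no numpy equivalent: running sum clipped to [0, 5]
print(scan(lambda y, xn: min(max(y + xn, 0), 5), [3, 4, -10, 2], 0))  # [3, 5, 0, 2]
```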

In [5]:
# Implementation with numpy

import numpy as np

N = 10000000

def calc_pi_numpy(N):
    i = np.arange(N)
    return np.sum((1 - 2*(i%2))/(2*i + 1))

%timeit calc_pi_numpy(N)


207 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [7]:
calc_pi_numpy(N)*4

3.1415925535897977
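The value above differs from $\pi$ by about $10^{-7}$, i.e. roughly $1/N$: this is why the series is so slow. The numpy version also allocates several temporary arrays of N elements (`i`, `i%2`, `2*i + 1`, ...), each holding 8-byte values on a typical 64-bit build, so about 80 MB each for $N = 10^7$. A minimal sketch, assuming it is acceptable to sum the series block by block, keeps peak memory proportional to the block size instead of N; `calc_pi_numpy_chunked` and `chunk` are hypothetical names, not from the original.

```python
import math
import numpy as np

def calc_pi_numpy_chunked(N, chunk=10**6):
    """Sum (-1)^i / (2i+1) for i < N, one block of at most `chunk` terms at a time."""
    total = 0.0
    for start in range(0, N, chunk):
        # Only this block's temporaries exist at any given time
        i = np.arange(start, min(start + chunk, N))
        total += np.sum((1 - 2*(i % 2)) / (2*i + 1))
    return total

N = 10**6
approx = 4 * calc_pi_numpy_chunked(N, chunk=10**5)
print(approx, abs(approx - math.pi))  # error close to 1/N = 1e-6
```

The trade-off is a Python-level loop over the blocks, which is negligible as long as each block is large.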

## ctypes
* Interface between Python and shared libraries (.dll, .so)
* Accelerate your code (this method is not recommended)
* Use existing code!
* Use closed-source libraries

No magic: you have to know C and deal with pointers, memory allocation, ...
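To use existing code, the same mechanism loads any shared library, not only one you compiled yourself. A minimal sketch, assuming a POSIX system where the C math library can be located (`libm.so.6` is the usual glibc name on Linux and is only an assumed fallback here): declaring `argtypes` and `restype` is the part you must get right by hand, since ctypes cannot read C headers and by default assumes the function returns an `int`.

```python
import ctypes
import ctypes.util

# Locate the C math library; "libm.so.6" is an assumed fallback for glibc Linux
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C prototype: double cos(double)
c_cos = libm.cos
c_cos.argtypes = [ctypes.c_double]
c_cos.restype = ctypes.c_double

print(c_cos(0.0))  # 1.0
```

Without the `restype` line, ctypes would interpret the returned `double` as an `int` and print a meaningless number.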

In [8]:
%%writefile pi.c

#include <stdio.h>
#include <stdlib.h>

// Partial sum of the Leibniz series; the result is written to *out
void calc_pi(int N, double * out){
    int i;
    double sgn = 1;
    *out = 0;
    for(i=0; i<N; i++){
        *out += sgn/(2*i+1);
        sgn = -sgn;
    }
}

// gcc -shared -o libpi.so -fPIC pi.c  -Wno-pointer-to-int-cast

Writing pi.c


In [9]:
!gcc -shared -o libpi.so -fPIC pi.c  -Wno-pointer-to-int-cast

In [10]:
import ctypes
lib = ctypes.cdll.LoadLibrary('./libpi.so')

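# Raw function: declare the C argument types (int, double *) and ignore the return value
_calc_pi = lib.calc_pi
_calc_pi.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_double)]
_calc_pi.restype = None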

# Wrapper to be python friendly
def calc_pi_ctypes(N):
    out = ctypes.c_double(0)
    _calc_pi(N, ctypes.byref(out))
    return out.value*4

calc_pi_ctypes(10**6)

3.1415916535897743
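Dealing with pointers also covers numpy arrays: `ndarray.ctypes.data_as` exposes the array buffer as a C pointer, which is how you would hand a whole array to a C function instead of a single `c_double`. A minimal sketch without any C code: writing through the pointer modifies the numpy array in place, exactly as a C function receiving a `double *` would. It assumes the array is contiguous and has the dtype the C side expects (`float64` for `double`).

```python
import ctypes
import numpy as np

# A C-contiguous float64 array, matching a C `double *`
arr = np.zeros(4, dtype=np.float64)
ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# Write through the pointer, as a C function would do
for k in range(4):
    ptr[k] = 2.0 * k

print(arr.tolist())  # [0.0, 2.0, 4.0, 6.0]
```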

In [11]:
%timeit calc_pi_ctypes(N)

23.9 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


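## Numba
Compile your Python code (just-in-time, with LLVM) for free:
* `numba.jit`: compile a Python function to machine code
* `numba.vectorize`: turn a scalar function into a numpy ufunc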

In [13]:
# Implementation in pure Python
def calc_pi(N):
    res = 0
    sgn = 1
    for i in range(N):
        res = res + sgn/(2*i + 1)
        sgn = -sgn
    return 4*res

N = 10000000


%timeit calc_pi(N)

76.9 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [14]:
import numba

# Eager compilation with an explicit signature float64(int32); nogil=True releases the GIL
calc_pi_numba = numba.jit(numba.float64(numba.int32), nogil=True)(calc_pi)


In [17]:
N = 10000000

%timeit calc_pi_numba(N)

10.6 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
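Numba also addresses the second motivating example, the cumsum-like recurrence $y_n = f(y_{n-1}, x_n)$, which numpy cannot vectorize for a general $f$. A minimal sketch, assuming an exponential moving average $y_n = a\,y_{n-1} + (1-a)\,x_n$ as $f$ (an illustrative choice, not from the original; `ema` is a hypothetical name) and the decorator form `numba.njit`, which compiles lazily at the first call. The `try`/`except` fallback only keeps the sketch runnable where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if numba is missing
    def njit(func):
        return func

@njit
def ema(x, a):
    """Exponential moving average: y[0] = x[0], y[n] = a*y[n-1] + (1-a)*x[n]."""
    y = np.empty_like(x)
    if x.shape[0] == 0:
        return y
    y[0] = x[0]
    # Each step depends on the previous one: a loop numba compiles to machine code
    for n in range(1, x.shape[0]):
        y[n] = a * y[n - 1] + (1 - a) * x[n]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
print(ema(x, 0.5).tolist())  # [1.0, 1.5, 2.25, 3.125]
```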


In [21]:
def my_abs_python(x):
    if x > 0:
        return x
    else:
        return -x

# No signature given: the ufunc is compiled for each new input type at the first call
my_abs_numba = numba.vectorize(my_abs_python)

my_abs_numba(np.random.rand(10)-.5)

array([0.37180335, 0.11937428, 0.2860404 , 0.16565811, 0.15553356,
       0.06154094, 0.1753101 , 0.41395646, 0.2791716 , 0.12612947])

In [22]:
data = np.random.rand(10**6)-.5

%timeit my_abs_numba(data)

668 µs ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [23]:
%timeit np.abs(data)

693 µs ± 58.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
