## Beyond numpy
Some calculations cannot be performed efficiently with numpy:
* numpy needs a lot of memory (temporary arrays)
* The operation is not implemented in numpy

Examples:
* Calculation of $\pi$ (with a very, very slowly converging formula!)
$$ \frac\pi4 = \sum_{i=0}^{\infty} \frac{(-1)^i}{2i+1} = 1 - \frac13 + \frac15 - \frac17 + \ldots $$

* Operations similar to `cumsum`, where each output depends on the previous one
$$ y_n = f(y_{n-1}, x_n) $$
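
A minimal sketch of such a recurrence, written as an explicit Python loop. The helper name `scan` and the two example functions are illustrative assumptions, not part of numpy. With $f(y, x) = y + x$ the loop reproduces `np.cumsum`; for a general $f$ there is no ready-made numpy routine, so the loop runs at Python speed.

```python
import numpy as np

def scan(f, x, y0):
    """Apply y_n = f(y_{n-1}, x_n) along x, starting from y0 (hypothetical helper)."""
    y = np.empty(len(x))
    prev = y0
    for n, xn in enumerate(x):
        prev = f(prev, xn)   # each step needs the previous result
        y[n] = prev
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])

# f(y, x) = y + x is exactly a cumulative sum
cumsum_like = scan(lambda y, xn: y + xn, x, 0.0)

# f(y, x) = 0.5*y + 0.5*x (a simple smoothing filter) has no direct numpy equivalent
smoothed = scan(lambda y, xn: 0.5 * y + 0.5 * xn, x, 0.0)
```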

In [None]:
# Implementation in pure python
# numpy
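
One possible way to fill in the two implementations announced above, as a sketch. The name `calc_pi_numpy` is an assumption chosen for this example. The numpy version is short, but it allocates several arrays of length `N`, which illustrates the memory cost mentioned at the start of this section.

```python
import numpy as np

def calc_pi(N):
    """Pure python: sum the first N terms of the series, one by one."""
    total = 0.0
    sgn = 1.0
    for i in range(N):
        total += sgn / (2 * i + 1)
        sgn = -sgn
    return 4 * total

def calc_pi_numpy(N):
    """numpy: same sum, but every intermediate result is an array of length N."""
    i = np.arange(N)
    sgn = 1 - 2 * (i % 2)          # +1, -1, +1, -1, ...
    return 4 * np.sum(sgn / (2 * i + 1))
```

After `N` terms the error is roughly `1/N`, so one million terms give only about six correct digits.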

## ctypes
* Interface between python and shared libraries (dll, so)
* Can accelerate your code (but this is not the recommended way to do it)
* Use existing code!
* Use closed-source libraries

No magic: you have to know C and deal with pointers, memory allocation, ...
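
To give an idea of what dealing with pointers and memory looks like on the python side, here is a small sketch that uses only the standard `ctypes` module and no external library. The variable names are illustrative.

```python
import ctypes

# Allocate a C array of 4 doubles (equivalent to `double buf[4]` in C)
buf = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# A `double *` pointing to the first element, as a C function would receive it
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# Writing through the pointer modifies the underlying buffer
ptr[2] = 30.0

# A single double passed "by reference", like a `double *out` argument in C
out = ctypes.c_double(0.0)
out_ptr = ctypes.pointer(out)
out_ptr[0] = 3.5

values = list(buf)
```

Nothing checks the bounds here: `ptr[10] = 0.0` would write outside the array, exactly as in C.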

In [None]:
%%writefile pi.c

#include <stdio.h>
#include <stdlib.h>

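/* Sum the first N terms of the series for pi/4 and store the result in *out. */
int calc_pi(int N, double * out){
    int i;
    double sgn = 1;
    *out = 0;
    for(i=0; i<N; i++){
        *out += sgn/(2*i+1);
        sgn = -sgn;
    }
    return 0; /* the function is declared int, so it must return a value */
}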

// gcc -shared -o libpi.so -fPIC pi.c  -Wno-pointer-to-int-cast

In [None]:
!gcc -shared -o libpi.so -fPIC pi.c  -Wno-pointer-to-int-cast

In [None]:
import ctypes
lib = ctypes.cdll.LoadLibrary('./libpi.so')

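# Raw function: declare the C signature so that ctypes converts the arguments correctly
_calc_pi = lib.calc_pi
_calc_pi.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_double)]
_calc_pi.restype = ctypes.c_int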

# Wrapper to be python friendly
def calc_pi_ctypes(N):
    out = ctypes.c_double(0)
    _calc_pi(N, ctypes.byref(out))
    return out.value*4

calc_pi_ctypes(10**6)

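## Numba
Compile your python code (almost) for free
* `numba.jit` : compile a python function just in time
* `numba.vectorize` : create a numpy ufunc from a function that works on scalars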
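
A hedged sketch of `numba.vectorize` applied to the series for $\pi$. The names `leibniz_term` and `calc_pi_vectorize` are assumptions made for this example, and the `ImportError` fallback to `np.vectorize` is only there so that the cell still runs (slowly) when numba is not installed.

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Assumption: without numba, fall back to the (slow) np.vectorize
    def vectorize(*args, **kwargs):
        return np.vectorize

# The function is written for one scalar; the decorator turns it into a ufunc
@vectorize(["float64(int64)"])
def leibniz_term(i):
    if i % 2 == 0:
        return 1.0 / (2 * i + 1)
    return -1.0 / (2 * i + 1)

def calc_pi_vectorize(N):
    # The ufunc is applied element by element to the whole array
    return 4 * leibniz_term(np.arange(N, dtype=np.int64)).sum()
```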

In [None]:
import numba

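# calc_pi is the pure python implementation of the first example
calc_pi_numba = numba.jit(numba.float64(numba.int32), nopython=True, nogil=True)(calc_pi)
calc_pi_numba(10**6)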
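
The same idea in decorator form, as a self-contained sketch. The function name `calc_pi_njit` is an assumption, and the `ImportError` fallback (a decorator that does nothing) is only there so that the cell also runs without numba. `numba.njit` is shorthand for `numba.jit(nopython=True)`.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: without numba, use a decorator that leaves the function unchanged
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit(nogil=True)
def calc_pi_njit(N):
    total = 0.0
    sgn = 1.0
    for i in range(N):
        total += sgn / (2 * i + 1)
        sgn = -sgn
    return 4 * total

# The first call triggers the compilation; later calls reuse the compiled code
approx = calc_pi_njit(10**6)
```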
