## Need for Speed

### Exercise 1:

In [14]:
import matplotlib.pyplot as plt
import numpy as np
from numba import jit
import time

In [2]:
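def compute_series(n):
    """Simulate n steps of a two-state (0/1) Markov chain, starting in state 1."""
    p, q = 0.1, 0.2  # p: P(0 -> 1), q: P(1 -> 0)
    x = np.empty(n, dtype=int)
    x[0] = 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p  # leave state 0 with probability p
        else:
            x[t] = U[t] > q  # stay in state 1 with probability 1 - q
    return x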

In [3]:
n = 100000
x = compute_series(n)
print(np.mean(x == 0))

0.66675
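
As a sanity check on the printed fraction: the chain leaves state 0 with probability p and leaves state 1 with probability q, so the long-run share of time spent in state 0 should be q / (p + q), roughly 0.667 for the values used here. The sketch below is not part of the original exercise; the helper names `stationary_prob_zero` and `iterate_prob_zero` are made up for illustration, and it uses plain Python rather than NumPy.

```python
def stationary_prob_zero(p, q):
    """Long-run probability of state 0 from the balance condition pi0*p = pi1*q."""
    return q / (p + q)


def iterate_prob_zero(p, q, steps=200):
    """Push the state distribution forward `steps` times, starting in state 1."""
    pi0, pi1 = 0.0, 1.0  # the notebook's chain starts with x[0] = 1
    for _ in range(steps):
        # One step of the chain: mass flows 0 -> 1 at rate p and 1 -> 0 at rate q.
        pi0, pi1 = pi0 * (1 - p) + pi1 * q, pi0 * p + pi1 * (1 - q)
    return pi0


p, q = 0.1, 0.2  # same values as in compute_series
print(stationary_prob_zero(p, q))
print(iterate_prob_zero(p, q))
```

Both numbers should agree closely with the simulated `np.mean(x == 0)`.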


In [4]:
%timeit compute_series(n)

10 loops, best of 3: 53.1 ms per loop


In [7]:
compute_series_numba = jit(compute_series)

In [15]:
%timeit compute_series_numba(n)

1000 loops, best of 3: 1.45 ms per loop


In [16]:
start_time = time.time()
compute_series_numba(n)
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.0019366741180419922 seconds ---
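
A single `time.time()` measurement like the one above is noisier than `%timeit`, and the first call to a Numba-jitted function also includes compilation time. A minimal sketch of a manual alternative, assuming a hypothetical helper `best_of` (not part of the notebook) that makes one warm-up call and then reports the fastest of several runs using `time.perf_counter`:

```python
import time


def best_of(func, *args, repeats=5):
    """Return the fastest wall-clock time (in seconds) over `repeats` calls."""
    func(*args)  # warm-up call: absorbs one-off costs such as JIT compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs without Numba installed; in the
# notebook this would be best_of(compute_series_numba, n).
print("--- %s seconds ---" % best_of(sum, range(100000)))
```

Taking the minimum rather than the mean mirrors the "best of 3" figure that `%timeit` reports in the cells above.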


In [9]:
%load_ext Cython

In [10]:
%%cython
import numpy as np
from numpy cimport int_t, float_t

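def compute_series_cy(int n):
    # Allocate with NumPy, then access through typed memoryviews for fast indexing
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    
    # double matches the precision of U (float_t is a C double)
    cdef double p = 0.1
    cdef double q = 0.2
    cdef int t
    cdef int_t current_x  # typed explicitly so the comparison stays in C
    
    x[0] = 1
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x)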

In [11]:
compute_series_cy(10)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0])

In [12]:
x = compute_series_cy(n)
print(np.mean(x == 0))

0.66625


In [13]:
%timeit compute_series_cy(n)

1000 loops, best of 3: 1.9 ms per loop


### Extra stuff:

Out of curiosity, I quickly wrote this up in C without spending much time debugging (my code doesn't actually give the right answer, and I didn't think much about memory management). I wasn't very focussed on writing efficient code, since I assumed the speed would come from using C and being aggressive with gcc optimization flags, so I was surprised to see that my code didn't beat Numba. As can be seen below, I had to build some of NumPy's functionality from scratch, which I'd bet accounts for a lot of the slowdown. I'd be interested in running it through callgrind, but I don't have valgrind installed on my local machine.