Compiling to Native Code
====

For programmer productivity, it often makes sense to code the majority of your application in a high-level language such as Python and only optimize code bottleneck identified by profiling. One way to speed up these bottleneck is to compile the code to machine executables, often via an intermediate C or C-like stage. There are two common approaches to compiling Python code - using a Just-In-Time (JIT) compiler and using Cython.

Using `numexpr`
----

One of the simplest approaches is to use [`numexpr`](https://github.com/pydata/numexpr) which takes a `numpy` expression and compiles a more efficient version of the `numpy` expression written as a string. If there is a simple expression that is taking too long, this is a good choice due to its simplicity. However, it is quite limited.

In [20]:
a = np.random.random(int(1e6))
b = np.random.random(int(1e6))
c = np.random.random(int(1e6))

In [26]:
%timeit -r3 -n3 b**2 - 4*a*c

3 loops, best of 3: 28.5 ms per loop


In [27]:
import numexpr as ne

In [28]:
%timeit -r3 -n3 ne.evaluate('b**2 - 4*a*c')

3 loops, best of 3: 6.63 ms per loop


Using `numba`
----

When it works, the JIT `numba` can speed up Python code tremendously with minimal effort. Lets look at some of the [benchmark examples](http://julialang.org) that Julia uses to show off how fast it is compared to other dynamic languages.

### (1) Fibonacci

#### Version used in Julia benchmarks

In [29]:
def fib(n):
    if n<2:
        return n
    return fib(n-1)+fib(n-2)

In [50]:
n = 30

In [51]:
fib(n)

832040

In [52]:
%timeit -r3 -n3 fib(n)

3 loops, best of 3: 691 ms per loop


#### One way to speed this up is to memoize

In [41]:
from functools import lru_cache

In [42]:
@lru_cache()
def fib1(n):
    if n<2:
        return n
    return fib(n-1)+fib(n-2)

In [53]:
%timeit -r3 -n3 fib1(n)

The slowest run took 664498.28 times longer than the fastest. This could mean that an intermediate result is being cached.
3 loops, best of 3: 339 ns per loop


#### Rewrite as non-recursive version

In [44]:
def fib2(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a+b, a
    return a

In [54]:
fib2(n)

832040

In [55]:
%timeit -r3 -n3 fib2(n)

3 loops, best of 3: 4.3 µs per loop


#### Using `numba`

In [5]:
import numba

In [48]:
@numba.njit()
def fib3(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a+b, a
    return a

In [56]:
%timeit -r3 -n3 fib3(n)

The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
3 loops, best of 3: 451 ns per loop


### (2) pisum

In [1]:
def pisum():
    sum = 0.0
    for j in range(1, 501):
        sum = 0.0
        for k in range(1, 10001):
            sum += 1.0/(k*k)
    return sum

In [2]:
pisum()

1.6448340718480652

In [3]:
%timeit -r3 -n3 pisum()

3 loops, best of 3: 1.29 s per loop


In [6]:
@numba.njit()
def pisum2():
    sum = 0.0
    for j in range(1, 501):
        sum = 0.0
        for k in range(1, 10001):
            sum += 1.0/(k*k)
    return sum

In [7]:
%timeit -r3 -n3 pisum2()

3 loops, best of 3: 35.8 ms per loop
