## Compiled vs Interpreted

You may have heard about the differences between "compiled programming languages" and "interpreted programming languages".

* A compiled language is run in a few steps:
  1. Programmer writes the code
  2. Compiler converts that code into machine code
  3. Computer runs machine code. Note that once the code is compiled, it can be run whenever one wants without the compilation step
* An interpreted language runs code differently:
  1. Programmer writes code
  2. Computer "runs" the code by
    * An "interpreter" reads the code line-by-line
    * For each line, the interpreter figures out what the inputs are and tries to convert it to machine code
    * Computer runs the machine code
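Here is a toy interpreter that makes the interpreted workflow concrete. It runs a hypothetical two-operator mini-language; both the language and the name `run_toy_program` are our own illustration, not a real tool. It reads one line, parses it, and executes it immediately. Nothing is saved between runs, so every run repeats all of the parsing work. Real interpreters are far more sophisticated, so treat this only as a sketch of the idea.

```python
def run_toy_program(source: str) -> dict:
    """Interpret a hypothetical mini-language one line at a time.

    Each line must look like `name = a + b` or `name = a * b`, where a and b
    are integer literals or names assigned on an earlier line.
    """
    env = {}  # variables created so far

    def value(token: str) -> int:
        # A token is either a previously assigned name or an integer literal
        return env[token] if token in env else int(token)

    for line in source.strip().splitlines():
        # This line is parsed only now, right before it is executed
        target, expr = (part.strip() for part in line.split("=", 1))
        left, op, right = expr.split()
        if op == "+":
            env[target] = value(left) + value(right)
        elif op == "*":
            env[target] = value(left) * value(right)
        else:
            raise ValueError(f"unknown operator {op!r} in line: {line!r}")
    return env


program = """
x = 3 + 4
y = x * 2
"""
print(run_toy_program(program))  # {'x': 7, 'y': 14}
```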

**Pros and cons of compiled**

* Once the compiler has run, the code is already machine code and runs very fast (as fast as possible given the code you wrote)
* For very large programs, compilation is an upfront cost that can take minutes or even hours
* Compiled programs can only be shared within similar hardware architecture and operating systems (though as long as there's a compiler for the hardware/OS, one could recompile the code)

**Pros and cons of interpreted**

* As long as there is an interpreter for the hardware/operating system, interpreted code can be easily shared
* Significantly slower than compiled code because of the back and forth to read the code line-by-line (which has to be redone each time the code is run!)
* Easier to interact with your code (and more importantly, your data!) because you can run one line at a time

## How different?

### Python

```python
import numpy as np


def calculate_pi_python(n=1_000_000):
    """
    Approximates pi by drawing two random numbers and
    determining whether the sum of their squares
    is less than one (which tells us whether the point
    lies inside the quarter of the unit circle in the
    upper-right quadrant). The fraction of draws inside
    that quarter circle approximates its area, which we
    can then multiply by 4 to get the area of the full
    circle (which is pi since r=1)
    """
    in_circ = 0

    # Iterate for many samples
    for i in range(n):
        # Draw random numbers
        x = np.random.random()
        y = np.random.random()

        if (x**2 + y**2) < 1:
            in_circ += 1

    return 4 * (in_circ / n)
```

```python
%%timeit

calculate_pi_python(1_000_000)
```

```text
695 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```


```python
calculate_pi_python(5_000_000)
```

```text
3.1418848
```
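As a back-of-the-envelope check on the printed estimate, assume the draws are independent and uniform on $[0, 1)$ (this check is our own addition). Each draw then lands inside the quarter circle with probability $p = \pi/4$. The estimator and its standard error are

$$
\hat{\pi} = \frac{4}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i^2 + y_i^2 < 1\},
\qquad
\operatorname{SE}(\hat{\pi}) = 4\sqrt{\frac{p(1-p)}{n}} \approx 4\sqrt{\frac{0.785 \times 0.215}{5\,000\,000}} \approx 0.0007.
$$

With $n = 5{,}000{,}000$, the printed value 3.1418848 differs from $\pi \approx 3.1415927$ by about 0.0003, which is well within one standard error. Because the error shrinks like $1/\sqrt{n}$, halving it requires four times as many draws. That is why the speed of this loop matters.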

### Fortran

```python
%load_ext fortranmagic
```

```fortran
%%fortran

subroutine calculate_pi_fortran(n, pi_approx)
    implicit none
    integer, intent(in) :: n
    real, intent(out) :: pi_approx

    integer :: count, i
    real :: x, y

    count = 0

    ! Seed the random number generator, then draw n points
    CALL RANDOM_SEED
    DO i = 1, n
        CALL RANDOM_NUMBER(x)
        CALL RANDOM_NUMBER(y)
        ! Count points that fall inside the unit circle
        IF (x*x + y*y < 1.0) count = count + 1
    END DO

    pi_approx = 4.0 * REAL(count)/REAL(n)
end subroutine calculate_pi_fortran
```

```python
%%timeit

calculate_pi_fortran(1_000_000)
```

```text
12.3 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


### Fortran >> Python

I ran both of these on my computer and found that it took:

* ~700 ms for Python to do this calculation
* ~12 ms for Fortran to do this calculation

The exact timings vary from run to run, but this is roughly a 56x speedup (695 ms / 12.3 ms). That is a substantial penalty for writing this loop in Python.

Given this information, why do we even bother with Python in the first place?

### So why Python if it is so slow?

While the speed advantages of Fortran seem substantial in the previous case, there are reasons that Python ultimately wins out.

* Great tools built by others -- These are often written in lower-level languages like C/Fortran
* Easier to develop code quickly -- The objective isn't `min(run_time)` but rather `min(develop_time + run_time)`
* There are other ways to get around the "slow parts" of Python

#### Tools built by others

Let's look at an example of generating 5,000,000 random numbers and then taking their mean.

**Python**

```python
def mean_of_random_numbers_python(n):
    return np.mean(np.random.rand(n))
```

```python
%%timeit

mean_of_random_numbers_python(5_000_000)
```

```text
29.5 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


```python
mean_of_random_numbers_python(500_000)
```

```text
0.49961605191514
```

**Fortran**

```fortran
%%fortran

subroutine mean_of_random_numbers_fortran(n, mean_x)
    implicit none
    integer, intent(in) :: n
    real, intent(out) :: mean_x

    integer :: i
    real :: x

    ! Accumulate the sum of n uniform draws
    mean_x = 0.0
    CALL RANDOM_SEED
    DO i = 1, n
        CALL RANDOM_NUMBER(x)
        mean_x = mean_x + x
    END DO

    ! Divide by n to turn the sum into a mean
    mean_x = mean_x / REAL(n)
end subroutine mean_of_random_numbers_fortran
```

```python
%%timeit

mean_of_random_numbers_fortran(5_000_000)
```

```text
30.8 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


```python
mean_of_random_numbers_fortran(5_000_000)
```

```text
0.5000318884849548
```

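In this comparison, Python (~29.5 ms) and Fortran (~30.8 ms) take about the same time, because `np.random.rand` and `np.mean` already run in compiled code written by others. The same trick applies to the pi calculation. Replacing the Python loop with vectorized NumPy operations cuts the time for 1,000,000 draws from ~700 ms to ~21 ms.

```python
%%timeit

4 * np.mean(
    np.sum(np.random.rand(1_000_000, 2)**2, axis=1)
    < 1
)
```

```text
21.4 ms ± 620 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```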
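The vectorized one-liner above can be wrapped in a reusable function. The sketch below is our own variant: it uses NumPy's `Generator` API (`np.random.default_rng`) with an optional seed so that results are reproducible. The name `calculate_pi_numpy` is hypothetical. As in the one-liner, the loop over draws runs inside NumPy's compiled routines rather than in the Python interpreter.

```python
import numpy as np


def calculate_pi_numpy(n=1_000_000, seed=None):
    """Vectorized Monte Carlo estimate of pi (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, 2))             # n rows of (x, y) draws in [0, 1)
    inside = (points**2).sum(axis=1) < 1.0  # boolean array: is the point in the quarter circle?
    return 4.0 * inside.mean()              # mean of booleans = fraction inside


print(calculate_pi_numpy(1_000_000, seed=0))
```

The design trade-off is memory. This version allocates an `(n, 2)` array of floats, whereas the loop versions use a constant amount of memory no matter how large `n` is.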


#### Easier to develop

As you may be able to tell from previous examples, it's significantly easier to develop Python code than Fortran code. This difference only becomes magnified as code complexity increases.

This is hard to convey without the context of a bigger project, so we refer to this [code repository](https://github.com/gabrielgggg/DeadlyDebtCrises/tree/master). To emphasize: the code for this paper is almost certainly correct, but we think it was more painful to write than equivalent Python code would have been.

#### Skip the slow part of Python

There are lots of ways that Python users can "skip the slow parts" of Python while keeping most of Python's benefits.

There are two main paths that are followed:

1. Write compiled code and call it from within Python.
2. Use something called "just-in-time" (JIT) compilation for your Python code

We will elaborate on these two paths for the remainder of this notebook.

## Compile pieces of your code

We don't want to spend too much time here because we think JIT compilation is a more promising and simpler path (as we will hopefully convince you in this notebook).

That said, it's worth noting that we have already been using a tool that connects a compiled language to Python throughout this lecture: `f2py`, the [Fortran to Python interface generator](https://numpy.org/doc/stable/f2py/). This package makes it straightforward to write Fortran code that can be called from within Python.

The typical approach if you'd like to use a compiled language to speed up code is:

* Write your code in Python to start
* Benchmark to see which parts of your code are slow
* Rewrite the slowest parts in a compiled language
* Call the compiled language from Python

```python
import cProfile
import pstats
```

```python
def repeated_calculation_1(n, m):
    avg_pi = 0
    for i in range(m):
        avg_pi += calculate_pi_python(n)

    return avg_pi / m
```

```python
with cProfile.Profile() as pr:
    repeated_calculation_1(500_000, 100)

    pr.print_stats()
```

```text
         100000110 function calls in 55.055 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   55.055   55.055 1578056900.py:1(repeated_calculation_1)
      100   25.973    0.260   55.055    0.551 2665805320.py:4(calculate_pi_python)
        1    0.000    0.000    0.000    0.000 cProfile.py:41(print_stats)
        1    0.000    0.000    0.000    0.000 cProfile.py:51(create_stats)
        1    0.000    0.000    0.000    0.000 pstats.py:108(__init__)
        1    0.000    0.000    0.000    0.000 pstats.py:118(init)
        1    0.000    0.000    0.000    0.000 pstats.py:137(load_stats)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.P
```

```python
def repeated_calculation_2(n, m):
    avg_pi = 0
    for i in range(m):
        avg_pi += calculate_pi_fortran(n)

    return avg_pi / m
```

```python
with cProfile.Profile() as pr:
    repeated_calculation_2(500_000, 100)

    pr.print_stats()
```

```text
         10 function calls in 0.615 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.615    0.615    0.615    0.615 1603496900.py:1(repeated_calculation_2)
        1    0.000    0.000    0.000    0.000 cProfile.py:41(print_stats)
        1    0.000    0.000    0.000    0.000 cProfile.py:51(create_stats)
        1    0.000    0.000    0.000    0.000 pstats.py:108(__init__)
        1    0.000    0.000    0.000    0.000 pstats.py:118(init)
        1    0.000    0.000    0.000    0.000 pstats.py:137(load_stats)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
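The profiling cells import `pstats` but never use it. They also call `pr.print_stats()` inside the `with` block, so each report includes rows for the profiler's own reporting (the `cProfile.py` and `pstats.py` entries). Below is a minimal alternative using only the standard library: print after the block exits and sort by cumulative time, so the slowest call chain is listed first. `slow_pi` and `repeated` are hypothetical stand-ins for `calculate_pi_python` and `repeated_calculation_1`. They use Python's `random` module so the snippet runs on its own.

```python
import cProfile
import io
import pstats
import random


def slow_pi(n):
    # Pure-Python Monte Carlo loop (hypothetical stand-in for calculate_pi_python)
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y < 1:
            hits += 1
    return 4 * hits / n


def repeated(n, m):
    # Average m independent estimates (stand-in for repeated_calculation_1)
    return sum(slow_pi(n) for _ in range(m)) / m


with cProfile.Profile() as pr:
    result = repeated(20_000, 5)

# Report only after the `with` block, so the reporting itself is not profiled
buffer = io.StringIO()
stats = pstats.Stats(pr, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)  # the ten entries with the largest cumulative time
report = buffer.getvalue()
print(report)
```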




## JIT compilation

JIT is a relatively modern development that aims to bridge some of the gaps between compiled and interpreted languages.

Rather than compiling the code ahead of time or interpreting it line-by-line, JIT compiles small chunks of the code right before it runs them.

For example, recall the function `calculate_pi_python` (which we wrote earlier) that approximates the value of pi using Monte Carlo methods. We might even want to run this function multiple times to average across the approximations. JIT works as follows:

1. Check the input types to the function
2. The first time it sees particular types of inputs to the function, it compiles the function assuming those types as inputs and stores this compiled code
3. The computer then runs the function using the compiled code -- If it has seen these inputs before, it can jump directly to this step.
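The three steps can be mimicked in plain Python. The decorator below is a toy model written for illustration: `toy_jit` is hypothetical, and no machine code is generated. It keys a cache on the tuple of argument types, does its one-time "compile" step the first time it sees new types, and reuses the cached entry afterwards. Numba's dispatcher follows a similar per-type caching pattern, except that its cached entries are actual compiled code.

```python
from functools import wraps


def toy_jit(func):
    """Toy model of JIT dispatch: one 'compilation' per tuple of input types."""
    cache = {}        # maps input types -> 'compiled' version of func
    compile_log = []  # records every time we had to 'compile'

    @wraps(func)
    def wrapper(*args):
        key = tuple(type(arg) for arg in args)  # step 1: check the input types
        if key not in cache:                    # step 2: new types, so 'compile' and store
            compile_log.append(key)
            cache[key] = func                   # a real JIT would store machine code here
        return cache[key](*args)                # step 3: run the stored version

    wrapper.compile_log = compile_log
    return wrapper


@toy_jit
def add(a, b):
    return a + b


add(1, 2)      # int, int -> 'compiles'
add(3, 4)      # int, int -> reuses the cached version
add(1.5, 2.5)  # float, float -> 'compiles' again
print(add.compile_log)  # [(<class 'int'>, <class 'int'>), (<class 'float'>, <class 'float'>)]
```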

### Our favorite JIT tools

* `Numba`: [Numba](https://numba.pydata.org/) is a package that adds JIT compilation capabilities to a subset of the Python programming language, with priority given to scientific computing tools such as `numpy`. The main drawback is that only certain packages work with Numba's JIT.
* `Julia`: [Julia](https://julialang.org/) is an exciting new language that is based entirely around JIT compilation. The fact that the language is built around JIT means that all packages interact nicely with one another while maintaining their JIT capabilities.

### What works within Numba?

* Many Python objects, including lists, tuples, dictionaries, integers, floats, and strings
* Python logic, including: `if.. elif.. else`, `while`, `for .. in`, `break`, `continue`
* NumPy arrays
* Many (but not all!) NumPy functions

For more information, read these sections of the documentation:

* [Supported Python features](https://numba.readthedocs.io/en/stable/reference/pysupported.html)
* [Supported NumPy features](https://numba.readthedocs.io/en/stable/reference/numpysupported.html)

### When to use Numba?

* Loops!!!
* Can facilitate parallelization
* GPU code generation
* Did we say loops yet?

```python
import numba
```

```python
calculate_pi_numba = numba.jit(calculate_pi_python, nopython=True)
```

```python
%%time

calculate_pi_numba(1_000_000)
```

```text
CPU times: user 524 ms, sys: 72.4 ms, total: 596 ms
Wall time: 646 ms

3.141524
```

```python
%%timeit

calculate_pi_numba(1_000_000)
```

```text
6.83 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


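Why was the second run faster?

Remember the order in which JIT works. The first time it sees a function called with a particular set of input types, it has to infer the type of every variable and then compile the function. The first run's ~646 ms wall time includes that compilation. Later calls with the same input types reuse the stored compiled code, which is why they take only ~7 ms.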

**Object mode vs nopython mode**

* Object mode: Allows Numba to call out to the Python interpreter when it sees something it doesn't recognize. The cost is that this is slow and forces Numba to give up certain optimizations.
* nopython mode: If Numba sees an object it doesn't recognize, it raises an error. This lets Numba make additional optimizations.

Numba's `jit` used to fall back to object mode when it could not compile a function in nopython mode. As of Numba 0.59, nopython mode is the default, because it was the main use case (and is how the developers recommend using Numba).