# Optimisation - Cython
## Martin Robinson
## Oct 2019

# Using lower-level languages

- Interpreted languages are fundamentally speed-limited when they only determine *type* at run-time.
- e.g. consider what happens to the types of the variables in the following function:
```python
def norm(arg_list, p):
    sum = 0               # sum is an int here
    for x in arg_list:    # type of x depends on input container
        sum += abs(x)**p  # type of rhs depends on both x and p, sum could *change* type here
    return sum**(1.0/p)   # return value is probably float due to 1.0
```
- How much memory should be allocated for `sum`? Does this memory need to be re-allocated during the loop? Are conversion routines between types required during the loop?
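The type changes described in the comments above can be observed directly. The following is a minimal sketch: `norm_with_type_trace` is a hypothetical helper (not part of the original slides) that performs the same arithmetic but records the accumulator's type after every step.

```python
def norm_with_type_trace(arg_list, p):
    """Same arithmetic as norm(), but also record the accumulator's type after each step."""
    total = 0                                  # starts life as an int
    seen_types = [type(total).__name__]
    for x in arg_list:
        total += abs(x) ** p                   # may rebind total to an object of a new type
        seen_types.append(type(total).__name__)
    return total ** (1.0 / p), seen_types


# Mixed int/float input: the accumulator switches type part-way through the loop.
result, types = norm_with_type_trace([3, 4.0], 2)
print(types)
print(result)
```

The interpreter has to discover each of these types while the loop is running, which is the work a statically typed language does once at compile time.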

- Compare to the equivalent C++ code:
```cpp
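#include <cmath>    // std::pow, std::abs
#include <cstddef>  // size_t
#include <vector>

float norm(const std::vector<float>& arg_list, float p) {
    float sum = 0.0f;
    for (size_t i = 0; i < arg_list.size(); ++i) {
        sum += std::pow(std::abs(arg_list[i]), p);  // index the element, not the whole vector
    }
    return std::pow(sum, 1.0f/p);
}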
```
- The compiler can pre-allocate the stack because the sizes of the local variables are known, and no type conversions are necessary
- The compiler can generate efficient machine code because the programmer has provided more information (i.e. types)
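The "known size" point can be illustrated from within Python itself. This is a rough sketch using only the standard library: `array.array('f')` stands in for `std::vector<float>`, and the reported sizes depend on the platform and Python build.

```python
import array
import sys

values = [1.0, 2.0, 3.0]

# A Python float is a full heap object (reference count, type pointer, value).
python_float_size = sys.getsizeof(values[0])

# array.array('f') stores raw C floats back to back, much like std::vector<float>.
c_floats = array.array('f', values)
c_float_size = c_floats.itemsize

print(f"Python float object: {python_float_size} bytes")
print(f"C float in array:    {c_float_size} bytes")
```

Because every element has the same fixed size, a compiler can compute storage requirements and element addresses ahead of time instead of inspecting objects at run-time.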
    

# "Compiling" Python code

- The major Python implementations (CPython, PyPy, IronPython) compile to *bytecode*, which is then either interpreted at run-time, or perhaps further compiled to native machine code
- Implementations that compile to native machine code usually implement something close to normal Python, but with restrictions or additions that alter the nature of the language. These include:
    - Cython (Python-to-C)
    - Nuitka (Python-to-C; earlier releases generated C++)
    - Numba (Python-to-LLVM IR)
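The bytecode mentioned above can be inspected in CPython with the standard `dis` module. This is a small sketch: the exact opcode names vary between CPython versions, so none are assumed here.

```python
import dis


def norm(arg_list, p):
    total = 0
    for x in arg_list:
        total += abs(x) ** p
    return total ** (1.0 / p)


# Human-readable listing of the bytecode CPython compiled the function to.
dis.dis(norm)

# The same instructions as data, e.g. for counting or filtering them.
opnames = [ins.opname for ins in dis.get_instructions(norm)]
print(len(opnames), "instructions")
```

The listing carries no type information: the instructions for `+=` and `**` are generic, and the interpreter works out which concrete operation to run each time they execute.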

# "Wrapping" C and C++ for use in Python

- The compilers on the previous slide implement an altered version of Python: yet another language to learn!
- If you're already comfortable with C, C++ or Fortran, why not use it directly and write a *wrapper* to call it from Python?
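As one possible illustration of wrapping, the standard library's `ctypes` can call an existing compiled C function without writing any new C. This is a sketch under assumptions: it is one wrapping approach among several (not necessarily the one these slides go on to use), it assumes the C math library can be located on the system, and `load_c_pow` is a hypothetical helper that falls back to `math.pow` when the library cannot be found.

```python
import ctypes
import ctypes.util
import math


def load_c_pow():
    """Return the C library's pow() wrapped for Python, or math.pow as a fallback."""
    path = ctypes.util.find_library("m")    # e.g. libm.so.6 on Linux
    if path is None:
        return math.pow                      # math library not found on this system
    try:
        libm = ctypes.CDLL(path)
        c_pow = libm.pow
    except (OSError, AttributeError):
        return math.pow                      # library could not be loaded
    # The wrapper must state the C signature: double pow(double, double)
    c_pow.argtypes = [ctypes.c_double, ctypes.c_double]
    c_pow.restype = ctypes.c_double
    return c_pow


c_pow = load_c_pow()
print(c_pow(2.0, 10.0))
```

The `argtypes` and `restype` lines are the wrapper: they supply the type information that Python itself does not carry.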

# Cython

# Your first Cython program

```python
def norm(a, p):
    s = 0
    x_max = a.shape[0]
    y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)

```
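Before any compilation, the function above runs as ordinary Python on anything that offers `.shape` and `a[i, j]` indexing. The slides presumably pass a NumPy array. The sketch below instead uses a standard library `memoryview` as a stand-in, purely to avoid a third-party dependency, and repeats the function so that it is self-contained.

```python
import array


def norm(a, p):
    s = 0
    x_max = a.shape[0]
    y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += abs(a[i, j])**p
    return s**(1.0/p)


# A 2 x 2 block of doubles, stored row by row.
flat = array.array('d', [3.0, -4.0, 0.0, 0.0])

# Reshaping a memoryview has to go through a byte format first.
a = memoryview(flat).cast('B').cast('d', shape=[2, 2])

print(a.shape)
print(norm(a, 2))
```

A small input with a known answer like this is useful later for checking that the typed, compiled versions still compute the same value.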

# Manual compilation

# Examining the generated code

# Adding types

# Memoryviews

# Tuning indexing further

```python
from libc.math cimport fabs  # C absolute value for doubles (libc.math provides fabs, not abs)
cimport cython

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
@cython.cdivision(True)     # Deactivate normal Python division checking
cdef double norm(double [:, :] a, int p):
    cdef double s = 0
    cdef Py_ssize_t i, j    # Typed loop indices
    cdef Py_ssize_t x_max = a.shape[0]
    cdef Py_ssize_t y_max = a.shape[1]
    for i in range(x_max):
        for j in range(y_max):
            s += fabs(a[i, j])**p
    return s**(1.0/p)
```
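The three decorators above each switch off a safety behaviour that plain Python always provides. The following sketch, in ordinary Python rather than Cython, shows the behaviours being given up. The variable names are illustrative only.

```python
row = [1.0, 2.0, 3.0]

# wraparound: a negative index counts back from the end of the sequence.
last = row[-1]

# boundscheck: an out-of-range index raises instead of reading stray memory.
try:
    row[3]
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True

# Python division: dividing by zero raises instead of following C semantics.
try:
    1.0 / 0
    zero_division_caught = False
except ZeroDivisionError:
    zero_division_caught = True

print(last, out_of_range_caught, zero_division_caught)
```

With the checks disabled, the compiled loop no longer tests for any of these cases, so a negative or out-of-range index can read invalid memory. The decorators are only safe when the loop bounds are known to be valid, as they are here because they come from `a.shape`.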

# Packaging Cython programs