## Before using this notebook:

### Run
```
$ pythran pythran_example1.py
```
to convert the Python code in 'pythran_example1.py' to C++ and compile it (this generates a .so file that Python can import).

### What's in this file?
Just one function, which finds the minimum of the pairwise products of two lists, plus a comment telling pythran the data types:

```
#pythran export min_product(float32 list, float32 list)
def min_product(arr1, arr2):
    assert (len(arr1) == len(arr2)), 'mismatch in dimensions'
    return min([a*b for a,b in zip(arr1,arr2)])

```
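
To make the behaviour concrete, here is the same function run as plain Python on small, hand-checkable inputs. The input values are made up for illustration and nothing here needs pythran:

```python
def min_product(arr1, arr2):
    assert (len(arr1) == len(arr2)), 'mismatch in dimensions'
    return min([a*b for a, b in zip(arr1, arr2)])

# Pairwise products are 8.0, 3.0 and 10.0, so the minimum is 3.0
result = min_product([2.0, 3.0, 5.0], [4.0, 1.0, 2.0])
print(result)

# Lists of different lengths trip the assertion
try:
    min_product([1.0, 2.0], [1.0])
    length_check_fired = False
except AssertionError as err:
    length_check_fired = True
    print('AssertionError:', err)
```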

In [3]:
!pythran pythran_example1.py

CRITICAL: Cover me Jack. Jack? Jaaaaack!!!!
E: error: Command "/home/luke/anaconda3/envs/HighPerformance/bin/x86_64-conda_cos6-linux-gnu-c++ -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/luke/anaconda3/envs/HighPerformance/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/luke/anaconda3/envs/HighPerformance/include -fPIC -DENABLE_PYTHON_MODULE -D__PYTHRAN__=3 -DPYTHRAN_BLAS_BLAS -I/home/luke/anaconda3/envs/HighPerformance/lib/python3.8/site-packages/pythran -I/home/luke/anaconda3/envs/HighPerformance/lib/python3.8/site-packages/numpy/core/include -I/usr/local/include -I/usr/include -I/home/luke/anaconda3/envs/HighPerformance/include -I/home/luke/an

In [1]:
import numpy as np
import pythran_example1 as pe1

ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory

### Let's test our pythranized function:

In [2]:
pe1.min_product([1.2,1.4,99,55,1.00000002],[88,77,66,55,44])

NameError: name 'pe1' is not defined

### Not ideal, what happened?

We told pythran to expect lists of float32 values, but some of our elements were ints. In pure Python that's fine, but not with pythran. Can we still use the function though?

In [None]:
pe1.min_product([1.2,1.4,99.,55.,1.00000002],[88.,77.,66.,55.,44.]) # Note: added '.' after each int element
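
Rather than editing literals by hand, the inputs can be cast before the call. A minimal sketch, where `as_float_list` is a hypothetical helper name (not part of pythran):

```python
def as_float_list(values):
    """Hypothetical helper: return a new list with every element cast to float."""
    return [float(v) for v in values]

mixed = [1.2, 1.4, 99, 55, 1.00000002]   # contains two ints
cleaned = as_float_list(mixed)

print([type(v).__name__ for v in mixed])
print([type(v).__name__ for v in cleaned])
```

With the compiled module imported, the call would then look like `pe1.min_product(as_float_list(a), as_float_list(b))`.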

### Yay, so as long as we type carefully / cast beforehand we're grand.
### Now let's see how much faster the pythranized one is:

In [None]:
import timeit

def min_product_pure_python(arr1, arr2):
    assert (len(arr1) == len(arr2)), 'mismatch in dimensions'
    return min([a*b for a,b in zip(arr1,arr2)])

%timeit pe1.min_product([1.2,1.4,99.,55.,1.00000002],[88.,77.,66.,55.,44.]) 
%timeit min_product_pure_python([1.2,1.4,99.,55.,1.00000002],[88.,77.,66.,55.,44.]) 

### Barely a difference. Let's try with some much bigger lists:

In [None]:
import numpy as np

test_a = [i for i in np.random.rand(1000000)] # we can't pass a numpy array (pythran TypeError) - we said we'd use a list
test_b = [i for i in np.random.rand(1000000)]

%timeit pe1.min_product(test_a, test_b) 
%timeit min_product_pure_python(test_a, test_b) 
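
`%timeit` only works inside IPython / Jupyter. Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. `best_time` is a hypothetical helper, and the list size is reduced here just to keep the demo quick:

```python
import random
import timeit

def min_product_pure_python(arr1, arr2):
    assert (len(arr1) == len(arr2)), 'mismatch in dimensions'
    return min([a*b for a, b in zip(arr1, arr2)])

def best_time(func, *args, repeat=3, number=5):
    """Hypothetical helper: best average seconds per call over several runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number

n = 100_000  # smaller than the notebook's 1,000,000 to keep this quick
test_a = [random.random() for _ in range(n)]
test_b = [random.random() for _ in range(n)]

t_python = best_time(min_product_pure_python, test_a, test_b)
print(f'pure Python: {t_python * 1e3:.2f} ms per call')

# With the compiled module imported as pe1, the same helper would give:
# t_pythran = best_time(pe1.min_product, test_a, test_b)
# print(f'speedup: {t_python / t_pythran:.1f}x')
```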

### That's a bit more like it - over 4 times as fast, just by adding a comment to tell pythran the types

### So, useful - but nothing we couldn't do easily already with Cython. 
### Let's do something that's a bit trickier with Cython - code that uses NumPy as well

In [None]:
# I've borrowed this example from Pythran's documentation
# We'll make the plain NumPy (uncompiled) version first:

import numpy as np
def arc_dist(theta1, phi1, theta2, phi2):
    temp = (np.sin((theta2-theta1)/2)**2 + 
           (np.cos(theta1)*np.cos(theta2)) * np.sin((phi2-phi1)/2)**2)
    return 2 * np.arctan2(np.sqrt(temp), np.sqrt(1-temp))

'''
And our pythran version (pretty much exactly the same):

#pythran export arc_dist(float[], float[], float[], float[])
import numpy as np
def arc_dist(theta1, phi1, theta2, phi2):
    temp = (np.sin((theta2-theta1)/2)**2 + 
           (np.cos(theta1)*np.cos(theta2)) * np.sin((phi2-phi1)/2)**2)
    return 2 * np.arctan2(np.sqrt(temp), np.sqrt(1-temp))
''';

### Okay, run 
```
$ pythran pythran_example2.py
```
### to compile this one, then let's compare them:

In [None]:
import pythran_example2 as pe2

theta_1 = np.random.rand(100000)*np.pi 
theta_2 = np.random.rand(100000)*np.pi 

phi_1 = np.random.rand(100000)*np.pi*2 - np.pi
phi_2 = np.random.rand(100000)*np.pi*2 - np.pi

%timeit pe2.arc_dist(theta_1,phi_1,theta_2,phi_2) 
%timeit arc_dist(theta_1,phi_1,theta_2,phi_2) 
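
Before trusting a timing comparison it's worth checking that two implementations actually return the same values. A minimal sketch of that check: since `pe2` only exists after compiling, a scalar `math`-based loop (`arc_dist_scalar`, a stand-in written for this example) plays the role of the second implementation, and the input angles are arbitrary made-up values:

```python
import math
import numpy as np

def arc_dist(theta1, phi1, theta2, phi2):
    temp = (np.sin((theta2-theta1)/2)**2 +
           (np.cos(theta1)*np.cos(theta2)) * np.sin((phi2-phi1)/2)**2)
    return 2 * np.arctan2(np.sqrt(temp), np.sqrt(1-temp))

def arc_dist_scalar(t1, p1, t2, p2):
    """Stand-in reference: the same formula for a single pair of points."""
    temp = (math.sin((t2-t1)/2)**2 +
            math.cos(t1)*math.cos(t2) * math.sin((p2-p1)/2)**2)
    return 2 * math.atan2(math.sqrt(temp), math.sqrt(1-temp))

# Small fixed inputs (arbitrary values chosen for illustration)
theta_1 = np.array([0.3, 1.0, 2.0])
phi_1 = np.array([-1.0, 0.5, 2.0])
theta_2 = np.array([0.8, 1.5, 2.5])
phi_2 = np.array([1.0, -0.5, 0.0])

vectorised = arc_dist(theta_1, phi_1, theta_2, phi_2)
reference = np.array([arc_dist_scalar(*row)
                      for row in zip(theta_1, phi_1, theta_2, phi_2)])

results_match = np.allclose(vectorised, reference)
print(results_match)
```

In the notebook the equivalent check would be `np.allclose(pe2.arc_dist(theta_1, phi_1, theta_2, phi_2), arc_dist(theta_1, phi_1, theta_2, phi_2))`.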

##### (Keep in mind that this comparison is Pythran vs NumPy, rather than Pythran vs pure Python)
### A bit of an improvement, but can we do better?
### We can! We can tell pythran to try to vectorize and parallelise the loops:

We already have a pythran_example2 module built and imported, so we'll give this version a new name with the '-o' flag in the pythran command. Run:
```
$ pythran -O5 -fopenmp -march=native pythran_example2.py -o pythran_example2_opt.so
```
*flags*
* -O5 : optimisation level 5
* -fopenmp : use OpenMP to parallelise
* -march=native : generate code targeted at the architecture of whatever chip you're currently using (the resulting .so may not run on other machines)
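
How much `-fopenmp` helps depends on how many threads are available. A minimal sketch for inspecting and pinning the thread count, assuming the compiled module honours the standard OpenMP `OMP_NUM_THREADS` environment variable (standard OpenMP behaviour, but not verified against pythran here):

```python
import os

# Number of logical CPUs the OS reports (fall back to 1 if unknown)
n_cpus = os.cpu_count() or 1
print('logical CPUs:', n_cpus)

# Assumption: set this before importing the compiled module, so the
# OpenMP runtime sees it when it starts up
os.environ['OMP_NUM_THREADS'] = str(n_cpus)

# import pythran_example2_opt as pe2opt   # import only after setting the variable
```

Setting the value to `'1'` instead is a quick way to see how much of the speedup comes from parallelism rather than from the other flags.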

And let's test it now:

In [None]:
import pythran_example2_opt as pe2opt
%timeit pe2opt.arc_dist(theta_1,phi_1,theta_2,phi_2) 

### Twice as fast! And if you have a better processor than my laptop does (or access to a server cluster) you'll see a much bigger improvement.

### So far this has mostly used external files and commands, but pythran also has some Jupyter-specific features, similar to Cython's - we'll quickly look at them now.

In [None]:
%load_ext pythran.magic 

In [None]:
%%pythran -O2 -fopenmp
# Pass arguments on the magic line as above (note no brackets - unlike some tutorials).
# This is a python3 update / change I *think*; python2 may still use
# %%pythran(-O2 -fopenmp) type syntax.

# The generated .so file is named 'pythranised_<some sha1 hash>.so'
#pythran export average(float[])
def average(nums): 
    running_total = 0
    for num in nums:
        running_total += num
    return running_total/len(nums)

In [None]:
def average_pure_python(nums): 
    running_total = 0
    for num in nums:
        running_total += num
    return running_total/len(nums)

In [None]:
test_list = np.random.rand(100000)

%timeit average(test_list)
%timeit average_pure_python(test_list)
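
For context, NumPy's built-in `np.mean` computes the same quantity, so it makes a useful third baseline and a correctness check. A minimal sketch using only the pure-Python version (the pythran `average` needs the magic cell above):

```python
import numpy as np

def average_pure_python(nums):
    running_total = 0
    for num in nums:
        running_total += num
    return running_total/len(nums)

test_list = np.random.rand(100000)

loop_result = average_pure_python(test_list)
numpy_result = np.mean(test_list)

# The two may differ in the last few digits because the summation order differs
print(np.isclose(loop_result, numpy_result))
```

In the notebook, `%timeit np.mean(test_list)` can be added alongside the other two timings.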

### As always, this is a large, complicated tool that has a lot more to offer than I've shown, as well as limitations I've not mentioned. This guide is only intended as an introduction and a quick, cheap way to get some speedups.