# How PyPy can help High Performance Computing

## Short bio
bla bla bla

# How many of you use Python?

# How many have ever had performance problems?

# Why do you use Python, then?

# Python strong points

* Simplicity
* Lots of libraries
* Ecosystem

* Ok, but why?

# Python *REAL* strong points

* Expressive & simple APIs
* Uniform typesystem (everything is an object)
* Powerful abstractions


# Example: JSON

```java
JSONObject jsonObj = new JSONObject(jsonString);

JSONArray jArray = jsonObj.getJSONArray("data");
int length = jArray.length();
for(int i=0; i<length; i++) {
    JSONObject jObj = jArray.getJSONObject(i);
    String id = jObj.optString("id");
    String name=jObj.optString("name");

    JSONArray ingredientArray = jObj.getJSONArray("Ingredients");
    int size = ingredientArray.length();
    ArrayList<String> Ingredients = new ArrayList<>();

    for(int j=0; j<size; j++) {
        JSONObject json = ingredientArray.getJSONObject(j);
        Ingredients.add(json.optString("name"));
    }
}

// googled for "getJSONArray example", found this:
// https://stackoverflow.com/questions/32624166/how-to-get-json-array-within-json-object

```

```python
import json

obj = json.loads(json_string)
for item in obj['data']:
    id = item['id']
    name = item['name']
    ingredients = []
    for ingr in item["ingredients"]:
        ingredients.append(ingr['name'])
```

# So far so good, BUT

<center><img src="images/abstractions.svg" width="50%" /></center>

# Example of temporary objects
## Bound methods

```python
class A(object):
    def foo(self):
        return 42

a = A()
bound_foo = a.foo
%timeit a.foo()
%timeit bound_foo()
```
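The same comparison can be sketched without IPython magics, using the standard `timeit` module. This is a minimal sketch assuming Python 3; the loop count is an arbitrary choice and the timings depend on the machine and interpreter. Conceptually, every `a.foo` lookup builds a small wrapper object that pairs the function with its instance, although recent CPython versions optimize the direct `a.foo()` call pattern, so the gap may be small.

```python
import timeit

class A(object):
    def foo(self):
        return 42

a = A()
bound_foo = a.foo   # the bound method object is created once, here

# A bound method is a wrapper pairing the plain function with its instance.
assert bound_foo.__self__ is a
assert bound_foo.__func__ is A.__dict__["foo"]

# Time both call styles; 200_000 iterations is an arbitrary, illustrative count.
t_lookup = timeit.timeit("a.foo()", globals={"a": a}, number=200_000)
t_bound = timeit.timeit("bound_foo()", globals={"bound_foo": bound_foo}, number=200_000)

print("a.foo():     %.4f s" % t_lookup)
print("bound_foo(): %.4f s" % t_bound)
```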

# Ideally
### Think of concepts, not implementation details


# Real world
### Details leak to the user

# Python problem
### Tension between abstractions and performance

# Classical Python approaches to performance

# 1. Work around in the user code
### e.g. create bound methods beforehand
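A typical form of this workaround is hoisting a method lookup out of a hot loop. The sketch below is illustrative only; the function names `squares_plain` and `squares_hoisted` are made up for this example.

```python
def squares_plain(n):
    result = []
    for i in range(n):
        result.append(i * i)   # .append is looked up on every iteration
    return result

def squares_hoisted(n):
    result = []
    append = result.append     # bound method created once, before the loop
    for i in range(n):
        append(i * i)
    return result

print(squares_plain(5) == squares_hoisted(5))
```

Both functions compute the same list; the second one trades readability for fewer attribute lookups, which is exactly the kind of detail that leaks into user code.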

# 2. Work around in the language specs

* range vs xrange
* dict.keys vs .iterkeys 
* int vs long
* array.array vs list

* Easier to implement
* Harder to use
* Clutter the language unnecessarily
* More complex to understand
* Not really Pythonic
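As an illustration of the `array.array` vs `list` split, the sketch below stores the same numbers in both containers. It assumes Python 3 (where some of the other splits, such as `int` vs `long`, no longer exist); the sample values are arbitrary.

```python
from array import array

values = [0.5 * i for i in range(1000)]   # list: holds references to float objects
packed = array("d", values)               # array: holds raw C doubles

# Same data and same results, but the user had to pick the container
# based on how the values are stored, not on what they mean.
total_list = sum(values)
total_packed = sum(packed)

print(total_list == total_packed)
print("bytes per element in the array:", packed.itemsize)
```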

# 3. Stay in C as much as possible

* map + operator.* instead of plain, simple Python
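A minimal sketch of the two styles, assuming Python 3 (where `map` is lazy, hence the `list(...)` call). The variable names are illustrative.

```python
import functools
import operator

numbers = range(1000, 2000)

# "Stay in C" style: the iteration and the multiplication both run inside built-ins.
double = functools.partial(operator.mul, 2)
doubles_c_style = list(map(double, numbers))

# Plain Python style: same result, arguably easier to read.
doubles_plain = [x * 2 for x in numbers]

print(doubles_c_style == doubles_plain)
```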

XXX find some numpy example


```python
import operator
import functools
import itertools
import numpy as np

#numbers = np.random.random(1000)
numbers = range(1000, 2000)
```

```python
#%timeit doubles = map(functools.partial(operator.mul, 2), numbers)
%timeit [x*2 for x in numbers]
```

```
The slowest run took 8.65 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 47.3 µs per loop
```

```python
%timeit np.array(numbers)*2
```

```
The slowest run took 49.95 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 42.5 µs per loop
```


# 4. Rewrite in C

* `#include "Python.h"`
* Cython
* CFFI


# "Rewrite in C" approach
## aka, 90/10 rule

<img src="images/90-10-rule-1.svg">

<img src="images/90-10-rule-2.svg">

<img src="images/90-10-rule-3.svg">

* Abstractions cost
* Better code quality => poorer performance
* Python parts become relevant

# (C)Python in the HPC world

# Python as a glue-only language

# Tradeoff between speed and code quality

# PyPy

* Alternative Python implementation
* Ideally: no visible difference to the user
* JIT compiler
* http://pypy.org

# How fast is PyPy?

### Wrong question

* Up to 80x faster in extreme cases
* 10x faster in good cases
* 2x faster on "random" code
* Sometimes it's just slower

# PyPy flaws

* Far from being perfect
* It leaks *different* implementation details than CPython
  - e.g. JIT warmup, GC peculiarities
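One way to observe JIT warmup is to time the same work in successive batches. This is a rough sketch, not a rigorous benchmark; `work`, `time_batches` and all the sizes are arbitrary choices for illustration. On a JIT-compiled interpreter the first batch may be noticeably slower than the later ones, while on CPython the batches should look roughly alike.

```python
import time

def work(n):
    # A small pure-Python loop: the kind of code a tracing JIT can compile.
    total = 0
    for i in range(n):
        total += i * i
    return total

def time_batches(batches=5, calls_per_batch=200, n=1000):
    """Return the wall-clock time of each batch of identical calls."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(calls_per_batch):
            work(n)
        timings.append(time.perf_counter() - start)
    return timings

timings = time_batches()
for index, seconds in enumerate(timings):
    print("batch %d: %.4f s" % (index, seconds))
```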


# PyPy qualities

# Make pythonic, idiomatic code fast

# Abstractions are (almost) free



# The better the code, the bigger the speedup

<img src="images/90-10-rule-1.svg">

<img src="images/90-10-rule-pypy.svg">

# Python as a first class language
# No longer "just glue"

# Example: Sobel filter

* for full explanation: link to my EP2017 talk
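For readers who have not seen it, here is a minimal pure-Python Sobel filter. It is a sketch written for this outline, not the code from the talk: it assumes the image is a list of rows of grey values and leaves the one-pixel border at zero.

```python
import math

def sobel(img):
    """Return the gradient magnitude of a 2D list of grey values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, centre row weighted 2.
            gx = (-img[y-1][x-1] + img[y-1][x+1]
                  - 2 * img[y][x-1] + 2 * img[y][x+1]
                  - img[y+1][x-1] + img[y+1][x+1])
            # Vertical gradient: bottom row minus top row, centre column weighted 2.
            gy = (-img[y-1][x-1] - 2 * img[y-1][x] - img[y-1][x+1]
                  + img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1])
            out[y][x] = math.hypot(gx, gy)
    return out

# Tiny test image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel(image)
print(edges[2])
```

This is the kind of straightforward nested-loop code that is slow on CPython and that a JIT is meant to make fast.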

# The *BIG* problem: C extensions
(WIP)

# Current status

* Most C extensions just work: numpy, scipy, pandas, etc.
* Slow :(

XXX show diagram of cpyext slowness

# We are working on it

# Future status (hopefully)
* All C extensions will just work
* C code as fast as today, Python code super-fast
* The best of both worlds
* PyPy as the default choice for HPC



My personal estimate: 6 months of work and we have a fast cpyext

(let's talk about money :))