# Advanced and Parallel Python

## 1- Scientific Python Software Stack

### Numpy

In [1]:
! pip install numpy
import numpy

Ignoring indexes: https://pypi.python.org/simple


### Scipy

In [2]:
! pip install scipy
import scipy



### Matplotlib

In [3]:
! pip install matplotlib 
from matplotlib import pyplot as plt
%matplotlib inline



### Pandas

In [4]:
! pip install pandas
import pandas



### Cython

In [5]:
! pip install cython
%load_ext Cython



### Line Profiler

In [6]:
! pip install line_profiler
%load_ext line_profiler



### Numba

In [7]:
! pip install numba
import numba



## 2- Why accelerate Python?

A simple problem: estimating $\pi$.

$$\huge{\frac{\pi}{4} = \arctan\left(1\right)}$$

$$\huge{\pi \approx 4 \times \left(\frac{1}{1} - \frac{1}{3} + \frac{1}{5} - \ldots\right)}$$

### How to write this in Python?

In [None]:
def approx_pi(intervals):
    pi = 0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / (2 * i + 1)
    return pi
approx_pi(100000)

### How to bench it?

In [None]:
%timeit -r 20 -n 100 approx_pi(100000)

### How to write this in C?

In [None]:
%%writefile approx_pi.c 
double approx_pi(int intervals) {
    int i;
    double pi = 0.0;
    for(i = 0; i < intervals; i++) {
        pi += (4 - 8 * (i % 2)) / (double)(2 * i + 1);
    }
    return pi;
}

In [None]:
# compile
! gcc -o approx_pi.so -fPIC -shared approx_pi.c

### How to call C from Python?

In [None]:
import ctypes
import os

In [None]:
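approx_pi_c = ctypes.cdll.LoadLibrary(os.path.join(os.getcwd(), 'approx_pi.so')).approx_pi
# Declare the C signature so ctypes converts the argument and return value correctly
approx_pi_c.argtypes = [ctypes.c_int]
approx_pi_c.restype = ctypes.c_double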

In [None]:
# bench
%timeit approx_pi_c(100000)

### Data Analytics with Timeit

In [None]:
import timeit

In [None]:
results = {}

In [None]:
# Suppose the default "best of 3" is not really good enough for our purposes
result = %timeit -q -o -r 10 -n 1 approx_pi(int(1e5))
results["CPython"] = result.all_runs

In [None]:
result = %timeit -q -o -r 10 -n 1 approx_pi_c(int(1e5))
results["ctypes"] = result.all_runs

In [None]:
pandas.DataFrame(results).describe()
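
Outside a notebook, the same kind of summary can be sketched with the standard library alone. The example below is a minimal sketch, not the notebook's method: `time_runs` and `summarize` are hypothetical helper names, `timeit.repeat` stands in for the `%timeit -o` magic, and `summarize` reproduces only a few of the statistics that `DataFrame.describe()` reports.

```python
import statistics
import timeit


def approx_pi(intervals):
    """Leibniz-series estimate of pi, same formula as the notebook's version."""
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / (2 * i + 1)
    return pi


def time_runs(func, arg, repeat=10, number=1):
    """Hypothetical helper: seconds per call for each of `repeat` runs."""
    raw = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return [t / number for t in raw]


def summarize(runs):
    """Hypothetical helper: a few of the statistics describe() would report."""
    return {
        "count": len(runs),
        "mean": statistics.mean(runs),
        "std": statistics.stdev(runs),
        "min": min(runs),
        "max": max(runs),
    }


runs = time_runs(approx_pi, 10_000)
summary = summarize(runs)
for name, value in summary.items():
    print(f"{name:>5}: {value:.6g}")
```

Keeping every run, rather than only the best one, is what makes the spread (`std`, `min`, `max`) visible.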

### Using timeit to test a hypothesis

In [None]:
def sum_values(dict):
    for key in range(10):
        pass

## 3- Vectorizing with Numpy

$$\huge{\sum_{i=0}^{N-1} \frac{i}{10}}$$

In [None]:
def numpy_sum(N):
    # numpy.float was removed in NumPy 1.24; numpy.float64 is the explicit type
    s = numpy.float64(0)
    for i in range(N):
        s += numpy.float64(i) / 10.
    return s

def numpy_sum(N):
    return numpy.sum(numpy.arange(N / 10, step=0.1))

# def numpy_sum(N):
#     return numpy.sum(numpy.arange(N) / 10.)

%timeit numpy_sum(100000)

In [None]:
def numpy_sum(N):
    # Integer range scaled afterwards: avoids the rounding issues of a float step
    return numpy.sum(numpy.arange(N) / 10.)

In [None]:
def numpy_sum(N):
    return numpy.sum(numpy.arange(0, N / 10, 0.1))
numpy_sum(100000)
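
One way to check which of these variants is right is to compare against the closed form of the sum. The sketch below assumes the sum runs over `i = 0 .. N-1`, as in the `range(N)` loop, so that the total is `N (N - 1) / 20`. The names `loop_sum`, `vectorized_sum` and `closed_form` are illustrative and do not come from the notebook.

```python
import numpy


def loop_sum(N):
    """Reference implementation: plain Python loop over i = 0 .. N-1."""
    s = 0.0
    for i in range(N):
        s += i / 10.0
    return s


def vectorized_sum(N):
    """Build the integers with arange, then scale: no float step involved."""
    return numpy.sum(numpy.arange(N) / 10.0)


def closed_form(N):
    """sum_{i=0}^{N-1} i / 10 = N (N - 1) / 20."""
    return N * (N - 1) / 20.0


N = 100000
print(loop_sum(N), vectorized_sum(N), closed_form(N))
```

Building the range from integers and dividing afterwards avoids relying on a floating-point `step`, where the number of elements returned by `arange` can depend on rounding.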

### Using Numpy
#### Exercise 1

Use numpy's `arange` function to speed up the Pi approximation function.

In [None]:
def approx_pi(intervals):
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi
%timeit approx_pi(100000)

In [None]:
def approx_pi_numpy(intervals):
    i = numpy.arange(intervals)
    num = 4.0 - 8.0 * (i % 2)    # +4, -4, +4, ... (also valid for odd `intervals`)
    denom = 2.0 * i + 1.0        # 1, 3, 5, ...
    return numpy.sum(num / denom)
approx_pi_numpy(100000)

In [None]:
%timeit approx_pi_numpy(100000)

#### Exercise 2

- Run your function and confirm the result is correct
- Benchmark your function and keep the results
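
One possible way to confirm the result is to compare it with `math.pi`, using the fact that the truncation error of an alternating series with decreasing terms is smaller than the first omitted term. The sketch below applies this to the pure-Python version. `check_pi` is a hypothetical helper, and any function with the same signature (such as a NumPy variant) could be passed in its place.

```python
import math


def approx_pi(intervals):
    """Leibniz-series estimate of pi, as in the exercise."""
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / (2 * i + 1)
    return pi


def check_pi(func, intervals):
    """Hypothetical helper: return (absolute error, theoretical error bound)."""
    estimate = func(intervals)
    error = abs(estimate - math.pi)
    # First omitted term of the series: 4 / (2 * intervals + 1)
    bound = 4.0 / (2 * intervals + 1)
    return error, bound


error, bound = check_pi(approx_pi, 100000)
print(f"error = {error:.3e}, bound = {bound:.3e}")
```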

## 4- Finding Bottlenecks

In [None]:
#import

def gen_data(n):
    pass

def sum_nexts(numbers):
    pass

def main(n):
    pass

$$\huge{\text{sum}_i = \sum_{a=i+1}^N \text{number}_a}$$
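
Read literally, the formula says that each output element is the sum of all the input elements that come after it. A direct, deliberately naive translation could look like the sketch below. This is one possible reading of the `sum_nexts` stub, not necessarily the version intended for the session.

```python
def sum_nexts(numbers):
    """For each index i, sum the elements that come strictly after it."""
    sums = []
    for i in range(len(numbers)):
        total = 0
        for a in range(i + 1, len(numbers)):
            total += numbers[a]
        sums.append(total)
    return sums


print(sum_nexts([1, 2, 3, 4]))  # [9, 7, 4, 0]
```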

#### Exercise 3

Benchmark the function execution time with the timeit magic we have seen earlier.

#### Profiling with prun

In [None]:
# prun sorted
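
In a notebook, `%prun -s cumulative main(2000)` prints a profile sorted by cumulative time. The sketch below does roughly the same with the standard `cProfile` and `pstats` modules so that it also runs as a plain script. The bodies of `gen_data`, `sum_nexts` and `main` are hypothetical stand-ins for the stubs above.

```python
import cProfile
import io
import pstats
import random


def gen_data(n):
    # Hypothetical stand-in: n pseudo-random floats in [0, 1)
    return [random.random() for _ in range(n)]


def sum_nexts(numbers):
    # Hypothetical stand-in: sum of the elements after each index
    return [sum(numbers[i + 1:]) for i in range(len(numbers))]


def main(n):
    return sum_nexts(gen_data(n))


profiler = cProfile.Profile()
profiler.enable()
main(2000)
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the functions whose call trees cost the most at the top, which is usually where to look first.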

#### Getting more details with line profiling

#### Comparing with Numpy

In [None]:
def gen_data_np(n):
    pass

def sum_nexts_np(numbers):
    pass

def main_np(n):
    pass

#### Food for thought

- What happens with `sum_nexts_np` if `numbers` is not a numpy array?
- How to circumvent this?
- What is the complexity of this algorithm?
- Can we do better than this in pure Python?
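
As a hint for the last question, one possible pure-Python approach is to accumulate the totals from the right in a single pass, so that each element is visited once. The sketch below uses `itertools.accumulate`. `sum_nexts_linear` is a hypothetical name, and this is only one of several ways to do it.

```python
from itertools import accumulate


def sum_nexts_linear(numbers):
    """Sum of the elements after each index, computed in one pass."""
    numbers = list(numbers)
    n = len(numbers)
    # Running totals from the right:
    # suffix[k] = numbers[-1] + numbers[-2] + ... + numbers[-1 - k]
    suffix = list(accumulate(reversed(numbers)))
    # Index i needs the total of everything after it, which is
    # suffix[n - 2 - i]; the last element has nothing after it.
    return [suffix[n - 2 - i] if i < n - 1 else 0 for i in range(n)]


print(sum_nexts_linear([1, 2, 3, 4]))  # [9, 7, 4, 0]
```

With floating-point inputs the results may differ from a naive double loop in the last digits, because the additions happen in a different order.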

#### Exercise 4: More Data Processing

Use the line profiler, find hotspots and optimize the following function.

In [None]:
import random

def generate_data():
    with open("inputs.dat", "w") as fp:
        for _ in range(5000):
            for _ in range(5000):
                fp.write("{0},".format(random.random()))
            fp.write("\n")

def read_data():
    data = []
    fp = open("inputs.dat", "r")

    line = 1
    while line:
        line = fp.readline()
        if line:
            row = []
            for elem in line.split(','):
                elem = elem.strip()
                if elem:
                    row.append(float(elem))
            data.append(row)
        
    fp.close()
    return data

def process_A(data):
    """
    Return a new matrix of the same shape as data, with each original
    element raised to the power of its transposed counterpart.

    result[i][j] = data[i][j] ** data[j][i]
    """
    result = []
    for i in range(len(data)):
        row = []
        for j in range(len(data[i])):
            row.append(data[i][j] ** data[j][i])
        result.append(row)
    return result

def process_B(m1, m2):
    """
    Return the sum of the differences between corresponding
    elements of two square matrices.

    diff = (m2[0][0] - m1[0][0]) + (m2[0][1] - m1[0][1]) + ...
    """

    diff = 0.
    for i in range(len(m1)):
        for j in range(len(m1[i])):
            diff += m2[i][j] - m1[i][j]
    return diff

def main():
    generate_data()
    data = read_data()
    result_1 = process_A(data)
    print("Difference is: ", process_B(data, result_1))
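
Once profiling has shown where the time goes, one direction worth trying is to replace the nested loops with NumPy array operations. The sketch below is one possible rewrite of the two processing functions, assuming the data fits in memory as a square float array. `process_A_np` and `process_B_np` are hypothetical names, and the tiny `sample` matrix is made up for illustration.

```python
import numpy


def process_A_np(data):
    """result[i, j] = data[i, j] ** data[j, i], computed element-wise."""
    data = numpy.asarray(data, dtype=float)
    return data ** data.T


def process_B_np(m1, m2):
    """Sum of the element-wise differences m2 - m1."""
    m1 = numpy.asarray(m1, dtype=float)
    m2 = numpy.asarray(m2, dtype=float)
    return float(numpy.sum(m2 - m1))


# Made-up 2 x 2 input, only to show the calls
sample = [[0.5, 0.25], [0.75, 1.0]]
result = process_A_np(sample)
print(result)
print(process_B_np(sample, result))
```

Whether this is the dominant cost is for the line profiler to confirm. On a 5000 x 5000 input, file reading and parsing may weigh just as much.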

## 5- Compiling Python Code

## 6- Using Multiple Cores

## 7- Scaling Beyond One Machine