To generate the presentation slides, run:

In [1]:
! jupyter nbconvert slides_2019-09-20.ipynb --to slides

[NbConvertApp] Converting notebook slides_2019-09-20.ipynb to slides
[NbConvertApp] Writing 363918 bytes to slides_2019-09-20.slides.html


---

# __GeTKYP__   2019-09-20
---
<br></br>
# Software Productivity and Performance
<br></br>
## _Continuing Jels' legacy_

# Things you need to hear at least once
---

## - _Code is meant for humans, not computers_

## - _Premature optimisation is the root of all evil_

## - _However, never "intentionally" write bad code_

# The book
---
<br></br>
## __Python High Performance__
### Second edition
## Gabriele Lanaro
<br></br>

## _"Skip the first few chapters..."_
## _"Do something on the last chapters about concurrency"_

# Step 0
---
<br></br>
## __Get your code to work properly__


# Step 1
---
<br></br>
## __Check step 0__

# Step 1
---
<br></br>
## __Write Tests and Benchmarks__

# Step 1
---
<br></br>
## Who is using git?
## Who is using Travis CI (or Jenkins, or...)?
## Who is using Anaconda?

# Anaconda and reproducibility
---
<br></br>
You can easily recreate your environment on another machine!
```
conda list --explicit > spec-file.txt
```
<br></br>
```
conda create --name myenv --file spec-file.txt
```
<br></br>
```
conda install --name myenv --file spec-file.txt
```

# Layers of parallelism
---
<br></br>
## 0 : __Instruction-level__ parallelism
## 1 : __Vector__ architectures and __SIMD__-instructions (GPU)
## 2 : __Thread-level__ parallelism (only shared memory)
## 3 : __Request-level__ parallelism (also distributed memory)

# A simple example using mpi4py
---
<br></br>
(See also the [documentation](https://mpi4py.readthedocs.io/en/stable/)!)
<br></br>
Install with [conda](https://www.anaconda.com/distribution/) (as you should!)
<br></br>
```
conda install -c anaconda mpi4py
```
<br></br>
__WARNING:__ Conda can "cripple" itself in doing so (see [issue 5454](https://github.com/conda/conda/issues/5454))
<br></br>
__Note:__ At the time of these slides (2019-09-20), only available for Python versions 2.7, 3.3, 3.4 and 3.5
<br></br>
```
conda create --name py35env python=3.5
```
```
conda activate py35env
```

# The simplest MPI script
---
<br></br>
Contents of `show_process.py`
<br></br>
```python
from mpi4py import MPI

comm  = MPI.COMM_WORLD
rank  = comm.Get_rank()
nproc = comm.Get_size()

print("This is process", rank)
```

# The simplest MPI script
---
<br></br>

In [1]:
! python show_process.py

This is process 0


In [2]:
! mpiexec -n 2 python show_process.py

This is process 1
This is process 0


In [3]:
! mpiexec -n 3 python show_process.py

This is process 1
This is process 2
This is process 0


🤓 Note: prefer `mpiexec` over `mpirun`, as the former is the launcher name specified in the MPI standard.

# Simple loop
---
<br></br>
Contents of `simple_loop.py`
<br></br>
```python
from mpi4py import MPI

comm  = MPI.COMM_WORLD
rank  = comm.Get_rank()
nproc = comm.Get_size()

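arr = [1, 2, 3, 4, 5, 6, 7, 8]
total = len(arr)  # number of items to distribute over the processes

# Each process starts at its own rank and strides by the number of processes
for i in range(rank, total, nproc):
    print('proc', rank, 'does number', i)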
```

# Simple loop
---
<br></br>

In [5]:
! python simple_loop.py

proc 0 does number 0
proc 0 does number 1
proc 0 does number 2
proc 0 does number 3
proc 0 does number 4
proc 0 does number 5
proc 0 does number 6
proc 0 does number 7


In [7]:
! mpiexec -n 2 python simple_loop.py

proc 0 does number 0
proc 0 does number 2
proc 0 does number 4
proc 0 does number 6
proc 1 does number 1
proc 1 does number 3
proc 1 does number 5
proc 1 does number 7


In [8]:
! mpiexec -n 3 python simple_loop.py

proc 0 does number 0
proc 0 does number 3
proc 0 does number 6
proc 2 does number 2
proc 2 does number 5
proc 1 does number 1
proc 1 does number 4
proc 1 does number 7
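
The interleaved pattern in the output above comes from `range(rank, total, nproc)`: each process starts at its own rank and strides by the number of processes. A minimal sketch that reproduces this split without MPI, by looping over the ranks inside a single process, could look as follows. The helper name `indices_for_rank` is hypothetical and not part of mpi4py.

```python
def indices_for_rank(rank, nproc, total):
    """Return the loop indices that process `rank` out of `nproc` would handle."""
    return list(range(rank, total, nproc))


total = 8  # same number of items as in simple_loop.py

# Emulate `mpiexec -n 3` by iterating over the ranks sequentially.
split = {rank: indices_for_rank(rank, 3, total) for rank in range(3)}
for rank, indices in split.items():
    print('proc', rank, 'would do numbers', indices)

# Every index is handled exactly once across all ranks.
all_indices = sorted(i for indices in split.values() for i in indices)
assert all_indices == list(range(total))
```

Note that the work is not perfectly balanced when `total` is not a multiple of `nproc`: with 3 processes, the last rank gets one item fewer.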


In [5]:
def complicated_blackbox_function():
    """
    Just a stupid function that takes a while to execute.
    """
    N = int(1.0E+7)
    while N > 0:
        N -= 1
    return N

In [6]:
%timeit complicated_blackbox_function()

394 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
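
`%timeit` is an IPython magic, so it is only available inside a notebook or IPython session. In a plain script, such as one launched through `mpiexec`, a similar measurement can be sketched with the standard-library `timeit` module. The names `countdown` and `n_iterations` are assumptions made for this illustration, and the loop count is reduced so the sketch runs quickly.

```python
import timeit


def countdown(n):
    """Stand-in for complicated_blackbox_function with an adjustable loop count."""
    while n > 0:
        n -= 1
    return n


# Smaller loop count than the 1.0E+7 used above, so the sketch runs quickly.
n_iterations = 100_000

# 7 runs of 1 call each, mirroring the "7 runs, 1 loop each" reported by %timeit.
timings = timeit.repeat(lambda: countdown(n_iterations), repeat=7, number=1)

print(f"best of {len(timings)} runs: {min(timings) * 1e3:.2f} ms")
```

Reporting the minimum rather than the mean is a common choice here, since slower runs are usually caused by other activity on the machine.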
