# High performance Python 🚀
### Zbyszek & Aitor
### ASPP 2023, Heraklion, Greece

Fork and clone the repository now, please! :)

## Outline

* Introduction
* Analyze what makes your code slow and inefficient with **profiling** (Aitor)
* Speed & *convenience* with **Numba** (Zbyszek)
* Speed & *flexibility* with **Cython** (Zbyszek)
* Outroduction

## Introduction

* By now you are the *Master of Research*(TM).

<!-- ![Master of research](figures/mor.png) -->

* Using your new skills you can confidently transform any idea into a great manuscript!

* It seems like the only things holding you back are the **execution speed** and **memory usage** of your scripts!

* Profiling can help you identify which parts of your code are slow or use too much memory -> **optimization**

* For even faster code, both `Cython` and `Numba` can take your scripts to the next level.

## Exercise

Who thinks that they would benefit from faster code?

Please raise your hand.

Who has had code use too much memory?

Please raise your hand.

Who has spent countless hours fiddling with code trying to optimize it?

Please raise your hand.

## The three rules of optimization
(adapted from Sebastian Witowski, EuroPython 2016)

#### 1. Don't.
- Likely you don't need it.
- Optimization comes with costs.

#### 2. Don't yet.
- Is your code finished?
- Did you write tests?
- Are you sure it's worth the investment?

#### 3. Profile
- Collect data - don't guess which part of your code you should optimize!

## Profiling demo: the Mandelbrot set

Let us use the Mandelbrot set (`mandelbrot.py`) as an example for this part, because fractals are pretty!

Using this script, Aitor will demonstrate time and memory profiling in Python.

Don't worry about understanding every line in the code!

<img src="./figures/mandelbrot.png" alt="mandelbrot" style="width:600px;"/>

## Collect basic data (`runtime`)
- while optimizing it's a good idea to keep track of the total runtime of your script
- even though modern profilers introduce little overhead, timing the whole script makes sure that your code changes translate into actual speedups
- the simplest way to do this is via `time` (or the equivalent on your OS):
```
time python mandelbrot_example.py 

	real	0m5,994s
	user	0m5,967s
	sys	0m0,961s
```
  - you're typically interested in "user time"
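If you prefer to time from inside Python, the standard library offers a rough equivalent: `time.perf_counter()` measures wall-clock ("real") time and `time.process_time()` measures the CPU time (user + system) of the current process. A minimal sketch, where `work()` is a hypothetical stand-in for the expensive part of your script:

```python
import time

def work(n=1_000_000):
    """Hypothetical stand-in for the expensive part of your script."""
    return sum(i * i for i in range(n))

wall_start = time.perf_counter()   # wall-clock ("real") time
cpu_start = time.process_time()    # CPU time (user + sys) of this process

result = work()

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"real: {wall_elapsed:.3f} s, cpu (user+sys): {cpu_elapsed:.3f} s")
```

Unlike the shell's `time`, this only covers the code between the two measurements, not interpreter startup or imports.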

**EXERCISE** How long does it take on your machine?

## More data (`runtime`, `memory`...)

- One can also get data about CPU availability, memory usage and more
- On Linux, you can use the GNU `time` binary, which is easily called as `\time` (the backslash bypasses the shell's built-in `time`)
- Use the `-v` verbose flag to get more information
- Thus: 

```
\time -v python mandelbrot_example.py
```

```
	Command being timed: "python mandelbrot_example.py"
	User time (seconds): 6.14
	System time (seconds): 0.94
	Percent of CPU this job got: 115%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 370700
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 104585
	Voluntary context switches: 565
	Involuntary context switches: 148
	Swaps: 0
	File system inputs: 0
	File system outputs: 2056
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

```

We are interested in `User time` and `Maximum resident set size`.

Converting from KB to MB: `370700 / 1024` = `362.01171875`, i.e. about 362 MB.
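The peak memory value that `\time -v` reports as `Maximum resident set size` can also be read from inside a Python script on Unix-like systems, using the standard-library `resource` module. A minimal sketch; `peak_rss_mb` is a hypothetical helper name, and the module is not available on Windows:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    if sys.platform == "darwin":
        return peak / 1024 / 1024
    return peak / 1024

before = peak_rss_mb()
data = [0.0] * 10_000_000  # roughly 80 MB of list slots (8-byte pointers)
after = peak_rss_mb()
print(f"peak RSS: {before:.1f} MB -> {after:.1f} MB")
```

Note that this is a *peak* value: it never decreases, even after `data` is freed.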

## Collect fine grained `runtime` data with `py-spy`

**profilers** monitor the execution of your script, record statistics, and thus can **provide an understanding of the performance characteristics of your code**

Here we consider [py-spy](https://github.com/benfred/py-spy), a sampling-based runtime profiler for Python:
- Simply speaking, `py-spy` examines your program at regular intervals and records which function (or rather, which line) is currently being executed.
- No code modifications needed!

<img src="./figures/sampling.svg" alt="sampling" style="width:600px;"/>
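To make the idea of sampling concrete, here is a toy in-process sampler written in pure Python: a background thread periodically looks at the main thread's current frame and counts which (function, line) it finds there. This is only an illustration of the principle, not how `py-spy` works (`py-spy` reads the target process's memory from the outside, which is why it needs no code changes). It relies on the CPython-specific `sys._current_frames()`, and `sample_profile`, `slow_part`, `fast_part` and `workload` are hypothetical names:

```python
import collections
import sys
import threading
import time

def sample_profile(func, interval=0.001):
    """Run func() while a background thread samples the calling thread's stack.

    Returns a Counter mapping (function name, line number) -> number of samples.
    """
    counts = collections.Counter()
    stop = threading.Event()
    target_id = threading.get_ident()  # id of the thread that will run func()

    def sampler():
        while not stop.is_set():
            frame = sys._current_frames().get(target_id)  # CPython-specific
            if frame is not None:
                counts[(frame.f_code.co_name, frame.f_lineno)] += 1
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        stop.set()
        thread.join()
    return counts

def slow_part():
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total

def fast_part():
    return sum(range(100_000))

def workload():
    slow_part()
    fast_part()

for (name, line), n in sample_profile(workload).most_common(3):
    print(f"{name}:{line}  {n} samples")
```

Lines that collect many samples are where the program spends most of its time; a flamegraph is essentially a visualization of such counts, aggregated over whole call stacks.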

### Usage of `py-spy`
To time profile your script:
```
py-spy record -o flamegraph_mandelbrot.svg python mandelbrot_example.py
```
- to make timings accurate, `py-spy` needs to collect enough samples; you can control the sampling rate (samples per second) using the `-r` argument
- get more info on the arguments with `-h`

- `py-spy` will produce a "flamegraph" like the following (here `flamegraph_mandelbrot.svg`; open it with Firefox)

<img src="./figures/flamegraph_mandelbrot.svg" alt="flamegraph" style="width:600px;"/>

## Collect fine grained `memory` data with `memray`

Here we will use [memray](https://github.com/bloomberg/memray), which tracks and reports memory allocations, both in Python code and in compiled extension modules. 
- No code modifications needed!

### Usage of `memray`
To memory profile your script:
```
memray run mandelbrot_example.py
```

This will create a `.bin` file named `memray-mandelbrot_example.py.XXXXX.bin` in your current directory.

Create a flamegraph like so (memray prints an equivalent command on screen for you, so you can just copy it!):
```
memray flamegraph memray-mandelbrot_example.py.XXXXX.bin
```
This creates an `.html` file; open it with Firefox.

<img src="./figures/memray_flamegraph.png" alt="memory" style="width:600px;"/>
<img src="./figures/mandelbrot_memory.png" alt="memory" style="width:600px;"/>

## Exercise

It's time to put theory into practice. We have prepared an example script (see [./profiling/numerical_integration.py](./profiling/numerical_integration.py)) which numerically computes the integral of a function and measures the error with respect to analytical integration.

0. Fork & clone this repository.
1. Familiarize yourself with the script and execute it: `python numerical_integration.py`
2. Profile the script both for speed (py-spy) and memory (memray). Look at the flamegraphs from both!
3. Optimize the script (**only the indicated functions**). Can you make it faster and more memory-efficient?
4. Commit your changes in a new branch and create a PR. Include the duration before/after optimization in the PR message.

#### Hints
- Each profiler will tell you where most time is spent or most memory is allocated; these are the optimization opportunities!
- Avoid using external libraries

Afterwards we will discuss the results jointly.

## Refresher: numerical integration

![RiemannSum](figures/MidRiemann2.svg)

Riemann sum: $\int_a^b dx f(x) \approx \sum_{i = 0}^{n - 1} f(a + (i + 0.5) \Delta x) \Delta x$ with $\Delta x = (b - a)/n$

here $a=0, b=2, n=4$
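The midpoint Riemann sum above translates almost literally into Python. A minimal sketch; the integrand $f(x) = x^2$ is an assumption chosen for illustration (the figure's function is not specified), and `midpoint_riemann_sum` is a hypothetical name:

```python
def midpoint_riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Same a, b, n as in the figure, with the assumed integrand f(x) = x**2
approx = midpoint_riemann_sum(lambda x: x**2, 0.0, 2.0, 4)
exact = 2.0**3 / 3  # analytical result: x**3 / 3 evaluated from 0 to 2
print(approx, exact)  # approx = 2.625, exact ≈ 2.6667
```

Increasing `n` makes the approximation error shrink, at the cost of more function evaluations.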

## Exercise discussion

What did we learn?
- ...

## Profiling conclusion

- Before optimizing, first finish your code & write tests!
- Then *measure* to find functions(/lines) that take up most of the time.
- Only optimize the relevant functions(/lines), measure again, and *know when to stop*!
  - e.g. a 1 min script you run 5 times is probably not worth optimizing
  - an 8 h script you run 1000 times probably is
- To gain some basic data, you can use builtin tools
  - `time` (commandline)
  - `%timeit` (ipython, jupyter)
  - `import timeit; timeit.timeit('some_func()', globals=globals(), number=100)` (requires code changes)
- Profilers collect more fine-grained data
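A slightly more complete use of the builtin `timeit` module is sketched below; `some_func` is a hypothetical function to time. `timeit.repeat` runs the statement `number` times per measurement and repeats the measurement `repeat` times:

```python
import timeit

def some_func():
    """Hypothetical function you want to time."""
    return sorted(range(10_000, 0, -1))

# 5 measurements, each running some_func() 100 times
timings = timeit.repeat("some_func()", globals=globals(), number=100, repeat=5)

# The timeit docs suggest looking at the minimum rather than the mean:
# higher values are usually caused by other processes interfering
best = min(timings) / 100
print(f"best: {best * 1e3:.3f} ms per call")
```

In IPython or Jupyter, `%timeit some_func()` does the same bookkeeping for you.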

## Beyond `py-spy` and `memray`
For *runtime*
- [py-spy](https://github.com/benfred/py-spy) is just one of many *runtime* profilers; alternatives include
  - [cProfile](https://docs.python.org/3/library/profile.html) (builtin!) + [snakeviz](https://github.com/jiffyclub/snakeviz)
  - [pyinstrument](https://github.com/joerick/pyinstrument)
  - [austin](https://github.com/P403n1x87/austin)
  - ...
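As an example of the builtin option, here is a minimal `cProfile` sketch that prints a text report with `pstats`; `slow_part`, `fast_part` and `main` are hypothetical names. For the graphical view, you can instead save the data with `python -m cProfile -o out.prof your_script.py` and open it with `snakeviz out.prof`:

```python
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(200_000))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time (time spent in a function including its callees)
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # show only the 10 most expensive entries
report = stream.getvalue()
print(report)
```

Unlike `py-spy`, `cProfile` is a *deterministic* profiler: it records every function call, which is exact but adds more overhead.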

For *memory* 
- [memray](https://github.com/bloomberg/memray) is just one of many *memory* profilers; alternatives include
  - [memory-profiler](https://pypi.org/project/memory-profiler/)
  - [pympler](https://pypi.org/project/Pympler/)
  - [guppy3](https://pypi.org/project/guppy3/)
  - ...
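For a quick check without installing anything, the standard library also includes `tracemalloc`, which traces memory allocations made through Python's allocator. A minimal sketch comparing the peak memory of building a full list versus summing a generator; `build_list`, `sum_lazily` and `peak_memory` are hypothetical helpers:

```python
import tracemalloc

def build_list(n):
    return [i * 2 for i in range(n)]     # materializes all n values at once

def sum_lazily(n):
    return sum(i * 2 for i in range(n))  # generator: one value at a time

def peak_memory(func, *args):
    """Return (result, peak bytes traced by tracemalloc while running func)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak

n = 1_000_000
_, peak_list = peak_memory(build_list, n)
total, peak_gen = peak_memory(sum_lazily, n)
print(f"list: {peak_list / 1e6:.1f} MB peak, generator: {peak_gen / 1e6:.3f} MB peak")
```

`tracemalloc` only sees allocations that go through Python's memory allocator, whereas `memray` also tracks allocations in compiled extension modules.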
 
 
With modern tools, **profiling is easy! Use it!**

### Optimization: what to do (in order of [subjective] increasing complexity)

- **Do nothing**
- Vectorization (`numpy`!!)
- Data structures and algorithms
- Memoization / caching
- Non-Python libraries (`blas`, `openblas`, `blis`, `atlas`, Intel `mkl`, ...)
- Buy better hardware
- **Numba**
- **Cython** / pythran
- **Parallelization** (->tomorrow)
- GPUs (`cuda`, `opencl`, `directml`, ...)
- Low-level code
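To illustrate one of the cheaper items on this list, memoization: caching the results of a pure function so repeated calls with the same arguments are not recomputed. A minimal sketch using the standard-library `functools.lru_cache` (Python 3.9+ also offers `functools.cache`); `fib` is a hypothetical example function:

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    """n-th Fibonacci number; each value is computed only once thanks to the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(80))           # 23416728348467685
print(fib.cache_info())  # hits, misses and current cache size
```

Without the decorator, this naive recursion makes an exponential number of calls and `fib(80)` would effectively never finish; with it, each `fib(k)` for `k <= 80` is evaluated once.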