# Finding bottlenecks in your Python program


<img src="https://www.explainxkcd.com/wiki/images/e/e2/estimating_time.png" style="width: 500px;"/>


**"First make it work. Then make it right. Then make it fast."** -  Kent Beck

## Profiling allows us to measure resources used by sections of the program. 

Typical resources of interest are
* Amount of wall-clock time used
* Amount of CPU time used
* Amount of RAM used

But also other resources can be measured, such as:

* Disk I/O
* Network I/O
* ...

Today, we only consider wall-clock/CPU profiling.

## Profiling techniques

Start simple, switch to more complex techniques if needed!

Techniques for measuring wall clock/CPU time in increasing complexity:
1. **Manual timing**
2. **The `timeit` module**
3. The `cProfile` module
4. Line by line profiling

## Case study: filling a grid with point values

* Consider a rectangular 2D grid



<center> 
    <img src="pdf/grid_lattice.svg" style="width: 200px;"/>
Grid lattice
</center>

* A NumPy array `a[i,j]` holds values at the grid points

# Implementation

In [16]:
import numpy as np

class Grid2D(object):
    def __init__(self,
                 xmin=0, xmax=1, dx=0.5,
                 ymin=0, ymax=1, dy=0.5):
        
        self.xcoor = np.arange(xmin, xmax+dx, step=dx)
        self.ycoor = np.arange(ymin, ymax+dy, step=dy)

    def gridloop(self, f):
        lx = np.size(self.xcoor)
        ly = np.size(self.ycoor)
        a = np.zeros((lx,ly))

        for i in range(lx):
            x = self.xcoor[i]
            for j in range(ly):
                y = self.ycoor[j]
                a[i,j] = f(x, y)
        return a

# Usage

Create a new grid:

In [17]:
g = Grid2D(dx=0.001, dy=0.001)

Define function to evaluate:

In [None]:
def myfunc(x, y):
    return np.sin(x*y) + y

Computing grid values:

In [18]:
print("Computing values...")
a = g.gridloop(myfunc)
print("done")

Computing values...
done


# Timing

Use `time.time()` to measure the time spent in a code section.
  ```python
  import time

  t0 = time.time()
  # execute code here
  t1 = time.time()
  print("Runtime: {}".format(t1-t0))
  ```
  
*Note*: `time.time()` measures **wall clock time**. Use `time.process_time()` to measure **CPU time** (`time.clock()` was deprecated in Python 3.3 and removed in Python 3.8).
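
To make the difference between the two clocks concrete, the sketch below times a call to `time.sleep`, which consumes wall-clock time but almost no CPU time. It assumes Python 3.3 or newer, where `time.perf_counter()` and `time.process_time()` are available; the helper name `measure` is only illustrative.

```python
import time

def measure(func):
    """Return (wall_seconds, cpu_seconds) for a single call of func()."""
    wall0 = time.perf_counter()   # high-resolution wall-clock timer
    cpu0 = time.process_time()    # CPU time used by the current process
    func()
    cpu1 = time.process_time()
    wall1 = time.perf_counter()
    return wall1 - wall0, cpu1 - cpu0

# Sleeping takes wall-clock time but keeps the CPU idle
wall, cpu = measure(lambda: time.sleep(0.2))
print("Wall-clock time: {:.3f} s".format(wall))
print("CPU time:        {:.3f} s".format(cpu))
```

For a CPU-bound function the two numbers would typically be close; for code that waits on I/O or sleeps, the wall-clock time is much larger.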

Timing guidelines:
* Put simple statements in a loop to increase measurement accuracy.
* Make sure to use a constant machine load.
* Run the tests several times, choose the **smallest** time.
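
These guidelines can be combined in a small helper. The following is a minimal sketch, not a replacement for `timeit`: the name `best_of` and its default parameters are assumptions chosen for illustration.

```python
import time

def best_of(func, repeats=5, loops=1000):
    """Return the smallest per-call wall-clock time of func() in seconds."""
    best = float("inf")
    for _repeat in range(repeats):
        t0 = time.perf_counter()
        for _loop in range(loops):   # loop to average out timer resolution
            func()
        t1 = time.perf_counter()
        # Keep the smallest time: it is the run least disturbed by other load
        best = min(best, (t1 - t0) / loops)
    return best

per_call = best_of(lambda: sum(range(100)))
print("Best time per call: {:.3e} s".format(per_call))
```

Taking the minimum rather than the mean is a deliberate choice: background activity on the machine can only make a run slower, never faster.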

## Timing of the case study

The case study has two parts that could potentially be slow: 
1. The initialisation `Grid2D(dx=0.001, dy=0.001)`
2. Calling the `g.gridloop(myfunc)` function.

We time these two parts separately to figure out how much time is spent in each.

### Timing the Grid2D initialisation

In [22]:
import time

t0 = time.time()
g = Grid2D(dx=0.001, dy=0.001)
t1 = time.time()

print("Wall-clock time: {:.4} s".format(t1-t0))


Wall-clock time: 0.0002265 s


### Timing the `gridloop` function

In [23]:
min_time = 1e9  # Keep track of shortest runtime

for i in range(1, 10):

    t0 = time.time()
    g.gridloop(myfunc)
    t1 = time.time()
    
    min_time = min(min_time, t1-t0)
    print("Experiment {}. Wall-clock time: {:.4} s".format(i, t1-t0))

print(f"Minimum runtime: {min_time}")

Experiment 1. Wall-clock time: 1.695 s
Experiment 2. Wall-clock time: 1.633 s
Experiment 3. Wall-clock time: 1.618 s
Experiment 4. Wall-clock time: 1.653 s
Experiment 5. Wall-clock time: 1.649 s
Experiment 6. Wall-clock time: 1.705 s
Experiment 7. Wall-clock time: 1.731 s
Experiment 8. Wall-clock time: 1.732 s
Experiment 9. Wall-clock time: 1.691 s
Minimum runtime: 1.6183156967163086


$\Rightarrow$ The `gridloop` function is causing the slow execution!

## The *timeit* module

The `timeit` module provides a convenient way of measuring the execution time of small code snippets. By default it uses a wall-clock timer (`time.perf_counter()`).

Usage:

In [24]:
import timeit

timeit.timeit(stmt="a+=1", setup="a=0")

0.05651295100688003

The code is automatically wrapped in a for loop. By default the command is executed 1,000,000 times. It returns the **accumulated** runtime.

You can adjust the number of executions:

In [7]:
timeit.timeit(stmt="a+=1",setup="a=0", number=10000)

0.00032959198870230466

Use `timeit.repeat` if you would like to repeat the experiment multiple times:

In [8]:
timeit.repeat(stmt="a+=1",setup="a=0", number=10000, repeat=5)

[0.0003560699988156557,
 0.0004435739974724129,
 0.0006261609960347414,
 0.0005044199933763593,
 0.0005603640020126477]
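
Because each entry returned by `timeit.repeat` is the accumulated time for `number` executions, the time per execution is obtained by dividing by `number`. A small sketch, following the earlier guideline of keeping the smallest measurement (the variable names are illustrative):

```python
import timeit

number = 10000
times = timeit.repeat(stmt="a += 1", setup="a = 0", number=number, repeat=5)

best_total = min(times)            # accumulated time of the fastest repetition
per_exec = best_total / number     # time of a single execution of the statement

print("Best accumulated time: {:.3e} s".format(best_total))
print("Time per execution:    {:.3e} s".format(per_exec))
```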

## Timing user-defined functions

`timeit` runs the statement in its own namespace, so variables and functions from your session are not available unless they are imported in the `setup` argument:

In [25]:
timeit.repeat(stmt="g.gridloop(myfunc)", setup="from __main__  import g, myfunc", repeat=5, number=1)

[2.6584489940141793,
 2.695956929004751,
 2.906824637990212,
 3.2314149979793,
 3.2274685070151463]
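
As an alternative to importing from `__main__`, `timeit` also accepts a `globals` argument (Python 3.5 and newer) or a zero-argument callable instead of a string. The sketch below uses a pure-Python stand-in for `myfunc` so that it runs without NumPy; in a notebook one would typically pass `globals=globals()`.

```python
import math
import timeit

def myfunc(x, y):
    # Pure-Python stand-in for the case-study function
    return math.sin(x * y) + y

# Option 1: hand timeit a namespace that contains the names used in stmt
t_globals = min(timeit.repeat(stmt="myfunc(0.5, 0.5)",
                              globals={"myfunc": myfunc},
                              repeat=3, number=10000))

# Option 2: pass a zero-argument callable instead of a code string
t_callable = min(timeit.repeat(stmt=lambda: myfunc(0.5, 0.5),
                               repeat=3, number=10000))

print("globals:  {:.3e} s".format(t_globals))
print("callable: {:.3e} s".format(t_callable))
```

The callable variant includes the overhead of one extra function call per execution, which matters only for very cheap statements.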

## Profiling modules with cProfile

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

`cProfile` is one of the two main (deterministic) profiling modules in Python's standard library; the other is the pure-Python `profile` module.

### Two options to use cProfile

1. As a script: `python -m cProfile script.py`
2. As a module:

In [10]:
import cProfile
pr = cProfile.Profile()
res = pr.run("g.gridloop(myfunc)")  # res contains the statistics

## Getting runtime statistics

In [11]:
res.print_stats()                   # Print profiling statistics 

         1002014 function calls in 2.780 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(size)
        1    0.688    0.688    2.780    2.780 <ipython-input-1-005a33cae72c>:11(gridloop)
  1002001    2.092    0.000    2.092    0.000 <ipython-input-3-7e5b651fb108>:1(myfunc)
        1    0.000    0.000    2.780    2.780 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:3113(_size_dispatcher)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:3117(size)
        1    0.000    0.000    2.780    2.780 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
        1    0.001    0.001    0.001    0.001 {built-in method numpy.zeros}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objec

Meaning of columns:
* **ncalls**: number of calls
* **tottime**: total time spent in the given function, excluding time spent in calls to sub-functions
* **percall**: tottime divided by ncalls
* **cumtime**: cumulative time spent in this and all subfunctions
* **percall**: cumtime divided by ncalls
* **filename:lineno(function)**: information about the function
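
For longer reports it often helps to sort by one of these columns and to print only the top entries. The sketch below uses the standard `pstats` module for this; the function `work` is a hypothetical workload standing in for the case study.

```python
import cProfile
import io
import math
import pstats

def work(n):
    # Hypothetical workload: many small function calls
    return sum(math.sin(i) for i in range(n))

pr = cProfile.Profile()
pr.runcall(work, 100000)           # profile a single call of work(100000)

stream = io.StringIO()             # collect the report in a string
stats = pstats.Stats(pr, stream=stream)
stats.sort_stats("tottime").print_stats(10)   # at most 10 entries, by tottime

report = stream.getvalue()
print(report)
```

Sorting by `"cumulative"` instead of `"tottime"` ranks functions by the time spent in them including their sub-functions.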

## Back to our case-study: What have we learned so far?

The biggest contributors to the total runtime are:
   1. `gridloop` itself contributes about one quarter of the total runtime (0.688 s of 2.780 s).
   2. The `myfunc` calls contribute about three quarters of the total runtime (2.092 s of 2.780 s).

* `myfunc` is fairly straightforward:
  ```python
  def myfunc(x, y):
      return np.sin(x*y) + y
  ```
  It might be difficult to improve.
* What about `gridloop`?

## Recall that, `gridloop` was a function of the form

```python
def gridloop(self, f):
    lx = np.size(self.xcoor)
    ly = np.size(self.ycoor)
    a = np.zeros((lx,ly))

    for i in range(lx):
        x = self.xcoor[i]
        for j in range(ly):
            y = self.ycoor[j]
            a[i,j] = f(x, y)
    return a
```

It would be useful to see how much time is spent in each line!

## Line by line profiling

The `line_profiler` module measures the time spent in each line of a Python function.

## Usage

1. Install with `conda install line_profiler`
2. "Decorate" the function of interest with `@profile`:
    ```python
    @profile
    def gridloop(self, f):
        # ...
    ```
3. Run line profiler with:
    ```bash
    kernprof -l -v grid2d_lineprofile.py
    ```
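
When run through `kernprof -l`, the name `profile` is injected into Python's builtins, so a decorated script fails with a `NameError` when started with plain `python`. One common workaround, sketched here with a toy function `fill_table` standing in for `gridloop`, is to fall back to a do-nothing decorator:

```python
import builtins

# Use kernprof's decorator if it was injected, otherwise a no-op decorator
profile = getattr(builtins, "profile", lambda func: func)

@profile
def fill_table(n):
    """Toy double loop, standing in for gridloop."""
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            table[i][j] = i * j
    return table

table = fill_table(3)
print(table)
```

With this fallback the same file can be run both normally and under `kernprof -l -v`.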

## Demo

In [12]:
!kernprof -l -v grid2d_lineprofile.py

Wrote profile results to grid2d_lineprofile.py.lprof
Timer unit: 1e-06 s

Total time: 3.08364 s
File: grid2d_lineprofile.py
Function: gridloop at line 11

Line #      Hits         Time  Per Hit   % Time  Line Contents
    11                                               @profile
    12                                               def gridloop(self, f):
    13         1         10.0     10.0      0.0          lx = size(self.xcoor)
    14         1          4.0      4.0      0.0          ly = size(self.ycoor)
    15         1         23.0     23.0      0.0          a = zeros((lx,ly))
    16                                           
    17      1002        357.0      0.4      0.0          for i in range(lx):
    18      1001        792.0      0.8      0.0              x = self.xcoor[i]
    19   1003002     368068.0      0.4     11.9              for j in range(ly):
    20   1002001     497530.0      0.5     16.1                  y = self.ycoor[j]
    21   1002001    22

**Conclusion:** A significant amount of time is spent in loops and indexing. How can we improve this?

## A vectorised Grid2D implementation

In [13]:
class VectorisedGrid2D(object):
    def __init__(self,
                 xmin=0, xmax=1, dx=0.5,
                 ymin=0, ymax=1, dy=0.5):
        
        self.xcoor = np.arange(xmin, xmax+dx, step=dx)
        self.ycoor = np.arange(ymin, ymax+dy, step=dy)

    def gridloop(self, f):
        return f(self.xcoor[:,None], self.ycoor[None,:])  # Vectorized grid evaluation 
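
The vectorised version relies on NumPy broadcasting: `xcoor[:,None]` is a column of shape `(lx, 1)`, `ycoor[None,:]` is a row of shape `(1, ly)`, and combining them yields an array of shape `(lx, ly)`. The sketch below checks this on a small 3x3 grid against the explicit double loop; it uses `np.linspace` for the coordinates purely for brevity.

```python
import numpy as np

def myfunc(x, y):
    return np.sin(x * y) + y

xcoor = np.linspace(0, 1, 3)   # [0.0, 0.5, 1.0]
ycoor = np.linspace(0, 1, 3)

x_col = xcoor[:, None]         # shape (3, 1): one x value per row
y_row = ycoor[None, :]         # shape (1, 3): one y value per column
a_vec = myfunc(x_col, y_row)   # broadcasts to shape (3, 3)

# Reference result from the explicit double loop
a_loop = np.zeros((xcoor.size, ycoor.size))
for i in range(xcoor.size):
    for j in range(ycoor.size):
        a_loop[i, j] = myfunc(xcoor[i], ycoor[j])

print(x_col.shape, y_row.shape, a_vec.shape)
print(np.allclose(a_vec, a_loop))
```

This only works because `myfunc` is built from operations that accept arrays (`np.sin`, `*`, `+`); a function using `math.sin` would need to be rewritten first.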

## Timing the vectorised Grid2D

In [14]:
vg = VectorisedGrid2D(dx=0.001, dy=0.001)
min(timeit.repeat(stmt="vg.gridloop(myfunc)", setup="from __main__  import vg, myfunc", repeat=5, number=1))

0.013148885991540737

In [15]:
g = Grid2D(dx=0.001, dy=0.001)
min(timeit.repeat(stmt="g.gridloop(myfunc)", setup="from __main__  import g, myfunc", repeat=5, number=1))

1.6837359740020474

Vectorization yielded a speed improvement of more than 100x (about 128x in this run)!