# Benchmarking in Python

Now, we want to measure performance statistics that truly prove we have improved our software application's performance under many conditions, not just ideal ones.

This means that we'd like to send our code out as an actual job on DevCloud (to simulate dedicated, exclusive server processes), rather than just testing within the JupyterLab development environment.  In this notebook, we'll cover the different benchmarking interfaces that Python has.  

If you are using a different language, feel free to skip this notebook, or skim it to check that you understand your own language's equivalent benchmarking interfaces.

We'll cover the Linux systemwide benchmark interfaces in Component 3.

## Benchmarking interfaces in Python

There are a few types of benchmarking interfaces we can use:

| Interface | When to use |
|-----------|-------------|
| `time` | The built-in module for reading clocks.  `time.process_time()` returns the CPU time used by the current process so far, and `time.perf_counter()` reads a high-resolution clock suited to measuring wall time.  Subtracting two `time.perf_counter()` readings taken before and after a block of code gives a quick estimate of its run time.  |
| `timeit` | A built-in library dedicated to measuring Python code timing, complete with repeats.  |
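| IPython/Jupyter `%timeit` and `%%timeit` | IPython magic commands built on `timeit` that choose the number of loops automatically: `%timeit` times a single line and `%%timeit` times a whole cell.  Only available in the IPython interpreter or in a Jupyter notebook. |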
| `tracemalloc` | A built-in library dedicated to measuring and tracing memory blocks allocated by Python. |
| `cProfile` | The Python profiler, which does not technically benchmark but can help determine any functions that run too often or take too much time. |
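As a quick illustration of the `time` row above, here is a minimal sketch that brackets some work with both clocks. The `busy_work()` function is a hypothetical stand-in for real code, and the `time.sleep()` call is there only to show that waiting counts toward wall time but not toward CPU time.

```python
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound workload used only for this illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Mark the starting point on both clocks.
wall_start = time.perf_counter()
cpu_start = time.process_time()

busy_work(200_000)  # uses the CPU
time.sleep(0.1)     # waits without using the CPU

# Subtract the starting readings to get elapsed times in seconds.
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall time: {wall_elapsed:.3f} s; CPU time: {cpu_elapsed:.3f} s")
```

Because the sleep adds to wall time only, the wall time printed here should exceed the CPU time by roughly 0.1 s.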


## Using `timeit` to get truer results

If we just want a quick check, we can use the counters in the `time` module to see how quickly our code runs.  However, such a check usually measures just one run, which may be skewed by cache misses and other one-off events that make the first run slower than the second and subsequent runs while our program is active.

Thus, we can use the `timeit` library instead, which repeats the same version of our code several times so that we can get a truer minimum.  As noted in the [Python documentation](https://docs.python.org/3/library/timeit.html#timeit.Timer.repeat), the minimum value is all we need: higher values are typically caused by other processes interfering with the timing rather than by variability in Python's speed.

Let's try using `timeit` to time an example program's `main()` function.  Passing `main` directly works because it takes no input parameters (arguments).  If it did, we would have to wrap the call (for example with a `lambda` or `functools.partial`) or pass a statement string together with the `setup` and `globals` parameters of `timeit.Timer`.
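If your entry point does take arguments, here is a minimal sketch of the three workarounds. `scaled_sum()` and its inputs are hypothetical and exist only for this illustration; `functools.partial`, `lambda`, and the `stmt`/`setup`/`globals` parameters of `timeit.Timer` are standard-library features.

```python
import functools
import timeit


def scaled_sum(values: list[float], scale: float) -> float:
    """Hypothetical function that needs arguments, used only for illustration."""
    return sum(v * scale for v in values)


data = [float(i) for i in range(1_000)]

# Option 1: bind the arguments with functools.partial (a zero-argument callable).
partial_times = timeit.Timer(functools.partial(scaled_sum, data, 2.0)).repeat(repeat=5, number=100)

# Option 2: wrap the call in a lambda (also a zero-argument callable).
lambda_times = timeit.Timer(lambda: scaled_sum(data, 2.0)).repeat(repeat=5, number=100)

# Option 3: pass a statement string; `setup` builds the input once per repeat,
# and `globals` makes scaled_sum visible to the generated timing code.
string_times = timeit.Timer(
    stmt="scaled_sum(values, 2.0)",
    setup="values = [float(i) for i in range(1_000)]",
    globals={"scaled_sum": scaled_sum},
).repeat(repeat=5, number=100)

for label, times in [("partial", partial_times), ("lambda", lambda_times), ("string", string_times)]:
    # Each entry is the total for 100 calls, so divide to get a per-call figure.
    print(f"{label}: best of 5 = {min(times) / 100 * 1e6:.1f} µs per call")
```

The first two options are usually the simplest; the string form is handy when you want input preparation to stay out of the measured loop.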

In [None]:
import numpy as np


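def compute_sum(in_array: np.ndarray) -> float:
    # Sum the elements one at a time with a pure-Python loop.
    output = 0.0
    for item in in_array:
        output += item
    return output


def main() -> None:
    # Build 200 evenly spaced values and sum them; no arguments, so timeit can call it directly.
    example_data = np.linspace(start=-23.4, stop=72.4, num=200)
    compute_sum(example_data)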

In [None]:
import timeit

timer_main = timeit.Timer(main)
timer_main.repeat(repeat=10, number=10)  # 10 measurements, each timing 10 calls to main()

Examine the list returned by `timeit.Timer.repeat()`.  Each entry is the total wall time, in seconds, for `number` calls of `main()`.  You will often notice that the first one or two measurements take longer than the rest!  If you are looking for the long-term, steady-state execution wall time, just pick the minimum, as the documentation says: a long-running program would normally have its code and data cached and, on a dedicated server, exclusive use of its CPU cores.
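Turning those totals into a single per-call number takes one division. The sketch below also uses `timeit.Timer.autorange()`, which picks a loop count so that one measurement lasts at least 0.2 seconds; `workload()` is a hypothetical stand-in for `main()`.

```python
import timeit


def workload() -> float:
    """Hypothetical zero-argument stand-in for main()."""
    return sum(x * 0.5 for x in range(10_000))


timer = timeit.Timer(workload)

# autorange() returns (number, time_taken) for a loop count taking >= 0.2 s.
number, _ = timer.autorange()

# Each entry of repeat() is the TOTAL time for `number` calls.
totals = timer.repeat(repeat=5, number=number)

# Convert to per-call times and keep the minimum, as the timeit docs suggest.
per_call = [t / number for t in totals]
best = min(per_call)

print(f"{number} loops per measurement; best per-call time: {best * 1e3:.3f} ms")
```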

To reduce the chance that our results depend on reusing identical input data (and whatever the caches retain from it), we can generate random input data on every call instead:

In [None]:
def random_main() -> None:
    example_data = np.random.default_rng().uniform(low=-24.5, high=72.4, size=200)
    compute_sum(example_data)

In [None]:
timer_random_main = timeit.Timer(random_main)
timer_random_main.repeat(repeat=10, number=10)

You will likely see more variance this time (probably because a new random number generator is created and random numbers are drawn on every call), but the overall pattern is often the same, with the first run taking noticeably longer than the rest.  If the minimum run time stays close to the one measured with the `linspace` data (keeping in mind that it now also includes the cost of generating the random numbers), we can reasonably assume that we have an accurate timing of our program.
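If you want random input without counting the cost of generating it, one option is to move the generation into `setup`, which `timeit` runs once per repeat, before the timed loop. This is a minimal sketch: `compute_sum()` is copied from the cell above so the block runs on its own, and the generator is created once outside the timed code. Note that all `number` calls within one repeat reuse the same random array.

```python
import timeit

import numpy as np


def compute_sum(in_array: np.ndarray) -> float:
    # Same pure-Python loop as in the notebook cell above.
    output = 0.0
    for item in in_array:
        output += item
    return output


rng = np.random.default_rng()  # created once, outside the timed code

# `setup` runs once per repeat, so each repeat sees fresh random data,
# but generating that data is not included in the measured time.
timer = timeit.Timer(
    stmt="compute_sum(data)",
    setup="data = rng.uniform(low=-23.4, high=72.4, size=200)",
    globals={"compute_sum": compute_sum, "rng": rng},
)
times = timer.repeat(repeat=10, number=10)

print(f"best of 10: {min(times) / 10 * 1e6:.1f} µs per call")
```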

You're now ready to use the `timeit` interface!

If you'd like more information, here are some additional resources:

- https://docs.python.org/3/library/timeit.html
- https://note.nkmk.me/en/python-timeit-measure/
- https://cvw.cac.cornell.edu/python/timing

## Using `tracemalloc` to gather memory data

The [`tracemalloc`](https://docs.python.org/3/library/tracemalloc.html) module allows us to see how much memory is allocated by our Python code.  Using the example functions above, let's get a glimpse of how to use it:

In [None]:
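import tracemalloc as tm

tm.start()         # start tracing Python memory allocations
tm.reset_peak()    # reset the recorded peak to the current size (Python 3.9+)
tm.clear_traces()  # forget blocks allocated before this point

timer_random_main.repeat(repeat=10, number=10)

# Sizes in bytes of the traced blocks: still allocated now, and the maximum reached
current_size, peak_size = tm.get_traced_memory()
tm.stop()  # stop tracing so that later cells are not slowed down

print(f"current size: {current_size / 1024:.2f} KiB; "
      f"peak size: {peak_size / 1024:.2f} KiB")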

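As we can see, `tm.get_traced_memory()` returns two numbers, in bytes: the current size of the traced memory blocks that are still allocated after our code finished (for example, objects that are still referenced), and the peak size those traced blocks reached at any point while our code ran.  Keep in mind that `tracemalloc` only sees allocations made through Python's memory allocators (and by extension modules that explicitly report their allocations to it), so the process's total memory footprint can be larger.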
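The difference between the two numbers is easiest to see with a temporary allocation. In this minimal sketch, the hypothetical `build_and_discard()` creates a large list that is freed when the function returns, so the current size drops back down while the peak still remembers it.

```python
import tracemalloc


def build_and_discard() -> int:
    """Hypothetical workload: allocate a large temporary list, then drop it."""
    temp = [str(i) for i in range(100_000)]
    return len(temp)  # `temp` is freed when the function returns


tracemalloc.start()
tracemalloc.reset_peak()  # Python 3.9+

build_and_discard()
current, peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

# The temporary list is gone, so `current` is small, but `peak` remembers it.
print(f"current: {current / 1024:.1f} KiB; peak: {peak / 1024:.1f} KiB")
```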

The peak usually matters most, as it represents the worst case our application reached.  Running the code repeatedly through `timeit` gives the peak more chances to capture that worst case, although it can only report the largest peak among the runs it actually observed.
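The cell above records a single peak for the whole batch of calls. If you would rather see the worst peak of any single call, one approach is to reset the peak before each call and keep the maximum. This is a sketch under assumptions: `peak_per_call()` and `make_squares()` are hypothetical helpers, and `tracemalloc.reset_peak()` requires Python 3.9 or later.

```python
import tracemalloc
from typing import Callable


def peak_per_call(func: Callable[[], object], repeats: int = 10) -> int:
    """Hypothetical helper: call `func` several times and return the largest
    extra traced memory (in bytes) reached during any single call."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    worst = 0
    try:
        for _ in range(repeats):
            tracemalloc.reset_peak()  # peak := current traced size
            baseline, _ = tracemalloc.get_traced_memory()
            func()
            _, peak = tracemalloc.get_traced_memory()
            # Only count memory allocated above the pre-call baseline.
            worst = max(worst, peak - baseline)
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return worst


def make_squares() -> list[int]:
    """Hypothetical workload whose result is discarded by the caller."""
    return [i * i for i in range(50_000)]


print(f"worst per-call peak: {peak_per_call(make_squares) / 1024:.1f} KiB")
```

Restoring the previous tracing state in the `finally` block keeps the helper from interfering with any tracing you started yourself.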

You're now ready to use the `tracemalloc` module!

If you'd like more information about `tracemalloc`, here are some additional resources:

- https://docs.python.org/3/library/tracemalloc.html
- https://tech.buzzfeed.com/finding-and-fixing-memory-leaks-in-python-413ce4266e7d

## Using `cProfile` to receive granular details

In [None]:
import cProfile

with cProfile.Profile() as profiler:  # profiling is enabled inside this block
    timer_random_main.repeat(repeat=10, number=10)

profiler.print_stats(sort="cumulative")  # sort by cumulative time, highest first

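As you can see, `cProfile` is very helpful and can break code runs down for us!  For each function, the report lists the number of calls (`ncalls`), the time spent in the function itself excluding its sub-calls (`tottime`), the cumulative time including sub-calls (`cumtime`), and per-call averages of both.  Functions with a large `ncalls` or `tottime` are usually the first candidates for optimization (as a rough rule of thumb, anything totaling less than one second is nearly negligible in most non-embedded settings).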
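The full report can be long, since it also includes `timeit` and NumPy internals. The standard `pstats` module can sort it and keep only the top rows; this sketch writes the report to a string buffer so it can be inspected or saved. `slow_sum()` and `run()` are hypothetical workloads used only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> float:
    """Hypothetical function with a pure-Python loop."""
    total = 0.0
    for i in range(n):
        total += float(i)
    return total


def run() -> float:
    """Hypothetical entry point that calls slow_sum() many times."""
    return sum(slow_sum(10_000) for _ in range(20))


with cProfile.Profile() as profiler:
    run()

# Write the report into a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)  # top 10 rows
report = buffer.getvalue()

print(report)
```

Sorting by `pstats.SortKey.TIME` instead ranks functions by their own time (`tottime`), which is often the quickest way to spot a hot loop.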

If you'd like more information, here are some additional resources:

- https://docs.python.org/3/library/profile.html#module-cProfile
- https://docs.python.org/3/library/profile.html#module-pstats
- https://cerfacs.fr/coop/python-profiling