# Solving Performance Issues

So: you've done all the things suggested in the last page on [Performance basics](performance_understanding.ipynb) that you can do within the constraints of your project, and you still have a performance problem. Now what?

## Profiling Code

**If you take nothing else away from this page, *please* read and remember this section!**

There's no reason to tune a line of code that is only responsible for 1/100 of your running time, so before you invest in speeding up your code, figure out *exactly* what in your code is causing it to be slow --  a process known as "profiling". 

Thankfully, because this is so important, there are lots of tools (called profilers) for measuring exactly how long your computer is spending doing each step in a block of code. Here are a couple, with some demonstrations below:

- Profiling in R: the two tools I've seen used most are [Rprof](http://www.stat.berkeley.edu/~nolan/stat133/Fall05/lectures/profilingEx.html) (a function that ships with base R) and the [lineprof](http://adv-r.had.co.nz/Profiling.html#measure-perf) package.
- Profiling in Python: if you use Jupyter Notebooks or JupyterLab, you can use the [`%prun` magic](http://pynash.org/2013/03/06/timing-and-profiling.html). If for some reason you're not using Jupyter, here's a [guide to a few other tools](https://zapier.com/engineering/profiling-python-boss/).
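
If you're working in a plain Python script rather than Jupyter, the standard library's `cProfile` and `pstats` modules collect the same kind of information as `%prun`. The following is only a minimal sketch: `slow_step`, `fast_step`, and `my_script` are made-up stand-ins for your own code, and the sleep is kept short so the example runs quickly.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    # Hypothetical stand-in for an expensive step in your analysis.
    time.sleep(0.05)


def fast_step():
    return sum(range(1000))


def my_script():
    slow_step()
    return fast_step()


# Record every function call made between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = my_script()
profiler.disable()

# Write the ten most expensive entries, ordered by internal time, to a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("tottime").print_stats(10)
print(report.getvalue())
```

For a whole script, running `python -m cProfile -s tottime your_script.py` from the command line produces a similar report without changing the code (`your_script.py` is a placeholder name).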


### Profiling Example

To illustrate, let's write a function (called `my_analysis`) which we can pretend is a big analysis that's causing us problems. Within this analysis we'll place several functions, most of which are fast, but one of which is slow. To make it *really* easy to see what is fast and what is slow, these functions will just call the `time.sleep()` function, which tells the computer to pause for a given number of seconds (i.e. `time.sleep(10)` makes execution pause for 10 seconds).

```python
import time

def a_slow_function():
    time.sleep(5)
    return 1

def a_medium_function():
    time.sleep(1)
    return 1

def a_fast_function():
    return 1

def my_analysis():
    x = 0
    x = x + a_slow_function()
    x = x + a_medium_function()
    x = x + a_fast_function()
    print(f'the result of my analysis is: {x}')

my_analysis()
```

the result of my analysis is: 3


Now we can profile this code with the IPython magic `%prun`:

```python
%prun my_analysis()
```

```
the result of my analysis is: 3

         44 function calls in 6.009 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    6.009    3.004    6.009    3.004 {built-in method time.sleep}
        3    0.000    0.000    0.000    0.000 socket.py:337(send)
        1    0.000    0.000    6.009    6.009 {built-in method builtins.exec}
        1    0.000    0.000    6.009    6.009 <ipython-input-2-2718bcdb1d57>:14(my_analysis)
        3    0.000    0.000    0.000    0.000 iostream.py:197(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:384(write)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        3    0.000    0.000    0.000    0.000 threading.py:1080(is_alive)
        1    0.000    0.000    1.004    1.004 <ipython-input-2-2718bcdb1d57>:7(a_medium_function)
        3    0.000    0.000    0.000    0.000 threading.py:1038(_wait_for_tstate_lock)
        2    0.000    0.000    0.000    0.000 iostream.py:309(
```

The output shows a number of things, but the most important are `tottime` and `cumtime`. 

`tottime` is the time spent inside a function itself, *excluding* time spent in any functions it calls. From `tottime` we can see that 6 seconds were dedicated to running `time.sleep()`.

`cumtime` is the cumulative time spent in a function *including* everything it calls, so from it you can also see *in which functions* `time.sleep()` took the most time. Because the same second is counted once for every function that was on the call stack at the time, the `cumtime` values do not add up to the total running time. `time.sleep()` has a `cumtime` of 6.009 because a total of 6 seconds was spent while that function ran, but it is *also* the case that `a_slow_function` (listed as `<ipython-input-2-2718bcdb1d57>:3(a_slow_function)`) has a `cumtime` of 5 seconds (because that function was in the process of executing when `time.sleep()` paused for 5 seconds).

From this, we can deduce that `time.sleep()` was slowing down our code, and that the occurrence of `time.sleep()` that slowed down our code the most was in `a_slow_function`.
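
To make the difference between `tottime` and `cumtime` concrete, here is a rough sketch that profiles a scaled-down version of the example with the standard library's `cProfile` and reads both numbers back for each function. It assumes Python 3.9 or later (for `get_stats_profile()`), and the sleeps are shortened to fractions of a second so it runs quickly.

```python
import cProfile
import pstats
import time


def a_slow_function():
    time.sleep(0.05)  # scaled down from 5 seconds
    return 1


def a_medium_function():
    time.sleep(0.01)  # scaled down from 1 second
    return 1


def my_analysis():
    return a_slow_function() + a_medium_function()


profiler = cProfile.Profile()
profiler.enable()
my_analysis()
profiler.disable()

# Maps each function name to a record with tottime, cumtime, ncalls, etc.
profiles = pstats.Stats(profiler).get_stats_profile().func_profiles

# Look up the entry for time.sleep by searching, rather than hard-coding its label.
sleep_name = next(name for name in profiles if "time.sleep" in name)

for name in ("my_analysis", "a_slow_function", "a_medium_function", sleep_name):
    p = profiles[name]
    print(f"{name:32s} tottime={p.tottime:.3f}  cumtime={p.cumtime:.3f}")
```

The wrapper functions should show a `tottime` near zero (they do almost no work themselves) but a `cumtime` close to the length of the sleep they trigger, while `time.sleep` carries nearly all of the `tottime`.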

## Speeding Code with Cython

There are two libraries designed to allow you to massively speed up Python code. The first is called `Cython`, and it is a way of writing code that is basically Python with type declarations. For example, if you wanted to average all the whole numbers below some `N` (say, a million) in Python, you could write something like the following (obviously not the most concise way to do it, but you get the idea):

```python
def avg_numbers_up_to(N):
    adding_total = 0
    for i in range(N):
        adding_total = adding_total + i

    avg = adding_total / N

    return avg
```

But in Cython, you would write:

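```cython
def avg_numbers_up_to(int N):
    # Declare the C types up front. `long long` is used for the total because
    # a C `int` (typically 32 bits) overflows long before N reaches a million.
    cdef long long adding_total = 0
    cdef int i  # typing the loop variable lets Cython emit a plain C loop
    cdef double avg

    for i in range(N):
        adding_total = adding_total + i

    # Cast first so the division is done in floating point.
    avg = <double>adding_total / N

    return avg
```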

Then to integrate this into your Python code, you would save this function definition in a new file with the suffix `.pyx` (say, `avg_numbers.pyx`), put the following build script in a `setup.py` file next to it, and compile it by running `python setup.py build_ext --inplace`:

```python
from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(ext_modules=cythonize('avg_numbers.pyx'))
```

Once it's compiled, you can import your Cythonized function (`from avg_numbers import avg_numbers_up_to`) and call it in your normal Python script, and you'll typically find it runs ~10x - 100x faster! (Note that this speedup is only likely when compared to *pure Python* code. If you're comparing Cython to a library function that was already written in C, your Cythonized Python is unlikely to be any faster (and may be slower) than that library function.)
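
Before reaching for Cython, it's worth measuring the baseline. The sketch below uses the standard library's `timeit` to time the pure-Python loop against a version built on `sum()` and `range()`, which are implemented in C in CPython. `avg_numbers_builtin` is a name made up for this illustration, and the timings will vary by machine, so no specific speedup is claimed here.

```python
import timeit


def avg_numbers_up_to(N):
    # Pure-Python loop: every addition goes through the interpreter.
    adding_total = 0
    for i in range(N):
        adding_total = adding_total + i
    return adding_total / N


def avg_numbers_builtin(N):
    # Same result, but the loop runs inside the C implementation of sum().
    return sum(range(N)) / N


N = 1_000_000

# Take the best of three single runs to reduce noise from other processes.
loop_seconds = min(timeit.repeat(lambda: avg_numbers_up_to(N), number=1, repeat=3))
builtin_seconds = min(timeit.repeat(lambda: avg_numbers_builtin(N), number=1, repeat=3))

print(f"pure-Python loop: {loop_seconds:.4f} s")
print(f"built-in sum():   {builtin_seconds:.4f} s")
```

After compiling the Cython version, you could add a third `timeit.repeat` call on the imported function to see where it lands relative to these two.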

### Cython limitations

There are a few limitations to be aware of, however: 

- Cython only really works with (a) native Python and (b) NumPy ([NumPy instructions here](https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#numpy-tutorial)). *Some* other libraries are / can be supported, but it's not nearly as straightforward as the example above.
    - [Here's a guide to use with pandas.](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html)
- The function you write will *not* be dynamically typed, so if you said the function would accept integers, you can only give it integers. 
- Distributing code you write with Cython can be tricky. 

## Speeding Code with Numba
