<img src='img/anaconda-logo.png' align='left' style="padding:10px">
<br>
*Copyright Continuum 2012-2016 All Rights Reserved.*

# Accelerate Profiler

The Python standard library includes tools to profile code. The `accelerate.profiler` module extends that functionality.

## Table of Contents
* [Accelerate Profiler](#Accelerate-Profiler)
	* [Set-up](#Set-up)
* [Using the ``accelerate.profiler``](#Using-the-accelerate.profiler)
* [Printing Profiler Results](#Printing-Profiler-Results)
	* [Function Signatures, Typing](#Function-Signatures,-Typing)
	* [Profiling Numpy, shape and dtype](#Profiling-Numpy,-shape-and-dtype)
* [Visualizing Profiler Results](#Visualizing-Profiler-Results)
* [Profiling Compiled Code](#Profiling-Compiled-Code)


## Set-up

In [None]:
from accelerate import profiler

# Using the ``accelerate.profiler``

Let's start with a demonstration of how to use `accelerate.profiler`.

Define a code block you wish to profile: for this example, we use an implementation of the [Wallis product](https://en.wikipedia.org/wiki/Wallis_product) for estimating the value of $\pi$.

In [None]:
def compute_pi(n=1000000):
    pi = 2.0
    for i in range(1, n):
        tmp = 4 * i**2
        pi *= tmp / (tmp - 1)
    return pi

In [None]:
%timeit compute_pi()
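`%timeit` is an IPython magic, so it only works inside a notebook or IPython session. Outside of one, a rough equivalent is the standard library's `timeit` module. The sketch below assumes Python 3 (true division) and uses a smaller `n` and an arbitrary repeat count of 5 to keep the run short; neither value comes from the original notebook.

```python
import timeit


def compute_pi(n=1000000):
    # Wallis product, same logic as the notebook cell above
    pi = 2.0
    for i in range(1, n):
        tmp = 4 * i**2
        pi *= tmp / (tmp - 1)
    return pi


# Assumption: n=10000 and number=5 are illustrative values chosen for speed
number = 5
elapsed = timeit.timeit(lambda: compute_pi(10000), number=number)
print("mean time per call: %.6f s" % (elapsed / number))
```

Unlike `%timeit`, `timeit.timeit` does not pick the number of repetitions automatically, so `number` has to be chosen by hand.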

The steps for using the Accelerate profiler are:
* Import the Accelerate profiler
* Construct a `Profile()` object
* Enable the profile object
* Execute the code you wish to profile
* Disable the profile object

In [None]:
p = profiler.Profile()
p.enable()
compute_pi()  # call the code you want to profile
p.disable()

# Printing Profiler Results

To see the results, call the `print_stats()` method on the `Profile` object.

Notice the ***`tottime`*** column, just as seen in `cProfile`.

In [None]:
p.print_stats()

Notice that the last column of the first row refers to `compute_pi(n:int)`.
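For comparison, here is a minimal sketch of the same enable/disable steps using only the standard library's `cProfile` and `pstats`. It is a baseline, not part of `accelerate.profiler`: the report has the same `tottime` column, but the function is listed as plain `compute_pi` with no argument types. The smaller default `n` and the limit of 5 rows are illustrative choices.

```python
import cProfile
import io
import pstats


def compute_pi(n=100000):
    # Wallis product, same logic as the notebook cell above (Python 3 division)
    pi = 2.0
    for i in range(1, n):
        tmp = 4 * i**2
        pi *= tmp / (tmp - 1)
    return pi


p = cProfile.Profile()
p.enable()
compute_pi()  # the code being profiled
p.disable()

# Collect the report in a string, sorted by time spent inside each function
buf = io.StringIO()
pstats.Stats(p, stream=buf).sort_stats("tottime").print_stats(5)
report = buf.getvalue()
print(report)
```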

## Function Signatures, Typing

Notice the first row in the output:
* the **`tottime`** column is largest for the first row
* the **`filename:lineno(function)`** column shows **`compute_pi(n:int)`**
* notice that the function input parameter name and type are reported as **`n:int`**

Recall that the `cProfile` module reports how much time is spent in each function. 
* Often the precise control flow (and thus function performance) depends on the ***argument types***.
* So `accelerate.profiler` extends profiling functionality by ***also recording the function signature.***

This change has important implications for the way profiling works. With `cProfile`, multiple invocations of a given function are accounted for in a single profile stats entry, whereas with the Accelerate profiler they ***generate different entries, depending on their argument types***.
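To make the idea of one entry per signature concrete, here is a small pure-Python sketch. The `record_by_signature` decorator, the `stats` dictionary and the `double` function are all hypothetical names invented for illustration; this is not how `accelerate.profiler` is implemented, only a way to see how keying on argument types splits one function into several entries.

```python
import time
from collections import defaultdict
from functools import wraps

# (function name, argument type names) -> [call count, total seconds]
stats = defaultdict(lambda: [0, 0.0])


def record_by_signature(func):
    """Hypothetical decorator: accumulate timings per argument-type signature."""
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, tuple(type(a).__name__ for a in args))
        start = time.perf_counter()
        try:
            return func(*args)
        finally:
            entry = stats[key]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


@record_by_signature
def double(x):
    return x * 2


double(3)
double(4)      # same signature as the call above: shares its entry
double(2.5)    # float argument: separate entry
double("ab")   # str argument: separate entry

for (name, types), (calls, seconds) in sorted(stats.items()):
    print("%s(%s): %d call(s), %.6f s" % (name, ", ".join(types), calls, seconds))
```

Four calls to one function end up in three entries, one for each distinct argument type.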

## Profiling Numpy, shape and dtype

For NumPy array types, the recorded signature includes not only the **dtype** attribute, but also the array's **shape**.

To demonstrate this, let's profile a NumPy implementation of the Wallis product for estimating $\pi$.

In [None]:
import numpy as np

def compute_pi_np(n=1000000):
    series = np.arange(1, n)**2 * 4.
    series /= (series - 1)
    return 2. * series.prod()

In [None]:
from accelerate import profiler
p = profiler.Profile()
p.enable()
compute_pi_np()
p.disable()

When we now call `print_stats()`, notice that the profiler reveals both the function signature `compute_pi_np(n:int)` and the array details `ndarray(dtype=float64, shape=(999999,))`.

In [None]:
p.print_stats()

For another example, let us define a simple `dot()` function and inspect the resulting `print_stats()` output from `accelerate.profiler`.

In [None]:
import numpy as np

def dot(a, b):
    total = 0  # avoid shadowing the built-in sum()
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

In [None]:
# prepare data
a = np.arange(16, dtype=np.float32)
b = np.arange(16, dtype=np.float32)

In [None]:
from accelerate import profiler

In [None]:
# run profiler
p = profiler.Profile() # add `signatures=False` to get the original behaviour
p.enable()
dot(a, b)
p.disable()

Notice that `print_stats()` reveals details about the NumPy array in the function signature as `dot(a:ndarray(dtype=float32, shape=(16,)))`.

In [None]:
p.print_stats()

# Visualizing Profiler Results

Accelerate Profiler can also generate an ***interactive plot*** for visualizing code performance.

When looking at the visual output, mouse over the segments and notice that code run in the notebook is identified under the **`File:`** section by its input cell number. For example:

* `ipython-input-9` refers to `In [9]:`, the cell in which `dot()` was defined
* `ipython-input-12` refers to `In [12]:`, the cell in which `dot()` was called and the profiler object was enabled

The exact numbers depend on the order in which the cells were executed.

In [None]:
profiler.plot(p)

Calling the same function with different signatures will allow us to inspect how the results of the profiling depend on input type.

In [None]:
# prepare data
a = np.arange(16, dtype=np.float32)
b = np.arange(16, dtype=np.float64)
c = a.reshape(2, 8)
d = b.reshape(2, 8)

In [None]:
# run profiler
p = profiler.Profile()
p.enable()
dot(a, b)
dot(c, d)
p.disable()
p.print_stats()

Note: the 2D `dot()` call accounts for most of the time.

In [None]:
profiler.plot(p)

# Profiling Compiled Code

The profiled performance may change from run to run. In particular, let's look at the profiling results of a block of code that uses the numba jit.

Define a simple `count()` function:

In [None]:
def count(n):
    c = 0
    for i in range(n):
        c += i
    return c

Now wrap this `count()` with the numba jit:

In [None]:
# create the jit-compiled version of count
from numba import jit
jit_count = jit(count)

Define a function that will run both `count()` and `jit_count()`:

In [None]:
def run_both(n=100000):
    count(n)        # run interpreted version
    jit_count(n)    # run jitted version

**Run the profile cell below once.**

During the first execution of this cell, the `jit_count()` function is compiled.

The compilation dominates the execution time. Look for `CPUDispatcher.compile()` in the mouse-overs of the visualized profiler results.

In [None]:
p = profiler.Profile()
p.enable()
run_both()
p.disable()
profiler.plot(p)

**Run the profile cell again, below,** and you will see that _numba_ reuses the **previously compiled** function, so the pure Python `count()` function now dominates the execution time.

In [None]:
p = profiler.Profile()
p.enable()
run_both()
p.disable()
profiler.plot(p)
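The compile-once behaviour behind the two runs above can be mimicked without numba. The sketch below uses a hypothetical `LazyCompiled` class as a stand-in for a JIT dispatcher: it compiles Python source on the first call and reuses the result afterwards. Compiling Python source to bytecode is far cheaper than numba's native compilation, so the timing gap here is small; the point is only that the one-time cost lands in the first call.

```python
import time

# Source for the same count() function used in the cells above
SOURCE = (
    "def count(n):\n"
    "    c = 0\n"
    "    for i in range(n):\n"
    "        c += i\n"
    "    return c\n"
)


class LazyCompiled:
    """Hypothetical stand-in for a JIT dispatcher: compile on first call, then reuse."""

    def __init__(self, source, name):
        self.source = source
        self.name = name
        self.func = None
        self.compile_count = 0

    def __call__(self, *args):
        if self.func is None:
            # One-time cost, paid only by the first call
            namespace = {}
            exec(compile(self.source, "<lazy>", "exec"), namespace)
            self.func = namespace[self.name]
            self.compile_count += 1
        return self.func(*args)


lazy_count = LazyCompiled(SOURCE, "count")

for label in ("first call", "second call"):
    start = time.perf_counter()
    result = lazy_count(1000)
    elapsed = time.perf_counter() - start
    print("%s: result=%d, %.6f s" % (label, result, elapsed))

print("compilations:", lazy_count.compile_count)
```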
