# Profiling and Timing Code
#### Cecelia Henson

In the process of developing code and creating data processing pipelines, there are often trade-offs you can make between various implementations.
Early in developing your algorithm, it can be counterproductive to worry about such things. As Donald Knuth famously quipped, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

Once you have your code working, it can be useful to dig into its efficiency a bit.
Sometimes it's useful to check the execution time of a given command or set of commands; other times it's useful to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations.
IPython provides access to a wide array of functionality for this kind of timing and profiling of code.
Here we'll discuss the following IPython magic commands:

- ``%time``: Time the execution of a single statement
- ``%timeit``: Time repeated execution of a single statement for more accuracy
- ``%prun``: Run code with the profiler


## Timing Code Snippets: ``%timeit`` and ``%time``

The ``%timeit`` line magic and the ``%%timeit`` cell magic can be used to time the repeated execution of snippets of code:

In [1]:
%timeit sum(range(100))

1.05 µs ± 66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Note that because this operation is so fast, ``%timeit`` automatically does a large number of repetitions.
For slower commands, ``%timeit`` will automatically adjust and perform fewer repetitions:

In [2]:
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

328 ms ± 9.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Sometimes repeating an operation is not the best option.
For example, if we have a list that we'd like to sort, we might be misled by a repeated operation.
Sorting a pre-sorted list is much faster than sorting an unsorted list, so the repetition will skew the result:

In [3]:
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()

1.27 ms ± 65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


For this, the ``%time`` magic function may be a better choice. It also is a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result.
Let's time the sorting of an unsorted and a presorted list:

In [4]:
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()

sorting an unsorted list:
Wall time: 19.2 ms


In [5]:
print("sorting an already sorted list:")
%time L.sort()

sorting an already sorted list:
Wall time: 2.96 ms
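The same comparison can be sketched outside IPython with the standard-library ``time.perf_counter`` function. This is only an illustration under assumed names (``data``, ``unsorted_time``, ``presorted_time``), not how ``%time`` itself is implemented; it times exactly one sort of a shuffled list and then one sort of the already-sorted result:

```python
import random
import time

# Build a list in random order (same size as in the cells above)
data = [random.random() for _ in range(100_000)]

start = time.perf_counter()
data.sort()                      # first sort: input is in random order
unsorted_time = time.perf_counter() - start

start = time.perf_counter()
data.sort()                      # second sort: input is already sorted
presorted_time = time.perf_counter() - start

print(f"unsorted:  {unsorted_time * 1e3:.2f} ms")
print(f"presorted: {presorted_time * 1e3:.2f} ms")
```

Because each sort is timed once, the measurement is noisier than a repeated one, but it is not distorted by the list already being sorted.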


Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with ``%time`` versus ``%timeit``, even for the presorted list!
This is a result of the fact that ``%timeit`` does some clever things under the hood to prevent system calls from interfering with the timing.
For example, it prevents cleanup of unused Python objects (known as *garbage collection*) which might otherwise affect the timing.
For this reason, ``%timeit`` results are usually noticeably faster than ``%time`` results.
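The effect of garbage collection can also be explored outside IPython with the standard-library ``timeit`` module, which likewise turns garbage collection off while timing by default. The sketch below is an illustration, not a description of how ``%timeit`` works internally; the statement, list size, and repeat counts are arbitrary choices. It times the same statement once with the default setting and once with collection re-enabled in the setup string:

```python
import timeit

# Statement to time and the setup that builds its input (arbitrary example values)
stmt = "sorted(data)"
setup = "import random; data = [random.random() for _ in range(10_000)]"

# Default behaviour: timeit disables garbage collection during the timing
gc_off = min(timeit.repeat(stmt, setup=setup, number=20, repeat=3))

# Re-enable garbage collection as the first statement of the setup string
gc_on = min(timeit.repeat(stmt, setup="import gc; gc.enable(); " + setup,
                          number=20, repeat=3))

print(f"GC disabled: {gc_off / 20 * 1e3:.3f} ms per loop")
print(f"GC enabled:  {gc_on / 20 * 1e3:.3f} ms per loop")
```

For a statement like this one, which creates few collectable objects, the two numbers may be very close; the difference tends to matter more for code that allocates many container objects.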

For ``%time`` as with ``%timeit``, using the double-percent-sign cell magic syntax allows timing of multiline scripts:

In [6]:
%%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

Wall time: 438 ms


For more information on ``%time`` and ``%timeit``, as well as their available options, use the IPython help functionality (i.e., type ``%time?`` at the IPython prompt).

## Profiling Full Scripts: ``%prun``

A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own.
Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function ``%prun``.

By way of example, we'll define a simple function that does some calculations:

In [7]:
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

Now we can call ``%prun`` with a function call to see the profiled results:

In [8]:
%prun sum_of_lists(1000000)

 

In the notebook, the output is printed to the pager, and looks something like this:

```
14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_lists)
        1    0.014    0.014    0.714    0.714 <string>:1(<module>)
        1    0.000    0.000    0.714    0.714 {built-in method exec}
```

The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside ``sum_of_lists``.
From here, we could start thinking about what changes we might make to improve the performance in the algorithm.
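Outside IPython, a similar table can be produced with the built-in ``cProfile`` and ``pstats`` modules. The following is a minimal sketch, assuming a smaller ``N`` than above so that it runs quickly; the names ``profiler``, ``stream``, and ``report`` are chosen here for illustration:

```python
import cProfile
import io
import pstats

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

profiler = cProfile.Profile()
result = profiler.runcall(sum_of_lists, 100_000)   # profile a single call

# Collect the report in a string instead of printing straight to stdout
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)         # top 5 entries by internal time
report = stream.getvalue()
print(report)
```

The exact rows depend on the Python version; for example, newer interpreters may not list the list comprehension as a separate entry.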

For more information on ``%prun``, as well as its available options, use the IPython help functionality (i.e., type ``%prun?`` at the IPython prompt).

# Memory Profiling
In many cases it is useful to know how much memory an object requires. Python's ``sys`` module provides a ``getsizeof()`` function that returns the size of an object in bytes.

In [9]:
import sys
my_int = 1
my_string = "hello"
my_double = 3.14159

print(sys.getsizeof(my_int))
print(sys.getsizeof(my_string))
print(sys.getsizeof(my_double))

28
54
24


In the following cell, add at least 4 more new objects and primitives to test their sizes. See if you can find the primitive that takes the least space. Do big numbers require more storage? How is Python different from Java when it comes to primitive storage sizes?

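Answer: Of the values tested here, the float takes the least space (24 bytes), followed by the int (28 bytes); strings take the most, and their size grows with their length. Moderately bigger numbers do not require more storage: an int of 1 and an int of 70,000 both report 28 bytes. That only holds up to a point, though, because Python integers are arbitrary-precision, so very large values do take more bytes. Python differs from Java in that Java primitives have fixed sizes (for example, a 32-bit ``int`` and a 64-bit ``double``) and are not objects, whereas every Python value is a full object that carries extra overhead.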

In [10]:
test_value_1 = 70000
test_value_2 = "This is a longer string than before"
test_value_4 = 70.238291183920
test_value_5 = 0.24313618

print(sys.getsizeof(test_value_1))
print(sys.getsizeof(test_value_2))
print(sys.getsizeof(test_value_4))
print(sys.getsizeof(test_value_5))

28
84
24
24
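The integers tested above are all fairly small. A short sketch with much larger values, assuming the standard CPython interpreter (the variable names are chosen here for illustration), shows how the reported size behaves as the number of bits grows:

```python
import sys

small = 1
medium = 70_000
large = 2 ** 100          # far beyond the range of a 64-bit integer
huge = 2 ** 1000

# Print the number of bits each value needs next to its reported size
for value in (small, medium, large, huge):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")
```

The exact byte counts depend on the interpreter version and platform, so none are listed here.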


In the next cell, we explore what size differences exist between various lists and other data structures. Add a data structure for a dict as well to see what size it is. Why are some of these equal to each other? Reading the documentation might be helpful: https://docs.python.org/3/library/sys.html

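Answer: ``test.clear()`` and ``test.reverse()`` report the same size because both methods modify the list in place and return ``None``, so each call measures ``None`` (16 bytes) rather than the list. The lists with the same number of elements also match (64 bytes for both one-element lists, 152 bytes for both nine-element lists) because ``sys.getsizeof()`` only accounts for the memory directly attributed to the object, here the list and the references it holds, not the objects it refers to.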

In [11]:
print(sys.getsizeof([]))
print(sys.getsizeof([1]))
print(sys.getsizeof(['a quick brown fox jumped over the lazy dog']))
print(sys.getsizeof([1,2,3,4,5,6,7,8,9]))
print(sys.getsizeof(["a","quick","brown","fox","jumped","over","the","lazy","dog"]))
print('\n')
test = ['a', 'b', 'c', 'd']
test_set = {1, 2, 3, 4}

print(sys.getsizeof(dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])))

print(sys.getsizeof(test_set))
print(sys.getsizeof(test))

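# Note: clear() and reverse() modify the list in place and return None,
# so these two lines report the size of None; clear() also empties `test`
print(sys.getsizeof(test.clear()))
print(sys.getsizeof(test.reverse()))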

56
64
64
152
152


232
216
120
16
16
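Two of the patterns in this output can be checked in isolation. The sketch below uses illustrative names (``letters``, ``short_item``, ``long_item``) that are not part of the original cell; it shows what ``list.clear()`` returns, and compares two one-element lists whose single elements differ greatly in size:

```python
import sys

letters = ['a', 'b', 'c', 'd']
result = letters.clear()            # empties the list in place, returns None
print(result, letters)
print(sys.getsizeof(None))          # the size that was measured for test.clear()

# Two lists with one element each: the element sizes differ, the list sizes do not
short_item = ['x']
long_item = ['x' * 1000]
print(sys.getsizeof(short_item), sys.getsizeof(long_item))
print(sys.getsizeof(short_item[0]), sys.getsizeof(long_item[0]))
```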


If you haven't figured it out yet, the reported size only accounts for the container itself, including the references it holds, not for the underlying objects those references point to. Therefore, we need a recursive way to find the actual memory footprint of an object. Consider the following function...

In [12]:
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
from reprlib import repr  # abbreviated repr, used for the verbose output

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint of an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}
                    
    Source: https://code.activestate.com/recipes/577504/
    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


In the following cell, use the function above to calculate the actual size of the objects you created above. Please record any reactions to this exercise in the last cell.

In [13]:
print(total_size(test_value_4))
print(total_size(test))
print(total_size(test_set))
print(total_size(dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])))
print(total_size([]))

24
56
328
476
56
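To make the difference between the shallow and the recursive measurement easier to see, here is a simplified sketch of the same idea. ``deep_size`` is a hypothetical helper written for this illustration, not the recipe above: it handles only the basic built-in containers and has no ``handlers`` or ``verbose`` options.

```python
import sys

def deep_size(obj, seen=None):
    """Hypothetical, simplified recursive size: built-in containers only."""
    if seen is None:
        seen = set()
    if id(obj) in seen:              # do not count the same object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)        # the object itself (shallow size)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size

words = ["alpha", "beta", "gamma"]
shallow = sys.getsizeof(words)       # list plus its references only
deep = deep_size(words)              # list plus the three strings
print(shallow, deep)

shared = "x" * 1000
pair = [shared, shared]              # two references to one string
print(sys.getsizeof(pair), deep_size(pair))
```

In the second example the string is counted once even though the list refers to it twice, which is the purpose of the ``seen`` set.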


# Put reactions to this assignment here

When I applied the ``total_size`` function to primitive objects, the value did not change, but when I used Python data structures like lists and sets I got a different number for the total size of the object. For my list, ``sys.getsizeof()`` reported 120 but ``total_size`` reported 56; the smaller value appears because ``test.clear()`` had already emptied the list in the earlier cell, so 56 is the size of an empty list. For my set, ``sys.getsizeof()`` reported 216, while ``total_size`` gave a larger total of 328 because it also counts the four integers stored in the set.

*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

This content has been modified by Dr. Derek Riley