<table style="float:left;">
<tbody>
<tr>
<td ><img src="https://static1.squarespace.com/static/5992c2c7a803bb8283297efe/t/59c803110abd04d34ca9a1f0/1530629279239/" alt="Kenzie Logo" width="93" height="93" /></td>
<td >
<h1>&nbsp;A Brief Tour of Timeit&nbsp;</h1>
</td>
</tr>
</tbody>
</table>

<p style="float:left"><a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/demo_timeit.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" width="188" height="32" /> </a>
</p>

The timeit module provides a simple interface for determining the execution time of small bits of Python code. It uses a platform-specific time function to provide the most accurate time calculation possible and reduces the impact of start-up or shutdown costs on the time calculation by executing the code repeatedly.

The module function `timeit.timeit(stmt, setup, timer, number)` accepts four main arguments:

- `stmt`: the statement you want to measure; it defaults to `'pass'`.
- `setup`: code that runs once before `stmt` is timed; it defaults to `'pass'`.
- `timer`: the timer function used to read the clock (not a `timeit.Timer` object); it defaults to `time.perf_counter`, so you rarely need to set it.
- `number`: the number of times to execute `stmt`; it defaults to 1,000,000.

The `timeit.timeit()` function returns the total number of seconds it took to execute `stmt` that many times.
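As a minimal sketch of how these arguments fit together, the call below builds a list in `setup` and times a `sorted()` call in `stmt`. The names `words` and `elapsed` are illustrative and not part of the original notebook.

```python
import timeit

# setup runs once before timing starts; stmt runs `number` times.
elapsed = timeit.timeit(
    stmt="sorted(words)",
    setup="words = [str(n) for n in range(1000)]",
    number=1000,
)

# The return value is the total time in seconds, as a float.
print("Total for 1000 runs: {:.4f} seconds".format(elapsed))
print("Per run            : {:.2f} usec".format(elapsed / 1000 * 1e6))
```

Keeping the list construction in `setup` means only the `sorted()` call contributes to the measured time.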

In [None]:
# First, here is a naive way to time a function
import time
start = time.time()
# my function code goes here
total = time.time() - start

### It's okay, but ...
This method is fine for a quick look at the cost of a single call. But the result may mislead you: other activity, e.g. "[Garbage Collection (GC)](https://rushter.com/blog/python-garbage-collector/)" or other processes on the machine, can occur while you are timing and throw off your result.

It's better to use the built-in `timeit` module for more accurate measurements.
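If a one-off manual measurement is still wanted, a slightly more robust sketch uses `time.perf_counter()`, which is generally a better fit for short intervals than `time.time()`. The function `work` is a hypothetical placeholder for the code being timed.

```python
import time

def work():
    """Placeholder workload (hypothetical, for illustration only)."""
    return "-".join(str(n) for n in range(10))

# perf_counter() is a high-resolution clock intended for measuring durations.
start = time.perf_counter()
work()
elapsed = time.perf_counter() - start
print("Single call: {:.6f} seconds".format(elapsed))
```

A single sample like this is still noisy, which is the reason to repeat the measurement many times.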


In [None]:
import timeit

def costly_func():
    """Joins numbers together"""
    return "-".join(str(n) for n in range(10))

# costly_func is a callable that takes no arguments, so it can be passed directly
# costly_func will be called 1000000 times (the default)
result = timeit.timeit(costly_func)
print("Callable  cost: {:.4} seconds".format(result))

In [None]:
# Try it as a quoted statement -- also called 1000000 times
result = timeit.timeit('"-".join(str(i) for i in range(10))')
print("Statement cost: {:.4} seconds".format(result))

Remember that `timeit` measures the CUMULATIVE time over 1000000 calls (one million = 1_000_000). To get the time cost of a single call, we divide by 1000000. This is an easy conversion: the total in seconds is the same number as the per-call cost in microseconds, because one millionth of a second is a microsecond.

In [None]:
result = timeit.timeit('"-".join(str(i) for i in range(10))')
print("Cumulative time cost: {:.4} seconds".format(result))
print("Per call time cost  : {:.4} seconds".format(result/1000000))

Why is the cost of running it as a statement LESS than the cost of the callable method (above)? That's a [good question](https://stackoverflow.com/questions/55187176/why-does-the-timeit-function-return-different-results-when-handed-a-function-v). The additional time cost comes from the function call that `timeit` makes on every iteration when it is given a callable.

Let's find out how much time is spent just setting up for a function call:

In [None]:
# A function that does nothing (NO OPERATION)
def no_op_func():
    pass

# The theory is that t_stmt + t_noop = t_costly
t_noop = timeit.timeit(no_op_func)
t_costly = timeit.timeit(costly_func)
t_stmt = timeit.timeit('"-".join(str(i) for i in range(10))')
print("Does {:.2} + {:.2} ≈ {:.2} ?".format(t_stmt, t_noop, t_costly))
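A third option, sketched here as an illustration, is a statement string that calls the function, with the function supplied through the `globals` argument (available since Python 3.5). The explicit one-entry namespace and the reduced `number` are choices made for this example only.

```python
import timeit

def costly_func():
    """Joins numbers together (same body as the function defined earlier)."""
    return "-".join(str(n) for n in range(10))

# The statement string calls the function; `globals` tells timeit where to find it.
n_calls = 100000
t_call = timeit.timeit(
    "costly_func()",
    globals={"costly_func": costly_func},
    number=n_calls,
)
print("Statement calling a function: {:.4f} seconds".format(t_call))
print("Per call: {:.3f} usec".format(t_call / n_calls * 1e6))
```

This form still pays the function-call overhead on every iteration, so it would be expected to land closer to the callable timing than to the inlined statement.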

### Using the Jupyter %timeit
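Here are three different ways of creating a string of 10 integers. Note that the `%timeit` magic takes the statement unquoted; wrapping it in quotes would only time the evaluation of a string literal.

In [None]:
# generator expression
%timeit "-".join(str(n) for n in range(10))

# list comprehension
%timeit "-".join([str(n) for n in range(10)])

# map function
%timeit "-".join(map(str, range(10)))

# Wait for it ... (results appear below)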

### Importing the timeit module

In [None]:
import timeit
n = 10000
t1_sec = timeit.timeit('"-".join(str(n) for n in range(100))', number=n) # generator
t2_sec = timeit.timeit('"-".join([str(n) for n in range(100)])', number=n) # list
t3_sec = timeit.timeit('"-".join(map(str, range(100)))', number=n) # map

t1_us = t1_sec / n * 1e+6
t2_us = t2_sec / n * 1e+6
t3_us = t3_sec / n * 1e+6
print(f'Generator method: {t1_us:.4f} usec')
print(f'List comprehension method: {t2_us:.4f} usec')
print(f'Map method: {t3_us:.4f} usec')

**Note** By default, `timeit()` temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the setup string. For example:

> `timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit()`

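A small sketch of the comparison described in the note: the same statement is timed once with the default behaviour and once with GC re-enabled in `setup`. The statement and the loop count are arbitrary choices for illustration, and the size of any difference will depend on the machine and Python version.

```python
import timeit

# A statement that allocates many container objects, which the GC tracks.
stmt = "[[] for _ in range(1000)]"
n = 1000

# Default: timeit disables garbage collection while timing.
t_gc_off = timeit.timeit(stmt, number=n)

# Re-enable garbage collection as the first statement of setup.
t_gc_on = timeit.timeit(stmt, setup="import gc; gc.enable()", number=n)

print("GC disabled (default): {:.4f} seconds".format(t_gc_off))
print("GC enabled           : {:.4f} seconds".format(t_gc_on))
```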

### Creating a timeit.Timer class object
We can use an instance of a `timeit.Timer` object to explore the `timeit` and `repeat` methods.

In [None]:
import timeit

# Create the instance
t = timeit.Timer("print('main statement')", "print('setup')")

print('TIMEIT number=2:')
print(t.timeit(number=2))

print('REPEAT repeat=3 number=2:')
print(t.repeat(repeat=3, number=2))
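Since `repeat()` returns a list of totals, one common way to summarise it is to take the minimum, on the reasoning that higher values usually reflect interference from other activity rather than the code itself. The sketch below assumes a `repeat` of 5 and a `number` of 10000 purely for illustration.

```python
import timeit

timer = timeit.Timer('"-".join(map(str, range(100)))')

# Each entry is the total time in seconds for `number` executions.
number = 10000
timings = timer.repeat(repeat=5, number=number)

# The smallest total is the least disturbed measurement.
best = min(timings)
print("All runs :", ["{:.4f}".format(x) for x in timings])
print("Best run : {:.4f} seconds".format(best))
print("Per call : {:.3f} usec".format(best / number * 1e6))
```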

## Storing Values in a Dictionary
These snippets use `timeit` to compare various methods of populating a dictionary.

This more complex example compares the amount of time it takes to populate a dictionary with a large number of values using a variety of methods. First, a few constants are needed to configure the `Timer`. The `setup_statement` variable initializes a list of tuples containing strings and integers that will be used by the main statements to build dictionaries using the strings as keys and storing the integers as the associated values.


In [None]:
# First, a few setups ...
range_size = 1000
count = 1000
setup_statement = ';'.join([
    # Build the list from range_size so it stays in sync with show_results()
    "l = [(str(x), x) for x in range({})]".format(range_size),
    "d = {}",
])
setup_statement

In [None]:
# A helper function to print results
def show_results(result):
    """Print microseconds per pass and per item."""
    global count, range_size
    per_pass = 1000000 * (result / count)
    print('{:6.2f} usec/pass'.format(per_pass), end=' ')
    per_item = per_pass / range_size
    print('{:6.2f} usec/item'.format(per_item))


print("{} items".format(range_size))
print("{} iterations".format(count))
print()

To establish a baseline, the first configuration tested uses `__setitem__()`. All of the other variations avoid overwriting values already in the dictionary, so this simple version should be the fastest.

The first argument to `Timer` is a multi-line string, with white space preserved to ensure that it parses correctly when run. The second argument is a constant established to initialize the list of values and the dictionary.

In [None]:
import textwrap
# Using __setitem__ (square brackets []) without checking for existing values first
print('__setitem__:', end=' ')
t = timeit.Timer(
    textwrap.dedent(
        """
        for s, i in l:
            d[s] = i
        """),
    setup_statement,
)
show_results(t.timeit(number=count))

In [None]:
# Using setdefault method
print('setdefault :', end=' ')
t = timeit.Timer(
    textwrap.dedent(
        """
        for s, i in l:
            d.setdefault(s, i)
        """),
    setup_statement,
)
show_results(t.timeit(number=count))


This method adds the value only if a KeyError exception is raised when looking for the existing value.

In [None]:
# Using exceptions
print('KeyError   :', end=' ')
t = timeit.Timer(
    textwrap.dedent(
        """
        for s, i in l:
            try:
                existing = d[s]
            except KeyError:
                d[s] = i
        """),
    setup_statement,
)
show_results(t.timeit(number=count))

And the last method uses `in` to determine if a dictionary has a particular key.

In [None]:
# Using "in"
print('"not in"   :', end=' ')
t = timeit.Timer(
    textwrap.dedent(
        """
        for s, i in l:
            if s not in d:
                d[s] = i
        """),
    setup_statement,
)
show_results(t.timeit(number=count))
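The four variations can also be run side by side in one loop, which makes the comparison easier to read. This is a sketch under a few assumptions: `candidates` and `results` are names introduced here, the iteration count is reduced to 100, and the minimum of three repeats is used for each method.

```python
import textwrap
import timeit

range_size = 1000
count = 100
setup_statement = "l = [(str(x), x) for x in range({})]; d = {{}}".format(range_size)

# Each entry maps a label to the loop body being timed.
candidates = {
    "__setitem__": """
        for s, i in l:
            d[s] = i
        """,
    "setdefault": """
        for s, i in l:
            d.setdefault(s, i)
        """,
    "KeyError": """
        for s, i in l:
            try:
                existing = d[s]
            except KeyError:
                d[s] = i
        """,
    "not in": """
        for s, i in l:
            if s not in d:
                d[s] = i
        """,
}

results = {}
for name, body in candidates.items():
    timer = timeit.Timer(textwrap.dedent(body), setup_statement)
    # Keep the best of three repeats to reduce noise.
    results[name] = min(timer.repeat(repeat=3, number=count))

# Print fastest first, in microseconds per pass.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("{:12s} {:8.2f} usec/pass".format(name, 1e6 * seconds / count))
```

Because setup runs once per repeat, the dictionary is already populated after the first pass of each repeat, so most passes measure the case where the key exists.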

## Conclusions
- Use the `timeit` module for more accurate measurements of small code samples.  It collects results over many iterations of the code.
- Use `timeit` to try out different approaches to an algorithm, transformation, or other problem.  Compare the times and pick the fastest approach.