# DS-GA-3001 Advanced Python for Data Science

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Jiayi Lu(jl6583)"
COLLABORATORS = ""

---

# Python Performance Tips

In this module, we will look at some simple ways for improving the performance of Python programs. As you work through the module, you will get the opportunity to try out each of the techniques and observe the results. The idea is not to get through the work as quickly as possible, but rather to understand the reasoning behind the improvements so you can apply them to other Python programs.

## Background

Python is a dynamic interpreted language. The "dynamic" part means that the types of variables, function arguments, etc. are not known until the program runs. An interpreted language is one that is either executed directly or compiled to bytecode for an abstract machine, rather than compiled to native object code that the hardware runs. Python takes the second approach: it compiles the source to bytecode (\*.pyc files), which is then executed by the Python interpreter (CPython).

While dynamic interpreted languages are very flexible, they also suffer from significant performance limitations compared with languages that compile directly to native object code. There are two primary reasons for this. The first is the dynamic nature of the language. Because the types of Python objects (variables, function arguments, etc.) are not known until the program runs, it is very difficult to optimize the interpreter: it cannot know what a bytecode instruction such as an addition will actually do until it inspects the operands at run time. The second is that the bytecode generated from Python source targets a Python Virtual Machine (PVM), not the hardware the program runs on. The interpreter must carry out each PVM instruction by running its own native code, and this extra layer of interpretation always carries some overhead. The same overhead is one reason virtual-machine languages such as Java can be slower than native languages such as C or C++, although modern just-in-time compilers narrow that gap considerably.

A good analysis of the Python interpreter is available [here](http://akaptur.com/blog/2013/11/15/introduction-to-the-python-interpreter).
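
To see the bytecode described above, the standard `dis` module can disassemble a function. The sketch below uses a small hypothetical function, `add_then_double`. `dis.dis` exists in Python 2 and 3, while `dis.get_instructions` assumes Python 3.4 or later. Opcode names differ between Python versions, so treat the printed listing as illustrative rather than exact.

```python
import dis

def add_then_double(a, b):
    # Hypothetical example function: one addition, one multiplication
    return (a + b) * 2

# Print the bytecode CPython compiled for the function
dis.dis(add_then_double)

# The same instructions as data (Python 3.4+); names vary by version,
# e.g. BINARY_ADD in older releases versus BINARY_OP in 3.11+
ops = [ins.opname for ins in dis.get_instructions(add_then_double)]

# Because typing is dynamic, the same bytecode also runs on strings:
# the addition instruction only discovers the operand types at run time
print(add_then_double(2, 3))       # 10
print(add_then_double('ab', 'c'))  # abcabc
```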

Although Python is a relatively slow language, that doesn't stop it from being useful for high-performance computing or for Data Science applications. There are a variety of techniques that can be used to improve performance, and we will be covering some of these here.

## Timing Python code

In order to evaluate the usefulness of these performance improvement techniques, we are going to time how long it takes to run snippets of code. While this does not provide an absolute measure of performance (results vary with the machine, the Python version, and background load), it does give an idea of the relative improvement (or decrease) in performance.

Python provides the `timeit` module for [measuring the execution time of small code snippets](https://docs.python.org/2/library/timeit.html). It can be called from the command line or imported into an existing program.

In [2]:
import timeit
def f():
    y = 3.1415
    for x in range(100):
        y = y ** 0.7
    return y

print timeit.timeit(f, number=1000000)

13.3676941395


The `timeit` function runs the code a set number of times (in this case 1000000) and reports the total time taken, in seconds; to find out how long a single call took, divide the result by 1000000. The `repeat` function performs several such runs (3 by default in Python 2; 5 since Python 3.7) and returns a list containing the total time of each run.
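
A short sketch of `timeit.repeat`, reusing the function `f` from above (redefined here so the block runs on its own). It keeps the minimum of the runs, which the `timeit` documentation suggests is the most useful figure, since slower runs are usually caused by other activity on the machine rather than by the code itself. The count of 1000 calls per run is an arbitrary choice to keep the example quick.

```python
import timeit

def f():
    y = 3.1415
    for x in range(100):
        y = y ** 0.7
    return y

# Three runs of 1000 calls each; every entry is the total seconds for one run
runs = timeit.repeat(f, repeat=3, number=1000)

# The fastest run is the one least disturbed by other processes
per_call = min(runs) / 1000
print(per_call)
```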

An even easier method is to use the IPython `%timeit` or `%%timeit` magic functions. These do essentially the same thing as the timeit module. The IPython `%timeit` magic function is documented [here](http://ipython.readthedocs.org/en/stable/interactive/magics.html?highlight=timeit#magic-timeit).

<div class="alert alert-success">
Use the `%timeit` magic function to time the function `f()` using the same number of loops as the previous example.
</div>

In [3]:
%timeit -n 1000000 f()

1000000 loops, best of 3: 11.9 µs per loop


<div class="alert alert-success">
The `timeit` module and `%timeit` magic function examples you ran above did not take the same amount of time to report the execution time of function `f`. Why did they take a different amount of time? (hint: read the documentation).
</div>

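Because `%timeit` repeats the whole loop of 1000000 calls 3 times by default (the "best of 3" in its output) and reports the best per-call time, whereas `timeit.timeit()` runs the loop of 1000000 calls only once. `%timeit` therefore spends roughly three times as long measuring.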

## Use built-in functions rather than Python code

One of the easiest ways to improve Python performance is not to execute any Python code at all! Python provides a large number of built-in functions that perform a wide variety of operations. These built-in functions are written in C, and so are generally very fast. See the [Python documentation](https://docs.python.org/2/library/functions.html#built-in-functions) for a list of the available functions.

The function called `my_max` takes a list of integers as an argument and finds the maximum value. 

In [4]:
def my_max(values):
    # Track the largest value seen so far
    max_value = values[0]
    for v in values:
        if v > max_value:
            max_value = v
    return max_value

import random
randlist = [random.randint(0,1000) for p in range(0,1000)]

<div class="alert alert-success">
Compare the time it takes to run your function to the builtin `max` function using the same `randlist` list of integers.
</div>

In [5]:
%timeit my_max(randlist)

10000 loops, best of 3: 42.9 µs per loop


In [6]:
%timeit max(randlist)

10000 loops, best of 3: 23.7 µs per loop
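
The same pattern applies to other built-ins. As a sketch, here is a hypothetical pure-Python `my_sum` next to the built-in `sum`: both compute the same total, but the built-in runs its loop in C. The `lambda` wrappers add the same small call overhead to both timings, so the comparison stays fair.

```python
import random
import timeit

values = [random.randint(0, 1000) for _ in range(1000)]

def my_sum(values):
    # Hypothetical helper: every iteration executes interpreted bytecode
    total = 0
    for v in values:
        total += v
    return total

t_python = timeit.timeit(lambda: my_sum(values), number=1000)
t_builtin = timeit.timeit(lambda: sum(values), number=1000)  # loop runs in C
```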


## Intrinsic operators

Another performance improvement is to use *intrinsic operators* (+, -, \*, etc.) instead of a user-defined function. The `operator` module exports a set of efficient functions corresponding to the intrinsic operators of Python. For example, `operator.add(x, y)` is equivalent to the expression `x+y`. See the [Python documentation](https://docs.python.org/2/library/operator.html#module-operator) for a list of standard operators.

In [7]:
randlist2 = [random.randint(0,1000) for p in range(0,1000)]

In [8]:
%timeit map(lambda x,y: x+y, randlist, randlist2)

10000 loops, best of 3: 145 µs per loop


If the user defined function is simple, it can be replaced directly with the equivalent intrinsic operator:

In [9]:
import operator
operator.add(100, 300)

400

<div class="alert alert-success">
Using the same `map` function above, replace the lambda function with the equivalent intrinsic operator and measure the time it takes to run.
</div>

In [10]:
%timeit map(operator.add, randlist, randlist2)

10000 loops, best of 3: 89 µs per loop
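
Functions from `operator` can replace small lambdas in other places too. The sketch below, with made-up lists `xs` and `ys`, shows `operator.add` with `map`, `operator.mul` with `functools.reduce`, and `operator.itemgetter` as a sort key. `list()` is wrapped around `map` so the result is also a list in Python 3, where `map` returns a lazy iterator.

```python
import functools
import operator

xs = [3, 1, 4, 1, 5]
ys = [9, 2, 6, 5, 3]

# Element-wise sum: operator.add replaces lambda x, y: x + y
pairwise = list(map(operator.add, xs, ys))       # [12, 3, 10, 6, 8]

# Product of all items: operator.mul replaces lambda a, b: a * b
product = functools.reduce(operator.mul, xs, 1)  # 60

# Sort pairs by their second item: itemgetter(1) replaces lambda p: p[1]
pairs = [('a', 3), ('b', 1), ('c', 2)]
by_second = sorted(pairs, key=operator.itemgetter(1))  # [('b', 1), ('c', 2), ('a', 3)]
```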


## Function call overhead

Function call overhead in Python is relatively high, especially compared with the execution speed of a built-in function. On every call the interpreter has to look up the function by name, create a new frame object, and bind the arguments to the parameter names before any of the function body runs. This strongly suggests that, where appropriate, a function should process a whole collection of data in one call rather than being called once per element.

In the following example, the function `func_f1` is called for each element in the list. 

In [11]:
x = 0
def func_f1(i):
    global x
    x = x + i
    
def func_test1():
    for i in range(10000): 
        func_f1(i)

<div class="alert alert-success">
Time how long it takes for function `func_test1` to execute.
</div>

In [12]:
%timeit func_test1()

1000 loops, best of 3: 1.76 ms per loop


In the next example, the loop is moved inside the function so that the function is only called once instead of 10000 times.

In [13]:
x = 0
def func_f2(values):
    # Named `values` rather than `list` to avoid shadowing the built-in type
    global x
    for i in values:
        x = x + i

def func_test2():
    func_f2(range(10000))

<div class="alert alert-success">
Does moving the loop inside the function result in faster execution? Time how long the following code takes to run and compare it with the previous example.
</div>

In [14]:
%timeit func_test2()

1000 loops, best of 3: 881 µs per loop


<div class="alert alert-success">
In the second example, there is still function call overhead every time `func_test2` is called. Modify the function to eliminate this overhead, then time the resulting execution.
</div>

In [15]:
def func_test2():
    global x
    for i in range(10000):
        x = x + i
%timeit func_test2()

1000 loops, best of 3: 887 µs per loop
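
A self-contained sketch of the same comparison, without the global variable: `per_element` calls a hypothetical helper `add_one` 10000 times, while `aggregated` does the same arithmetic inline in a single call. Both return the same total, so the difference in their timings comes mostly from the extra function calls.

```python
import timeit

def add_one(total, i):
    # Hypothetical per-element helper: pays the call overhead every time
    return total + i

def per_element():
    total = 0
    for i in range(10000):
        total = add_one(total, i)
    return total

def aggregated():
    # One call handles the whole range; the loop body has no function call
    total = 0
    for i in range(10000):
        total = total + i
    return total

t_per_element = timeit.timeit(per_element, number=100)
t_aggregated = timeit.timeit(aggregated, number=100)
```

Going one step further, the built-in `sum(range(10000))` removes the Python-level loop as well.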


## Membership testing with sets and dictionaries
 
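Python is very fast at checking whether an element exists in a `dict` or a `set`, because both are implemented using a hash table: the element's hash leads almost directly to where it would be stored, so a lookup takes roughly constant time on average, however large the container is. Testing membership in a list, by contrast, compares the element with each item in turn until it finds a match, so the cost grows with the item's position in the list. A hash table seems an obvious choice of data structure for a `dict`, but it is less obvious that a `set` uses one too. Therefore, if you need to check membership often, use a `dict` or `set` as your container rather than searching a list.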

First, let's see how long it takes to find different elements in a list.

In [16]:
mylist = [x for x in 'abcdefghijklmnopqrstuvwxyz']

<div class="alert alert-success">
Time how long it takes to find 'a' in mylist
</div>

In [17]:
%timeit 'a' in mylist

10000000 loops, best of 3: 76.1 ns per loop


<div class="alert alert-success">
Time how long it takes to find 'z' in mylist
</div>

In [18]:
%timeit 'z' in mylist

1000000 loops, best of 3: 428 ns per loop


Now do the same thing, but use a set instead of a list.

In [19]:
myset = set('abcdefghijklmnopqrstuvwxyz')

<div class="alert alert-success">
Time how long it takes to find 'a' in myset
</div>

In [20]:
%timeit 'a' in myset

10000000 loops, best of 3: 69.3 ns per loop


<div class="alert alert-success">
Time how long it takes to find 'z' in myset
</div>

In [21]:
%timeit 'z' in myset

The slowest run took 30.63 times longer than the fastest. This could mean that an intermediate result is being cached 
10000000 loops, best of 3: 70.1 ns per loop
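
The difference grows with the size of the container. In this sketch (the 10000-element word list is an arbitrary choice), looking for the last item forces the list to compare against every element, while the set answers with a hash lookup. Building the set costs one pass over the data, so it pays off when you test membership many times. A `dict` gives the same fast lookup when each key also needs an associated value.

```python
import timeit

words = [str(i) for i in range(10000)]
word_set = set(words)                             # one pass to build the hash table
word_index = {w: i for i, w in enumerate(words)}  # key -> position in the list

# Worst case for the list: '9999' is the last element
t_list = timeit.timeit(lambda: '9999' in words, number=1000)
t_set = timeit.timeit(lambda: '9999' in word_set, number=1000)
t_dict = timeit.timeit(lambda: '9999' in word_index, number=1000)
```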


## String concatenation

Strings in Python are immutable, which has both advantages and disadvantages. This attribute means that strings can be used as keys in dictionaries and that individual copies can be shared among multiple variable bindings. (CPython, for example, caches single-character strings so that they are shared automatically.) The disadvantage is that you cannot modify a string in place: you can't, for example, "change all the 'a's to 'b's" in an existing string. Instead, you have to create a new string with the desired properties. This continual copying can lead to significant inefficiencies in Python programs, because using the '+' or '+=' operators may build a new string for each intermediate step.

In [22]:
def make_string(a_list):
    mystring = ''
    for x in a_list:
        mystring += x  # strings are immutable, so each += may create a new string
    return mystring

mylist = [x for x in 'abcdefghijklmnopqrstuvwxyz']

<div class="alert alert-success">
Time how long it takes to call `make_string` with `mylist`.
</div>

In [23]:
%timeit make_string(mylist)

The slowest run took 9.00 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 1.99 µs per loop


<div class="alert alert-success">
Time how long it takes to create the string using `''.join(mylist)`
</div>

In [24]:
%timeit ''.join(mylist)

The slowest run took 9.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 523 ns per loop
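
A sketch comparing the two approaches on the alphabet. In CPython, `str.join` works out the total length first and then copies each piece into the result once. The Python documentation also notes that CPython can sometimes optimize `s += t` in place, which narrows the gap, but that behaviour depends on the version and implementation, so `join` is the portable choice. `numbers_line` is a hypothetical helper showing that `join` accepts a generator.

```python
import string

letters = list(string.ascii_lowercase)

def concat_plus(pieces):
    result = ''
    for p in pieces:
        result += p  # may allocate a new string on each iteration
    return result

def concat_join(pieces):
    # join sizes the result once, then copies each piece into it
    return ''.join(pieces)

def numbers_line(n):
    # Hypothetical helper: join also accepts a generator of pieces
    return ','.join(str(i) for i in range(n))
```

For example, `numbers_line(5)` returns `'0,1,2,3,4'`.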


## Cache results with a Python decorator

The symbol “@” is Python decorator syntax. Python decorators are commonly used for tracing, locking, or logging. However, you can also decorate a Python function so that it remembers results it has already computed and returns them directly when it is called again with the same arguments (a technique known as memoization). See the [`functools` documentation](https://docs.python.org/2/library/functools.html#functools.wraps) for more information on creating and using decorators.

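The following function computes the `i`th Fibonacci number for a given value of `i`, using the convention `fib(0) = fib(1) = 1`. Because each call makes two further recursive calls, the same values are recomputed many times, and the running time grows exponentially with `i`.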

In [25]:
def fib(i):
    if i < 2: return 1
    return fib(i-1) + fib(i-2)


<div class="alert alert-success">
Time how long it takes to find `fib(20)`.
</div>

In [26]:
%timeit fib(20)

100 loops, best of 3: 3.33 ms per loop


Using the following code, we can create a decorator that saves each intermediate value in memory rather than calculating it every time. 

In [27]:
from functools import wraps
def cache(f):
    memo = {}  # maps each argument tuple to the result already computed for it
    @wraps(f)  # copy f's name and docstring onto the wrapper
    def wrap(*arg):  # closure: wrap keeps access to f and memo after cache() returns
        if arg not in memo:
            memo[arg] = f(*arg)
        return memo[arg]
    return wrap

<div class="alert alert-success">
Time how long the same `fib` code takes if it is decorated with `@cache`
</div>

In [28]:
@cache
def fib(i):
    if i < 2: return 1
    return fib(i-1) + fib(i-2)

%timeit fib(20)

The slowest run took 226.35 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 296 ns per loop
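
Two notes on this result. The 296 ns figure mostly measures a dictionary lookup: after the first of the million calls, `fib(20)` is already stored in the cache, so later calls never recurse. Also, Python 3.2 and later provide this kind of decorator in the standard library as `functools.lru_cache`; it does not exist in Python 2. A sketch, assuming Python 3.2+:

```python
import functools

@functools.lru_cache(maxsize=None)  # maxsize=None: unbounded, like the cache decorator above
def fib(i):
    if i < 2:
        return 1
    return fib(i - 1) + fib(i - 2)

print(fib(20))           # 10946
print(fib.cache_info())  # hit/miss counts; fib.cache_clear() empties the cache
```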


## Optimizing loops

Remember that everything you put in a loop is executed on every iteration. The key to optimizing loops is to minimize this work. Even something as simple as using the '.' notation adds overhead, because each `obj.attr` triggers an attribute lookup at run time. Since Python functions and bound methods can be assigned to variables, you can perform the lookup once before the loop and reuse the result. However, this comes at the cost of making the code less readable and maintainable, so it should be used with caution.

The following code converts each word in `lowerlist` to uppercase and appends it to `upperlist`.

In [29]:
lowerlist = ['abcdefghijklmnopqrstuvwxyz'[:random.randint(0,25)] for x in range(500)]
upperlist = []

def do_append():
    for word in lowerlist:
        upperlist.append(str.upper(word))

<div class="alert alert-success">
Time how long it takes to execute `do_append`.
</div>

In [30]:
%timeit do_append()

1000 loops, best of 3: 292 µs per loop


Using the same code as above, first save the functions `str.upper` and `upperlist.append` into separate variables, and then replace the calls to these functions with your variable names:

```
upper = str.upper
append = upperlist.append
```

<div class="alert alert-success">
Now try timing the loop again. This avoids a method lookup overhead each time the function is called. While this is very small, it does add up when executing a large number of loops.
</div>

In [31]:
def do_append():
    upper = str.upper
    append = upperlist.append
    for word in lowerlist:
        append(upper(word))
%timeit do_append()

1000 loops, best of 3: 230 µs per loop


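Names that are not assigned inside a function, such as `lowerlist` and `upperlist` here, are looked up as global variables. Another optimization for the loop version is to use local variables rather than global variables, as these can be accessed much more efficiently in Python: CPython stores a function's locals in a fixed array and reads them by index, whereas a global name has to be looked up in the module's dictionary (and, failing that, in the built-ins).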

<div class="alert alert-success">
Re-write the above example to use local variables rather than global variables and time the execution.
</div>

In [32]:
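# Note: these bindings are made at module level, so inside do_append
# `upper` and `append` are global names (the previous cell bound them locally)
upper = str.upper
append = upperlist.append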
def do_append():
    for word in lowerlist:
        append(upper(word))
%timeit do_append()

1000 loops, best of 3: 232 µs per loop
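
A self-contained sketch that isolates the global-versus-local effect: both hypothetical functions below hoist `append` out of the loop and differ only in whether `upper` is read as a global on every iteration or copied into a local first. The word list is built deterministically here (an assumption, to make the results checkable); the timings will vary by machine and Python version.

```python
import timeit

# Deterministic word list so the output is reproducible
lowerlist = ['abcdefghijklmnopqrstuvwxyz'[:n % 26] for n in range(500)]

upper = str.upper  # module-level (global) binding

def upper_globals():
    result = []
    append = result.append
    for word in lowerlist:
        append(upper(word))        # 'upper' resolved as a global on each iteration
    return result

def upper_locals():
    result = []
    append = result.append
    local_upper = upper            # one global lookup, then fast local reads
    for word in lowerlist:
        append(local_upper(word))
    return result

t_globals = timeit.timeit(upper_globals, number=200)
t_locals = timeit.timeit(upper_locals, number=200)
```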


Another way to speed up loops is to remove the looping structure altogether. If the body of the loop is simple, the interpreter overhead of the `for` loop itself can be a substantial part of the total cost. This is where the `map` function is handy: you can think of `map` as a for loop moved into C code. The only restriction is that the "loop body" of `map` must be a function call. List comprehensions, besides their syntactic benefits, are often as fast as or faster than the equivalent use of `map`.

<div class="alert alert-success">
Rewrite the `do_append` function using `map` to apply the `str.upper` function to `lowerlist` and time how long it takes to execute.
</div>

In [33]:
def do_append():
    upper = str.upper
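    # Python 2's map builds and returns a list; in Python 3 map is lazy,
    # so list(map(...)) would be needed to do the same work
    return map(upper, lowerlist)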
%timeit do_append()

10000 loops, best of 3: 184 µs per loop


<div class="alert alert-success">
The same loop can also be written with a list comprehension. Do this, and time how long it takes.
</div>

In [34]:
def do_append():
    upper = str.upper
    upperlist = [upper(word) for word in lowerlist] #still has loop overheads
%timeit do_append()

10000 loops, best of 3: 233 µs per loop
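
A small sketch of the trade-off described above, with a made-up `words` list. When the loop body is already a function such as `str.upper`, `map` can call it directly; when the body is an expression, `map` needs a `lambda`, which adds a Python-level function call per item, and a comprehension is usually the better fit. In Python 3, `map` returns a lazy iterator, so `list()` is needed to actually do the work.

```python
words = ['alpha', 'beta', 'gamma']

# Loop body is an existing function: map can call it directly
via_map = list(map(str.upper, words))
via_comp = [w.upper() for w in words]

# Loop body is an expression: map needs a lambda (one Python call per item),
# whereas the comprehension evaluates the expression inline
doubled_map = list(map(lambda w: len(w) * 2, words))
doubled_comp = [len(w) * 2 for w in words]
```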


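You can use list comprehensions to replace many `for` and `while` blocks. A list comprehension is usually faster than the equivalent loop that calls `append`, because the interpreter adds each item with a dedicated bytecode instruction (`LIST_APPEND`) instead of looking up and calling the list's `append` method on every iteration.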

The following code computes a list of even numbers.

In [35]:
def evens():
    evens = []
    for i in range(1000):
        if i % 2 == 0:
            evens.append(i)
    return evens

<div class="alert alert-success">
Time how long it takes for this function to execute.
</div>

In [36]:
%timeit evens()

1000 loops, best of 3: 182 µs per loop


<div class="alert alert-success">
Time how long it takes to create the same list with an equivalent comprehension.
</div>

In [37]:
%timeit [i for i in range(1000) if i % 2 == 0]

10000 loops, best of 3: 103 µs per loop
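
One way to check the `LIST_APPEND` explanation is to compare the bytecode of the two versions with the `dis` module. This sketch assumes Python 3.4+ for `dis.get_instructions`. Depending on the version, a comprehension is compiled either as a nested code object or inline (3.12+), so the hypothetical helper `opnames` also walks nested code objects.

```python
import dis
import types

def evens_loop():
    result = []
    for i in range(1000):
        if i % 2 == 0:
            result.append(i)  # attribute lookup + method call every time
    return result

def evens_comp():
    return [i for i in range(1000) if i % 2 == 0]

def opnames(code):
    # Collect opcode names from a code object and any nested code objects
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(opnames(const))
    return names

print('LIST_APPEND' in opnames(evens_comp.__code__))  # True
print('LIST_APPEND' in opnames(evens_loop.__code__))  # False
```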


## Import overhead

`import` statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized not to import the same module multiple times, executing an `import` statement still has a cost: each execution must look the module up in `sys.modules` and bind its name, which can seriously affect performance when it happens inside a frequently called function.

The following two functions do the same thing, the only difference is the location of the `import` statement.

In [38]:
def func():
    import string
    string.lower('Python')

In [39]:
import string
def func():
    string.lower('Python')

<div class="alert alert-success">
Compare the execution of the two functions by timing how long it takes to run each 10000 times.
</div>

In [40]:
%timeit -n 10000 func()

10000 loops, best of 3: 479 ns per loop
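
Because both cells above define a function named `func`, the second definition replaces the first, so only the module-level-import version gets timed. A sketch that keeps both versions under distinct, hypothetical names so they can be timed side by side. It uses `string.capwords`, which exists in both Python 2 and Python 3, unlike `string.lower`.

```python
import string
import timeit

def func_import_inside():
    import string  # runs on every call: sys.modules lookup + name binding
    return string.capwords('python')

def func_import_outside():
    return string.capwords('python')  # uses the module imported once at the top

t_inside = timeit.timeit(func_import_inside, number=10000)
t_outside = timeit.timeit(func_import_outside, number=10000)
```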


Note that using string methods avoids the need to import at all, and runs even faster.

In [41]:
def func():
    'Python'.lower()

<div class="alert alert-success">
Try executing this function 10000 times and see how long it takes.
</div>

In [42]:
%timeit -n 10000 func()

10000 loops, best of 3: 301 ns per loop
