# The first rule of optimization

> Premature optimization is the root of all evil.
>
> Donald Knuth

So first get your code working! It may already be fast enough, in which case optimizing it would just waste your time.

# The second rule of optimization: profiling

Figure out which part is the slowest, and only change that part. Often you will find that a small fraction of the code takes most of the time. Equally often, the part you need to optimize isn't the one you expected.

The process of figuring out what is fast and what is slow is called *profiling*.
Here I will just show a simple example of how to do it.

In Python there are several main tools for profiling:
* A clock, or the built-in `timeit` module (`%timeit` in IPython), which just measures how much time is spent in a piece of code.
* The built-in module `cProfile`, which measures how much time is spent inside each **function**.
* The `line_profiler` tool, which measures the time spent on each line of code.

Let's see all three in action.

In [8]:
# We need to install the line profiler
# pip3 is the Python package manager that installs different packages
# Putting an exclamation mark in front of a line in a Jupyter notebook runs it as a shell command, try running "!ls"

!echo jupyter|sudo -S pip3 install line_profiler

[sudo] password for jupyter: You are using pip version 7.1.0, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [13]:
# Load the line profiler into the notebook
import line_profiler
%load_ext line_profiler

In [14]:
# Code that we're going to profile

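def square(valuelist):
    return (value**2 for value in valuelist)  # generator expression: squares are produced lazily

def compute_sum(max_number):
    a = range(max_number)
    a = square(a)
    return sum(a)  # return the sum itself, not the exhausted generator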

In [15]:
# Measure how long the computation runs

%timeit compute_sum(1000)
%timeit compute_sum(1000000)

1000 loops, best of 3: 464 µs per loop
1 loops, best of 3: 476 ms per loop


`%timeit` is useful for comparing different versions of a function, but it doesn't tell you which parts of the function are slow.
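Outside a notebook, the same kind of measurement can be made with the standard-library `timeit` module. The following is a minimal sketch, not the notebook's own code: it re-declares the two functions so that it runs standalone, and it assumes a `compute_sum` that returns the sum. The repeat counts are arbitrary choices.

```python
import timeit

def square(valuelist):
    return (value**2 for value in valuelist)

def compute_sum(max_number):
    # Assumed variant that returns the sum itself
    return sum(square(range(max_number)))

# Call the function 100 times per measurement, and take 3 measurements
timings = timeit.repeat(lambda: compute_sum(1000), number=100, repeat=3)

# The minimum is usually the least noisy estimate; divide by `number` for the per-call time
best_per_call = min(timings) / 100
print(f"best of 3: {best_per_call * 1e6:.1f} µs per call")
```

Taking the minimum rather than the mean mirrors the "best of 3" that `%timeit` reports here.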

In [16]:
%prun -T output -q compute_sum(1000000)
!cat output

 
*** Profile printout saved to text file 'output'. 
         1000007 function calls in 1.732 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000001    1.068    0.000    1.068    0.000 <ipython-input-14-2816b731c858>:4(<genexpr>)
        1    0.663    0.663    1.732    1.732 {built-in method sum}
        1    0.000    0.000    1.732    1.732 {built-in method exec}
        1    0.000    0.000    1.732    1.732 <ipython-input-14-2816b731c858>:6(compute_sum)
        1    0.000    0.000    1.732    1.732 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 <ipython-input-14-2816b731c858>:3(square)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

We see how many times each function was called and how much time it took.
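The same kind of table can be produced without the `%prun` magic by using `cProfile` and `pstats` directly. This is a rough sketch under the assumption that you are running a plain Python script; the functions are re-declared here (with `compute_sum` returning the sum) so that the block runs standalone, and the input size is an arbitrary choice.

```python
import cProfile
import io
import pstats

def square(valuelist):
    return (value**2 for value in valuelist)

def compute_sum(max_number):
    # Assumed variant that returns the sum itself
    return sum(square(range(max_number)))

# Profile only the call we care about
profiler = cProfile.Profile()
profiler.enable()
result = compute_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by internal time like the table above
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)  # limit the report to 5 entries
report = stream.getvalue()
print(report)
```

In the report, `tottime` is the time spent in the function itself, excluding the functions it calls, while `cumtime` also includes the time spent in those calls.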

In [12]:
# Output pops up in pager.
!rm output

%lprun -f compute_sum -f square compute_sum(1000000)


## Task

Change the round brackets inside the function `square` into square brackets. Compare the performance. Can you figure out what is happening?
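One possible way to set up the comparison is sketched below. The names `square_generator` and `square_list` are hypothetical, introduced only so that both variants can live side by side, and `compute_sum` is adapted here to take the squaring function as an argument. The sketch only measures; interpreting the numbers is still up to you.

```python
import timeit

def square_generator(valuelist):
    # Round brackets: a generator expression, values are produced on demand
    return (value**2 for value in valuelist)

def square_list(valuelist):
    # Square brackets: a list comprehension, all values are stored first
    return [value**2 for value in valuelist]

def compute_sum(square, max_number):
    # Hypothetical variant that takes the squaring function as a parameter
    return sum(square(range(max_number)))

for square in (square_generator, square_list):
    # 10 calls per measurement, 3 measurements, keep the best per-call time
    seconds = min(timeit.repeat(lambda: compute_sum(square, 100000), number=10, repeat=3)) / 10
    print(f"{square.__name__}: {seconds * 1e3:.2f} ms per call")
```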

# The third rule of optimization: understand what is happening.

In order to write faster code you need to know exactly what is happening. This depends on the language you use and on the algorithm you implement. More about this in a separate topic (or google for things like `optimized Python code`).