# Why are Python programs slow?

Program source code has to be translated into machine instructions before a computer can execute it.

* For **compiled** programming languages, the translation is done by a compiler before the execution of the program
* For **interpreted** languages, the translation is done by an interpreter during the execution of the program

Python is an interpreted language, and many of the features that make development with Python rapid follow from that, at the price of reduced performance in some cases.

## Dynamic typing

Python is a very dynamic language. A variable gets its type only at runtime, when a value (a Python object) is assigned to it, so it is more difficult for the interpreter to optimize the execution. In comparison, a compiler can perform extensive analysis and optimization before the execution.
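As a small illustration, the sketch below shows one name being rebound to objects of different types, and one function handling several types. The names `value` and `add` are made up for this example. The point is that the meaning of `+` can only be resolved at the moment the line executes.

```python
# A name has no fixed type: it refers to whatever object was assigned last.
value = 42
type_as_int = type(value).__name__      # 'int'
value = "forty-two"
type_as_str = type(value).__name__      # 'str'

def add(a, b):
    # Which "+" is used depends on the runtime types of a and b,
    # so it has to be looked up every time the function is called.
    return a + b

int_sum = add(1, 2)             # integer addition
float_sum = add(0.5, 0.25)      # floating point addition
text = add("py", "thon")        # string concatenation
merged = add([1, 2], [3])       # list concatenation

print(type_as_int, type_as_str, int_sum, float_sum, text, merged)
```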

## Flexible data structures

The built-in data structures of Python, such as lists and dictionaries, are very flexible, but they are also very generic, which makes them poorly suited for extensive numerical computations. The implementation of the data structures (e.g. in the standard CPython interpreter) also adds overhead: a list, for example, stores references to separately allocated Python objects rather than the values themselves.
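To make the genericity concrete, here is a minimal sketch using only the standard library. It assumes the standard CPython interpreter. The byte counts reported by `sys.getsizeof` vary between versions and platforms, so none are quoted here. The `array` module is used only as a contrast: it stores values of a single fixed type.

```python
import sys
from array import array

# A list can hold objects of any type side by side.
mixed = [1, 2.5, "three", [4, 5]]
element_types = [type(item).__name__ for item in mixed]

# Each float in a list is a full Python object; the list itself
# only stores a reference to it.
numbers_list = [0.1 * i for i in range(1000)]
bytes_per_float_object = sys.getsizeof(numbers_list[0])

# array('d') stores raw C doubles contiguously, all of the same type.
numbers_array = array("d", numbers_list)
bytes_per_array_item = numbers_array.itemsize

print(element_types)
print("float object:", bytes_per_float_object, "bytes")
print("array item:  ", bytes_per_array_item, "bytes")
```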

## Multithreading

The performance of a single CPU core has stagnated over the last ten years, so most of the speed-up in modern CPUs comes from using multiple CPU cores, i.e. parallel processing. Parallel processing is normally based on either multiple threads or multiple processes.

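Unfortunately, the memory management of the standard CPython interpreter is not thread-safe, and it uses a Global Interpreter Lock (GIL) to safeguard memory integrity. Because of the GIL, only one thread at a time can execute Python bytecode, so multithreading does not speed up CPU-bound pure-Python code.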
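The effect of the GIL on CPU-bound code can be observed with a small experiment. The sketch below runs the same pure-Python loop twice serially and then in two threads; `count_down` and `N` are hypothetical names chosen for this illustration. With a standard (GIL-enabled) CPython build, the threaded version would typically not be noticeably faster than the serial one, but the actual timings depend on the machine and the Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # Pure-Python, CPU-bound loop: it needs the interpreter the whole time.
    while n > 0:
        n -= 1
    return n

N = 2_000_000

# Run the work twice, one call after the other.
t0 = time.perf_counter()
serial_results = [count_down(N), count_down(N)]
serial_time = time.perf_counter() - t0

# Run the same two calls in two threads.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded_results = list(pool.map(count_down, [N, N]))
threaded_time = time.perf_counter() - t0

print(f"serial:   {serial_time:.3f} s")
print(f"threaded: {threaded_time:.3f} s")
```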

# Where does the program spend time?

Before optimizing a program, one should find out where it spends its time. It is very typical that a program spends most of its time in only a small part of the code, as exemplified by the **common 90/10 rule: 90% of time is spent in 10% of the source code.**

Thus, when optimizing, one should focus only on the time-critical parts of the program.

Finding these hotspots is the task of performance analysis. The two main ways to analyze performance are via the **application's own timers**, which measure the time spent in a specific region of a program, or by utilizing special **performance analysis** software.

Performance analysis software can often include information from hardware counters, such as **floating point operations per second, memory accesses, cache hits and misses etc.** At the end of this activity we look at some **Python-specific tools.**
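As one example of a Python-specific tool, the standard library ships the `cProfile` profiler. The sketch below is only an illustration of how it can point to a hotspot: `calculate` and `run` are hypothetical stand-ins for a real application, and the report format and timings will differ between Python versions and machines.

```python
import cProfile
import io
import pstats
from math import exp, sin

def calculate(a):
    # Small numerical loop that acts as the hotspot in this sketch.
    result = 0
    for val in a:
        result += exp(val) * sin(val)
    return result

def run():
    x = [0.1 * i for i in range(1000)]
    for _ in range(100):
        calculate(x)

# Collect statistics only while run() executes.
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the printed report, the rows for `run` and `calculate` should appear near the top, which is the kind of hint one looks for when searching for the time-critical 10% of the code.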

# Using the application's own timers

In order to get a bigger picture of the performance of a program, it can be useful to measure the time spent in a specific region of the program. The region can be a function, or just a part of a function, and the region can contain calls to other functions. In a typical usage pattern one obtains a value from some “clock” before and after the region, and the difference between the two values is the time spent in the region.

```python
from math import exp, sin
import time

def calculate(a):
    """Sum exp(val) * sin(val) over all values in a."""
    result = 0
    for val in a:
        result += exp(val) * sin(val)
    return result

x = [0.1 * i for i in range(1000)]

# Read the clock before and after the region of interest
t0 = time.process_time()
for r in range(1000):
    calculate(x)
t1 = time.process_time()
print("Time spent", t1 - t0)
```

```
Time spent 0.5625
```
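Which clock is read matters. `time.process_time()`, used above, counts the CPU time of the current process and excludes time spent sleeping, while `time.perf_counter()` measures elapsed wall-clock time. The sketch below contrasts the two on a region that both waits and computes; the sleep length and loop size are arbitrary values chosen for illustration.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)                               # waiting: no CPU work is done
total = sum(i * i for i in range(200_000))    # computing: CPU work

wall_time = time.perf_counter() - wall_start
cpu_time = time.process_time() - cpu_start

print(f"wall-clock time: {wall_time:.3f} s")
print(f"CPU time:        {cpu_time:.3f} s")
```

For a program that waits on I/O or on other processes, the wall-clock time is usually the more relevant figure. For a purely CPU-bound region the two should be close.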


## Timing with a context manager

Python context managers provide a convenient way to execute functions when entering and exiting a region. The example below shows how one can use a context manager and the `with` statement to time a part of the code.

```python
from math import exp, sin
import time

class Timer:
    """Context manager that records the CPU time spent inside a with block."""

    def __enter__(self):
        # Called when the with block is entered
        self.start = time.process_time()
        return self

    def __exit__(self, *args):
        # Called when the with block is exited, also after an exception
        self.end = time.process_time()
        self.interval = self.end - self.start

def calculate(a):
    result = 0
    for val in a:
        result += exp(val) * sin(val)
    return result

x = [0.1 * i for i in range(1000)]
with Timer() as t:
    for r in range(1000):
        calculate(x)
print("Time spent", t.interval)
```

```
Time spent 0.671875
```
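When several regions are timed repeatedly, it can be handy to accumulate the results per region. The following is one possible variation of the same idea, written with `contextlib.contextmanager` from the standard library. The helper `region` and the dictionary `timings` are hypothetical names introduced for this sketch.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated CPU time per region name (hypothetical helper for this sketch).
timings = defaultdict(float)

@contextmanager
def region(name):
    start = time.process_time()
    try:
        yield
    finally:
        # Runs even if the timed code raises an exception.
        timings[name] += time.process_time() - start

for _ in range(3):
    with region("setup"):
        data = [0.1 * i for i in range(10_000)]
    with region("compute"):
        total = sum(v * v for v in data)

for name, seconds in timings.items():
    print(f"{name:10s} {seconds:.6f} s")
```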


## Measuring small code snippets with timeit

```python
from math import sin, cos
```

```python
%timeit sin(0.2)
```

```
230 ns ± 58.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

```python
%timeit cos(0.2)
```

```
206 ns ± 29.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
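`%timeit` is a magic command of IPython and Jupyter, so it is not available in a plain Python script. A comparable measurement can be sketched with the standard `timeit` module; the values of `number` and `repeat` below are arbitrary choices for illustration.

```python
import timeit

# Each entry is the total time for `number` executions of the statement.
number = 100_000
times = timeit.repeat(
    stmt="sin(0.2)",
    setup="from math import sin",
    repeat=5,
    number=number,
)

# The minimum is usually the value least disturbed by other activity.
best_per_call = min(times) / number
print(f"best of {len(times)}: {best_per_call * 1e9:.1f} ns per call")
```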


```python
import numexpr
import numpy
```

```python
numexpr.set_num_threads(2)

x = numpy.random.random((1000000, 1))
y = numpy.random.random((1000000, 1))

poly = numexpr.evaluate("((.25*x + .75)*x - 1.5)*x - 2")
```

```python
poly
```

```
array([[-2.23762644],
       [-2.50466019],
       [-2.52357466],
       ...,
       [-2.56639307],
       [-2.46904589],
       [-2.02311321]])
```