# Need for Speed

> "Premature optimization is the root of all evil." - Donald Knuth and Tony Hoare

- However, if a Python program has a long runtime and is run frequently we may want to increase its speed.  We can try the following:

1. Profile code to see slowest parts.  Focus on those first.  After making any change, profile again.
1. Optimize code algorithms.  This primarily means big O algorithm analysis.  Additional tips include choosing libraries like NumPy because they use fast data structures, reducing the number of operations within loops, and caching data that is re-used instead of retrieving the same data from a database or web server repeatedly.
1. Consider multiprocessing for CPU bound programs.  Realize there are trade-offs in program complexity and ease of debugging.  Use the `multiprocessing` module.  Multiprocessing can be used with distributed cloud computing.
1. Consider multithreading for I/O bound programs.  Realize there are trade-offs in program complexity and ease of debugging.  Use the `asyncio` module if possible.  Avoid using the `threading` module.
1. Consider using PyPy for CPU bound, long-running programs.  **PyPy** is a Python implementation that uses **just-in-time (JIT)** compilation.  It is often the fastest Python implementation for this kind of program, reported as about 4.5x faster than the standard CPython on average, though the speedup depends on the workload.  PyPy is "compliant", meaning that the vast majority of Python code we write works with PyPy without changing syntax.  However, we'd need to install the PyPy interpreter and PyPy-compatible versions of third-party libraries.
1. Consider using a fully compiled programming language for CPU bound programs.  Python is known for being easy to read and write as well as being relatively slow.  For CPU bound programs, switching to a compiled language could increase speed by a factor of 10 or more.  Note that switching languages will not always provide such a significant speed increase if a CPU bound program is already using a highly optimized library like NumPy.
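
The caching tip in the list above can be sketched with `functools.lru_cache` from the standard library.  This is a minimal illustration under stated assumptions, not a drop-in solution: `fetch_exchange_rate` is a hypothetical stand-in for a slow database or web request, the rates are made-up values, and `time.sleep` only simulates latency.

```python
import functools
import time

CALLS = {"count": 0}  # tracks how often the slow path actually runs

@functools.lru_cache(maxsize=None)
def fetch_exchange_rate(currency):
    """Hypothetical slow lookup; sleep stands in for a database or web call."""
    CALLS["count"] += 1
    time.sleep(0.1)
    # Made-up rates, used only for illustration.
    return {"EUR": 1.1, "GBP": 1.3}.get(currency, 1.0)

def total_in_usd(amounts):
    # Each currency is fetched once; repeated currencies are served from the cache.
    return sum(amount * fetch_exchange_rate(currency) for currency, amount in amounts)

orders = [("EUR", 10), ("GBP", 5), ("EUR", 20), ("EUR", 30)]
print(total_in_usd(orders))
print(f"Slow lookups performed: {CALLS['count']}")
```

Four orders are processed, but the slow lookup only runs once per distinct currency.  As with any optimization, profile before and after to confirm the cache actually helps.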

---

## Profile
- **Profile**--set of statistics that describes how often and for how long various parts of a program execute
- The following standard library modules are used to profile code in Python:
    1. `time` module can be used to manually measure the time it takes to run a program.  Simple tool to introduce the concept of profiling.
    1. `timeit` module can be used to measure the time it takes to run a short piece of code.
    1. `cProfile` module generates a profile in a more automated way.  It is implemented in C, which keeps the overhead of profiling low.
    1. `profile` module has the same interface as `cProfile`, but `profile` is written in Python code.  It adds more overhead than `cProfile`, so it is not commonly used.
    1. `pstats` module formats the profile collected by `cProfile` or `profile` into a report.

Code | Use
--- | ---
`time` | Module
`time.time()` | Returns time in seconds since Unix epoch.  Can be used to profile code by taking time at start and end of program and subtracting the two.

Code | Use
--- | ---
`timeit` | Module
`timeit.timeit()` | Returns the total seconds taken to run the argument `number` times (default is 1,000,000 times).  Argument is a string containing a short section of code.

Code | Use
--- | ---
`cProfile` | Module
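`cProfile.run()` | Runs the argument under the profiler and prints the profile.  Argument is a string of code, usually a function call.  Because the string is executed as code, we type the parentheses, unlike when we pass a function object as an argument.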

`cProfile.run()` outputs the following:
- ncalls  The number of calls made to the function
- tottime  The total time spent in the function, excluding time in subfunctions
- percall  The total time divided by the number of calls
- cumtime  The cumulative time spent in the function and all subfunctions
- percall  The cumulative time divided by the number of calls
- filename:lineno(function)  The file the function is in and at which line number

---

**EXAMPLES**

In [1]:
import time
import timeit
import cProfile

**`time.time()`**

In [2]:
def add_numbers():
    # Sum the integers from 1 to 999,999.
    i_sum = 0
    for number in range(1, 1000000):
        i_sum += number
    return i_sum

t_start = time.time()
add_numbers()
t_end = time.time()

print(f'This code took: {t_end - t_start} seconds to run.')

This code took: 0.06101226806640625 seconds to run.
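
`time.time()` reads the wall clock, which can be adjusted while a program runs.  As a hedged alternative for timing short sections, the standard library also provides `time.perf_counter()`, a clock intended for measuring elapsed intervals.  The sketch below repeats the measurement above with it; the helper name `time_call` is our own and not part of the `time` module.

```python
import time

def add_numbers():
    # Sum the integers from 1 to 999,999.
    i_sum = 0
    for number in range(1, 1000000):
        i_sum += number
    return i_sum

def time_call(func):
    """Return (result, elapsed seconds) for a single call of func."""
    t_start = time.perf_counter()
    result = func()
    t_end = time.perf_counter()
    return result, t_end - t_start

result, elapsed = time_call(add_numbers)
print(f'add_numbers() returned {result} in {elapsed:.4f} seconds.')
```

The measured time will differ between runs and machines, so compare several runs rather than trusting a single number.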


**`timeit.timeit()`**

In [3]:
timeit.timeit('a, b = 42, 101; a, b = b, a')

0.02737289999999959
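
The value above is the total time for 1,000,000 runs of the statement, not the time for one run.  The sketch below shows, as an illustration only, how the `number` and `globals` parameters of `timeit.timeit()` and the related `timeit.repeat()` function can be used to time one of our own functions; the loop size and repeat counts are arbitrary choices.

```python
import timeit

def add_numbers():
    i_sum = 0
    for number in range(1, 1000):
        i_sum += number
    return i_sum

# number controls how many times the statement runs (the default is 1,000,000).
# globals=globals() lets the statement string see add_numbers.
total_seconds = timeit.timeit('add_numbers()', globals=globals(), number=1000)
print(f'Total: {total_seconds:.4f} s, per call: {total_seconds / 1000:.6f} s')

# repeat() runs the whole measurement several times; the minimum is
# usually the run least disturbed by other activity on the machine.
runs = timeit.repeat('add_numbers()', globals=globals(), number=1000, repeat=3)
print(f'Best of 3: {min(runs):.4f} s')
```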

**`cProfile.run()`**

In [4]:
def add_numbers():
    # Sum the integers from 1 to 99,999.
    i_sum = 0
    for number in range(1, 100000):
        i_sum += number
    return i_sum

cProfile.run('add_numbers()')

         4 function calls in 0.006 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.006    0.006 999411293.py:1(add_numbers)
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
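
The `pstats` module listed earlier has no example in this section, so here is a minimal sketch of one way to use it.  It collects a profile with `cProfile.Profile()` instead of `cProfile.run()`, then sorts and trims the report with `pstats.Stats`.  The function `run_twice` is a hypothetical helper added only to give the report more than one row of interest.

```python
import cProfile
import io
import pstats

def add_numbers():
    i_sum = 0
    for number in range(1, 100000):
        i_sum += number
    return i_sum

def run_twice():
    add_numbers()
    add_numbers()

# Collect a profile without printing it immediately.
profiler = cProfile.Profile()
profiler.enable()
run_twice()
profiler.disable()

# Format the collected statistics: strip directory names from file paths,
# sort by cumulative time, and show at most the top 5 rows.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.strip_dirs().sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time puts the functions that account for the most total runtime at the top, which is usually where to start optimizing.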




---

## Multiprocessing
- **CPU**--Central Processing Unit (or simply processing unit or processor).  Electronic circuitry that executes the instructions comprising a computer program.  These include arithmetic, logic operations, controlling, and input/output.  I.e. the brains of the computer.
- **Multi-core processor**--CPU containing multiple processing units on the same semiconductor chip.  Each processing unit is called a core.
- **Program**--software application.  E.g. Mozilla Firefox.
- **Process**--a running instance of a program.  We can have multiple processes open at one time.  E.g. multiple instances of Firefox. 
- **Multiprocessing**--when a CPU uses multiple cores in **parallel**.  Similar to parallel electrical circuits.  In multiprocessing, we run different processes on different processors.  E.g. an instance of Firefox on core A and an instance of Notepad on core B.  We can also run a single program across different processors.
- From now on, we'll be talking about running a single program across different processors.  The program is our Python script.  We can run a single Python script across multiple processors if the script has many activities that are not dependent on each other and the script can be redesigned so that each code chunk representing each activity is defined as its own process. Each of these processes can then be run on a different processor.
- Pros
    - Increase speed of **CPU bound** programs.  E.g. a program that applies mathematical expressions to large collections.  More cores means more computations can be done in a shorter amount of time, decreasing total run time.
    - Simpler than multithreading
- Cons
    - Increases program complexity.  Must separate single program into independent processes.
    - Hard to debug
    - Processes cannot easily share variables and information with each other.  Data must be passed between them explicitly.
    - Offers little benefit for "I/O bound" programs and may slow them down.  It takes CPU resources to start processes and divvy them up to their processors.
- *Best practice is to keep # processes <= # cores.  That way, each process can run at the same time without competing for processor time.*

- Python uses 1 process by default
- The `multiprocessing` module can be used to divide a Python script into multiple processes that run on multiple processors
- "Multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows."

Code | Use
--- | ---
`multiprocessing` | Module
`multiprocessing.Process(target = <TARGET_FUNCTION>)` | Return process object.
`.start()` | Process object method.  Start running the target function in a new process.  Returns `None`.

---

**EXAMPLES**

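- Multiprocessing doesn't work the same in Jupyter because functions defined in a notebook cell live in `__main__` and may not be importable by the child processes.  The code below is therefore shown as Markdown rather than run in a cell, but it shows how to launch a process from a script.  The syntax is similar to what is seen in the `threading` module.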

```python
import multiprocessing

def process_function_2():
    print("Process 2 is now running.")
    print("Process 2 is done running.")

if __name__ == '__main__':
    po_process_2 = multiprocessing.Process(target = process_function_2)
    po_process_2.start()
```
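
For CPU bound work that can be split into independent pieces, a pool of worker processes is a common pattern.  The sketch below is an illustration rather than a recommended design: it uses `multiprocessing.Pool` and `multiprocessing.cpu_count()`, which are part of the `multiprocessing` module but not covered above, and `count_primes` is a hypothetical workload chosen only because it keeps the CPU busy.  Like the example above, it is meant to be saved and run as a script, not in a Jupyter cell.

```python
import multiprocessing

def count_primes(limit):
    """CPU bound work: count the primes below limit by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % divisor for divisor in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

def main():
    limits = [20000, 30000, 40000, 50000]
    # Keep the number of processes at or below the number of cores.
    workers = min(len(limits), multiprocessing.cpu_count())
    with multiprocessing.Pool(processes=workers) as pool:
        # Each limit is handled by a worker process; results come back
        # in the same order as the inputs.
        results = pool.map(count_primes, limits)
    for limit, result in zip(limits, results):
        print(f'{result} primes below {limit}')

if __name__ == '__main__':
    main()
```

Because each call to `count_primes` is independent of the others, no variables need to be shared between processes; the inputs and return values are the only data passed back and forth.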

---

## Multithreading
- **Thread**--a sequence of instructions within a process that can be scheduled to run independently.  Any single program (like our Python script) in which many activities are not dependent on each other can be redesigned so that the code chunk representing each activity is defined as a "thread".
- **Multithreading**--ability of a computer central processing unit (or a single core in multi-core processors) to provide multiple threads of execution **concurrently**.  The term concurrently here does NOT mean threads run at the same time on the same processor.  Threads on the same processor share the processor resources and the processor repeatedly switches between threads.
- Pros
    - Increases speed for **I/O bound** programs.  E.g. a program that reads and writes to an external resource like a file system, database, or web server.  The slowest thread of the program (waiting on an external resource) will not act as a bottleneck for the execution of the entire program because the processor can switch away from the slow thread and come back to it later.  Multithreading is said to hide "latency".
    - Threads can communicate and share variables (though this can lead to errors)
- Cons
    - Increases program complexity.  Must separate single program into independent threads.
    - Very hard to debug
    - Can cause concurrency issues.  There are techniques like "locks" that solve this but these are complicated, error prone, and require CPU resources.
    - Will slow down CPU bound program because it takes time to switch back and forth between threads
- *Best practice is to never let multiple threads read or write the same variables and only use local variables (not global) in each thread.  This will reduce concurrency issues.*

- Below we have a chart showing multithreading and multiprocessing.  The left chart shows multithreading on a single processor.  It is almost like a single chef switching between jobs in a kitchen.  The right chart shows 4 processors running 4 jobs.  This is like 4 chefs working at the same time.  Multithreading = concurrent.  Multiprocessing = parallel.

![](images/concurrent_parallel.jpg)

---

### Threading
- Python scripts have a single thread of execution by default
- The `threading` module creates multithreading programs
- The standard CPython interpreter (not the `threading` module itself) uses a **Global Interpreter Lock (GIL)**, so that only one thread executes Python bytecode at a time.  This means that the threads of a Python program do not run Python code in parallel on multiple processors.
- "This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.  If we want our application to make better use of the computational resources of multi-core machines, we are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if we want to run multiple I/O-bound tasks simultaneously."
- It is generally NOT recommended to use the `threading` module.  Even with the GIL we can still run into concurrency issues.  The `threading` module is VERY hard to get right, even for experienced programmers.
- This is in part because threads are switched automatically, at points we do not control.  By default, CPython asks the running thread to give up the GIL after about 5 milliseconds.  If any threads were to share any variables then we could run into problems like this:
    1. Thread 1 reads variable A, obtaining the value 5
    1. OS automatically switches threads to Thread 2
    1. Thread 2 reads variable A, obtaining the value 5, adds 10, and assigns the value 15 to variable A
    1. OS automatically switches threads back to Thread 1
    1. Thread 1 still thinks variable A is 5, even though it is now 15
    1. Thread 1 and 2 go back and forth using variable A, but now all calculations are off
- With that being said, we'll provide a little code and do a couple of examples to help understand the principles of multithreading.

Code | Use
--- | ---
`threading` | Module
`threading.Thread(target = <TARGET_FUNCTION>)` | Return thread object.  If the target function takes arguments, pass them through the `Thread()` function as keyword arguments.  Use `args=[<LIST_OF_TARGET_ARGS>]` and/or `kwargs={<DICT_OF_TARGET_KWARGS>}`.  Keyword arguments within keyword arguments...inception!!!
`.start()` | Thread object method.  Start executing code in thread object.  

---

**EXAMPLES**

In [5]:
import threading
import time

- Notice how the OS switches back and forth between threads automatically
- The program starts at Thread 1, switches to Thread 2 as instructed, but then Thread 2 sleeps for 5 seconds.  The OS automatically switches back to Thread 1, finishes running Thread 1, and then automatically switches back to Thread 2 and finishes Thread 2.
- The sleep function represents a slow I/O function and demonstrates how we can hide this latency

In [6]:
print("Thread 1 is now running.")

def func_thread_2():
    print("Thread 2 is now running.")
    time.sleep(5)
    print("Thread 2 is now running.")
    print("Thread 2 is done running.")

to_thread_2 = threading.Thread(target = func_thread_2)
to_thread_2.start()

print("Thread 1 is now running.")
print("Thread 1 is done running.")

Thread 1 is now running.
Thread 2 is now running.
Thread 1 is now running.
Thread 1 is done running.
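
In the output above, Thread 1 finishes before Thread 2 has woken up.  When the main thread needs to wait for another thread, the thread object's `.join()` method can be used.  `.join()` is part of the `threading` module but is not listed in the table above; the shortened sleep is only there to keep the sketch quick to run.

```python
import threading
import time

def func_thread_2():
    print("Thread 2 is now running.")
    time.sleep(0.5)  # stands in for a slow I/O call
    print("Thread 2 is done running.")

to_thread_2 = threading.Thread(target=func_thread_2)
to_thread_2.start()

print("Thread 1 keeps working while Thread 2 waits.")
to_thread_2.join()  # block here until Thread 2 has finished
print("Thread 1 is done running.")
```

With the `.join()` call in place, "Thread 1 is done running." is always printed after Thread 2 has finished.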


- Pass arguments to target function.  We'll first show how these arguments are normally passed to a function, and then show how they are passed through `.Thread()`.

In [7]:
print("Hello", "world", sep = " ")

Hello world


In [8]:
to_thread_2 = threading.Thread(target=print, args=["Hello", "world"], kwargs={"sep":" "})
to_thread_2.start()

Hello world
Thread 2 is now running.
Thread 2 is done running.
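
The lost update problem described earlier in this section (two threads reading and writing variable A) can be sketched with a shared counter.  The example below is an illustration of the "locks" technique mentioned in the cons list, using `threading.Lock`; the names `counter`, `counter_lock`, and `add_many` are hypothetical, and sharing a global like this goes against the best practice above, which is exactly why the lock is needed.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # Without the lock, another thread could run between the read and
        # the write of counter, and one of the two updates would be lost.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=[100000]) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f'Final counter: {counter}')
```

With the lock, the final value is always 400000.  The cost is that every increment now has to acquire and release the lock, which is the extra complexity and CPU overhead the cons list warns about.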


---

### Asyncio
- `asyncio`--stands for asynchronous input output.  Newer Python concurrency module, covered here alongside multithreading because it solves the same problem.  It runs many tasks on a single thread using an event loop instead of creating multiple threads.
- The `threading` module switches threads automatically after a certain amount of time.  This is convenient because we don't need to add any code to cause a thread switch.
- The `asyncio` module switches tasks "cooperatively".  We explicitly add `await` syntax to tell the program when it may switch between tasks.
- Pros of `asyncio` compared to `threading` module
    - Far fewer locks are needed to prevent concurrency issues, because a task can only be switched out at an `await`.  This increases speed and more importantly makes code easier to understand and debug
    - Switching tasks in `asyncio` takes fewer CPU resources than switching threads with the `threading` module.  We might have 100s of concurrent threads with the `threading` module while we could have thousands of concurrent tasks with `asyncio`.
- Cons of `asyncio` compared to `threading` module
    - Need added syntax to switch between tasks
    - Need to use "async" versions of functions and libraries because we can't have any "blocking" function calls
    - Steep learning curve
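
A minimal sketch of cooperative task switching is shown below.  It is an illustration under stated assumptions: `fetch` is a hypothetical coroutine, and `asyncio.sleep` stands in for a real network or database wait.  In a Jupyter cell, where an event loop is already running, `await main()` would be used in place of `asyncio.run(main())`.

```python
import asyncio
import time

async def fetch(name, delay):
    """Hypothetical I/O task; asyncio.sleep stands in for a network wait."""
    print(f'{name} started')
    await asyncio.sleep(delay)  # hands control back to the event loop
    print(f'{name} finished')
    return f'{name} result'

async def main():
    t_start = time.perf_counter()
    # gather() runs the three coroutines concurrently on one thread and
    # returns their results in the order the coroutines were passed in.
    results = await asyncio.gather(
        fetch('task A', 0.3),
        fetch('task B', 0.2),
        fetch('task C', 0.1),
    )
    elapsed = time.perf_counter() - t_start
    print(f'{results} in {elapsed:.2f} seconds')
    return results, elapsed

results, elapsed = asyncio.run(main())
```

The three waits overlap, so the total time is close to the longest delay rather than the sum of all three.  Note that a blocking call such as `time.sleep()` inside `fetch` would stall every task, which is why "async" versions of functions are required.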

---