# `multiprocessing`

`multiprocessing` is the Python standard library module for parallel computation on a single machine, and one of the most useful tools for a data scientist.


## Python standard library parallel computation ecosystem

[Multiprocessing Vs. Threading In Python - Sid Panjwani](https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/)

`threading` - uses threads (shared memory space) - suited to I/O-bound problems.

`multiprocessing` - uses processes (separate memory spaces) - suited to CPU-bound problems.
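
A minimal sketch of the I/O-bound case, where `time.sleep` stands in for waiting on a network or disk and `fake_download` is a hypothetical name. Because threads share one memory space, each thread can write straight into the same `results` list:

```python
import threading
import time


def fake_download(seconds, results, index):
    #  time.sleep stands in for waiting on I/O (hypothetical workload)
    time.sleep(seconds)
    results[index] = seconds


results = [None] * 4
threads = [
    threading.Thread(target=fake_download, args=(0.2, results, i))
    for i in range(4)
]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

#  the four 0.2 s waits overlap, so the total should be well under
#  the 0.8 s a sequential loop would need
print(f"{elapsed:.2f} s")
```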

How does this relate to CPU cores?

- the number of CPU cores is fixed (usually 4-16 in laptops - depends on your physical hardware),
- more cores = more true parallelism (as opposed to the very fast task switching done by the OS),
- your computer can have many threads and many processes (depends on the OS),
- the OS schedules these threads/processes onto the available cores,
- a running thread occupies one core at a time.

[Multithreading and multicore differences](https://stackoverflow.com/questions/11835046/multithreading-and-multicore-differences)

*But my CPU cores have two threads*:

- this is a different use of the term (the hardware thread),
- hardware threads let a single core run two threads at once, as if it were multiple cores - known as simultaneous multithreading (Intel calls it Hyper-Threading).
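
The standard library reports the number of logical CPUs (hardware threads), not physical cores. A quick check, as a sketch:

```python
import os

#  number of logical CPUs in the system; None if it cannot be determined
logical_cpus = os.cpu_count()
print(logical_cpus)
```

On a 4-core machine with two hardware threads per core this would typically print 8.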


## Why do we need `multiprocessing`?

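CPython has a Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time, which prevents threads from parallelizing computation across multiple cores.

The lock exists because CPython's memory management is not thread safe - a thread must hold the GIL when accessing Python objects. `multiprocessing` sidesteps this by giving each process its own interpreter, and therefore its own GIL.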
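
A rough way to see this for yourself is to run the same CPU-bound work on a thread pool and on a process pool. In this sketch `count_down` and `timed_map` are hypothetical helpers; `ThreadPool` is the thread-backed pool from `multiprocessing.pool`, which has the same `map` interface as `Pool`. No timings are shown because they depend on your machine and Python build:

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def count_down(n):
    #  pure-Python busy loop: CPU bound, no I/O
    while n > 0:
        n -= 1
    return n


def timed_map(pool_cls, work, workers=2):
    #  run count_down over `work` and return (results, seconds taken)
    start = time.time()
    with pool_cls(workers) as pool:
        out = pool.map(count_down, work)
    return out, time.time() - start


if __name__ == "__main__":
    work = [2_000_000] * 2
    _, thread_time = timed_map(ThreadPool, work)
    _, process_time = timed_map(Pool, work)
    print(f"threads:   {thread_time:.2f} s")
    print(f"processes: {process_time:.2f} s")
```

On a standard CPython build with the GIL, the threaded run typically takes about as long as doing the work serially, while the two processes can use two cores.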


## What can be hard in multiprocessing?

Sharing things between processes:

- the simplest solution = don't share anything,
- make every process independent,
- a more functional style = no interaction (because interaction = side effects!).
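
A small sketch of why sharing is awkward, using hypothetical names (`counter`, `increment_global`, `increment`). The child process changes only its own copy of the module-level `counter`, whereas a function that takes an input and returns an output works cleanly with `Pool.map`:

```python
from multiprocessing import Pool, Process

counter = {"value": 0}


def increment_global():
    #  side effect: mutates module-level state, but only in the child's copy
    counter["value"] += 1


def increment(value):
    #  functional alternative: take an input, return an output, touch nothing else
    return value + 1


if __name__ == "__main__":
    child = Process(target=increment_global)
    child.start()
    child.join()
    print(counter["value"])  #  prints 0: the parent's copy is unchanged

    with Pool(2) as pool:
        print(pool.map(increment, [0, 1, 2]))  #  prints [1, 2, 3]
```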


## `multiprocessing` 101

We map functions to data - but in parallel!

First let's do a `map` in Python:

In [None]:
import time
import numpy as np

from src import subtract

data = np.random.uniform(0, 100, size=10).tolist()
st = time.time()

result = list(map(subtract, data))
print(time.time() - st)
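
The `subtract` function is imported from `src` and is not shown here. A hypothetical stand-in, assuming only what the later cells imply (one positional argument per data item and a `sleep` keyword), might look like this. The default values and the `y` parameter are assumptions:

```python
import time


def subtract(x, y=1.0, sleep=0.1):
    """Hypothetical stand-in for `src.subtract` - the real one is not shown."""
    time.sleep(sleep)  #  simulate work so the timings are measurable
    return x - y
```

Keeping the function in an importable module rather than a notebook cell also helps, since worker processes need to be able to import the function they run.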

Let's parallelize this using `multiprocessing`:

In [None]:
from multiprocessing import Pool

num_process = 2
st = time.time()

with Pool(num_process) as pool:
    out = pool.map(subtract, data)
    
print(time.time() - st)

A common use case is to have arguments for the function being mapped:

In [None]:
from functools import partial

st = time.time()
#  partial fixes sleep=0.1, leaving a one-argument function for map
with Pool(num_process) as pool:
    out = pool.map(partial(subtract, sleep=0.1), data)

print(time.time() - st)
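
`partial` fixes the same extra argument for every item. When each item needs its own arguments, `Pool.starmap` unpacks one tuple per call. A minimal sketch, with `difference` as a hypothetical two-argument function:

```python
from multiprocessing import Pool


def difference(x, y):
    #  hypothetical two-argument function
    return x - y


if __name__ == "__main__":
    pairs = [(10, 1), (20, 2), (30, 3)]
    with Pool(2) as pool:
        #  each tuple is unpacked into difference(x, y)
        print(pool.starmap(difference, pairs))  #  prints [9, 18, 27]
```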

Note that when we remove our sleep, the non-multiprocessing `map` is faster:

In [None]:
st = time.time()
result = list(map(partial(subtract, sleep=0.0), data))
print(time.time() - st)

Parallel computation has overhead (a fixed cost to start the processes + a variable cost to send data to and from them) - make sure your function runs long enough to justify it!
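
One way to see the per-item part of the overhead is to time a trivially cheap function. The sketch below assumes a hypothetical `square` function and `timed` helper; `chunksize` is a real `Pool.map` parameter that controls how many items are sent to a worker per task. Timings depend on your machine, so none are shown:

```python
import time
from multiprocessing import Pool


def square(x):
    #  deliberately cheap, so overhead dominates the runtime
    return x * x


def timed(label, fn):
    #  call fn(), print how long it took, and return its result
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result


if __name__ == "__main__":
    data = list(range(20_000))
    serial = timed("serial map", lambda: list(map(square, data)))
    with Pool(2) as pool:
        one = timed("pool.map, chunksize=1",
                    lambda: pool.map(square, data, chunksize=1))
        big = timed("pool.map, chunksize=5000",
                    lambda: pool.map(square, data, chunksize=5_000))
    #  all three approaches compute the same answer
    assert serial == one == big
```

Larger chunks mean fewer round trips between the parent and the workers, which usually reduces the per-item cost.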

## Exercise - Blockchain Mining

Write multiprocessed code to solve a hashing problem (similar to how *proof of work* works in Bitcoin).

Our proof of work is as follows:

- take a given input string (base string),
- append characters to the end of it until the hash ends in `1` (at which point we consider the hash problem solved).

You can make this hash problem harder to solve by being more strict (perhaps a condition of `hash[-2:] == '11'` - it's totally arbitrary).
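
The difficulty can be made a parameter. A minimal sketch, where `is_solved` and its `suffix` argument are hypothetical names and the hash is the decimal string of the Adler-32 checksum:

```python
from zlib import adler32


def is_solved(candidate, suffix="1"):
    #  solved when the decimal checksum string ends with `suffix`;
    #  a longer suffix such as "11" makes the problem harder
    digest = str(adler32(candidate.encode()))
    return digest.endswith(suffix)


print(is_solved("baseman"), is_solved("baseman", suffix="11"))
```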

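We can hash in Python using `zlib` (`adler32` is a checksum rather than a cryptographic hash, but it is good enough for this exercise):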

In [None]:
from zlib import adler32
str(adler32('baseman'.encode()))

We can add characters onto the end of this string and we will get a different hash:

In [None]:
str(adler32('basemans'.encode()))

This task (finding a string that solves our hash problem) can be run in parallel, because each candidate string can be checked independently.

Suggested approach:
1. write a `for` loop,
2. convert to a `map`,
3. multiprocess :)

You can then extend the program to search across many different base strings at once.

In [None]:
import string
import random
from zlib import adler32


def check_string(s):
    #  solved when the decimal checksum ends in the digit 1
    hsh = str(adler32(s.encode()))
    return hsh[-1] == '1'
    
base = 'baseman'

#  this for loop is your opportunity to parallelize
for _ in range(64):
    new = random.choice(string.ascii_lowercase)
    success = check_string(base + new)
    
    #  with Python 3.8+ this could be `if success := check_string(base + new):`
    if success:
        print(success)
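
One possible way through the suggested steps, as a sketch rather than the reference solution. It swaps the random sampling for an exhaustive pass over the 26 lowercase letters so that the serial and parallel results can be compared; `candidates` and `winners` are names introduced here:

```python
import string
from multiprocessing import Pool
from zlib import adler32


def check_string(s):
    #  solved when the decimal checksum ends in the digit 1
    return str(adler32(s.encode())).endswith("1")


base = "baseman"
#  every one-letter extension of the base string
candidates = [base + letter for letter in string.ascii_lowercase]

if __name__ == "__main__":
    #  step 2: the for loop expressed as a map
    serial = list(map(check_string, candidates))

    #  step 3: the same map, spread across two processes
    with Pool(2) as pool:
        parallel = pool.map(check_string, candidates)

    assert serial == parallel
    winners = [c for c, ok in zip(candidates, parallel) if ok]
    print(winners)
```

With only 26 cheap checks the pool is unlikely to be faster than the plain `map`; the benefit appears once the search space or the cost per check grows.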