# Python Parallelization for Lazy People (like me): 1 
### Single Node Embarrassingly Parallel Loops with Joblib

### Recap: List Comprehension

In [None]:
def square(number):
    return number ** 2

The following two code blocks are equivalent, though one is more compact than the other.

(There are subtle differences, which we will not worry about for now.)

In [None]:
outputs = []
inputs = [1,43,72,786]

for i in inputs:
    outputs.append(square(i))
print("Output: ", outputs)

In [None]:
inputs = [1,43,72,786]
outputs = [square(i) for i in inputs]
print("Output: ", outputs)

The latter method of generating a list is called a `list comprehension`.

But what if our function has two outputs?

In [None]:
def square_root(number):
    return number**0.5, -1*number**0.5

In [None]:
output_p = []
output_n = []

inputs = [1,43,72,786]

for i in inputs:
    p,n = square_root(i)
    output_p.append(p)
    output_n.append(n)
    
print("Positive output: ", output_p)
print("Negative output: ", output_n)

In [None]:
output_p, output_n = zip(*[square_root(i) for i in inputs])
print("Positive output: ", output_p)
print("Negative output: ", output_n)
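
The `zip(*...)` idiom can look opaque at first. As a minimal sketch (the small input list is chosen only to keep the numbers readable), the comprehension builds a list of `(positive, negative)` pairs, and `zip(*pairs)` regroups them by position. Note that the results are tuples rather than lists.

```python
def square_root(number):
    # Same function as above: returns the positive and negative roots.
    return number**0.5, -1 * number**0.5

inputs = [1, 4, 9]  # illustrative inputs, not the ones used above

pairs = [square_root(i) for i in inputs]  # list of (positive, negative) tuples
output_p, output_n = zip(*pairs)          # unpack the pairs and regroup by position

print(pairs)           # [(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)]
print(output_p)        # (1.0, 2.0, 3.0)
print(list(output_n))  # [-1.0, -2.0, -3.0]
```

Wrap the results in `list(...)` if you need to append to them later.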

### Embarrassingly Parallel Loops ([Moler, 1986](http://blogs.mathworks.com/cleve/2013/11/12/the-intel-hypercube-part-2-reposted/#096367ea-045e-4f28-8fa2-9f7db8fb7b01))

A loop is embarrassingly parallel when the same function is run many times and the outcome of one iteration neither affects nor depends on the outcome of any other iteration.

That is, each iteration of the loop is, for all practical purposes, an independent run of the code.
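
To make the distinction concrete, here is a small sketch contrasting a loop whose iterations are independent with one whose iterations are not. The running total is only an illustrative counterexample and is not used elsewhere in this notebook.

```python
inputs = [1, 2, 3, 4]

# Independent: each result depends only on its own input, so the
# iterations could run in any order, or all at once.
squares = [x ** 2 for x in inputs]

# Dependent: each result needs the previous one, so the iterations
# must run in sequence and the loop is not embarrassingly parallel.
running_total = []
total = 0
for x in inputs:
    total += x
    running_total.append(total)

print(squares)        # [1, 4, 9, 16]
print(running_total)  # [1, 3, 6, 10]
```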

In [None]:
from joblib import Parallel, delayed

In [None]:
outputs = Parallel(n_jobs=2)(  # n_jobs: maximum number of jobs run concurrently
    delayed(square)(i)         # delayed(function)(arguments) records the call without running it
    for i in inputs)           # a generator expression, written like the list comprehension above
print(outputs)
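
The same pattern extends to functions that take more than one argument. The sketch below uses a hypothetical helper, `power`, which is not part of the original notebook, to show that positional and keyword arguments are passed to the second pair of parentheses after `delayed(...)`, and that the results come back in the same order as the inputs.

```python
from joblib import Parallel, delayed

def power(base, exponent=2):
    # Hypothetical helper for illustration only.
    return base ** exponent

inputs = [1, 43, 72, 786]

# delayed(power)(i, exponent=3) records the call instead of running it;
# Parallel then executes the recorded calls in the workers.
cubes = Parallel(n_jobs=2)(delayed(power)(i, exponent=3) for i in inputs)

# Results are returned in input order, so they match the serial version.
print(cubes == [i ** 3 for i in inputs])  # True
```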

What if my function has multiple outputs?

In [None]:
output_p, output_n = zip(*Parallel(n_jobs=2)(delayed(square_root)(i) for i in inputs))
print("Positive output: ", output_p)
print("Negative output: ", output_n)

How do I know how many CPUs are available?

In [None]:
import multiprocessing

n_workers = -1  # zero or negative: use every CPU reported by cpu_count()

if n_workers <= 0:
    n_workers = multiprocessing.cpu_count()
else:
    n_workers = min(int(n_workers), multiprocessing.cpu_count())
print("Number of Workers: ", n_workers)
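
The logic above can be wrapped in a small function so it is easy to reuse. This is only a sketch: `resolve_workers` and its `available` parameter are hypothetical names introduced here, and `os.cpu_count()` reports the logical CPUs of the machine, which may be more than the CPUs your process is actually allowed to use (for example inside a container or a cluster job).

```python
import os

def resolve_workers(requested, available=None):
    """Hypothetical helper: turn a requested worker count into a usable one."""
    if available is None:
        available = os.cpu_count() or 1  # os.cpu_count() can return None
    if requested <= 0:
        return available  # zero or negative means "use everything"
    return min(int(requested), available)

# Passing `available` explicitly keeps the example independent of the machine.
print(resolve_workers(-1, available=8))  # 8
print(resolve_workers(4, available=8))   # 4
print(resolve_workers(16, available=8))  # 8
```

joblib itself also accepts `n_jobs=-1` to mean "use all CPUs", so a helper like this is mostly useful when you want to cap or log the number of workers yourself.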

### Potential Pitfall: Oversubscription of cores
Some numpy functions parallelize their work internally, so calling them inside parallel workers can make the total number of threads blow up well beyond the number of cores.
It is better to limit the internal parallelization in one of the following two ways.

```python
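import os

# Set these before numpy (or any other numerical library) is imported;
# the threading libraries read them when they are first loaded.
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'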
```

```python
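from joblib import Parallel, delayed, parallel_backend

def func(x, y):  # placeholder: substitute your own function
    return x * y

data = [(1, 2), (3, 4), (5, 6)]  # placeholder inputs

# Each loky worker is limited to one internal thread.
with parallel_backend("loky", inner_max_num_threads=1):
    results = Parallel(n_jobs=4)(delayed(func)(x, y) for x, y in data)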
```
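
The arithmetic behind this pitfall is simple enough to sketch. The two helpers below are hypothetical and only illustrate the counting: in the worst case every worker starts its own pool of inner threads, and splitting the CPUs evenly between workers keeps the total within the machine's capacity. The machine size of 8 CPUs is an assumption made for the example.

```python
def total_threads(n_jobs, inner_threads):
    # Worst case: every worker process starts its own pool of inner threads.
    return n_jobs * inner_threads

def inner_thread_budget(n_jobs, n_cpus):
    # Hypothetical helper: split the CPUs evenly, at least one thread each.
    return max(1, n_cpus // n_jobs)

n_cpus = 8  # assumed machine size, for illustration only

print(total_threads(4, 8))                               # 32 threads competing for 8 CPUs
print(inner_thread_budget(4, n_cpus))                    # 2
print(total_threads(4, inner_thread_budget(4, n_cpus)))  # 8
```

Setting the inner limit to 1, as in the snippets above, is the most conservative choice; a larger value such as the budget computed here may be worth trying when each task spends most of its time inside numpy.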