# Parallel Processing

You probably have a multi-core CPU, and for time-consuming tasks I highly recommend using it. Parallel processing is a computational method that allows multiple tasks to be executed simultaneously.

You should consider parallelizing your program:
1. When tasks are independent, that is, they can be executed without needing each other's intermediate results or synchronization.
2. For CPU-bound workloads, that is, when programs spend most of their time on computation.
3. For I/O-bound workloads, that is, when tasks spend time waiting on external resources such as file I/O, database queries, or network requests.
4. When processing large datasets.
5. When you have multi-core or cluster resources available.

Is it worth the overhead? Good question! How do we know? Suppose we have the following:
- $T$ tasks you want to compute that are independent of one another (perhaps the same sort of task with different parameter values)
- $N$ CPUs on your computer
- $M$ GB of memory
- $t$ is the time it takes all tasks to run on one core.
- $m$ is the amount of memory taken by each task.

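If $Nm \le M$, then you can run one task on each of the $N$ cores at the same time. Assuming $T \ge N$ and tasks of similar length, this takes roughly $\frac{t}{N}$ time instead of $t$. (It's all a little more complicated than this for hardware reasons and because starting workers has overhead, but let's not worry about that.)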
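To make this estimate concrete, here is a rough sketch that also covers the case $Nm > M$, where memory rather than the number of cores limits how many tasks can run at once. It assumes all $T$ tasks take the same time and ignores overhead entirely; `ideal_wall_time` is a hypothetical helper, not part of any library.

```python
import math

def ideal_wall_time(T, N, M, m, t):
    """Best-case wall time for T independent, equal-length tasks (hypothetical helper).

    t: total time for all T tasks on one core
    N: number of cores
    M: available memory (GB); m: memory per task (GB)
    Overhead (starting workers, moving data) is ignored.
    """
    # Concurrent workers are limited by cores, by memory, and by the number of tasks.
    workers = min(N, math.floor(M / m), T)
    if workers < 1:
        raise ValueError("a single task does not fit in memory")
    per_task = t / T                   # time for one task on one core
    rounds = math.ceil(T / workers)    # tasks run in "waves" of `workers` at a time
    return rounds * per_task

print(ideal_wall_time(T=8, N=4, M=16, m=2, t=8))  # memory is not the limit
print(ideal_wall_time(T=8, N=4, M=16, m=8, t=8))  # only 2 tasks fit in 16 GB
```

With 2 GB per task, all four cores can be busy and the estimate is $t/4$; at 8 GB per task only two tasks fit in memory, so the estimate doubles to $t/2$.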

You may ask: how many cores does my computer actually have? You can use `os` to find out:

In [None]:
import os
print(os.cpu_count())  # number of logical CPUs; may be None if it cannot be determined
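`os.cpu_count()` reports the logical CPUs in the machine (hyper-threads included), which can be more than your process is allowed to use, for example when a cluster scheduler pins a job to a subset of cores. Here is a best-effort sketch that prefers the affinity-aware calls when they exist: `os.process_cpu_count()` (Python 3.13+) and `os.sched_getaffinity()` (available on some Unix systems such as Linux). The helper name `usable_cpu_count` is hypothetical.

```python
import os

def usable_cpu_count():
    """Best-effort count of CPUs this process may actually run on (hypothetical helper)."""
    if hasattr(os, "process_cpu_count"):
        # Python 3.13+: respects CPU affinity where the OS supports it.
        count = os.process_cpu_count()
    elif hasattr(os, "sched_getaffinity"):
        # Linux and some other Unix systems: the set of CPUs this process may use.
        count = len(os.sched_getaffinity(0))
    else:
        count = os.cpu_count()
    # cpu_count() and process_cpu_count() return None when the count is unknown.
    return count or 1

print(f"Logical CPUs in the machine: {os.cpu_count()}")
print(f"CPUs usable by this process: {usable_cpu_count()}")
```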

## More Details (that you may or may not want to know)
Borrowed heavily from [Parallel Process reference](https://www.run.ai/guides/deep-learning-for-computer-vision/python-parallel-processing)

There are multiple ways of running work concurrently in Python: multi-threading, multiprocessing, and asynchronous programming.

### Multi-Threading
Multi-threading allows a program to run multiple threads of execution concurrently within one process. In Python, the `threading` module provides methods to create and manage threads; each thread runs a specific function independently of the others. However, the standard Python interpreter (CPython, checked below) has a Global Interpreter Lock (GIL) that lets only one thread execute Python bytecode at a time, so multi-threading does not speed up CPU-bound pure-Python code. I/O-bound tasks still benefit: a thread releases the GIL while it waits on I/O (file reads, network requests, `time.sleep`), letting other threads run in the meantime.

In [1]:
import sys
print(f"Python implementation: {sys.implementation.name}")

Python implementation: cpython


In [3]:
import threading as th
import time

In [5]:
def task(name):
    print(f"Task {name} started")
    time.sleep(2)  # stand-in for a slow I/O-bound operation, e.g. waiting on a network response
    print(f"Task {name} finished")

In [7]:
for i in range(3):
    task(i)

Task 0 started
Task 0 finished
Task 1 started
Task 1 finished
Task 2 started
Task 2 finished


In [9]:
threads = [th.Thread(target=task, args=(i,)) for i in range(3)]

for t in threads:
    t.start()

for t in threads:
    t.join()

print("All threads completed")


Task 0 started
Task 1 started
Task 2 started
Task 0 finished
Task 1 finished
Task 2 finished
All threads completed
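The messages above show the three tasks overlapping, but not the time saved. Here is a minimal sketch that measures it with `time.perf_counter`, using the standard library's `concurrent.futures.ThreadPoolExecutor` (a higher-level alternative to creating `th.Thread` objects by hand) and a hypothetical `io_task` that sleeps only 0.5 s to keep the demo short. Run sequentially, the three calls should take about 1.5 s; with threads, roughly 0.5 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(name, delay=0.5):
    """Hypothetical I/O-bound task: sleeping releases the GIL, like waiting on a network."""
    time.sleep(delay)
    return name

# Sequential: each call waits for the previous one to finish.
start = time.perf_counter()
sequential = [io_task(i) for i in range(3)]
sequential_time = time.perf_counter() - start

# Threaded: the three sleeps overlap, so the total is close to a single delay.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    threaded = list(pool.map(io_task, range(3)))  # results come back in input order
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f} s, threaded: {threaded_time:.2f} s")
```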


### Multiprocessing
Multiprocessing is another form of parallelism that allows multiple processes to run simultaneously. Unlike threads, each process runs its own Python interpreter with its own memory and its own GIL, so processes can execute Python code on different cores at the same time. The `multiprocessing` module provides ways of creating and managing processes, as well as sharing data among them. Multiprocessing works well for CPU-bound tasks that spend most of their time performing computations.

In [None]:
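import time
from multiprocessing import Process

def task(name):
    print(f"Task {name} started")
    time.sleep(2)  # simulate two seconds of work
    print(f"Task {name} finished")

# The __main__ guard is required with the "spawn" and "forkserver" start methods
# ("spawn" is the default on Windows and macOS), because each child re-imports the main module.
# With "spawn", functions defined inside a notebook may not be found by the children;
# saving this cell as a .py script and running it avoids that problem.
if __name__ == "__main__":
    processes = [Process(target=task, args=(i,)) for i in range(3)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    print("All processes completed")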

### Asynchronous programming
Asynchronous programming is a form of concurrent programming in which tasks run in a non-blocking manner. The `asyncio` module provides an event loop that runs coroutines in a single thread: whenever one coroutine `await`s something slow (such as a network response), the loop switches to another. This is concurrency rather than parallelism, since only one coroutine runs at any given moment.
Asynchronous programming can be a bit more complex than multi-threading or multiprocessing, as it requires a different way of thinking about the program's flow. However, it can be a powerful tool for writing efficient, high-performance code, particularly for I/O-bound tasks.


In [None]:
import asyncio

In [None]:
async def task(name):
    print(f"Task {name} started")
    await asyncio.sleep(2)  # Simulates a delay (e.g., waiting for a network response)
    print(f"Task {name} finished")

async def main():
    # Schedule multiple tasks concurrently
    tasks = [task(i) for i in range(3)]
    await asyncio.gather(*tasks)  # Run tasks concurrently

# Run the main coroutine.
# In a plain script, use asyncio.run(main()). Jupyter already runs an event loop,
# so asyncio.run() raises a RuntimeError there; use top-level await instead:
await main()
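`asyncio.gather` also collects return values. This sketch, written for a plain script (hence `asyncio.run`), uses a hypothetical `fetch` coroutine with different delays to show two things: results come back in the order the coroutines were passed, not the order they finish, and the total time is roughly the longest delay rather than the sum. In Jupyter, replace the `asyncio.run(...)` line with `results, elapsed = await main()`.

```python
import asyncio
import time

async def fetch(name, delay):
    """Hypothetical stand-in for a network call: waits `delay` seconds, then returns a result."""
    await asyncio.sleep(delay)  # hands control back to the event loop while "waiting"
    return f"result {name}"

async def main():
    start = time.perf_counter()
    # gather runs the coroutines concurrently and returns results in argument order.
    results = await asyncio.gather(fetch("a", 0.3), fetch("b", 0.1), fetch("c", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)  # ['result a', 'result b', 'result c']
print(f"took about {elapsed:.1f} s")  # roughly the longest delay (0.3 s), not the sum (0.6 s)
```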

## Joblib
As you can see, there are many libraries for parallel processing in Python. I am most familiar with `joblib`.

In [None]:
import random
import joblib
import time
import math

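The most useful class in `joblib` is `Parallel`. It abstracts away much of the complexity involved in managing parallel processing. `Parallel` has an optional `backend` argument that selects thread- or process-based parallelization:
- `threading`: runs tasks in threads of the current process (suited to I/O-bound work; limited by the GIL for CPU-bound Python code)
- `loky` (default): runs tasks in separate worker processes, which sidesteps the GIL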
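To see the `threading` backend in action, here is a small sketch, assuming `joblib` is installed as in the cells around it; `wait_then_double` is a hypothetical I/O-bound task. Because `time.sleep` releases the GIL, three threads can wait at the same time, so the run should take roughly 0.5 s rather than 1.5 s.

```python
import time
from joblib import Parallel, delayed

def wait_then_double(x):
    """Hypothetical I/O-bound task: sleeps (releasing the GIL), then returns 2 * x."""
    time.sleep(0.5)
    return 2 * x

start = time.perf_counter()
# backend="threading" runs tasks in threads of the current process,
# so no worker processes are started and nothing needs to be pickled.
doubled = Parallel(n_jobs=3, backend="threading")(
    delayed(wait_then_double)(x) for x in range(3)
)
elapsed = time.perf_counter() - start
print(doubled, f"{elapsed:.1f} s")
```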

We need some function we want to repeat. Imagine you love computing factorials of random integers slowly.

In [None]:
def func_to_repeat(x):
    time.sleep(1)  # make each call artificially slow (1 second)
    return math.factorial(x)

In [None]:
from joblib import Parallel, delayed

N = 4
many_exs = Parallel(n_jobs=1, backend='loky')(delayed(func_to_repeat)(x) for x in [random.randint(0,10) for _ in range(N)])

This took about 4 seconds. Not surprising: with `n_jobs=1`, the four 1-second tasks run one after another.

In [None]:

N = 8
many_exs = Parallel(n_jobs=2)(delayed(func_to_repeat)(x) for x in [random.randint(0,10) for _ in range(N)])

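Clearly there is some degradation here: eight 1-second tasks on two workers should ideally take about 4 seconds, and anything beyond that is overhead, such as starting the worker processes and sending tasks to them. Still, it beats the roughly 8 seconds a serial run would take.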

What if we set N higher and want to check progress in real time? Use `tqdm`.

In [None]:
from tqdm import tqdm

N = 8
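# Note: tqdm wraps the inputs, so it counts tasks as they are dispatched to workers, not as they finish.
many_exs = Parallel(n_jobs=8)(delayed(func_to_repeat)(x) for x in tqdm([random.randint(0,10) for _ in range(N)]))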


In [None]:
N = 100
many_exs = Parallel(n_jobs=4)(delayed(func_to_repeat)(x) for x in tqdm([random.randint(0,10) for _ in range(N)]))
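Because `tqdm` wraps the *inputs* in the cells above, the bar advances when a task is handed to joblib, so it can run ahead of (or even reach 100% before) the work actually finishing. If you want to count completed tasks instead, one option is the standard library's `concurrent.futures.as_completed`, which yields each future as it finishes. This sketch swaps joblib for a `ThreadPoolExecutor` and prints a plain counter; `slow_square` is hypothetical. For a bar, you could iterate over `tqdm(as_completed(futures), total=len(futures))` instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(x):
    """Hypothetical task: sleeps briefly, then returns x squared."""
    time.sleep(0.1)
    return x * x

inputs = list(range(10))
results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    # Map each future back to its input so results can be matched up later.
    futures = {pool.submit(slow_square, x): x for x in inputs}
    # as_completed yields futures in the order they finish, so this tracks completed work.
    for done, future in enumerate(as_completed(futures), start=1):
        results[futures[future]] = future.result()
        print(f"\r{done}/{len(inputs)} tasks completed", end="")
print()

ordered = [results[x] for x in inputs]  # restore input order
```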

In [None]:
for i in tqdm(range(10), desc="Processing", unit="item", colour="blue"):
    time.sleep(0.1)

## When NOT to use Parallel Processing
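Parallel processing will (probably) not bring benefits when you are dealing with:
- Downloading files, when network bandwidth is the bottleneck (more workers just share the same bandwidth)
- Dependent tasks, where each task needs another task's result
- Sequential tasks, which must run in a fixed order
- Short tasks, where the overhead of starting workers and moving data outweighs the work itself
- Limited CPU resources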

Before parallelizing any code, remember to consider (1) the overhead it will introduce and (2) that parallel code is more complex and harder to maintain.
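To check the "short tasks" point on your own machine, here is a rough sketch comparing a plain loop with a thread pool on a trivial, hypothetical `tiny_task`. Typically the pool is slower here, because each call pays for creating a future and handing the result between threads, which costs far more than adding 1; exact timings vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_task(x):
    """A task so short that scheduling overhead dominates (hypothetical example)."""
    return x + 1

inputs = range(10_000)

start = time.perf_counter()
serial = [tiny_task(x) for x in inputs]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(tiny_task, inputs))
pooled_time = time.perf_counter() - start

print(f"serial: {serial_time:.4f} s, thread pool: {pooled_time:.4f} s")
```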