# Parallel Processing

## Overview

### The GIL (Global Interpreter Lock) in Python
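In CPython, the GIL allows only one thread at a time to execute Python bytecode within a process. As a result, threads do not speed up CPU-bound pure-Python code. They still help with I/O-bound work, because a thread releases the GIL while it waits on I/O.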
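
Because the GIL is released while a thread blocks on I/O or in `time.sleep`, threads can still overlap their waiting time. Below is a minimal sketch, assuming a hypothetical `fake_io` function that sleeps as a stand-in for a network or disk call. Five 0.3-second waits on a thread pool should finish in about 0.3 seconds instead of 1.5. Swapping the sleep for a pure-Python CPU-bound loop would show little or no speedup, because only one thread can hold the GIL at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(n):
    # Hypothetical stand-in for an I/O call; time.sleep releases the GIL
    time.sleep(0.3)
    return n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map keeps results in input order
    results = list(executor.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

print(results)             # [0, 1, 2, 3, 4]
print(f"{elapsed:.2f} s")  # roughly 0.3 s rather than 1.5 s
```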

### Parallelism vs. Concurrency
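- Parallelism: Executing multiple tasks at the same instant, for example on separate CPU cores.
- Concurrency: Making progress on multiple tasks during overlapping time periods, for example by interleaving them on a single core. The tasks do not necessarily run simultaneously.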

### Challenges
- Data Race: Two threads access the same shared data at the same time, at least one of them writes, and nothing synchronizes the accesses. The result then depends on timing.
- Deadlock: Two or more threads each wait indefinitely for a resource held by another.
- Thread Safety: The property of code that behaves correctly when multiple threads execute it simultaneously.
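
`counter += 1` reads the value, adds one, and writes it back as separate steps. Two threads can therefore interleave those steps and lose updates. Below is a minimal sketch of preventing that data race with `threading.Lock`. The `add_many` function and the iteration counts are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Holding the lock makes the read-modify-write of counter atomic
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all threads before reading the result

print(counter)  # 400000
```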

## Parallel Processing with `joblib`

`joblib` provides a simple interface for running a Python function over many inputs in parallel, using `Parallel` together with `delayed`.

`Parallel(n_jobs=-1)` distributes the workload across all available CPU cores.

Here is a basic example in which `time_intensive_method()` must be called five times. Each call sleeps for one second. Run in parallel on enough cores, the five calls take roughly one second plus worker start-up time, instead of five seconds.

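```python
from joblib import Parallel, delayed
import time

def time_intensive_method(n):
    time.sleep(1)  # simulate a slow task
    return n

# delayed() wraps each call so that Parallel can dispatch it to a worker
results = Parallel(n_jobs=-1)(delayed(time_intensive_method)(n) for n in range(5))
print(results)  # [0, 1, 2, 3, 4]; results keep the input order
```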
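
Process-based workers carry start-up and pickling overhead, and that overhead is wasted when tasks mostly wait. For I/O-bound work, `Parallel` also accepts `prefer="threads"`, which selects a thread-based backend. Below is a minimal sketch. The `wait_and_return` function is hypothetical and sleeps in place of real I/O.

```python
import time
from joblib import Parallel, delayed

def wait_and_return(n):
    # Hypothetical I/O-bound task; sleeping releases the GIL
    time.sleep(0.5)
    return n * 10

start = time.perf_counter()
# prefer="threads" asks joblib for a thread-based backend instead of processes
results = Parallel(n_jobs=4, prefer="threads")(
    delayed(wait_and_return)(n) for n in range(4)
)
elapsed = time.perf_counter() - start

print(results)  # [0, 10, 20, 30]
```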

## Parallel Processing with `multiprocessing`

```{note}
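Special considerations apply on Windows, on macOS, and in Jupyter. Under the `spawn` start method (the default on Windows and macOS), each child process re-imports the main module. The code that launches processes must therefore sit under `if __name__ == '__main__':`. In addition, worker functions defined interactively in a notebook may not be found by the child processes. Check out [Bob Swinkels's blog on the topic](https://bobswinkels.com/posts/multiprocessing-python-windows-jupyter/) for details.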
```

### Creating Processes

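```python
from multiprocessing import Process

def worker(num):
    print(f'Worker {num}')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(i,))
        processes.append(p)
        p.start()  # launch the child process; output order may vary

    for p in processes:
        p.join()  # wait for every child to finish
```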

`Pool(4)` creates a pool of four worker processes. `pool.map` then splits the input list among them and returns the results in the same order as the inputs.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(4) as pool:  # four worker processes
        result = pool.map(square, [1, 2, 3, 4])
    print(result)  # [1, 4, 9, 16]
```
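
`pool.map` passes a single argument to each call. For functions that take several arguments, `Pool.starmap` unpacks each tuple in the input. Below is a minimal sketch with a hypothetical `power(base, exponent)` function. The pool is created under the `__main__` guard so that the sketch also works with the `spawn` start method.

```python
from multiprocessing import Pool

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    pairs = [(2, 3), (3, 2), (4, 0)]
    with Pool(processes=2) as pool:
        # starmap calls power(*pair) for each pair and keeps input order
        powers = pool.starmap(power, pairs)
    print(powers)  # [8, 9, 1]
```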

### Communication Between Processes

`Queue` allows processes to communicate by passing picklable objects through a process-safe FIFO queue.

In this example, a new process starts with the queue as an argument. Inside the worker function, `q.put('Hello from worker')` adds a message to the queue.

```python
from multiprocessing import Process, Queue

def worker(q):
    q.put('Hello from worker') # Add message to the queue

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Hello from worker
    p.join() # Ensures the worker process completes before exiting.
```
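
To send a stream of items rather than a single message, a common pattern is to end the stream with a sentinel value that the consumer can recognize. Below is a minimal sketch, assuming `None` never appears as a real item. The `producer` function is hypothetical. The main process drains the queue before calling `join()`, because the `multiprocessing` documentation warns that joining a process that still has items buffered in a queue can deadlock.

```python
from multiprocessing import Process, Queue

def producer(q, n):
    for i in range(n):
        q.put(i * i)
    q.put(None)  # sentinel: tells the consumer that no more items are coming

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q, 5))
    p.start()
    received = []
    while (item := q.get()) is not None:
        received.append(item)
    p.join()  # join only after draining the queue
    print(received)  # [0, 1, 4, 9, 16]
```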

### Synchronizing Processes
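`multiprocessing` provides the same synchronization primitives as `threading`: locks, semaphores, events, and conditions. In the following example, a `Lock` ensures that only one process prints at a time.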

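```python
from multiprocessing import Process, Lock

def printer(item, lock):
    with lock:  # only one process can hold the lock at a time
        print(item)

if __name__ == '__main__':
    lock = Lock()
    items = ['A', 'B', 'C']
    processes = [Process(target=printer, args=(item, lock)) for item in items]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # wait for all printers to finish
```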
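
The other primitives are also passed to the child processes as arguments. Below is a minimal sketch of an `Event`, which lets one process block until another process signals it. The `waiter` function and the 0.2-second delay are illustrative.

```python
import time
from multiprocessing import Event, Process, Queue

def waiter(event, q):
    event.wait()  # block until the main process calls event.set()
    q.put("event received")

if __name__ == "__main__":
    event = Event()
    q = Queue()
    p = Process(target=waiter, args=(event, q))
    p.start()
    time.sleep(0.2)  # meanwhile, the worker is blocked in event.wait()
    event.set()
    message = q.get(timeout=5)
    p.join()
    print(message)  # event received
```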

### Managing Shared Memory
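`Value` (a single typed value) and `Array` (a fixed-length typed array) are allocated in shared memory, so changes that a child process makes are visible to the parent. The first argument is a type code, such as `'i'` for a signed integer.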

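```python
from multiprocessing import Process, Value

def increment(value):
    value.value += 1  # modify the shared value in place

if __name__ == '__main__':
    num = Value('i', 0)  # 'i' = signed int, initial value 0
    p = Process(target=increment, args=(num,))
    p.start()
    p.join()
    print(num.value)  # 1
```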

Here is how it looks with `Array`:

```python
from multiprocessing import Process, Array

def increment(array):
    for i in range(len(array)):
        array[i] += 1

if __name__ == '__main__':
    nums = Array('i', [0, 1, 2, 3])  # shared array of integers
    p = Process(target=increment, args=(nums,))
    p.start()
    p.join()
    print(nums[:])  # [1, 2, 3, 4]
```
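
`value.value += 1` is a read-modify-write, so it is not atomic when several processes update the same `Value`. By default, `Value` is created with a lock, which is available through `get_lock()`. Below is a minimal sketch in which four processes each add 1000 while holding that lock. The `add_many` function and the counts are illustrative.

```python
from multiprocessing import Process, Value

def add_many(counter, n):
    for _ in range(n):
        # += on a shared Value is not atomic; hold its built-in lock
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)
    workers = [Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000
```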

## Async
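Async programming runs coroutines concurrently on a single thread. Whenever a coroutine `await`s something that is not ready yet, such as a timer or a network response, the event loop switches to another coroutine.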

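```python
import asyncio

async def say_hello():
    print('Hello')
    await asyncio.sleep(1)  # pause without blocking the event loop
    print('World')

# In Jupyter, where an event loop is already running, use `await say_hello()` instead
asyncio.run(say_hello())
```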
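
The benefit appears once several coroutines are waiting at the same time. Below is a minimal sketch using `asyncio.gather`. Three hypothetical `greet` coroutines with delays of 0.3, 0.2, and 0.1 seconds should finish in about 0.3 seconds in total rather than 0.6. `gather` returns the results in the order in which the coroutines were passed.

```python
import asyncio
import time

async def greet(name, delay):
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return f"Hello, {name}"

async def main():
    # gather runs all three coroutines concurrently and keeps argument order
    return await asyncio.gather(
        greet("A", 0.3), greet("B", 0.2), greet("C", 0.1)
    )

start = time.perf_counter()
greetings = asyncio.run(main())
elapsed = time.perf_counter() - start
print(greetings)  # ['Hello, A', 'Hello, B', 'Hello, C']
```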

The next example makes an asynchronous HTTP request with the third-party `aiohttp` library. While the request waits for the response, the event loop is free to run other tasks.

```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()  # read the body without blocking the loop

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'https://example.com')
        print(html)

# In Jupyter, where an event loop is already running, use `await main()` instead
asyncio.run(main())
```
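
When many URLs are fetched this way, it is common to cap the number of requests in flight. Below is a minimal sketch of that pattern with `asyncio.Semaphore` and `asyncio.gather`. To keep it runnable offline, the nested `fake_fetch` coroutine is a hypothetical stand-in that sleeps instead of calling `session.get`. The URLs and the limit of 3 are arbitrary.

```python
import asyncio

async def main(urls, limit=3):
    sem = asyncio.Semaphore(limit)
    active = peak = 0  # track how many fetches run at once

    async def fake_fetch(url):
        nonlocal active, peak
        async with sem:  # at most `limit` fetches get past this point at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.1)  # hypothetical stand-in for session.get(url)
            active -= 1
            return f"content of {url}"

    pages = await asyncio.gather(*(fake_fetch(u) for u in urls))
    return pages, peak

urls = [f"https://example.com/page{i}" for i in range(10)]
pages, peak = asyncio.run(main(urls))
print(len(pages), peak)  # 10 3
```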

## Additional Topics
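- Swifter: Speeds up pandas `apply` by choosing between vectorization, parallel processing with Dask, and a plain `apply` ([GitHub page](https://github.com/jmcarpenter2/swifter))
- Dask: Parallel computing with task scheduling that can handle datasets larger than memory.
- Numba and CuPy: GPU parallel processing (Numba compiles CUDA kernels from Python; CuPy provides NumPy-compatible GPU arrays).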