# Parallel Processing

## Overview

### The GIL (Global Interpreter Lock) in Python
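In CPython, the GIL allows only one thread to execute Python bytecode at a time within a single process. This limits the speedup that threads can give CPU-bound code; I/O-bound threads are less affected because the GIL is released while a thread waits on blocking I/O.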
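
The effect of the GIL on CPU-bound threads can be observed with a small timing sketch. The function names `count_down`, `time_sequential`, and `time_threaded` are made up for this illustration, and the loop size is arbitrary. On a standard GIL-enabled CPython build the threaded run is typically no faster than the sequential one, though exact timings depend on the machine.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats):
    """Run count_down `repeats` times one after another; return seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run count_down in `repeats` threads at once; return seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size for illustration
    print(f"sequential: {time_sequential(n, 4):.2f}s")
    print(f"threaded:   {time_threaded(n, 4):.2f}s")
```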

### CPU-bound vs. I/O-bound Tasks
- CPU-bound tasks: Require significant CPU resources (e.g., mathematical computations).
- I/O-bound tasks: Spend time waiting for input/output operations (e.g., reading files, network requests).
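
For I/O-bound work, threads usually do help, because a waiting thread releases the GIL. The sketch below uses `time.sleep` as a stand-in for a network or disk call; `fake_download` and `run_with_threads` are hypothetical names, not part of any library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id, delay=0.2):
    """Stand-in for an I/O-bound call: sleeping releases the GIL like real I/O."""
    time.sleep(delay)
    return item_id


def run_with_threads(item_ids, max_workers=4):
    """Run fake_download for every id on a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map returns results in input order, regardless of completion order.
        results = list(pool.map(fake_download, item_ids))
    return results, time.perf_counter() - start


results, elapsed = run_with_threads(range(4))
print(results, f"{elapsed:.2f}s")
```

With four workers, the four 0.2-second waits overlap, so the total should be close to 0.2 seconds rather than 0.8.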

### Parallelism vs. Concurrency
- Parallelism: Executing multiple tasks at the same instant, for example on separate CPU cores.
- Concurrency: Making progress on multiple tasks over the same period by interleaving them, without necessarily executing them at the same instant.

### Challenges
- Data Race: Two or more threads access the same shared data at the same time without synchronization, and at least one of them writes, causing inconsistent results.
- Deadlock: Two or more threads wait indefinitely for resources held by each other.
- Thread Safety: Code is thread-safe when it behaves correctly while several threads execute it at once.
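
As a minimal sketch of avoiding a data race, the counter below guards its read-modify-write step with a lock. The `Counter` class and `run_increments` helper are invented for this example.

```python
import threading


class Counter:
    """Shared counter whose increments are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `+= 1` is a read-modify-write; the lock makes it one critical section.
        with self._lock:
            self.value += 1


def run_increments(n_threads=4, per_thread=10_000):
    """Increment one shared Counter from several threads; return the final value."""
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_increments())  # 40000: 4 threads * 10_000 increments each
```

A common way to avoid deadlock when code needs several locks is to acquire them in the same order everywhere.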

## Distributed Parallel Processing with `joblib`

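`joblib` provides a simple interface for running independent function calls in parallel.

`Parallel(n_jobs=-1)` uses all available CPU cores and distributes the calls across them.

Here is a basic example in which `time_intensive_method()` must be called several times. Running the calls in parallel reduces the total wall-clock time: with at least five workers available, the five 5-second calls take roughly 5 seconds plus start-up overhead instead of 25.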

In [1]:
from joblib import Parallel, delayed
import time

def time_intensive_method(n):
    time.sleep(5)
    return n

results = Parallel(n_jobs=-1)(delayed(time_intensive_method)(n) for n in range(5))

## Parallel Processing with `multiprocessing`
```{note}
Special considerations apply when running `multiprocessing` code in Jupyter or on Windows. See
[Bob Swinkels's blog on the topic](https://bobswinkels.com/posts/multiprocessing-python-windows-jupyter/) for details.
```
### Creating Processes

In [2]:
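from multiprocessing import Process

def worker(num):
    print(f'Worker {num}')

# The guard keeps child processes from re-running this block when they
# import the main module under the spawn start method (Windows, macOS).
# In a notebook under spawn, `worker` must also live in an importable module.
if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    # Wait for every child process to finish.
    for p in processes:
        p.join()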

Run from a notebook on a platform that uses the `spawn` start method, the cell above fails in each child process, because the child cannot look up `worker` in `__main__` (one child's traceback shown):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>
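
Whether a cell like this fails depends on the start method in use. The sketch below only inspects that setting; `describe_start_method` is a hypothetical helper, while `multiprocessing.get_start_method()` is the standard-library call.

```python
import multiprocessing as mp


def describe_start_method():
    """Return the active start method and whether targets must be importable."""
    # Note: this call also fixes the default start method if none was set yet.
    method = mp.get_start_method()
    # fork copies the parent's memory, so interactively defined functions exist
    # in the child. spawn and forkserver use fresh interpreters, which have to
    # import the target function by name.
    needs_importable_target = method in ("spawn", "forkserver")
    return method, needs_importable_target


method, needs_importable_target = describe_start_method()
print(f"start method: {method}")
print(f"target must be importable: {needs_importable_target}")
```

When the target has to be importable, moving the function into a separate `.py` file and importing it in the notebook is the usual workaround.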

### Process, Pool, and `map`

In [3]:
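from multiprocessing import Pool

# Under the spawn start method, `square` must be importable by the workers,
# for example by defining it in a separate module such as worker.py.
def square(x):
    return x * x

if __name__ == '__main__':
    # Four worker processes; map splits the inputs among them and keeps order.
    with Pool(4) as pool:
        result = pool.map(square, [1, 2, 3, 4])
    print(result)  # [1, 4, 9, 16]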

Process SpawnPoolWorker-16:
Traceback (most recent call last):
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/Users/jordanbarker/miniconda3/envs/py311/lib/python3.11/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'square' on <module '__main__' (built-in)>
Further pool workers (SpawnPoolWorker-17 onward in this run) produced the same `AttributeError` traceback, interleaved and truncated in the notebook output. The run ended with a `KeyboardInterrupt` raised in workers that were still waiting on the task queue.
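
One possible workaround inside a notebook, swapped in here for illustration, is `multiprocessing.pool.ThreadPool`. It exposes the same `map` interface as `Pool` but runs tasks in threads of the current process, so nothing is pickled. Because threads share the GIL, this is a sketch suited to I/O-bound work, not a replacement for process-based parallelism on CPU-bound code.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


# ThreadPool mirrors Pool's interface but uses threads in this process,
# so `square` does not need to be importable by a child interpreter.
with ThreadPool(4) as pool:
    result = pool.map(square, [1, 2, 3, 4])

print(result)  # [1, 4, 9, 16]
```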

### Communication Between Processes

In [None]:
from multiprocessing import Process, Queue

def worker(q):
    q.put('Hello from worker')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Hello from worker
    p.join()

### Synchronizing Processes
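`multiprocessing` provides `Lock`, `Semaphore`, `Event`, and `Condition` primitives. The example below uses a `Lock` so that only one process prints at a time.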

In [1]:
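from multiprocessing import Process, Lock

def printer(item, lock):
    # Only one process at a time may hold the lock, so prints do not interleave.
    with lock:
        print(item)

if __name__ == '__main__':
    lock = Lock()
    items = ['A', 'B', 'C']
    processes = [Process(target=printer, args=(item, lock)) for item in items]
    for p in processes:
        p.start()
    # Wait for all printers to finish before the parent continues.
    for p in processes:
        p.join()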
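
Semaphores and events follow the same pattern as locks. The sketch below uses the `threading` versions so that it runs anywhere, including notebooks; `multiprocessing` offers equivalents with the same names. The limit `MAX_CONCURRENT` and the helper `run_limited` are assumptions made for this example.

```python
import threading
import time

MAX_CONCURRENT = 2  # hypothetical limit chosen for illustration


def run_limited(n_workers=5):
    """Run n_workers threads; return the peak number working at the same time."""
    slots = threading.Semaphore(MAX_CONCURRENT)
    start_signal = threading.Event()
    state_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker():
        start_signal.wait()  # block until the main thread gives the signal
        with slots:  # take one of the MAX_CONCURRENT slots
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.05)  # simulated work
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    start_signal.set()  # release all waiting workers at once
    for t in threads:
        t.join()
    return state["peak"]


print(run_limited())  # never more than MAX_CONCURRENT
```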

### Managing Shared Memory
`Value` and `Array` allocate ctypes objects in shared memory that child processes can read and modify.

In [2]:
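from multiprocessing import Process, Value

def increment(value):
    # `+=` is a read-modify-write, so hold the Value's lock while updating.
    with value.get_lock():
        value.value += 1

if __name__ == '__main__':
    num = Value('i', 0)  # shared C int, initial value 0
    p = Process(target=increment, args=(num,))
    p.start()
    p.join()
    print(num.value)  # Output: 1 when the child process runs successfully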

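0

The recorded notebook run printed `0`, not `1`. The likely cause is the same `spawn` failure seen in the earlier cells: the child process could not load `increment`, so the shared value was never changed.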
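
`Array` works like `Value` but holds a fixed-length sequence. The sketch below creates both and updates them from the parent process only, which is enough to show the API without starting a child; the variable names are illustrative.

```python
from multiprocessing import Array, Value

counter = Value('i', 0)  # shared signed int
samples = Array('d', [1.0, 2.0, 3.0])  # shared array of doubles

# Both wrappers carry a lock; hold it for read-modify-write updates.
with counter.get_lock():
    counter.value += 1

with samples.get_lock():
    for i in range(len(samples)):
        samples[i] *= 2

print(counter.value)  # 1
print(samples[:])  # [2.0, 4.0, 6.0]
```

Passing `counter` or `samples` as an argument to `Process` gives the child access to the same underlying memory.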


## Async
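Asynchronous programming with `asyncio` runs coroutines concurrently on a single thread: whenever a coroutine awaits, the event loop can switch to another one. This suits I/O-bound work.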

In [7]:
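import asyncio

async def say_hello():
    print('Hello')
    await asyncio.sleep(1)  # yields control to the event loop for one second
    print('World')

# In a script. Inside Jupyter, where an event loop is already running,
# use `await say_hello()` instead.
asyncio.run(say_hello())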

In [6]:
import asyncio
import aiohttp  # third-party package

async def fetch(session, url):
    # Request the URL and return the response body as text.
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'https://example.com')
        print(html)

# In a script. Inside Jupyter, use `await main()` instead.
asyncio.run(main())
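
The benefit of coroutines shows up when several of them wait at once. The sketch below uses `asyncio.gather` with a hypothetical `fetch_stub` coroutine that sleeps in place of a real network call, so it needs no third-party packages.

```python
import asyncio
import time


async def fetch_stub(name, delay):
    """Stand-in for an I/O call: sleeps without blocking the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # gather runs all three coroutines concurrently and keeps argument order.
    results = await asyncio.gather(
        fetch_stub("a", 0.2),
        fetch_stub("b", 0.2),
        fetch_stub("c", 0.2),
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

The three 0.2-second waits overlap, so the total should be close to 0.2 seconds rather than 0.6. As with the cells above, use `await main()` instead of `asyncio.run(main())` inside Jupyter.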

## Additional Topics
- Swifter: Parallel data processing with pandas ([GitHub page](https://github.com/jmcarpenter2/swifter)).
- Dask: Parallel computing with task scheduling; can handle datasets larger than memory.
- Numba and CuPy: GPU parallel processing.