# Multiprocessing: True Parallelism for CPU-Bound Work

We’ll compare **sequential** vs **ProcessPoolExecutor** using the same CPU-heavy task.

Plan:
1) Baseline (serial)
2) Parallel with `map()` — ordered, uniform tasks
3) Parallel with `submit()` + `as_completed()` — streaming, variable tasks

Notes:
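- These cells are designed to run as **scripts**. In notebooks, functions defined in a cell
  live in `__main__`, and worker processes started with `spawn` (the default on Windows and
  macOS) typically cannot import them. Prefer saving to `.py` and running with `python file.py`.
- Always keep worker functions **module-level** so they can be pickled, and guard entry with:
  `if __name__ == "__main__":`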
- Observe CPU monitors (htop/Activity Monitor): all cores should light up in parallel runs.
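
The script-related notes above can be combined into a layout like the following minimal sketch. The file name `mp_demo.py` and the body of `cpu_heavy` are assumptions for illustration only; the real `cpu_heavy` lives in `mp_tasks`, which is not shown in this notebook.

```python
# mp_demo.py -- hypothetical file name, for illustration only
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    """Stand-in for mp_tasks.cpu_heavy (assumed): a pure-Python busy loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main() -> None:
    # Small inputs so the sketch finishes quickly
    tasks = [200_000] * 4
    with ProcessPoolExecutor(max_workers=2) as ex:
        results = list(ex.map(cpu_heavy, tasks))
    print(f"got {len(results)} results")


# Under the spawn start method, worker processes re-import this module;
# the guard keeps them from re-running main() and creating pools of their own.
if __name__ == "__main__":
    main()
```

Because `cpu_heavy` is defined at module level, worker processes can look it up by name when the task is unpickled.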


In [1]:
import os, time, random
from concurrent.futures import ProcessPoolExecutor, as_completed
from mp_tasks import cpu_heavy

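rng = random.Random(0)  # fixed seed so the task sizes are reproducible

n_workers = os.cpu_count() or 4  # logical CPU count; fall back to 4 if unknown

# One task per worker: identical sizes for map(), varied sizes for as_completed()
uniform_tasks = [12_000_000] * n_workers
nonuniform_tasks = [8_000_000 + rng.randint(0, 8_000_000) for _ in range(n_workers)]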

### Baseline — Sequential CPU-Bound Execution

Purpose:
- Establish the runtime when running the CPU-heavy function **back-to-back** on one core.
- This is our **reference** for speedup.

Expect:
- One core near 100%.
- Total time ≈ number of tasks × single-task time.


In [None]:
t0 = time.perf_counter()

for t in uniform_tasks:
    cpu_heavy(t)

dt = time.perf_counter() - t0

print(f"Serial: ran {len(uniform_tasks)} CPU tasks in {dt:.2f}s")

Serial: ran 12 CPU tasks in 12.61s


### Parallel — `ProcessPoolExecutor.map()` (ordered, uniform tasks)

Purpose:
- Run **one task per core** with identical workloads using `map()`.

Key points:
- **Ordered results** (same order as inputs).
- Great for **uniform durations**.
- Consider `chunksize` if tasks are very small.

Expect:
- All cores active.
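- Wall time drops by up to a factor of the worker count. Expect less in practice: `os.cpu_count()` counts logical cores, and process startup and pickling add overhead.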
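
To illustrate the `chunksize` point above: when each task is far cheaper than one round-trip between processes, sending tasks one at a time lets communication dominate the runtime. The sketch below is a rough comparison. `tiny_task`, the task count, and the chunk size of 1,000 are illustrative assumptions, not tuned values.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def tiny_task(x: int) -> int:
    """Hypothetical very small task: far cheaper than one IPC round-trip."""
    return x * x


def timed_map(chunksize: int, items: list[int]) -> tuple[list[int], float]:
    """Run tiny_task over items with the given chunksize; return (results, seconds)."""
    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=2) as ex:
        results = list(ex.map(tiny_task, items, chunksize=chunksize))
    return results, time.perf_counter() - t0


if __name__ == "__main__":
    items = list(range(20_000))
    for cs in (1, 1_000):
        results, dt = timed_map(cs, items)
        # map() preserves input order regardless of chunksize
        assert results == [x * x for x in items]
        print(f"chunksize={cs}: {dt:.3f}s")
```

Larger chunks usually reduce overhead for tiny tasks. For tasks that each run for about a second, as in this notebook, `chunksize=1` is a reasonable choice.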


In [5]:
t0 = time.perf_counter()

with ProcessPoolExecutor() as ex:
    # map() yields results in input order; chunksize=1 is also the default
    results = list(ex.map(cpu_heavy, uniform_tasks, chunksize=1))

dt = time.perf_counter() - t0

print(f"map(): ran {len(uniform_tasks)} CPU tasks in {dt:.2f}s")

map(): ran 12 CPU tasks in 3.05s
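
The two recorded timings can be turned into a speedup and a parallel efficiency. The helper names below are illustrative; the inputs are the figures printed by the serial and `map()` cells (12.61 s and 3.05 s), with 12 workers inferred from the 12 tasks, since the setup cell creates one task per worker.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_workers


s = speedup(12.61, 3.05)
e = efficiency(12.61, 3.05, 12)
print(f"speedup = {s:.2f}x, efficiency = {e:.0%}")
# prints: speedup = 4.13x, efficiency = 34%
```

A speedup of about 4x on 12 workers is well below the ideal. One plausible reason is that `os.cpu_count()` reports logical rather than physical cores, though the notebook does not say which machine produced these numbers.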


### Parallel — `submit()` + `as_completed()` (streaming, variable tasks)

Purpose:
- Schedule tasks individually and **consume results as they finish**.

Key points:
- **Out-of-order** completion reflects true variability in work.
- Ideal for progress reporting and early aggregation.
- `f.result()` re-raises any exception raised in the worker, so wrap it in `try`/`except` to handle failures.

Expect:
- Intermittent progress prints as tasks complete.
- Similar total speedup to `map()`, but more responsive output.


In [None]:
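t0 = time.perf_counter()

with ProcessPoolExecutor() as ex:
    # Submit each task individually; every call returns a Future immediately
    futures = [ex.submit(cpu_heavy, n) for n in nonuniform_tasks]
    done = 0

    # as_completed() yields futures in completion order, not submission order
    for f in as_completed(futures):
        _ = f.result()  # re-raises any exception raised in the worker
        done += 1
        if done % 3 == 0:
            print(f"{done}/{len(nonuniform_tasks)} finished...")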

dt = time.perf_counter() - t0

print(f"submit() + as_completed(): ran {len(nonuniform_tasks)} CPU tasks in {dt:.2f}s")

3/12 finished...
6/12 finished...
9/12 finished...
12/12 finished...
submit() + as_completed(): ran 12 CPU tasks in 3.18s
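
As a sketch of the exception-handling point in this section: `f.result()` re-raises whatever the worker raised, so wrapping it in `try`/`except` lets one failed task be recorded without aborting the loop. The `checked_task` function and its failure condition are hypothetical and stand in for `cpu_heavy`. The future-to-input dictionary is a common pattern for recovering which input a finished future belongs to; it assumes the inputs are distinct.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def checked_task(n: int) -> int:
    """Hypothetical worker: rejects negative input, otherwise sums squares."""
    if n < 0:
        raise ValueError(f"n must be non-negative, got {n}")
    return sum(i * i for i in range(n))


def run_all(inputs: list[int]) -> tuple[dict[int, int], dict[int, str]]:
    """Return (results, errors), each keyed by the input value."""
    results: dict[int, int] = {}
    errors: dict[int, str] = {}
    with ProcessPoolExecutor(max_workers=2) as ex:
        # Map each future back to the input that produced it
        future_to_input = {ex.submit(checked_task, n): n for n in inputs}
        for f in as_completed(future_to_input):
            n = future_to_input[f]
            try:
                results[n] = f.result()  # re-raises the worker's exception here
            except ValueError as exc:
                errors[n] = str(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = run_all([1_000, -1, 2_000])
    print(f"{len(results)} succeeded, {len(errors)} failed: {errors}")
```

Catching a specific exception type keeps unexpected failures visible: anything other than `ValueError` still propagates out of `run_all`.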
