```python
import threading
import time
import os
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
```

### basics

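- GIL: Global Interpreter Lock
    - What does it lock? The Python interpreter itself (CPython's global interpreter state);
        - it ensures that only one thread executes Python bytecode at any given moment.
    - Consequence of the GIL: "pseudo" multithreading, i.e. **GIL contention across threads**.
        - Multiple threads cannot execute Python bytecode at the same time, so multithreading does not speed up CPU-bound tasks.
            - You can still write multithreaded code, but it will not deliver the expected speedup.
        - During I/O operations (e.g. file reads/writes, network requests), a thread releases the GIL so other threads can run; this is why threads still help with I/O-bound work.
- `concurrent.futures`: a higher-level interface for managing threads/processes that also makes it easy to retrieve their return values.
    - When you use `with ThreadPoolExecutor(...) as executor:`, exiting the `with` block automatically calls `executor.shutdown(wait=True)`, which blocks until all submitted tasks (both running and still queued) have finished.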
- `threading` => `concurrent.futures.ThreadPoolExecutor`
- `multiprocessing` => `concurrent.futures.ProcessPoolExecutor`
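A minimal sketch of the GIL behavior described above, assuming a standard CPython build with the GIL enabled (free-threaded builds of 3.13+ behave differently in the CPU-bound case). `time.sleep()` stands in for blocking I/O, and `io_task`, `cpu_task`, and `run_in_threads` are hypothetical helpers. On a typical machine the four 0.2 s sleeps finish in roughly 0.2 s total because the waits overlap, while the threaded CPU-bound run takes about as long as the serial one; exact timings vary by machine.

```python
import threading
import time


def io_task(delay: float) -> None:
    # time.sleep() releases the GIL while waiting, like blocking file/network I/O
    time.sleep(delay)


def cpu_task(n: int) -> int:
    # Pure-Python arithmetic holds the GIL while it runs
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(target, arg, num_threads: int) -> float:
    """Run target(arg) in num_threads threads and return the wall-clock time."""
    threads = [threading.Thread(target=target, args=(arg,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# I/O-bound: the sleeps overlap, so total time is close to a single 0.2 s sleep
io_elapsed = run_in_threads(io_task, 0.2, num_threads=4)

# CPU-bound: compare four serial calls against four threads
serial_start = time.perf_counter()
for _ in range(4):
    cpu_task(200_000)
cpu_serial = time.perf_counter() - serial_start
cpu_threaded = run_in_threads(cpu_task, 200_000, num_threads=4)

print(f"4 x 0.2 s of I/O in threads: {io_elapsed:.2f} s")
print(f"CPU-bound, serial: {cpu_serial:.2f} s; 4 threads: {cpu_threaded:.2f} s")
```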
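A short sketch of the `concurrent.futures` workflow described above, using only the standard library; `square` and `fail` are hypothetical example tasks. It shows the two usual ways to collect return values (`submit` plus `Future.result()`, and `Executor.map`), that an exception raised in a worker resurfaces when you call `result()`, and that every task has finished once the `with` block exits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x: int) -> int:
    return x * x


def fail() -> None:
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future immediately; .result() blocks until that task is done
    futures = {executor.submit(square, x): x for x in range(5)}
    # as_completed() yields futures in completion order, which may differ from submission order
    by_input = {futures[f]: f.result() for f in as_completed(futures)}

    # map() yields results in input order, not completion order
    in_order = list(executor.map(square, range(5)))
# Leaving the with block called executor.shutdown(wait=True),
# so every submitted task has finished by this point.

with ThreadPoolExecutor(max_workers=1) as executor:
    bad = executor.submit(fail)
try:
    bad.result()  # re-raises the worker's exception in the calling thread
except ValueError as exc:
    error_message = str(exc)

print(in_order)  # [0, 1, 4, 9, 16]
print(error_message)
```

`ProcessPoolExecutor` exposes the same `submit`/`map`/`result` interface. When swapping it in, define the task functions at module top level so they can be pickled, and create the pool under `if __name__ == "__main__":`.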

### DP vs. DDP (torch)

- https://pytorch.org/tutorials/beginner/ddp_series_theory.html


| DataParallel                                    | DistributedDataParallel                   |
|-------------------------------------------------|-------------------------------------------|
| More overhead; model is replicated and destroyed at each forward pass | Model is replicated only once             |
| Only supports single-node parallelism           | Supports scaling to multiple machines     |
| Slower; uses multithreading in a single process and runs into Global Interpreter Lock (GIL) contention | Faster (no GIL contention) because it uses multiprocessing |
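The "replicated only once" and "multiprocessing" rows reflect how DDP is organized: one long-lived model replica per process, kept in sync by averaging gradients across processes (an all-reduce) after each backward pass. The toy below is *not* PyTorch code. It is a plain-Python simulation under stated assumptions: a hypothetical 1-D linear model `y = w * x`, equal-sized data shards, and "ranks" run sequentially instead of as real processes. It shows why replicas created once stay identical and end up matching single-process, full-batch training.

```python
# Toy simulation of DDP-style data parallelism (hypothetical example, not the torch API).
# Model: y = w * x, loss = mean((w * x - y) ** 2), so dL/dw = mean(2 * (w * x - y) * x)


def grad(w: float, xs: list[float], ys: list[float]) -> float:
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values: list[float]) -> float:
    # Stand-in for an all-reduce: every rank ends up with the same averaged value
    return sum(values) / len(values)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]  # the true weight is 3.0
world_size, lr, steps = 4, 0.01, 50

# Each rank gets an equal-sized shard of the data (in PyTorch, DistributedSampler plays this role)
shards = [(xs[r::world_size], ys[r::world_size]) for r in range(world_size)]

# Replicas are created once, before training, with identical initial weights
replicas = [0.0] * world_size
w_single = 0.0  # single-process, full-batch baseline

for _ in range(steps):
    # Each rank computes a gradient on its own shard only
    local_grads = [grad(replicas[r], *shards[r]) for r in range(world_size)]
    g = all_reduce_mean(local_grads)
    # Every rank applies the same averaged gradient, so the replicas never drift apart
    replicas = [w - lr * g for w in replicas]
    w_single -= lr * grad(w_single, xs, ys)

print(replicas[0], w_single)
```

Because the shards are equal-sized, the average of the per-shard mean gradients equals the full-batch mean gradient, which is why the replicated run tracks the single-process baseline.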
