## Python - Concurrency

---

## Global Interpreter Lock (GIL)

The GIL is a **mutex** in CPython that ensures only one thread executes Python bytecode at a time, even on multi-core systems.

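The GIL exists because:
* CPython manages memory with reference counting, and refcount updates must be atomic
* Unsynchronized refcount updates from multiple threads can leak memory or free objects that are still in use, which leads to crashes
* A single global lock keeps the implementation simple and protects interpreter internals from race conditions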
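As a rough illustration of the reference counts the GIL protects, the sketch below uses `sys.getrefcount` to watch a count grow as new references are created. The helper name `refcount_delta` is made up for this example, and the absolute numbers reported by `sys.getrefcount` are a CPython implementation detail, so only the difference is compared.

```python
import sys


def refcount_delta(extra_refs: int = 3) -> int:
    """Return how much an object's refcount grows when extra references are created."""
    obj = object()  # fresh object, so no other code holds a reference to it
    before = sys.getrefcount(obj)
    holders = [obj] * extra_refs  # each list slot holds one more reference
    after = sys.getrefcount(obj)
    return after - before


print(refcount_delta())
```

Every assignment, argument pass, and container insert changes a counter like this one, which is why unsynchronized updates from several threads would be dangerous.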

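Free-threaded CPython builds, which run without the GIL, were introduced as experimental in Python 3.13 (PEP 703) and became officially supported in Python 3.14. One to investigate.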

In [1]:
import asyncio
import logging
import os
import sysconfig
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

In [2]:
log = logging.getLogger(__name__)
logging.basicConfig(
    level=logging.DEBUG, format="%(asctime)s [%(levelname)s] %(message)s"
)

## System Status

In [3]:
os.cpu_count()

4

In [4]:
# Py_GIL_DISABLED is a build-time flag: 1 on free-threaded builds, otherwise 0 or None.
gil_disabled = sysconfig.get_config_var("Py_GIL_DISABLED")
log.info(f"Python GIL disabled: {gil_disabled == 1}")

2026-01-16 12:25:25,074 [INFO] Python GIL disabled: False
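
The `Py_GIL_DISABLED` config variable only describes how the interpreter was built. A free-threaded build can still run with the GIL switched back on, so a runtime check is a useful complement. The sketch below assumes `sys._is_gil_enabled()` (available from Python 3.13, and a private API) and falls back to "GIL enabled" on older versions; the helper name `gil_status` is made up for this example.

```python
import sys
import sysconfig


def gil_status() -> dict[str, bool]:
    """Report the build-time and runtime GIL state of the current interpreter."""
    free_threaded_build = sysconfig.get_config_var("Py_GIL_DISABLED") == 1
    # sys._is_gil_enabled() exists only on Python 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(is_gil_enabled()) if is_gil_enabled is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(gil_status())
```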


## Thread & Process Pools

TL;DR: use multithreading for I/O-bound tasks and multiprocessing for CPU-bound tasks.

| Aspect | Multithreading | Multiprocessing |
|--------|----------------|-----------------|
| **Uses** | **Threads** within **one process** | **Separate processes** on **multiple cores** |
| **Memory** | **Shared** (fast data sharing) | **Separate** (no sharing by default) |
| **GIL** | **Limited** (1 thread executes bytecode at a time) | **Bypassed** (each process has its own GIL) |
| **Parallelism** | **Concurrency** only for Python bytecode (pseudo-parallel) | **True parallelism** (multiple cores) |
| **Overhead** | **Low** (lightweight threads) | **High** (full process startup) |
| **Best for** | **I/O-bound** (waiting tasks) | **CPU-bound** (computation tasks) |
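
To make the "shared memory" row concrete, here is a minimal sketch in which several threads update one dictionary that the main thread then reads. The function name `run_shared_counter` and its parameters are made up for this example. Note that the GIL does not make a compound operation such as `+=` atomic, so the sketch guards the update with a `threading.Lock`.

```python
import threading


def run_shared_counter(num_threads: int = 4, increments: int = 1000) -> int:
    """Increment one shared counter from several threads and return the final value."""
    counter = {"value": 0}  # lives in the process's memory, visible to every thread
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(increments):
            with lock:  # read-modify-write must be protected, even with the GIL
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


print(run_shared_counter())
```

With separate processes, each worker would update its own copy of `counter`, and the parent would need a queue, pipe, or shared-memory object to see the results.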


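**What Happens During `process.start()` (with the `spawn` start method)**

1. **OS allocates new process** (memory, PID, resources)
2. **Starts a fresh Python interpreter** in the child (memory cost grows with the modules it imports)
3. **Serializes function + data** in the parent (pickling)
4. **Re-imports the main module** in the child under the name `__mp_main__`, so code guarded by `if __name__ == '__main__':` does not run again
5. **Deserializes in child** (unpickling)
6. **Executes target function**

With `fork`, steps 2 to 5 are largely skipped because the child starts as a copy of the parent process.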
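
The pickling and unpickling steps can be sketched without starting a process at all. The example below round-trips a function and its arguments through `pickle`, roughly as a worker hand-off would, and shows that a lambda cannot be serialized. The helper names `roundtrip_call` and `can_pickle` are made up for this illustration; `multiprocessing` wraps this in more machinery than shown here.

```python
import math
import pickle


def roundtrip_call(func, args):
    """Pickle a (function, args) pair, unpickle it, and call the restored function."""
    payload = pickle.dumps((func, args))  # what the parent would send
    restored_func, restored_args = pickle.loads(payload)  # what the child would rebuild
    return restored_func(*restored_args)


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(roundtrip_call(math.factorial, (5,)))
print(can_pickle(math.factorial))
print(can_pickle(lambda x: x))
```

Functions are pickled by reference (module plus qualified name), which is why targets passed to a process pool need to be importable, module-level functions.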

**Start Methods (Speed vs Safety)**

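| Method | Platforms | Speed | Safety | What it does |
|--------|-----------|-------|--------|--------------|
| **`spawn`** (default on Windows and macOS) | All | **Slowest** | ✅ Safest | Starts a fresh interpreter; the child re-imports the main module |
| **`fork`** (default on Linux through Python 3.13) | Unix | **Fastest** | ⚠️ Risky with threads | Copies the parent process, including its memory |
| **`forkserver`** (default on Linux from Python 3.14) | Unix | Medium | ✅ Safe | Requests forks from a clean, single-threaded server process |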
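
A short sketch of how to inspect and choose a start method. `multiprocessing.get_all_start_methods()`, `get_start_method()`, and `get_context()` are standard-library calls, as is the `mp_context` parameter of `ProcessPoolExecutor`; the helper names `describe_start_methods` and `make_spawn_pool` are made up for this example. Calling `get_start_method()` also fixes the default for the rest of the session.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def describe_start_methods() -> dict:
    """Return the start methods this platform supports and the current default."""
    return {
        "available": mp.get_all_start_methods(),
        "default": mp.get_start_method(),
    }


def make_spawn_pool(max_workers: int = 2) -> ProcessPoolExecutor:
    """Build a pool that always uses 'spawn', regardless of the platform default."""
    ctx = mp.get_context("spawn")
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)


if __name__ == "__main__":
    print(describe_start_methods())
```

Passing an explicit context keeps the choice local to one pool, rather than changing the process-wide default with `set_start_method()`.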

## CPU Task

In [5]:
def cpu_work(n: int = 10**6) -> int:
    total = 0
    for i in range(n):
        total += i**2
    return total

In [6]:
%%timeit -n 5 -r 1

# Single-threaded - CPU-bound
for _ in range(os.cpu_count()):
    cpu_work()

261 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


In [7]:
%%timeit -n 5 -r 1

# Multi-threaded - CPU-bound
with ThreadPoolExecutor(os.cpu_count()) as executor:
    futures = [executor.submit(cpu_work) for _ in range(os.cpu_count())]
    [f.result() for f in futures]

264 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


In [8]:
%%timeit -n 5 -r 1

# Multi-process - CPU-bound
with ProcessPoolExecutor(os.cpu_count()) as executor:
    futures = [executor.submit(cpu_work) for _ in range(os.cpu_count())]
    [f.result() for f in futures]

137 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


## I/O Task

In [9]:
def io_work(t: float = 0.005) -> None:
    time.sleep(t)

In [10]:
%%timeit -n 5 -r 1

# Single-threaded - I/O-bound (sequential execution)
for _ in range(os.cpu_count()):
    io_work()

20.3 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


In [11]:
%%timeit -n 5 -r 1

# Multi-threaded - I/O-bound (os.cpu_count() threads, concurrent execution)
with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
    futures = [executor.submit(io_work) for _ in range(os.cpu_count())]
    [f.result() for f in futures]

5.73 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


In [12]:
%%timeit -n 5 -r 1

# Multi-process - I/O-bound (os.cpu_count() processes)
with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
    futures = [executor.submit(io_work) for _ in range(os.cpu_count())]
    [f.result() for f in futures]

20.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 5 loops each)


## Coroutines

A coroutine is a special type of function that can pause its execution (suspend) and resume later, allowing other coroutines to run concurrently. Coroutines enable cooperative multitasking: each coroutine explicitly yields control, allowing multiple asynchronous operations to be interleaved efficiently on a single thread.

Why use them?
* They are more lightweight than threads (lower memory and startup cost).
* They facilitate asynchronous programming, especially for I/O-bound tasks.
* They run within one thread and share that thread's event loop.
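
To show cooperative multitasking on a single thread, the sketch below runs two coroutines that record each step and then yield with `await asyncio.sleep(0)`. The names `worker` and `interleave` are made up for this example. It uses `asyncio.run()` so that it works as a plain script; inside a notebook cell, `await interleave()` would be used instead because an event loop is already running.

```python
import asyncio
import threading


async def worker(name: str, steps: int, events: list) -> None:
    """Record each step, then hand control back to the event loop."""
    for step in range(steps):
        events.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # explicit yield point: lets the other coroutine run


async def interleave() -> list:
    """Run two workers concurrently and return the order in which their steps ran."""
    events: list = []
    await asyncio.gather(worker("a", 2, events), worker("b", 2, events))
    return events


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(interleave()):
        print(name, step, thread_id)
```

The steps of `a` and `b` alternate, and every entry carries the same thread id: the interleaving comes from the `await` points, not from extra threads.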

In [13]:
async def fake_io_request(t: float = 0.005) -> None:
    await asyncio.sleep(t)


async def main() -> None:
    tasks = [fake_io_request() for _ in range(os.cpu_count())]
    await asyncio.gather(*tasks)


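times = []
for _ in range(5):
    start = time.perf_counter()
    # Top-level await works because Jupyter already runs an event loop;
    # in a plain script, use asyncio.run(main()) instead.
    await main()
    end = time.perf_counter()
    times.append(end - start)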

avg_time = sum(times) / len(times)
var = sum((x - avg_time) ** 2 for x in times) / len(times)
std = var**0.5

log.info(f"Average runtime: {avg_time * 1000:.5f} ms ± {std * 1000:.5f} ms")

2026-01-16 12:25:28,762 [INFO] Average runtime: 5.19356 ms ± 0.04075 ms
