From ada709801bacc4c9410ee91a77693c0d200c3d1f Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 18 Apr 2026 14:50:51 +0000 Subject: [PATCH] docs: migrate all wiki pages to mkdocs - Add docs/architecture.md (radix router, compiled DI, middleware chain, zero-copy JSON, uvloop internals) - Add docs/benchmark-methodology.md (benchmark types, hardware specs, direct ASGI protocol, CI workflow) - Add docs/concepts/sub-interpreters.md (PEP 684/734, concurrency API, SubInterpreterPool internals, examples) - Enhance docs/migration-from-fastapi.md with field aliases (msgspec rename) and migration checklist - Update mkdocs.yml nav to include all four new/updated pages Wiki pages are now redundant as all content lives in the mkdocs site. https://claude.ai/code/session_01SF7ikYNfez1eVEK9d473Ef --- docs/architecture.md | 316 +++++++++++++++++++++++++++++ docs/benchmark-methodology.md | 217 ++++++++++++++++++++ docs/concepts/sub-interpreters.md | 323 ++++++++++++++++++++++++++++++ docs/migration-from-fastapi.md | 29 +++ mkdocs.yml | 3 + 5 files changed, 888 insertions(+) create mode 100644 docs/architecture.md create mode 100644 docs/benchmark-methodology.md create mode 100644 docs/concepts/sub-interpreters.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..e4df88a --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,316 @@ +# Architecture + +This page explains FasterAPI's internal architecture. Read this before contributing — it covers why each component exists and how they interact. + +--- + +## Request Lifecycle + +``` +Incoming ASGI Request + │ + ▼ +┌──────────────────────────┐ +│ Middleware Chain │ Built once at first request, cached. +│ (CORS → GZip → ...) │ Each middleware wraps the next as an ASGI app. +└────────────┬─────────────┘ + │ + ▼ +┌──────────────────────────┐ +│ Faster.__call__ │ ASGI entry point. Routes to HTTP, WebSocket, +│ │ or Lifespan handler based on scope["type"]. 
+└────────────┬─────────────┘ + │ (HTTP) + ▼ +┌──────────────────────────┐ +│ RadixRouter.resolve() │ O(k) path lookup (k = path segments). +│ │ Returns (handler, path_params, metadata). +└────────────┬─────────────┘ + │ + ▼ +┌──────────────────────────┐ +│ _resolve_handler() │ Iterates pre-compiled _ParamSpec tuples. +│ │ Injects dependencies, parses params. +│ │ Zero per-request introspection. +└────────────┬─────────────┘ + │ + ▼ +┌──────────────────────────┐ +│ Handler executes │ async def → event loop (uvloop) +│ │ plain def → process pool (CPU auto-detect) +└────────────┬─────────────┘ + │ + ▼ +┌──────────────────────────┐ +│ _send_response() │ dict/Struct → msgspec.json.encode → bytes +│ │ Zero-copy: Rust encodes directly to bytes. +│ │ Pre-encoded headers avoid repeated .encode(). +└────────────┬─────────────┘ + │ + ▼ + Response sent + │ + ▼ (if any) + BackgroundTasks.run() +``` + +--- + +## 1. Radix Tree Router + +**File:** `FasterAPI/router.py` + +### Why not regex? + +FastAPI (via Starlette) compiles each route path into a regex pattern and checks them sequentially on every request — O(n) where n = total routes. This is fine for 10 routes, but at 100+ routes the linear scan becomes measurable overhead. 
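To make the linear-scan cost concrete, here is an illustrative sketch of sequential regex dispatch — not Starlette's actual matcher, and the route names are made up:

```python
import re

# Illustrative only: 100 made-up routes, each compiled to a regex,
# checked one by one — O(n) in the route count.
routes = [re.compile(rf"^/resource{i}/(?P<id>[^/]+)$") for i in range(100)]

def resolve(path: str):
    for index, pattern in enumerate(routes):  # linear scan
        match = pattern.match(path)
        if match:
            return index, match.groupdict()
    return None

# Matching the last registered route touches all 100 patterns first.
print(resolve("/resource99/abc"))  # → (99, {'id': 'abc'})
```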
+ +### How the radix tree works + +Routes are decomposed into segments and inserted into a tree at startup: + +``` +Registered routes: + GET /users + GET /users/{id} + GET /users/{id}/posts + GET /health + GET /orgs/{org_id}/teams/{team_id} + +Tree structure: + + root + / \ + users health → handler + | + [leaf] → handler (GET /users) + | + {id} → handler (GET /users/{id}) + | + posts → handler (GET /users/{id}/posts) + + orgs + | + {org_id} + | + teams + | + {team_id} → handler +``` + +### Resolution algorithm + +The `_walk` method uses **iterative traversal** (not recursion) for the common case: + +```python +while idx < n: + seg = segments[idx] + child = node.children.get(seg) # Try static match first + if child is not None: + node = child; idx += 1; continue + param_child = node.children.get("*") # Then try param wildcard + if param_child is not None: + params[param_child.param_name] = seg + node = param_child; idx += 1; continue + return None # No match +``` + +**Key design choices:** + +- Static children are checked before param wildcards (most routes are static segments) +- `__slots__` on `RadixNode` eliminates per-instance `__dict__` — less memory, faster attribute access +- Path splitting uses a list comprehension (`[s for s in path.split("/") if s]`) — this hits CPython's fast C path + +### Complexity + +| Operation | Radix Tree | Regex (Starlette) | +|---|---|---| +| Lookup | O(k) where k = path segments | O(n) where n = total routes | +| 100 routes | ~3 segment checks | ~50 regex evaluations (avg) | + +--- + +## 2. Compiled Dependency Injection + +**File:** `FasterAPI/dependencies.py` + +### The problem + +FastAPI calls `inspect.signature()` and `typing.get_type_hints()` on **every request** to figure out what a handler needs. These are expensive reflection operations. 
+ +### The solution: compile once, resolve many + +At route registration time, `compile_handler(func)` introspects the handler once and produces a tuple of `_ParamSpec` objects: + +``` +Route registration (startup): + @app.get("/users/{id}") + async def get_user(id: str = Path(), q: str = Query(None)): + ... + + compile_handler(get_user) is called immediately. + Returns: ( + _ParamSpec(name="id", kind=_KIND_PATH, ...), + _ParamSpec(name="q", kind=_KIND_QUERY, ...), + ), is_async=True +``` + +At request time, `_resolve_from_specs` iterates the pre-compiled tuple with integer kind comparisons — no reflection, no isinstance chains: + +``` +Request time (hot path): + for spec in specs: + if spec.kind == _KIND_PATH: kwargs[spec.name] = path_params[spec.name] + elif spec.kind == _KIND_QUERY: kwargs[spec.name] = request.query_params.get(...) + ... +``` + +### _ParamSpec design + +```python +class _ParamSpec: + __slots__ = ("name", "kind", "annotation", "default", "marker") +``` + +- `kind` is an integer constant (0–11), not an enum — integer comparison is faster than `isinstance` +- `__slots__` avoids `__dict__` overhead +- `@lru_cache(maxsize=512)` on `compile_handler` means the same function is never introspected twice +- Dependencies (`Depends(...)`) are compiled recursively — the entire dependency tree is pre-resolved + +--- + +## 3. Request Object — Lazy Parsing + +**File:** `FasterAPI/request.py` + +Most handlers only need 1-2 request attributes (e.g., path params and body). Parsing all headers, query params, and cookies on every request wastes time. 
+ +FasterAPI's `Request` uses lazy properties: + +```python +@property +def headers(self) -> dict[str, str]: + h = self._headers # Check cache + if h is None: # First access → parse + raw = self._scope.get("headers", []) + h = {k.decode("latin-1").lower(): v.decode("latin-1") for k, v in raw} + self._headers = h # Cache for subsequent access + return h +``` + +The same pattern applies to `query_params`, `cookies`, and `body`. If a handler never accesses `request.cookies`, they're never parsed. + +--- + +## 4. Middleware Chain + +**File:** `FasterAPI/middleware.py` + +### How the chain is built + +Middleware is registered via `app.add_middleware(CORSMiddleware, allow_origins=["*"])` and stored as `(class, kwargs)` pairs. + +On the first request, the chain is built **once** by wrapping the core app in reverse order: + +``` +Registration order: [CORS, GZip, TrustedHost] +Build order (reversed): TrustedHost(GZip(CORS(app))) + +Request flow: + → TrustedHost.__call__ (checks Host header) + → GZip.__call__ (buffers response for compression) + → CORS.__call__ (injects CORS headers) + → app._asgi_app (route dispatch) +``` + +The built chain is cached in `self._middleware_app`. Adding middleware after the first request invalidates the cache (sets it to `None`). + +### ASGI middleware pattern + +Each middleware is a valid ASGI app that wraps another ASGI app: + +```python +class CORSMiddleware(BaseHTTPMiddleware): + def __init__(self, app, **kwargs): + self.app = app # The next app in the chain + + async def __call__(self, scope, receive, send): + if scope["type"] != "http": + await self.app(scope, receive, send) # Pass through non-HTTP + return + await self.dispatch(scope, receive, send) +``` + +--- + +## 5. Response Path — Zero-Copy JSON + +**File:** `FasterAPI/app.py` (module-level `_send_response`) + +When a handler returns a dict or msgspec Struct, the response path is: + +``` +dict → msgspec.json.encode(dict) → bytes → ASGI send + +One allocation. 
msgspec's Rust core converts Python objects directly +to JSON bytes without an intermediate string step. +``` + +Compare to the standard approach: + +``` +dict → json.dumps(dict) → str → str.encode("utf-8") → bytes → send + +Three allocations. Each creates a new Python object the GC must track. +``` + +Additionally, common header values are pre-encoded as module-level bytes constants: + +```python +_CT_JSON = b"application/json" +_HEADER_CT = b"content-type" +``` + +This avoids calling `.encode()` on every response. + +--- + +## 6. Event Loop — uvloop + +**File:** `FasterAPI/concurrency.py` + +uvloop replaces Python's default asyncio event loop with one backed by libuv (the same C library that powers Node.js). It handles I/O polling, callback scheduling, and timer management in C instead of Python. + +```python +def install_event_loop() -> str: + try: + import uvloop + if _PY312_PLUS: + asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) + else: + uvloop.install() + return "uvloop" + except ImportError: + return "asyncio" +``` + +This is called at module import time (`_event_loop = install_event_loop()`) so it's set before any async code runs. 
+ +--- + +## File Map + +| File | Responsibility | +|---|---| +| `app.py` | ASGI entry point, route registration, HTTP/WS/lifespan dispatch | +| `router.py` | Radix tree router + FasterRouter (sub-router/blueprint) | +| `dependencies.py` | Compiled DI, `Depends()`, param resolution | +| `request.py` | Lazy-parsed Request object | +| `response.py` | Response classes (JSON, HTML, Streaming, File) | +| `middleware.py` | CORS, GZip, TrustedHost, HTTPS redirect | +| `concurrency.py` | uvloop, sub-interpreters, thread/process pools | +| `exceptions.py` | HTTPException, validation errors, default handlers | +| `params.py` | Path, Query, Body, Header, Cookie, File, Form descriptors | +| `background.py` | BackgroundTasks (post-response execution) | +| `websocket.py` | WebSocket connection handler | +| `datastructures.py` | UploadFile, FormData | +| `openapi/` | Auto-generated OpenAPI 3.0 schema + Swagger/ReDoc UI | diff --git a/docs/benchmark-methodology.md b/docs/benchmark-methodology.md new file mode 100644 index 0000000..c26a52a --- /dev/null +++ b/docs/benchmark-methodology.md @@ -0,0 +1,217 @@ +# Benchmark Methodology + +This page explains how FasterAPI's performance claims are measured, what hardware was used, and how to reproduce the results yourself. + +--- + +## Benchmark Types + +We run three categories of benchmarks: + +| Category | What it measures | How | +|---|---|---| +| **Component** | Individual operations (routing, JSON encode/decode) | Tight loop, `time.perf_counter()` | +| **Framework (Direct ASGI)** | Full request cycle without network overhead | Synthetic ASGI scope → `app(scope, receive, send)` | +| **End-to-End HTTP** | Real HTTP performance including server + network | `httpx.AsyncClient` against a live uvicorn server | + +The README numbers come from **Framework (Direct ASGI)** benchmarks — these isolate the framework's actual performance without conflating uvicorn's overhead. 
+ +--- + +## Hardware & Environment + +### README Baseline (Python 3.13.7) + +``` +Machine: Apple Silicon (M-series) +OS: macOS +Python: 3.13.7 +uvloop: 0.21.x +msgspec: 0.19.x +FastAPI: 0.115.x (comparison target) +Pydantic: 2.10.x +``` + +### CI Benchmark Runner + +``` +Machine: GitHub Actions ubuntu-latest (2-core x86_64) +OS: Ubuntu 22.04 +Python: 3.13 +``` + +!!! note + CI runners are significantly slower than local Apple Silicon. The CI benchmark workflow compares **speedup ratios** (FasterAPI/FastAPI), not raw req/s, to account for hardware differences. + +--- + +## Direct ASGI Benchmark (Primary) + +This is the main benchmark used for the README results. It bypasses the network layer entirely. + +### What it does + +1. Creates both a FasterAPI and FastAPI app with identical routes +2. Constructs synthetic ASGI `scope`, `receive`, `send` functions +3. Calls `await app(scope, receive, send)` in a tight loop +4. Measures throughput in requests/second + +### Routes tested + +| Endpoint | Method | Purpose | +|---|---|---| +| `/health` | GET | Minimal handler — measures framework dispatch overhead | +| `/users/{id}` | GET | Path parameter extraction + JSON response | +| `/users` | POST | JSON body parsing + validation + response | + +### Protocol + +``` +1. Warm-up phase: 500 requests (not timed) +2. Measured phase: 50,000 requests +3. Timing: time.perf_counter() around the measured phase +4. Result: requests / elapsed_seconds +``` + +Both frameworks are benchmarked in the same process, same event loop, same Python version. This eliminates environmental variance. + +### Code + +The benchmark is in `benchmarks/compare.py`. 
Run with `--direct` flag: + +```bash +python benchmarks/compare.py --direct +``` + +--- + +## Component Benchmarks + +### Routing (Radix Tree vs Regex) + +``` +Setup: + - 100 routes registered (50 static, 30 single-param, 20 multi-param) + - 3 representative lookup paths tested + - 500,000 iterations × 3 paths = 1,500,000 total lookups + +Measured: ops/second for each router implementation +``` + +### JSON Encoding (msgspec vs json.dumps) + +``` +Setup: + - Dict payload: {"id": 42, "name": "test", "email": "t@t.com", "scores": [1,2,3]} + - 1,000,000 iterations + +Measured: encode ops/second +``` + +### JSON Decode + Validate (msgspec vs Pydantic v2) + +``` +Setup: + - Same payload as encoding, as raw bytes + - Decoded into a typed Struct/BaseModel + - 1,000,000 iterations + +Measured: decode+validate ops/second +``` + +--- + +## How to Reproduce + +### Prerequisites + +```bash +cd FasterAPI +python -m venv .venv +source .venv/bin/activate +pip install -e ".[dev,benchmark]" +``` + +### Run all benchmarks + +```bash +python benchmarks/compare.py +``` + +### Run only the direct ASGI benchmark (fastest, most accurate) + +```bash +python benchmarks/compare.py --direct +``` + +### Run individual component benchmarks + +```python +import time +import msgspec +import json + +data = {"id": 42, "name": "test", "scores": [1, 2, 3]} +N = 1_000_000 + +# msgspec +start = time.perf_counter() +for _ in range(N): + msgspec.json.encode(data) +msgspec_rps = N / (time.perf_counter() - start) + +# stdlib json +start = time.perf_counter() +for _ in range(N): + json.dumps(data).encode() +json_rps = N / (time.perf_counter() - start) + +print(f"msgspec: {msgspec_rps:,.0f} ops/s") +print(f"json: {json_rps:,.0f} ops/s") +print(f"speedup: {msgspec_rps/json_rps:.1f}x") +``` + +--- + +## CI Benchmark Workflow + +Every PR to `stage` or `master` triggers an automated benchmark that: + +1. Runs the direct ASGI benchmark (50,000 requests per endpoint) +2. 
Runs the routing benchmark (1.5M lookups) +3. Posts a comment on the PR with results + +### How to read the PR comment + +``` +| Endpoint | FasterAPI | FastAPI | Speedup | vs Baseline | +|------------------|----------------|---------------|---------|-------------| +| GET /health | 150,000/s | 22,000/s | 6.82x | ⚪ -0.4% | +| GET /users/{id} | 128,000/s | 15,000/s | 8.53x | ⚪ -2.3% | +| POST /users | 95,000/s | 13,000/s | 7.31x | 🟢 +2.2% | +``` + +- **Speedup** = FasterAPI req/s ÷ FastAPI req/s +- **vs Baseline** compares the speedup ratio against the README baseline +- 🟢 = speedup improved by >2% +- ⚪ = within noise (±5%) +- 🔴 = speedup regressed by >5% — needs investigation + +!!! note + Raw req/s on CI runners will be 2-3x lower than local Apple Silicon. This is expected. The **speedup ratio** is hardware-independent and is what matters. + +--- + +## Fairness & Methodology Notes + +1. **Same process, same loop** — Both frameworks run in the same Python process and event loop. No one gets a "warm" advantage. + +2. **Warm-up phase** — 500 requests are run before timing starts to ensure JIT-like optimizations (e.g., `__pycache__`, `lru_cache` warming) are accounted for. + +3. **Identical routes** — Both apps define the exact same endpoints with equivalent handler logic. The FasterAPI handler uses `msgspec.Struct`, FastAPI uses `pydantic.BaseModel`. + +4. **No GC interference** — The benchmark runs long enough (50K requests) that GC pauses are amortized and don't skew results. + +5. **Deterministic input** — The same request payload is used for every iteration. No randomness that could cause branch prediction differences. + +6. **Open source** — All benchmark code is in `benchmarks/compare.py`. Run it yourself. 
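The Speedup and vs-Baseline columns in the PR comment reduce to simple ratios. A sketch of the arithmetic — the numbers below are hypothetical, not actual benchmark results:

```python
def speedup(faster_rps: float, fastapi_rps: float) -> float:
    return faster_rps / fastapi_rps

def vs_baseline(current: float, baseline: float) -> float:
    # Percent change of the speedup *ratio* — hardware-independent,
    # unlike raw req/s.
    return (current - baseline) / baseline * 100

current = speedup(150_000, 22_000)   # hypothetical CI result: ~6.82x
delta = vs_baseline(current, 6.85)   # hypothetical README baseline ratio
print(f"{current:.2f}x, {delta:+.1f}% vs baseline")
```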
diff --git a/docs/concepts/sub-interpreters.md b/docs/concepts/sub-interpreters.md new file mode 100644 index 0000000..e41f5e9 --- /dev/null +++ b/docs/concepts/sub-interpreters.md @@ -0,0 +1,323 @@ +# Sub-Interpreters Guide + +This page is a deep dive into Python's sub-interpreter support (PEP 684 / PEP 734) and how FasterAPI uses it for CPU-bound parallelism. + +--- + +## The GIL Problem + +Python's Global Interpreter Lock (GIL) prevents true parallel execution of Python bytecode across threads in the same process. For I/O-bound work this doesn't matter (threads release the GIL during I/O), but for CPU-bound work it's a hard ceiling: + +``` +Thread 1: [compute]---[wait for GIL]---[compute]---[wait for GIL] +Thread 2: [wait for GIL]---[compute]---[wait for GIL]---[compute] + ↑ + Only one thread runs Python at a time +``` + +Before Python 3.13, the only way to achieve CPU parallelism in Python was `multiprocessing` / `ProcessPoolExecutor`: + +``` +Process 1 (own GIL): [compute]──────────────────────── +Process 2 (own GIL): [compute]──────────────────────── + ↑ + True parallel, but expensive: + - Fork/spawn overhead (~100ms) + - Memory duplication + - Arguments must be picklable + - No shared state +``` + +--- + +## PEP 684 & PEP 734: Per-Interpreter GIL + +### PEP 684 (Python 3.12) — The Foundation + +Added per-interpreter GIL support at the C level. Each sub-interpreter can optionally have its own GIL, meaning Python bytecode in different interpreters runs truly in parallel. + +**Python 3.12 status:** Available via the C API only. No Python-level access. + +### PEP 734 (Python 3.13) — The Python API + +Exposed sub-interpreters via the `interpreters` stdlib module, making them accessible from pure Python. + +```python +import interpreters + +interp = interpreters.create() +interp.call(some_function, arg1, arg2) +interp.close() +``` + +**Python 3.13 status:** Experimental. The `interpreters` module may not be available in all builds. 
FasterAPI detects this at import time and falls back gracefully. + +--- + +## How Sub-Interpreters Compare + +``` + ┌──────────────┬─────────────┬──────────────────┐ + │ Threads │ Processes │ Sub-Interpreters │ +├───────────────────────┼──────────────┼─────────────┼──────────────────┤ +│ True CPU parallelism │ No │ Yes │ Yes │ +│ Startup cost │ ~0.1ms │ ~100ms │ ~1ms │ +│ Memory overhead │ Low │ High │ Medium │ +│ Shared state │ Yes │ No │ No │ +│ Argument passing │ Direct │ Pickle │ Shareable* │ +│ GIL │ Shared │ Per-proc │ Per-interp │ +│ Best for │ I/O-bound │ CPU-bound │ CPU-bound │ +└───────────────────────┴──────────────┴─────────────┴──────────────────┘ + +* "Shareable" means types that implement the buffer protocol (bytes, + memoryview, some numeric types). Complex objects need serialization. +``` + +Sub-interpreters are ~100x lighter than processes. They're the closest Python analog to Go goroutines: lightweight, parallel, share-nothing by default. + +--- + +## FasterAPI's Concurrency API + +FasterAPI provides three concurrency primitives. 
Use the right one for your workload: + +### `run_in_subinterpreter(func, *args)` — CPU-bound work + +```python +from FasterAPI.concurrency import run_in_subinterpreter + +async def compute_hash(data: bytes) -> str: + return await run_in_subinterpreter(hashlib.sha256, data) +``` + +**What happens under the hood:** + +| Python Version | Backend | Behavior | +|---|---|---| +| 3.13+ (with `interpreters`) | Sub-interpreter pool | Own GIL, true parallelism, no pickling | +| 3.13+ (without `interpreters`) | ProcessPoolExecutor | Separate process, pickle-based | +| 3.10–3.12 | ProcessPoolExecutor | Separate process, pickle-based | + +**When to use:** + +- CPU-intensive computation (hashing, image processing, data crunching) +- Work that would block the event loop for >1ms +- Compute tasks that don't need shared mutable state + +**When NOT to use:** + +- I/O-bound work (database queries, HTTP calls) — use `async`/`await` instead +- Tasks that need access to the main interpreter's global state +- Very short computations (<0.1ms) — the dispatch overhead isn't worth it + +### `run_in_threadpool(func, *args)` — Blocking I/O + +```python +from FasterAPI.concurrency import run_in_threadpool + +async def read_legacy_file(path: str) -> bytes: + return await run_in_threadpool(open(path, "rb").read) +``` + +**When to use:** + +- Calling synchronous libraries that do I/O (file reads, legacy database drivers) +- Wrapping blocking SDK calls that don't have async versions +- Any blocking operation that would freeze the event loop + +**When NOT to use:** + +- CPU-bound work — threads share the GIL, so you get zero parallelism +- Operations that already have async versions — just `await` them directly + +### `run_in_executor(func, *args)` — Process pool + +```python +from FasterAPI.concurrency import run_in_executor + +async def heavy_computation(n: int) -> int: + return await run_in_executor(sum, range(n)) +``` + +**When to use:** + +- CPU-bound work on Python < 3.13 +- Functions with 
arguments that are picklable +- When you explicitly want process isolation + +**When NOT to use:** + +- Arguments that can't be pickled (open file handles, database connections, lambdas) +- On Python 3.13+ — prefer `run_in_subinterpreter` instead + +--- + +## Decision Flowchart + +``` +Is the work CPU-bound or I/O-bound? +│ +├── I/O-bound +│ │ +│ ├── Has async API? → Just use await +│ └── No async API? → run_in_threadpool() +│ +└── CPU-bound + │ + ├── Python 3.13+ with interpreters module? + │ └── run_in_subinterpreter() ← best option + │ + ├── Arguments picklable? + │ └── run_in_subinterpreter() ← falls back to ProcessPool + │ + └── Arguments not picklable? + └── Restructure to pass serializable data, + or run_in_threadpool() if parallelism isn't critical +``` + +--- + +## SubInterpreterPool Internals + +### Pool initialization + +```python +pool = SubInterpreterPool(max_workers=4) +``` + +On first `.run()` call (lazy init): + +- Creates `max_workers` sub-interpreters via `interpreters.create()` +- Creates a `ThreadPoolExecutor` with the same number of workers +- Creates an `asyncio.Semaphore(max_workers)` to limit concurrency + +### How a task executes + +``` +1. await pool.run(func, *args) +2. Acquire semaphore (limits concurrent sub-interpreter tasks) +3. Select interpreter via round-robin: id(current_task) % pool_size +4. Submit to ThreadPoolExecutor: interp.call(func, *args) +5. Thread runs func in the sub-interpreter (with its own GIL) +6. 
Result returned to the awaiting coroutine +``` + +``` +Main event loop thread Thread pool threads +───────────────────── ───────────────────── +await pool.run(f, x) + │ + ├─ acquire semaphore + ├─ submit to executor ──────→ Thread 1: interp_0.call(f, x) + │ (runs with interp_0's GIL) + │ (event loop continues │ + │ serving other requests) │ + │ ▼ + ├─ result ready ◄──────────── return value + └─ return result +``` + +### Fallback pool (no `interpreters` module) + +When the `interpreters` module isn't available, `SubInterpreterPool` is replaced with a drop-in that uses `ProcessPoolExecutor`: + +```python +class SubInterpreterPool: # fallback + def __init__(self, max_workers=None): + self._executor = ProcessPoolExecutor(max_workers=max_workers or cpu_count) + + async def run(self, func, *args): + loop = asyncio.get_running_loop() + return await loop.run_in_executor(self._executor, partial(func, *args)) +``` + +Same API, different backend. Your code doesn't change. + +--- + +## Examples + +### Image Thumbnail Generation + +```python +from PIL import Image +import io +from FasterAPI.concurrency import run_in_subinterpreter + +def generate_thumbnail(image_bytes: bytes, size: tuple[int, int]) -> bytes: + img = Image.open(io.BytesIO(image_bytes)) + img.thumbnail(size) + buf = io.BytesIO() + img.save(buf, format="JPEG") + return buf.getvalue() + +@app.post("/upload") +async def upload_image(file: UploadFile): + data = await file.read() + thumb = await run_in_subinterpreter(generate_thumbnail, data, (128, 128)) + return Response(content=thumb, media_type="image/jpeg") +``` + +### CPU-Bound Data Processing + +```python +import hashlib +from FasterAPI.concurrency import run_in_subinterpreter + +def compute_proof_of_work(data: bytes, difficulty: int) -> tuple[int, str]: + nonce = 0 + target = "0" * difficulty + while True: + attempt = hashlib.sha256(data + nonce.to_bytes(8, "big")).hexdigest() + if attempt.startswith(target): + return nonce, attempt + nonce += 1 + 
+@app.post("/mine") +async def mine_block(data: bytes): + nonce, hash_val = await run_in_subinterpreter(compute_proof_of_work, data, 4) + return {"nonce": nonce, "hash": hash_val} +``` + +### Mixed I/O and CPU + +```python +@app.get("/report/{id}") +async def generate_report(id: str): + # I/O-bound: fetch from database (runs on event loop) + raw_data = await db.fetch(f"SELECT * FROM reports WHERE id = $1", id) + + # CPU-bound: crunch the numbers (runs in sub-interpreter) + summary = await run_in_subinterpreter(analyze_data, raw_data) + + # I/O-bound: cache the result (runs on event loop) + await cache.set(f"report:{id}", summary, ttl=3600) + + return summary +``` + +--- + +## Limitations & Caveats + +1. **Module state is NOT shared.** Each sub-interpreter has its own import state. Global variables in one interpreter are invisible to others. + +2. **Not all types are shareable.** Complex objects (class instances, open connections, file handles) can't be passed directly. Pass bytes, numbers, or strings. + +3. **C extensions must be compatible.** Some C extensions aren't safe to use in sub-interpreters. If a function crashes in a sub-interpreter, try `run_in_executor` instead. + +4. **The `interpreters` module is experimental** in Python 3.13. It may change or be absent in some builds. FasterAPI always has a working fallback. + +5. **Pool sizing matters.** Default is `os.cpu_count()`. For mixed workloads, you may want fewer sub-interpreter workers to leave cores free for the event loop. 
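Limitation 2 in practice means serializing at the boundary yourself — pass bytes in, get bytes out. A sketch using stdlib `json` and a hypothetical `analyze` function:

```python
import json

def analyze(payload: bytes) -> bytes:
    # Runs inside the sub-interpreter: decode, compute, re-encode.
    rows = json.loads(payload)
    summary = {"count": len(rows), "total": sum(r["value"] for r in rows)}
    return json.dumps(summary).encode()

# In the main interpreter: only bytes cross the boundary.
payload = json.dumps([{"value": 1}, {"value": 2}]).encode()
print(json.loads(analyze(payload)))  # → {'count': 2, 'total': 3}
```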
+ +--- + +## Version Compatibility Matrix + +| Python | Sub-interpreters | uvloop | Best CPU strategy | +|---|---|---|---| +| 3.10 | No | Yes | ProcessPoolExecutor | +| 3.11 | No | Yes | ProcessPoolExecutor | +| 3.12 | C API only | Yes (via policy) | ProcessPoolExecutor | +| 3.13 | Experimental | Yes (via policy) | SubInterpreterPool (if available) | +| 3.14+ | Expected stable | TBD | SubInterpreterPool | diff --git a/docs/migration-from-fastapi.md b/docs/migration-from-fastapi.md index b0381f3..5f8628a 100644 --- a/docs/migration-from-fastapi.md +++ b/docs/migration-from-fastapi.md @@ -74,6 +74,21 @@ class Product(msgspec.Struct): price: Price ``` +### Field aliases + +```python +# Before (Pydantic) +class Item(BaseModel): + item_name: str = Field(alias="itemName") + +# After (msgspec) — use the rename class option +class Item(msgspec.Struct, rename="camel"): + item_name: str + # Automatically serializes/deserializes as "itemName" +``` + +msgspec supports these rename strategies: `"camel"`, `"pascal"`, `"kebab"`, `"lower"`, `"upper"`. + ## 2. Routing and decorators `@app.get`, `@app.post`, `@app.put`, `@app.delete`, `@app.patch`, `@app.websocket`, @@ -217,6 +232,20 @@ async def get_user(id: int) -> UserPublic: Any code that imported directly from `starlette.*` needs updating to equivalent FasterAPI imports or plain Python/httpx alternatives. +## Migration checklist + +- [ ] Replace `fastapi` imports with `FasterAPI` imports +- [ ] Replace `FastAPI()` with `Faster()` +- [ ] Replace `APIRouter` with `FasterRouter` +- [ ] Convert `BaseModel` subclasses to `msgspec.Struct` +- [ ] Convert Pydantic validators to `__post_init__` +- [ ] Replace `model_dump()` with `msgspec.structs.asdict()` +- [ ] Replace `model_dump_json()` with `msgspec.json.encode()` +- [ ] Update custom middleware to ASGI-level dispatch +- [ ] Convert yield-based dependencies to return-based +- [ ] Test all endpoints +- [ ] Run benchmarks to verify speedup + ## Suggested migration order 1. 
Add **FasterAPI** beside FastAPI in a branch; swap the app factory. diff --git a/mkdocs.yml b/mkdocs.yml index 574efbf..d2c6934 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -88,10 +88,13 @@ nav: - Async / Await: concepts/async-await.md - Concurrency & Parallelism: concepts/concurrency.md - Python Type Hints: concepts/types-intro.md + - Sub-Interpreters Guide: concepts/sub-interpreters.md - How-To Recipes: how-to/index.md - Reference: - API Reference: api-reference.md + - Architecture: architecture.md - Benchmarks: benchmarks.md + - Benchmark Methodology: benchmark-methodology.md - Migrating from FastAPI: migration-from-fastapi.md - Python 3.13 & Compatibility: python-313.md - Changelog: changelog.md