diff --git a/site/content/docs/add-framework/directory-structure.md b/site/content/docs/add-framework/directory-structure.md index 472a2def4..18ac531c2 100644 --- a/site/content/docs/add-framework/directory-structure.md +++ b/site/content/docs/add-framework/directory-structure.md @@ -41,9 +41,8 @@ The benchmark runner mounts these paths into your container (read-only): | Path | Purpose | |------|---------| | `/data/dataset.json` | 50-item dataset for `/json` endpoint | -| `/data/benchmark.db` | SQLite database (100K rows) for `/db` endpoint | -| `/data/static/` | 20 static files (CSS, JS, HTML, fonts, images) | +| `/data/static/` | 20 static assets (CSS, JS, HTML, fonts, images) — 15 ship with `.gz` and `.br` sibling files for precompression-aware frameworks | | `/certs/server.crt` | TLS certificate for HTTPS/H2/H3 | | `/certs/server.key` | TLS private key for HTTPS/H2/H3 | -All data mounts are provided unconditionally — your container always has access to all files regardless of which profiles it participates in. +Postgres (profiles `async-db`, `crud`, `api-4`, `api-16`, and the compose-orchestrated gateway + production-stack) is provided by a separate sidecar container, reachable via the `DATABASE_URL` environment variable — not a mount. Redis (profile `crud`) is similarly reachable via `REDIS_URL`. See [Configuration](../../running-locally/configuration/) for the full env var list. diff --git a/site/content/docs/add-framework/implementation-rules/tuned.md b/site/content/docs/add-framework/implementation-rules/tuned.md index ba70dd79f..2ac2b0097 100644 --- a/site/content/docs/add-framework/implementation-rules/tuned.md +++ b/site/content/docs/add-framework/implementation-rules/tuned.md @@ -10,13 +10,17 @@ Tuned entries have more freedom. They can use non-default configurations, experi - Alternative JSON serializers (simd-json, sonic-json, etc.) - Custom buffer sizes and TCP socket options - Experimental or unstable framework flags -- Pre-computed responses and response caching - Memory-mapped files and in-memory static file caching - Custom thread pools and worker configurations - Non-default GC settings without documentation requirement - Framework-specific performance flags not recommended for production - Any compression approach for static files — custom compression, pre-compressed file serving, alternative compression libraries +## What is NOT allowed + +- **Pre-computed response bodies** — serializing a fixed response at startup and returning the same bytes per request (e.g. caching a JSON blob and writing it back unchanged). The serialization + compression work is the workload; bypassing it defeats the measurement. +- **Response caching** — memoizing the full HTTP response body keyed by URL/params and replaying it. This is distinct from upstream data caching (DB query results, JWT verification, etc.), which remains allowed where the profile calls for it (e.g. the CRUD profile's read cache). 
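+To make the boundary concrete, here is a minimal Go sketch (hypothetical handler and helper names, not taken from any real entry) contrasting the disallowed and allowed patterns for a `/json`-style handler:
+
+```go
+package handlers
+
+import (
+	"encoding/json"
+	"net/http"
+)
+
+type Item struct {
+	ID   int    `json:"id"`
+	Name string `json:"name"`
+}
+
+// buildItems stands in for loading /data/dataset.json at startup.
+func buildItems(n int) []Item {
+	items := make([]Item, n)
+	for i := range items {
+		items[i] = Item{ID: i + 1, Name: "item"}
+	}
+	return items
+}
+
+// NOT allowed: the body is serialized once at startup and the same bytes are
+// replayed for every request, so the per-request serialization the profile
+// measures never happens.
+var cachedBody, _ = json.Marshal(buildItems(50))
+
+func precomputedJSON(w http.ResponseWriter, r *http.Request) {
+	w.Header().Set("Content-Type", "application/json")
+	w.Write(cachedBody)
+}
+
+// Allowed: the upstream data may be cached, but the response body is still
+// serialized on every request.
+var items = buildItems(50)
+
+func perRequestJSON(w http.ResponseWriter, r *http.Request) {
+	w.Header().Set("Content-Type", "application/json")
+	json.NewEncoder(w).Encode(items)
+}
+```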
+ ## What is still required - Must use the framework's HTTP server (not a raw socket replacement) diff --git a/site/content/docs/add-framework/meta-json.md b/site/content/docs/add-framework/meta-json.md index 023e9fdd4..4f7dc2d72 100644 --- a/site/content/docs/add-framework/meta-json.md +++ b/site/content/docs/add-framework/meta-json.md @@ -9,7 +9,7 @@ Create a `meta.json` file in your framework directory: "display_name": "your-framework", "language": "Go", "engine": "net/http", - "type": "framework", + "type": "production", "description": "Short description of the framework and its key features.", "repo": "https://github.com/org/repo", "enabled": true, @@ -39,53 +39,30 @@ Create a `meta.json` file in your framework directory: | `baseline` | HTTP/1.1 | `/baseline11` | | `pipelined` | HTTP/1.1 | `/pipeline` | | `limited-conn` | HTTP/1.1 | `/baseline11` | -| `json` | HTTP/1.1 | `/json/{count}` | +| `json` | HTTP/1.1 | `/json/{count}?m=N` | | `json-comp` | HTTP/1.1 | `/json/{count}?m=N` (must honor `Accept-Encoding: gzip, br`) | -| `json-tls` | HTTP/1.1 + TLS | `/json/{count}?m=N` on port 8081 (ALPN `http/1.1`) | +| `json-tls` | HTTP/1.1 + TLS | `/json/{count}?m=N` (port 8081, ALPN `http/1.1`) | | `upload` | HTTP/1.1 | `/upload` | -| `static` | HTTP/1.1 | `/static/*` (port 8080) | -| `async-db` | HTTP/1.1 | `/async-db?limit=N` (requires `DATABASE_URL` env var) | | `api-4` | HTTP/1.1 | `/baseline11`, `/json/{count}`, `/async-db` (4 CPU, 16 GB) | | `api-16` | HTTP/1.1 | `/baseline11`, `/json/{count}`, `/async-db` (16 CPU, 32 GB) | +| `static` | HTTP/1.1 | `/static/*` (port 8080) | +| `async-db` | HTTP/1.1 | `/async-db?min=X&max=Y&limit=N` (requires `DATABASE_URL`) | +| `crud` | HTTP/1.1 | `/api/items`, `/api/items/{id}` (GET/POST/PUT; requires `DATABASE_URL`, optional `REDIS_URL`) | | `baseline-h2` | HTTP/2 | `/baseline2` (TLS, port 8443) | | `static-h2` | HTTP/2 | `/static/*` (TLS, port 8443) | -| `gateway-64` | HTTP/2 | `/static/*`, `/json`, `/async-db` via reverse proxy (TLS, port 8443) | +| `baseline-h2c` | HTTP/2 cleartext | `/baseline2` (port 8082, prior-knowledge) | +| `json-h2c` | HTTP/2 cleartext | `/json/{count}?m=N` (port 8082, prior-knowledge) | | `baseline-h3` | HTTP/3 | `/baseline2` (QUIC, port 8443) | | `static-h3` | HTTP/3 | `/static/*` (QUIC, port 8443) | +| `gateway-64` | HTTP/2 | Compose stack serving `/static/*`, `/json`, `/async-db`, `/baseline2` (TLS, port 8443) | +| `gateway-h3` | HTTP/3 | Compose stack serving `/static/*`, `/json`, `/async-db`, `/baseline2` (QUIC, port 8443) | +| `production-stack` | HTTP/2 | Compose stack: edge + JWT auth sidecar + Redis + server (TLS, port 8443) | | `unary-grpc` | gRPC | `BenchmarkService/GetSum` (h2c, port 8080) | | `unary-grpc-tls` | gRPC | `BenchmarkService/GetSum` (TLS, port 8443) | +| `stream-grpc` | gRPC | `BenchmarkService/StreamSum` (h2c, port 8080) | +| `stream-grpc-tls` | gRPC | `BenchmarkService/StreamSum` (TLS, port 8443) | | `echo-ws` | WebSocket | `/ws` echo (port 8080) | Only include profiles your framework supports. Frameworks missing a profile simply don't appear in that profile's leaderboard. -### async-db - -The `async-db` profile requires an async PostgreSQL driver. The benchmark script starts a Postgres sidecar with 100K rows and passes `DATABASE_URL=postgres://bench:bench@localhost:5432/benchmark` to your container. Your framework must: - -1. Connect to Postgres using the `DATABASE_URL` environment variable -2. 
Implement `GET /async-db?min=X&max=Y&limit=N` that queries: `SELECT id, name, category, price, quantity, active, tags, rating_score, rating_count FROM items WHERE price BETWEEN $1 AND $2 LIMIT $3` -3. Return JSON: `{"items": [...], "count": N}` with nested `rating: {score, count}` and `tags` as a JSON array -4. Return `{"items":[],"count":0}` if the database is unavailable -5. Use lazy connection initialization — retry connecting if Postgres isn't ready at startup - -### gateway-64 - -The `gateway-64` profile tests your framework as part of a complete deployment stack over HTTP/2 with TLS. Unlike other tests that run a single container, this test uses **Docker Compose** to orchestrate multi-container deployments — typically a reverse proxy in front of an application server, but any architecture is allowed. - -**Quick start:** - -1. Create a `compose.gateway.yml` in your framework directory -2. Define your services (proxy, server, cache — whatever you need) -3. Pin each service to specific CPUs using `cpuset` — total must be exactly 64 logical CPUs (0-31 + 64-95), always in physical+SMT pairs (core N and N+64 together) -4. All services must use `network_mode: host`, `security_opt: [seccomp:unconfined]`, and appropriate ulimits -5. Use `${CERTS_DIR}`, `${DATA_DIR}`, and `${DATABASE_URL}` env vars — they are exported by the benchmark script -6. Port **8443** must serve HTTPS/H2 — this is where the load generator sends requests -7. The stack must implement `/static/*`, `/json`, `/async-db`, and `/baseline2` endpoints - -**What makes this different from other tests:** -- You control the full architecture via Docker Compose -- Multiple containers compete for a shared 64-CPU budget -- The proxy, caching layer, and internal protocol choices are all part of the benchmark -- Static files can be served directly by the proxy (e.g., Nginx) instead of the application server - -See the [Gateway-64 implementation guide](/docs/test-profiles/gateway/gateway-h2/implementation) for detailed documentation, three complete compose examples (two-tier, three-tier, and single-tier), CPU topology rules, and proxy configuration options. +Per-profile endpoint contracts, request/response shapes, and validation rules live under the [Test Profiles](/docs/test-profiles/) section — link to the specific profile's Implementation page from your PR description when adding a new framework. diff --git a/site/content/docs/load-generators/_index.md b/site/content/docs/load-generators/_index.md index 1f7e3d5cc..f601c04e1 100644 --- a/site/content/docs/load-generators/_index.md +++ b/site/content/docs/load-generators/_index.md @@ -11,4 +11,5 @@ HttpArena uses a different load generator for each transport / workload. {{< card link="h2" title="HTTP/2" subtitle="h2load — nghttp2's load generator with TLS and stream multiplexing." icon="globe-alt" >}} {{< card link="h3" title="HTTP/3" subtitle="h2load-h3 — nghttp2 + ngtcp2 for QUIC-based HTTP/3 benchmarks." icon="globe-alt" >}} {{< card link="grpc" title="gRPC" subtitle="ghz — proto-aware gRPC load tester for streaming and unary RPCs." icon="globe-alt" >}} + {{< card link="ws" title="WebSocket" subtitle="gcannon --ws — io_uring WebSocket echo driver reusing the HTTP/1.1 engine with a frame-aware send/recv loop." 
icon="globe-alt" >}} {{< /cards >}} diff --git a/site/content/docs/load-generators/ws/_index.md b/site/content/docs/load-generators/ws/_index.md new file mode 100644 index 000000000..6ac589c08 --- /dev/null +++ b/site/content/docs/load-generators/ws/_index.md @@ -0,0 +1,56 @@ +--- +title: WebSocket +--- + +HttpArena drives the `echo-ws` profile with **gcannon in `--ws` mode**. The same io_uring engine documented under [HTTP/1.1 → gcannon](../h1/gcannon/) is reused here — worker threads, per-thread provided-buffer rings, multishot receives, per-connection state — with a frame-aware send/recv loop layered on top. Using one tool across transports keeps the client-side ceiling, threading model, and CPU-pinning behavior consistent so differences in the measurement land on the server, not the generator. + +## Handshake + +Each worker opens TCP connections and issues an HTTP/1.1 upgrade request to the target URL (typically `http://localhost:8080/ws`). The server must respond with `HTTP/1.1 101 Switching Protocols` and the correct `Sec-WebSocket-Accept` value derived from the client's `Sec-WebSocket-Key`. Connections that fail the handshake are reported as reconnects; the validator ([WebSocket validation](../../test-profiles/ws/echo/validation/)) checks the handshake path separately and catches framework-side bugs before benchmarks run. + +## Echo loop + +Once upgraded, each connection runs the steady-state loop: + +1. Build a masked client-to-server text frame with a short payload +2. Send the frame via `io_uring_prep_send` +3. Wait for the server to echo it back (matched server-to-client frame) +4. On receipt, increment the per-thread frame counter and immediately send the next frame + +Pipeline depth is 1 for the `echo-ws` profile — one message in flight per connection — so the measurement is effectively a back-to-back request/response loop rather than a batched burst. With thousands of concurrent connections each running this loop in parallel, the steady-state throughput reflects the server's ability to multiplex WebSocket frames across a large connection count without head-of-line blocking. + +Both text frames (opcode `0x1`) and binary frames (opcode `0x2`) are exercised against the server during validation; benchmark runs use the text shape for simplicity. Framing follows RFC 6455: masked from client to server, unmasked from server to client, FIN bit set on every frame (no fragmented messages in the benchmark path). + +## Command-line usage + +```bash +gcannon http://localhost:8080/ws --ws \ + -c -t -d -p 1 +``` + +| Flag | Description | +|------|-------------| +| `` | The WebSocket endpoint served over HTTP/1.1 (uses `http://` scheme; the upgrade is implicit) | +| `--ws` | Switches gcannon from HTTP request mode into WebSocket echo mode | +| `-c` | Total concurrent connections (distributed evenly across `-t` threads) | +| `-t` | Worker threads (each owns an io_uring and a slice of connections; defaults to `$THREADS=64`) | +| `-d` | Test duration — `5s` for `echo-ws` | +| `-p` | Pipeline depth — fixed at `1` for `echo-ws` (one message in flight per connection) | + +The profile dispatcher (`scripts/lib/tools/gcannon.sh:ws-echo`) wires all of this automatically when you invoke `./scripts/benchmark.sh echo-ws`. 
+ +## Output shape + +gcannon reports WebSocket results with the same layout as HTTP requests, except the summary line reads "frames sent / frames received" instead of "requests / responses": + +``` + 2400000 frames sent in 5.00s, 2400000 frames received + Throughput: 480.00K frames/s + WS frames: 2400000 +``` + +The parser (`gcannon_parse ws-echo`) records `frames received` as the `status_2xx` equivalent and divides by the measured duration to produce the headline RPS number shown on the [WebSocket leaderboard](/leaderboards/websocket/). One echo round-trip counts as one unit — the frames-received count from the client side, not frames-sent, because the metric is "how many echoes the framework completed," not "how many messages the benchmarker pushed into the socket." + +## Why not a dedicated WebSocket tool + +The two common alternatives — `wrk2` with a Lua WebSocket plugin, or `artillery` — either can't saturate the server at 64-core scale (GC + per-connection Lua overhead becomes the bottleneck) or produce non-deterministic per-thread CPU pinning that makes cross-framework comparison unreliable. Reusing gcannon means the generator's tuning story is the same one already vetted against the HTTP/1.1 profiles, and the operator-side flags (`$GCANNON_CPUS`, cpuset pinning, provided buffer ring sizing) compose identically. diff --git a/site/content/docs/running-locally/configuration.md b/site/content/docs/running-locally/configuration.md index b25c18fe5..4e6760b36 100644 --- a/site/content/docs/running-locally/configuration.md +++ b/site/content/docs/running-locally/configuration.md @@ -14,18 +14,19 @@ Defined in `scripts/lib/common.sh`. Override by exporting before you run the scr | `DURATION` | `5s` | Load-test duration per run (`-d`/`-D` passed through to the tool). | | `RUNS` | `3` | Measurement iterations per (profile, connection count). Best wins. | | `THREADS` | `64` | gcannon / wrk worker threads. | -| `H2THREADS` | `128` | h2load worker threads (HTTP/2, h2c gRPC). | +| `H2THREADS` | `64` | h2load worker threads (HTTP/2, h2c gRPC). | | `H3THREADS` | `64` | h2load-h3 worker threads (HTTP/3 over QUIC). | -In `benchmark-lite.sh`, `THREADS` / `H2THREADS` / `H3THREADS` all default to `nproc / 2` instead. +In `benchmark-lite.sh`, `THREADS` defaults to `max(nproc / 2, 1)` and `H2THREADS` / `H3THREADS` mirror `$THREADS`. Pass `--load-threads N` to override all three in one shot. ## Ports | Variable | Default | Description | |---|---|---| -| `PORT` | `8080` | HTTP/1.1 — also h2c for gRPC. | -| `H2PORT` | `8443` | HTTPS, HTTP/2 TLS, HTTP/3 QUIC, gRPC-TLS. | -| `H1TLS_PORT` | `8081` | HTTP/1.1 + TLS, used only by the `json-tls` profile. | +| `PORT` | `8080` | HTTP/1.1 plaintext (all `h1*` profiles + `echo-ws`); also h2c for gRPC (`unary-grpc`, `stream-grpc` — prior-knowledge on the same socket). | +| `H2PORT` | `8443` | HTTPS / HTTP/2 over TLS (`baseline-h2`, `static-h2`, gateway + production-stack), HTTP/3 over QUIC (`baseline-h3`, `static-h3`, `gateway-h3`), and gRPC-TLS (`unary-grpc-tls`, `stream-grpc-tls`). | +| `H1TLS_PORT` | `8081` | HTTP/1.1 + TLS, used only by the `json-tls` profile (ALPN `http/1.1`). | +| `H2C_PORT` | `8082` | HTTP/2 cleartext prior-knowledge for the `baseline-h2c` and `json-h2c` profiles. Must be a dedicated listener that refuses HTTP/1.1 — the validator checks this explicitly. | Every framework `Dockerfile` reads the same defaults from its env, so you rarely need to change these. 
@@ -83,10 +84,10 @@ From `endpoint_tool()` in `scripts/lib/profiles.sh`: | Endpoint | Tool | |---|---| | `static`, `json-tls` | wrk | -| `h2`, `static-h2`, `gateway-64`, `grpc`, `grpc-tls` | h2load | -| `h3`, `static-h3` | h2load-h3 | +| `h2`, `static-h2`, `h2c`, `json-h2c`, `gateway-64`, `grpc`, `grpc-tls`, `production-stack` | h2load | +| `h3`, `static-h3`, `gateway-h3` | h2load-h3 | | `grpc-stream`, `grpc-stream-tls` | ghz | -| everything else (`""`, `pipeline`, `upload`, `api-4`, `api-16`, `async-db`, `json`, `json-compressed`, `ws-echo`, …) | gcannon | +| everything else (`""`, `pipeline`, `upload`, `api-4`, `api-16`, `async-db`, `crud`, `json`, `json-compressed`, `ws-echo`) | gcannon | ## Small-machine overrides diff --git a/site/content/docs/running-locally/scripts/benchmark-lite.md b/site/content/docs/running-locally/scripts/benchmark-lite.md index 21b2025e6..b68174350 100644 --- a/site/content/docs/running-locally/scripts/benchmark-lite.md +++ b/site/content/docs/running-locally/scripts/benchmark-lite.md @@ -12,8 +12,8 @@ weight: 4 | Default load generators | Native binaries | **Always** docker (forced — no env override) | | CPU pinning | Per-profile `--cpuset-cpus` | None — all containers see every core | | `THREADS` default | 64 | `nproc / 2` | -| `H2THREADS` / `H3THREADS` default | 128 / 64 | Same as `THREADS` | -| Profile set | 21 profiles | 15 — skips `api-4`, `api-16`, `json-tls`, `gateway-64`, `stream-grpc`, `stream-grpc-tls` | +| `H2THREADS` / `H3THREADS` default | 64 / 64 | Same as `THREADS` | +| Profile set | 26 profiles | 15 — skips `api-4`, `api-16`, `json-tls`, `crud`, `baseline-h2c`, `json-h2c`, `gateway-64`, `gateway-h3`, `production-stack`, `stream-grpc`, `stream-grpc-tls` | | Connection counts | Varies (512, 1024, 4096, 16384, …) | One per profile (mostly 512; upload 128; h3 64) | | Framework selection | One framework, always | Optional — runs every enabled framework if omitted | diff --git a/site/content/docs/running-locally/scripts/benchmark.md b/site/content/docs/running-locally/scripts/benchmark.md index 643e8fa2a..e193b0afb 100644 --- a/site/content/docs/running-locally/scripts/benchmark.md +++ b/site/content/docs/running-locally/scripts/benchmark.md @@ -64,16 +64,17 @@ Set via `VAR=value ./scripts/benchmark.sh ...` or `export VAR=value`. | `DURATION` | `5s` | `-d`/`-D` value passed to each load generator. | | `RUNS` | `3` | Measurement iterations per (profile, conns). Best result wins. | | `THREADS` | `64` | Load-generator threads for gcannon, wrk, and the default path. | -| `H2THREADS` | `128` | h2load worker threads (h2, h2c gRPC). | +| `H2THREADS` | `64` | h2load worker threads (h2, h2c gRPC). | | `H3THREADS` | `64` | h2load-h3 worker threads (HTTP/3 over QUIC). | ### Ports | Variable | Default | Description | |---|---|---| -| `PORT` | `8080` | HTTP/1.1 (and h2c for gRPC). | -| `H2PORT` | `8443` | HTTPS, HTTP/2 TLS, HTTP/3 QUIC, gRPC-TLS. | +| `PORT` | `8080` | HTTP/1.1 plaintext (all `h1*` profiles + `echo-ws`); also h2c for gRPC (`unary-grpc`, `stream-grpc`). | +| `H2PORT` | `8443` | HTTPS / HTTP/2 TLS (`baseline-h2`, `static-h2`, gateway + production-stack), HTTP/3 QUIC (`baseline-h3`, `static-h3`, `gateway-h3`), gRPC-TLS (`unary-grpc-tls`, `stream-grpc-tls`). | | `H1TLS_PORT` | `8081` | HTTP/1.1 + TLS — only used by the `json-tls` profile. | +| `H2C_PORT` | `8082` | HTTP/2 cleartext prior-knowledge for `baseline-h2c` and `json-h2c`. Must refuse HTTP/1.1 — the validator checks this. 
| ### Load generator selection @@ -93,11 +94,11 @@ LOADGEN_DOCKER=true ./scripts/benchmark.sh aspnet-minimal | Variable | Default | Used for | |---|---|---| -| `GCANNON` | `gcannon` | Native binary — baseline, pipelined, limited-conn, json, json-comp, upload, api-4/16, async-db, echo-ws. | +| `GCANNON` | `gcannon` | Native binary — baseline, pipelined, limited-conn, json, json-comp, upload, api-4/16, async-db, crud, echo-ws. | | `GCANNON_IMAGE` | `gcannon:latest` | Docker image when `LOADGEN_DOCKER=true`. | -| `H2LOAD` | `h2load` | Native binary — baseline-h2, static-h2, unary-grpc, unary-grpc-tls, gateway-64. | +| `H2LOAD` | `h2load` | Native binary — baseline-h2, static-h2, baseline-h2c, json-h2c, unary-grpc, unary-grpc-tls, gateway-64, production-stack. | | `H2LOAD_IMAGE` | `h2load:latest` | Docker image (Ubuntu 24.04 + glibc build; do **not** use the alpine/musl image — it's 20–40% slower). | -| `H2LOAD_H3` | `h2load-h3` | Native binary — baseline-h3, static-h3. | +| `H2LOAD_H3` | `h2load-h3` | Native binary — baseline-h3, static-h3, gateway-h3. | | `H2LOAD_H3_IMAGE` | `h2load-h3:local` | Docker image with `quictls` + `nghttp3` + `ngtcp2` + `nghttp2 --enable-http3` built from source. | | `WRK` | `wrk` | Native binary — static, json-tls. | | `WRK_IMAGE` | `wrk:local` | Docker image. | @@ -109,7 +110,7 @@ LOADGEN_DOCKER=true ./scripts/benchmark.sh aspnet-minimal | Variable | Default | Description | |---|---|---| | `PG_CONTAINER` | `httparena-postgres` | Name of the sidecar container. | -| `DATABASE_URL` | `postgres://bench:bench@localhost:5432/benchmark` | Passed to framework containers for `async-db`, `api-4`, `api-16`, `gateway-64`. | +| `DATABASE_URL` | `postgres://bench:bench@localhost:5432/benchmark` | Passed to framework containers for `async-db`, `crud`, `api-4`, `api-16`, `gateway-64`, `gateway-h3`, `production-stack`. | ## Profiles diff --git a/site/content/docs/running-locally/scripts/run.md b/site/content/docs/running-locally/scripts/run.md index 7721207cf..dab8aa41d 100644 --- a/site/content/docs/running-locally/scripts/run.md +++ b/site/content/docs/running-locally/scripts/run.md @@ -13,7 +13,7 @@ Run a framework's Docker container interactively for manual testing. Builds the 1. Builds the Docker image for the framework (or runs `build.sh` if one exists) 2. Starts a Postgres sidecar container with the seeded benchmark database -3. Mounts all data files unconditionally — datasets, static files, benchmark.db, TLS certs +3. Mounts all data files unconditionally — datasets, static files, TLS certs 4. Sets `DATABASE_URL` and `DATABASE_MAX_CONN` environment variables 5. Runs the container attached so logs stream to your terminal 6. Cleans up all containers on exit (Ctrl+C or script termination) @@ -37,7 +37,6 @@ Uses `--network host` so the container binds directly to the host's network inte # In another terminal: curl http://localhost:8080/baseline11?a=1&b=2 -curl http://localhost:8080/json -curl http://localhost:8080/db?min=10&max=50 -curl http://localhost:8080/async-db?min=10&max=50 +curl http://localhost:8080/json/5?m=3 +curl 'http://localhost:8080/async-db?min=10&max=50&limit=20' ``` diff --git a/site/content/docs/scoring/_index.md b/site/content/docs/scoring/_index.md index 7f1b5d4d2..7d187dd0b 100644 --- a/site/content/docs/scoring/_index.md +++ b/site/content/docs/scoring/_index.md @@ -7,5 +7,5 @@ weight: 5 How HttpArena computes the composite score that ranks frameworks across all test profiles. 
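+At a glance, the formula and its current ceilings for a production/tuned entry (a restatement of the numbers on the composite-score page):
+
+```
+composite = sum(score_p over scored profiles),   0 ≤ score_p ≤ 100
+26 profiles × 100        ≈ 2,600  (raw-throughput ceiling)
+26 profiles × (100 + 50) ≈ 3,900  (with the memory-efficiency bonus)
+```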
{{< cards >}} - {{< card link="composite-score" title="Composite Score" subtitle="Normalized arithmetic mean across scored profiles with optional CPU and memory efficiency factors." icon="chart-bar" >}} + {{< card link="composite-score" title="Composite Score" subtitle="Sum of per-profile normalized scores (0–100 each) across all scored profiles, with an optional memory-efficiency bonus." icon="chart-bar" >}} {{< /cards >}} diff --git a/site/content/docs/scoring/composite-score.md b/site/content/docs/scoring/composite-score.md index bdc045ece..00a05dbe9 100644 --- a/site/content/docs/scoring/composite-score.md +++ b/site/content/docs/scoring/composite-score.md @@ -30,7 +30,7 @@ The final composite score is the **sum** of per-profile scores across all **scor composite = sum(scored_profile_scores) ``` -Summing instead of averaging means the composite scales with the number of scored profiles: a framework that places well in many profiles separates cleanly from one that only wins a single profile. A perfect-across-the-board framework earns 100 points per profile, so with ~15 scored profiles the ceiling is around 1,500. +Summing instead of averaging means the composite scales with the number of scored profiles: a framework that places well in many profiles separates cleanly from one that only wins a single profile. A perfect-across-the-board framework earns 100 points per profile, so with the current 26 scored profiles for production/tuned entries the raw-throughput ceiling is ~2,600, rising to ~3,900 when the memory-efficiency toggle is on (each profile adds up to 50 more points). Engine and infrastructure entries are scored on smaller subsets and have correspondingly lower ceilings. Frameworks that don't participate in a scored profile receive 0 for that profile, which lowers their composite by the full 100-point ceiling of that profile. diff --git a/site/content/docs/test-profiles/_index.md b/site/content/docs/test-profiles/_index.md index 3814fc647..5cffda6f8 100644 --- a/site/content/docs/test-profiles/_index.md +++ b/site/content/docs/test-profiles/_index.md @@ -12,12 +12,15 @@ Each profile is run at multiple connection counts to show how frameworks scale u ## Benchmark parameters +Each profile is driven by one of five load generators, each built for a specific protocol + workload shape. See [Load Generators](../load-generators/) for per-tool details. 
+ | Parameter | Value | |-----------|-------| -| Threads | 64 (gcannon) / 128 (h2load) | -| Duration | 5s | -| Runs | 3 (best taken) | -| Networking | Docker `--network host` | +| Load generators | `gcannon` (HTTP/1.1, upload, WebSocket), `wrk` (static + json-tls rotation), `h2load` (HTTP/2, h2c, gateway), `h2load-h3` (HTTP/3 / QUIC), `ghz` (gRPC) | +| Threads | 64 for `gcannon` / `wrk` / `h2load` / `h2load-h3` (`$THREADS` / `$H2THREADS` / `$H3THREADS`); `ghz` scales workers dynamically as `connections × 4` | +| Duration | 5s default; `async-db` 10s; `api-4`, `api-16`, `crud` 15s (hardcoded in the profile dispatcher) | +| Runs | 3 per (profile, connection count) — best RPS wins | +| Networking | Docker `--network host` for all containers (server + load generator + Postgres + Redis sidecars) | ## Data mounts @@ -25,10 +28,21 @@ Data files are **mounted automatically** by the benchmark runner — your Docker | Path | Description | |------|-------------| -| `/data/dataset.json` | 50-item dataset for `/json` | -| `/data/static/` | 20 static files for `/static/*` | -| `/certs/server.crt`, `/certs/server.key` | TLS certificate and key for HTTPS/H2/H3 | -| `DATABASE_URL` env var | Postgres connection string for `/async-db` (set automatically when `async-db` profile runs) | +| `/data/dataset.json` | 50-item dataset for the `/json/{count}` endpoint | +| `/data/static/` | 20 static assets for `/static/*` (HTML, JS, CSS, SVG, WebP, woff2, JSON). 15 assets ship with pre-built `.gz` and `.br` sibling files (e.g. `app.js`, `app.js.gz`, `app.js.br`) so frameworks that support precompressed serving can skip on-the-fly compression. The 5 already-binary formats (`hero.webp`, `thumb1.webp`, `thumb2.webp`, `bold.woff2`, `regular.woff2`) have no precompressed variants. See the [Static](h1/isolated/static/) profile for how to wire Accept-Encoding lookup. | +| `/certs/server.crt`, `/certs/server.key` | TLS certificate and key for HTTPS / H2 / H3 (the h2c listener on port 8082 is cleartext and does not use them) | + +## Environment variables + +Set by the benchmark runner when the relevant profile runs — your process will see them via `os.environ` / `std::env::var` / equivalent. + +| Variable | Profiles | Value | +|----------|----------|-------| +| `DATABASE_URL` | `async-db`, `crud`, `api-4`, `api-16` | Postgres connection string (`postgres://bench:bench@127.0.0.1:5432/benchmark`) | +| `DATABASE_MAX_CONN` | same as above | `256` — the Postgres sidecar's `max_connections`; size your pool ≤ this | +| `REDIS_URL` | `crud` | `redis://127.0.0.1:6379` — multi-process frameworks can use Redis as a cross-process cache; single-heap frameworks (Go, ASP.NET, etc.) typically ignore it and keep their in-process cache | + +Gateway and `production-stack` profiles are compose-orchestrated, so their services receive additional env (e.g. `JWT_SECRET` for the production-stack auth sidecar) via their `compose.*.yml` files rather than through the runner. See the per-profile pages under [Gateway](gateway/) for details. {{< cards >}} {{< card link="h1" title="H/1.1" subtitle="Isolated single-endpoint benchmarks and multi-endpoint workload mixes over plain TCP." 
icon="lightning-bolt" >}} diff --git a/site/content/docs/test-profiles/h1/isolated/async-database/implementation.md b/site/content/docs/test-profiles/h1/isolated/async-database/implementation.md index 2c4802ca9..7bf86cf92 100644 --- a/site/content/docs/test-profiles/h1/isolated/async-database/implementation.md +++ b/site/content/docs/test-profiles/h1/isolated/async-database/implementation.md @@ -4,7 +4,7 @@ title: Implementation Guidelines {{< type-rules production="Must use an async PostgreSQL driver with standard connection pooling. Size the pool from `DATABASE_MAX_CONN` (currently 256), not from CPU count." tuned="May use custom pool sizes, prepared statement caching, or driver-specific optimizations beyond defaults." engine="No specific rules." >}} -The Async Database profile measures how efficiently a framework handles concurrent database queries over a network connection. Unlike the [synchronous SQLite `/db` endpoint](../../database) (CPU-bound, tested as the `sync-db` profile), this test exercises async I/O scheduling, connection pooling, and async Postgres driver efficiency. +The Async Database profile measures how efficiently a framework handles concurrent database queries over a network connection — exercising async I/O scheduling, connection pooling, and async Postgres driver efficiency. **This test is for framework-type entries only** - engines (nginx, h2o, etc.) are excluded. @@ -30,7 +30,7 @@ The Async Database profile measures how efficiently a framework handles concurre ## Database schema -The `items` table in Postgres (100,000 rows, identical logical data to the SQLite `benchmark.db`): +The `items` table in Postgres (100,000 rows): ```sql CREATE TABLE items ( @@ -47,10 +47,6 @@ CREATE TABLE items ( -- No index on price - forces sequential scan ``` -Key differences from the SQLite schema: -- `active` is `BOOLEAN` (not `INTEGER 0/1`) - no conversion needed -- `tags` is `JSONB` (not `TEXT`) - no JSON string parsing needed - ## SQL query ```sql @@ -108,7 +104,7 @@ The benchmark runner provides these environment variables to your container: - **Prepared statements** - prepare the query once per connection, reuse across requests - **Default parameters** - all three query parameters are integers. If `min` or `max` is missing, default to `10` and `50`. If `limit` is missing, default to `50`. Clamp `limit` to the range 1–50 - **Integer types matter** - `price` and `rating_score` are `INTEGER` columns. Read them as `i32`/`int`/equivalent — using `f64`/`double` will fail with type-mismatch errors in strict drivers like `tokio-postgres` -- **Tags are JSONB** - Postgres returns them as native JSON, no string parsing needed (unlike the SQLite `/db` endpoint) +- **Tags are JSONB** - Postgres returns them as native JSON, no string parsing needed ## Important: environment variables and initialization