Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -381,6 +381,7 @@ CueAPI does one thing: schedules tasks and tells you whether they succeeded.
| [Configuration](docs/configuration.md) | Every environment variable documented |
| [Workers](docs/workers.md) | Worker transport setup and scaling |
| [Production](docs/production.md) | PostgreSQL, Redis, reverse proxy, monitoring |
| [Body-integrity verification](docs/body-verify.md) | Opt-in echo-back primitive that detects caller-side shell expansion + silent body corruption |
| [FAQ](docs/faq.md) | Common self-hosting questions |

**Stack:** FastAPI, PostgreSQL 16, Redis 7, Docker
Expand Down
113 changes: 113 additions & 0 deletions docs/body-verify.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Body integrity verification

cueapi-core ships an opt-in body-integrity primitive that protects against silent body-content corruption — most commonly caller-side shell expansion of metacharacters (`$(...)`, backticks, `${VAR}`) in body arguments BEFORE the request leaves the caller's environment.

This is the open-core substrate behind the SDK-level auto-verify in [cueapi-python](https://github.com/cueapi/cueapi-python) and [cueapi-cli](https://github.com/cueapi/cueapi-cli). Self-hosters running cueapi-core get the substrate for free; their SDKs talk to it the same way the hosted SDKs do.

## The bug class this prevents

```bash
BODY="message with $(date) timestamp"
curl -X POST https://your-cueapi-instance/v1/messages \
-H "Authorization: Bearer $CUEAPI_API_KEY" \
-H "X-Cueapi-From-Agent: $SENDER" \
-H "Content-Type: application/json" \
-d "{\"to\":\"$RECIPIENT\",\"body\":\"$BODY\"}"
```

Bash evaluates `$(date)` at variable-assignment time. The body you THOUGHT you were sending is gone before curl runs. The substrate accepts the mutated JSON with HTTP 200. The recipient sees corrupted content. No fail-loud signal anywhere in the pipeline.

LLM-generated curl examples, CI scripts, and bash automation reproduce the pattern at scale. The verify-echo primitive surfaces it loudly.

## How it works

Add `X-CueAPI-Verify-Echo: true` to any POST/PATCH/PUT request with a body. The substrate echoes the body it received back in the response under `body_received` (plus a SHA-256 hex digest under `body_received_sha256` for constant-cost compare):

```bash
curl -X POST https://your-cueapi-instance/v1/messages \
-H "Authorization: Bearer $CUEAPI_API_KEY" \
-H "X-Cueapi-From-Agent: $SENDER" \
-H "X-CueAPI-Verify-Echo: true" \
-H "Content-Type: application/json" \
-d @/tmp/body.json
```

Response (truncated):

```json
{
"id": "msg_xxx",
"delivered": true,
"body_received": "the verbatim body the server received",
"body_received_sha256": "<64-hex sha256 of body field bytes>"
}
```

Caller computes `sha256(body_bytes)` locally; compares to `body_received_sha256`; throws on drift.

The header is opt-in. Clients without it see no behavior change — `body_received` and `body_received_sha256` are absent from the response.

## Substrate architecture

Two layers ship in cueapi-core:

**Layer 1** — Endpoint-specific echo (Phase 1): `POST /v1/messages` and `POST /v1/cues/{cue_id}/fire` extract their canonical string body field (`MessageCreate.body`, `payload_override.message` if present on fire) and echo it as a STRING. SDKs verify against the string directly.

**Layer 1.5** — Universal middleware (Phase 2): all other POST/PATCH/PUT endpoints get echo coverage via `app/middleware/verify_echo.py`. The middleware echoes the parsed request body as a JSON object. 52 endpoints covered; method-gated (GET ignored); status-gated (4xx/5xx not echoed; validation errors stay clean).

| Endpoint shape | `body_received` shape |
|---|---|
| `POST /v1/messages` | STRING — `MessageCreate.body` verbatim |
| `POST /v1/cues/{id}/fire` | STRING — `payload_override.message` (or null when no body) |
| All other POST/PATCH/PUT | OBJECT — parsed request body as JSON |

`body_received_sha256` is always a 64-character hex string of those exact UTF-8 bytes.

## SDK auto-verify

The official SDKs use `X-CueAPI-Verify-Echo: true` automatically and raise on mismatch:

| SDK | Messages | Cues fire | Opt-out |
|---|---|---|---|
| **cueapi-python** | default-on | opt-in (`auto_verify=True`) | `auto_verify=False` |
| **cueapi-cli** | default-on | opt-in (`--verify`) | `--no-verify` |
| **cueapi-mcp** | default-on | opt-in (`auto_verify: true`) | omit flag |
| **cueapi-action** | default-on (`no-verify: "true"` opts out) | opt-in (`verify: "true"`) | — |

Why fire is opt-in everywhere: the substrate's `/v1/cues/{cue_id}/fire` echoes a pydantic-after-parse body that may include server-side default-population, causing spurious diff vs the SDK's canonical-JSON serialization. The default-off design avoids false-positive mismatches; callers opt in when they know substrate echo matches their serialization (typical for the sha256 constant-cost path).

## Defense-in-depth layers (recap)

For full coverage, layer this substrate with the SDK + caller patterns:

1. **Substrate** (this guide) — `X-CueAPI-Verify-Echo` echo-back. Open-core; ships in cueapi-core.
2. **SDK auto-verify** — clients use the header automatically and raise on mismatch.
3. **Force-file mode** — `cueapi-cli messages send --message-file <path>` reads bytes verbatim; rejects inline `--body` with shell metacharacters unless `--allow-inline-metachars` is set.
4. **Documentation** — guides lead with file-payload pattern, not inline strings.

Self-hosters running cueapi-core get layers 1 + 2 automatically by using the published SDKs. Layer 3 ships in cueapi-cli; layer 4 is this guide.

## When to disable

Disable verify-echo only when:

- **Perf-sensitive flows** at the very high QPS limit. The echo doubles response-payload bytes; a sustained outbound stream may want to opt out for the 5-10% saving.
- **Streaming use cases** where the response body shape is locked downstream and adding fields breaks compatibility.

Opt-out is per-request (SDK kwarg or CLI flag); there's no global server-side disable.

## Implementation references

- `app/utils/verify_echo.py` — STRING-shape helper (Layer 1)
- `app/middleware/verify_echo.py` — universal middleware (Layer 1.5)
- `app/routers/messages.py`, `app/routers/cues.py` — Phase 1 endpoint integration

Tests pin the shape:

- `tests/test_verify_echo.py` — Layer 1 endpoint coverage
- `tests/test_verify_echo_middleware.py` — Layer 1.5 method/status/content-type gating + idempotency-when-handler-already-injected
- `tests/test_verify_endpoints.py` — verify-result endpoints

## Background

Added 2026-05-11 (Mike body-verify directive). Substrate primitive Layer 1 from cueapi-core #86; Layer 1.5 universal middleware from #87; STRING-shape spec-lock from #88 (echo-shape hotfix for `body_received` field on messages endpoint). Cross-stack SDK Phase 2 + caller force-file + this docs guide constitute the four-layer defense.
Loading