Admin API inspection: SSE event stream + load stats endpoint #310

Merged
daniel-thom merged 4 commits into main from feature/admin-api-event-stream
May 10, 2026

Conversation

@daniel-thom
Collaborator

Summary

  • Adds torc admin tail-api, an SSE stream that emits a structured event for every inbound HTTP request the server processes (method, path, status, latency, span id, optional bodies). Body capture is off by default and enabled with --include-bodies; captured bodies are capped at 8 KiB per direction with a 1 MiB hard buffer ceiling, and Authorization/Cookie headers are never captured.
  • Adds torc admin api-stats, a snapshot of how busy the server is over the last hour (request count, req/s, bytes in/out from Content-Length, 2xx/4xx/5xx breakdown), backed by a 1-second-bucket ring buffer in LiveServerState. Configurable --window and --interval.
  • Adds an advisory X-Torc-Client-User header sent by the CLI so the user field in events is meaningful even when the server runs without --auth-file. The header is trivially spoofable and is never used for authorization — only as a fallback label when no real auth was resolved.

Implementation notes

  • New endpoints live next to the existing /admin/reload-auth: GET /admin/api-events/stream and GET /admin/api-stats. They use standard server authentication and do not require an admin-only role.
  • The capture middleware short-circuits when nobody is subscribed to the SSE stream — the only always-on cost is the stats record (~300 ns: two header lookups + clock read + uncontended parking_lot mutex). Instant::now() is elided on the no-subscriber fast path.
  • SSE response bodies (e.g. the new event stream itself, /workflows/{id}/events/stream) are skipped by both body capture and byte counting since they are unbounded.
  • New modules src/server/api_event_stream.rs (broadcaster, body cap helpers) and src/server/api_stats.rs (ring buffer + snapshot).
  • Docs: new "Live API Request Inspection" and "Server Load Stats" sections in server-deployment.md, plus the TORC_API_EVENT_BODY_MAX_BYTES env var.
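
To make the ring-buffer idea above concrete, here is a minimal sketch of a 1-second-bucket ring with a windowed snapshot. It is an illustration under stated assumptions, not the code in src/server/api_stats.rs: the names `Bucket`, `StatsRing`, `record`, and `snapshot` are hypothetical, timestamps are passed in as Unix seconds, and locking (the PR mentions a parking_lot mutex) is left to the caller.

```rust
/// Counters for a single wall-clock second (hypothetical layout).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Bucket {
    /// Unix second this slot currently holds.
    pub second: u64,
    pub requests: u64,
    pub bytes_in: u64,
    pub bytes_out: u64,
    pub status_2xx: u64,
    pub status_4xx: u64,
    pub status_5xx: u64,
}

/// Fixed-size ring with one slot per second (3600 slots for a 1-hour ring).
pub struct StatsRing {
    slots: Vec<Bucket>,
}

impl StatsRing {
    pub fn new(capacity_seconds: usize) -> Self {
        assert!(capacity_seconds > 0, "ring needs at least one slot");
        Self { slots: vec![Bucket::default(); capacity_seconds] }
    }

    /// Record one finished request at Unix second `now_s`.
    pub fn record(&mut self, now_s: u64, status: u16, bytes_in: u64, bytes_out: u64) {
        let idx = (now_s % self.slots.len() as u64) as usize;
        let slot = &mut self.slots[idx];
        if slot.second != now_s {
            // The slot holds data from an earlier lap around the ring; reset it.
            *slot = Bucket { second: now_s, ..Bucket::default() };
        }
        slot.requests += 1;
        slot.bytes_in += bytes_in;
        slot.bytes_out += bytes_out;
        match status / 100 {
            2 => slot.status_2xx += 1,
            4 => slot.status_4xx += 1,
            5 => slot.status_5xx += 1,
            _ => {}
        }
    }

    /// Sum the buckets for the `window_s` seconds ending at `now_s` (inclusive).
    pub fn snapshot(&self, now_s: u64, window_s: u64) -> Bucket {
        let oldest = now_s.saturating_sub(window_s.max(1) - 1);
        let mut total = Bucket { second: now_s, ..Bucket::default() };
        for b in self.slots.iter().filter(|b| b.second >= oldest && b.second <= now_s) {
            total.requests += b.requests;
            total.bytes_in += b.bytes_in;
            total.bytes_out += b.bytes_out;
            total.status_2xx += b.status_2xx;
            total.status_4xx += b.status_4xx;
            total.status_5xx += b.status_5xx;
        }
        total
    }
}
```

Stamping each slot with the second it represents means stale data is discarded lazily on the next write to that slot, so no background task is needed to expire old buckets.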

Test plan

  • cargo fmt -- --check
  • cargo clippy --all --all-targets --all-features -- -D warnings
  • dprint check
  • cargo test --all-features --lib api_event_stream (7 tests)
  • cargo test --all-features --lib api_stats (5 tests)
  • cargo test --all-features --lib live_router (12 tests, includes 4 new ones for capture middleware + api-stats endpoint)
  • Manual: run torc-server run, hit a few endpoints, then torc admin tail-api and torc admin api-stats from another shell — confirm events stream and stats report sensible numbers
  • Manual: run with --auth-file set, confirm user field shows the authenticated subject (not the advisory header)

🤖 Generated with Claude Code

daniel-thom and others added 2 commits May 9, 2026 18:13
Adds `torc admin tail-api`, which streams a structured event for every
inbound HTTP request via Server-Sent Events from a new admin endpoint
(`GET /admin/api-events/stream`). Useful for debugging traffic against a
running server without tailing log files.

Each event carries method, path, query, status, latency, span id, and
the authenticated user. With `--include-bodies`, the server also captures
request and response bodies, capped at 8 KiB per direction (override via
`TORC_API_EVENT_BODY_MAX_BYTES`) and skipped entirely above a 1 MiB hard
buffer ceiling. Bodies are off by default so payloads aren't streamed
unless requested; SSE response streams are skipped to avoid unbounded
buffering, and `Authorization`/`Cookie` headers are never captured.

When no admin client is connected the capture middleware short-circuits
so the runtime cost on the request hot path is negligible. To make the
`user` field useful even when the server runs without `--auth-file`, the
torc CLI sends an advisory `X-Torc-Client-User` header sourced from
`TORC_USERNAME`/`USER`/`USERNAME`; the middleware uses it only as a
fallback when no real authentication was resolved, never for
authorization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
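
The fallback rule for the `user` field described in the commit message above can be sketched as follows. This is a hedged illustration, not the PR's code: `resolve_user_label` and `advisory_user_from` are hypothetical names, and the precedence `TORC_USERNAME`, then `USER`, then `USERNAME` is an assumption based on the order in which the variables are listed.

```rust
/// Server side: pick the label shown in the event's `user` field.
/// A really authenticated subject always wins; the advisory header is
/// only a display fallback and must never feed an authorization decision.
pub fn resolve_user_label(
    authenticated: Option<&str>,
    advisory_header: Option<&str>,
) -> Option<String> {
    authenticated
        .or_else(|| advisory_header.map(str::trim).filter(|s| !s.is_empty()))
        .map(str::to_owned)
}

/// Client side: choose the value for the advisory header.
/// `lookup` abstracts the environment so the function is easy to test;
/// the variable order is an assumption.
pub fn advisory_user_from<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["TORC_USERNAME", "USER", "USERNAME"]
        .iter()
        .filter_map(|name| lookup(*name))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}
```

In a real client the lookup would typically be `|key| std::env::var(key).ok()`. Because the header is trivially spoofable, the server-side function takes the authenticated subject as a separate argument so the two sources cannot be confused.
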
Adds `torc admin api-stats`, which renders request rate, throughput, and
2xx/4xx/5xx status mix from a 1-hour ring of per-second counters
maintained by the server. The capture middleware records every request
into the ring regardless of whether anyone is connected to `tail-api`,
so the snapshot is always up to date.

Bytes are read from `Content-Length` request and response headers, which
is cheap and avoids buffering bodies on the hot path. Chunked / streaming
responses (notably the SSE event streams themselves) advertise no length
and contribute 0 bytes; the request itself is still counted.

The new endpoint is `GET /admin/api-stats` with optional `window_seconds`
and `interval_seconds` query parameters (defaults: 3600 / 60). The CLI
accepts `--window` and `--interval` to mirror those, and supports
`-f json` for raw output.

Per-request overhead is ~300 ns (two header lookups + one wall-clock
read + an uncontended parking_lot mutex), and `Instant::now()` is
elided on the no-subscriber fast path so we don't pay for it when
nobody's listening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
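
The byte accounting described in the commit message above comes down to reading an advertised length and treating a missing one as zero. The sketch below shows that rule plus a simple request-rate calculation; `advertised_bytes` and `requests_per_second` are hypothetical helpers, and dividing by the full window length is an assumption about how req/s is derived.

```rust
/// Bytes advertised by a `Content-Length` header value.
/// A missing or unparsable header (for example a chunked or SSE response)
/// contributes 0 bytes, while the request itself is still counted elsewhere.
pub fn advertised_bytes(content_length: Option<&str>) -> u64 {
    content_length
        .and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(0)
}

/// Average request rate over a window, guarding against a zero-length window.
pub fn requests_per_second(requests: u64, window_seconds: u64) -> f64 {
    if window_seconds == 0 {
        0.0
    } else {
        requests as f64 / window_seconds as f64
    }
}
```

With the default 3600-second window, 7200 recorded requests would render as 2 req/s under this assumption.
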
Contributor

Copilot AI left a comment

Pull request overview

Adds live inspection and load observability for the server’s HTTP API by introducing an admin SSE stream of per-request events and an aggregated “busy-ness” stats endpoint, plus CLI support for both (including an advisory client username header to improve labeling when auth is disabled).

Changes:

  • Add GET /admin/api-events/stream SSE endpoint + request-capture middleware + torc admin tail-api.
  • Add GET /admin/api-stats backed by a 1-second ring buffer + torc admin api-stats.
  • Add advisory X-Torc-Client-User header emission from the CLI for better user labeling when no real auth is resolved.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/server/live_state.rs | Adds new shared state for API event broadcasting and API stats ring buffer. |
| src/server/live_router.rs | Wires new admin routes and installs capture middleware; adds endpoint handlers + tests. |
| src/server/api_stats.rs | Implements per-second ring buffer and snapshot aggregation for API load stats. |
| src/server/api_event_stream.rs | Implements broadcast channel + event/body types and body-capture limit helpers. |
| src/server.rs | Exposes new server modules. |
| src/client/sse_client.rs | Sends advisory client-user header on SSE connections. |
| src/client/commands/admin.rs | Adds tail-api and api-stats admin commands (SSE parsing + stats rendering). |
| src/client/apis/configuration.rs | Adds advisory client-user header constant and auto-injection in auth application. |
| docs/src/specialized/admin/server-deployment.md | Documents new live request inspection + load stats features and env var. |

Comment thread src/server/live_router.rs Outdated
Comment thread src/server/live_router.rs
Comment on lines +580 to +586
    let bus = state.server.api_event_broadcaster.clone();
    let mut receiver = bus.subscribe();
    let body_guard = if params.include_bodies.unwrap_or(false) {
        Some(bus.body_subscriber_guard())
    } else {
        None
    };
Collaborator Author

This was addressed in commit 20ef130 via the new redact_for_subscriber helper. The broadcaster still sends one event with bodies attached when any subscriber wants them, but each SSE handler clears request_body/response_body per-connection before serializing if that connection didn't pass include_bodies=true. The redact_for_subscriber_* unit tests cover both directions.
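
A minimal sketch of the per-connection redaction described in this reply, assuming a simplified event type. Only the `redact_for_subscriber` name, its call shape, and the `request_body`/`response_body` field names come from this thread; the `ApiEvent` struct and its other fields are hypothetical stand-ins.

```rust
/// Simplified stand-in for the broadcast event (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
pub struct ApiEvent {
    pub method: String,
    pub path: String,
    pub status: u16,
    pub request_body: Option<String>,
    pub response_body: Option<String>,
}

/// Clear captured bodies for a subscriber that did not ask for them.
/// The broadcaster sends one shared event; each SSE connection applies
/// this to its own copy before serializing.
pub fn redact_for_subscriber(event: &mut ApiEvent, include_bodies: bool) {
    if !include_bodies {
        event.request_body = None;
        event.response_body = None;
    }
}
```

Redacting on the receiving side keeps the broadcaster to a single send per request, at the cost of carrying bodies to every subscriber's receive queue while any body subscriber is connected.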

Comment thread src/server/live_router.rs Outdated
Comment thread src/server/live_router.rs
Comment on lines +4379 to +4383
            let new_req = Request::from_parts(parts, Body::from(bytes));
            (new_req, captured)
        }
        Err(_) => (Request::from_parts(parts, Body::empty()), None),
    }
Comment thread src/server/live_router.rs Outdated
Comment thread src/server/live_router.rs
Comment on lines +4417 to +4421
            let new_resp = Response::from_parts(parts, Body::from(bytes));
            (new_resp, captured)
        }
        Err(_) => (Response::from_parts(parts, Body::empty()), None),
    }
Comment thread src/server/live_router.rs
Comment on lines 383 to 395
    .fallback(dashboard_fallback)
    .layer(middleware::from_fn_with_state(
        CaptureState {
            bus: state.server.api_event_broadcaster.clone(),
            stats: state.server.api_stats.clone(),
        },
        capture_api_event,
    ))
    .layer(middleware::from_fn_with_state(
        state.auth.clone(),
        inject_request_context,
    ))
    .with_state(state)
Collaborator Author

This was addressed in commit 20ef130: record_api_stats is now the outermost layer (lines 396-401), sitting outside the auth short-circuit. The api_stats_records_unauthenticated_requests test verifies that a 401 still lands in the ring as a 4xx.

Comment thread docs/src/specialized/admin/server-deployment.md Outdated
- Split capture middleware: load-stats accounting moves to an
  outermost layer that lives outside the auth check, so 401s and other
  unauthenticated traffic now appear in /admin/api-stats. Event
  broadcasting stays inside auth where the request context is available.
- Per-subscriber body redaction in admin_api_events_stream: bodies are
  captured globally when any subscriber wants them, but a metadata-only
  subscriber no longer sees payloads from other subscribers.
- Tighten body capture: only collect bodies whose size is advertised up
  front (Content-Length or body size hint). Chunked uploads with no
  advertised length are passed through untouched, closing an unbounded-
  buffer DoS gap.
- Tighten the include_bodies doc comment and the markdown's body-capture
  description to match the actual implementation.
- New tests cover all three behaviors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
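
The tightened body-capture rule in the commit message above can be sketched as a small decision function plus a display truncation step. This is an illustration only: `decide_capture`, `display_limit_from`, and `truncate_for_display` are hypothetical names, and treating a body of exactly 1 MiB as still bufferable is an assumption, since the PR only says bodies above the ceiling are skipped.

```rust
/// Default per-direction display cap (8 KiB), per the PR description.
pub const DEFAULT_DISPLAY_LIMIT: usize = 8 * 1024;
/// Hard buffer ceiling (1 MiB); larger bodies are not buffered at all.
pub const HARD_BUFFER_CEILING: u64 = 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum CaptureDecision {
    /// Size is known up front and small enough to buffer safely.
    Buffer,
    /// Unknown or oversized body: forward it untouched, capture nothing.
    PassThrough,
}

/// Decide from the advertised size (Content-Length or body size hint).
pub fn decide_capture(advertised_len: Option<u64>) -> CaptureDecision {
    match advertised_len {
        Some(len) if len <= HARD_BUFFER_CEILING => CaptureDecision::Buffer,
        _ => CaptureDecision::PassThrough,
    }
}

/// Resolve the display cap from the raw value of an override such as
/// `TORC_API_EVENT_BODY_MAX_BYTES`, falling back to the default.
pub fn display_limit_from(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_DISPLAY_LIMIT)
}

/// Cut a buffered body down to the display cap; the flag reports truncation.
pub fn truncate_for_display(bytes: &[u8], limit: usize) -> (&[u8], bool) {
    if bytes.len() > limit {
        (&bytes[..limit], true)
    } else {
        (bytes, false)
    }
}
```

Keeping the buffering decision separate from the display cap reflects the two limits in the PR: the ceiling bounds memory use, while the cap only bounds what is shown in an event.
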
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread src/server/live_router.rs
Comment on lines +4373 to +4377
            let new_req = Request::from_parts(parts, Body::from(bytes));
            (new_req, captured)
        }
        Err(_) => (Request::from_parts(parts, Body::empty()), None),
    }
Collaborator Author

This was addressed in commit 5166867: the request Err(_) branch now returns Err(error_response(StatusCode::BAD_REQUEST, ...)) and the middleware short-circuits with a 400 instead of forwarding an empty body to the handler. capture_short_circuits_on_request_body_error exercises this path.

Comment thread src/server/live_router.rs Outdated
            let new_resp = Response::from_parts(parts, Body::from(bytes));
            (new_resp, captured)
        }
        Err(_) => (Response::from_parts(parts, Body::empty()), None),
Collaborator Author

Also addressed in commit 5166867: the response Err(_) branch now returns Err(error_response(StatusCode::BAD_GATEWAY, ...)). The middleware substitutes a fresh 502 response (status, headers, body) rather than truncating the original handler's response to Body::empty().

Comment thread src/server/live_router.rs
Comment on lines +4310 to +4318
    };

    let display_limit = body_capture_limit();

    let (request, request_body) = if want_bodies {
        capture_request_body(request, display_limit).await
    } else {
        (request, None)
    };
Comment thread src/server/live_router.rs
Comment on lines +598 to +604
    Ok(mut event) => {
        redact_for_subscriber(&mut event, include_bodies);
        let data = serde_json::to_string(&event).unwrap_or_default();
        yield Ok::<_, std::convert::Infallible>(format!(
            "event: api\ndata: {}\n\n",
            data
        ));
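
The excerpt above writes each event as an `event: api` frame with a single `data:` line. As a rough illustration of the wire format, the sketch below formats such a frame and parses it back the way a client like `torc admin tail-api` might. `format_sse_frame` and `parse_sse_frame` are hypothetical helpers, not functions from this PR; the parser follows the general SSE conventions of stripping one leading space after the colon, joining multiple `data:` lines with newlines, and defaulting the event name to `message`.

```rust
/// Build one SSE frame. Assumes `data` contains no newline, which holds
/// for compact single-line JSON.
pub fn format_sse_frame(event: &str, data: &str) -> String {
    format!("event: {}\ndata: {}\n\n", event, data)
}

/// Parse one SSE frame into `(event name, data)`.
/// Returns `None` for frames that carry no data, such as comment-only
/// keep-alives (lines starting with `:`).
pub fn parse_sse_frame(frame: &str) -> Option<(String, String)> {
    let mut event: Option<&str> = None;
    let mut data: Vec<&str> = Vec::new();
    for line in frame.lines() {
        if let Some(value) = line.strip_prefix("event:") {
            event = Some(value.strip_prefix(' ').unwrap_or(value));
        } else if let Some(value) = line.strip_prefix("data:") {
            data.push(value.strip_prefix(' ').unwrap_or(value));
        }
    }
    if data.is_empty() {
        return None;
    }
    Some((event.unwrap_or("message").to_string(), data.join("\n")))
}
```
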
Previously, if `body.collect().await` failed mid-stream (client
disconnect, transport error, etc.), the middleware substituted
`Body::empty()` and let the handler fail with a misleading
deserialization error. Replace both fallbacks with explicit error
responses synthesized in the middleware:

- Request body read error → 400 Bad Request, before the handler runs
- Response body read error → 502 Bad Gateway, replacing the broken
  response (the handler had returned, but no bytes were on the wire yet)

Adds a regression test that drives a streaming request body which
errors during collection and asserts the 400 short-circuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@daniel-thom daniel-thom merged commit 31986a8 into main May 10, 2026
9 checks passed
@daniel-thom daniel-thom deleted the feature/admin-api-event-stream branch May 10, 2026 02:38
2 participants