feat(server): pluggable request-auth framework (management + runtime) by abhinav-galileo · Pull Request #204 · agentcontrol/agent-control

abhinav-galileo · 2026-04-28T20:25:44Z

Summary

Pluggable request-auth framework that handles both auth flows the
system needs:

Management. Online check on every request. The default
authorizer authenticates the credential and authorizes the
operation; in production this is HttpUpstreamAuthProvider
forwarding to a configurable upstream service.
Runtime. Two-phase exchange-then-verify. A target-bearing call
presents a long-lived credential plus (target_type, target_id) to
a token exchange endpoint; the server mints a short-lived HS256 JWT
bound to that target. Subsequent runtime calls verify the JWT
locally, with no upstream round-trip on the hot path.

Both flows route through the same primitives (Operation vocabulary
on endpoints, Principal returned, RequestAuthorizer Protocol
installed); a per-operation registry lets a deployment point
management ops at one provider and runtime ops at another.

Migrates the /control-bindings endpoint family onto the framework
and ships the runtime token exchange endpoint. The runtime resolution
path itself (/evaluation etc.) is wired in a follow-up; its
provider override (LocalJwtVerifyProvider) is already in place when
the runtime secret is configured.

Module layout

server/src/agent_control_server/auth_framework/
  __init__.py                   # public API
  core.py                       # Operation, Principal, RequestAuthorizer, require_operation, registry
  config.py                     # configure_auth_from_env, RuntimeAuthConfig, set_runtime_auth_config
  runtime_token.py              # HS256 mint / verify helpers, UpstreamGrantExpiredError
  providers/
    __init__.py
    header.py                   # HeaderAuthProvider + DEFAULT_OPERATION_ACCESS
    http_upstream.py            # HttpUpstreamAuthProvider (forward + parse grant)
    local_jwt.py                # LocalJwtVerifyProvider (hot-path JWT verify)

server/src/agent_control_server/endpoints/
  auth.py                       # POST /api/v1/auth/runtime-token-exchange

auth.py (legacy local credential check) is unchanged;
HeaderAuthProvider re-uses _validate_api_key from it. Non-binding
routes still go through the legacy router-level gate; their migration
happens in follow-up PRs.

Operation vocabulary

from enum import StrEnum  # Python 3.11+


class Operation(StrEnum):
    # Wired on endpoints in this PR.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_TOKEN_EXCHANGE = "runtime.token_exchange"

    # Reserved; not yet wired on endpoints.
    CONTROLS_READ = "controls.read"
    CONTROLS_CREATE = "controls.create"
    CONTROLS_UPDATE = "controls.update"
    CONTROLS_DELETE = "controls.delete"
    RUNTIME_USE = "runtime.use"

Per-operation authorizer registry

set_authorizer(authorizer, operation=...) overrides the default for
one operation. Without operation=, it becomes the default for every
operation that does not have a specific binding. Used to route
management ops through one provider and Operation.RUNTIME_USE
through LocalJwtVerifyProvider:

set_authorizer(HttpUpstreamAuthProvider(...))                 # default
set_authorizer(LocalJwtVerifyProvider(secret=...),             # override
               operation=Operation.RUNTIME_USE)

require_operation(op) consults the override first, falls back to
the default. The local-credential path (no override installed) routes
everything to HeaderAuthProvider; the no-auth flow
(api_key_enabled=False) is preserved end-to-end.
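
The override-then-default lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in core.py: the `AuthorizerRegistry` class, the `clear_authorizer` method name, and the `RuntimeError` raised when nothing is installed are assumptions made for the example (the PR exposes module-level functions), and plain strings stand in for real provider instances.

```python
from enum import Enum
from typing import Dict, Optional


class Operation(str, Enum):
    # Two members are enough for the sketch; the values match the PR's vocabulary.
    # (str + Enum is used instead of StrEnum so the sketch also runs before 3.11.)
    CONTROL_BINDINGS_READ = "control_bindings.read"
    RUNTIME_USE = "runtime.use"


class AuthorizerRegistry:
    """Hypothetical container for one default plus per-operation overrides."""

    def __init__(self) -> None:
        self._default: Optional[object] = None
        self._overrides: Dict[Operation, object] = {}

    def set_authorizer(self, authorizer: object,
                       operation: Optional[Operation] = None) -> None:
        if operation is None:
            self._default = authorizer               # default for unbound operations
        else:
            self._overrides[operation] = authorizer  # binding for exactly one operation

    def clear_authorizer(self, operation: Operation) -> None:
        # Dropping an override makes the operation fall back to the default.
        self._overrides.pop(operation, None)

    def get_authorizer(self, operation: Operation) -> object:
        if operation in self._overrides:             # the override wins
            return self._overrides[operation]
        if self._default is None:                    # nothing installed at all
            raise RuntimeError("no authorizer configured")
        return self._default


registry = AuthorizerRegistry()
registry.set_authorizer("http-upstream")             # stand-in for the default provider
registry.set_authorizer("local-jwt", operation=Operation.RUNTIME_USE)
```

With this shape, `registry.get_authorizer(Operation.RUNTIME_USE)` resolves to the override while every other operation resolves to the default.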

require_operation accepts an optional context_builder so the
endpoint can surface request-shaped context (path / query / body
fields) to the authorizer. The body-bearing binding endpoints, the
target-filtered list endpoint, and the runtime token exchange
endpoint all forward (target_type, target_id) so an upstream that
resolves the target's owning project has the identifiers it needs to
make a project-level decision.
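
As a rough illustration of how a context_builder feeds the authorizer, the sketch below builds a dependency from an operation, an authorizer callable, and an optional builder. It is framework-free and simplified: the real require_operation looks the authorizer up in the registry and runs as a FastAPI dependency, whereas here the authorizer is passed in directly. The `target_context` helper and the sample values (`agent`, `a-1`, `control_id`) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple

Context = Dict[str, Any]
ContextBuilder = Callable[[Mapping[str, Any]], Context]
Authorizer = Callable[[str, Optional[Context]], str]


def target_context(fields: Mapping[str, Any]) -> Context:
    """Keep only the target identifiers that the request actually carries."""
    return {
        key: fields[key]
        for key in ("target_type", "target_id")
        if fields.get(key) is not None
    }


def require_operation(
    operation: str,
    authorizer: Authorizer,
    context_builder: Optional[ContextBuilder] = None,
) -> Callable[[Mapping[str, Any]], str]:
    """Build a dependency that authorizes one operation for one request."""

    def dependency(request_fields: Mapping[str, Any]) -> str:
        # Without a builder the authorizer sees no request-shaped context.
        context = context_builder(request_fields) if context_builder else None
        return authorizer(operation, context)

    return dependency


seen: List[Tuple[str, Optional[Context]]] = []  # what the stand-in authorizer received


def recording_authorizer(operation: str, context: Optional[Context]) -> str:
    seen.append((operation, context))
    return "principal"


write_binding = require_operation(
    "control_bindings.write", recording_authorizer, target_context
)
write_binding({"target_type": "agent", "target_id": "a-1", "control_id": 7})
```

Only the two target identifiers reach the authorizer; unrelated body fields such as the made-up `control_id` are left out of the forwarded context.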

Providers (three ship in-tree)

HeaderAuthProvider: local-credential path, single namespace.

Maps each Operation to one of three access levels (PUBLIC,
AUTHENTICATED, ADMIN); single source of truth in
DEFAULT_OPERATION_ACCESS.
Reuses the existing local API-key + session-cookie credential
check from auth.py, so behavior matches the previous
require_admin_key path verbatim.
Returns a normalized runtime.use scope only for
Operation.RUNTIME_TOKEN_EXCHANGE, so the exchange endpoint can
uniformly require runtime.use in principal.scopes across every
provider; there is no implicit fallback that could escalate an
upstream-supplied empty scope grant.
The no-auth flow (api_key_enabled=False) is preserved: every
operation succeeds with a non-admin Principal. Pinned by a
regression test.
Always returns DEFAULT_NAMESPACE_KEY. The namespace header lookup
branch is preserved but inert until non-binding write endpoints are
threaded.
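
The decision flow in this list can be approximated with the sketch below. Several details are assumptions made for illustration only: the access level assigned to each operation (the real table is DEFAULT_OPERATION_ACCESS), the 401/403 status codes, the literal namespace `"default"` standing in for DEFAULT_NAMESPACE_KEY, and the boolean flags that replace the legacy credential validator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class AccessLevel(Enum):
    PUBLIC = "public"
    AUTHENTICATED = "authenticated"
    ADMIN = "admin"


# Assumed levels for illustration; the real mapping lives in DEFAULT_OPERATION_ACCESS.
OPERATION_ACCESS = {
    "control_bindings.read": AccessLevel.AUTHENTICATED,
    "control_bindings.write": AccessLevel.ADMIN,
    "runtime.token_exchange": AccessLevel.AUTHENTICATED,
}


class AuthError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


@dataclass(frozen=True)
class Principal:
    namespace_key: str
    is_admin: bool
    scopes: Tuple[str, ...] = ()


def authorize(operation: str, *, api_key_enabled: bool,
              key_valid: bool = False, key_is_admin: bool = False) -> Principal:
    if operation not in OPERATION_ACCESS:
        raise ValueError(f"unknown operation: {operation}")
    level = OPERATION_ACCESS[operation]
    # runtime.use is granted only on the exchange operation, never implicitly.
    scopes = ("runtime.use",) if operation == "runtime.token_exchange" else ()
    if not api_key_enabled or level is AccessLevel.PUBLIC:
        # No-auth mode and PUBLIC operations succeed with a non-admin Principal.
        return Principal("default", is_admin=False, scopes=scopes)
    if not key_valid:
        raise AuthError(401, "missing or invalid credential")
    if level is AccessLevel.ADMIN and not key_is_admin:
        raise AuthError(403, "admin credential required")
    return Principal("default", is_admin=key_is_admin, scopes=scopes)
```

Keeping the scope grant tied to one operation is what lets the exchange endpoint require `runtime.use` uniformly, whichever provider produced the Principal.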

HttpUpstreamAuthProvider: generic upstream-delegating provider.

Forwards caller credentials (X-API-Key, Authorization,
Cookie) on a POST to a configurable URL with
{operation, context?}.
Optional service-to-service token header for upstream trust.
Parses the upstream response into a Principal: namespace_key,
is_admin, caller_id, plus optional grant fields (target_type,
target_id, scopes, expires_at) so the runtime token exchange
can mint from the same response.
Maps 200 to Principal; 401 / 403 / 404 to matching error;
5xx, network errors, malformed payloads, naive (tzinfo-less)
expires_at, and partial target grants (only one of target_type
/ target_id) all fail closed (502/503).
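
A simplified view of the status mapping and strict grant checks is sketched below. The actual provider validates the raw response bytes with a strict Pydantic model; this stand-in uses only the standard library so it is self-contained. The `Grant` dataclass, the helper names, and the choice to treat `namespace_key` as required are assumptions, and `caller_id` is omitted for brevity.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


class AuthError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


@dataclass(frozen=True)
class Grant:
    namespace_key: str
    is_admin: bool
    target_type: Optional[str]
    target_id: Optional[str]
    scopes: Tuple[str, ...]
    expires_at: Optional[datetime]


def _validate(data: object) -> Grant:
    if not isinstance(data, dict):
        raise ValueError("grant must be a JSON object")
    namespace_key = data.get("namespace_key")
    is_admin = data.get("is_admin", False)
    target_type, target_id = data.get("target_type"), data.get("target_id")
    scopes = data.get("scopes", [])
    raw_expiry = data.get("expires_at")
    if not isinstance(namespace_key, str) or not isinstance(is_admin, bool):
        raise ValueError("wrong-typed namespace_key or is_admin")
    for value in (target_type, target_id):
        if value is not None and not isinstance(value, str):
            raise ValueError("target fields must be strings")
    if (target_type is None) != (target_id is None):
        raise ValueError("partial target grant")  # only one half of the pair
    if not isinstance(scopes, list) or not all(isinstance(s, str) for s in scopes):
        raise ValueError("scopes must be a list of strings")
    expires_at = None
    if raw_expiry is not None:
        if not isinstance(raw_expiry, str):
            raise ValueError("expires_at must be an ISO 8601 string")
        expires_at = datetime.fromisoformat(raw_expiry)
        if expires_at.utcoffset() is None:  # naive datetimes are rejected
            raise ValueError("expires_at must be timezone-aware")
    return Grant(namespace_key, is_admin, target_type, target_id,
                 tuple(scopes), expires_at)


def parse_upstream_response(status: int, body: bytes) -> Grant:
    if status in (401, 403, 404):
        raise AuthError(status, "upstream rejected the request")
    if status != 200:
        raise AuthError(503, f"unexpected upstream status {status}")
    try:
        return _validate(json.loads(body))
    except ValueError as exc:  # also covers json.JSONDecodeError
        raise AuthError(502, f"malformed upstream grant: {exc}") from exc
```

Every malformed shape goes through the same 502 path, so a half-specified target or a string `"false"` for `is_admin` is never silently coerced into a usable grant.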

LocalJwtVerifyProvider: hot-path runtime verifier.

Reads a Bearer token from Authorization, verifies signature
against the runtime secret, checks domain == "runtime", the
issuer, expiry, and that the token's scope covers the requested
Operation.
Returns a Principal with the bound (namespace_key, target_type, target_id) so runtime endpoints inherit the namespace and target
binding without re-deriving them.
When the dependency surfaces target_type / target_id via
context_builder, the provider also enforces that they match the
token's binding; runtime endpoints get the request-target check
for free.
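
The checks that follow signature verification might look roughly like the sketch below, which operates on claims that have already been verified. It assumes that a scope covers an operation when the scope string equals the operation value, that a wrong-domain token maps to 401, and that the Bearer scheme is matched case-insensitively; none of those details is confirmed by the description above. The helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


class AuthError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def extract_bearer(authorization: Optional[str]) -> str:
    """Return the token from an Authorization header value or raise 401."""
    if not authorization:
        raise AuthError(401, "missing bearer token")
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError(401, "Authorization header is not a Bearer token")
    return token.strip()


def check_claims(claims: Mapping[str, Any], operation: str,
                 context: Optional[Mapping[str, Any]] = None) -> None:
    """Apply the post-signature checks to already-verified claims."""
    if claims.get("domain") != "runtime":
        raise AuthError(401, "token is not a runtime token")  # status is assumed
    if operation not in claims.get("scopes", ()):
        raise AuthError(403, "token scope does not cover this operation")
    for key in ("target_type", "target_id"):
        requested = (context or {}).get(key)
        # Enforce the binding only for identifiers the endpoint surfaced.
        if requested is not None and requested != claims.get(key):
            raise AuthError(403, f"{key} does not match the token binding")
```

An endpoint that forwards no target context skips the binding comparison entirely, which matches the "when the dependency surfaces" wording above.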

Runtime token shape

HS256, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET),
issuer agent-control/server. Claims:

Claim	Purpose
`domain`	Pinned to `runtime`; tokens minted here MUST NOT be accepted on management endpoints.
`namespace_key`	The namespace the token authorizes within. Required for mint and verify; preserved end-to-end so a token minted for one namespace cannot be used to resolve controls in another.
`actor_id`	Caller identity surfaced from the upstream grant.
`scopes`	Granted runtime capabilities (e.g., `["runtime.use"]`). The exchange endpoint refuses to mint when `principal.scopes` does not contain `runtime.use`, including the case where the upstream's grant explicitly lists an empty scope set.
`target_type` / `target_id`	Bind the token to one target.
`iat` / `exp`	Bounded lifetime. The local TTL is capped by the upstream grant's `expires_at` so the local token can never outlive its grant.
`jti`	Random identifier; reserved for future revocation.

mint_runtime_token rejects an upstream_expires_at whose
tzinfo is None or whose utcoffset() is None with
RuntimeTokenError so a custom authorizer that supplies a naive
datetime surfaces as a typed auth error rather than a raw TypeError
deeper in the comparison.
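
To make the TTL cap and the two error classes concrete, here is a minimal mint/verify sketch. It builds the HS256 JWT by hand with the standard library so the example has no dependencies; the PR's runtime_token.py may well use a JWT library instead, and the function signatures, the `now` parameter, and the omission of `actor_id` are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

ISSUER = "agent-control/server"


class RuntimeTokenError(Exception):
    """Any mint or verify failure."""


class UpstreamGrantExpiredError(RuntimeTokenError):
    """The upstream grant had already expired at mint time."""


def _b64(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def mint_runtime_token(secret: bytes, *, namespace_key: str, target_type: str,
                       target_id: str, scopes: Sequence[str], ttl_seconds: int,
                       upstream_expires_at: Optional[datetime] = None,
                       now: Optional[datetime] = None) -> str:
    issued_at = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(seconds=ttl_seconds)
    if upstream_expires_at is not None:
        if upstream_expires_at.tzinfo is None or upstream_expires_at.utcoffset() is None:
            raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
        if upstream_expires_at <= issued_at:
            raise UpstreamGrantExpiredError("upstream grant already expired")
        expires_at = min(expires_at, upstream_expires_at)  # never outlive the grant
    claims = {
        "iss": ISSUER, "domain": "runtime", "namespace_key": namespace_key,
        "target_type": target_type, "target_id": target_id, "scopes": list(scopes),
        "iat": int(issued_at.timestamp()), "exp": int(expires_at.timestamp()),
        "jti": secrets.token_hex(16),
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(_b64(json.dumps(part).encode("utf-8"))
                             for part in (header, claims))
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify_runtime_token(secret: bytes, token: str,
                         now: Optional[datetime] = None) -> dict:
    try:
        header_b64, claims_b64, signature = token.split(".")
        expected = _sign(secret, f"{header_b64}.{claims_b64}")
        if not hmac.compare_digest(signature, expected):
            raise RuntimeTokenError("bad signature")
        header = json.loads(_unb64(header_b64))
        claims = json.loads(_unb64(claims_b64))
    except (ValueError, TypeError) as exc:
        raise RuntimeTokenError("malformed token") from exc
    current = int((now or datetime.now(timezone.utc)).timestamp())
    if header.get("alg") != "HS256" or claims.get("iss") != ISSUER:
        raise RuntimeTokenError("unexpected algorithm or issuer")
    if claims.get("domain") != "runtime" or not claims.get("namespace_key"):
        raise RuntimeTokenError("not a runtime token for a namespace")
    if claims.get("exp", 0) <= current:
        raise RuntimeTokenError("token expired")
    return claims
```

Because UpstreamGrantExpiredError subclasses RuntimeTokenError, a caller can catch the specific class first to report an upstream problem and still treat everything else as a generic token failure.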

Runtime token exchange endpoint

POST /api/v1/auth/runtime-token-exchange
{ "target_type": "...", "target_id": "..." }

Authenticated and authorized via Operation.RUNTIME_TOKEN_EXCHANGE
through the default authorizer (typically
HttpUpstreamAuthProvider in production). The authorizer's
context_builder forwards the requested target to the upstream so
it can authorize against the right resource.
Refuses with 503 when AGENT_CONTROL_RUNTIME_TOKEN_SECRET is not
configured.
Mints a local token from Principal.scopes /
Principal.grant_expires_at, capped by the configured TTL (default
300s).
When the provider's Principal carries a target binding, the
endpoint verifies it matches the requested target before minting.
An upstream grant whose expires_at is already in the past
surfaces as 502 (UpstreamGrantExpiredError), distinct from the
503 misconfigured-server path so the public status reflects which
side the operator should investigate.

Response: { token, expires_at, target_type, target_id, scopes }.
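
From a caller's point of view, the exchange could be driven with something like the sketch below. It only builds the request and decodes the documented response fields; the base URL, the sample target values, the helper names, and the assumption that `expires_at` arrives as a string are all illustrative. Callers using the generated TypeScript client would go through `runtimeTokenExchange` instead.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Dict, Tuple

EXCHANGE_PATH = "/api/v1/auth/runtime-token-exchange"


@dataclass(frozen=True)
class RuntimeToken:
    token: str
    expires_at: str  # kept as the wire string; its exact format is not assumed
    target_type: str
    target_id: str
    scopes: Tuple[str, ...]


def build_exchange_request(base_url: str, api_key: str,
                           target_type: str, target_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that trades a credential for a token."""
    body = json.dumps({"target_type": target_type, "target_id": target_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + EXCHANGE_PATH,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_exchange_response(raw: bytes) -> RuntimeToken:
    """Decode the documented response fields into a small value object."""
    data = json.loads(raw)
    return RuntimeToken(
        token=data["token"],
        expires_at=data["expires_at"],
        target_type=data["target_type"],
        target_id=data["target_id"],
        scopes=tuple(data["scopes"]),
    )


def bearer_header(runtime_token: RuntimeToken) -> Dict[str, str]:
    """Header to attach to subsequent runtime calls."""
    return {"Authorization": f"Bearer {runtime_token.token}"}
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and feeding the response body to `parse_exchange_response`; a 502 or 503 from the endpoint tells the caller whether to look at the upstream or at the server configuration.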

Storage namespace under the framework

The migrated binding endpoints take the storage namespace_key from
get_namespace_key (the same resolver the rest of the server uses),
not from principal.namespace_key. The auth chain still runs through
require_operation for authentication and authorization, but the
row's namespace is sourced from the resolver so binding writes and
runtime reads stay in lockstep until auth-derived namespace
resolution lands across /controls, /policies, /agents, and
/evaluation together. The principal's namespace is observed (and
used by LocalJwtVerifyProvider for its own contract) but is not
used to pick the row's storage namespace at this stage.

Migrated endpoints

All seven /api/v1/control-bindings* endpoints now use
Depends(require_operation(...)):

Method	Path	Operation	Context forwarded
PUT	`/control-bindings`	`control_bindings.write`	body: `target_type`, `target_id`
GET	`/control-bindings`	`control_bindings.read`	query: `target_type`, `target_id` (when present)
GET	`/control-bindings/{binding_id}`	`control_bindings.read`	N/A (namespace-wide)
PATCH	`/control-bindings/{binding_id}`	`control_bindings.write`	N/A (namespace-wide)
DELETE	`/control-bindings/{binding_id}`	`control_bindings.write`	N/A (namespace-wide)
PUT	`/control-bindings/by-key`	`control_bindings.write`	body: `target_type`, `target_id`
POST	`/control-bindings/by-key:delete`	`control_bindings.write`	body: `target_type`, `target_id`

The three binding-id-based routes are documented as namespace-wide:
their target identifiers are not available before the binding row is
loaded, and require_operation is single-pass. Clients whose
authorization model requires per-target permissions are steered to
the natural-key endpoints and the target-filtered list, all of which
forward the target to the authorizer. Two-phase auth on the by-id
routes is a follow-up.

New: POST /api/v1/auth/runtime-token-exchange (operation
runtime.token_exchange).

The framework-protected routers (/control-bindings, /auth) are
mounted with the existing non-validating get_api_key_from_header
Security extractor as a router-level dependency. require_operation
still owns runtime authentication and authorization; the Security
dependency exists purely so the generated OpenAPI spec advertises
X-API-Key on these routes for downstream SDK generation.

Generated client

The TypeScript wrapper exposes both auth and controlBindings
getters alongside the existing surface, so consumers using the
public client can call runtimeTokenExchange and the binding API
without reaching into the generated internals.

Env vars

Var	Default	Purpose
`AGENT_CONTROL_AUTH_MODE`	`header`	Default authorizer: `header` or `http_upstream`.
`AGENT_CONTROL_AUTH_UPSTREAM_URL`	none	Required when mode is `http_upstream`.
`AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS`	`5.0`	Per-request timeout.
`AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN`	none	Optional upstream service token.
`AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER`	`X-Agent-Control-Service-Token`	Header name for the service token.
`AGENT_CONTROL_RUNTIME_TOKEN_SECRET`	none	Required to enable runtime auth + the exchange endpoint. Validated at startup; rejected if shorter than 32 bytes.
`AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS`	`300`	Local token TTL ceiling (capped further by the upstream grant). Validated at startup.

configure_auth_from_env parses both runtime fields once at startup
into a frozen RuntimeAuthConfig. The exchange endpoint and
LocalJwtVerifyProvider read the same object, so the mint and verify
sides cannot drift apart within a process. When the runtime secret is
absent, RUNTIME_USE falls through to the default authorizer; this
is logged at WARNING so an operator can immediately see what trust
model is in effect. RUNTIME_USE is reserved and not wired to
/evaluation in this PR, so this fallback does not affect the
runtime hot path yet. The follow-up that wires runtime endpoints
should explicitly choose legacy fallback or fail-closed JWT-only
behavior.
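
The parse-once, fail-at-startup behavior might be sketched as below. The environment variable names, the 32-byte minimum, the 300-second default, and the one-day TTL ceiling come from this PR; the function name, the lower bound of one second, and the choice to validate the TTL even when the secret is absent are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

MIN_SECRET_BYTES = 32
MAX_TTL_SECONDS = 24 * 60 * 60  # one-day ceiling on the configured TTL


@dataclass(frozen=True)
class RuntimeAuthConfig:
    secret: bytes
    ttl_seconds: int


def load_runtime_auth_config(env: Mapping[str, str]) -> Optional[RuntimeAuthConfig]:
    """Parse the runtime fields once; raise ValueError on misconfiguration."""
    raw_ttl = env.get("AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS", "300")
    try:
        ttl_seconds = int(raw_ttl)
    except ValueError:
        raise ValueError(f"TTL must be an integer, got {raw_ttl!r}") from None
    if not 1 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"TTL must be between 1 and {MAX_TTL_SECONDS} seconds")

    raw_secret = env.get("AGENT_CONTROL_RUNTIME_TOKEN_SECRET", "")
    if not raw_secret:
        # Runtime auth disabled: RUNTIME_USE falls through to the default authorizer.
        return None
    secret = raw_secret.encode("utf-8")
    if len(secret) < MIN_SECRET_BYTES:
        raise ValueError(f"runtime token secret must be at least {MIN_SECRET_BYTES} bytes")
    return RuntimeAuthConfig(secret=secret, ttl_seconds=ttl_seconds)
```

Called with `os.environ` at startup, the returned frozen object can be handed to both the mint side and the verify side, which is what keeps the two from reading different values.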

Out of scope (follow-ups)

Migrate /controls CRUD onto require_operation using the
reserved CONTROLS_* operations.
Wire Operation.RUNTIME_USE on the runtime resolution path
(/evaluation, etc.) and the SDK side of the runtime exchange.
The provider override is already in place when the runtime secret
is configured.
Migrate /agents/initAgent onto require_operation. The
HttpUpstreamAuthProvider's context_builder should forward the
request's target_type / target_id to the upstream so the
upstream can authorize against the requested resource.
Auth-derived get_namespace_key so the binding endpoints can use
the principal's namespace for storage along with the rest of the
server.
Two-phase auth for the three binding-id-based routes
(GET/PATCH/DELETE /control-bindings/{binding_id}) so they can
forward target context to the upstream.
Drop auth.py's require_admin_key once every management
endpoint is migrated.

Stacking

Stacked on PR #203 (abhi/data-model-v1); rebased onto its
current head 8adc328 so the merged effective-controls contract,
namespace-threaded agent endpoints, and savepoint-protected binding
writes are the base this PR builds on. Will rebase onto main once
#203 merges.

Test plan

55 framework + endpoint tests covering:
- Default coverage: every Operation member has a default access
  mapping (regression guard).
- HeaderAuthProvider: PUBLIC bypass, AUTHENTICATED + ADMIN paths
  route to the legacy validator with the right require_admin
  flag, no-auth mode passes admin operations, namespace-header
  lookup currently inert, unknown operation raises, normalized
  runtime.use scope returned for RUNTIME_TOKEN_EXCHANGE.
- HttpUpstreamAuthProvider: 200 happy path with realistic JSON
  wire shapes (ISO datetime + JSON array scopes round-trip),
  service token forwarding, 401/403/404 mapping, 5xx fail-closed,
  network-error fail-closed, strict-grant rejection on wrong-typed
  is_admin / malformed scopes / bad expires_at / non-string
  target fields, partial target grant rejected, naive expires_at
  rejected.
- require_operation factory: routes through the installed
  authorizer, per-operation overrides take precedence, clearing an
  override falls back to the default, get_authorizer raises
  when nothing is set.
- Lifecycle: reconfiguring without the runtime secret drops the
  previous LocalJwtVerifyProvider override; teardown clears
  every authorizer; secret shorter than 32 bytes raises at
  startup; invalid TTL raises at startup.
- Runtime token mint / verify: round-trip, wrong-secret rejection,
  expiry rejection, TTL capped by upstream grant, management-domain
  token refused on runtime verify, missing-namespace rejection,
  already-expired upstream grant raises UpstreamGrantExpiredError,
  naive upstream_expires_at raises RuntimeTokenError.
- LocalJwtVerifyProvider: target-bound Principal, namespace
  carried from token, missing token returns 401, wrong scope
  returns 403, non-Bearer header returns 401, target-context match
  enforcement (mismatch on type or id returns 403).
- Exchange endpoint: 503 without secret, mint when configured,
  target mismatch rejected (400), missing target rejected (422),
  grant-without-runtime-use rejected (no privilege escalation),
  explicit empty-scope grant rejected (no fallback escalation),
  target context forwarded to authorizer, non-default namespace
  propagates into the token, full exchange-then-verify round trip,
  already-expired upstream grant surfaces as 502 distinct from the
  503 misconfigured-server path.
Full server suite: 676 passed.
make lint clean.
make typecheck clean.
make sdk-ts-generate-check clean.
TypeScript SDK regenerated alongside the new endpoint
(auth-runtime-token-exchange, request/response models, Auth
and ControlBindings groups exposed via the public client).

codecov · 2026-04-28T20:29:35Z

Codecov Report

❌ Patch coverage is 91.18280% with 41 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../src/agent_control_server/auth_framework/config.py	82.75%	15 Missing ⚠️
...ent_control_server/auth_framework/runtime_token.py	85.33%	11 Missing ⚠️
server/src/agent_control_server/endpoints/auth.py	87.75%	6 Missing ⚠️
...l_server/auth_framework/providers/http_upstream.py	96.87%	3 Missing ⚠️
...ntrol_server/auth_framework/providers/local_jwt.py	91.66%	3 Missing ⚠️
...agent_control_server/endpoints/control_bindings.py	82.35%	3 Missing ⚠️

…g endpoints

The seven /control-bindings endpoints were migrated onto require_operation in #204, but none supplied a context_builder. Upstream authorizers that resolve the target's owning project (e.g., Galileo's check_management_access) need (target_type, target_id) to make a project-level decision; without them the upstream returns 400 and the provider fails closed with 503.

Two builders, four endpoints wired:
- _binding_body_context: reads target_type/target_id from the request body. Wired on PUT "", PUT "/by-key", POST "/by-key:delete".
- _binding_list_context: reads target_type/target_id from query params when the GET list endpoint is target-scoped. Wired on GET "".

The header provider's behavior is unchanged because it ignores context.

Validated end-to-end against the live api PR #6350 + authz PR #145 stack: GET with target filter, PUT with owned target, foreign-target 404, no-auth 401 all behave correctly.

Out of scope (separate follow-up): the binding_id-based endpoints (GET/PATCH/DELETE /{binding_id}) need a 2-phase auth: look up the binding by namespace+id to discover its target, then auth-check with target context. That's a deeper change to the require_operation contract and is tracked separately.

Endpoints declare a generic Operation; an installed RequestAuthorizer decides whether the request is allowed and returns the resolved Principal (namespace + admin flag + caller id).

Two providers ship in-tree:
- HeaderAuthProvider: OSS / single-namespace default. Maps each Operation to one of three access levels (PUBLIC / AUTHENTICATED / ADMIN) and reuses the legacy local credential check; behavior matches the previous require_admin_key path verbatim. V1 ignores the X-Namespace-Key header and always returns the default namespace because non-binding write endpoints still hardcode it; the branch is preserved for a follow-up that lifts the lock.
- HttpUpstreamAuthProvider: forwards caller credentials to a configurable upstream URL. Maps 401/403/404 directly; fail-closed (503) on 5xx and network errors; rejects malformed principals (502).

Control-binding endpoints now declare CONTROL_BINDINGS_READ / CONTROL_BINDINGS_WRITE via require_operation(...) and read the resolved namespace from the returned Principal. The router is mounted without the legacy router-level gate so the framework owns authentication and authorization end-to-end.

Reserved Operation members for controls.* and runtime.use are defined but not yet wired; their migrations land in follow-up PRs.

Rename so the framework's vocabulary is factual:
- OssAccessLevel -> AccessLevel
- OSS_OPERATION_ACCESS -> DEFAULT_OPERATION_ACCESS
- Comments / docstrings: replace "OSS / single-namespace" framing with factual descriptions of the local-credential path.

Drop the unjustified MANAGEMENT_ prefix on environment variables; this PR only configures one auth flow:
- AGENT_CONTROL_MANAGEMENT_AUTH_MODE -> AGENT_CONTROL_AUTH_MODE
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_URL -> AGENT_CONTROL_AUTH_UPSTREAM_URL
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_TIMEOUT_SECONDS -> AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER -> AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER

Add a regression test for the no-auth flow: when api_key_enabled is False, even admin operations succeed with a non-admin Principal, matching the pre-framework local-auth behavior.

Completes the framework's auth coverage. Management and runtime are genuinely different protocols, and they now route through different authorizers via the per-operation registry:
- Per-operation override on the registry. set_authorizer(authorizer, operation=...) overrides the default for one operation; calls without operation= become the default for everything else. Used to point Operation.RUNTIME_USE at LocalJwtVerifyProvider while leaving the default authorizer (header or http_upstream) for management.
- Runtime token mint/verify. HS256 JWT, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), short TTL capped by the upstream grant's expiry. domain="runtime" claim pins the token to the runtime path. Issuer is agent-control/server.
- LocalJwtVerifyProvider verifies the Bearer token, checks the scope covers the requested Operation, and returns a Principal with the bound (target_type, target_id) so endpoints can match the request target.
- POST /api/v1/auth/runtime-token-exchange. Authenticates via the default authorizer (typically HttpUpstreamAuthProvider in production, which forwards the credential to the configured upstream) and mints a local runtime token from the resulting Principal. Refuses with 503 when the runtime secret is not configured.
- Principal grew target_type, target_id, scopes, grant_expires_at fields so providers can surface the upstream grant's binding and the exchange endpoint can mint a token from it. HttpUpstreamAuthProvider parses the matching optional fields from the upstream JSON response.
- Configuration: AGENT_CONTROL_AUTH_* configures the default authorizer; AGENT_CONTROL_RUNTIME_TOKEN_SECRET (+ optional AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS) enables the runtime override. Without the secret, runtime endpoints fall through to the default authorizer.

Tests: 18 new unit + integration tests covering the registry overrides, token round-trip / wrong-secret / expired / wrong-domain rejection, JWT-verify provider behavior (target binding, missing token, wrong scope, non-Bearer header), and the exchange endpoint (503 without secret, mint when configured, target mismatch, missing target, context forwarded to authorizer, full exchange-then-verify round trip).

The TypeScript SDK regenerates with the new endpoint surface (runtime-token-exchange), committed alongside.

…es/grant

Five hardening changes prompted by review:
- Runtime tokens carry namespace_key. mint_runtime_token now requires it; the JWT payload includes it; verify_runtime_token rejects tokens without it; LocalJwtVerifyProvider returns the token's namespace on the resulting Principal instead of always defaulting. Otherwise a token minted for org A would resolve runtime controls in the default namespace once /evaluation is wired to RUNTIME_USE.
- Exchange endpoint refuses to add runtime.use to a grant that omits it. If the upstream returned an explicit scope set without runtime.use, the credential is not authorized for runtime use on this target; minting one anyway would be privilege escalation. Defaulting to runtime.use is preserved only when the provider returned no scoped grant (e.g., local header path).
- HttpUpstreamAuthProvider parses the upstream response with a strict Pydantic model (strict=True). Wrong-typed is_admin, malformed scopes, bad expires_at, and non-string target fields fail closed with 502 instead of being silently coerced or dropped. Unknown fields are still tolerated so the upstream can evolve.
- LocalJwtVerifyProvider enforces target context match when the dependency surfaces it. Future runtime endpoints can declare a context_builder that extracts target_type/target_id from the request; the provider verifies the token's binding matches and rejects with 403 otherwise.
- Auth provider lifecycle. configure_auth_from_env tracks installed providers; teardown_auth (called from FastAPI lifespan shutdown) closes any aclose-able providers, which releases the HttpUpstreamAuthProvider's owned httpx.AsyncClient.

Tests: nine new cases covering token-namespace round-trip, target context mismatch on type and id, strict grant rejection across each malformed field, the privilege-escalation guard, and a full non-default-namespace round trip through the exchange endpoint.

… on reconfigure

Two follow-up fixes from review:
- HttpUpstreamAuthProvider validates against the raw response bytes via _UpstreamGrant.model_validate_json instead of round-tripping through response.json() and model_validate. Pydantic's JSON parser accepts ISO datetimes and JSON arrays (the actual wire shapes any HTTP service produces) while strict=True still rejects type-coercion bugs like "false" -> True or non-string entries in scopes. Adds a regression test that pins the JSON wire shape: ISO expires_at + array scopes now round-trip correctly.
- configure_auth_from_env clears any prior default and operation overrides before installing fresh ones; teardown_auth clears them too. Without this, removing the runtime token secret between two configure calls left the previous LocalJwtVerifyProvider override installed on Operation.RUNTIME_USE: a silent inconsistency where the config path said runtime should fall through but the registry disagreed. Adds a regression test that exercises the full configure-then-reconfigure path.

A target binding is only meaningful as a (target_type, target_id) pair. The previous schema allowed each field independently, so a malformed grant carrying only target_type would pass type validation and the exchange endpoint's per-field equality check would fall through (the upstream's None never trips the != against the request body), letting the endpoint mint a token bound to whatever target_id the request asked for. Add a model validator on _UpstreamGrant that fails closed when exactly one of the two fields is set; both supplied or both omitted is the only acceptable shape. Pydantic's ValidationError surfaces as 502 like every other malformed-grant case. Tests cover both half-supplied shapes (target_type only and target_id only). Also drop two stale comments referring to upstream-specific implementation choices that bled in earlier — the framework is generic.

Two distinct timing-related fail-closed gaps:
1. Pydantic with strict=True still accepts a naive ISO datetime for the upstream's expires_at because strict only enforces types, not tz. Comparing the resulting naive datetime against datetime.now(UTC) at mint time raises TypeError and surfaces as a 500. Add a field validator on _UpstreamGrant.expires_at that rejects naive datetimes, so a malformed grant fails closed with a 502 alongside the rest of the strict-grant rejections.
2. mint_runtime_token would happily mint when upstream_expires_at <= issued_at, returning a 200 with an exp claim already in the past. Introduce UpstreamGrantExpiredError(RuntimeTokenError) and raise it in that case. The exchange endpoint maps this distinct error class to a 502 (upstream returned bad data) rather than the existing 503 (server misconfigured), so the public status reflects which side the operator should investigate.

Tests:
- _UpstreamGrant rejects naive expires_at -> 502 (parser fail-closed).
- mint_runtime_token raises UpstreamGrantExpiredError when the grant is already past or exactly at issued_at.
- Exchange endpoint surfaces the expired grant as 502 (vs 503 for the misconfigured-server path).

… startup, advertise APIKeyHeader

Five review issues against the auth framework:
1. Empty upstream scopes: the exchange endpoint previously fell back to minting a runtime.use token whenever principal.scopes was falsey, which is the same shape an upstream produces by returning an explicit ``"scopes": []``. The fallback is removed; the endpoint now requires runtime.use to be present in principal.scopes for every provider. HeaderAuthProvider explicitly grants runtime.use only when authorizing Operation.RUNTIME_TOKEN_EXCHANGE, so the local path keeps its V1 behavior while upstream privilege escalation is closed off.
2. Runtime config consolidation: AGENT_CONTROL_RUNTIME_TOKEN_SECRET and the TTL are now parsed once at startup into a frozen RuntimeAuthConfig that the mint side and the LocalJwtVerifyProvider verify side both read. configure_auth_from_env raises at startup on misconfiguration instead of producing a runtime 500 from an invalid TTL or a too-short secret.
3. Runtime token secret strength: HS256 needs >= 32 bytes of secret material; values shorter than that are rejected at startup.
4. RUNTIME_USE fallback warning: when no runtime secret is configured the LocalJwtVerifyProvider override is not installed (V1 behavior unchanged), but the startup log now warns that RUNTIME_USE will fall through to the default authorizer, giving operators a clear signal to either configure the secret or accept the long-lived-credential trust model.
5. OpenAPI security entries: the framework-protected routers (/control-bindings, /auth) are now mounted with the existing non-validating get_api_key_from_header Security extractor as a router-level dependency. require_operation still owns runtime authentication and authorization; the Security dependency exists purely so the generated OpenAPI spec advertises X-API-Key on these routes for downstream SDK generation.

Confirmed: server/.generated/openapi.json now lists ``security: [{APIKeyHeader: []}]`` on every framework-protected operation.

The TypeScript wrapper AgentControlClient is also extended with an ``auth`` getter so the runtimeTokenExchange method generated under the Auth group is reachable through the public client.

A new fixture (``runtime_config_enabled``) replaces the previous os.environ patching in test_runtime_token_exchange_endpoint.py so tests exercise the same config singleton production uses; one new test pins the empty-scope rejection.

…ding routes as namespace-wide

Two review issues:
1. ``mint_runtime_token`` now rejects a naive ``upstream_expires_at`` with ``RuntimeTokenError`` instead of letting the comparison against ``datetime.now(UTC)`` raise a raw ``TypeError`` (which surfaces as a 500). The HTTP-upstream parser already rejects timezone-less ``expires_at`` on the wire, but custom authorizers and tests can still call the helper directly; the lower-level API is now self-contained.
2. The three binding-id-based routes (GET/PATCH/DELETE ``/control-bindings/{binding_id}``) are documented as namespace-wide in the OpenAPI summary and docstrings. Per-target authorization is not possible on these routes today because ``require_operation`` is single-pass and the target identifiers are only discoverable after the binding row is loaded. Clients whose authorization model needs per-target permissions are explicitly steered to the natural-key endpoints (``PUT /by-key``, ``POST /by-key:delete``) and the target-filtered list, all of which forward ``(target_type, target_id)`` to the authorizer. Two-phase auth for the by-id routes is tracked as a separate follow-up.

Also: TypeScript SDK regenerated to pick up the new endpoint summaries.

…ten tzinfo guard

Two review issues:
1. Binding endpoints previously used ``principal.namespace_key`` for the row's storage namespace. With HeaderAuthProvider this was always the default namespace, so the V1 contract held; with HttpUpstreamAuthProvider returning an org-scoped namespace, binding writes would land in that namespace while initAgent / GET /agents/{name}/controls / /evaluation still resolved through ``get_namespace_key`` (V1 default), making target-bound controls invisible to runtime resolution. The seven binding endpoints now read storage namespace from ``get_namespace_key`` so writes and reads stay in lockstep until auth-derived namespace resolution lands across every endpoint. The auth chain still runs via ``require_operation`` for authentication and authorization; the resolved Principal is no longer used to pick the storage namespace.
2. The ``mint_runtime_token`` tzinfo guard now also checks ``utcoffset() is None`` so a custom ``tzinfo`` subclass that returns None from ``utcoffset()`` is rejected at the helper boundary instead of raising a raw ``TypeError`` from the comparison below.

TypeScript SDK regenerated to pick up the binding-endpoint docstring updates.

…inctly

- _load_runtime_ttl_seconds enforces a 1-day maximum on the configured TTL so a misconfigured value cannot mint long-lived tokens. The upstream-grant ceiling in mint_runtime_token only fires when the upstream surfaces an expiry; this cap closes the configuration gap.
- HttpUpstreamAuthProvider distinguishes 429 from the catch-all 503 branch with a rate-limit-specific detail and a Retry-After hint, and names the unexpected status in the catch-all detail so operators can tell the two failure modes apart in logs.

## [2.5.0](ts-sdk-v2.4.0...ts-sdk-v2.5.0) (2026-05-02)

### Features

* **sdk-ts:** expose debug logger option ([66aba97](66aba97))
* **sdk:** add config driven sink selection ([#176](#176)) ([64c169f](64c169f))
* **server:** namespace scoping and control bindings ([#203](#203)) ([15ed4fd](15ed4fd))
* **server:** pluggable request-auth framework (management + runtime) ([#204](#204)) ([fae0ad3](fae0ad3)), closes [#203](#203)

### Bug Fixes

* **server:** add httpx to runtime dependencies ([#205](#205)) ([b4dff6f](b4dff6f))

galileo-automation · 2026-05-02T03:39:20Z

🎉 This PR is included in version 2.5.0 🎉

abhinav-galileo changed the title ~~feat(server): pluggable request-auth framework + migrate control bindings~~ feat(server): pluggable request-auth framework (management + runtime) Apr 28, 2026

abhinav-galileo marked this pull request as ready for review April 28, 2026 21:46

abhinav-galileo requested review from lan17 and namrataghadi-galileo April 28, 2026 21:46

abhinav-galileo force-pushed the abhi/management-auth-framework branch from b87b27f to 8ecb871 Compare April 29, 2026 18:56