feat(server): pluggable request-auth framework (management + runtime) #204
Merged
abhinav-galileo merged 12 commits into main on Apr 30, 2026
abhinav-galileo
added a commit
that referenced
this pull request
Apr 29, 2026
…g endpoints
The seven /control-bindings endpoints were migrated onto
require_operation in #204, but none supplied a context_builder.
Upstream authorizers that resolve the target's owning project (e.g.,
Galileo's check_management_access) need (target_type, target_id) to
make a project-level decision; without them the upstream returns 400
and the provider fails closed with 503.
Two builders, four endpoints wired:
- _binding_body_context: reads target_type/target_id from the request
  body. Wired on PUT "", PUT "/by-key", POST "/by-key:delete".
- _binding_list_context: reads target_type/target_id from query params
  when the GET list endpoint is target-scoped. Wired on GET "".
The header provider's behavior is unchanged because it ignores context.
Validated end-to-end against the live api PR #6350 + authz PR #145
stack: GET with target filter, PUT with owned target, foreign-target
404, no-auth 401 all behave correctly.
Out of scope (separate follow-up): the binding_id-based endpoints
(GET/PATCH/DELETE /{binding_id}) need a 2-phase auth: look up the
binding by namespace+id to discover its target, then auth-check with
target context. That's a deeper change to the require_operation
contract and is tracked separately.
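The two context builders described in this commit can be sketched roughly as follows. This is an illustration only: the real builders presumably receive a framework request object, whereas these hypothetical versions take plain mappings, and the choice to forward nothing when only one of the two query parameters is present is an assumption.

```python
from typing import Any, Mapping, Optional


def binding_body_context(body: Mapping[str, Any]) -> dict[str, Any]:
    """Build authorizer context from the target fields of a request body."""
    return {"target_type": body["target_type"], "target_id": body["target_id"]}


def binding_list_context(query: Mapping[str, str]) -> Optional[dict[str, Any]]:
    """Build context from query params only when the list is target-scoped."""
    target_type = query.get("target_type")
    target_id = query.get("target_id")
    if target_type is None or target_id is None:
        # Assumption: an unscoped (or half-scoped) list forwards no context.
        return None
    return {"target_type": target_type, "target_id": target_id}
```

Only the target identifiers are forwarded, so any other body or query fields never reach the authorizer.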
namrataghadi-galileo
approved these changes
Apr 30, 2026
Endpoints declare a generic Operation; an installed RequestAuthorizer
decides whether the request is allowed and returns the resolved
Principal (namespace + admin flag + caller id).
Two providers ship in-tree:
- HeaderAuthProvider: OSS / single-namespace default. Maps each
  Operation to one of three access levels (PUBLIC / AUTHENTICATED /
  ADMIN) and reuses the legacy local credential check; behavior matches
  the previous require_admin_key path verbatim. V1 ignores the
  X-Namespace-Key header and always returns the default namespace
  because non-binding write endpoints still hardcode it; the branch is
  preserved for a follow-up that lifts the lock.
- HttpUpstreamAuthProvider: forwards caller credentials to a
  configurable upstream URL. Maps 401/403/404 directly; fail-closed
  (503) on 5xx and network errors; rejects malformed principals (502).
Control-binding endpoints now declare CONTROL_BINDINGS_READ /
CONTROL_BINDINGS_WRITE via require_operation(...) and read the resolved
namespace from the returned Principal. The router is mounted without
the legacy router-level gate so the framework owns authentication and
authorization end-to-end.
Reserved Operation members for controls.* and runtime.use are defined
but not yet wired; their migrations land in follow-up PRs.
Rename so the framework's vocabulary is factual:
- OssAccessLevel -> AccessLevel
- OSS_OPERATION_ACCESS -> DEFAULT_OPERATION_ACCESS
- Comments / docstrings: replace "OSS / single-namespace" framing with
  factual descriptions of the local-credential path.
Drop the unjustified MANAGEMENT_ prefix on environment variables; this
PR only configures one auth flow:
- AGENT_CONTROL_MANAGEMENT_AUTH_MODE -> AGENT_CONTROL_AUTH_MODE
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_URL ->
  AGENT_CONTROL_AUTH_UPSTREAM_URL
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_TIMEOUT_SECONDS ->
  AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN ->
  AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN
- AGENT_CONTROL_MANAGEMENT_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER ->
  AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER
Add a regression test for the no-auth flow: when api_key_enabled is
False, even admin operations succeed with a non-admin Principal,
matching the pre-framework local-auth behavior.
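A minimal sketch of the renamed mapping and the no-auth behavior, assuming a reduced Operation enum. The access level assigned to each operation, the helper names `unmapped_operations` and `is_allowed`, and the enum layout are illustrative assumptions; the PR text does not spell them out.

```python
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"
    AUTHENTICATED = "authenticated"
    ADMIN = "admin"


class Operation(Enum):
    # Operation names and string values follow the PR text.
    CONTROL_BINDINGS_READ = "control_bindings.read"
    CONTROL_BINDINGS_WRITE = "control_bindings.write"
    RUNTIME_TOKEN_EXCHANGE = "runtime.token_exchange"
    RUNTIME_USE = "runtime.use"


# Assumed levels: the PR does not state which level each operation maps to.
DEFAULT_OPERATION_ACCESS = {
    Operation.CONTROL_BINDINGS_READ: AccessLevel.AUTHENTICATED,
    Operation.CONTROL_BINDINGS_WRITE: AccessLevel.ADMIN,
    Operation.RUNTIME_TOKEN_EXCHANGE: AccessLevel.AUTHENTICATED,
    Operation.RUNTIME_USE: AccessLevel.AUTHENTICATED,
}


def unmapped_operations() -> list[Operation]:
    """Return operations that have no default access level (should be empty)."""
    return [op for op in Operation if op not in DEFAULT_OPERATION_ACCESS]


def is_allowed(op: Operation, *, authenticated: bool, is_admin: bool,
               api_key_enabled: bool = True) -> bool:
    """Decide access; with auth disabled every operation is allowed."""
    if not api_key_enabled:
        return True
    level = DEFAULT_OPERATION_ACCESS[op]
    if level is AccessLevel.PUBLIC:
        return True
    if level is AccessLevel.AUTHENTICATED:
        return authenticated
    return authenticated and is_admin
```

Keeping the mapping in one dictionary makes the "every operation has a level" check a one-line regression guard.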
Completes the framework's auth coverage. Management and runtime are
genuinely different protocols, and they now route through different
authorizers via the per-operation registry:
- Per-operation override on the registry. set_authorizer(authorizer,
  operation=...) overrides the default for one operation; calls without
  operation= become the default for everything else. Used to point
  Operation.RUNTIME_USE at LocalJwtVerifyProvider while leaving the
  default authorizer (header or http_upstream) for management.
- Runtime token mint/verify. HS256 JWT, dedicated secret
  (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), short TTL capped by the upstream
  grant's expiry. domain="runtime" claim pins the token to the runtime
  path. Issuer is agent-control/server.
- LocalJwtVerifyProvider verifies the Bearer token, checks the scope
  covers the requested Operation, and returns a Principal with the
  bound (target_type, target_id) so endpoints can match the request
  target.
- POST /api/v1/auth/runtime-token-exchange. Authenticates via the
  default authorizer (typically HttpUpstreamAuthProvider in production,
  which forwards the credential to the configured upstream) and mints a
  local runtime token from the resulting Principal. Refuses with 503
  when the runtime secret is not configured.
- Principal grew target_type, target_id, scopes, grant_expires_at
  fields so providers can surface the upstream grant's binding and the
  exchange endpoint can mint a token from it. HttpUpstreamAuthProvider
  parses the matching optional fields from the upstream JSON response.
- Configuration: AGENT_CONTROL_AUTH_* configures the default
  authorizer; AGENT_CONTROL_RUNTIME_TOKEN_SECRET (+ optional
  AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS) enables the runtime override.
  Without the secret, runtime endpoints fall through to the default
  authorizer.
Tests: 18 new unit + integration tests covering the registry overrides, token round-trip / wrong-secret / expired / wrong-domain rejection, JWT-verify provider behavior (target binding, missing token, wrong scope, non-Bearer header), and the exchange endpoint (503 without secret, mint when configured, target mismatch, missing target, context forwarded to authorizer, full exchange-then-verify round trip). The TypeScript SDK regenerates with the new endpoint surface (runtime-token-exchange) — committed alongside.
…es/grant
Five hardening changes prompted by review:
- Runtime tokens carry namespace_key. mint_runtime_token now requires
  it; the JWT payload includes it; verify_runtime_token rejects tokens
  without it; LocalJwtVerifyProvider returns the token's namespace on
  the resulting Principal instead of always defaulting. Otherwise a
  token minted for org A would resolve runtime controls in the default
  namespace once /evaluation is wired to RUNTIME_USE.
- Exchange endpoint refuses to add runtime.use to a grant that omits
  it. If the upstream returned an explicit scope set without
  runtime.use, the credential is not authorized for runtime use on this
  target; minting one anyway would be privilege escalation. Defaulting
  to runtime.use is preserved only when the provider returned no scoped
  grant (e.g., local header path).
- HttpUpstreamAuthProvider parses the upstream response with a strict
  Pydantic model (strict=True). Wrong-typed is_admin, malformed scopes,
  bad expires_at, and non-string target fields fail closed with 502
  instead of being silently coerced or dropped. Unknown fields are
  still tolerated so the upstream can evolve.
- LocalJwtVerifyProvider enforces target context match when the
  dependency surfaces it. Future runtime endpoints can declare a
  context_builder that extracts target_type/target_id from the request;
  the provider verifies the token's binding matches and rejects with
  403 otherwise.
- Auth provider lifecycle. configure_auth_from_env tracks installed
  providers; teardown_auth (called from FastAPI lifespan shutdown)
  closes any aclose-able providers, which releases the
  HttpUpstreamAuthProvider's owned httpx.AsyncClient.
Tests: nine new cases covering token-namespace round-trip, target
context mismatch on type and id, strict grant rejection across each
malformed field, the privilege-escalation guard, and a full
non-default-namespace round trip through the exchange endpoint.
… on reconfigure
Two follow-up fixes from review:
- HttpUpstreamAuthProvider validates against the raw response bytes via
  _UpstreamGrant.model_validate_json instead of round-tripping through
  response.json() and model_validate. Pydantic's JSON parser accepts
  ISO datetimes and JSON arrays (the actual wire shapes any HTTP
  service produces) while strict=True still rejects type-coercion bugs
  like "false" -> True or non-string entries in scopes. Adds a
  regression test that pins the JSON wire shape: ISO expires_at + array
  scopes now round-trip correctly.
- configure_auth_from_env clears any prior default and operation
  overrides before installing fresh ones; teardown_auth clears them
  too. Without this, removing the runtime token secret between two
  configure calls left the previous LocalJwtVerifyProvider override
  installed on Operation.RUNTIME_USE: a silent inconsistency where the
  config path said runtime should fall through but the registry
  disagreed. Adds a regression test that exercises the full
  configure-then-reconfigure path.
A target binding is only meaningful as a (target_type, target_id) pair. The previous schema allowed each field independently, so a malformed grant carrying only target_type would pass type validation and the exchange endpoint's per-field equality check would fall through (the upstream's None never trips the != against the request body), letting the endpoint mint a token bound to whatever target_id the request asked for. Add a model validator on _UpstreamGrant that fails closed when exactly one of the two fields is set; both supplied or both omitted is the only acceptable shape. Pydantic's ValidationError surfaces as 502 like every other malformed-grant case. Tests cover both half-supplied shapes (target_type only and target_id only). Also drop two stale comments referring to upstream-specific implementation choices that bled in earlier — the framework is generic.
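The both-or-neither rule can be illustrated without Pydantic. The sketch below swaps the commit's Pydantic model validator for a plain dataclass `__post_init__` check so it stays dependency-free; `UpstreamGrant` and `MalformedGrantError` are hypothetical names, and the real model carries more fields.

```python
from dataclasses import dataclass
from typing import Optional


class MalformedGrantError(ValueError):
    """Hypothetical error type; the PR surfaces this case as a 502."""


@dataclass(frozen=True)
class UpstreamGrant:
    target_type: Optional[str] = None
    target_id: Optional[str] = None

    def __post_init__(self) -> None:
        # A binding is only meaningful as a pair: reject the shape where
        # exactly one of the two fields is set.
        if (self.target_type is None) != (self.target_id is None):
            raise MalformedGrantError(
                "target_type and target_id must be supplied together"
            )
```

Rejecting the half-supplied shape at parse time means later per-field equality checks never see a lone `None` that would silently pass.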
Two distinct timing-related fail-closed gaps:
1. Pydantic with strict=True still accepts a naive ISO datetime for the
   upstream's expires_at because strict only enforces types, not tz.
   Comparing the resulting naive datetime against datetime.now(UTC) at
   mint time raises TypeError and surfaces as a 500. Add a field
   validator on _UpstreamGrant.expires_at that rejects naive datetimes,
   so a malformed grant fails closed with a 502 alongside the rest of
   the strict-grant rejections.
2. mint_runtime_token would happily mint when upstream_expires_at <=
   issued_at, returning a 200 with an exp claim already in the past.
   Introduce UpstreamGrantExpiredError(RuntimeTokenError) and raise it
   in that case. The exchange endpoint maps this distinct error class
   to a 502 (upstream returned bad data) rather than the existing 503
   (server misconfigured), so the public status reflects which side the
   operator should investigate.
Tests:
- _UpstreamGrant rejects naive expires_at -> 502 (parser fail-closed).
- mint_runtime_token raises UpstreamGrantExpiredError when the grant is
  already past or exactly at issued_at.
- Exchange endpoint surfaces the expired grant as 502 (vs 503 for the
  misconfigured-server path).
… startup, advertise APIKeyHeader
Five review issues against the auth framework:
1. Empty upstream scopes: the exchange endpoint previously fell back to
minting a runtime.use token whenever principal.scopes was falsey,
which is the same shape an upstream produces by returning an explicit
``"scopes": []``. The fallback is removed; the endpoint now requires
runtime.use to be present in principal.scopes for every provider.
HeaderAuthProvider explicitly grants runtime.use only when authorizing
Operation.RUNTIME_TOKEN_EXCHANGE, so the local path keeps its V1
behavior while upstream privilege escalation is closed off.
2. Runtime config consolidation: AGENT_CONTROL_RUNTIME_TOKEN_SECRET and
the TTL are now parsed once at startup into a frozen RuntimeAuthConfig
that the mint side and the LocalJwtVerifyProvider verify side both
read. configure_auth_from_env raises at startup on misconfiguration
instead of producing a runtime 500 from an invalid TTL or a too-short
secret.
3. Runtime token secret strength: HS256 needs >= 32 bytes of secret
material; values shorter than that are rejected at startup.
4. RUNTIME_USE fallback warning: when no runtime secret is configured
the LocalJwtVerifyProvider override is not installed (V1 behavior
unchanged), but the startup log now warns that RUNTIME_USE will fall
through to the default authorizer, giving operators a clear signal
to either configure the secret or accept the long-lived-credential
trust model.
5. OpenAPI security entries: the framework-protected routers
(/control-bindings, /auth) are now mounted with the existing
non-validating get_api_key_from_header Security extractor as a
router-level dependency. require_operation still owns runtime
authentication and authorization; the Security dependency exists
purely so the generated OpenAPI spec advertises X-API-Key on these
routes for downstream SDK generation. Confirmed: server/.generated/
openapi.json now lists ``security: [{APIKeyHeader: []}]`` on every
framework-protected operation.
The TypeScript wrapper AgentControlClient is also extended with an
``auth`` getter so the runtimeTokenExchange method generated under the
Auth group is reachable through the public client.
A new fixture (``runtime_config_enabled``) replaces the previous
os.environ patching in test_runtime_token_exchange_endpoint.py so tests
exercise the same config singleton production uses; one new test pins
the empty-scope rejection.
…ding routes as namespace-wide
Two review issues:
1. ``mint_runtime_token`` now rejects a naive ``upstream_expires_at``
with ``RuntimeTokenError`` instead of letting the comparison against
``datetime.now(UTC)`` raise a raw ``TypeError`` (which surfaces as a
500). The HTTP-upstream parser already rejects timezone-less
``expires_at`` on the wire, but custom authorizers and tests can
still call the helper directly; the lower-level API is now
self-contained.
2. The three binding-id-based routes (GET/PATCH/DELETE
``/control-bindings/{binding_id}``) are documented as namespace-wide
in the OpenAPI summary and docstrings. Per-target authorization is
not possible on these routes today because ``require_operation`` is
single-pass and the target identifiers are only discoverable after
the binding row is loaded. Clients whose authorization model needs
per-target permissions are explicitly steered to the natural-key
endpoints (``PUT /by-key``, ``POST /by-key:delete``) and the
target-filtered list, all of which forward
``(target_type, target_id)`` to the authorizer. Two-phase auth for
the by-id routes is tracked as a separate follow-up.
Also: TypeScript SDK regenerated to pick up the new endpoint summaries.
…ten tzinfo guard
Two review issues:
1. Binding endpoints previously used ``principal.namespace_key`` for
the row's storage namespace. With HeaderAuthProvider this was always
the default namespace, so the V1 contract held; with
HttpUpstreamAuthProvider returning an org-scoped namespace, binding
writes would land in that namespace while initAgent / GET
/agents/{name}/controls / /evaluation still resolved through
``get_namespace_key`` (V1 default), making target-bound controls
invisible to runtime resolution. The seven binding endpoints now
read storage namespace from ``get_namespace_key`` so writes and
reads stay in lockstep until auth-derived namespace resolution
lands across every endpoint. The auth chain still runs via
``require_operation`` for authentication and authorization; the
resolved Principal is no longer used to pick the storage namespace.
2. The ``mint_runtime_token`` tzinfo guard now also checks
``utcoffset() is None`` so a custom ``tzinfo`` subclass that returns
None from ``utcoffset()`` is rejected at the helper boundary
instead of raising a raw ``TypeError`` from the comparison below.
TypeScript SDK regenerated to pick up the binding-endpoint docstring
updates.
…inctly
- _load_runtime_ttl_seconds enforces a 1-day maximum on the configured
  TTL so a misconfigured value cannot mint long-lived tokens. The
  upstream-grant ceiling in mint_runtime_token only fires when the
  upstream surfaces an expiry; this cap closes the configuration gap.
- HttpUpstreamAuthProvider distinguishes 429 from the catch-all 503
  branch with a rate-limit-specific detail and a Retry-After hint, and
  names the unexpected status in the catch-all detail so operators can
  tell the two failure modes apart in logs.
galileo-automation
pushed a commit
that referenced
this pull request
May 2, 2026
## [2.5.0](ts-sdk-v2.4.0...ts-sdk-v2.5.0) (2026-05-02)
### Features
* **sdk-ts:** expose debug logger option ([66aba97](66aba97))
* **sdk:** add config driven sink selection ([#176](#176)) ([64c169f](64c169f))
* **server:** namespace scoping and control bindings ([#203](#203)) ([15ed4fd](15ed4fd))
* **server:** pluggable request-auth framework (management + runtime) ([#204](#204)) ([fae0ad3](fae0ad3)), closes [#203](#203)
### Bug Fixes
* **server:** add httpx to runtime dependencies ([#205](#205)) ([b4dff6f](b4dff6f))
🎉 This PR is included in version 2.5.0 🎉
Summary
Pluggable request-auth framework that handles both auth flows the
system needs:
- Management: the installed authorizer authenticates the credential
  and authorizes the operation; in production this is
  HttpUpstreamAuthProvider forwarding to a configurable upstream
  service.
- Runtime: the caller presents a long-lived credential plus
  (target_type, target_id) to a token exchange endpoint; the server
  mints a short-lived HS256 JWT bound to that target. Subsequent
  runtime calls verify the JWT locally, with no upstream round-trip on
  the hot path.
Both flows route through the same primitives (Operation vocabulary on
endpoints, Principal returned, RequestAuthorizer Protocol installed); a
per-operation registry lets a deployment point management ops at one
provider and runtime ops at another.
Migrates the /control-bindings endpoint family onto the framework and
ships the runtime token exchange endpoint. The runtime resolution path
itself (/evaluation etc.) is wired in a follow-up; its provider
override (LocalJwtVerifyProvider) is already in place when the runtime
secret is configured.
Module layout
auth.py (legacy local credential check) is unchanged;
HeaderAuthProvider re-uses _validate_api_key from it. Non-binding
routes still go through the legacy router-level gate; their migration
happens in follow-up PRs.
Operation vocabulary
Per-operation authorizer registry
set_authorizer(authorizer, operation=...) overrides the default for
one operation. Without operation=, it becomes the default for every
operation that does not have a specific binding. Used to route
management ops through one provider and Operation.RUNTIME_USE through
LocalJwtVerifyProvider.
require_operation(op) consults the override first, then falls back to
the default. The local-credential path (no override installed) routes
everything to HeaderAuthProvider; the no-auth flow
(api_key_enabled=False) is preserved end-to-end.
require_operation accepts an optional context_builder so the endpoint
can surface request-shaped context (path / query / body fields) to the
authorizer. The body-bearing binding endpoints, the target-filtered
list endpoint, and the runtime token exchange endpoint all forward
(target_type, target_id) so an upstream that resolves the target's
owning project has the identifiers it needs to make a project-level
decision.
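The override-then-default lookup can be sketched as a small registry. In the PR, set_authorizer appears to be a module-level function; wrapping the state in a hypothetical `AuthorizerRegistry` class, and the `resolve` and `clear` method names, are assumptions made to keep the example self-contained.

```python
from enum import Enum
from typing import Optional


class Operation(Enum):
    CONTROL_BINDINGS_READ = "control_bindings.read"
    RUNTIME_USE = "runtime.use"


class AuthorizerRegistry:
    """Holds one default authorizer plus per-operation overrides."""

    def __init__(self) -> None:
        self._default: Optional[object] = None
        self._overrides: dict[Operation, object] = {}

    def set_authorizer(self, authorizer: object,
                       operation: Optional[Operation] = None) -> None:
        # Without operation=, the authorizer becomes the default.
        if operation is None:
            self._default = authorizer
        else:
            self._overrides[operation] = authorizer

    def resolve(self, operation: Operation) -> object:
        # Override first, then the default; fail loudly if neither exists.
        authorizer = self._overrides.get(operation, self._default)
        if authorizer is None:
            raise RuntimeError("no authorizer installed")
        return authorizer

    def clear(self) -> None:
        # Mirrors the reconfigure/teardown behavior: drop everything.
        self._default = None
        self._overrides.clear()
```

A deployment would install the management provider as the default and register the JWT verifier only for `Operation.RUNTIME_USE`; clearing before reconfiguring avoids a stale override surviving a config change.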
Providers (three ship in-tree)
HeaderAuthProvider: local-credential path, single namespace.
- Maps each Operation to one of three access levels (PUBLIC,
  AUTHENTICATED, ADMIN); single source of truth in
  DEFAULT_OPERATION_ACCESS.
- Reuses the legacy local credential check from auth.py, so behavior
  matches the previous require_admin_key path verbatim.
- Grants the runtime.use scope only for
  Operation.RUNTIME_TOKEN_EXCHANGE, so the exchange endpoint can
  uniformly require runtime.use in principal.scopes across every
  provider; there is no implicit fallback that could escalate an
  upstream-supplied empty scope grant.
- The no-auth flow (api_key_enabled=False) is preserved: every
  operation succeeds with a non-admin Principal. Pinned by a
  regression test.
- V1 always returns DEFAULT_NAMESPACE_KEY. The namespace header lookup
  branch is preserved but inert until non-binding write endpoints are
  threaded.
HttpUpstreamAuthProvider: generic upstream-delegating provider.
- Forwards caller credentials (X-API-Key, Authorization, Cookie) on a
  POST to a configurable URL with {operation, context?}.
- Parses the upstream JSON response into a Principal: namespace_key,
  is_admin, caller_id, plus optional grant fields (target_type,
  target_id, scopes, expires_at) so the runtime token exchange can
  mint from the same response.
- Maps 200 to Principal; 401/403/404 to the matching error; 5xx,
  network errors, malformed payloads, naive (tzinfo-less) expires_at,
  and partial target grants (only one of target_type / target_id) all
  fail closed (502/503).
LocalJwtVerifyProvider: hot-path runtime verifier.
- Reads the Bearer token from Authorization, verifies the signature
  against the runtime secret, checks domain == "runtime", the issuer,
  expiry, and that the token's scope covers the requested Operation.
- Returns a Principal with the bound (namespace_key, target_type,
  target_id) so runtime endpoints inherit the namespace and target
  binding without re-deriving them.
- When the endpoint surfaces target_type / target_id via
  context_builder, the provider also enforces that they match the
  token's binding; runtime endpoints get the request-target check for
  free.
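The fail-closed status mapping of the upstream-delegating provider can be sketched as a pure function. This is an illustration under assumptions: `authorize_from_upstream` is a hypothetical name, only two fields are type-checked, and the separate 429 handling added later in the PR (distinct detail plus a Retry-After hint) is left out.

```python
import json
from typing import Optional, Tuple


def authorize_from_upstream(status: int, body: bytes) -> Tuple[int, Optional[dict]]:
    """Return (status to surface, parsed grant or None)."""
    if status in (401, 403, 404):
        return status, None  # passed through unchanged
    if status != 200:
        return 503, None  # 5xx and unexpected statuses fail closed
    try:
        payload = json.loads(body)
    except ValueError:
        return 502, None  # body is not valid JSON
    if not isinstance(payload, dict):
        return 502, None
    # Strict typing, no coercion: the string "false" is not a boolean.
    if not isinstance(payload.get("namespace_key"), str):
        return 502, None
    if not isinstance(payload.get("is_admin"), bool):
        return 502, None
    return 200, payload
```

The split between 502 and 503 follows the description above: 502 when the upstream answered with bad data, 503 when the upstream could not give a usable answer at all.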
Runtime token shape
HS256, dedicated secret (AGENT_CONTROL_RUNTIME_TOKEN_SECRET), issuer
agent-control/server. Claims:
| Claim | Notes |
| --- | --- |
| domain | runtime; tokens minted here MUST NOT be accepted on management endpoints. |
| namespace_key | |
| actor_id | |
| scopes | ["runtime.use"]. The exchange endpoint refuses to mint when principal.scopes does not contain runtime.use, including the case where the upstream's grant explicitly lists an empty scope set. |
| target_type / target_id | |
| iat / exp | exp is capped by the upstream grant's expires_at so the local token can never outlive its grant. |
| jti | |
mint_runtime_token rejects an upstream_expires_at whose tzinfo is None
or whose utcoffset() is None with RuntimeTokenError, so a custom
authorizer that supplies a naive datetime surfaces as a typed auth
error rather than a raw TypeError deeper in the comparison.
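To make the claim rules concrete, here is a dependency-free sketch of minting and verifying an HS256 token with the expiry cap and the timezone guard. It hand-rolls the compact JWS encoding with the standard library; the PR does not say which JWT library the server uses, and the function names `mint` and `verify` and their signatures are assumptions, not the server's mint_runtime_token API.

```python
import base64
import hashlib
import hmac
import json
import time
from datetime import datetime, timezone
from typing import Optional

ISSUER = "agent-control/server"


class RuntimeTokenError(Exception):
    pass


class UpstreamGrantExpiredError(RuntimeTokenError):
    pass


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64(digest)


def mint(secret: bytes, claims: dict, ttl_seconds: int,
         upstream_expires_at: Optional[datetime] = None,
         now: Optional[int] = None) -> str:
    issued_at = int(time.time()) if now is None else now
    exp = issued_at + ttl_seconds
    if upstream_expires_at is not None:
        # Reject naive datetimes instead of failing later with TypeError.
        if upstream_expires_at.tzinfo is None or upstream_expires_at.utcoffset() is None:
            raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
        grant_exp = int(upstream_expires_at.timestamp())
        if grant_exp <= issued_at:
            raise UpstreamGrantExpiredError("upstream grant already expired")
        exp = min(exp, grant_exp)  # the token never outlives its grant
    payload = {**claims, "domain": "runtime", "iss": ISSUER,
               "iat": issued_at, "exp": exp}
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    return f"{header}.{body}.{_sign(secret, header + '.' + body)}"


def verify(secret: bytes, token: str, now: Optional[int] = None) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise RuntimeTokenError("malformed token")
    header, body, signature = parts
    expected = _sign(secret, header + "." + body)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise RuntimeTokenError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    current = int(time.time()) if now is None else now
    if payload.get("domain") != "runtime" or payload.get("iss") != ISSUER:
        raise RuntimeTokenError("wrong domain or issuer")
    if payload.get("exp", 0) <= current:
        raise RuntimeTokenError("token expired")
    return payload
```

Because the verifier always recomputes HMAC-SHA256 itself, the `alg` header is never trusted; a production verifier would also check scopes, `namespace_key`, and the target binding as described above.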
Runtime token exchange endpoint
- Authenticates Operation.RUNTIME_TOKEN_EXCHANGE through the default
  authorizer (typically HttpUpstreamAuthProvider in production). The
  authorizer's context_builder forwards the requested target to the
  upstream so it can authorize against the right resource.
- Refuses with 503 when AGENT_CONTROL_RUNTIME_TOKEN_SECRET is not
  configured.
- Mints from Principal.scopes / Principal.grant_expires_at, capped by
  the configured TTL (default 300s).
- When the Principal carries a target binding, the endpoint verifies
  it matches the requested target before minting.
- An upstream grant whose expires_at is already in the past surfaces
  as 502 (UpstreamGrantExpiredError), distinct from the 503
  misconfigured-server path so the public status reflects which side
  the operator should investigate.
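The pre-mint checks listed above can be sketched as one guard function. The `Principal` here is a reduced stand-in, `ExchangeRefused` and `check_exchange` are hypothetical names, and the 403 used for a missing scope or a target mismatch is an assumption; only the 503 for a missing secret is stated in this PR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Principal:
    namespace_key: str
    scopes: Tuple[str, ...] = ()
    target_type: Optional[str] = None
    target_id: Optional[str] = None


class ExchangeRefused(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def check_exchange(principal: Principal, target_type: str, target_id: str,
                   secret_configured: bool) -> None:
    """Raise ExchangeRefused unless a runtime token may be minted."""
    if not secret_configured:
        raise ExchangeRefused(503, "runtime token secret is not configured")
    # No implicit fallback: an empty scope set never gains runtime.use.
    if "runtime.use" not in principal.scopes:
        raise ExchangeRefused(403, "grant does not include runtime.use")
    bound = (principal.target_type, principal.target_id)
    # Compare the binding as a pair so a half-set grant cannot slip through.
    if bound != (None, None) and bound != (target_type, target_id):
        raise ExchangeRefused(403, "grant is bound to a different target")
```

Comparing the binding as a single tuple is the design point: a per-field check would let a grant with only one field set match any value for the other.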
Response: { token, expires_at, target_type, target_id, scopes }.
Storage namespace under the framework
The migrated binding endpoints take the storage namespace_key from
get_namespace_key (the same resolver the rest of the server uses),
not from principal.namespace_key. The auth chain still runs through
require_operation for authentication and authorization, but the row's
namespace is sourced from the resolver so binding writes and runtime
reads stay in lockstep until auth-derived namespace resolution lands
across /controls, /policies, /agents, and /evaluation together. The
principal's namespace is observed (and used by LocalJwtVerifyProvider
for its own contract) but is not used to pick the row's storage
namespace at this stage.
Migrated endpoints
All seven /api/v1/control-bindings* endpoints now use
Depends(require_operation(...)):
| Method | Path | Operation | Context forwarded |
| --- | --- | --- | --- |
| PUT | /control-bindings | control_bindings.write | target_type, target_id |
| GET | /control-bindings | control_bindings.read | target_type, target_id (when present) |
| GET | /control-bindings/{binding_id} | control_bindings.read | |
| PATCH | /control-bindings/{binding_id} | control_bindings.write | |
| DELETE | /control-bindings/{binding_id} | control_bindings.write | |
| PUT | /control-bindings/by-key | control_bindings.write | target_type, target_id |
| POST | /control-bindings/by-key:delete | control_bindings.write | target_type, target_id |
The three binding-id-based routes are documented as namespace-wide:
their target identifiers are not available before the binding row is
loaded, and require_operation is single-pass. Clients whose
authorization model requires per-target permissions are steered to
the natural-key endpoints and the target-filtered list, all of which
forward the target to the authorizer. Two-phase auth on the by-id
routes is a follow-up.
New: POST /api/v1/auth/runtime-token-exchange (operation
runtime.token_exchange).
The framework-protected routers (/control-bindings, /auth) are
mounted with the existing non-validating get_api_key_from_header
Security extractor as a router-level dependency. require_operation
still owns runtime authentication and authorization; the Security
dependency exists purely so the generated OpenAPI spec advertises
X-API-Key on these routes for downstream SDK generation.
Generated client
The TypeScript wrapper exposes both auth and controlBindings getters
alongside the existing surface, so consumers using the public client
can call runtimeTokenExchange and the binding API without reaching
into the generated internals.
Env vars

| Variable | Default | Notes |
|---|---|---|
| `AGENT_CONTROL_AUTH_MODE` | `header` | `header` or `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_URL` | | Used with `http_upstream`. |
| `AGENT_CONTROL_AUTH_UPSTREAM_TIMEOUT_SECONDS` | `5.0` | |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN` | | |
| `AGENT_CONTROL_AUTH_UPSTREAM_SERVICE_TOKEN_HEADER` | `X-Agent-Control-Service-Token` | |
| `AGENT_CONTROL_RUNTIME_TOKEN_SECRET` | | |
| `AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS` | `300` | |

`configure_auth_from_env` parses both runtime fields once at startup
into a frozen `RuntimeAuthConfig`. The exchange endpoint and
`LocalJwtVerifyProvider` read the same object, so the mint and verify
sides cannot drift apart within a process. When the runtime secret is
absent, `RUNTIME_USE` falls through to the default authorizer; this
is logged at WARNING so an operator can immediately see what trust
model is in effect. `RUNTIME_USE` is reserved and not wired to
`/evaluation` in this PR, so this fallback does not affect the
runtime hot path yet. The follow-up that wires runtime endpoints
should explicitly choose legacy fallback or fail-closed JWT-only
behavior.
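As an illustration of the parse-once, frozen-config idea, the sketch below reads the two runtime variables into an immutable dataclass and rejects bad values up front. It is a sketch under assumptions, not the PR's `configure_auth_from_env`: the function name `load_runtime_auth_config`, the field names, the use of `ValueError`, and returning `None` for a missing secret are all hypothetical. The 32-byte minimum and the startup failures mirror the test plan in this description.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MIN_SECRET_BYTES = 32      # the test plan says shorter secrets raise at startup
DEFAULT_TTL_SECONDS = 300  # default listed for the TTL variable


@dataclass(frozen=True)
class RuntimeAuthConfig:
    """Immutable, so the mint and verify sides share one unchangeable view."""
    secret: bytes
    ttl_seconds: int


def load_runtime_auth_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[RuntimeAuthConfig]:
    """Parse both runtime fields once; return None when no secret is set."""
    secret = env.get("AGENT_CONTROL_RUNTIME_TOKEN_SECRET", "")
    if not secret:
        # Assumption: the caller falls back to the default authorizer
        # and logs the fallback at WARNING.
        return None
    secret_bytes = secret.encode("utf-8")
    if len(secret_bytes) < MIN_SECRET_BYTES:
        raise ValueError("runtime token secret must be at least 32 bytes")

    raw_ttl = env.get("AGENT_CONTROL_RUNTIME_TOKEN_TTL_SECONDS", str(DEFAULT_TTL_SECONDS))
    try:
        ttl = int(raw_ttl)
    except ValueError:
        raise ValueError(f"invalid runtime token TTL: {raw_ttl!r}") from None
    if ttl <= 0:
        raise ValueError(f"invalid runtime token TTL: {raw_ttl!r}")

    return RuntimeAuthConfig(secret=secret_bytes, ttl_seconds=ttl)
```

Handing the same `RuntimeAuthConfig` instance to both the exchange endpoint and the verifier, rather than letting each read the environment, is what rules out drift between the two sides.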
Out of scope (follow-ups)

- Migrate `/controls` CRUD onto `require_operation` using the
  reserved `CONTROLS_*` operations.
- Wire `Operation.RUNTIME_USE` on the runtime resolution path
  (`/evaluation`, etc.) and the SDK side of the runtime exchange.
  The provider override is already in place when the runtime secret
  is configured.
- Migrate `/agents/initAgent` onto `require_operation`. The
  `HttpUpstreamAuthProvider`'s `context_builder` should forward the
  request's `target_type`/`target_id` to the upstream so the
  upstream can authorize against the requested resource.
- `get_namespace_key` so the binding endpoints can use
  the principal's namespace for storage along with the rest of the
  server.
- Two-phase auth on the by-id routes (GET/PATCH/DELETE
  `/control-bindings/{binding_id}`) so they can
  forward target context to the upstream.
- `auth.py`'s `require_admin_key` once every management
  endpoint is migrated.
Stacking

Stacked on PR #203 (`abhi/data-model-v1`); rebased onto its
current head `8adc328` so the merged effective-controls contract,
namespace-threaded agent endpoints, and savepoint-protected binding
writes are the base this PR builds on. Will rebase onto `main` once
#203 merges.
Test plan

- Every `Operation` member has a default access
  mapping (regression guard).
- `HeaderAuthProvider`: PUBLIC bypass, AUTHENTICATED + ADMIN paths
  route to the legacy validator with the right `require_admin`
  flag, no-auth mode passes admin operations, namespace-header
  lookup currently inert, unknown operation raises, normalized
  `runtime.use` scope returned for `RUNTIME_TOKEN_EXCHANGE`.
- `HttpUpstreamAuthProvider`: 200 happy path with realistic JSON
  wire shapes (ISO datetime + JSON array scopes round-trip),
  service token forwarding, 401/403/404 mapping, 5xx fail-closed,
  network-error fail-closed, strict-grant rejection on wrong-typed
  `is_admin` / malformed `scopes` / bad `expires_at` / non-string
  target fields, partial target grant rejected, naive `expires_at`
  rejected.
- `require_operation` factory: routes through the installed
  authorizer, per-operation overrides take precedence, clearing an
  override falls back to the default, `get_authorizer` raises
  when nothing is set.
- previous `LocalJwtVerifyProvider` override; teardown clears
  every authorizer; secret shorter than 32 bytes raises at
  startup; invalid TTL raises at startup.
- expiry rejection, TTL capped by upstream grant, management-domain
  token refused on runtime verify, missing-namespace rejection,
  already-expired upstream grant raises `UpstreamGrantExpiredError`,
  naive `upstream_expires_at` raises `RuntimeTokenError`.
- `LocalJwtVerifyProvider`: target-bound `Principal`, namespace
  carried from token, missing token returns 401, wrong scope
  returns 403, non-Bearer header returns 401, target-context match
  enforcement (mismatch on type or id returns 403).
- target mismatch rejected (400), missing target rejected (422),
  grant-without-runtime-use rejected (no privilege escalation),
  explicit empty-scope grant rejected (no fallback escalation),
  target context forwarded to authorizer, non-default namespace
  propagates into the token, full exchange-then-verify round trip,
  already-expired upstream grant surfaces as 502 distinct from the
  503 misconfigured-server path.
- `make lint` clean.
- `make typecheck` clean.
- `make sdk-ts-generate-check` clean.
- (`auth-runtime-token-exchange`, request/response models, `Auth`
  and `ControlBindings` groups exposed via the public client).
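The expiry cases in the test plan (TTL capped by the upstream grant, already-expired grant, naive `upstream_expires_at`) can be sketched as a single pure function. This is an illustrative sketch, not the PR's minting code: the function name `token_expiry`, allowing `upstream_expires_at` to be `None`, and making `UpstreamGrantExpiredError` a subclass of `RuntimeTokenError` are assumptions; only the two exception names come from this description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class RuntimeTokenError(Exception):
    """Name from the test plan; the real class may carry more detail."""


class UpstreamGrantExpiredError(RuntimeTokenError):
    """Name from the test plan; the subclass relationship is an assumption."""


def token_expiry(
    now: datetime,
    ttl_seconds: int,
    upstream_expires_at: Optional[datetime],
) -> datetime:
    """Return the expiry for a minted token, never outliving the upstream grant.

    `now` is expected to be timezone-aware.
    """
    local_expiry = now + timedelta(seconds=ttl_seconds)
    if upstream_expires_at is None:
        # Assumption: a grant without an expiry imposes no cap.
        return local_expiry
    if upstream_expires_at.utcoffset() is None:
        # A naive datetime cannot be compared safely against an aware one.
        raise RuntimeTokenError("upstream_expires_at must be timezone-aware")
    if upstream_expires_at <= now:
        raise UpstreamGrantExpiredError("upstream grant already expired")
    # Cap the configured TTL at the upstream grant's lifetime.
    return min(local_expiry, upstream_expires_at)
```

Taking `now` as a parameter instead of reading the clock inside the function keeps the cap, expired, and naive cases deterministic to test.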