feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139)#48
Conversation
Greptile SummaryThis PR wires up scoped-token authentication for
Confidence Score: 4/5Safe to merge for AWS-based platform deployments; the hardcoded The core token-exchange flow, scope enforcement on the OCI push path, and OSS backward compatibility are all correctly implemented and well-tested. The only open question is whether crates/alien-cli/src/commands/commands.rs — the hardcoded
|
| Filename | Overview |
|---|---|
| crates/alien-cli/src/auth.rs | Adds mint_registry_push_token, get_or_mint_registry_push_token, token_matches_target, store_manager_token, load_manager_token, plus a MANAGER_TOKEN_USER keyring slot; token caching logic and expiry handling are correct; comprehensive unit tests for matching/mismatch/malformed cases. |
| crates/alien-cli/src/commands/commands.rs | Unblocks alien commands invoke in platform mode by resolving the manager before the SDK client is created; resolve_manager is called with a hardcoded "aws" platform string which may not hold for non-AWS projects. |
| crates/alien-cli/src/execution_context.rs | Mints the manager-scoped JWT before artifact-repo setup and OCI push; adds manager_id to ResolveResponse; error messages now surface manager response bodies; flow is clean. |
| crates/alien-manager/src/auth/subject.rs | Adds scopes: Vec<String> field with correct Default semantics; all existing OSS validators explicitly set scopes: Vec::new() preserving backward compatibility. |
| crates/alien-manager/src/routes/registry_proxy.rs | Adds scopes_cover_project helper and integrates it into require_push_auth; empty-scopes passthrough preserves OSS behavior; leading-slash guard prevents substring collisions; 5 unit tests cover all boundary cases. |
| crates/alien-manager/tests/registry_proxy_test.rs | Updates test_subject() with the new scopes field; integration tests still use Vec::new() (OSS path); the security-critical scoped-rejection path is covered by scopes_cover_project unit tests in the production module. |
Sequence Diagram
sequenceDiagram
participant CLI
participant PlatformAPI as Platform API
participant Keyring
participant Manager
CLI->>PlatformAPI: "GET /v1/resolve?platform=X&project=Y&workspace=Z"
PlatformAPI-->>CLI: manager_url, manager_id, project_id
CLI->>Keyring: load cached manager token
Keyring-->>CLI: cached token or none
alt cached token is valid and matches manager+project
CLI->>CLI: reuse cached token
else expired or wrong target
CLI->>PlatformAPI: POST /v1/managers/ID/token (purpose: registry-push, project)
PlatformAPI-->>CLI: scoped manager token
CLI->>Keyring: store token (best-effort)
end
CLI->>Manager: GET /v1/artifact-registry [Bearer scoped token]
Manager-->>CLI: repo info
CLI->>Manager: POST /v2/.../blobs/uploads [Bearer scoped token]
Manager->>Manager: role check via can_push_image
Manager->>Manager: scope check via scopes_cover_project
Manager-->>CLI: OCI push accepted
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
crates/alien-cli/src/commands/commands.rs:84-85
**Hardcoded `"aws"` platform on manager resolution**
`resolve_manager` is called with a hardcoded `"aws"` value, which is used verbatim in the `/v1/resolve?platform=aws&...` query. For a project hosted on GCP (or any other non-AWS provider), the platform API would receive an incorrect hint, potentially routing to the wrong manager or returning an error. The `release` flow avoids this by reading the platform from the project's `platforms` config (`&platforms[0]`). Note that `deployments.rs:280` has the same hardcode, so this is a pre-existing pattern — but it is worth confirming whether the platform API's `/v1/resolve` endpoint silently ignores the `platform` query param for commands/deployments operations, or whether this would fail for GCP projects.
Reviews (5): Last reviewed commit: "fix(cli): fail loud on success-status re..." | Re-trigger Greptile
…rites Two small fixes addressing Greptile review on PR #48, both validated against the codebase's established patterns and `alien/CLAUDE.md`: `mint_registry_push_token`: URL-encode `manager_id` when interpolating it into the request path. Every other path-interpolation in the crate already does this — see `execution_context.rs::resolve_url` (platform, project, workspace), the gcp clients (service-account names, GCS object names), and the aws clients. Manager IDs are ULIDs today and contain no URL-special characters, but encoding here keeps the call site correct-by-construction rather than correct-by-coincidence — a future change to the ID format won't silently mis-route the request. `get_or_mint_registry_push_token`: treat the keyring cache write as best-effort. The mint already succeeded and the caller has a valid token in hand; a failed `set_password` just means we'll re-mint on the next invocation. The previous `?` propagation meant a transient keyring error would abort the whole release even though everything needed for it to succeed was in memory. This matches the worked example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate": > "Use `warn!` only for non-critical issues that don't affect > correctness... Core operation continues and will work correctly, > just slower" No behaviour change in the success path. No new failure modes on the error path beyond what's already there.
Adds `pub scopes: Vec<String>` to `Subject` so per-project authorization tuples (format `"<workspace_id>/<project_id>"`) can flow from a token validator that knows about them down to the OCI push handler. Default- empty so the field is invisible to validators that don't populate it (`TokenDbValidator`, `PermissiveAuthValidator`) — OSS standalone behaviour is preserved bit-for-bit. `require_push_auth` in the registry proxy gains a scope check: after the existing role-based `can_push_image` gate passes, if `subject.scopes` is non-empty, at least one entry must end with `/<project_id>` (suffix-match rather than full tuple — project IDs are globally-unique ULIDs, and the caller's scope carries their workspace_id while `subject.workspace_id` can be a different workspace's id when the call is routed through a shared manager). Empty `subject.scopes` short-circuits the check, again preserving OSS behaviour. `upstream_repository_prefix()` on `LocalArtifactRegistry` returned a hardcoded `"artifacts/default"` — that was always wrong but masked because the proxy routing table never had to extract a real project_id when nothing did per-project authz. With the scope check above, a wrong prefix breaks routing in a way that's now observable. Return `self.binding_name.clone()` (i.e. just `"artifacts"`) so the prefix matches the actual on-disk layout and the routing table extracts the real project_id from `artifacts/<project_id>/...`. Root-cause fix. Test fixtures: every Subject literal in oss_authz / permissive_auth / token_db_validator and the three tests/* files gets `scopes: Vec::new()` — pure field-threading, no behaviour change.
The manager only validates a small set of credential types — long-lived
API keys and platform-issued RS256 tokens. The user OAuth JWT from
`alien login` isn't one of them, so the CLI's release flow was
401-ing at the first manager call. Fix: at the top of the platform-mode
release flow, exchange the user OAuth JWT for a short-lived, project-
and-manager-scoped token via the platform's token endpoint, and use
that for every subsequent manager call.
`mint_registry_push_token` hits `POST {base}/v1/managers/{manager_id}/token`
with body `{ purpose: "registry-push", project: <id> }` and the user
OAuth bearer. The returned token's scopes/managerId are checked locally
on every reuse (`token_matches_target` — globally-unique ULIDs let the
project_id check be suffix-only), and a new keyring entry
(`MANAGER_TOKEN_USER`) holds the cached token separately from
access_token/refresh_token so `logout` cleans both.
`get_or_mint_registry_push_token` is the entry point: serve from cache
if it's a hit for this target and not within 30s of expiry, otherwise
mint fresh and store. Mint failures propagate as `AuthenticationFailed`
with the response status + body so platform-side rejection reasons reach
the user.
In `execution_context.rs`, the platform-mode branch mints once
immediately after `resolve` and uses the resulting token both for the
`/v1/artifact-registry/*` repo setup and for the `/v2/...` OCI push
(both legs of the release flow). `ResolveResponse` now carries
`manager_id` so the mint knows which manager the user is targeting.
Future platform-mode commands (commands invoke, etc.) should mint their
own audience rather than reuse this one — keeps the audience name
accurate to scope and prevents a leak of one credential from acting at
unintended endpoints.
The PR added two security-critical predicates that were exercised only by code paths the test suite didn't have a way to reach: the manager's push handler skipped the scope check whenever `subject.scopes` was empty (which every test subject was), and the CLI's `token_matches_target` cache-reuse check was never run on a constructed JWT. A regression in either branch — an off-by-one in the suffix match, the scope list flipped to a different shape, the project_id lookup returning the wrong value — would have shipped silently. Extract the scope-check predicate from `require_push_auth` into a pure `scopes_cover_project(scopes, project_id)` helper so it's unit-testable without standing up a 17-field `AppState`. The five new tests cover: empty-scopes fall-through (OSS standalone preserved), matching scope, non-matching scope, one-of-many matching, and the suffix-substring confusion case (`prj_abc` must not accept a scope ending `prj_abcd`). Add `token_matches_target` tests in the CLI: matching, manager-id mismatch, project-id mismatch, missing claim, missing manager-id claim, malformed JWT, and scopes claim of the wrong type. All failure modes must return false so the caller re-mints rather than presenting an unusable token to the manager. No behaviour change in either function.
…rites Two small fixes addressing Greptile review on PR #48, both validated against the codebase's established patterns and `alien/CLAUDE.md`: `mint_registry_push_token`: URL-encode `manager_id` when interpolating it into the request path. Every other path-interpolation in the crate already does this — see `execution_context.rs::resolve_url` (platform, project, workspace), the gcp clients (service-account names, GCS object names), and the aws clients. Manager IDs are ULIDs today and contain no URL-special characters, but encoding here keeps the call site correct-by-construction rather than correct-by-coincidence — a future change to the ID format won't silently mis-route the request. `get_or_mint_registry_push_token`: treat the keyring cache write as best-effort. The mint already succeeded and the caller has a valid token in hand; a failed `set_password` just means we'll re-mint on the next invocation. The previous `?` propagation meant a transient keyring error would abort the whole release even though everything needed for it to succeed was in memory. This matches the worked example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate": > "Use `warn!` only for non-critical issues that don't affect > correctness... Core operation continues and will work correctly, > just slower" No behaviour change in the success path. No new failure modes on the error path beyond what's already there.
…n platform mode
Two CLI fixes that share an execution_context.rs touch surface but are
otherwise independent.
execution_context.rs::create_or_get_artifact_repo
All three response objects (initial GET, POST, retry GET) were
consumed by `.json()` in the success branch only; on non-success the
body was dropped. When the manager responded with an authentication
failure or another structured error, the user saw "Failed to create
artifact repository '...' for platform '...' on manager (HTTP 401)"
with no further detail — the actual reason (e.g. "Platform JWT not
bound to this manager") lived in the response body that we never
read.
Read each response's body once via `.text().await.unwrap_or_default()`
immediately after capturing the status; parse it as JSON in the
success branch, include it verbatim in the error message on failure.
Pattern matches `auth.rs::mint_registry_push_token:547` for the same
reason. `.unwrap_or_default()` keeps behaviour consistent when the
body itself errors — we still get the status, just no detail.
commands/commands.rs::commands_task
`server_sdk_client()`'s Platform-mode arm returns
`Err("server_sdk_client() is not available in platform mode. Use
sdk_client() instead.")` because the manager URL + JWT aren't known
until the project is resolved. The advice in the error message is
wrong — `sdk_client()` returns the platform-API client, not the
manager client we actually need to invoke commands.
Branch in commands_task: for Platform mode, resolve the project then
call `ctx.resolve_manager(...)` to get a ManagerContext whose
`client` is authenticated with the minted manager JWT and whose
`auth_token` carries that same JWT for the separate CommandsClient
call later in `invoke_command`. Standalone/Dev modes keep using
`ctx.server_sdk_client()` + `ctx.auth_token()` — those carry the
manager URL + API key statically on ExecutionMode and don't need
per-project resolution.
Unblocks step 5 of the onboarding quickstart
(`alien commands invoke --deployment X --command Y`) in Platform
mode without changing any of the protocol shapes.
Greptile flagged my earlier A5 refactor of `create_or_get_artifact_repo` for swapping the canonical `.json::<T>().await.into_alien_error().context(...)?` pattern (used by the same file's `resolve_url` block at line 378-381 and by `auth.rs:541`) for a bespoke text-first + best-effort parse shape. The new shape silently fell through to the next step on 2xx-status with an unparseable body — a reverse-proxy or WAF returning an HTML 200 would hide the real diagnostic behind a confusing downstream error. `alien/CLAUDE.md`'s "Fail Fast, Don't Fall Back Silently" section explicitly calls out this pattern as a bad pattern; my refactor was the textbook anti-example. Restore fail-loud semantics on all three response-parse sites (`get_resp`, `create_resp`, `get_resp2`) using `serde_json::from_str(&body).into_alien_error().context(...)?` instead of `if let Ok(body) = ...`. The text-first read pattern stays — it's load-bearing for the final error message at the bottom of the function which includes `create_body` for the manager's structured rejection reason. Valid-JSON-without-`name`-field still falls through (intentional: that's the manager's "no such repo" shape and the next step is the correct response). Pre-existing code (mint helpers, auth flow, scope check) untouched — only the 3 response-parse blocks I introduced in the A5 commit.
9ce9fc4 to
1177047
Compare
|
Superseded by #49 — same code, clean history (no force-push trail, no orphan commits with OSS-boundary leaks that earlier rounds introduced). |
Summary
Two OSS-side changes that together let
alien releasework against a manager when the caller is authenticated with an OAuth bearer, plus follow-up CLI hardening:scopesfield onSubjectand enforces it on the OCI push path — a token whose scopes don't cover the target project's ID is rejected at/v2/.../blobs/uploads/.alien commands invokein platform mode (previously errored out before reaching the manager).OSS standalone behaviour is preserved bit-for-bit: validators that don't populate
scopesleave it empty, and the new scope check no-ops on empty scopes — so existing single-tenant deployments still work without any token-format changes.Commits
feat(manager): per-project scope on Subject, enforced on OCI pushSubject.scopes: Vec<String>field;require_push_authsuffix-matches/<project_id>against the JWT's scope list when non-empty. Also fixesLocalArtifactRegistry::upstream_repository_prefix()which was returning a hardcoded"artifacts/default"instead of the binding name — that was masked before because no per-project authz extracted the real project_id from the routing table.feat(cli): exchange OAuth JWT for manager-scoped token on releasemint_registry_push_token+get_or_mint_registry_push_tokenincrates/alien-cli/src/auth.rs. Cached in keyring underMANAGER_TOKEN_USER(separate slot from the OAuth tokens sologoutcleans both).execution_context.rsmints once at the top of the platform-mode release flow and reuses the token for/v1/artifact-registry/*repo setup +/v2/...OCI push.test: cover the per-project scope check on the OCI push pathscopes_cover_projectfromrequire_push_authso the security-critical predicate is unit-testable without standing up a 17-fieldAppState. 5 tests cover empty/matching/non-matching/multi-scope/leading-slash-substring-guard cases, plus 7 tests fortoken_matches_targetcovering match, manager mismatch, project mismatch, missing claim, malformed JWT, etc.refactor(cli): match codebase conventions on URL encoding and cache writesurlencoding::encode(manager_id)in the mint URL — matches the established pattern inexecution_context.rs::resolve_urland the gcp/aws clients. (2)store_manager_tokenis now best-effort (warn!on keyring write failure instead of aborting the release) — matches the worked example inalien/CLAUDE.mdunder "Whenwarn!Is Appropriate".fix(cli): surface manager response body and unblock commands invoke in platform mode✨ NEWcreate_or_get_artifact_reporeads each response's body once via.text().unwrap_or_default(); surfaces the manager's structured error in the finalFailed to create artifact repositorymessage instead of dropping it. (2)commands_taskbranches: in platform mode, resolves the project + manager to get an authenticated client + bearer (previouslyserver_sdk_client()'s platform arm errored out, blocking step 5 of onboarding); standalone/dev keep the existing path.Verification
cargo test -p alien-manager— 108 tests passing across 6 binaries (was 103 before the scope-check predicate tests landed)cargo test -p alien-manager --test registry_proxy_test— 7 tests including push-auth, all greencargo test -p alien-cli --features platform --lib oauth_flow::tests— 7 tests fortoken_matches_target, all greencargo build -p alien-cli --features platform— cleanalien release --experimentalfrom a worker-template project succeeds and produces a release IDBackward compatibility
Subject.scopesleave it empty, and the new scope check is gated on!subject.scopes.is_empty().#[cfg(feature = "platform")]ExecutionMode::Platform. Standalone CLI use (ALIEN_MANAGER_URL=...) is unchanged.commands_taskis gated under#[cfg(feature = "platform")]; non-platform builds still go throughserver_sdk_client()directly.Out of scope (separate tickets)
Subject.scopes: Vec<String>is loosely typed — format"<workspace_id>/<project_id>"enforced only by doc comment. A typedpermits_project_push(project_id)helper onSubjectwould kill the open-codedends_withmatching at call sites.alien commands invokeflow currently re-uses the registry-push token. A separate, more-specific audience would match the operation's intent better — track in a follow-up using the documentation skill on the platform side.Test plan
cargo test -p alien-managerand confirms 108/108.cargo build -p alien-cli --features platformand confirms clean.Subject.scopesdefault-empty behaviour preserves OSS standalone: grep forSubject {literals in OSS validators showsscopes: Vec::new()everywhere.