feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139) by ItamarZand88 · Pull Request #48 · alienplatform/alien

ItamarZand88 · 2026-05-23T22:08:15Z

Summary

Two OSS-side changes that together let alien release work against a manager when the caller is authenticated with an OAuth bearer, plus follow-up CLI hardening:

CLI exchanges the user OAuth JWT for a short-lived, project-and-manager-scoped token before talking to any manager endpoint, and uses that token for the artifact-registry setup + OCI push.
alien-manager gains a per-project scopes field on Subject and enforces it on the OCI push path — a token whose scopes don't cover the target project's ID is rejected at /v2/.../blobs/uploads/.
CLI follow-ups (a) surface the manager's response body in error messages so the user sees the actual reason (auth failure, scope mismatch, etc.) instead of just an HTTP status, and (b) unblock alien commands invoke in platform mode (previously errored out before reaching the manager).

OSS standalone behaviour is preserved bit-for-bit: validators that don't populate scopes leave it empty, and the new scope check no-ops on empty scopes — so existing single-tenant deployments still work without any token-format changes.

Commits


`feat(manager): per-project scope on Subject, enforced on OCI push`	`Subject.scopes: Vec<String>` field; `require_push_auth` suffix-matches `/<project_id>` against the JWT's scope list when non-empty. Also fixes `LocalArtifactRegistry::upstream_repository_prefix()` which was returning a hardcoded `"artifacts/default"` instead of the binding name — that was masked before because no per-project authz extracted the real project_id from the routing table.
`feat(cli): exchange OAuth JWT for manager-scoped token on release`	`mint_registry_push_token` + `get_or_mint_registry_push_token` in `crates/alien-cli/src/auth.rs`. Cached in keyring under `MANAGER_TOKEN_USER` (separate slot from the OAuth tokens so `logout` cleans both). `execution_context.rs` mints once at the top of the platform-mode release flow and reuses the token for `/v1/artifact-registry/*` repo setup + `/v2/...` OCI push.
`test: cover the per-project scope check on the OCI push path`	Extracts `scopes_cover_project` from `require_push_auth` so the security-critical predicate is unit-testable without standing up a 17-field `AppState`. 5 tests cover empty/matching/non-matching/multi-scope/leading-slash-substring-guard cases, plus 7 tests for `token_matches_target` covering match, manager mismatch, project mismatch, missing claim, malformed JWT, etc.
`refactor(cli): match codebase conventions on URL encoding and cache writes`	(1) `urlencoding::encode(manager_id)` in the mint URL — matches the established pattern in `execution_context.rs::resolve_url` and the gcp/aws clients. (2) `store_manager_token` is now best-effort (`warn!` on keyring write failure instead of aborting the release) — matches the worked example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate".
`fix(cli): surface manager response body and unblock commands invoke in platform mode` ✨ NEW	(1) `create_or_get_artifact_repo` reads each response's body once via `.text().unwrap_or_default()`; surfaces the manager's structured error in the final `Failed to create artifact repository` message instead of dropping it. (2) `commands_task` branches: in platform mode, resolves the project + manager to get an authenticated client + bearer (previously `server_sdk_client()`'s platform arm errored out, blocking step 5 of onboarding); standalone/dev keep the existing path.

Verification

cargo test -p alien-manager — 108 tests passing across 6 binaries (was 103 before the scope-check predicate tests landed)
cargo test -p alien-manager --test registry_proxy_test — 7 tests including push-auth, all green
cargo test -p alien-cli --features platform --lib oauth_flow::tests — 7 tests for token_matches_target, all green
cargo build -p alien-cli --features platform — clean
Live end-to-end: alien release --experimental from a worker-template project succeeds and produces a release ID

Backward compatibility

OSS standalone managers run unchanged. Validators that don't set Subject.scopes leave it empty, and the new scope check is gated on !subject.scopes.is_empty().
The CLI's token-mint flow only activates in #[cfg(feature = "platform")] ExecutionMode::Platform. Standalone CLI use (ALIEN_MANAGER_URL=...) is unchanged.
The platform-mode branch in commands_task is gated under #[cfg(feature = "platform")]; non-platform builds still go through server_sdk_client() directly.

Out of scope (separate tickets)

Subject.scopes: Vec<String> is loosely typed — format "<workspace_id>/<project_id>" enforced only by doc comment. A typed permits_project_push(project_id) helper on Subject would kill the open-coded ends_with matching at call sites.
The platform-mode alien commands invoke flow currently re-uses the registry-push token. A separate, more-specific audience would match the operation's intent better — track in a follow-up using the documentation skill on the platform side.

Test plan

Reviewer runs cargo test -p alien-manager and confirms 108/108.
Reviewer runs cargo build -p alien-cli --features platform and confirms clean.
Reviewer confirms Subject.scopes default-empty behaviour preserves OSS standalone: grep for Subject { literals in OSS validators shows scopes: Vec::new() everywhere.

greptile-apps · 2026-05-23T22:13:37Z

Greptile Summary

This PR wires up scoped-token authentication for alien release in platform mode: the CLI now exchanges its OAuth JWT for a short-lived, manager-and-project-scoped JWT before talking to any manager endpoint, and the manager enforces that JWT's scopes list against the target project on the OCI push path.

CLI (auth.rs, execution_context.rs): adds mint_registry_push_token / get_or_mint_registry_push_token with keyring caching and expiry-aware cache reuse; create_or_get_artifact_repo now surfaces manager response bodies in error messages; commands_task gains a platform-mode branch to resolve the manager before constructing the SDK client.
Manager (registry_proxy.rs, subject.rs): adds Subject.scopes: Vec<String> and a scopes_cover_project helper; require_push_auth rejects pushes whose token scope doesn't cover the target project; OSS validators set scopes: Vec::new() and the check no-ops on empty scopes, preserving backward compatibility.
Tests: 5 unit tests for scopes_cover_project and 7 for token_matches_target cover all boundary cases including substring-suffix collisions and malformed JWTs.

Confidence Score: 4/5

Safe to merge for AWS-based platform deployments; the hardcoded "aws" platform string in the new commands invoke platform-mode path should be verified against non-AWS deployments before considering those use cases fully supported.

The core token-exchange flow, scope enforcement on the OCI push path, and OSS backward compatibility are all correctly implemented and well-tested. The only open question is whether resolve_manager(..., "aws") in commands_task behaves correctly for GCP or other projects — the same pattern exists in deployments.rs so it may be intentional, but it limits the commands invoke path to AWS in the same way deployments ls is already limited.

crates/alien-cli/src/commands/commands.rs — the hardcoded "aws" platform argument on the new resolve_manager call.

Important Files Changed

Filename	Overview
crates/alien-cli/src/auth.rs	Adds `mint_registry_push_token`, `get_or_mint_registry_push_token`, `token_matches_target`, `store_manager_token`, `load_manager_token`, plus a `MANAGER_TOKEN_USER` keyring slot; token caching logic and expiry handling are correct; comprehensive unit tests for matching/mismatch/malformed cases.
crates/alien-cli/src/commands/commands.rs	Unblocks `alien commands invoke` in platform mode by resolving the manager before the SDK client is created; `resolve_manager` is called with a hardcoded `"aws"` platform string which may not hold for non-AWS projects.
crates/alien-cli/src/execution_context.rs	Mints the manager-scoped JWT before artifact-repo setup and OCI push; adds `manager_id` to `ResolveResponse`; error messages now surface manager response bodies; flow is clean.
crates/alien-manager/src/auth/subject.rs	Adds `scopes: Vec<String>` field with correct `Default` semantics; all existing OSS validators explicitly set `scopes: Vec::new()` preserving backward compatibility.
crates/alien-manager/src/routes/registry_proxy.rs	Adds `scopes_cover_project` helper and integrates it into `require_push_auth`; empty-scopes passthrough preserves OSS behavior; leading-slash guard prevents substring collisions; 5 unit tests cover all boundary cases.
crates/alien-manager/tests/registry_proxy_test.rs	Updates `test_subject()` with the new `scopes` field; integration tests still use `Vec::new()` (OSS path); the security-critical scoped-rejection path is covered by `scopes_cover_project` unit tests in the production module.

Sequence Diagram

sequenceDiagram
    participant CLI
    participant PlatformAPI as Platform API
    participant Keyring
    participant Manager

    CLI->>PlatformAPI: "GET /v1/resolve?platform=X&project=Y&workspace=Z"
    PlatformAPI-->>CLI: manager_url, manager_id, project_id

    CLI->>Keyring: load cached manager token
    Keyring-->>CLI: cached token or none

    alt cached token is valid and matches manager+project
        CLI->>CLI: reuse cached token
    else expired or wrong target
        CLI->>PlatformAPI: POST /v1/managers/ID/token (purpose: registry-push, project)
        PlatformAPI-->>CLI: scoped manager token
        CLI->>Keyring: store token (best-effort)
    end

    CLI->>Manager: GET /v1/artifact-registry [Bearer scoped token]
    Manager-->>CLI: repo info

    CLI->>Manager: POST /v2/.../blobs/uploads [Bearer scoped token]
    Manager->>Manager: role check via can_push_image
    Manager->>Manager: scope check via scopes_cover_project
    Manager-->>CLI: OCI push accepted

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/alien-cli/src/commands/commands.rs:84-85
**Hardcoded `"aws"` platform on manager resolution**

`resolve_manager` is called with a hardcoded `"aws"` value, which is used verbatim in the `/v1/resolve?platform=aws&...` query. For a project hosted on GCP (or any other non-AWS provider), the platform API would receive an incorrect hint, potentially routing to the wrong manager or returning an error. The `release` flow avoids this by reading the platform from the project's `platforms` config (`&platforms[0]`). Note that `deployments.rs:280` has the same hardcode, so this is a pre-existing pattern — but it is worth confirming whether the platform API's `/v1/resolve` endpoint silently ignores the `platform` query param for commands/deployments operations, or whether this would fail for GCP projects.

_{Reviews (5): Last reviewed commit: "fix(cli): fail loud on success-status re..." | Re-trigger Greptile}

…rites Two small fixes addressing Greptile review on PR #48, both validated against the codebase's established patterns and `alien/CLAUDE.md`: `mint_registry_push_token`: URL-encode `manager_id` when interpolating it into the request path. Every other path-interpolation in the crate already does this — see `execution_context.rs::resolve_url` (platform, project, workspace), the gcp clients (service-account names, GCS object names), and the aws clients. Manager IDs are ULIDs today and contain no URL-special characters, but encoding here keeps the call site correct-by-construction rather than correct-by-coincidence — a future change to the ID format won't silently mis-route the request. `get_or_mint_registry_push_token`: treat the keyring cache write as best-effort. The mint already succeeded and the caller has a valid token in hand; a failed `set_password` just means we'll re-mint on the next invocation. The previous `?` propagation meant a transient keyring error would abort the whole release even though everything needed for it to succeed was in memory. This matches the worked example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate": > "Use `warn!` only for non-critical issues that don't affect > correctness... Core operation continues and will work correctly, > just slower" No behaviour change in the success path. No new failure modes on the error path beyond what's already there.

Adds `pub scopes: Vec<String>` to `Subject` so per-project authorization tuples (format `"<workspace_id>/<project_id>"`) can flow from a token validator that knows about them down to the OCI push handler. Default- empty so the field is invisible to validators that don't populate it (`TokenDbValidator`, `PermissiveAuthValidator`) — OSS standalone behaviour is preserved bit-for-bit. `require_push_auth` in the registry proxy gains a scope check: after the existing role-based `can_push_image` gate passes, if `subject.scopes` is non-empty, at least one entry must end with `/<project_id>` (suffix-match rather than full tuple — project IDs are globally-unique ULIDs, and the caller's scope carries their workspace_id while `subject.workspace_id` can be a different workspace's id when the call is routed through a shared manager). Empty `subject.scopes` short-circuits the check, again preserving OSS behaviour. `upstream_repository_prefix()` on `LocalArtifactRegistry` returned a hardcoded `"artifacts/default"` — that was always wrong but masked because the proxy routing table never had to extract a real project_id when nothing did per-project authz. With the scope check above, a wrong prefix breaks routing in a way that's now observable. Return `self.binding_name.clone()` (i.e. just `"artifacts"`) so the prefix matches the actual on-disk layout and the routing table extracts the real project_id from `artifacts/<project_id>/...`. Root-cause fix. Test fixtures: every Subject literal in oss_authz / permissive_auth / token_db_validator and the three tests/* files gets `scopes: Vec::new()` — pure field-threading, no behaviour change.

The manager only validates a small set of credential types — long-lived API keys and platform-issued RS256 tokens. The user OAuth JWT from `alien login` isn't one of them, so the CLI's release flow was 401-ing at the first manager call. Fix: at the top of the platform-mode release flow, exchange the user OAuth JWT for a short-lived, project- and-manager-scoped token via the platform's token endpoint, and use that for every subsequent manager call. `mint_registry_push_token` hits `POST {base}/v1/managers/{manager_id}/token` with body `{ purpose: "registry-push", project: <id> }` and the user OAuth bearer. The returned token's scopes/managerId are checked locally on every reuse (`token_matches_target` — globally-unique ULIDs let the project_id check be suffix-only), and a new keyring entry (`MANAGER_TOKEN_USER`) holds the cached token separately from access_token/refresh_token so `logout` cleans both. `get_or_mint_registry_push_token` is the entry point: serve from cache if it's a hit for this target and not within 30s of expiry, otherwise mint fresh and store. Mint failures propagate as `AuthenticationFailed` with the response status + body so platform-side rejection reasons reach the user. In `execution_context.rs`, the platform-mode branch mints once immediately after `resolve` and uses the resulting token both for the `/v1/artifact-registry/*` repo setup and for the `/v2/...` OCI push (both legs of the release flow). `ResolveResponse` now carries `manager_id` so the mint knows which manager the user is targeting. Future platform-mode commands (commands invoke, etc.) should mint their own audience rather than reuse this one — keeps the audience name accurate to scope and prevents a leak of one credential from acting at unintended endpoints.

The PR added two security-critical predicates that were exercised only by code paths the test suite didn't have a way to reach: the manager's push handler skipped the scope check whenever `subject.scopes` was empty (which every test subject was), and the CLI's `token_matches_target` cache-reuse check was never run on a constructed JWT. A regression in either branch — an off-by-one in the suffix match, the scope list flipped to a different shape, the project_id lookup returning the wrong value — would have shipped silently. Extract the scope-check predicate from `require_push_auth` into a pure `scopes_cover_project(scopes, project_id)` helper so it's unit-testable without standing up a 17-field `AppState`. The five new tests cover: empty-scopes fall-through (OSS standalone preserved), matching scope, non-matching scope, one-of-many matching, and the suffix-substring confusion case (`prj_abc` must not accept a scope ending `prj_abcd`). Add `token_matches_target` tests in the CLI: matching, manager-id mismatch, project-id mismatch, missing claim, missing manager-id claim, malformed JWT, and scopes claim of the wrong type. All failure modes must return false so the caller re-mints rather than presenting an unusable token to the manager. No behaviour change in either function.

…rites Two small fixes addressing Greptile review on PR #48, both validated against the codebase's established patterns and `alien/CLAUDE.md`: `mint_registry_push_token`: URL-encode `manager_id` when interpolating it into the request path. Every other path-interpolation in the crate already does this — see `execution_context.rs::resolve_url` (platform, project, workspace), the gcp clients (service-account names, GCS object names), and the aws clients. Manager IDs are ULIDs today and contain no URL-special characters, but encoding here keeps the call site correct-by-construction rather than correct-by-coincidence — a future change to the ID format won't silently mis-route the request. `get_or_mint_registry_push_token`: treat the keyring cache write as best-effort. The mint already succeeded and the caller has a valid token in hand; a failed `set_password` just means we'll re-mint on the next invocation. The previous `?` propagation meant a transient keyring error would abort the whole release even though everything needed for it to succeed was in memory. This matches the worked example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate": > "Use `warn!` only for non-critical issues that don't affect > correctness... Core operation continues and will work correctly, > just slower" No behaviour change in the success path. No new failure modes on the error path beyond what's already there.

…n platform mode Two CLI fixes that share an execution_context.rs touch surface but are otherwise independent. execution_context.rs::create_or_get_artifact_repo All three response objects (initial GET, POST, retry GET) were consumed by `.json()` in the success branch only; on non-success the body was dropped. When the manager responded with an authentication failure or another structured error, the user saw "Failed to create artifact repository '...' for platform '...' on manager (HTTP 401)" with no further detail — the actual reason (e.g. "Platform JWT not bound to this manager") lived in the response body that we never read. Read each response's body once via `.text().await.unwrap_or_default()` immediately after capturing the status; parse it as JSON in the success branch, include it verbatim in the error message on failure. Pattern matches `auth.rs::mint_registry_push_token:547` for the same reason. `.unwrap_or_default()` keeps behaviour consistent when the body itself errors — we still get the status, just no detail. commands/commands.rs::commands_task `server_sdk_client()`'s Platform-mode arm returns `Err("server_sdk_client() is not available in platform mode. Use sdk_client() instead.")` because the manager URL + JWT aren't known until the project is resolved. The advice in the error message is wrong — `sdk_client()` returns the platform-API client, not the manager client we actually need to invoke commands. Branch in commands_task: for Platform mode, resolve the project then call `ctx.resolve_manager(...)` to get a ManagerContext whose `client` is authenticated with the minted manager JWT and whose `auth_token` carries that same JWT for the separate CommandsClient call later in `invoke_command`. Standalone/Dev modes keep using `ctx.server_sdk_client()` + `ctx.auth_token()` — those carry the manager URL + API key statically on ExecutionMode and don't need per-project resolution. Unblocks step 5 of the onboarding quickstart (`alien commands invoke --deployment X --command Y`) in Platform mode without changing any of the protocol shapes.

Greptile flagged my earlier A5 refactor of `create_or_get_artifact_repo` for swapping the canonical `.json::<T>().await.into_alien_error().context(...)?` pattern (used by the same file's `resolve_url` block at line 378-381 and by `auth.rs:541`) for a bespoke text-first + best-effort parse shape. The new shape silently fell through to the next step on 2xx-status with an unparseable body — a reverse-proxy or WAF returning an HTML 200 would hide the real diagnostic behind a confusing downstream error. `alien/CLAUDE.md`'s "Fail Fast, Don't Fall Back Silently" section explicitly calls out this pattern as a bad pattern; my refactor was the textbook anti-example. Restore fail-loud semantics on all three response-parse sites (`get_resp`, `create_resp`, `get_resp2`) using `serde_json::from_str(&body).into_alien_error().context(...)?` instead of `if let Ok(body) = ...`. The text-first read pattern stays — it's load-bearing for the final error message at the bottom of the function which includes `create_body` for the manager's structured rejection reason. Valid-JSON-without-`name`-field still falls through (intentional: that's the manager's "no such repo" shape and the next step is the correct response). Pre-existing code (mint helpers, auth flow, scope check) untouched — only the 3 response-parse blocks I introduced in the A5 commit.

ItamarZand88 · 2026-05-24T16:25:43Z

Superseded by #49 — same code, clean history (no force-push trail, no orphan commits with OSS-boundary leaks that earlier rounds introduced).

greptile-apps Bot reviewed May 23, 2026

View reviewed changes

Comment thread crates/alien-cli/src/auth.rs

ItamarZand88 mentioned this pull request May 24, 2026

chore(rust): release crates #3

Closed

ItamarZand88 added 6 commits May 24, 2026 18:44

ItamarZand88 force-pushed the itamar/alien-139-cli-mint-registry-push branch from 9ce9fc4 to 1177047 Compare May 24, 2026 15:46

ItamarZand88 closed this May 24, 2026

ItamarZand88 deleted the itamar/alien-139-cli-mint-registry-push branch May 24, 2026 16:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139)#48

feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139)#48
ItamarZand88 wants to merge 6 commits into
alon/alien-138-platform-launchfrom
itamar/alien-139-cli-mint-registry-push

ItamarZand88 commented May 23, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 23, 2026 •

edited

Loading

Important Files Changed

Uh oh!

Uh oh!

ItamarZand88 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ItamarZand88 commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Commits

Verification

Backward compatibility

Out of scope (separate tickets)

Test plan

Uh oh!

greptile-apps Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

ItamarZand88 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ItamarZand88 commented May 23, 2026 •

edited

Loading

greptile-apps Bot commented May 23, 2026 •

edited

Loading