Skip to content

feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139)#48

Closed
ItamarZand88 wants to merge 6 commits into
alon/alien-138-platform-launchfrom
itamar/alien-139-cli-mint-registry-push
Closed

feat: CLI mints scoped tokens for release; per-project scope on OCI push (ALIEN-139)#48
ItamarZand88 wants to merge 6 commits into
alon/alien-138-platform-launchfrom
itamar/alien-139-cli-mint-registry-push

Conversation

@ItamarZand88
Copy link
Copy Markdown
Contributor

@ItamarZand88 ItamarZand88 commented May 23, 2026

Summary

Two OSS-side changes that together let alien release work against a manager when the caller is authenticated with an OAuth bearer, plus follow-up CLI hardening:

  1. CLI exchanges the user OAuth JWT for a short-lived, project-and-manager-scoped token before talking to any manager endpoint, and uses that token for the artifact-registry setup + OCI push.
  2. alien-manager gains a per-project scopes field on Subject and enforces it on the OCI push path — a token whose scopes don't cover the target project's ID is rejected at /v2/.../blobs/uploads/.
  3. CLI follow-ups (a) surface the manager's response body in error messages so the user sees the actual reason (auth failure, scope mismatch, etc.) instead of just an HTTP status, and (b) unblock alien commands invoke in platform mode (previously errored out before reaching the manager).

OSS standalone behaviour is preserved bit-for-bit: validators that don't populate scopes leave it empty, and the new scope check no-ops on empty scopes — so existing single-tenant deployments still work without any token-format changes.

Commits

feat(manager): per-project scope on Subject, enforced on OCI push Subject.scopes: Vec<String> field; require_push_auth suffix-matches /<project_id> against the JWT's scope list when non-empty. Also fixes LocalArtifactRegistry::upstream_repository_prefix() which was returning a hardcoded "artifacts/default" instead of the binding name — that was masked before because no per-project authz extracted the real project_id from the routing table.
feat(cli): exchange OAuth JWT for manager-scoped token on release mint_registry_push_token + get_or_mint_registry_push_token in crates/alien-cli/src/auth.rs. Cached in keyring under MANAGER_TOKEN_USER (separate slot from the OAuth tokens so logout cleans both). execution_context.rs mints once at the top of the platform-mode release flow and reuses the token for /v1/artifact-registry/* repo setup + /v2/... OCI push.
test: cover the per-project scope check on the OCI push path Extracts scopes_cover_project from require_push_auth so the security-critical predicate is unit-testable without standing up a 17-field AppState. 5 tests cover empty/matching/non-matching/multi-scope/leading-slash-substring-guard cases, plus 7 tests for token_matches_target covering match, manager mismatch, project mismatch, missing claim, malformed JWT, etc.
refactor(cli): match codebase conventions on URL encoding and cache writes (1) urlencoding::encode(manager_id) in the mint URL — matches the established pattern in execution_context.rs::resolve_url and the gcp/aws clients. (2) store_manager_token is now best-effort (warn! on keyring write failure instead of aborting the release) — matches the worked example in alien/CLAUDE.md under "When warn! Is Appropriate".
fix(cli): surface manager response body and unblock commands invoke in platform mode ✨ NEW (1) create_or_get_artifact_repo reads each response's body once via .text().unwrap_or_default(); surfaces the manager's structured error in the final Failed to create artifact repository message instead of dropping it. (2) commands_task branches: in platform mode, resolves the project + manager to get an authenticated client + bearer (previously server_sdk_client()'s platform arm errored out, blocking step 5 of onboarding); standalone/dev keep the existing path.

Verification

  • cargo test -p alien-manager108 tests passing across 6 binaries (was 103 before the scope-check predicate tests landed)
  • cargo test -p alien-manager --test registry_proxy_test — 7 tests including push-auth, all green
  • cargo test -p alien-cli --features platform --lib oauth_flow::tests — 7 tests for token_matches_target, all green
  • cargo build -p alien-cli --features platform — clean
  • Live end-to-end: alien release --experimental from a worker-template project succeeds and produces a release ID

Backward compatibility

  • OSS standalone managers run unchanged. Validators that don't set Subject.scopes leave it empty, and the new scope check is gated on !subject.scopes.is_empty().
  • The CLI's token-mint flow only activates in #[cfg(feature = "platform")] ExecutionMode::Platform. Standalone CLI use (ALIEN_MANAGER_URL=...) is unchanged.
  • The platform-mode branch in commands_task is gated under #[cfg(feature = "platform")]; non-platform builds still go through server_sdk_client() directly.

Out of scope (separate tickets)

  • Subject.scopes: Vec<String> is loosely typed — format "<workspace_id>/<project_id>" enforced only by doc comment. A typed permits_project_push(project_id) helper on Subject would kill the open-coded ends_with matching at call sites.
  • The platform-mode alien commands invoke flow currently re-uses the registry-push token. A separate, more-specific audience would match the operation's intent better — track in a follow-up using the documentation skill on the platform side.

Test plan

  • Reviewer runs cargo test -p alien-manager and confirms 108/108.
  • Reviewer runs cargo build -p alien-cli --features platform and confirms clean.
  • Reviewer confirms Subject.scopes default-empty behaviour preserves OSS standalone: grep for Subject { literals in OSS validators shows scopes: Vec::new() everywhere.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR wires up scoped-token authentication for alien release in platform mode: the CLI now exchanges its OAuth JWT for a short-lived, manager-and-project-scoped JWT before talking to any manager endpoint, and the manager enforces that JWT's scopes list against the target project on the OCI push path.

  • CLI (auth.rs, execution_context.rs): adds mint_registry_push_token / get_or_mint_registry_push_token with keyring caching and expiry-aware cache reuse; create_or_get_artifact_repo now surfaces manager response bodies in error messages; commands_task gains a platform-mode branch to resolve the manager before constructing the SDK client.
  • Manager (registry_proxy.rs, subject.rs): adds Subject.scopes: Vec<String> and a scopes_cover_project helper; require_push_auth rejects pushes whose token scope doesn't cover the target project; OSS validators set scopes: Vec::new() and the check no-ops on empty scopes, preserving backward compatibility.
  • Tests: 5 unit tests for scopes_cover_project and 7 for token_matches_target cover all boundary cases including substring-suffix collisions and malformed JWTs.

Confidence Score: 4/5

Safe to merge for AWS-based platform deployments; the hardcoded "aws" platform string in the new commands invoke platform-mode path should be verified against non-AWS deployments before considering those use cases fully supported.

The core token-exchange flow, scope enforcement on the OCI push path, and OSS backward compatibility are all correctly implemented and well-tested. The only open question is whether resolve_manager(..., "aws") in commands_task behaves correctly for GCP or other projects — the same pattern exists in deployments.rs so it may be intentional, but it limits the commands invoke path to AWS in the same way deployments ls is already limited.

crates/alien-cli/src/commands/commands.rs — the hardcoded "aws" platform argument on the new resolve_manager call.

Important Files Changed

Filename Overview
crates/alien-cli/src/auth.rs Adds mint_registry_push_token, get_or_mint_registry_push_token, token_matches_target, store_manager_token, load_manager_token, plus a MANAGER_TOKEN_USER keyring slot; token caching logic and expiry handling are correct; comprehensive unit tests for matching/mismatch/malformed cases.
crates/alien-cli/src/commands/commands.rs Unblocks alien commands invoke in platform mode by resolving the manager before the SDK client is created; resolve_manager is called with a hardcoded "aws" platform string which may not hold for non-AWS projects.
crates/alien-cli/src/execution_context.rs Mints the manager-scoped JWT before artifact-repo setup and OCI push; adds manager_id to ResolveResponse; error messages now surface manager response bodies; flow is clean.
crates/alien-manager/src/auth/subject.rs Adds scopes: Vec<String> field with correct Default semantics; all existing OSS validators explicitly set scopes: Vec::new() preserving backward compatibility.
crates/alien-manager/src/routes/registry_proxy.rs Adds scopes_cover_project helper and integrates it into require_push_auth; empty-scopes passthrough preserves OSS behavior; leading-slash guard prevents substring collisions; 5 unit tests cover all boundary cases.
crates/alien-manager/tests/registry_proxy_test.rs Updates test_subject() with the new scopes field; integration tests still use Vec::new() (OSS path); the security-critical scoped-rejection path is covered by scopes_cover_project unit tests in the production module.

Sequence Diagram

sequenceDiagram
    participant CLI
    participant PlatformAPI as Platform API
    participant Keyring
    participant Manager

    CLI->>PlatformAPI: "GET /v1/resolve?platform=X&project=Y&workspace=Z"
    PlatformAPI-->>CLI: manager_url, manager_id, project_id

    CLI->>Keyring: load cached manager token
    Keyring-->>CLI: cached token or none

    alt cached token is valid and matches manager+project
        CLI->>CLI: reuse cached token
    else expired or wrong target
        CLI->>PlatformAPI: POST /v1/managers/ID/token (purpose: registry-push, project)
        PlatformAPI-->>CLI: scoped manager token
        CLI->>Keyring: store token (best-effort)
    end

    CLI->>Manager: GET /v1/artifact-registry [Bearer scoped token]
    Manager-->>CLI: repo info

    CLI->>Manager: POST /v2/.../blobs/uploads [Bearer scoped token]
    Manager->>Manager: role check via can_push_image
    Manager->>Manager: scope check via scopes_cover_project
    Manager-->>CLI: OCI push accepted
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/alien-cli/src/commands/commands.rs:84-85
**Hardcoded `"aws"` platform on manager resolution**

`resolve_manager` is called with a hardcoded `"aws"` value, which is used verbatim in the `/v1/resolve?platform=aws&...` query. For a project hosted on GCP (or any other non-AWS provider), the platform API would receive an incorrect hint, potentially routing to the wrong manager or returning an error. The `release` flow avoids this by reading the platform from the project's `platforms` config (`&platforms[0]`). Note that `deployments.rs:280` has the same hardcode, so this is a pre-existing pattern — but it is worth confirming whether the platform API's `/v1/resolve` endpoint silently ignores the `platform` query param for commands/deployments operations, or whether this would fail for GCP projects.

Reviews (5): Last reviewed commit: "fix(cli): fail loud on success-status re..." | Re-trigger Greptile

Comment thread crates/alien-cli/src/auth.rs
ItamarZand88 added a commit that referenced this pull request May 24, 2026
…rites

Two small fixes addressing Greptile review on PR #48, both validated
against the codebase's established patterns and `alien/CLAUDE.md`:

`mint_registry_push_token`: URL-encode `manager_id` when interpolating
it into the request path. Every other path-interpolation in the crate
already does this — see `execution_context.rs::resolve_url` (platform,
project, workspace), the gcp clients (service-account names, GCS
object names), and the aws clients. Manager IDs are ULIDs today and
contain no URL-special characters, but encoding here keeps the call
site correct-by-construction rather than correct-by-coincidence — a
future change to the ID format won't silently mis-route the request.

`get_or_mint_registry_push_token`: treat the keyring cache write as
best-effort. The mint already succeeded and the caller has a valid
token in hand; a failed `set_password` just means we'll re-mint on
the next invocation. The previous `?` propagation meant a transient
keyring error would abort the whole release even though everything
needed for it to succeed was in memory. This matches the worked
example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate":

> "Use `warn!` only for non-critical issues that don't affect
>  correctness... Core operation continues and will work correctly,
>  just slower"

No behaviour change in the success path. No new failure modes on the
error path beyond what's already there.
Adds `pub scopes: Vec<String>` to `Subject` so per-project authorization
tuples (format `"<workspace_id>/<project_id>"`) can flow from a token
validator that knows about them down to the OCI push handler. Default-
empty so the field is invisible to validators that don't populate it
(`TokenDbValidator`, `PermissiveAuthValidator`) — OSS standalone behaviour
is preserved bit-for-bit.

`require_push_auth` in the registry proxy gains a scope check: after the
existing role-based `can_push_image` gate passes, if `subject.scopes` is
non-empty, at least one entry must end with `/<project_id>` (suffix-match
rather than full tuple — project IDs are globally-unique ULIDs, and the
caller's scope carries their workspace_id while `subject.workspace_id`
can be a different workspace's id when the call is routed through a
shared manager). Empty `subject.scopes` short-circuits the check, again
preserving OSS behaviour.

`upstream_repository_prefix()` on `LocalArtifactRegistry` returned a
hardcoded `"artifacts/default"` — that was always wrong but masked
because the proxy routing table never had to extract a real project_id
when nothing did per-project authz. With the scope check above, a wrong
prefix breaks routing in a way that's now observable. Return
`self.binding_name.clone()` (i.e. just `"artifacts"`) so the prefix
matches the actual on-disk layout and the routing table extracts the
real project_id from `artifacts/<project_id>/...`. Root-cause fix.

Test fixtures: every Subject literal in oss_authz / permissive_auth /
token_db_validator and the three tests/* files gets `scopes: Vec::new()`
— pure field-threading, no behaviour change.
The manager only validates a small set of credential types — long-lived
API keys and platform-issued RS256 tokens. The user OAuth JWT from
`alien login` isn't one of them, so the CLI's release flow was
401-ing at the first manager call. Fix: at the top of the platform-mode
release flow, exchange the user OAuth JWT for a short-lived, project-
and-manager-scoped token via the platform's token endpoint, and use
that for every subsequent manager call.

`mint_registry_push_token` hits `POST {base}/v1/managers/{manager_id}/token`
with body `{ purpose: "registry-push", project: <id> }` and the user
OAuth bearer. The returned token's scopes/managerId are checked locally
on every reuse (`token_matches_target` — globally-unique ULIDs let the
project_id check be suffix-only), and a new keyring entry
(`MANAGER_TOKEN_USER`) holds the cached token separately from
access_token/refresh_token so `logout` cleans both.

`get_or_mint_registry_push_token` is the entry point: serve from cache
if it's a hit for this target and not within 30s of expiry, otherwise
mint fresh and store. Mint failures propagate as `AuthenticationFailed`
with the response status + body so platform-side rejection reasons reach
the user.

In `execution_context.rs`, the platform-mode branch mints once
immediately after `resolve` and uses the resulting token both for the
`/v1/artifact-registry/*` repo setup and for the `/v2/...` OCI push
(both legs of the release flow). `ResolveResponse` now carries
`manager_id` so the mint knows which manager the user is targeting.
Future platform-mode commands (commands invoke, etc.) should mint their
own audience rather than reuse this one — keeps the audience name
accurate to scope and prevents a leak of one credential from acting at
unintended endpoints.
The PR added two security-critical predicates that were exercised only
by code paths the test suite didn't have a way to reach: the manager's
push handler skipped the scope check whenever `subject.scopes` was
empty (which every test subject was), and the CLI's `token_matches_target`
cache-reuse check was never run on a constructed JWT. A regression in
either branch — an off-by-one in the suffix match, the scope list flipped
to a different shape, the project_id lookup returning the wrong value —
would have shipped silently.

Extract the scope-check predicate from `require_push_auth` into a pure
`scopes_cover_project(scopes, project_id)` helper so it's unit-testable
without standing up a 17-field `AppState`. The five new tests cover:
empty-scopes fall-through (OSS standalone preserved), matching scope,
non-matching scope, one-of-many matching, and the suffix-substring
confusion case (`prj_abc` must not accept a scope ending `prj_abcd`).

Add `token_matches_target` tests in the CLI: matching, manager-id
mismatch, project-id mismatch, missing claim, missing manager-id claim,
malformed JWT, and scopes claim of the wrong type. All failure modes
must return false so the caller re-mints rather than presenting an
unusable token to the manager.

No behaviour change in either function.
…rites

Two small fixes addressing Greptile review on PR #48, both validated
against the codebase's established patterns and `alien/CLAUDE.md`:

`mint_registry_push_token`: URL-encode `manager_id` when interpolating
it into the request path. Every other path-interpolation in the crate
already does this — see `execution_context.rs::resolve_url` (platform,
project, workspace), the gcp clients (service-account names, GCS
object names), and the aws clients. Manager IDs are ULIDs today and
contain no URL-special characters, but encoding here keeps the call
site correct-by-construction rather than correct-by-coincidence — a
future change to the ID format won't silently mis-route the request.

`get_or_mint_registry_push_token`: treat the keyring cache write as
best-effort. The mint already succeeded and the caller has a valid
token in hand; a failed `set_password` just means we'll re-mint on
the next invocation. The previous `?` propagation meant a transient
keyring error would abort the whole release even though everything
needed for it to succeed was in memory. This matches the worked
example in `alien/CLAUDE.md` under "When `warn!` Is Appropriate":

> "Use `warn!` only for non-critical issues that don't affect
>  correctness... Core operation continues and will work correctly,
>  just slower"

No behaviour change in the success path. No new failure modes on the
error path beyond what's already there.
…n platform mode

Two CLI fixes that share an execution_context.rs touch surface but are
otherwise independent.

execution_context.rs::create_or_get_artifact_repo
  All three response objects (initial GET, POST, retry GET) were
  consumed by `.json()` in the success branch only; on non-success the
  body was dropped. When the manager responded with an authentication
  failure or another structured error, the user saw "Failed to create
  artifact repository '...' for platform '...' on manager (HTTP 401)"
  with no further detail — the actual reason (e.g. "Platform JWT not
  bound to this manager") lived in the response body that we never
  read.

  Read each response's body once via `.text().await.unwrap_or_default()`
  immediately after capturing the status; parse it as JSON in the
  success branch, include it verbatim in the error message on failure.
  Pattern matches `auth.rs::mint_registry_push_token:547` for the same
  reason. `.unwrap_or_default()` keeps behaviour consistent when the
  body itself errors — we still get the status, just no detail.

commands/commands.rs::commands_task
  `server_sdk_client()`'s Platform-mode arm returns
  `Err("server_sdk_client() is not available in platform mode. Use
  sdk_client() instead.")` because the manager URL + JWT aren't known
  until the project is resolved. The advice in the error message is
  wrong — `sdk_client()` returns the platform-API client, not the
  manager client we actually need to invoke commands.

  Branch in commands_task: for Platform mode, resolve the project then
  call `ctx.resolve_manager(...)` to get a ManagerContext whose
  `client` is authenticated with the minted manager JWT and whose
  `auth_token` carries that same JWT for the separate CommandsClient
  call later in `invoke_command`. Standalone/Dev modes keep using
  `ctx.server_sdk_client()` + `ctx.auth_token()` — those carry the
  manager URL + API key statically on ExecutionMode and don't need
  per-project resolution.

  Unblocks step 5 of the onboarding quickstart
  (`alien commands invoke --deployment X --command Y`) in Platform
  mode without changing any of the protocol shapes.
Greptile flagged my earlier A5 refactor of `create_or_get_artifact_repo`
for swapping the canonical `.json::<T>().await.into_alien_error().context(...)?`
pattern (used by the same file's `resolve_url` block at line 378-381
and by `auth.rs:541`) for a bespoke text-first + best-effort parse
shape. The new shape silently fell through to the next step on
2xx-status with an unparseable body — a reverse-proxy or WAF returning
an HTML 200 would hide the real diagnostic behind a confusing
downstream error. `alien/CLAUDE.md`'s "Fail Fast, Don't Fall Back
Silently" section explicitly calls out this pattern as a bad pattern;
my refactor was the textbook anti-example.

Restore fail-loud semantics on all three response-parse sites
(`get_resp`, `create_resp`, `get_resp2`) using
`serde_json::from_str(&body).into_alien_error().context(...)?` instead
of `if let Ok(body) = ...`. The text-first read pattern stays — it's
load-bearing for the final error message at the bottom of the function
which includes `create_body` for the manager's structured rejection
reason. Valid-JSON-without-`name`-field still falls through (intentional:
that's the manager's "no such repo" shape and the next step is the
correct response).

Pre-existing code (mint helpers, auth flow, scope check) untouched —
only the 3 response-parse blocks I introduced in the A5 commit.
@ItamarZand88 ItamarZand88 force-pushed the itamar/alien-139-cli-mint-registry-push branch from 9ce9fc4 to 1177047 Compare May 24, 2026 15:46
@ItamarZand88
Copy link
Copy Markdown
Contributor Author

Superseded by #49 — same code, clean history (no force-push trail, no orphan commits with OSS-boundary leaks that earlier rounds introduced).

@ItamarZand88 ItamarZand88 deleted the itamar/alien-139-cli-mint-registry-push branch May 24, 2026 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant