feat(blobmanager): add managed CAS backend via S3 Access Points #3121
jiparis wants to merge 5 commits
Introduce a new `AWS-S3-ACCESS-POINT` CAS backend that targets a single shared bucket via per-tenant S3 Access Points. Each upload/download mints scoped temporary credentials via `sts:AssumeRole` with a session policy narrowed to the tenant's AP ARN and key prefix, and a session name derived from the authenticated requesting org carried in `ctx` (`s3accesspoint.WithRequestingOrg`).
Both upstream binaries pick up a new optional `blob_backends.s3_access_point` config block (`base_role_arn`, `region`, `session_duration`); when the block is absent the provider stays unregistered and behaviour is identical to before. The pod's ambient AWS identity (IRSA / instance profile / env vars) is used to call STS; no static credentials live in config.
Per-tenant data (AP ARN, region override, key prefix) is stored as a JSON blob in the secrets manager and read via `FromCredentials`, so the existing `backend.Provider` interface is unchanged.
Add `OrgID` to the CAS robotaccount JWT claims so artifact-cas can enrich its context with the requesting org before invoking the backend; existing providers ignore the key.
Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
… dev mode
Two related refinements to the AWS-S3-ACCESS-POINT provider.
1. The per-tenant key prefix is now derived at request time from the
authenticated requesting org carried in ctx via WithRequestingOrg,
rather than read from a `KeyPrefix` field in the secrets-manager
blob. The prefix and the AssumeRole `RoleSessionName` now share
their single source of truth, so a tampered Credentials blob can no
longer reroute a tenant's writes into another tenant's namespace.
The Credentials struct shrinks to {AccessPointARN, Region}. The
session policy and the bucket-level key both use `<orgUUID>` as the
prefix; the AP resource policy's Resource ARN must be
`${apARN}/object/<orgUUID>/*` to match.
2. Add a `dev_mode_use_ambient_credentials` Config flag (proto +
wire-plumbed in both binaries) that bypasses `sts:AssumeRole` and
routes S3 calls through whatever ambient AWS identity the SDK's
default credential chain produced. Local dev no longer requires an
IAM role + trust policy setup. The missing-org fail-closed check
still fires in dev mode so callers that forget WithRequestingOrg
surface the same bug locally that they would in production. A loud
warning is logged at startup. DEV ONLY — never enable in
multi-tenant deployments.
Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
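To make the shared-prefix idea concrete, here is a hedged sketch of how a session policy and an object key could both be derived from the same org ID. The `${apARN}/object/<orgUUID>/*` resource shape comes from the commit message; everything else (the function names, the `s3:GetObject`/`s3:PutObject` action list, the `<orgUUID>/<digest>` key layout, and the example ARN) is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policy model just enough of an IAM policy document
// for this sketch.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// objectKey prefixes the blob digest with the org ID.
// The "<orgID>/<digest>" layout is an assumption.
func objectKey(orgID, digest string) string {
	return orgID + "/" + digest
}

// sessionPolicy narrows a session to objects under the org's prefix on
// one access point. The action list is assumed, not taken from the PR.
func sessionPolicy(apARN, orgID string) (string, error) {
	p := policy{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:   "Allow",
			Action:   []string{"s3:GetObject", "s3:PutObject"},
			Resource: apARN + "/object/" + orgID + "/*",
		}},
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Made-up ARN and org ID, for illustration only.
	apARN := "arn:aws:s3:us-east-1:111122223333:accesspoint/tenant-ap"
	doc, err := sessionPolicy(apARN, "org-uuid")
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
	fmt.Println(objectKey("org-uuid", "sha256:abc"))
}
```

Since both functions take the org ID as their only tenant-specific input, a value read from stored credentials cannot move the key outside the prefix the policy allows.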
…wire output
For Managed=true CAS backends, replace Location with "managed by Chainloop" and Provider with "Chainloop" everywhere the controlplane emits a CASBackend outside its trust boundary:
* API responses (bizCASBackendToPb), so `chainloop cas-backend ls` no longer prints the AWS account ID, region, or AP name.
* Audit-log events on the NATS bus (CASBackendCreated, CASBackendUpdated, CASBackendDeleted, CASBackendPermanentDeleted, CASBackendStatusChanged), so downstream consumers can't surface the same details to tenants either.
The DB and biz layer continue to carry the real ARN and provider ID unchanged, so PerformValidation, the platform reconciler, and any forensic join by CASBackendID still work. Two helpers (displayLocation, displayProvider) keep the sanitization rule in one place.
Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
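A rough sketch of what the two sanitization helpers might look like. The helper names and the replacement strings come from the commit message; the `CASBackend` struct below is a simplified, hypothetical stand-in for the biz-layer type, and the signatures are assumptions.

```go
package main

import "fmt"

// CASBackend is a simplified stand-in for the biz-layer type; only the
// fields relevant to redaction are modelled here.
type CASBackend struct {
	Location string
	Provider string
	Managed  bool
}

// displayLocation hides the real location (the AP ARN) for managed backends.
func displayLocation(b CASBackend) string {
	if b.Managed {
		return "managed by Chainloop"
	}
	return b.Location
}

// displayProvider hides the real provider ID for managed backends.
func displayProvider(b CASBackend) string {
	if b.Managed {
		return "Chainloop"
	}
	return b.Provider
}

func main() {
	managed := CASBackend{
		Location: "arn:aws:s3:us-east-1:111122223333:accesspoint/tenant-ap", // made-up ARN
		Provider: "AWS-S3-ACCESS-POINT",
		Managed:  true,
	}
	fmt.Println(displayLocation(managed), "/", displayProvider(managed))
}
```

Routing every outbound conversion through the same two functions means an API mapper and an audit-event builder cannot drift apart on what counts as redacted.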
AI Session Analysis
| Status | Policy | Material | Messages |
|---|---|---|---|
| ✅ Passed | ai-config-ai-agents-allowed | ai-coding-session-234a03 | - |
| | ai-config-no-dangerous-commands | ai-coding-session-234a03 | Forbidden bash pattern `/git[^\|]push[^\|]--force/` matched command: `git push --force-with-lease origin jiparis/managed-cas-s3-access-points 2>&1 \| tail -8` |
| | ai-config-no-secrets | ai-coding-session-234a03 | Potential secret (Quoted API key/password) found in session content [turn=662, source=tool_result, line=78, value=secret: ...mV0"] |
| ✅ Passed | ai-config-mcp-servers-allowed | ai-coding-session-234a03 | - |
Powered by Chainloop and Chainloop Trace
Kusari Analysis Results:
No pinned version dependency changes, code issues or exposed secrets detected!
Note: View the full detailed analysis result for more information on the output and the checks that were run.
2 issues found across 34 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/controlplane/internal/conf/controlplane/config/v1/conf.proto">
<violation number="1" location="app/controlplane/internal/conf/controlplane/config/v1/conf.proto:152">
P2: Enforce `base_role_arn` when dev mode is disabled; the current schema allows invalid production config that will fail only at runtime.</violation>
</file>
<file name="app/controlplane/internal/service/cascredential.go">
<violation number="1" location="app/controlplane/internal/service/cascredential.go:152">
P1: Use the authenticated requesting org when minting CAS credentials; deriving `OrgID` from `backend.OrganizationID` can incorrectly scope managed S3 access-point sessions to backend ownership instead of caller identity.</violation>
</file>
Two follow-ups from the PR review on chainloop-dev#3121:
* The CAS JWT minted by cascredential.go, attestation.go and casredirect.go now embeds OrgID from the authenticated caller (entities.CurrentOrg / robotAccount.OrgID) instead of backend.OrganizationID. For managed S3 Access Point backends this OrgID drives the AssumeRole session name and the AP-policy aws:userid match; deriving it from the resolved row would weaken the cross-tenant guarantee if a future bug ever let a caller resolve a backend they don't own.
* The S3AccessPoint proto message now carries a buf.validate CEL constraint that requires base_role_arn when dev_mode_use_ambient_credentials is false, surfacing the misconfiguration at config-load time rather than at first upload.
Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
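The actual rule is a buf.validate CEL constraint on the proto message. Purely as an illustration of the predicate it enforces, the same check written in plain Go could look like the sketch below. The struct, its field names, and the `Validate` method are hypothetical; only the two config keys and the rule itself come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// S3AccessPointConfig is a hypothetical plain-Go mirror of the two proto
// fields involved in the constraint.
type S3AccessPointConfig struct {
	BaseRoleARN                  string
	DevModeUseAmbientCredentials bool
}

// Validate rejects a config that would need sts:AssumeRole but names no
// role to assume, so the mistake shows up at load time.
func (c S3AccessPointConfig) Validate() error {
	if !c.DevModeUseAmbientCredentials && c.BaseRoleARN == "" {
		return errors.New("base_role_arn is required when dev_mode_use_ambient_credentials is false")
	}
	return nil
}

func main() {
	fmt.Println(S3AccessPointConfig{}.Validate())
	fmt.Println(S3AccessPointConfig{DevModeUseAmbientCredentials: true}.Validate())
}
```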
Kusari PR Analysis rerun based on cfa4ff4, performed at 2026-05-15T19:34:39Z.
A `go mod tidy` while developing the s3accesspoint provider regressed several deps:
* go-git/v6 downgraded alpha.3 -> alpha.2 (CVE-2026-45022, commit signature spoofing)
* go-billy/v5 downgraded 5.9.0 -> 5.8.0 (CVE-2026-44973 path traversal, CVE-2026-44740 symlink-loop DoS)
* go-billy/v6 swapped to an older snapshot
* go-git/v5 downgraded 5.19.0 -> 5.18.0
* unrelated olekukonko/* and golang.org/x/* version churn that broke CI's go-module tidy check
Restoring go.mod and go.sum to match origin/main resolves both the Kusari CVE alerts and the CI failures. aws-sdk-go-v2/service/sts (needed by the s3accesspoint provider) is already an indirect at v1.41.9 on main, so no go.mod change is required for the new code to build.
Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
cfa4ff4 to 7457ed2
// independently here so the artifact-cas binary doesn't depend on the
// controlplane's protobuf package. Keep field numbering in sync across
// both definitions.
message BlobBackends {
I'd call it ManagedCASBackends
// caller can reuse it for the subsequent Upload/Download calls. Callers
// MUST use the returned context, not the original one.
func (s *commonService) loadBackendForClaims(ctx context.Context, claims *casJWT.Claims) (context.Context, backend.UploaderDownloader, error) {
	ctx = s3accesspoint.WithRequestingOrg(ctx, claims.OrgID)
This should probably be a middleware instead; it doesn't seem explicitly related to loading the backend, no?
Summary
- Adds a new `AWS-S3-ACCESS-POINT` CAS backend that targets a single shared bucket via per-tenant S3 Access Points. Each request mints a scoped session via `sts:AssumeRole` with a session policy and `RoleSessionName` derived from the authenticated requesting org carried in `ctx`.
- Adds an optional `blob_backends.s3_access_point` config block (`base_role_arn`, `region`, `session_duration`, `dev_mode_use_ambient_credentials`) to both `controlplane` and `artifact-cas`. When absent the provider stays unregistered, so existing on-prem deployments are unaffected.
- Adds `OrgID` to the CAS JWT claims (`org-id` claim) so `artifact-cas` can enrich its context via `s3accesspoint.WithRequestingOrg` before resolving the backend. Other providers ignore the key.
- For `Managed=true` rows, redacts AWS implementation details from any wire output: the AP ARN (`Location`) becomes `"managed by Chainloop"` and the provider ID (`Provider`) becomes `"Chainloop"` in both API responses and audit-event payloads. The DB and biz layer keep the real values.
AI Assistance
This change was developed with Claude Code; per-commit `Assisted-by:` trailers record the specific commits.
Closes #3114