feat(api-server): RBAC enforcement with scope-aware authorization #1660

Open: jsell-rh wants to merge 58 commits into main from jsell/feat/rbac-enforcement

jsell-rh commented Jun 5, 2026

Summary

Implements specs/security/rbac-enforcement.spec.md — scope-aware RBAC authorization for the ambient-api-server.

  • Scope-aware middleware: Evaluates bindings against the specific project/agent/session/credential being accessed
  • User auto-provisioning: Creates User record from JWT claims on first authenticated request
  • Bootstrap bindings: project:owner on POST /projects, credential:owner on POST /credentials
  • List filtering: Projects, sessions, credentials filtered to caller's authorized scope
  • Error opacity: Singleton GET returns 404 (not 403), mutations return opaque 403
  • Escalation prevention: Role hierarchy enforced, internal roles blocked, scoped to target project
  • Last-owner protection: 409 on deleting sole owner binding
  • Credential binding: Requires both credential:owner AND project:owner
  • gRPC authorization: Watch streams filtered by authorized project IDs
  • seed-admin CLI: Initial admin bootstrap command
  • New Project button: Dashboard dialog with DNS-1123 validation
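
The DNS-1123 check mentioned in the last bullet can be sketched as follows. This is a minimal illustration in Go, assuming the standard DNS-1123 label rule (lowercase alphanumerics and hyphens, alphanumeric at both ends, at most 63 characters). The dialog itself lives in the TypeScript UI and its exact rule is not shown in this PR; `ValidProjectName` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches a DNS-1123 label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// ValidProjectName is a hypothetical helper (not taken from the PR) that
// applies the label pattern plus the 63-character length limit.
func ValidProjectName(name string) bool {
	return len(name) >= 1 && len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	for _, n := range []string{"my-project", "My_Project", "-lead", "a"} {
		fmt.Printf("%q valid=%v\n", n, ValidProjectName(n))
	}
}
```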

Test plan

  • 124 e2e tests passing (idempotent, self-cleaning)
  • Generative escalation matrix: 43 combinations derived from hierarchy rule
  • Integration tests with real RBAC middleware
  • Unit tests for scope extraction, permission matching, hierarchy checks
  • E2e wired into CI pipeline

Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • RBAC enforcement with role hierarchies and scope-aware permissions; list endpoints now hide/disallow items when unauthorized
    • Automatic owner bindings for newly created projects and credentials
    • Admin-seed CLI for bootstrapping an admin account
    • UI: "New Project" dialog with DNS-1123 name validation and create flow
    • Credential token access now subject to RBAC controls
  • Bug Fixes

    • Last-owner protection for role bindings
    • Partial unique username index to prevent duplicate users
  • Tests

    • Comprehensive RBAC integration tests and a new end-to-end RBAC test suite

jsell-rh and others added 15 commits June 5, 2026 13:36
Rewrite the authorization middleware for scope-aware permission
evaluation. The previous flat check only matched resource:action
permission strings; a binding for project A would grant access to
project B. The new middleware evaluates bindings against the specific
scope context of each request (project_id, agent_id, session_id,
credential_id extracted from URL path).
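
To illustrate the difference described above, here is a minimal sketch contrasting a flat permission check with a scope-aware one. The `Binding` and `RequestScope` shapes and the `project:read` permission are simplified assumptions for illustration, not the PR's actual types.

```go
package main

import "fmt"

// Binding is a simplified, hypothetical stand-in for a role_bindings row
// joined with the permissions of its role.
type Binding struct {
	Scope       string // "global" or "project" in this sketch
	ProjectID   string
	Permissions []string // "resource:action" strings
}

// RequestScope is a hypothetical reduction of the scope extracted from the URL.
type RequestScope struct {
	ProjectID string
}

func hasPermission(perms []string, want string) bool {
	for _, p := range perms {
		if p == want {
			return true
		}
	}
	return false
}

// FlatAllow mimics the old behavior: any binding carrying the permission
// string grants access, regardless of which project it belongs to.
func FlatAllow(bindings []Binding, perm string) bool {
	for _, b := range bindings {
		if hasPermission(b.Permissions, perm) {
			return true
		}
	}
	return false
}

// ScopedAllow additionally requires the binding's scope to cover the request.
func ScopedAllow(bindings []Binding, perm string, scope RequestScope) bool {
	for _, b := range bindings {
		if !hasPermission(b.Permissions, perm) {
			continue
		}
		switch b.Scope {
		case "global":
			return true
		case "project":
			if scope.ProjectID != "" && b.ProjectID == scope.ProjectID {
				return true
			}
		}
	}
	return false
}

func main() {
	bindings := []Binding{{Scope: "project", ProjectID: "proj-a", Permissions: []string{"project:read"}}}
	fmt.Println("flat, proj-b:  ", FlatAllow(bindings, "project:read"))
	fmt.Println("scoped, proj-b:", ScopedAllow(bindings, "project:read", RequestScope{ProjectID: "proj-b"}))
	fmt.Println("scoped, proj-a:", ScopedAllow(bindings, "project:read", RequestScope{ProjectID: "proj-a"}))
}
```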

Key changes:
- Scope-aware evaluator with lazy session→project resolution
- Auth-exempt endpoints (POST /projects, POST /credentials, GET /roles)
- Error opacity: 404 on unauthorized singleton GET, empty list on list
- Service caller bypass via IsServiceCaller(ctx)
- User auto-provisioning from JWT claims (upsert on first request)
- Auto-create project:owner binding on POST /projects
- Auto-create credential:owner binding on POST /credentials
- Seed missing built-in roles (credential:owner, agent:editor)
- Role hierarchy and escalation check utilities
- List filter helper (ApplyListFilter) for TSL search injection
- seed-admin CLI command for initial platform:admin bootstrap
- Unique index migration on users.username for upsert
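
The error-opacity rule in the list above, together with the opaque 403 for mutations stated in the PR summary, could be expressed roughly as below. This is a sketch only: `EndpointKind` and `DenyStatus` are hypothetical names, and the real middleware classifies endpoints in its own way.

```go
package main

import (
	"fmt"
	"net/http"
)

// EndpointKind is a hypothetical classification of a request.
type EndpointKind int

const (
	Singleton EndpointKind = iota // GET on a single resource
	List                          // GET on a collection
	Mutation                      // POST, PATCH, PUT, DELETE
)

// DenyStatus maps an authorization denial to the status the caller sees.
// A denied singleton GET looks like a missing resource, a denied list is
// served as 200 with an empty result set, and a denied mutation is a 403.
func DenyStatus(kind EndpointKind) int {
	switch kind {
	case Singleton:
		return http.StatusNotFound
	case List:
		return http.StatusOK
	default:
		return http.StatusForbidden
	}
}

func main() {
	fmt.Println("singleton:", DenyStatus(Singleton))
	fmt.Println("list:     ", DenyStatus(List))
	fmt.Println("mutation: ", DenyStatus(Mutation))
}
```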

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WatchSessions now filters events by the caller's authorized project IDs.
WatchSessionMessages uses RBAC scope checks instead of CreatedByUserId
comparison. WatchInboxMessages already restricts to service callers.
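
A rough sketch of filtering a watch stream by authorized project IDs follows. The `SessionEvent` and `Authz` types are assumptions made for illustration; the actual gRPC handler works with generated protobuf types and the RBAC auth result stored in the request context.

```go
package main

import "fmt"

// SessionEvent is a hypothetical, reduced watch event.
type SessionEvent struct {
	SessionID string
	ProjectID string
}

// Authz is a hypothetical reduction of the caller's authorization result.
type Authz struct {
	IsGlobal   bool
	ProjectIDs map[string]bool
}

// Allows reports whether the caller may see the event.
func (a Authz) Allows(ev SessionEvent) bool {
	return a.IsGlobal || a.ProjectIDs[ev.ProjectID]
}

// FilterEvents forwards only the events the caller is authorized to see
// and closes the output channel once the input is drained.
func FilterEvents(in <-chan SessionEvent, a Authz) <-chan SessionEvent {
	out := make(chan SessionEvent)
	go func() {
		defer close(out)
		for ev := range in {
			if a.Allows(ev) {
				out <- ev
			}
		}
	}()
	return out
}

func main() {
	in := make(chan SessionEvent, 3)
	in <- SessionEvent{SessionID: "s1", ProjectID: "proj-a"}
	in <- SessionEvent{SessionID: "s2", ProjectID: "proj-b"}
	in <- SessionEvent{SessionID: "s3", ProjectID: "proj-a"}
	close(in)

	caller := Authz{ProjectIDs: map[string]bool{"proj-a": true}}
	for ev := range FilterEvents(in, caller) {
		fmt.Println(ev.SessionID)
	}
}
```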

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the RBACMiddleware nil override from integration testing env.
The middleware is now active with enforcement disabled (enable-authz=false),
matching the production rollout strategy: bindings are always created,
enforcement is gated by config.

Add dedicated RBAC integration tests verifying:
- project:owner binding created on POST /projects
- credential:owner binding created on POST /credentials
- User auto-provisioned from JWT claims on first request
- All built-in roles seeded (including new credential:owner, agent:editor)

Import all RBAC-related plugins in the integration test package so
users, roles, role_bindings, and credentials tables are migrated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bash-based e2e test exercising RBAC enforcement against a live Kind
cluster with Keycloak authentication. Creates two users, verifies
project/credential/agent isolation, sharing via RoleBindings, and
escalation prevention.

Currently 21 pass, 11 fail, 2 skip — the failures are the TDD red
baseline for wiring list filtering, fixing agent creation payload,
and allowing role_binding mutations by project owners.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire ApplyListFilter into project, credential, and session List
handlers to restrict results to the caller's authorized scope.

Fix cross-project nested route access: list endpoints under a project
scope (e.g. /projects/{id}/agents) now return 404 if the caller has
no binding for the parent project.

Add POST /role_bindings to auth-exempt endpoints so project owners
can grant access. Handler-level escalation checks will enforce the
real authorization in a follow-up.

Fix agent create test payloads to include required project_id field.

E2E results: 32 passed, 0 failed, 2 skipped (escalation not yet wired).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run the RBAC e2e bash test after the SSO Cypress tests. The step
enables JWT + RBAC enforcement on the api-server, port-forwards to
it, and runs the test script against the Kind cluster's Keycloak.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three parallel changes:

1. ambient-ui: Add "New Project" button with dialog on the dashboard.
   DNS-1123 name validation, optional description, inline errors.
   Wired through port → adapter → mutation hook → dialog component.

2. api-server: Wire escalation prevention in roleBindings handler.
   - Internal roles (agent:runner, credential:token-reader) rejected on create
   - Level hierarchy enforced: users can only grant strictly below their level
   - Credential-scoped grants require credential:owner on the target
   - Last-owner protection: DELETE returns 409 if sole project/credential owner

3. e2e: Convert 2 SKIP tests to real 403 assertions for escalation prevention.
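
The escalation rules in item 2 can be sketched as a small grant check. The internal role names come from the commit message and `HighestLevel` is named in the review of the roleBindings handler; the numeric levels and the `project:viewer` / `project:editor` role names are assumptions for illustration. The caller's roles are assumed to be those held on the target project only.

```go
package main

import "fmt"

// roleLevel holds hypothetical hierarchy levels; higher means more privilege.
var roleLevel = map[string]int{
	"project:viewer":    1, // assumed role name
	"agent:editor":      2,
	"credential:reader": 2,
	"credential:viewer": 2,
	"project:editor":    3, // assumed role name
	"project:owner":     4,
	"credential:owner":  4,
	"platform:admin":    5,
}

// internalRoles can never be granted through the API.
var internalRoles = map[string]bool{
	"agent:runner":            true,
	"credential:token-reader": true,
}

// HighestLevel returns the highest level among the caller's roles.
func HighestLevel(roles []string) int {
	highest := 0
	for _, r := range roles {
		if l := roleLevel[r]; l > highest {
			highest = l
		}
	}
	return highest
}

// CanGrant reports whether a caller holding callerRoles on the target
// project may grant targetRole: internal and unknown roles are rejected,
// and the target must sit strictly below the caller's highest level.
func CanGrant(callerRoles []string, targetRole string) bool {
	if internalRoles[targetRole] {
		return false
	}
	level, known := roleLevel[targetRole]
	if !known {
		return false
	}
	return HighestLevel(callerRoles) > level
}

func main() {
	owner := []string{"project:owner"}
	fmt.Println(CanGrant(owner, "project:editor")) // strictly below
	fmt.Println(CanGrant(owner, "project:owner"))  // same level
	fmt.Println(CanGrant(owner, "agent:runner"))   // internal role
}
```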

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…OSTs

POST /sessions and POST /role_bindings carry project_id in the body,
not the URL. The middleware now peeks the body to extract project_id
and credential_id, restores it for the handler, and evaluates scope
properly. Removes role_bindings from auth-exempt — body-peeking
handles it correctly.
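
The body-peeking step described above can be sketched like this: read the body, restore it for the downstream handler, then decode only the scope fields. `bodyScope`, `extractScope`, and `PeekScope` are hypothetical names, and a production version would presumably also cap the body size (for example with `http.MaxBytesReader`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bodyScope holds the only fields the middleware needs from the body.
type bodyScope struct {
	ProjectID    string `json:"project_id"`
	CredentialID string `json:"credential_id"`
}

// extractScope decodes the scope fields from raw JSON; an empty body
// yields an empty scope rather than an error.
func extractScope(raw []byte) (bodyScope, error) {
	var s bodyScope
	if len(raw) == 0 {
		return s, nil
	}
	err := json.Unmarshal(raw, &s)
	return s, err
}

// PeekScope reads the request body, restores it so the handler can read it
// again, and returns the scope fields found in it.
func PeekScope(r *http.Request) (bodyScope, error) {
	if r.Body == nil {
		return bodyScope{}, nil
	}
	raw, err := io.ReadAll(r.Body)
	if err != nil {
		return bodyScope{}, err
	}
	r.Body = io.NopCloser(bytes.NewReader(raw)) // restore for the handler
	return extractScope(raw)
}

func main() {
	payload := `{"project_id":"proj-a","role_id":"r1"}`
	req := httptest.NewRequest(http.MethodPost, "/api/ambient/v1/role_bindings", strings.NewReader(payload))

	scope, err := PeekScope(req)
	if err != nil {
		fmt.Println("peek failed:", err)
		return
	}
	rest, _ := io.ReadAll(req.Body) // the handler still sees the full body
	fmt.Println(scope.ProjectID, string(rest) == payload)
}
```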

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d project:owner

The spec requires callers to hold both credential:owner on the
credential AND project:owner on the target project when creating
credential-scoped role bindings. The handler only checked
credential:owner — now checks both.

Also: extract RBAC scope from request body for top-level POSTs
(sessions, role_bindings) instead of exempting them. Removes
POST /role_bindings from auth-exempt list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite from 34 to 113 assertions across 13 phases. Covers all
HTTP-testable spec scenarios including credential binding dual
ownership, escalation prevention, last-owner protection, mutation
opacity, auth-exempt endpoints, and revocation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes:
- Level hierarchy check now scoped to target project/credential, not
  global — prevents viewer on proj-A using owner on proj-B to escalate
- Project scope check allows any binding holder (not just owner) to
  grant strictly-below roles on their project
- Global scope bindings restricted to platform:admin only
- Cross-project grants blocked (owner of A cannot grant on B)
- credential:viewer added to role hierarchy (was missing, only
  credential:reader was mapped)
- role_bindings routes auth-exempt (handler does full escalation check)

E2e test rewritten with generative escalation matrix:
- 3 caller levels × 10 target roles = 30 same-project tests
- 10 cross-project tests (all must be 403)
- 3 global-scope tests (all must be 403 for non-admin)
- Expected results derived from hierarchy rule, not enumerated
- Adding a role = one array entry, loop generates all combos
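
The actual e2e suite is a Bash script; the Go sketch below only illustrates the generative idea, where expected results are derived from the hierarchy rule instead of being listed by hand. The levels, the `project:viewer` / `project:editor` names, and 201 as the success status are assumptions.

```go
package main

import "fmt"

// Hypothetical levels; adding a role means adding one entry here and in targets.
var level = map[string]int{
	"project:viewer": 1, "agent:editor": 2, "credential:reader": 2,
	"credential:viewer": 2, "project:editor": 3, "project:owner": 4,
	"credential:owner": 4, "platform:admin": 5,
}

var internal = map[string]bool{"agent:runner": true, "credential:token-reader": true}

var callers = []string{"project:viewer", "project:editor", "project:owner"}

var targets = []string{
	"project:viewer", "agent:editor", "credential:reader", "credential:viewer",
	"project:editor", "project:owner", "credential:owner", "platform:admin",
	"agent:runner", "credential:token-reader",
}

// Case is one generated test: who tries to grant what, and the expected status.
type Case struct {
	Caller, Target, Kind string
	Want                 int
}

// grantAllowed encodes the hierarchy rule: no internal roles, strictly below.
func grantAllowed(caller, target string) bool {
	lvl, known := level[target]
	return known && !internal[target] && level[caller] > lvl
}

// GenerateMatrix derives every case from the rule above.
func GenerateMatrix() []Case {
	var cases []Case
	for _, c := range callers {
		for _, t := range targets {
			want := 403
			if grantAllowed(c, t) {
				want = 201 // assumed success status for a created binding
			}
			cases = append(cases, Case{c, t, "same-project", want})
		}
	}
	for _, t := range targets { // an owner of project A granting on project B
		cases = append(cases, Case{"project:owner", t, "cross-project", 403})
	}
	for _, c := range callers { // non-admin callers requesting global scope
		cases = append(cases, Case{c, "platform:admin", "global-scope", 403})
	}
	return cases
}

func main() {
	cases := GenerateMatrix()
	fmt.Println("generated cases:", len(cases))
	for _, c := range cases[:3] {
		fmt.Printf("%s -> %s (%s): expect %d\n", c.Caller, c.Target, c.Kind, c.Want)
	}
}
```

With 3 callers and 10 target roles this yields 30 same-project, 10 cross-project, and 3 global-scope cases, matching the 43 combinations stated in the PR description.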

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pre-clean and post-clean both use hard SQL DELETE (not API soft-delete)
  to handle the non-partial unique index on projects.name
- Fix DB pod label selector (app=ambient-api-server,component=database)
- Add agent:editor to role hierarchy (was missing, caused false 403)
- Remove API-based cleanup phase — EXIT trap handles everything
- Tests now fully idempotent: 124 pass on back-to-back runs

Escalation matrix: 3 callers × 10 roles = 30 same-project combos,
10 cross-project, 3 global-scope — all derived from hierarchy rule,
zero manual expected values. Adding a role = one array entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
netlify Bot commented Jun 5, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 34b725b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a272ed5bf3b0a0008552bba |

jsell-rh requested a review from markturansky June 5, 2026 20:57
coderabbitai Bot commented Jun 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds scope-aware RBAC core (types, hierarchy, evaluator), middleware refactor, handler/service wiring to auto-create owner bindings, DB migrations, integration and E2E RBAC tests plus CI step, a seed-admin CLI, and a frontend project creation dialog with create plumbing.

Changes

API Server RBAC Authorization

| Layer / File(s) | Summary |
| --- | --- |
| RBAC scope, types, hierarchy and evaluator: components/ambient-api-server/pkg/rbac/scope.go, components/ambient-api-server/pkg/rbac/hierarchy.go, components/ambient-api-server/pkg/rbac/evaluator.go, components/ambient-api-server/pkg/rbac/context.go | Defines RequestScope/scope helpers and list-filtering, role-level hierarchy and internal-role marker, permission evaluator querying role_bindings+roles, and auth-result storage for request context. |
| Authorization middleware and helper tests: components/ambient-api-server/pkg/rbac/middleware.go, components/ambient-api-server/pkg/rbac/middleware_test.go | Middleware delegates authz to Evaluator, auto-provisions users, classifies endpoints (list/singleton/auth-exempt), handles allow/deny semantics, and adds unit tests for parsing and matching helpers. |
| Service wiring and owner-binding creation: components/ambient-api-server/plugins/projects/service.go, components/ambient-api-server/plugins/credentials/service.go, components/ambient-api-server/plugins/*/plugin.go | Project and credential services accept SessionFactory and create project:owner / credential:owner role_bindings for creators; plugin locators pass SessionFactory through. |
| Handlers: list filtering, escalation prevention, last-owner protection: components/ambient-api-server/plugins/projects/handler.go, components/ambient-api-server/plugins/credentials/handler.go, components/ambient-api-server/plugins/roleBindings/handler.go, components/ambient-api-server/plugins/projectSettings/handler.go, components/ambient-api-server/plugins/users/handler.go | Handlers apply ApplyListFilter to list endpoints; roleBindings handler enforces grant rules, internal-role blocking, patch restrictions, and last-owner deletion checks. |
| Sessions: gRPC + HTTP changes: components/ambient-api-server/plugins/sessions/grpc_handler.go, components/ambient-api-server/plugins/sessions/handler.go | gRPC and HTTP session handlers use RBAC auth results for authorization and filter events/lists by authorized project IDs. |
| Migrations and plugin init: components/ambient-api-server/plugins/credentials/migration.go, components/ambient-api-server/plugins/roleBindings/migration.go, components/ambient-api-server/plugins/users/migration.go, components/ambient-api-server/plugins/*/plugin.go | Adds migrations to seed credential roles, add credential token permission, unique binding index, and username unique partial index; registers migrations in plugin init. |
| seed-admin CLI and integration env: components/ambient-api-server/pkg/cmd/seed_admin.go, components/ambient-api-server/cmd/ambient-api-server/main.go, components/ambient-api-server/cmd/ambient-api-server/environments/e_integration_testing.go | Adds seed-admin command to idempotently create platform admin binding and adjusts integration test flags to disable authz. |

Testing and Validation

| Layer / File(s) | Summary |
| --- | --- |
| Integration tests for RBAC behavior: components/ambient-api-server/test/integration/rbac_test.go, components/ambient-api-server/test/integration/integration_test.go | Go integration tests seed roles, verify owner-binding creation for projects/credentials, and assert user auto-provisioning. |
| End-to-end RBAC test script and CI step: components/ambient-api-server/test/e2e/rbac_e2e_test.sh, .github/workflows/e2e.yml | Comprehensive Bash E2E covering isolation, escalation prevention, last-owner protection, mutation opacity, scheduled-session and credential-token RBAC; workflow step enables JWT/authz and runs the RBAC E2E script against a forwarded API and Keycloak. |

Frontend Project Creation

| Layer / File(s) | Summary |
| --- | --- |
| Project creation dialog component: components/ambient-ui/src/app/(dashboard)/_components/create-project-dialog.tsx | Client component with DNS-1123-like name validation, optional description, and create mutation handling. |
| Project create port, adapter, and hook: components/ambient-ui/src/ports/projects.ts, components/ambient-ui/src/adapters/sdk-projects.ts, components/ambient-ui/src/queries/use-projects.ts, components/ambient-ui/src/app/(dashboard)/page.tsx | Adds ProjectCreateInput and ProjectsPort.create; SDK adapter implements create; useCreateProject mutation invalidates project queries and UI integrates dialog into dashboard. |

Possibly related PRs

  • ambient-code/platform#1183: Related changes to gRPC session authorization logic and ownership/service-account bypass behavior.
  • ambient-code/platform#1640: Implements runtime RBAC enforcement, list filtering, owner-binding seeding, and RoleBinding mutation guards similar to this PR.
  • ambient-code/platform#1354: Prior RBAC middleware changes that touch path/action/resource parsing and authorization behavior this PR extends.

Suggested labels

sdd-exempt


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Performance And Algorithmic Complexity | ❌ Error | JSON unmarshaling in authorization loop: Evaluate() unmarshals permission arrays per binding. O(N) JSON parses per authorization check with N bindings. | Parse role permissions once during fetchBindings rather than per-binding in the Evaluate loop, or cache parsed permissions by role_id. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
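
One way to address the performance finding is to cache parsed permissions by role ID, as the resolution suggests. The sketch below assumes permissions are stored as a JSON array of strings; `PermissionCache` is a hypothetical type, and a real implementation would also need invalidation when a role's permissions change.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// PermissionCache memoizes parsed permission arrays by role ID so that each
// role's JSON is unmarshaled once instead of once per binding per request.
type PermissionCache struct {
	mu     sync.RWMutex
	parsed map[string][]string
	parses int // number of real JSON parses, kept for demonstration
}

func NewPermissionCache() *PermissionCache {
	return &PermissionCache{parsed: make(map[string][]string)}
}

// Get returns the parsed permissions for roleID, parsing rawJSON on a miss.
// Two concurrent misses may both parse; the result is the same either way.
func (c *PermissionCache) Get(roleID, rawJSON string) ([]string, error) {
	c.mu.RLock()
	perms, ok := c.parsed[roleID]
	c.mu.RUnlock()
	if ok {
		return perms, nil
	}

	var out []string
	if err := json.Unmarshal([]byte(rawJSON), &out); err != nil {
		return nil, fmt.Errorf("parse permissions for role %s: %w", roleID, err)
	}

	c.mu.Lock()
	c.parsed[roleID] = out
	c.parses++
	c.mu.Unlock()
	return out, nil
}

// Parses reports how many times JSON was actually unmarshaled.
func (c *PermissionCache) Parses() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.parses
}

func main() {
	cache := NewPermissionCache()
	raw := `["project:read","session:create"]`
	for i := 0; i < 3; i++ {
		perms, err := cache.Get("role-1", raw)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(perms)
	}
	fmt.Println("JSON parses:", cache.Parses())
}
```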
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (feat(scope): description) and directly reflects the main change: implementing scope-aware RBAC authorization in the API server. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security And Secret Handling | ✅ Passed | Code uses parameterized SQL queries, proper quote escaping, doesn't log secrets/tokens, requires auth before RBAC checks, enforces authorization via hierarchy/scope. |
| Kubernetes Resource Safety | ✅ Passed | PR does not modify Kubernetes manifests or resource definitions; changes are application-level RBAC logic in Go code, DB migrations, and UI—not subject to K8s resource safety checks. |

…nding list

- Session sub-resource list endpoints (GET /sessions/{id}/messages)
  now resolve session→project scope and block unauthorized callers
- Role binding list filtered to caller's own bindings (user_id match)
- Project settings and session_messages endpoints verified safe (blocked
  or internal-only)
- E2e expanded to 142 tests covering session message isolation, role
  binding list isolation, session clone/stop/delete cross-user, and
  project settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/ambient-api-server/plugins/projects/service.go (1)

104-133: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make owner-binding creation failure transactional from API perspective.

Line 104 ignores the owner-binding outcome, and lines 131-133 only log a warning. That allows project creation to succeed without its bootstrap project:owner binding, leaving RBAC state inconsistent. Propagate this error (and treat RowsAffected == 0 as a failure).

Suggested fix
-	s.createOwnerBinding(ctx, project.ID)
+	if err := s.createOwnerBinding(ctx, project.ID); err != nil {
+		return nil, services.HandleCreateError("Project", err)
+	}
@@
-func (s *sqlProjectService) createOwnerBinding(ctx context.Context, projectID string) {
+func (s *sqlProjectService) createOwnerBinding(ctx context.Context, projectID string) error {
 	username := auth.GetUsernameFromContext(ctx)
 	if username == "" {
-		return
+		return fmt.Errorf("missing username in auth context for project owner binding")
 	}
@@
 	if result.Error != nil {
-		glog.Warningf("failed to create owner binding for project %s: %v", projectID, result.Error)
+		return fmt.Errorf("create owner binding for project %s: %w", projectID, result.Error)
 	}
+	if result.RowsAffected != 1 {
+		return fmt.Errorf("owner binding not created for project %s", projectID)
+	}
+	return nil
 }

As per coding guidelines, "Never silently swallow partial failures; every error path must propagate or be explicitly collected, never discarded".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/projects/service.go` around lines 104 -
133, The createOwnerBinding currently swallows errors; change its signature on
sqlProjectService to return an error and in createOwnerBinding check
result.Error and result.RowsAffected (treat RowsAffected == 0 as failure)
returning a descriptive error if either occurs; then call createOwnerBinding
from the caller (where project is created) and if it returns an error propagate
it using services.HandleCreateError("Project", err) before proceeding to
s.events.Create so the API treats owner-binding failures as fatal and prevents
partial project creation.
components/ambient-api-server/plugins/roleBindings/handler.go (1)

139-185: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Patch handler bypasses escalation prevention — security gap.

Create enforces escalation checks (internal role rejection, hierarchy validation, scope restrictions), but Patch allows updating RoleId, Scope, ProjectId, CredentialId without any authorization checks. An attacker with any role binding access could escalate privileges by patching their binding to a higher role or different scope.

🔒 Recommended: Add escalation checks to Patch

Either:

  1. Disallow patching RoleId and scope fields entirely
  2. Apply the same escalation prevention logic from Create when these fields change:
 func (h roleBindingHandler) Patch(w http.ResponseWriter, r *http.Request) {
 	var patch openapi.RoleBindingPatchRequest
 
 	cfg := &handlers.HandlerConfig{
 		Body:       &patch,
 		Validators: []handlers.Validate{},
 		Action: func() (interface{}, *errors.ServiceError) {
 			ctx := r.Context()
 			id := mux.Vars(r)["id"]
 			found, err := h.roleBinding.Get(ctx, id)
 			if err != nil {
 				return nil, err
 			}
+
+			// Block patching role_id and scope — use delete + create instead
+			if patch.RoleId != nil || patch.Scope != nil || patch.ProjectId != nil || patch.CredentialId != nil {
+				return nil, errors.Forbidden("cannot modify role, scope, or resource bindings — delete and recreate instead")
+			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/handler.go` around lines
139 - 185, The Patch handler in roleBindingHandler currently allows updating
sensitive fields (RoleId, Scope, ProjectId, CredentialId) without performing
escalation checks, letting callers escalate privileges; update the Patch method
to either forbid changes to those sensitive fields or run the same
authorization/escalation-prevention logic used in Create before calling
h.roleBinding.Replace: detect changes to RoleId/Scope/ProjectId/CredentialId,
validate the new values against the caller’s privileges (internal role
rejection, role hierarchy validation, scope restrictions), and return a
permission error if the checks fail, otherwise proceed to call
h.roleBinding.Replace and PresentRoleBinding.
🧹 Nitpick comments (3)
components/ambient-api-server/test/integration/rbac_test.go (1)

80-204: 🏗️ Heavy lift

Refactor RBAC integration tests into table-driven subtests.

The tests from Line 80 onward repeat the same setup/reset pattern and would be easier to extend/maintain as table-driven t.Run(...) cases, which is the test-file standard here.

Based on learnings: "Applies to components/ambient-api-server/**/*_test.go : Use table-driven tests with subtests in test files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/test/integration/rbac_test.go` around lines 80
- 204, Multiple RBAC tests (TestRBAC_ProjectCreationCreatesOwnerBinding,
TestRBAC_CredentialCreationCreatesOwnerBinding, TestRBAC_UserAutoProvisioned,
TestRBAC_MissingRolesSeeded) duplicate setup and teardown; refactor them into a
single table-driven test that iterates over subcases with t.Run. Create a slice
of cases with a name and a closure or enum describing the behavior to assert
(e.g., "project-owner-binding", "credential-owner-binding",
"user-auto-provisioned", "missing-roles-seeded"), move common setup (h :=
test.NewHelper(t); h.DBFactory.ResetDB(); ensureBuiltInRoles(t)) outside the
loop or into a shared setup helper, and inside each subtest invoke the specific
logic currently in each TestRBAC_* function (creating account/ctx, making API
call or DB query, and the Expect assertions). Ensure each subtest uses its own
authenticated context or request payload as before and keeps the same unique
verification SQL checks and expectations so behavior is unchanged.
components/ambient-api-server/pkg/rbac/hierarchy.go (1)

12-14: ⚡ Quick win

Inconsistent role name definitions — use constants.

Lines 12 and 14 use hardcoded strings ("agent:editor", "credential:viewer") while the rest of the map uses constants from the permissions.go file. Either add these as constants or remove them if they're not valid roles.

♻️ Suggested fix
-	"agent:editor":       2,
+	// RoleAgentEditor:    2,  // Add constant if this role exists
 	RoleCredentialReader: 2,
-	"credential:viewer":  2,
+	// RoleCredentialViewer: 2,  // Add constant if this role exists
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/pkg/rbac/hierarchy.go` around lines 12 - 14,
The role map in hierarchy.go mixes hardcoded role strings ("agent:editor",
"credential:viewer") with constants (e.g., RoleCredentialReader); replace those
hardcoded strings by referencing the appropriate constants from permissions.go
(or add new constants there following the existing naming convention) or remove
them if they are invalid roles; update the map entries to use the constant
identifiers instead of literal strings so all role keys (like
RoleCredentialReader) are consistent.
components/ambient-api-server/plugins/credentials/service.go (1)

161-164: ⚡ Quick win

Hardcoded role name — use constant for consistency.

Line 163 uses 'credential:owner' as a string literal. The pkgrbac.RoleCredentialOwner constant should be used for consistency with the role binding handler.

♻️ Suggested fix
+	pkgrbac "github.com/ambient-code/platform/components/ambient-api-server/pkg/rbac"
 ...
 	result := g.Exec(
 		`INSERT INTO role_bindings (id, role_id, scope, user_id, credential_id, created_at, updated_at)
 		 SELECT ?, r.id, 'credential', ?, ?, NOW(), NOW()
-		 FROM roles r WHERE r.name = 'credential:owner' AND r.deleted_at IS NULL
+		 FROM roles r WHERE r.name = ? AND r.deleted_at IS NULL
 		 LIMIT 1`,
-		api.NewID(), username, credentialID,
+		api.NewID(), username, credentialID, pkgrbac.RoleCredentialOwner,
 	)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/credentials/service.go` around lines
161 - 164, Replace the hardcoded 'credential:owner' string in the INSERT ...
SELECT SQL with the pkgrbac.RoleCredentialOwner constant: change the WHERE
clause to use a placeholder (e.g. WHERE r.name = ?) and add
pkgrbac.RoleCredentialOwner to the DB call arguments (the same call that
executes the INSERT into role_bindings), or if the query is built inline,
interpolate pkgrbac.RoleCredentialOwner instead of the literal; ensure you
update the execution site so the constant is passed and used (reference
pkgrbac.RoleCredentialOwner and the INSERT INTO role_bindings ... SELECT ...
WHERE r.name condition).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/ambient-api-server/pkg/rbac/context.go`:
- Around line 35-47: In ApplyListFilter, stop allowing missing auth to bypass
RBAC: change the auth nil-check so IsGlobalAdmin still grants access but auth ==
nil does NOT return true (fail-closed / deny or surface an auth-missing error);
additionally, distinguish nil vs empty slice for scope checks by inspecting
whether auth.CredentialIDs or auth.ProjectIDs is nil (nil = globally scoped
allow) vs len(...) == 0 (explicit empty scope deny), using the useCredentialIDs
flag to pick between CredentialIDs and ProjectIDs while preserving existing
IsGlobalAdmin behavior.
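
The fail-closed behavior requested for ApplyListFilter could look roughly like the sketch below. It follows the semantics proposed in this comment (missing auth denies, nil slice means unrestricted, empty slice denies); `AuthResult`, `Decision`, and `DecideList` are simplified, hypothetical stand-ins for the real types.

```go
package main

import "fmt"

// AuthResult is a hypothetical reduction of the per-request auth result.
type AuthResult struct {
	IsGlobalAdmin bool
	ProjectIDs    []string
	CredentialIDs []string
}

// Decision describes how a list query should be constrained.
type Decision string

const (
	AllowAll Decision = "allow-all" // no filter injected
	Restrict Decision = "restrict"  // filter to the returned IDs
	DenyAll  Decision = "deny-all"  // return an empty list
)

// DecideList fails closed when auth is missing and distinguishes a nil
// slice (unrestricted) from an empty one (explicitly no access).
func DecideList(auth *AuthResult, useCredentialIDs bool) (Decision, []string) {
	if auth == nil {
		return DenyAll, nil // missing auth must never bypass RBAC
	}
	if auth.IsGlobalAdmin {
		return AllowAll, nil
	}
	ids := auth.ProjectIDs
	if useCredentialIDs {
		ids = auth.CredentialIDs
	}
	switch {
	case ids == nil:
		return AllowAll, nil
	case len(ids) == 0:
		return DenyAll, nil
	default:
		return Restrict, ids
	}
}

func main() {
	d1, _ := DecideList(nil, false)
	d2, _ := DecideList(&AuthResult{ProjectIDs: []string{}}, false)
	d3, ids := DecideList(&AuthResult{ProjectIDs: []string{"proj-a"}}, false)
	fmt.Println(d1, d2, d3, ids)
}
```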

In `@components/ambient-api-server/pkg/rbac/evaluator.go`:
- Around line 34-45: The query in the g.Raw call filters
role_bindings.rb.user_id using the variable username, but role_bindings.user_id
stores users.id; change the filter to use the user's numeric ID or join to users
to match username. Update the Raw SQL in evaluator.go (the g.Raw(...) that scans
into rows) to either: (a) accept and use a userID variable and replace "WHERE
rb.user_id = ?" with "WHERE rb.user_id = ?" passing userID instead of username,
or (b) JOIN users u ON u.id = rb.user_id and replace the WHERE clause with
"WHERE u.username = ?" so the existing username parameter works; keep the rest
of the selected columns and the Scan(&rows). Ensure the parameter passed to
g.Raw matches the chosen filter.

In `@components/ambient-api-server/pkg/rbac/middleware.go`:
- Around line 76-77: The calls to m.evaluator.AuthorizedProjectIDs and
AuthorizedCredentialIDs currently ignore returned errors (projectIDs, isGlobal,
_ and credentialIDs, credGlobal, _); update the middleware to check and handle
those errors instead of discarding them: capture the error results from
AuthorizedProjectIDs and AuthorizedCredentialIDs, and if either returns an
error, propagate/fail fast (e.g., abort the request with an appropriate error
response or return the error up the call chain) rather than continuing with
partial/empty auth context; make the same change for both occurrences where
projectIDs/isGlobal and credentialIDs/credGlobal are assigned so DB/query
failures are surfaced consistently.

In `@components/ambient-api-server/pkg/rbac/scope.go`:
- Around line 118-119: The case that returns true for
strings.HasPrefix(normalized, "/api/ambient/v1/role_bindings") must be removed
because it blanket-exempts role_binding routes from RBAC; instead, delete that
unconditional return and reintroduce granular authorization: allow only read
operations (e.g., GET/HEAD) to proceed under the normal read permission checks
for role bindings and enforce full RBAC checks for create/update/delete
(POST/PUT/PATCH/DELETE) by invoking the existing authorization routine used
elsewhere in scope.go (use the same normalized variable and existing auth helper
functions), so no role_binding route bypasses the standard permission
evaluation.

In `@components/ambient-api-server/plugins/roleBindings/handler.go`:
- Around line 60-77: The SQL queries using g.Raw(...).Scan(&callerRoleNames)
ignore returned errors, so a failing query leaves callerRoleNames empty and
causes HighestLevel to treat the caller as lowest-privilege; capture and handle
those errors. For each Raw(...).Scan(&callerRoleNames) call in the role binding
lookup (the three blocks that check roleBinding.Scope == "project",
"credential", and the else), assign the result to an err (e.g., err :=
g.Raw(...).Scan(&callerRoleNames).Error or err :=
g.Raw(...).Scan(&callerRoleNames); if err != nil { ... }) and then handle it
consistently: log the error with context (including username and
projectId/credentialId where applicable) and return or propagate an appropriate
handler error/HTTP 500 instead of continuing to compute HighestLevel; ensure all
three branches use the same error handling behavior.

In `@components/ambient-api-server/test/integration/rbac_test.go`:
- Around line 38-45: The loop that seeds roles currently calls g.Exec(...) and
discards the returned error, so role insert failures are silently ignored;
update the roles seeding loop (the for _, r := range roles block that calls
g.Exec) to capture the result and check its Error (or err return), and if
non-nil either return/propagate the error from the test setup or fail the test
immediately (e.g., t.Fatal/t.Fatalf or return the error), ensuring any partial
seeding failure is not swallowed.
- Around line 48-78: The helpers seedRole, createBinding, and createUser are
defined but never used, causing golangci-lint unused errors; fix by either
removing these unused functions from the test file or actually calling them from
the integration tests that need seeding (e.g., replace inline SQL setup with
calls to seedRole/createBinding/createUser or add test helpers that invoke
them), ensuring the symbols seedRole, createBinding, or createUser are no longer
dead; if you intentionally want to keep them for future use, move their logic
into a shared helper that's referenced by at least one test or delete them to
satisfy the linter.

In
`@components/ambient-ui/src/app/`(dashboard)/_components/create-project-dialog.tsx:
- Around line 117-122: Replace the direct rendering of backend error text in the
CreateProjectDialog component: do not display createProject.error.message to
users; instead show a generic, user-facing message like "Failed to create
project" when createProject.isError is true; keep detailed error information out
of the UI and, if needed for debugging, record it via console.error or your
telemetry/logger from the createProject mutation handler (refer to createProject
and the CreateProjectDialog component to locate the rendering).
- Around line 19-25: validateProjectName currently enforces a minimum length of
2 and uses DNS_1123_REGEX that requires the name to start with a letter, which
incorrectly rejects valid DNS-1123 labels (single-character names and names
starting with digits). Update validateProjectName to allow length 1–63 (change
the first length check to 1) and replace DNS_1123_REGEX with one that allows
lower-case letters, digits and hyphens, and requires the name to start and end
with an alphanumeric character (e.g., a pattern like
^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$), and update the returned error message
from validateProjectName to reflect "Lowercase letters, numbers, and hyphens
only. Must start and end with a letter or number."

---

Outside diff comments:
In `@components/ambient-api-server/plugins/projects/service.go`:
- Around line 104-133: The createOwnerBinding currently swallows errors; change
its signature on sqlProjectService to return an error and in createOwnerBinding
check result.Error and result.RowsAffected (treat RowsAffected == 0 as failure)
returning a descriptive error if either occurs; then call createOwnerBinding
from the caller (where project is created) and if it returns an error propagate
it using services.HandleCreateError("Project", err) before proceeding to
s.events.Create so the API treats owner-binding failures as fatal and prevents
partial project creation.
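
A minimal sketch of the check described above, assuming the database call yields something with `Error` and `RowsAffected` fields as the comment implies. `execResult` and `checkOwnerBindingInsert` are hypothetical names; wiring the returned error into `services.HandleCreateError` is left to the real service.

```go
package main

import (
	"errors"
	"fmt"
)

// execResult mirrors the two fields the review comment mentions
// (result.Error and result.RowsAffected). It is a hypothetical stand-in for
// the value returned by the database layer.
type execResult struct {
	Error        error
	RowsAffected int64
}

// errInsert simulates a failed INSERT in the demo below.
var errInsert = errors.New("insert failed")

// checkOwnerBindingInsert turns a silent failure into an explicit error.
// Zero affected rows is treated as a failure, for example when the owner
// role row is missing and an INSERT ... SELECT matches nothing.
func checkOwnerBindingInsert(res execResult, projectID string) error {
	if res.Error != nil {
		return fmt.Errorf("create owner binding for project %s: %w", projectID, res.Error)
	}
	if res.RowsAffected == 0 {
		return fmt.Errorf("create owner binding for project %s: no rows inserted", projectID)
	}
	return nil
}

func main() {
	fmt.Println(checkOwnerBindingInsert(execResult{RowsAffected: 1}, "demo"))
	fmt.Println(checkOwnerBindingInsert(execResult{RowsAffected: 0}, "demo"))
	fmt.Println(checkOwnerBindingInsert(execResult{Error: errInsert}, "demo"))
}
```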

In `@components/ambient-api-server/plugins/roleBindings/handler.go`:
- Around line 139-185: The Patch handler in roleBindingHandler currently allows
updating sensitive fields (RoleId, Scope, ProjectId, CredentialId) without
performing escalation checks, letting callers escalate privileges; update the
Patch method to either forbid changes to those sensitive fields or run the same
authorization/escalation-prevention logic used in Create before calling
h.roleBinding.Replace: detect changes to RoleId/Scope/ProjectId/CredentialId,
validate the new values against the caller’s privileges (internal role
rejection, role hierarchy validation, scope restrictions), and return a
permission error if the checks fail, otherwise proceed to call
h.roleBinding.Replace and PresentRoleBinding.
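
Detecting which sensitive fields a PATCH would change might be sketched as follows. The struct shapes and the `sensitiveChanges` helper are assumptions for illustration; the real `RoleBinding` and patch types in the handler are not shown here.

```go
package main

import "fmt"

// roleBinding and roleBindingPatch are simplified, hypothetical shapes.
// A nil pointer in the patch means "field not supplied".
type roleBinding struct {
	RoleID       string
	Scope        string
	ProjectID    *string
	CredentialID *string
}

type roleBindingPatch struct {
	RoleID       *string
	Scope        *string
	ProjectID    *string
	CredentialID *string
}

// deref returns the pointed-to string, or "" for nil. In this sketch an
// unset stored value and an empty string are treated as equal.
func deref(p *string) string {
	if p == nil {
		return ""
	}
	return *p
}

// sensitiveChanges lists the authorization-relevant fields that the patch
// would actually modify. A caller can reject the request outright when the
// list is non-empty, or run the Create-time escalation checks on new values.
func sensitiveChanges(found roleBinding, patch roleBindingPatch) []string {
	var fields []string
	if patch.RoleID != nil && *patch.RoleID != found.RoleID {
		fields = append(fields, "role_id")
	}
	if patch.Scope != nil && *patch.Scope != found.Scope {
		fields = append(fields, "scope")
	}
	if patch.ProjectID != nil && *patch.ProjectID != deref(found.ProjectID) {
		fields = append(fields, "project_id")
	}
	if patch.CredentialID != nil && *patch.CredentialID != deref(found.CredentialID) {
		fields = append(fields, "credential_id")
	}
	return fields
}

func main() {
	mine, other := "project-a", "project-b"
	found := roleBinding{RoleID: "r1", Scope: "project", ProjectID: &mine}
	patch := roleBindingPatch{ProjectID: &other}
	fmt.Println(sensitiveChanges(found, patch))
}
```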

---

Nitpick comments:
In `@components/ambient-api-server/pkg/rbac/hierarchy.go`:
- Around line 12-14: The role map in hierarchy.go mixes hardcoded role strings
("agent:editor", "credential:viewer") with constants (e.g.,
RoleCredentialReader); replace those hardcoded strings by referencing the
appropriate constants from permissions.go (or add new constants there following
the existing naming convention) or remove them if they are invalid roles; update
the map entries to use the constant identifiers instead of literal strings so
all role keys (like RoleCredentialReader) are consistent.
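
A constant-keyed hierarchy map might look like the sketch below. Only `project:owner` appears verbatim in this PR; the editor and viewer names, the numeric levels, and the "grant at or below your own level" rule are assumptions, and `highestLevel` and `canGrant` are illustrative stand-ins rather than the real `HighestLevel` and `CanGrant`.

```go
package main

import "fmt"

// Role name constants. Keeping every map key a constant avoids the mix of
// literals and constants flagged above. Names other than "project:owner"
// are assumptions.
const (
	RoleProjectOwner  = "project:owner"
	RoleProjectEditor = "project:editor"
	RoleProjectViewer = "project:viewer"
)

// roleLevels assigns a hypothetical numeric level to each known role.
var roleLevels = map[string]int{
	RoleProjectViewer: 1,
	RoleProjectEditor: 2,
	RoleProjectOwner:  3,
}

// highestLevel returns the highest level among the given role names.
// Unknown names and an empty list yield 0, i.e. lowest privilege, which is
// why a swallowed query error upstream silently demotes the caller.
func highestLevel(names []string) int {
	highest := 0
	for _, n := range names {
		if lvl, ok := roleLevels[n]; ok && lvl > highest {
			highest = lvl
		}
	}
	return highest
}

// canGrant applies the assumed rule: the target role must be known and must
// not outrank the caller's highest role.
func canGrant(callerRoles []string, target string) bool {
	lvl, ok := roleLevels[target]
	if !ok {
		return false
	}
	return lvl <= highestLevel(callerRoles)
}

func main() {
	fmt.Println(canGrant([]string{RoleProjectEditor}, RoleProjectViewer))
	fmt.Println(canGrant([]string{RoleProjectEditor}, RoleProjectOwner))
}
```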

In `@components/ambient-api-server/plugins/credentials/service.go`:
- Around line 161-164: Replace the hardcoded 'credential:owner' string in the
INSERT ... SELECT SQL with the pkgrbac.RoleCredentialOwner constant: change the
WHERE clause to use a placeholder (e.g. WHERE r.name = ?) and add
pkgrbac.RoleCredentialOwner to the DB call arguments (the same call that
executes the INSERT into role_bindings), or if the query is built inline,
interpolate pkgrbac.RoleCredentialOwner instead of the literal; ensure you
update the execution site so the constant is passed and used (reference
pkgrbac.RoleCredentialOwner and the INSERT INTO role_bindings ... SELECT ...
WHERE r.name condition).

In `@components/ambient-api-server/test/integration/rbac_test.go`:
- Around line 80-204: Multiple RBAC tests
(TestRBAC_ProjectCreationCreatesOwnerBinding,
TestRBAC_CredentialCreationCreatesOwnerBinding, TestRBAC_UserAutoProvisioned,
TestRBAC_MissingRolesSeeded) duplicate setup and teardown; refactor them into a
single table-driven test that iterates over subcases with t.Run. Create a slice
of cases with a name and a closure or enum describing the behavior to assert
(e.g., "project-owner-binding", "credential-owner-binding",
"user-auto-provisioned", "missing-roles-seeded"), move common setup (h :=
test.NewHelper(t); h.DBFactory.ResetDB(); ensureBuiltInRoles(t)) outside the
loop or into a shared setup helper, and inside each subtest invoke the specific
logic currently in each TestRBAC_* function (creating account/ctx, making API
call or DB query, and the Expect assertions). Ensure each subtest uses its own
authenticated context or request payload as before and keeps the same unique
verification SQL checks and expectations so behavior is unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a57cfa96-5cf8-491f-9179-b196073cecda

📥 Commits

Reviewing files that changed from the base of the PR and between cd9d5d9 and 8809bb8.

📒 Files selected for processing (33)
  • .github/workflows/e2e.yml
  • components/ambient-api-server/cmd/ambient-api-server/environments/e_integration_testing.go
  • components/ambient-api-server/cmd/ambient-api-server/main.go
  • components/ambient-api-server/pkg/cmd/seed_admin.go
  • components/ambient-api-server/pkg/rbac/context.go
  • components/ambient-api-server/pkg/rbac/evaluator.go
  • components/ambient-api-server/pkg/rbac/hierarchy.go
  • components/ambient-api-server/pkg/rbac/hierarchy_test.go
  • components/ambient-api-server/pkg/rbac/middleware.go
  • components/ambient-api-server/pkg/rbac/middleware_test.go
  • components/ambient-api-server/pkg/rbac/scope.go
  • components/ambient-api-server/plugins/credentials/encryption_integration_test.go
  • components/ambient-api-server/plugins/credentials/handler.go
  • components/ambient-api-server/plugins/credentials/migration.go
  • components/ambient-api-server/plugins/credentials/plugin.go
  • components/ambient-api-server/plugins/credentials/service.go
  • components/ambient-api-server/plugins/projects/handler.go
  • components/ambient-api-server/plugins/projects/plugin.go
  • components/ambient-api-server/plugins/projects/service.go
  • components/ambient-api-server/plugins/roleBindings/handler.go
  • components/ambient-api-server/plugins/roleBindings/plugin.go
  • components/ambient-api-server/plugins/sessions/grpc_handler.go
  • components/ambient-api-server/plugins/sessions/handler.go
  • components/ambient-api-server/plugins/users/migration.go
  • components/ambient-api-server/plugins/users/plugin.go
  • components/ambient-api-server/test/e2e/rbac_e2e_test.sh
  • components/ambient-api-server/test/integration/integration_test.go
  • components/ambient-api-server/test/integration/rbac_test.go
  • components/ambient-ui/src/adapters/sdk-projects.ts
  • components/ambient-ui/src/app/(dashboard)/_components/create-project-dialog.tsx
  • components/ambient-ui/src/app/(dashboard)/page.tsx
  • components/ambient-ui/src/ports/projects.ts
  • components/ambient-ui/src/queries/use-projects.ts

Comment thread components/ambient-api-server/pkg/rbac/context.go Outdated
Comment on lines +34 to +45
err := g.Raw(`
SELECT rb.role_id, r.name AS role_name, rb.scope,
rb.user_id, rb.project_id, rb.agent_id,
rb.session_id, rb.credential_id,
r.permissions
FROM role_bindings rb
JOIN roles r ON r.id = rb.role_id
WHERE rb.user_id = ?
AND rb.deleted_at IS NULL
AND r.deleted_at IS NULL
`, username).Scan(&rows).Error
return rows, err

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Binding lookup uses username against role_bindings.user_id

Line 44 filters rb.user_id with username, but role bindings are created with users.id (not username). This breaks the users↔role_bindings contract and can deny valid requests by returning zero bindings.

Suggested query fix
-	FROM role_bindings rb
-	JOIN roles r ON r.id = rb.role_id
-	WHERE rb.user_id = ?
+	FROM role_bindings rb
+	JOIN users u ON u.id = rb.user_id
+	JOIN roles r ON r.id = rb.role_id
+	WHERE u.username = ?
+	  AND u.deleted_at IS NULL
 	  AND rb.deleted_at IS NULL
 	  AND r.deleted_at IS NULL
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/pkg/rbac/evaluator.go` around lines 34 - 45,
The query in the g.Raw call filters rb.user_id (role_bindings.user_id) using the
variable username, but role_bindings.user_id stores users.id; change the filter
to use the user's ID or join to users to match on username. Update the Raw SQL
in evaluator.go (the g.Raw(...) that scans into rows) to either: (a) accept a
userID variable, keep "WHERE rb.user_id = ?" unchanged, and pass userID instead
of username, or (b) JOIN users u ON u.id = rb.user_id and replace the WHERE
clause with "WHERE u.username = ?" so the existing username parameter works;
keep the rest of the selected columns and the Scan(&rows). Ensure the parameter
passed to g.Raw matches the chosen filter.

Comment thread components/ambient-api-server/pkg/rbac/middleware.go Outdated
Comment thread components/ambient-api-server/pkg/rbac/scope.go Outdated
Comment thread components/ambient-api-server/plugins/roleBindings/handler.go Outdated
Comment thread components/ambient-api-server/test/integration/rbac_test.go
Comment thread components/ambient-api-server/test/integration/rbac_test.go Outdated
Comment on lines +19 to +25
const DNS_1123_REGEX = /^[a-z][a-z0-9-]*[a-z0-9]$/

function validateProjectName(name: string): string | null {
if (name.length < 2) return 'Name must be at least 2 characters'
if (name.length > 63) return 'Name must be 63 characters or fewer'
if (!DNS_1123_REGEX.test(name))
return 'Lowercase letters, numbers, and hyphens only. Must start with a letter and end with a letter or number.'

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

DNS-1123 validation is currently stricter than DNS-1123 and rejects valid names.

Line 19 and Line 22 reject valid DNS-1123 labels (e.g., single-character names and digit-prefixed names). That creates false client-side rejects against a DNS-1123 requirement.

Suggested fix
-const DNS_1123_REGEX = /^[a-z][a-z0-9-]*[a-z0-9]$/
+const DNS_1123_REGEX = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/

 function validateProjectName(name: string): string | null {
-  if (name.length < 2) return 'Name must be at least 2 characters'
+  if (name.length < 1) return 'Name is required'
   if (name.length > 63) return 'Name must be 63 characters or fewer'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-const DNS_1123_REGEX = /^[a-z][a-z0-9-]*[a-z0-9]$/
+const DNS_1123_REGEX = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/
 function validateProjectName(name: string): string | null {
-  if (name.length < 2) return 'Name must be at least 2 characters'
+  if (name.length < 1) return 'Name is required'
   if (name.length > 63) return 'Name must be 63 characters or fewer'
   if (!DNS_1123_REGEX.test(name))
-    return 'Lowercase letters, numbers, and hyphens only. Must start with a letter and end with a letter or number.'
+    return 'Lowercase letters, numbers, and hyphens only. Must start and end with a letter or number.'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@components/ambient-ui/src/app/`(dashboard)/_components/create-project-dialog.tsx
around lines 19 - 25, validateProjectName currently enforces a minimum length of
2 and uses DNS_1123_REGEX that requires the name to start with a letter, which
incorrectly rejects valid DNS-1123 labels (single-character names and names
starting with digits). Update validateProjectName to allow length 1–63 (change
the first length check to 1) and replace DNS_1123_REGEX with one that allows
lower-case letters, digits and hyphens, and requires the name to start and end
with an alphanumeric character (e.g., a pattern like
^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$), and update the returned error message
from validateProjectName to reflect "Lowercase letters, numbers, and hyphens
only. Must start and end with a letter or number."
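
For comparison, the same label rule can be sketched in Go. The dialog itself is TypeScript; this port is only an illustration, and `dns1123Label` and the Go `validateProjectName` are hypothetical names. It uses the regex from the suggested fix and returns an empty string for a valid name.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches lowercase alphanumerics and hyphens, starting and
// ending with an alphanumeric character. Length is checked separately.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateProjectName returns a user-facing message, or "" when the name is
// a valid label of 1 to 63 characters.
func validateProjectName(name string) string {
	if len(name) < 1 {
		return "Name is required"
	}
	if len(name) > 63 {
		return "Name must be 63 characters or fewer"
	}
	if !dns1123Label.MatchString(name) {
		return "Lowercase letters, numbers, and hyphens only. Must start and end with a letter or number."
	}
	return ""
}

func main() {
	for _, name := range []string{"a", "1st-project", "-bad", "Bad"} {
		fmt.Printf("%q -> %q\n", name, validateProjectName(name))
	}
}
```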

jsell-rh and others added 3 commits June 5, 2026 17:13
- Add unique partial index on role_bindings preventing duplicate
  bindings (role_id + user_id + project_id + agent_id + session_id +
  credential_id WHERE deleted_at IS NULL). Duplicate attempts now
  return 409 Conflict.
- Session sub-resource list endpoints now resolve session→project
  scope before filtering, closing the GET /sessions/{id}/messages leak.
- Role binding list filtered to caller's own bindings (user_id match).
- E2e test expanded to 142 assertions covering session message
  isolation, role binding list isolation, session operations cross-user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PATCH escalation (2), session sub-resource access (8), scheduled
sessions (2), credential token fetch (1). All expected to fail
against current code; fixes follow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ltering, sub-resources

Critical fixes:
- PATCH /role_bindings: ownership + escalation checks added; non-admins
  can only patch own bindings, role changes validated via CanGrant
- Auth-exempt narrowed: only POST /role_bindings exempt, not all methods
- Credential token permission aligned: fetch_token in roles + migration

High fixes:
- Users list filtered to own record for non-admins
- ProjectSettings list filtered by authorized project IDs
- RoleBinding list expanded: shows own bindings + project/credential scope
- Scheduled-session mapped to session resource in pathToResource
- Session sub-resource singleton GETs: fallback scope resolution added

17 e2e failures remain (DELETE role_bindings auth, session scope leaks,
matrix unique constraint conflicts, scheduled-session format). These
are tracked for the next iteration.

139 of 156 e2e tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/ambient-api-server/plugins/roleBindings/handler.go (1)

332-353: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unchecked SQL errors in Delete can bypass last-owner protection.

Three g.Raw(...).Scan(...) calls ignore errors:

  1. Line 334: Role name lookup — if it fails, roleName is empty, both owner checks are skipped.
  2. Lines 338-340, 347-349: Count queries — if they fail, count is 0 (Go default), satisfying count <= 1 and allowing deletion of the last owner.

This is a security gap: a transient DB error could let a caller orphan a project or credential.

🛡️ Proposed fix
 			var roleName string
 			g := (*h.sessionFactory).New(ctx)
-			g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", binding.RoleId).Scan(&roleName)
+			if err := g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", binding.RoleId).Scan(&roleName).Error; err != nil {
+				return nil, errors.GeneralError("failed to resolve role: %v", err)
+			}

 			if roleName == pkgrbac.RoleProjectOwner && binding.ProjectId != nil {
 				var count int64
-				g.Raw(`SELECT COUNT(*) FROM role_bindings
+				if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
 					   WHERE role_id = ? AND project_id = ? AND deleted_at IS NULL`,
-					binding.RoleId, *binding.ProjectId).Scan(&count)
+					binding.RoleId, *binding.ProjectId).Scan(&count).Error; err != nil {
+					return nil, errors.GeneralError("failed to count project owners: %v", err)
+				}
 				if count <= 1 {
 					return nil, errors.New(errors.ErrorConflict, "cannot delete the last owner binding")
 				}
 			}
 			if roleName == pkgrbac.RoleCredentialOwner && binding.CredentialId != nil {
 				var count int64
-				g.Raw(`SELECT COUNT(*) FROM role_bindings
+				if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
 					   WHERE role_id = ? AND credential_id = ? AND deleted_at IS NULL`,
-					binding.RoleId, *binding.CredentialId).Scan(&count)
+					binding.RoleId, *binding.CredentialId).Scan(&count).Error; err != nil {
+					return nil, errors.GeneralError("failed to count credential owners: %v", err)
+				}
 				if count <= 1 {
 					return nil, errors.New(errors.ErrorConflict, "cannot delete the last owner binding")
 				}
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/handler.go` around lines
332 - 353, The Delete handler currently ignores errors from the three database
reads (the role name lookup via g.Raw(...).Scan(&roleName) and the two COUNT
queries), which can cause last-owner checks to be bypassed; update the code
around h.sessionFactory.New(ctx) to capture and check the error returned by each
Scan call (e.g., err := g.Raw(...).Scan(&roleName).Error or similar), and if any
DB call returns an error return that error (or wrap it with
errors.New/errors.Errorf) instead of continuing, ensuring the last-owner
protection in the pkgrbac.RoleProjectOwner and pkgrbac.RoleCredentialOwner
checks cannot be bypassed by transient DB failures.
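
The fail-closed behaviour can be shown in isolation with the sketch below. `ownerCounter` and `guardLastOwner` are hypothetical names standing in for the COUNT query and the check in the Delete handler; the point is that a query error blocks the delete instead of leaving `count` at its zero value.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLastOwner mirrors the conflict message used in the handler.
var ErrLastOwner = errors.New("cannot delete the last owner binding")

// errDB simulates a transient database failure in the demo below.
var errDB = errors.New("db unavailable")

// ownerCounter is a hypothetical stand-in for the COUNT(*) query over
// owner bindings in the affected project or credential.
type ownerCounter func() (int64, error)

// guardLastOwner refuses the delete when the count cannot be determined
// (fail-closed) or when the binding being deleted is the only owner left.
func guardLastOwner(count ownerCounter) error {
	n, err := count()
	if err != nil {
		return fmt.Errorf("count owner bindings: %w", err)
	}
	if n <= 1 {
		return ErrLastOwner
	}
	return nil
}

func main() {
	fmt.Println(guardLastOwner(func() (int64, error) { return 2, nil }))
	fmt.Println(guardLastOwner(func() (int64, error) { return 1, nil }))
	fmt.Println(guardLastOwner(func() (int64, error) { return 0, errDB }))
}
```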
components/ambient-api-server/test/e2e/rbac_e2e_test.sh (1)

28-38: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to curl calls to prevent CI hangs.

The api() helper has no timeout configured. If the API or network stalls, tests hang indefinitely—common cause of CI flakes.

🛠️ Proposed fix
 api() {
   local method="$1" path="$2" token="$3" body="${4:-}"
-  local args=(-s -w '\n%{http_code}' -H "Authorization: Bearer $token" -H "Content-Type: application/json")
+  local args=(-s -w '\n%{http_code}' --connect-timeout 10 --max-time 30 -H "Authorization: Bearer $token" -H "Content-Type: application/json")
   if [[ -n "$body" ]]; then
     args+=(-d "$body")
   fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/test/e2e/rbac_e2e_test.sh` around lines 28 -
38, The api() helper lacks a curl timeout causing CI hangs; update the args
array in the api() function to include a curl timeout (e.g., add -m 30 or
--max-time 30 and optionally --connect-timeout 5) so the curl invocation in
response=$(curl "${args[@]}" -X "$method" "${API_URL}${path}") will fail fast on
network/API stalls; modify the args variable build (where args=(-s -w
'\n%{http_code}' ...)) to append the timeout flag(s) so all invocations of api()
get a bounded wait.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/ambient-api-server/test/e2e/rbac_e2e_test.sh`:
- Around line 996-998: The test currently calls fail() twice for the same
condition (the two consecutive fail "CRITICAL: Could not find User A's owner
binding" calls), which increments FAIL_COUNT twice; fix by invoking fail() only
once and merge the two messages into a single multi-line message (e.g., include
both "binding lookup returned empty" and "skipping PATCH escalation tests" in
one fail() call) so the failure is reported once; update the code around the
fail invocations that reference User A's owner binding to use a single fail()
call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d22e99a-a5b4-4c02-9c5b-39e42096fcce

📥 Commits

Reviewing files that changed from the base of the PR and between 8809bb8 and cb6be83.

📒 Files selected for processing (12)
  • components/ambient-api-server/pkg/rbac/middleware.go
  • components/ambient-api-server/pkg/rbac/middleware_test.go
  • components/ambient-api-server/pkg/rbac/scope.go
  • components/ambient-api-server/plugins/credentials/migration.go
  • components/ambient-api-server/plugins/credentials/plugin.go
  • components/ambient-api-server/plugins/projectSettings/handler.go
  • components/ambient-api-server/plugins/roleBindings/handler.go
  • components/ambient-api-server/plugins/roleBindings/migration.go
  • components/ambient-api-server/plugins/roleBindings/plugin.go
  • components/ambient-api-server/plugins/users/handler.go
  • components/ambient-api-server/test/e2e/rbac_e2e_test.sh
  • components/ambient-api-server/test/integration/rbac_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • components/ambient-api-server/plugins/credentials/plugin.go
  • components/ambient-api-server/pkg/rbac/middleware_test.go
  • components/ambient-api-server/pkg/rbac/scope.go
  • components/ambient-api-server/pkg/rbac/middleware.go
  • components/ambient-api-server/test/integration/rbac_test.go

Comment thread components/ambient-api-server/test/e2e/rbac_e2e_test.sh Outdated
jsell-rh and others added 2 commits June 5, 2026 18:14
Fixes:
- DELETE /role_bindings: middleware resolves binding's project/credential
  scope from DB instead of auth-exempting (no endpoints are exempt
  except POST /projects, POST /credentials, POST /role_bindings,
  GET /roles)
- Session sub-resources: singleton GET fallback resolves session→project
  scope for events, workspace, git, agui, mcp, pod-events, export
- Scheduled sessions: mapped to "session" resource in pathToResource,
  fixed create handler to accept body without project_id
- Credential token: aligned permission to "credential:fetch_token" in
  both credential:owner and credential:token-reader roles
- credential:owner gets role_binding:create + role_binding:delete (not
  wildcard) for managing own credential bindings
- Users list filtered to own record for non-admins
- ProjectSettings list filtered by authorized project IDs
- RoleBinding list expanded to show project/credential scope bindings
- PATCH /role_bindings: ownership + escalation checks, non-admins can
  only patch own bindings
- Events endpoint: accepts 404 when no runner pod exists

156 e2e tests, 43 generative escalation matrix, fully idempotent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/ambient-api-server/plugins/credentials/migration.go`:
- Around line 198-223: Remove the redundant migration
credentialOwnerRoleBindingPermMigration (ID: "202606050004") because its
permissions duplicate credentialTokenPermMigration ("202606050003") and its
Rollback undoes role_binding changes from the earlier migration; delete the
entire credentialOwnerRoleBindingPermMigration function and remove its
registration call in plugin.go so the migration chain and rollback semantics
remain consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aaf935b5-eb6a-4df9-bd37-32d7bc57f188

📥 Commits

Reviewing files that changed from the base of the PR and between cb6be83 and 33977a3.

📒 Files selected for processing (6)
  • components/ambient-api-server/pkg/rbac/middleware.go
  • components/ambient-api-server/plugins/credentials/migration.go
  • components/ambient-api-server/plugins/credentials/plugin.go
  • components/ambient-api-server/plugins/scheduledSessions/handler.go
  • components/ambient-api-server/test/e2e/rbac_e2e_test.sh
  • components/ambient-api-server/test/integration/rbac_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • components/ambient-api-server/plugins/credentials/plugin.go
  • components/ambient-api-server/test/integration/rbac_test.go
  • components/ambient-api-server/pkg/rbac/middleware.go
  • components/ambient-api-server/test/e2e/rbac_e2e_test.sh

Comment thread components/ambient-api-server/plugins/credentials/migration.go
jsell-rh and others added 2 commits June 5, 2026 19:26
User B can PATCH their own binding to change project_id to an
unauthorized project, gaining access. 3 failures confirm the
vulnerability. Fix follows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Non-admins can no longer PATCH a binding's scope, project_id,
agent_id, session_id, or credential_id. Bindings are effectively
immutable in scope — delete and recreate to change scope.

All three escalation handlers (Create, Patch, Delete) now fail-closed
when sessionFactory is nil instead of silently skipping checks.
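
A minimal sketch of the fail-closed nil check, assuming the handler holds a pointer to a factory interface (as the `(*h.sessionFactory).New(ctx)` calls in the review suggest). The `dbSessionFactory` interface, its `Ready` method, and `authorize` are hypothetical simplifications.

```go
package main

import (
	"errors"
	"fmt"
)

// dbSessionFactory is a hypothetical, simplified factory interface.
type dbSessionFactory interface {
	Ready() bool
}

// fakeFactory is a trivial implementation used only in the demo.
type fakeFactory struct{}

func (fakeFactory) Ready() bool { return true }

// errAuthzUnavailable is returned when checks cannot run at all.
var errAuthzUnavailable = errors.New("authorization checks unavailable")

// handler holds a pointer to the factory; the pointer may be nil if the
// handler was constructed without a database.
type handler struct {
	sessionFactory *dbSessionFactory
}

// authorize runs the supplied check only when a factory is present. With no
// factory it denies the operation instead of skipping the check.
func (h handler) authorize(check func(dbSessionFactory) error) error {
	if h.sessionFactory == nil || *h.sessionFactory == nil {
		return errAuthzUnavailable
	}
	return check(*h.sessionFactory)
}

func main() {
	allow := func(dbSessionFactory) error { return nil }

	fmt.Println(handler{}.authorize(allow))

	var f dbSessionFactory = fakeFactory{}
	fmt.Println(handler{sessionFactory: &f}.authorize(allow))
}
```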

160 e2e tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/ambient-api-server/plugins/roleBindings/handler.go (2)

360-377: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Delete-side last-owner protection is fail-open on DB errors.

These Raw(...).Scan(...) calls drop their errors. If the role lookup fails, roleName stays empty and both owner checks are skipped, so the delete proceeds without last-owner protection.

Suggested fix
 				var roleName string
 				g := (*h.sessionFactory).New(ctx)
-				g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", binding.RoleId).Scan(&roleName)
+				if err := g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", binding.RoleId).Scan(&roleName).Error; err != nil {
+					return nil, errors.GeneralError("failed to resolve binding role: %v", err)
+				}

 				if roleName == pkgrbac.RoleProjectOwner && binding.ProjectId != nil {
 					var count int64
-					g.Raw(`SELECT COUNT(*) FROM role_bindings
+					if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
 						   WHERE role_id = ? AND project_id = ? AND deleted_at IS NULL`,
-						binding.RoleId, *binding.ProjectId).Scan(&count)
+						binding.RoleId, *binding.ProjectId).Scan(&count).Error; err != nil {
+						return nil, errors.GeneralError("failed to count remaining project owners: %v", err)
+					}
 					if count <= 1 {
 						return nil, errors.New(errors.ErrorConflict, "cannot delete the last owner binding")
 					}
 				}
 				if roleName == pkgrbac.RoleCredentialOwner && binding.CredentialId != nil {
 					var count int64
-					g.Raw(`SELECT COUNT(*) FROM role_bindings
+					if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
 						   WHERE role_id = ? AND credential_id = ? AND deleted_at IS NULL`,
-						binding.RoleId, *binding.CredentialId).Scan(&count)
+						binding.RoleId, *binding.CredentialId).Scan(&count).Error; err != nil {
+						return nil, errors.GeneralError("failed to count remaining credential owners: %v", err)
+					}
 					if count <= 1 {
 						return nil, errors.New(errors.ErrorConflict, "cannot delete the last owner binding")
 					}
 				}

Based on learnings "Never silently swallow partial failures; every error path must propagate or be explicitly collected, never discarded" and as per coding guidelines "Handle errors and edge cases explicitly rather than ignoring them."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/handler.go` around lines
360 - 377, The delete-side last-owner protection currently swallows DB errors
from the g.Raw(...).Scan(...) calls so roleName can be empty and owner checks
are skipped; update the code around g := (*h.sessionFactory).New(ctx) and the
Raw(...).Scan(...) invocations (the role name lookup and both COUNT queries) to
capture and check the returned error values, and if any Scan returns a non-nil
error return that error (or wrap it with a contextual message) instead of
ignoring it so deletions fail-closed on DB errors rather than proceeding.

Sources: Coding guidelines, Learnings


180-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PATCH can orphan a project or credential by downgrading the sole owner binding.

This path only checks whether the caller can grant the new role. It never preserves the “at least one owner remains” invariant that Delete enforces, so the last project:owner/credential:owner can be changed to editor/viewer and leave the scope with no owner at all.

Suggested guard
 				// Prevent changing role_id to a role the caller cannot grant.
 				if patch.RoleId != nil && *patch.RoleId != found.RoleId {
+					var currentRoleName string
+					if err := g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", found.RoleId).Scan(&currentRoleName).Error; err != nil {
+						return nil, errors.GeneralError("failed to resolve current role: %v", err)
+					}
+
 					var targetRoleName string
 					if dbErr := g.Raw("SELECT name FROM roles WHERE id = ? AND deleted_at IS NULL", *patch.RoleId).Scan(&targetRoleName).Error; dbErr != nil || targetRoleName == "" {
 						return nil, errors.Forbidden("target role not found")
 					}
+
+					if currentRoleName == pkgrbac.RoleProjectOwner && targetRoleName != pkgrbac.RoleProjectOwner && found.ProjectId != nil {
+						var remaining int64
+						if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
+							WHERE role_id = ? AND project_id = ? AND id <> ? AND deleted_at IS NULL`,
+							found.RoleId, *found.ProjectId, found.Id).Scan(&remaining).Error; err != nil {
+							return nil, errors.GeneralError("failed to count remaining project owners: %v", err)
+						}
+						if remaining == 0 {
+							return nil, errors.New(errors.ErrorConflict, "cannot remove the last owner binding")
+						}
+					}
+					if currentRoleName == pkgrbac.RoleCredentialOwner && targetRoleName != pkgrbac.RoleCredentialOwner && found.CredentialId != nil {
+						var remaining int64
+						if err := g.Raw(`SELECT COUNT(*) FROM role_bindings
+							WHERE role_id = ? AND credential_id = ? AND id <> ? AND deleted_at IS NULL`,
+							found.RoleId, *found.CredentialId, found.Id).Scan(&remaining).Error; err != nil {
+							return nil, errors.GeneralError("failed to count remaining credential owners: %v", err)
+						}
+						if remaining == 0 {
+							return nil, errors.New(errors.ErrorConflict, "cannot remove the last owner binding")
+						}
+					}
+
 					if pkgrbac.InternalRoles[targetRoleName] {
 						return nil, errors.Forbidden("cannot assign internal role")
 					}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/handler.go` around lines
180 - 192, The PATCH currently allows changing the last owner to a lesser role;
add a guard after the target role lookup to prevent orphaning: determine whether
the binding's scope is a project or credential (inspect found.ProjectId /
found.CredentialId), detect the owner role name for that scope (e.g.
"project:owner" or "credential:owner"), and if the new targetRoleName is not
that owner role then query the DB (using g) to count active role_bindings in the
same scope with that owner role excluding the current binding (use found.Id)
and if the count is 0 then return errors.New(errors.ErrorConflict, "cannot
remove the last owner binding"), matching the conflict that Delete returns; keep
the existing InternalRoles and pkgrbac.CanGrant checks otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/ambient-api-server/test/e2e/rbac_e2e_test.sh`:
- Around line 1138-1143: The e2e shell test is insufficient because Create in
components/ambient-api-server/plugins/roleBindings/handler.go returns 403 both
when escalation checks run and when h.sessionFactory == nil; add a targeted Go
unit/integration test that constructs the role bindings handler/plugin with a
nil sessionFactory and calls the Create handler (or its underlying method)
directly to assert the fail-closed behavior (HTTP 403 or equivalent error) when
sessionFactory is nil; alternatively, add a test-only env knob that prevents
wiring of sessionFactory and assert the handler still denies creation—reference
the Create method and h.sessionFactory in roleBindings/handler.go when adding
the test.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: abeecdc8-d6aa-4310-8171-7c9c68ccac2b

📥 Commits

Reviewing files that changed from the base of the PR and between 33977a3 and 348eb4a.

📒 Files selected for processing (3)
  • components/ambient-api-server/plugins/roleBindings/handler.go
  • components/ambient-api-server/test/e2e/rbac_e2e_test.sh
  • components/ambient-api-server/test/integration/rbac_test.go
💤 Files with no reviewable changes (1)
  • components/ambient-api-server/test/integration/rbac_test.go

Comment thread components/ambient-api-server/test/e2e/rbac_e2e_test.sh
jsell-rh and others added 7 commits June 6, 2026 11:16
ProjectId was dropped from Credential schema in a prior migration.
The regenerated OpenAPI model no longer has this field. Also fixes
ensureBuiltInRoles to check seed errors (CodeRabbit finding).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Swallowed DB errors → 503: middleware now returns Service Unavailable
   when AuthorizedProjectIDs/CredentialIDs DB queries fail instead of
   silently proceeding with empty auth context.
2. Unchecked SQL errors in roleBindings handler: all .Scan()/.Count()
   calls now check .Error and return 500 on failure.
3. Silent role seed failures: ensureBuiltInRoles checks Exec.Error and
   calls t.Fatalf.
4. Redundant migration: 202606050004 now reads current permissions and
   appends only new ones. Rollback removes only what it added.
5. Double/triple fail() in e2e: collapsed to single fail per case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The e2e test now manages its own infrastructure in Phase 0.5:
- Fetches Keycloak JWKS and updates API server auth ConfigMap
- Patches API server with --enable-jwt=true --enable-authz=true
- Sets AMBIENT_ENV=production (dev env overrides JWT to false)
- Configures control plane with OIDC client_credentials (not static
  token, which fails JWT validation)
- Waits for rollouts, re-establishes port-forward, smoke-tests auth
- Sessions can now reach Running phase for sub-resource tests

Root cause: the static AMBIENT_API_TOKEN is not a JWT. When JWT
validation is enabled, the upstream framework rejects it before
the pre-auth service-token interceptor can bypass validation. Fix: use
OIDC client_credentials for the control plane instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the security council's Approach D: service callers now
go through normal RBAC evaluation via a platform:admin RoleBinding
instead of blanket bypass. Consistent across HTTP and gRPC.

Changes:
- HTTP: detect service account from JWT username (same pattern as
  gRPC interceptor), auto-provision platform:admin binding on first
  request
- RBAC middleware: service callers with OIDC identity get AuthResult
  populated from evaluator (full audit trail, scoped access)
- Legacy AMBIENT_API_TOKEN bypass preserved for pre-OIDC deployments
- gRPC handlers: accept IsGlobalAdmin as alternative to IsServiceCaller
- Moved isServiceAccount/keycloakServiceAccountPrefix to caller_context
  (shared between HTTP and gRPC)
- All DB operations use GORM query builder, no raw SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --max-time 15 to all curl calls (prevents SSE streaming hang)
- Stop and delete test session after Phase 15
- Add session/message cleanup to clean_db
- Reorder Phase 0.5: API server verified healthy BEFORE CP restart
  (fixes CP initial sync timeout — sessions now reach Running in 4s)
- Message POST: 500 is infrastructure, not RBAC — don't fail on it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL/HIGH fixes:
- isProjectAuthorized: nil authResult now returns false (was true —
  granted full access to unauthenticated gRPC callers)
- WatchSessions: delete events suppressed for non-privileged watchers
  (prevented session ID leakage across project boundaries)
- Legacy AMBIENT_API_TOKEN bypass: sets AuthResult with IsGlobalAdmin
  instead of leaving nil (downstream handlers no longer see nil=allow)

Remaining findings tracked:
- F1 (gRPC unary RBAC interceptor) — needs upstream framework support
- F4 (POST /role_bindings auth-exempt) — handler has escalation checks
- F6 (unverified JWT for auto-provision) — needs interceptor ordering audit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
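
The nil-handling fix above amounts to making "no auth result" mean "deny". A minimal sketch, assuming a simplified `AuthResult` shape: the `ProjectIDs` field and the function body are assumptions for illustration, and only the names `AuthResult`, `IsGlobalAdmin`, and `isProjectAuthorized` appear in the commit.

```go
package main

import "fmt"

// AuthResult is a simplified, hypothetical shape of the evaluator's output.
type AuthResult struct {
	IsGlobalAdmin bool
	ProjectIDs    map[string]bool // projects the caller may access (assumed field)
}

// isProjectAuthorized denies by default: a missing AuthResult is treated as
// "no access", never as "unrestricted".
func isProjectAuthorized(res *AuthResult, projectID string) bool {
	if res == nil {
		return false
	}
	if res.IsGlobalAdmin {
		return true
	}
	return res.ProjectIDs[projectID] // a nil map reads as false
}

func main() {
	member := &AuthResult{ProjectIDs: map[string]bool{"alpha": true}}
	fmt.Println(isProjectAuthorized(nil, "alpha"))                              // unauthenticated caller
	fmt.Println(isProjectAuthorized(member, "alpha"))                           // bound to the project
	fmt.Println(isProjectAuthorized(member, "beta"))                            // other project
	fmt.Println(isProjectAuthorized(&AuthResult{IsGlobalAdmin: true}, "beta")) // global admin
}
```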
jsell-rh and others added 22 commits June 6, 2026 20:04
Use GetAuthPayloadFromContext (verified JWT) instead of
GetUsernameFromContext (may be set from unverified pre-auth parse)
for HTTP service caller detection. Prevents forged JWTs from
triggering auto-provisioning of platform:admin bindings.

Closes security council Finding F6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Session was being stopped/deleted at end of Phase 15, but Phase 19
tests sub-resource access on the same session. Move cleanup to after
Phase 19 completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kubectl set env is a no-op when values are unchanged, so the CP
isn't restarted and its gRPC watch streams stay dead from a
previous session. Add explicit rollout restart to ensure fresh
watch connections on every test run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ction

bearer_token.go's init() ran before caller_context.go's init() (Go
sorts by filename), so GRPC_SERVICE_ACCOUNT was always empty when the
pre-auth gRPC interceptors were registered — causing them to never
register at all. Without these interceptors, the control plane's OIDC
token was never tagged as CallerTypeService, and WatchSessions silently
filtered out all session events. Sessions stayed in Pending forever.

Move interceptor registration into caller_context.go's init() so it
runs after configuredServiceAccount is set.

Also harden the e2e test:
- api() tolerates curl failures (|| true) to survive port-forward drops
- assert_not_rbac_blocked distinguishes RBAC 404s from runner 404s
- runner-proxy tests wait for the runner's HTTP server to bind

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
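
The ordering fix above can be sketched in a single file: the value is set and then consumed inside the same `init()`, so the result no longer depends on the order in which the package's files are initialized. The variable `registered` and the function `registerPreAuthInterceptors` are hypothetical; only `configuredServiceAccount` and `GRPC_SERVICE_ACCOUNT` come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

var (
	configuredServiceAccount string
	registered               []string // hypothetical registry of interceptor names
)

// registerPreAuthInterceptors is a hypothetical registration hook. It does
// nothing when no service account is configured, which is the symptom the
// commit describes when it ran too early.
func registerPreAuthInterceptors() bool {
	if configuredServiceAccount == "" {
		return false
	}
	registered = append(registered, "service-caller-tagger")
	return true
}

// A single init() reads the configuration and then registers, so the two
// steps cannot be reordered by file-name sorting.
func init() {
	configuredServiceAccount = os.Getenv("GRPC_SERVICE_ACCOUNT")
	registerPreAuthInterceptors()
}

func main() {
	fmt.Println("interceptors registered:", len(registered))
}
```

An alternative with the same effect would be to call an explicit setup function from `main`, which avoids relying on `init()` ordering altogether.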
The RBAC e2e test runs ~150 curl calls over several minutes. kubectl
port-forward drops intermittently under sustained load, causing 000
status codes that crash the script under set -e. Add auto-reconnect
logic to api() that detects connection failures and re-establishes the
port-forward before retrying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The readiness check used curl -sf which requires HTTP 200. With JWT
auth enabled in production mode, /healthcheck returns 401 for
unauthenticated requests, so the check always failed. The test
proceeded after the 10-second timeout with a WARNING, but the
port-forward may not have been fully established — or stale processes
from previous runs were still holding the port.

Check for any HTTP response (status != 000) instead of requiring 200.
Any HTTP status proves the TCP connection through the port-forward is
alive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three causes of test flakiness:

1. Race condition: the test created sessions before the CP's gRPC
   session watch stream was established (~10s after pod Ready). The
   creation event was missed and sessions stayed Pending forever.
   Fix: poll CP logs for "session watch stream established" before
   proceeding.

2. Transient 500s from intermittent DB connectivity caused hard
   failures. Fix: api() retries up to 3 times on 000/500/502/503
   with backoff.

3. KUBE_CONTEXT not propagated: all kubectl calls used the global
   context, which could be a different cluster. Fix: add a kubectl
   wrapper that injects --context from KUBE_CONTEXT env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI workflow assumed Keycloak was reachable via NodePort 30090, but
the Kind cluster has no extraPortMappings for that port. Use a
port-forward instead (matching the pattern for the API server).

Also:
- Cleanup trap uses set +e so Keycloak being unreachable doesn't crash
  the exit handler and mask the real error
- get_admin_token returns 1 instead of exit 1 so the cleanup trap can
  call it safely
- Added GRPC_SERVICE_ACCOUNT to CI deployment env

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Red phase: project:owner (level 1) can delete a platform:admin (level 0)
binding from their project. The Delete handler checks last-owner
protection but has no hierarchy check. Expected 403, got 204.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
F2 — gRPC List handlers leaked cross-tenant data:
  ListSessions, ListProjects, ListProjectSettings now call
  ApplyListFilter before querying, matching the HTTP handlers.
  Non-privileged callers without AuthResult get an empty list.

F3 — gRPC Watch handlers sent all events to any caller:
  WatchProjects and WatchProjectSettings now check IsServiceCaller
  and IsProjectAuthorized before sending events. WatchUsers restricts
  to privileged callers only (users are not project-scoped).
  Moved isProjectAuthorized to pkg/rbac.IsProjectAuthorized (shared).

F5 — roleBindings Delete lacked hierarchy check:
  A project:owner could delete a platform:admin's binding. Added
  CanGrant check mirroring the Create handler: caller must outrank
  the binding's role to delete it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
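
The F5 rule can be sketched as a level comparison. This is not the real `pkgrbac.CanGrant`: the levels for `platform:admin` (0) and `project:owner` (1) come from the commit above, while `project:editor`, `project:viewer`, and the function name `canDeleteBinding` are hypothetical. The source does not say whether equal rank is allowed; this sketch reads "must outrank" literally and denies it.

```go
package main

import "fmt"

// roleLevel maps role names to hierarchy levels; a lower number means more
// privilege. Only the first two entries are taken from the commit message.
var roleLevel = map[string]int{
	"platform:admin": 0,
	"project:owner":  1,
	"project:editor": 2, // hypothetical
	"project:viewer": 3, // hypothetical
}

// canDeleteBinding allows the delete only when the caller strictly outranks
// the role on the binding. Unknown roles are denied.
func canDeleteBinding(callerRole, bindingRole string) bool {
	caller, ok := roleLevel[callerRole]
	if !ok {
		return false
	}
	target, ok := roleLevel[bindingRole]
	if !ok {
		return false
	}
	return caller < target
}

func main() {
	// The red-phase case: an owner tries to delete an admin binding.
	fmt.Println(canDeleteBinding("project:owner", "platform:admin"))
	// An admin deleting an owner binding passes the hierarchy check.
	fmt.Println(canDeleteBinding("platform:admin", "project:owner"))
}
```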
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI workflow was pulling the api-server image from
quay.io/ambient_code/vteam_api_server:latest (main branch) instead of
building from the PR's code. This meant the RBAC e2e tests ran against
the old api-server without any RBAC enforcement code.

Add ambient-api-server to the CI pipeline:
- Change detection for components/ambient-api-server/**
- Build step using docker/build-push-action
- Pull-if-unchanged from quay.io
- Load into kind cluster via docker save
- Pass DEFAULT_API_SERVER_IMAGE to deploy.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI workflow's inline setup was broken:
- sed to enable JWT matched nothing (already --enable-jwt=true in manifest)
- AMBIENT_ENV=production clears jwk-cert-file, breaking JWKS loading
- Duplicated the test script's Phase 0.5 logic incorrectly

Remove the inline setup and let the test script handle everything.
Phase 0.5 already does: ConfigMap update, secret patch, deployment
env vars, command args, rollout, port-forward re-establishment, and
JWT verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OverrideConfig unconditionally set c.Auth.JwkCertFile = "", which
discarded the --jwk-cert-file flag passed on the command line. In CI,
the test script mounts Keycloak JWKS into a ConfigMap and passes
--jwk-cert-file to use it, but the production env override threw it
away, forcing a URL-based fallback that fails when the internal
Keycloak URL isn't reachable during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary diagnostic to trace why GET /roles returns 403 in CI.
Logs method, path, isAuthExempt, enableAuthz, and isServiceCaller
on every request through the RBAC middleware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The detect-changes conditional was skipping the api-server build,
causing CI to use the old upstream image from quay.io. The RBAC e2e
tests require the branch's middleware code (isAuthExempt, init
ordering fix, etc.) — always build from source.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>