fix(control-plane): authorize execution note writes (#420) #575

Merged
AbirAbbas merged 5 commits into Agent-Field:main from Luffy2208:fix/420-execution-notes-authorization on Jun 1, 2026

Conversation

@Luffy2208
Contributor

Summary

Fixes an IDOR in the execution notes write endpoint by enforcing execution ownership before appending a note.

File-specific changes:

  • control-plane/internal/handlers/execution_notes.go

    • Resolves the caller agent identity before writing a note.
    • Compares the caller agent ID with the execution owner, execution.AgentNodeID.
    • Returns 403 execution_ownership_mismatch when the caller does not own the execution.
    • Supports DID-authenticated callers by resolving the verified caller DID to an agent ID.
  • control-plane/internal/server/middleware/auth.go

    • Stores API-key caller identity in Gin context using CallerAgentIDKey.
    • Uses X-Caller-Agent-ID first, with X-Agent-Node-ID as fallback.
  • control-plane/internal/handlers/execution_notes_test.go

    • Adds coverage for owner write success.
    • Adds coverage for non-owner API-key write returning 403.
    • Adds coverage for DID-authenticated owner and non-owner behavior.
  • control-plane/internal/server/middleware/auth_test.go

    • Adds coverage that API-key auth populates caller identity in Gin context.
  • control-plane/internal/handlers/coverage_handlers_90_test.go

    • Updates the existing successful note-write coverage test to include matching caller identity.
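
The ownership check described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `Execution` struct, `authorizeNoteWrite`, and `appendNote` are hypothetical names, and only `AgentNodeID` and the `execution_ownership_mismatch` error code come from the description. It assumes the caller identity has already been established by trusted middleware.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Execution is a hypothetical stand-in for the control plane's execution record.
type Execution struct {
	ID          string
	AgentNodeID string // owner of the execution
	Notes       []string
}

// errOwnershipMismatch mirrors the error code named in the PR description.
var errOwnershipMismatch = errors.New("execution_ownership_mismatch")

// authorizeNoteWrite returns nil only when the caller owns the execution.
// An empty caller ID or an empty owner ID fails closed.
func authorizeNoteWrite(callerAgentID string, exec *Execution) error {
	if callerAgentID == "" || exec.AgentNodeID == "" {
		return errOwnershipMismatch
	}
	if callerAgentID != exec.AgentNodeID {
		return errOwnershipMismatch
	}
	return nil
}

// appendNote authorizes first and mutates second, returning the HTTP status
// a handler would send.
func appendNote(callerAgentID string, exec *Execution, note string) int {
	if err := authorizeNoteWrite(callerAgentID, exec); err != nil {
		return http.StatusForbidden
	}
	exec.Notes = append(exec.Notes, note)
	return http.StatusOK
}

func main() {
	exec := &Execution{ID: "exec-b", AgentNodeID: "agent-b"}
	fmt.Println(appendNote("agent-a", exec, "spoofed"), len(exec.Notes)) // 403 0
	fmt.Println(appendNote("agent-b", exec, "legit"), len(exec.Notes))   // 200 1
}
```

Authorizing before the append keeps a rejected request from leaving any trace on the execution, which is what the manual verification in the Testing section checks.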

Testing

  • ./scripts/test-all.sh
  • Additional verification:
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware
    • cd control-plane && go test ./internal/handlers ./internal/server/middleware -coverprofile=/tmp/issue-420.coverprofile
    • cd control-plane && golangci-lint run --new-from-rev=upstream/main ./internal/handlers ./internal/server/middleware
    • Manual E2E curl verification:
      • Agent A adding a note to Agent A’s execution returns 200 OK
      • Agent A adding a note to Agent B’s execution returns 403 Forbidden
      • Agent B’s fresh execution remains with notes: [] after the blocked write

Note: full repo lint currently reports pre-existing unrelated Go lint issues outside this PR’s changed files. Changed-line lint for the touched packages reports 0 issues.

Checklist

  • I updated documentation where applicable.
  • I added or updated tests (or none were needed).
  • I updated CHANGELOG.md (or this change does not warrant a changelog entry).

Screenshots (if UI-related)

Not UI-related.

Related issues

Fixes #420

@Luffy2208 Luffy2208 requested review from a team and AbirAbbas as code owners May 19, 2026 03:44
@github-actions
Contributor

github-actions Bot commented May 19, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 87.40% | 87.30% | ↑ +0.10 pp 🟡 |
| sdk-go | 92.00% | 90.70% | ↑ +1.30 pp 🟢 |
| sdk-python | 93.73% | 93.63% | ↑ +0.10 pp 🟢 |
| sdk-typescript | 92.80% | 92.56% | ↑ +0.24 pp 🟢 |
| web-ui | 89.93% | 90.01% | ↓ -0.08 pp 🟡 |
| aggregate | 89.00% | 89.01% | ↓ -0.01 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
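
For readers who want to see how the four thresholds combine, here is a rough sketch of the gate logic applied to the numbers in the table. The `Surface` and `Thresholds` types and the `gateViolations` function are hypothetical; the real gate script and the layout of `.coverage-gate.toml` are not shown in this thread, so this only illustrates the rule as stated in the comment.

```go
package main

import "fmt"

// Surface holds one row of the coverage table, in percent.
type Surface struct {
	Name     string
	Current  float64
	Baseline float64
}

// Thresholds mirrors the values quoted in the bot comment.
type Thresholds struct {
	MinSurface           float64 // per-surface floor
	MinAggregate         float64 // aggregate floor
	MaxSurfaceRegression float64 // allowed per-surface drop, in pp
	MaxAggRegression     float64 // allowed aggregate drop, in pp
}

// gateViolations returns one message per violated rule; empty means the gate passes.
func gateViolations(surfaces []Surface, aggregate Surface, t Thresholds) []string {
	var out []string
	for _, s := range surfaces {
		if s.Current < t.MinSurface {
			out = append(out, fmt.Sprintf("%s below floor: %.2f%% < %.2f%%", s.Name, s.Current, t.MinSurface))
		}
		if drop := s.Baseline - s.Current; drop > t.MaxSurfaceRegression {
			out = append(out, fmt.Sprintf("%s regressed by %.2f pp", s.Name, drop))
		}
	}
	if aggregate.Current < t.MinAggregate {
		out = append(out, fmt.Sprintf("aggregate below floor: %.2f%%", aggregate.Current))
	}
	if drop := aggregate.Baseline - aggregate.Current; drop > t.MaxAggRegression {
		out = append(out, fmt.Sprintf("aggregate regressed by %.2f pp", drop))
	}
	return out
}

func main() {
	t := Thresholds{MinSurface: 86, MinAggregate: 88, MaxSurfaceRegression: 1.0, MaxAggRegression: 0.50}
	surfaces := []Surface{
		{"control-plane", 87.40, 87.30},
		{"sdk-go", 92.00, 90.70},
		{"sdk-python", 93.73, 93.63},
		{"sdk-typescript", 92.80, 92.56},
		{"web-ui", 89.93, 90.01},
	}
	agg := Surface{"aggregate", 89.00, 89.01}
	fmt.Println("violations:", len(gateViolations(surfaces, agg, t)))
}
```

Under these rules the small drops in web-ui (0.08 pp) and the aggregate (0.01 pp) stay within the allowed regression, which is consistent with the gate passing.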

@github-actions
Contributor

github-actions Bot commented May 19, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 87 | 100.00% | |
| sdk-go | 0 | ➖ | no changes |
| sdk-python | 0 | ➖ | no changes |
| sdk-typescript | 0 | ➖ | no changes |
| web-ui | 0 | ➖ | no changes |

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

@Luffy2208 Luffy2208 force-pushed the fix/420-execution-notes-authorization branch from b9fb5f5 to 72f59ee Compare May 19, 2026 04:01
@Luffy2208
Contributor Author

I fixed the patch coverage failure by adding targeted tests in control-plane/internal/handlers/execution_notes_test.go.

What was added:

  • Tests for missing execution owner returning 403
  • Tests for missing caller identity returning 403
  • Tests for DID resolver failure returning 500
  • Tests for unresolved DID fail-closed behavior returning 403
  • Tests for caller identity resolution from:
    • Gin context CallerAgentIDKey
    • X-Caller-Agent-ID
    • X-Agent-Node-ID
    • DID list fallback
    • DID lookup error fallback
  • Direct coverage for executionNoteAuthorizationError.Error()

Member

@santoshkumarradha santoshkumarradha left a comment

🔴 PR-AF Review — Needs Major Rework

Automated multi-agent code review · PR-AF built with AgentField

10 findings · 🔴 3 critical · 🟠 4 important · 🔵 0 suggestions · ⚪ 0 nitpicks

Key Findings

7 issue(s) should be addressed before merge:

  • 🔴 Complete ownership enforcement bypass when APIKey is empty and DID auth is off — the default configuration (control-plane/internal/server/middleware/auth.go:24) — When AuthConfig.APIKey is empty (the default in ALL deployment configurations: config/agentfield.yaml, Docker Compose at deployments/docker/docker-compose.yml, and Helm at `deployments/helm/agen…
  • 🔴 Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled (control-plane/internal/handlers/execution_notes.go:184) — The execution notes handler's executionNoteCallerAgentID implements a three-tier identity resolution cascade: (1) verified DID from DIDAuthMiddleware, (2) CallerAgentIDKey context value from APIKeyAut…
  • 🔴 Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths — a 'fix F4, expose F1' trap (control-plane/internal/handlers/execution_notes.go:184) — F4 establishes that under the default configuration (APIKey empty AND did_auth_enabled false), the execution notes handler accepts unvalidated raw headers as caller identity — a complete ownership enf…
  • 🟠 GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement (control-plane/internal/handlers/execution_notes.go:235) — The PR fixes an IDOR on the write path (AddExecutionNoteHandler) by enforcing execution ownership, but the read path (GetExecutionNotesHandler) remains completely open: any API-key-authenticated calle…
  • 🟠 DID auth middleware provides no defense against no-auth bypass — attacker can simply omit DID headers (control-plane/internal/server/middleware/did_auth.go:177) — The assignment question asks: "Does the DIDAuthMiddleware at routes_middleware.go:77-88 provide any defense when API key auth is off?" No, not in any meaningful way. If `DID.Enabled && DIDAuthEna…
  • 🟠 Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee (control-plane/internal/handlers/execution_notes.go:206) — The function resolveExecutionNoteAgentIDByDID at line 206 resolves a caller DID to an agent identifier through two independent code paths that read *differently-named columns from different tables
  • 🟠 Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value — turning a compile-time type error into a runtime authentication bypass (control-plane/internal/handlers/execution_notes.go:189) — The combination of F2 (CallerAgentIDKey accepts any value type via Gin's c.Set(key string, value any)) and F4 (executionNoteCallerAgentID falls through to raw X-Caller-Agent-ID / X-Agent-Node-ID hea…
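
One way to address the type-safety concern in the last finding is to funnel every write through a typed setter and make the reader fail closed. The sketch below is an assumption-laden illustration: `store` stands in for a request context that accepts `any` (as Gin's `c.Set` does), and the key value and helper names are hypothetical, not the project's.

```go
package main

import (
	"errors"
	"fmt"
)

// store stands in for a request-scoped key/value context that accepts
// values of any type.
type store map[string]any

// callerAgentIDKey is a hypothetical key value for this sketch.
const callerAgentIDKey = "caller_agent_id"

var errNoIdentity = errors.New("no trustworthy caller identity")

// setCallerAgentID is meant to be the only writer, so the compiler
// enforces that the stored value is a string.
func setCallerAgentID(s store, id string) { s[callerAgentIDKey] = id }

// callerAgentID fails closed: a missing, non-string, or empty value is an
// error, and there is deliberately no fallback to request headers.
func callerAgentID(s store) (string, error) {
	id, ok := s[callerAgentIDKey].(string)
	if !ok || id == "" {
		return "", errNoIdentity
	}
	return id, nil
}

func main() {
	good := store{}
	setCallerAgentID(good, "agent-a")
	fmt.Println(callerAgentID(good))

	// A stray non-string write is rejected instead of silently ignored.
	bad := store{callerAgentIDKey: 42}
	fmt.Println(callerAgentID(bad))
}
```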

Files with findings: control-plane/internal/handlers/execution_notes.go, control-plane/internal/server/middleware/auth.go, control-plane/internal/server/middleware/did_auth.go

All Findings by Severity

🔴 Critical (3)

  • Complete ownership enforcement bypass when APIKey is empty and DID auth is off — the default configuration control-plane/internal/server/middleware/auth.go:24
  • Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled control-plane/internal/handlers/execution_notes.go:184
  • Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths — a 'fix F4, expose F1' trap control-plane/internal/handlers/execution_notes.go:184

🟠 Important (4)

  • GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement control-plane/internal/handlers/execution_notes.go:235
  • DID auth middleware provides no defense against no-auth bypass — attacker can simply omit DID headers control-plane/internal/server/middleware/did_auth.go:177
  • Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee control-plane/internal/handlers/execution_notes.go:206
  • Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value — turning a compile-time type error into a runtime authentication bypass control-plane/internal/handlers/execution_notes.go:189

Review Process Details

Dimensions Analyzed (15):

  • No-Auth Mode Identity Spoofing Bypass — 4 file(s)
  • CallerAgentIDKey Context Semantics Collision — 3 file(s)
  • GET/POST Execution Notes Authorization Asymmetry — 2 file(s)
  • DID Resolution Silent Degradation — Error vs Not-Found Conflation — 3 file(s)
  • Context.Background() in GET Handler Bypasses Request Timeout — 2 file(s)
  • DID resolution field-name divergence: AgentID vs AgentNodeID — 3 file(s)
  • CallerAgentIDKey context type contract: non-string write → silent fallback — 3 file(s)
  • No-auth middleware bypass: mechanical trace from empty APIKey to unauthenticated header reads — 4 file(s)
  • Storage error propagation contract: errors.As dependency on UpdateExecutionRecord fidelity — 2 file(s)
  • Context.Background() drift in GET handler: deadline propagation gap vs POST handler — 2 file(s)
  • No-auth bypass of execution ownership enforcement via raw header fallback — 4 file(s)
  • Triplicated caller resolution logic with diverged priority chains sharing one context namespace — 3 file(s)
  • Auth error classification breaks if production storage provider wraps or replaces closure errors — 2 file(s)
  • APIKeyAuth global broadcast of CallerAgentIDKey changes semantics for ALL authenticated routes — 2 file(s)
  • DID-to-agent ID resolution produces different identifiers depending on resolution path (AgentID vs AgentNodeID field mismatch) — 3 file(s)

Meta-Dimension Lenses (3):

  • Semantic — 5 dimension(s), 92% coverage confidence
  • Mechanical — 5 dimension(s), 92% coverage confidence
  • Systemic — 5 dimension(s), 92% coverage confidence

Cross-Reference & Adversary Analysis:

  • 6 compound finding(s) synthesized
  • 12 finding(s) adversarially tested: 12 confirmed, 0 challenged

Pipeline Stats

| Metric | Value |
|---|---|
| Duration | 4643.7s |
| Agent invocations | 65 |
| Coverage iterations | 0 |
| Estimated cost | N/A (provider does not report cost) |
| Budget exhausted | No |
| PR type | bugfix |
| Complexity | standard |

Review ID: rev_b6e41625c18a


@santoshkumarradha
Member

@Luffy2208, just a heads-up that we're currently evaluating the review quality of https://github.com/Agent-Field/pr-af. If you notice any of these automated findings are off-base, noisy, or unhelpful, please let us know! Your feedback would be super helpful.

AbirAbbas and others added 2 commits June 1, 2026 09:59
Closes the no-auth IDOR bypass flagged in PR Agent-Field#575 review. The execution
note write handler resolved caller identity from raw X-Caller-Agent-ID /
X-Agent-Node-ID request headers as a last resort. Under the default config
(no API key, DID auth off) that header read was the *sole* identity source,
so any caller could spoof ownership and append notes to another agent's
execution.

- executionNoteCallerAgentID now trusts only a verified DID (DIDAuthMiddleware)
  or the authenticated middleware context populated by APIKeyAuth after a
  successful key check. The raw-header fallback is removed.
- AddExecutionNoteHandler takes an ownershipEnforced flag. Ownership is
  enforced whenever an auth method is active; when the server is fully
  unauthenticated there is no trustworthy identity, so the guard is skipped
  (app.note() keeps working in local/dev) and a startup warning is logged.
- noteOwnershipEnforced() derives the flag from API-key / DID-auth config and
  drives the warning in applyGlobalMiddleware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
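
The behavior this commit message describes might look roughly like the sketch below. It is not the committed code: the `AuthConfig` fields, the `Caller` struct, and `canWriteNote` are hypothetical, and the exact condition behind `noteOwnershipEnforced()` in the repository may differ (for example, it may also check whether DID support is enabled at all).

```go
package main

import "fmt"

// AuthConfig is a hypothetical subset of the server's auth configuration.
type AuthConfig struct {
	APIKey         string
	DIDAuthEnabled bool
}

// noteOwnershipEnforced reports whether any auth method is active.
func noteOwnershipEnforced(cfg AuthConfig) bool {
	return cfg.APIKey != "" || cfg.DIDAuthEnabled
}

// Caller carries only identities established by middleware. Raw request
// headers are intentionally not represented here.
type Caller struct {
	VerifiedDIDAgentID string // agent ID resolved from a verified DID
	ContextAgentID     string // set after a successful API-key check
}

// callerAgentID prefers the verified DID, then the authenticated context.
func callerAgentID(c Caller) string {
	if c.VerifiedDIDAgentID != "" {
		return c.VerifiedDIDAgentID
	}
	return c.ContextAgentID
}

// canWriteNote skips the guard only when the server is fully unauthenticated;
// otherwise the caller must be known and must match the owner.
func canWriteNote(enforced bool, c Caller, ownerAgentID string) bool {
	if !enforced {
		return true // local/dev mode: no trustworthy identity exists
	}
	id := callerAgentID(c)
	return id != "" && ownerAgentID != "" && id == ownerAgentID
}

func main() {
	enforced := noteOwnershipEnforced(AuthConfig{APIKey: "secret"})
	fmt.Println(canWriteNote(enforced, Caller{ContextAgentID: "agent-a"}, "agent-a"))
	fmt.Println(canWriteNote(enforced, Caller{ContextAgentID: "agent-a"}, "agent-b"))
	fmt.Println(canWriteNote(enforced, Caller{}, "agent-b"))

	if !noteOwnershipEnforced(AuthConfig{}) {
		fmt.Println("warning: execution note ownership is not enforced (no auth configured)")
	}
}
```

The trade-off is explicit: with no auth configured there is nothing to verify against, so the guard is skipped and a warning is logged instead of trusting a header anyone can set.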
…olver

Defense-in-depth for the DID->agent resolution used by execution-note
ownership (PR Agent-Field#575 review). The resolver accepted any non-error DID document
and any matching agent_dids row regardless of revocation status, so a revoked
did:key whose registry entry was marked revoked (auth still passes, it is
self-verifying) could resolve to an agent identity.

- Skip DIDDocumentRecord results where IsRevoked() is true.
- Skip AgentDIDInfo entries with status "revoked".
- Document that DIDDocumentRecord.AgentID and AgentDIDInfo.AgentNodeID are the
  same value under differently-named fields, kept equivalent at registration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
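
A minimal sketch of the revocation filtering this commit describes is shown below. The type names `DIDDocumentRecord` and `AgentDIDInfo`, the `IsRevoked()` method, and the `"revoked"` status come from the commit message; the struct fields, the function name, and its signature are assumptions for illustration, and the real resolver reads from storage rather than from arguments.

```go
package main

import "fmt"

// DIDDocumentRecord is a hypothetical shape for a registry document entry.
type DIDDocumentRecord struct {
	DID     string
	AgentID string
	Revoked bool
}

// IsRevoked reports whether the registry entry has been revoked.
func (r DIDDocumentRecord) IsRevoked() bool { return r.Revoked }

// AgentDIDInfo is a hypothetical shape for an agent DID listing entry.
// AgentNodeID is assumed to hold the same value as DIDDocumentRecord.AgentID.
type AgentDIDInfo struct {
	DID         string
	AgentNodeID string
	Status      string
}

// resolveAgentIDByDID tries the document record first, then the agent list,
// skipping revoked entries in both. It returns false when nothing usable is found.
func resolveAgentIDByDID(did string, doc *DIDDocumentRecord, agents []AgentDIDInfo) (string, bool) {
	if doc != nil && doc.DID == did && !doc.IsRevoked() && doc.AgentID != "" {
		return doc.AgentID, true
	}
	for _, a := range agents {
		if a.DID != did || a.Status == "revoked" || a.AgentNodeID == "" {
			continue
		}
		return a.AgentNodeID, true
	}
	return "", false // fail closed: the caller stays unidentified
}

func main() {
	did := "did:key:example"
	revokedDoc := &DIDDocumentRecord{DID: did, AgentID: "agent-a", Revoked: true}
	revokedList := []AgentDIDInfo{{DID: did, AgentNodeID: "agent-a", Status: "revoked"}}
	fmt.Println(resolveAgentIDByDID(did, revokedDoc, revokedList))

	activeList := []AgentDIDInfo{{DID: did, AgentNodeID: "agent-a", Status: "active"}}
	fmt.Println(resolveAgentIDByDID(did, revokedDoc, activeList))
}
```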
@AbirAbbas
Contributor

merging this pr to avoid it getting stale, pushed up the fixes, just waiting on CI

@AbirAbbas AbirAbbas enabled auto-merge June 1, 2026 14:53
@AbirAbbas AbirAbbas added this pull request to the merge queue Jun 1, 2026
Merged via the queue into Agent-Field:main with commit 4df64a1 Jun 1, 2026
17 checks passed

Development

Successfully merging this pull request may close these issues.

[Security] Execution notes write endpoint has no authorization (IDOR)

3 participants