Skip to content

fix: block revoked-tag agent calls + friendly empty logs state (TC-034, TC-035)#373

Merged
santoshkumarradha merged 5 commits intomainfrom
fix/qa-tc034-tc035
Apr 9, 2026
Merged

fix: block revoked-tag agent calls + friendly empty logs state (TC-034, TC-035)#373
santoshkumarradha merged 5 commits intomainfrom
fix/qa-tc034-tc035

Conversation

@AbirAbbas
Copy link
Copy Markdown
Contributor

Summary

Fixes two QA findings from PR #330 review:

  • TC-034 (Critical/Security): The direct execute API path (prepareExecution in execute.go) did not check lifecycle_status before dispatching agent calls. Agents with revoked tags (pending_approval status) were still callable via this path, despite being correctly blocked on the reasoner and DID resolution paths. Added a lifecycle_status == pending_approval check that returns 503 Service Unavailable with agent_pending_approval error, consistent with ExecuteReasonerHandler and ExecuteSkillHandler.

  • TC-035 (Low/UX): The Process Logs panel (NodeProcessLogsPanel) showed raw API error text ("node has no base_url", "404") when a newly registered agent had never executed a workflow. These expected empty-state errors now fall through to the existing friendly empty state message ("No log lines yet...") instead of rendering a destructive error alert.

Test plan

  • Go control plane builds cleanly (go build ./...)
  • TypeScript compiles cleanly (tsc --noEmit)
  • Existing handler tests pass (go test ./internal/handlers/)
  • NodeProcessLogsPanel.coverage.test.tsx passes
  • Manual: revoke tags on an agent, call via /api/v1/agents/{id}/execute → expect 503
  • Manual: register new agent with no workflows, open Process Logs panel → expect friendly empty state

🤖 Generated with Claude Code

…ty logs state

TC-034: The direct execute API path (prepareExecution) did not check
agent lifecycle_status before dispatching calls. Agents with revoked
tags (status=pending_approval) were still callable via this path,
despite being correctly blocked on the reasoner/DID paths. Added a
lifecycle_status check that returns 503 Service Unavailable, consistent
with the existing check in ExecuteReasonerHandler.

TC-035: When a newly registered agent has no base_url or returns 404
for logs, the Process Logs panel showed raw error text. Now these
expected "no logs yet" scenarios fall through to the friendly empty
state message instead of rendering a destructive error alert.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@AbirAbbas AbirAbbas requested a review from a team as a code owner April 9, 2026 17:17
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface Current Baseline Δ
control-plane 87.30% 87.30% → +0.00 pp 🟡
sdk-go 90.70% 90.70% → +0.00 pp 🟢
sdk-python 93.63% 93.63% ↑ +0.00 pp 🟢
sdk-typescript 92.56% 92.56% → +0.00 pp 🟢
web-ui 90.02% 90.01% ↑ +0.01 pp 🟢
aggregate 89.02% 89.01% ↑ +0.01 pp 🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface Touched lines Patch coverage Status
control-plane 21 100.00%
sdk-go 0 ➖ no changes
sdk-python 0 ➖ no changes
sdk-typescript 0 ➖ no changes
web-ui 29 100.00%

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

AbirAbbas and others added 3 commits April 9, 2026 13:54
Adds TestExecuteHandler_PendingApprovalAgent to verify that agents with
lifecycle_status=pending_approval return 503 Service Unavailable when
called via the direct execute API path. Satisfies patch coverage gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…NodeLogsError (TC-034, TC-035)

Addresses deep-review findings on PR #373.

TC-034 (Go / security response contract):
- Extend executionPreconditionError with an optional errorCode field
  so preconditions can carry a stable machine-readable code.
- writeExecutionError now promotes errorCode to top-level `error` and
  moves the human text to `message` when set, matching the contract
  already used by reasoners.go / skills.go / permission middleware
  ({"error": "agent_pending_approval", "message": "..."}).
- prepareExecution sets errorCode: "agent_pending_approval" on the
  revoked-tag 503. Async execute inherits the fix for free since both
  paths share prepareExecution.
- Without this, clients pattern-matching on
  error == "agent_pending_approval" silently regressed on the direct
  execute path — they would see only a generic agent_error message.
- Adds a test asserting ErrorAs + 503 + ErrorCode + the wire-level
  response shape, and that no execution record is persisted before
  the guard fires. Closes the 0% patch-coverage gap CI was flagging
  for execute.go:1045-1051.

TC-035 (UI / fragile string matching):
- Promote node-logs fetch errors to a typed NodeLogsError class
  carrying .status and the stable .code from the response body.
- NodeProcessLogsPanel branches on (status === 404 ||
  code === "agent_unreachable") instead of regex-matching the human
  message — robust to backend phrasing changes and avoids swallowing
  unrelated errors that happen to contain "404".
- Logs swallowed empty-state events via console.debug in dev so
  developers still see them in devtools.
- Adds two panel tests covering the 404 and agent_unreachable
  branches; verifies they render the friendly empty state and that
  the generic-error branch still shows the destructive alert.

Tested: go test ./internal/handlers/ (all pass); web panel vitest
(4 tests pass); tsc --noEmit clean; eslint clean on touched files.
Upstream's TestExecuteHandler_PendingApprovalAgent asserted that the
human-readable text "awaiting tag approval" appeared in the top-level
`error` field. That shape is no longer correct under the stable-code
contract: `error` now carries the machine code
("agent_pending_approval") and the human text moves to `message` —
matching reasoners.go / skills.go / permission middleware.

Update the assertion to verify both fields of the new contract.
@santoshkumarradha
Copy link
Copy Markdown
Member

Follow-up — sibling bypass filed as #374

During the deep review of this PR I flagged a pre-existing sibling bypass at the serverless re-registration path: RegisterServerlessAgentHandler (control-plane/internal/handlers/nodes.go:1410-1445) unconditionally sets LifecycleStatus: AgentStatusReady and calls storageProvider.RegisterAgent directly, bypassing the admin-revocation preservation logic used by the standard registration path at nodes.go:540-556.

Impact: an admin-revoked serverless agent (lifecycle pending_approval, ApprovedTags cleared) can clear its own revocation by calling POST /api/v1/nodes/register-serverless again. This PR closes the call-time gap (TC-034) — #374 is needed to close the registration-time gap so the revocation mechanism can't be trivially reset.

Same threat surface as TC-034, same severity classification. Tracking separately to keep this PR focused on TC-034 / TC-035. Fix PR to follow shortly and will be linked here.

cc @AbirAbbas

…orks

NodeProcessLogsPanel now does `e instanceof NodeLogsError` in its error
branch (to distinguish 404 / agent_unreachable empty states from real
errors). The existing NodeProcessLogsPanel.test.tsx mock for
@/services/api did not export NodeLogsError, so the reference was
undefined at runtime and `instanceof` threw a TypeError. That TypeError
was caught inside the component's error handling and prevented the
destructive-alert render path from running — failing the
"shows a destructive alert when the tail request fails" test even
though the fallback logic itself was correct.

Mirror the minimal shape-compatible NodeLogsError mock already used by
NodeProcessLogsPanel.coverage.test.tsx. No production code changes.

Tested: vitest run src/test/components/NodeProcessLogsPanel.test.tsx
src/test/components/NodeProcessLogsPanel.coverage.test.tsx → 6/6 pass.
santoshkumarradha added a commit that referenced this pull request Apr 9, 2026
#374)

RegisterServerlessAgentHandler unconditionally set LifecycleStatus=ready
on re-registration and called RegisterAgent directly, bypassing the
admin-revocation preservation logic used by the standard registration
path (nodes.go:540-556) and the 503 block at nodes.go:1051-1056.

An admin-revoked serverless agent could clear its own pending_approval
state by simply calling POST /api/v1/nodes/register-serverless again —
defeating the revocation mechanism that #373 landed.

- Reject re-registration of admin-revoked agents with 503
  {"error": "agent_pending_approval", "message": "..."}, matching the
  stable contract used elsewhere in the handler.
- Preserve existingNode.LifecycleStatus and ApprovedTags on non-revoked
  re-registrations so the UPSERT does not clobber approval state.
- Leave first-registration behavior unchanged.

Adds regression tests covering the revoked-reject path and the
approved-tags-preservation path.

Closes #374.
@santoshkumarradha
Copy link
Copy Markdown
Member

Fix for #374 opened as #375 — independent PR against main, doesn't block this one. cc @AbirAbbas

Copy link
Copy Markdown
Member

@santoshkumarradha santoshkumarradha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep review completed. Verified TC-034 and TC-035 fixes with independent cross-cutting audit.

Fixes validated:

  • execute.go:1045 pending_approval guard placed before any execution-record creation, webhook registration, or target-type resolution — no early side-effects for blocked calls.
  • selectVersionedAgent (execute.go:~1555) already filters out pending_approval in both the healthy path and the fallback path — no version-rotation bypass.
  • status_manager.go correctly refuses to transition pending_approval agents back to online via heartbeat/health-check alone.
  • Re-registration path in nodes.go:915-990 correctly preserves admin-revoked pending_approval on re-register.

Additional fixes applied during review (follow-up commits on this branch):

  1. Response-shape contract (blocker caught by review): the original fix returned {"error": "<human text>", "error_category": "agent_error"}, breaking the stable contract used by reasoners.go / skills.go / permission middleware, which return {"error": "agent_pending_approval", "message": "..."}. Any client pattern-matching on error == "agent_pending_approval" would silently regress on /execute. Fixed by adding an optional errorCode field to executionPreconditionError and promoting it to the top-level error field in writeExecutionError when set. Backward-compatible for other precondition errors.

  2. Patch coverage gap (0% on execute.go:1045-1051): added a direct prepareExecution test asserting the error object (status, category, code, message), the wire-level JSON contract via writeExecutionError, and that no execution record is persisted before the guard fires. Also updated TestExecuteHandler_PendingApprovalAgent to assert the new contract.

  3. Brittle regex in Process Logs panel (the original TC-035 fix used /no base_url|404/i on e.message): replaced with a typed NodeLogsError class in services/api.ts carrying .status and .code (parsed from the response body's stable error field). NodeProcessLogsPanel now branches on e instanceof NodeLogsError && (status === 404 || code === "agent_unreachable") — robust to backend phrasing changes and matches the actual backend contract at control-plane/internal/handlers/ui/node_logs.go:49. Dev-mode console.debug preserved so devtools still surface swallowed empty states. Added two panel coverage tests (404 empty state, agent_unreachable empty state) and fixed the existing NodeProcessLogsPanel.test.tsx mock to export NodeLogsError so its instanceof check works.

Adjacent concerns flagged during review but explicitly OUT OF SCOPE for this PR (should be separate follow-ups):

  • RegisterServerlessAgentHandler stamps newNode.LifecycleStatus = ready directly and calls RegisterAgent without going through the pending_approval preservation logic. A revoked serverless agent may be able to re-register out of pending_approval. This is a separate security gap, not introduced or addressed by this PR.
  • asyncExecutionJob.process() has no re-check before invoking — currently safe because prepareExecution runs synchronously before enqueue, but worth a defensive re-check if the queue ever persists across server restarts.

All CI checks green. LGTM ✓

@santoshkumarradha santoshkumarradha merged commit 855e501 into main Apr 9, 2026
18 checks passed
@santoshkumarradha santoshkumarradha deleted the fix/qa-tc034-tc035 branch April 9, 2026 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants