fix: block revoked-tag agent calls + friendly empty logs state (TC-034, TC-035)#373
fix: block revoked-tag agent calls + friendly empty logs state (TC-034, TC-035)#373santoshkumarradha merged 5 commits intomainfrom
Conversation
…ty logs state TC-034: The direct execute API path (prepareExecution) did not check agent lifecycle_status before dispatching calls. Agents with revoked tags (status=pending_approval) were still callable via this path, despite being correctly blocked on the reasoner/DID paths. Added a lifecycle_status check that returns 503 Service Unavailable, consistent with the existing check in ExecuteReasonerHandler. TC-035: When a newly registered agent has no base_url or returns 404 for logs, the Process Logs panel showed raw error text. Now these expected "no logs yet" scenarios fall through to the friendly empty state message instead of rendering a destructive error alert. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📊 Coverage gateThresholds from
✅ Gate passedNo surface regressed past the allowed threshold and the aggregate stayed above the floor. |
📐 Patch coverage gateThreshold: 80% on lines this PR touches vs
✅ Patch gate passedEvery surface whose lines were touched by this PR has patch coverage at or above the threshold. |
Adds TestExecuteHandler_PendingApprovalAgent to verify that agents with lifecycle_status=pending_approval return 503 Service Unavailable when called via the direct execute API path. Satisfies patch coverage gate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…NodeLogsError (TC-034, TC-035) Addresses deep-review findings on PR #373. TC-034 (Go / security response contract): - Extend executionPreconditionError with an optional errorCode field so preconditions can carry a stable machine-readable code. - writeExecutionError now promotes errorCode to top-level `error` and moves the human text to `message` when set, matching the contract already used by reasoners.go / skills.go / permission middleware ({"error": "agent_pending_approval", "message": "..."}). - prepareExecution sets errorCode: "agent_pending_approval" on the revoked-tag 503. Async execute inherits the fix for free since both paths share prepareExecution. - Without this, clients pattern-matching on error == "agent_pending_approval" silently regressed on the direct execute path — they would see only a generic agent_error message. - Adds a test asserting ErrorAs + 503 + ErrorCode + the wire-level response shape, and that no execution record is persisted before the guard fires. Closes the 0% patch-coverage gap CI was flagging for execute.go:1045-1051. TC-035 (UI / fragile string matching): - Promote node-logs fetch errors to a typed NodeLogsError class carrying .status and the stable .code from the response body. - NodeProcessLogsPanel branches on (status === 404 || code === "agent_unreachable") instead of regex-matching the human message — robust to backend phrasing changes and avoids swallowing unrelated errors that happen to contain "404". - Logs swallowed empty-state events via console.debug in dev so developers still see them in devtools. - Adds two panel tests covering the 404 and agent_unreachable branches; verifies they render the friendly empty state and that the generic-error branch still shows the destructive alert. Tested: go test ./internal/handlers/ (all pass); web panel vitest (4 tests pass); tsc --noEmit clean; eslint clean on touched files.
Upstream's TestExecuteHandler_PendingApprovalAgent asserted that the
human-readable text "awaiting tag approval" appeared in the top-level
`error` field. That shape is no longer correct under the stable-code
contract: `error` now carries the machine code
("agent_pending_approval") and the human text moves to `message` —
matching reasoners.go / skills.go / permission middleware.
Update the assertion to verify both fields of the new contract.
Follow-up — sibling bypass filed as #374During the deep review of this PR I flagged a pre-existing sibling bypass at the serverless re-registration path: Impact: an admin-revoked serverless agent (lifecycle Same threat surface as TC-034, same severity classification. Tracking separately to keep this PR focused on TC-034 / TC-035. Fix PR to follow shortly and will be linked here. cc @AbirAbbas |
…orks NodeProcessLogsPanel now does `e instanceof NodeLogsError` in its error branch (to distinguish 404 / agent_unreachable empty states from real errors). The existing NodeProcessLogsPanel.test.tsx mock for @/services/api did not export NodeLogsError, so the reference was undefined at runtime and `instanceof` threw a TypeError. That TypeError was caught inside the component's error handling and prevented the destructive-alert render path from running — failing the "shows a destructive alert when the tail request fails" test even though the fallback logic itself was correct. Mirror the minimal shape-compatible NodeLogsError mock already used by NodeProcessLogsPanel.coverage.test.tsx. No production code changes. Tested: vitest run src/test/components/NodeProcessLogsPanel.test.tsx src/test/components/NodeProcessLogsPanel.coverage.test.tsx → 6/6 pass.
#374) RegisterServerlessAgentHandler unconditionally set LifecycleStatus=ready on re-registration and called RegisterAgent directly, bypassing the admin-revocation preservation logic used by the standard registration path (nodes.go:540-556) and the 503 block at nodes.go:1051-1056. An admin-revoked serverless agent could clear its own pending_approval state by simply calling POST /api/v1/nodes/register-serverless again — defeating the revocation mechanism that #373 landed. - Reject re-registration of admin-revoked agents with 503 {"error": "agent_pending_approval", "message": "..."}, matching the stable contract used elsewhere in the handler. - Preserve existingNode.LifecycleStatus and ApprovedTags on non-revoked re-registrations so the UPSERT does not clobber approval state. - Leave first-registration behavior unchanged. Adds regression tests covering the revoked-reject path and the approved-tags-preservation path. Closes #374.
|
Fix for #374 opened as #375 — independent PR against |
santoshkumarradha
left a comment
There was a problem hiding this comment.
Deep review completed. Verified TC-034 and TC-035 fixes with independent cross-cutting audit.
Fixes validated:
execute.go:1045pending_approval guard placed before any execution-record creation, webhook registration, or target-type resolution — no early side-effects for blocked calls.selectVersionedAgent(execute.go:~1555) already filters out pending_approval in both the healthy path and the fallback path — no version-rotation bypass.status_manager.gocorrectly refuses to transition pending_approval agents back to online via heartbeat/health-check alone.- Re-registration path in
nodes.go:915-990correctly preserves admin-revoked pending_approval on re-register.
Additional fixes applied during review (follow-up commits on this branch):
-
Response-shape contract (blocker caught by review): the original fix returned
{"error": "<human text>", "error_category": "agent_error"}, breaking the stable contract used byreasoners.go/skills.go/ permission middleware, which return{"error": "agent_pending_approval", "message": "..."}. Any client pattern-matching onerror == "agent_pending_approval"would silently regress on/execute. Fixed by adding an optionalerrorCodefield toexecutionPreconditionErrorand promoting it to the top-levelerrorfield inwriteExecutionErrorwhen set. Backward-compatible for other precondition errors. -
Patch coverage gap (0% on execute.go:1045-1051): added a direct
prepareExecutiontest asserting the error object (status, category, code, message), the wire-level JSON contract viawriteExecutionError, and that no execution record is persisted before the guard fires. Also updatedTestExecuteHandler_PendingApprovalAgentto assert the new contract. -
Brittle regex in Process Logs panel (the original TC-035 fix used
/no base_url|404/ione.message): replaced with a typedNodeLogsErrorclass inservices/api.tscarrying.statusand.code(parsed from the response body's stableerrorfield).NodeProcessLogsPanelnow branches one instanceof NodeLogsError && (status === 404 || code === "agent_unreachable")— robust to backend phrasing changes and matches the actual backend contract atcontrol-plane/internal/handlers/ui/node_logs.go:49. Dev-modeconsole.debugpreserved so devtools still surface swallowed empty states. Added two panel coverage tests (404 empty state, agent_unreachable empty state) and fixed the existingNodeProcessLogsPanel.test.tsxmock to exportNodeLogsErrorso itsinstanceofcheck works.
Adjacent concerns flagged during review but explicitly OUT OF SCOPE for this PR (should be separate follow-ups):
RegisterServerlessAgentHandlerstampsnewNode.LifecycleStatus = readydirectly and callsRegisterAgentwithout going through the pending_approval preservation logic. A revoked serverless agent may be able to re-register out of pending_approval. This is a separate security gap, not introduced or addressed by this PR.asyncExecutionJob.process()has no re-check before invoking — currently safe becauseprepareExecutionruns synchronously before enqueue, but worth a defensive re-check if the queue ever persists across server restarts.
All CI checks green. LGTM ✓
Summary
Fixes two QA findings from PR #330 review:
TC-034 (Critical/Security): The direct execute API path (
prepareExecutioninexecute.go) did not checklifecycle_statusbefore dispatching agent calls. Agents with revoked tags (pending_approvalstatus) were still callable via this path, despite being correctly blocked on the reasoner and DID resolution paths. Added alifecycle_status == pending_approvalcheck that returns 503 Service Unavailable withagent_pending_approvalerror, consistent withExecuteReasonerHandlerandExecuteSkillHandler.TC-035 (Low/UX): The Process Logs panel (
NodeProcessLogsPanel) showed raw API error text ("node has no base_url","404") when a newly registered agent had never executed a workflow. These expected empty-state errors now fall through to the existing friendly empty state message ("No log lines yet...") instead of rendering a destructive error alert.Test plan
go build ./...)tsc --noEmit)go test ./internal/handlers/)NodeProcessLogsPanel.coverage.test.tsxpasses/api/v1/agents/{id}/execute→ expect 503🤖 Generated with Claude Code