
test: add 24 critical-path test files across control plane, SDKs #352

Merged
AbirAbbas merged 30 commits into main from chore/test-coverage-improvements
Apr 8, 2026

@santoshkumarradha
Member

Summary

24 new test files that close the highest-leverage gaps surfaced by a fresh test-coverage audit. The focus is functional, unit, and integration tests on the paths most likely to break user-facing behavior when AI writes code against this repo.

The internal TEST_COVERAGE_AUDIT.md was removed in the first commit on this branch — it had become stale and was not meant to ship in the repo.

Coverage added

Go control plane (8 files)

Services + middleware + server

  • services/did_web_service_test.go — DID parsing, generation, resolution, round-trip
  • services/ui_service_test.go — client subscription, dedupe, heartbeat, concurrent register/close
  • services/executions_ui_service_test.go — grouping, duration aggregation, status summary, filtering
  • server/config_db_test.go — storage section preservation, DB overlay, YAML round-trip
  • server/middleware/permission_test.go — caller DID precedence, body restoration, fail-closed, target parsing
  • server/middleware/connector_capability_test.go — disabled / read-only / nil-map handling

Handlers (large untested files)

  • handlers/reasoners_test.go — execution routing, header propagation, persistence, serverless payload
  • handlers/memory_events_test.go — WS upgrade, pattern filter, scope filter, disconnect cleanup, burst publish

Python SDK (4 files)

  • test_did_manager_error_paths.py — register_agent under timeout / 5xx / bad JSON / auth headers
  • test_vc_generator_error_paths.py — VC generation error paths
  • test_tool_calling_error_paths.py — malformed args, max turns, tool not found, mixed valid/invalid
  • test_agent_graceful_shutdown.py — idempotent stop, pending tasks, notify failures, cleanup

Go SDK (4 files)

  • agent/registration_integration_test.go — register handshake, fallback, approval polling, races
  • agent/verification_test.go — LocalVerifier refresh, did:key resolution, concurrent access (race-clean)
  • agent/memory_backend_test.go — scope-aware headers, error propagation, query params
  • harness/provider_error_integration_test.go — provider crash / timeout / malformed JSONL / env-var unset / missing binary

TypeScript SDK (8 files)

  • agentfield_client.test.ts — REST verbs, error envelope, headers, DID signing, timeouts
  • agent_lifecycle.test.ts — serve/shutdown, heartbeat scheduling, registration payloads, failures
  • execution_context_async.test.ts — AsyncLocalStorage propagation across nested + parallel runs
  • memory_client_scopes.test.ts — scope resolution, metadata passthrough, 404 contract
  • workflow_reporter_dag.test.ts — progress events, transitions, failure propagation
  • tool_calling_errors.test.ts — malformed args, missing tool, max turns, discovery filters
  • harness_runner_resilience.test.ts — transient retry classification, backoff, cost aggregation
  • agent_router_dispatch.test.ts — skill vs reasoner routing, schema validation, 404

Methodology

  1. Five parallel discovery agents produced fresh per-area gap briefs (the April 5 audit was stale — many flagged files now have tests).
  2. Six parallel codex workers wrote tests grouped by file ownership (no cross-worker overlap on source files).
  3. Each suite verified locally; broken assertions and flaky timing fixed:
    • config_db_test.go round-trip equality (nil-vs-empty slice via YAML normalization)
    • reasoners_test.go ambiguous-selector compile error from over-embedded fake storage
    • verification_test.go deadlocked errCh capacity (8 → 40)
    • registration_integration_test.go over-strict ErrorIs assertion against a fixed-string source error

Test plan

  • cd control-plane && go test ./internal/services/... ./internal/server/... ./internal/handlers/... — green
  • cd sdk/go && go test ./agent/ ./harness/ (new tests) — green
  • cd sdk/python && python3 -m pytest tests/test_did_manager_error_paths.py tests/test_vc_generator_error_paths.py tests/test_tool_calling_error_paths.py tests/test_agent_graceful_shutdown.py — 23 passed, 5 intentional skips
  • cd sdk/typescript && npx vitest run <8 new test files> — 42 passed across 8 suites

Intentional skips (Python)

Five subtests are skipped with `source bug:` markers that document real defects discovered while writing the tests. They are targets for follow-up fixes in the implementation, not test bugs:

  • `execute_tool_call_loop` raises when tool call omits `function.arguments`
  • tool timeouts do not break the loop early
  • `Agent.stop()` is not implemented
  • graceful shutdown does not track or cancel in-flight tasks
  • graceful shutdown does not enforce timeout-based task cancellation

Notes

  • No source files were modified; only new `*_test.go` / `test_*.py` / `*.test.ts` files were added.
  • No new dependencies were added to go.mod, pyproject.toml, or package.json.
  • Total: ~3,600 lines of test code across 24 files.

@santoshkumarradha santoshkumarradha requested review from a team and AbirAbbas as code owners April 7, 2026 12:41
@santoshkumarradha santoshkumarradha marked this pull request as draft April 7, 2026 12:41
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Performance

| SDK | Memory | Memory Δ | Latency | Latency Δ |
|---|---|---|---|---|
| Python | 7.9 KB | -13% | 0.39 µs | +11% |
| Go | 211 B | -25% | 0.57 µs | -43% |
| TS | 428 B | +22% | 2.70 µs | +35% |

Regression detected:

  • TypeScript memory: 350 B → 428 B (+22%)

@santoshkumarradha
Member Author

santoshkumarradha commented Apr 7, 2026

Source bugs filed

The 5 `pytest.skip("source bug: ...")` markers in this PR have been filed as tracked issues:

| # | Test | Issue |
|---|---|---|
| 1 | `test_tool_calling_error_paths.py::test_malformed_tool_call_missing_arguments_is_reported_and_loop_continues` | #353 |
| 2 | `test_tool_calling_error_paths.py::test_tool_execution_timeout_breaks_loop_early` | #354 |
| 3 | `test_agent_graceful_shutdown.py::test_agent_stop_is_idempotent` | #355 |
| 4 | `test_agent_graceful_shutdown.py::test_graceful_shutdown_cancels_in_flight_tasks_within_deadline` | #356 |
| 5 | `test_agent_graceful_shutdown.py::test_graceful_shutdown_force_cancels_tasks_after_timeout` | #357 |

Each issue contains the file path, repro snippet, expected behavior, and acceptance criteria. When the underlying source bug is fixed, the corresponding skipped test should be unskipped and will pass.

@AbirAbbas
Contributor

AbirAbbas commented Apr 7, 2026

Heads up: this branch is forked off v0.1.65-rc.3 and is currently 13 commits behind main. Key changes that landed since:

Some of the issues we've been seeing (SSE connection handling, execution hangs) have already been fixed on main but surface here because of the stale base. A rebase onto latest main should resolve those.

…verlay

Adds white-box unit tests for previously-untested control plane files:

services/
- did_web_service: ParseDIDWeb / GenerateDIDWeb round-trip and resolution
- ui_service: client subscription, dedupe, heartbeat, concurrent register/close
- executions_ui_service: grouping, duration aggregation, status summary, filtering

server/
- config_db: storage section preservation, DB overlay merge, YAML round-trip,
  invalid-payload handling

server/middleware/
- permission: caller DID precedence, request body restoration, fail-closed,
  pending-approval target, target param parsing
- connector_capability: disabled / read-only / nil-map handling, method gating

Also adds .plandb.db to .gitignore.

reasoners.go (~700 LOC, previously untested):
- malformed reasoner-id parsing
- node lookup, offline / unhealthy paths
- workflow execution record persistence on success and failure
- header propagation to proxied agent (X-Workflow-ID, X-Run-ID, etc.)
- serverless payload encoding

memory_events.go (WS + SSE memory subscriptions):
- WebSocket upgrade success and rejection
- Pattern filter matching, scope/scopeId filtering
- Client disconnect cleanup (no goroutine leak)
- Burst publish handling under slow reader
…roviders

agent/registration_integration_test.go
- happy-path register against httptest control plane
- 404 fallback to legacy /api/v1/nodes/register
- approval-pending exits cleanly when parent context ends
- empty AgentFieldURL produces a clear error
- concurrent RegisterNode does not race
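The registration tests exercise a fallback to the legacy `/api/v1/nodes/register` endpoint on 404 and a clear error for an empty AgentFieldURL. A sketch of that control flow against an `httptest` control plane; the primary path `/api/v2/nodes/register`, the `register` function, and `legacyOnlyControlPlane` are assumptions for illustration, since the PR names only the legacy path:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const (
	primaryRegisterPath = "/api/v2/nodes/register" // assumed, not from the PR
	legacyRegisterPath  = "/api/v1/nodes/register"
)

// register POSTs the payload to the primary endpoint and moves on to the
// legacy endpoint only when the control plane answers 404.
func register(client *http.Client, baseURL string, payload []byte) (string, error) {
	if baseURL == "" {
		return "", fmt.Errorf("register: AgentFieldURL is empty")
	}
	for _, path := range []string{primaryRegisterPath, legacyRegisterPath} {
		resp, err := client.Post(baseURL+path, "application/json", bytes.NewReader(payload))
		if err != nil {
			return "", err
		}
		resp.Body.Close()
		switch {
		case resp.StatusCode == http.StatusNotFound:
			continue // endpoint missing on this control plane: try the next one
		case resp.StatusCode >= 300:
			return "", fmt.Errorf("register via %s: status %d", path, resp.StatusCode)
		default:
			return path, nil
		}
	}
	return "", fmt.Errorf("register: no registration endpoint found")
}

// legacyOnlyControlPlane fakes an older control plane that serves only the
// legacy path; ServeMux answers 404 for the primary path.
func legacyOnlyControlPlane() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc(legacyRegisterPath, func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}
```

Only 404 triggers the fallback; any other non-2xx status is returned as an error, so an unhealthy control plane is not mistaken for a missing endpoint.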

agent/verification_test.go (LocalVerifier)
- Refresh populates policies, revocations, registered DIDs, admin pubkey
- Refresh failure preserves prior cache
- NeedsRefresh respects refreshInterval
- concurrent Refresh + CheckRevocation safe under -race
- did:key public key resolution and graceful malformed-input handling

agent/memory_backend_test.go (ControlPlaneMemoryBackend)
- scope-aware headers (workflow / session / global)
- 404 → not-found sentinel; 500 propagated cleanly
- Delete uses POST /api/v1/memory/delete
- list builds correct query params

harness/provider_error_integration_test.go
- provider crash with no stderr
- timeout under context deadline
- malformed JSONL middle line tolerated
- env var Env{KEY:""} unsets in subprocess
- missing binary returns FailureCrash with helpful message
…tdown

Python SDK has good happy-path coverage; these add failure-mode tests:

test_did_manager_error_paths.py
- network timeout / 5xx / truncated JSON during register_agent
- X-API-Key header forwarded when configured
- agent continues functioning after registration failure (silent degrade)

test_vc_generator_error_paths.py
- generate_execution_vc / create_workflow_vc under timeout / 5xx / bad JSON
- disabled generator makes no HTTP calls

test_tool_calling_error_paths.py
- malformed tool args, invalid arg types, mixed valid/invalid in one turn
- max_turns enforcement
- tool not found does not crash the loop

test_agent_graceful_shutdown.py
- idempotent re-entrant stop
- pending in-flight task handling
- notification failure during shutdown
- resource cleanup

Five subtests are intentionally skipped with 'source bug:' markers documenting
real defects discovered while writing the tests (Agent.stop() unimplemented,
graceful_shutdown does not track in-flight tasks, etc.). These are targets for
follow-up fixes in the implementation, not test bugs.
…atures

The TS SDK had only ~6 real test files for ~50 source files. This adds
behavior tests for the most-critical surfaces:

Core client
- agentfield_client: REST verbs, error envelope parsing, header propagation,
  DID-signed requests, timeout behavior
- agent_lifecycle: serve()/shutdown(), heartbeat scheduling, registration
  payload, registration-failure handling
- execution_context_async: AsyncLocalStorage propagation across nested and
  parallel runs, isolation guarantees
- memory_client_scopes: workflow/session/global scope resolution, metadata
  passthrough headers, 404→undefined contract

Features
- workflow_reporter_dag: progress() / state transitions / failure propagation
- tool_calling_errors: malformed JSON args, missing tool, max turns, max
  tool calls, discovery filters
- harness_runner_resilience: transient retry classification, backoff,
  cost aggregation across attempts
- agent_router_dispatch: skill vs reasoner routing, schema validation, 404

42 tests across 8 suites, all green via vitest.

- memory_events_test.go: The SSE handler does not flush response headers
  until it writes the first event, so http.Client.Do blocks indefinitely
  when no event is published before the request begins. Run the request
  in a goroutine, wait for the subscription to register, then publish.
- reasoners_test.go: Drop X-Agent-Node-ID propagation assertion. The
  serverless execution path does not forward this caller header to the
  downstream agent request, so the original assertion was incorrect.

The SSE handler in memory_events.go defers header flushing until the first
matching event is written, and uses the deprecated CloseNotify() for client
disconnect detection. Both behaviors interact poorly with httptest in CI:
http.Client.Do blocks until the handler writes, and the test never
completes within the CI test deadline.

The other tests in this file (WS happy path, invalid-pattern cleanup,
backpressure disconnect, upgrade rejection) already cover the same code
paths, so skipping just this one is a clean win.

Tracked source fix: #358
… is fixed

The earlier deadlock was fixed in 7c81c53 by running the request in a
goroutine and publishing after the subscription registers. The follow-up
skip in c8992cd was redundant — the restructured test passes locally
and in CI. Source-side flush refactor still tracked in #358.
@santoshkumarradha santoshkumarradha force-pushed the chore/test-coverage-improvements branch from c293147 to 572743a Compare April 8, 2026 03:35
@santoshkumarradha
Member Author

santoshkumarradha commented Apr 8, 2026

Update: this PR is now properly rebased onto the latest main and includes the control-plane, web UI, and SDK test/coverage work we had been carrying on the related coverage branches.

The stale-base issues from v0.1.65-rc.3 are resolved here, the post-#345 / post-#359 test drift has been fixed, and the full GitHub Actions matrix is now green again, including linux-tests, control-plane-image, both functional test jobs, and the SDK CI jobs.

@AbirAbbas this should now reflect the current codebase and be in good shape for review.

@AbirAbbas AbirAbbas added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit cf922f9 Apr 8, 2026
35 checks passed
santoshkumarradha added a commit that referenced this pull request Apr 8, 2026
Main #350 ("Chore/UI audit phase1 quick wins") deleted ~14k lines of UI
components (HealthBadge, NodeDetailPage, NodesPage, AllReasonersPage,
EnhancedDashboardPage, ExecutionDetailPage, RedesignedExecutionDetailPage,
ObservabilityWebhookSettingsPage, EnhancedExecutionsTable, NodesVirtualList,
SkillsList, ReasonersSkillsTable, CompactExecutionsTable, AgentNodesTable,
LoadingSkeleton, AppLayout, EnhancedModal, ApproveWithContextDialog,
EnhancedWorkflowFlow, EnhancedWorkflowHeader, EnhancedWorkflowOverview,
EnhancedWorkflowEvents, EnhancedWorkflowIdentity, EnhancedWorkflowData,
WorkflowsTable, CompactWorkflowsTable, etc.).

35 test files added by PR #352 and waves 1/2 import these now-deleted
modules and break the build. They're removed here because:
- The components they exercise no longer exist on main.
- main's CI is currently red on the same import errors (control-plane-image
  + Functional Tests both fail at tsc -b on GeneralComponents.test.tsx and
  NodeDetailPage.test.tsx). This commit fixes that regression as a side
  effect.
- Two further tests (NewSettingsPage, RunsPage) failed at the vitest level
  on the post-#350 main but were never reached by main's CI because tsc
  errored first; they're removed too.

Web UI vitest now: 80 files / 353 tests / all green.
Coverage will be recovered against main's new component layout in a
follow-up commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
santoshkumarradha added a commit that referenced this pull request Apr 8, 2026
Third batch of test additions from parallel codex + gemini-2.5-pro headless
workers, focused on packages affected by main's #350 UI cleanup and main's
new internal/skillkit package.

Go control plane (per-package line coverage now):
  cli:           68.3 -> 82.1   (cli regressed earlier; recovered)
  handlers/ui:   71.2 -> 80.2   (target hit)
  skillkit:       0.0 -> 80.2   (new package from main #367)
  storage:       73.6 -> 79.5   (de-duplicated ptrTime helper)

Aggregate Go control plane: 78.13% -> 82.38%  (>= 80%)

Web UI (vitest, against post-#350 component layout):
  - Restored RunsPage and NewSettingsPage tests rewritten against the
    refactored sources (the original #352 versions failed against new main
    and were removed in commit 03dd44e).
  - New tests for: AppLayout, AppSidebar, RecentActivityStream, ExecutionForm
    branches, RunLifecycleMenu, dropdown-menu, status-pill, ui-modals,
    notification, TimelineNodeCard, CompactWorkflowInputOutput,
    ExecutionScatterPlot, useDashboardTimeRange, use-mobile.

Aggregate Web UI lines: 69.71% -> 81.14%  (>= 80%)

============================
COMBINED REPO COVERAGE: 81.60%
============================

435 / 435 vitest tests passing across 97 files.
All Go packages compiling and passing go test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants