Apply 18 audit fixes: DocC, concurrency, tests, lint by christopherkarani · Pull Request #78 · christopherkarani/Swarm

christopherkarani · 2026-04-30T23:04:58Z

Summary

Resolves all P1, P2, and P3 findings from the framework audit. Three buckets:

DocC + README drift — the namespace-level DocC in Swarm.swift advertised the deprecated Agent.Builder() pattern with non-throwing initializers. The first thing every Xcode user sees was wrong. Now matches the V3 trailing-closure canonical.
Latent concurrency risks — .timeout(after:) on a stream finished the consumer continuation but left the upstream processingTask running, leaking tokens / network requests past the user-visible timeout. Workflow.Step and MergeStrategy carried @unchecked Sendable that Swift 6.2 no longer needs (auto-derivation works). Lock-without-defer asymmetries hardened.
Coverage gaps — Tests/SwarmTests/Workflow/ only contained WorkflowDurableTests.swift. The canonical workflow API (sequential, parallel + every merge strategy, route, repeatUntil, composition) had no anchored test suite — coverage was scattered across integration and regression tests. Now anchored in WorkflowTests.swift.

Also bundles pre-existing in-flight maintenance work that was uncommitted in the working tree at audit time (additional regression tests, package.json for VitePress, .swiftformat, .swiftlint.yml, CI tweaks). The audit-specific changes are listed below; the rest are routine maintenance the worktree was already carrying.

What changed (audit fixes)

P1 — Materially misleading public surface

Sources/Swarm/Swarm.swift: Rewrote namespace DocC. Removed the Agent.Builder() reference; both code blocks now use try correctly with the throwing initializers.
docs/prompts/*.md — git rm --cached on two files that leaked past .gitignore despite the rule.
README.md — Added "Each agent resolves its own provider" sentence under the multi-agent pipeline example.

P2 — Latent risks

Sources/Swarm/Core/StreamOperations.swift — .timeout(after:) now cancels the upstream processingTask when the timeout fires. Introduces a small private TimeoutTaskRef (NSLock-guarded, scoped to the function) so the timeout task can hold a forward reference. Without this, upstream LLM calls / network requests continued past the consumer-visible timeout.
Sources/Swarm/Workflow/Workflow.swift — Removed @unchecked Sendable from Workflow.Step and MergeStrategy. Swift 6.2 auto-derives Sendable since AgentRuntime: Sendable and the closures are @Sendable. The @unchecked suppression is gone — the compiler now enforces.
Sources/Swarm/Providers/MultiProvider.swift — Documented snapshot semantics on nonisolated public func stream(...): provider is captured at call time, setModel() does not redirect in-flight streams.
Sources/Swarm/MCP/MCPClient.swift — Documented dedup invariants. Actor isolation makes the originally-flagged race benign; the comment guards future readers from re-flagging.
Sources/Swarm/Tools/ParallelToolExecutor.swift — Documented dual-validation contract (pre-pass for fast-fail, per-call re-validation in the task group; both layers protect).
Tests/SwarmTests/Workflow/WorkflowTests.swift (new) — Canonical 12-test suite covering sequential, parallel × .structured/.indexed/.first/.custom, route, repeatUntil, composition.
Tests/SwarmCapabilityShowcaseTests/CapabilityShowcaseTests.swift — Strengthened to assert per-scenario invariants (non-empty id/name/summary/families) and that every required capability family is covered by at least one passing scenario, not just trusting the showcase's self-reported status.
Tests/SwarmTests/Mocks/MockInferenceProvider.swift — Replaced Task.sleep(for: .milliseconds(1)) with Task.yield() for deterministic streaming.
Tests/SwarmTests/Core/StreamOperationsTests.swift — Added timeoutCancelsUpstream regression test asserting upstream onTermination fires within 500ms of the timeout — the deterministic signal that proves processingTask was cancelled.
README.md — Streaming example now includes .lifecycle(.failed) case and a comment explaining what default: break elides.
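The upstream-cancellation fix in StreamOperations.swift can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `StreamTimeout`, `TaskRef`, and `withTimeout` are hypothetical names standing in for `AgentError.timeout`, the private `TimeoutTaskRef`, and `.timeout(after:)`.

```swift
import Foundation

/// Hypothetical stand-in for the framework's timeout error.
struct StreamTimeout: Error {}

/// Lock-guarded box: lets the timeout task reach a task created after it.
final class TaskRef: @unchecked Sendable {
    private let lock = NSLock()
    private var stored: Task<Void, Never>?

    func set(_ task: Task<Void, Never>) {
        lock.lock()
        defer { lock.unlock() }
        stored = task
    }

    func get() -> Task<Void, Never>? {
        lock.lock()
        defer { lock.unlock() }
        return stored
    }
}

/// Forwards `upstream` and fails with `StreamTimeout` once `duration` elapses,
/// cancelling the upstream consumer so it stops doing work.
func withTimeout<Element: Sendable>(
    _ upstream: AsyncThrowingStream<Element, Error>,
    after duration: Duration
) -> AsyncThrowingStream<Element, Error> {
    AsyncThrowingStream { continuation in
        let processingRef = TaskRef()

        let timeoutTask = Task {
            // A cancelled sleep means upstream finished first: do nothing.
            do { try await Task.sleep(for: duration) } catch { return }
            processingRef.get()?.cancel()
            continuation.finish(throwing: StreamTimeout())
        }

        let processingTask = Task {
            do {
                for try await element in upstream { continuation.yield(element) }
                // If the timeout cancelled us, it also reports the error.
                guard !Task.isCancelled else { return }
                timeoutTask.cancel()
                continuation.finish()
            } catch {
                timeoutTask.cancel()
                continuation.finish(throwing: error)
            }
        }
        processingRef.set(processingTask)

        // Consumer went away: stop both tasks.
        continuation.onTermination = { _ in
            timeoutTask.cancel()
            processingTask.cancel()
        }
    }
}
```

Cancelling the task that iterates an `AsyncThrowingStream` terminates that stream, which is why an upstream `onTermination` callback is a usable signal in a regression test.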

P3 — Polish

Sources/Swarm/Providers/MultiProvider.swift + Sources/Swarm/Agents/Agent.swift — defer { lock.unlock() } symmetry in CapabilitySnapshot.store, ProviderSnapshot.store, TimedOperationCoordinator.{install, setOperationTask, setTimeoutTask}.
Sources/Swarm/Agents/Agent.swift — @_disfavoredOverload on the two existential-tool initializers (tools: [any AnyJSONTool]) so autocomplete prefers the typed tools: [some Tool] path.
README.md — Linux / FoundationModels callout near the on-device example.
docs/.vitepress/config.ts — Registered svg as an alias for xml in shiki config; eliminates 6× build-time warnings.
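For reference, the `defer { lock.unlock() }` symmetry looks roughly like this. `SnapshotBox` is a hypothetical type used only for illustration; it is not `CapabilitySnapshot` or `ProviderSnapshot` from the codebase.

```swift
import Foundation

/// Hypothetical lock-guarded value holder illustrating the pattern.
final class SnapshotBox<Value: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) { self.value = value }

    /// Write path: unlock is registered right after lock, so later edits
    /// (early returns, throwing calls) cannot skip it.
    func store(_ newValue: Value) {
        lock.lock()
        defer { lock.unlock() }
        value = newValue
    }

    /// Read path uses the same shape as the write path.
    func load() -> Value {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}
```

A plain assignment between `lock()` and `unlock()` cannot throw, so the non-`defer` form was not a deadlock; the change only keeps every critical section in the same shape.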

What changed (in-flight worktree work also bundled)

These were uncommitted in the worktree before the audit and are being committed alongside:

.swiftformat, .swiftlint.yml, scoped to Sources/Tests
package.json, package-lock.json (VitePress build pipeline)
scripts/ci/verify-remote-release.sh
New regression tests: FrameworkDXRegressionTests.swift, DocumentationFreshnessTests.swift, HandoffBehaviorTests.swift, MemoryClearGenerationTests.swift, VectorMemorySemanticRetrievalTests.swift, SemanticCompactorToolBehaviorTests.swift, WebSearchToolBehaviorTests.swift, ReadmeProviderCompileTests.swift, AgentRunGuardrailBehaviorTests.swift, PromptStringTestSupport.swift
Examples/CodeReviewer/Sources/CodeReviewer/main.swift (replaces Placeholder.swift)
CI workflow updates in .github/workflows/
Package.swift swift-syntax range widening
Doc updates in docs/guide/, docs/reference/

Why

The audit found that the framework's first-impression DocC inside Xcode advertised a deprecated pattern. That's the highest-leverage single fix.
Stream timeout was leaking upstream consumption — billing / rate-limit risk on real providers.
Workflow @unchecked Sendable was historical scar tissue masking conformance Swift 6.2 can now prove.
Adding a focused WorkflowTests suite gives maintainers a single anchor for parallel/route/repeatUntil regressions instead of chasing scenario tests.
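A small sketch of why the `@unchecked` could go, assuming the stored properties are themselves `Sendable`. `SketchRuntime`, `SketchStep`, and `SketchMerge` are hypothetical stand-ins, not the real `AgentRuntime`, `Workflow.Step`, or `MergeStrategy`.

```swift
/// Hypothetical protocol that refines Sendable, like AgentRuntime: Sendable.
protocol SketchRuntime: Sendable {
    func run(_ input: String) async throws -> String
}

/// Plain `Sendable`: the compiler checks every stored property.
/// An existential of a Sendable-refining protocol and an @Sendable closure both pass.
struct SketchStep: Sendable {
    let runtime: any SketchRuntime
    let transform: @Sendable (String) -> String
}

/// Enums are checked the same way, payload by payload.
enum SketchMerge: Sendable {
    case first
    case indexed
    case custom(@Sendable ([String]) -> String)
}
```

If a later change adds a non-`Sendable` property or payload, these declarations stop compiling, which is the enforcement that `@unchecked` was suppressing.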

Verification

| Gate | Result |
| --- | --- |
| `swift build` | ✅ pass (12.6s) |
| `SWARM_INCLUDE_DEMO=1 swift build` | ✅ pass |
| `swift test --no-parallel` | ✅ 1689 tests in 232 suites (was 1676/231) |
| `swiftlint --strict --config .swiftlint.yml Sources Tests` | ✅ pass |
| `swiftformat Sources Tests --lint --config .swiftformat` | ✅ pass |
| `npm run docs:build` | ✅ pass (SVG warnings gone) |

Test plan

Verify CI green on macOS-15 + Ubuntu (Linux) jobs
Confirm Hive integration tests pass with SWARM_HIVE_RUNTIME=1 SWARM_INCLUDE_HIVE=1
Confirm swift run SwarmCapabilityShowcase matrix runs end-to-end
Spot-check Xcode quick-help on import Swarm shows the new V3 quick-start
Confirm docs/prompts/*.md no longer in git ls-files

Reviewer notes

Subagent A's "lock-without-defer is P1 deadlock" claim was a false positive — Swift assignment cannot throw. The fix is style symmetry, downgraded to P3.
Subagent B's ".tool(.completed(let call, _)) won't compile" claim was a false positive — Swift allows positional binding of labeled enum cases.
Subagent C's "150 Issue.record sites don't fail tests" claim was wrong — Issue.record(_:) is the Swift Testing equivalent of XCTFail. Existing tests are correct; converting to #expect(throws:) is a separate stylistic cleanup, deferred.
The Clock-injection refactor (deterministic CircuitBreaker / RetryPolicy tests) and the XCTest → Swift Testing migration are deferred as larger cleanups — not included here.
.agents/ directory in the worktree is intentionally not committed (local AI tooling config, analogous to the gitignored .claude/).
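To illustrate the second false positive: a pattern may bind the associated values of a labeled enum case positionally, with the labels omitted. The enum below is a hypothetical reduction, not the framework's event type.

```swift
/// Hypothetical event type with labeled associated values.
enum SketchToolEvent {
    case started(call: String)
    case completed(call: String, result: String)
}

func describe(_ event: SketchToolEvent) -> String {
    switch event {
    case .started(let call):
        return "started \(call)"
    // Labels omitted: values bind by position, `_` ignores the result.
    case .completed(let call, _):
        return "completed \(call)"
    }
}
```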

Audit report

Full audit (findings, evidence, validation) is in the session transcript that produced these fixes.


Resolves all P1-P3 findings from the framework audit:

P1: Materially misleading public surface
- Rewrite Swarm.swift namespace DocC quick-start (was promoting deprecated Agent.Builder + non-throwing init).
- Untrack docs/prompts/*.md that leaked past .gitignore.
- Add provider-resolution sentence to README multi-agent example.

P2: Latent risks and weak coverage
- Stream timeout now cancels upstream processingTask so token/network consumption stops when the consumer-visible timeout fires.
- Remove @unchecked Sendable from Workflow.Step and MergeStrategy (Swift 6.2 auto-derives correctly given AgentRuntime: Sendable).
- Document MultiProvider snapshot semantics: in-flight streams capture provider at call time and are not redirected by setModel().
- Document MCPClient cache-dedup invariants (actor isolation makes the flagged race benign; comment guards against future re-flagging).
- Document ParallelToolExecutor dual-validation contract.
- Add canonical Tests/SwarmTests/Workflow/WorkflowTests.swift covering sequential, all 4 parallel merge strategies, route, repeatUntil, and composition.
- Strengthen CapabilityShowcaseTests to assert per-scenario invariants and family coverage instead of trusting status flag.
- Replace Task.sleep with Task.yield in MockInferenceProvider streaming.
- Add streaming default-case clarifying comment in README.
- Add timeout regression test asserting upstream onTermination fires.

P3: Polish
- defer { lock.unlock() } symmetry in CapabilitySnapshot, ProviderSnapshot, and TimedOperationCoordinator.
- @_disfavoredOverload on existential-tool Agent inits so autocomplete prefers the typed [some Tool] path.
- Linux/FoundationModels callout in README.
- Register svg as xml alias in VitePress config to silence build warnings.

Verification: swift build, swift test (1689 tests, was 1676), swiftlint --strict, swiftformat --lint, npm run docs:build, SWARM_INCLUDE_DEMO=1 swift build, all green.

Also bundles pre-existing in-flight tests, lint configs, package.json, docs updates, and CI tweaks that were uncommitted in the working tree at audit time.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f434e9561

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-30T23:10:05Z

+        if let globalProvider = await Swarm.defaultProvider,
+           let provider = privateInferenceProvider(globalProvider)
+        {
+            return provider
+        }


Include cloudProvider in privacy-required resolution path

When privacyRequired is enabled, this resolver checks Foundation Models, explicit provider, environment provider, and Swarm.defaultProvider, then immediately fails; it never considers Swarm.cloudProvider. That regresses valid setups where the only configured private-capable provider is the cloud provider (especially tool/handoff flows), causing inferenceProviderUnavailable even though a compliant provider exists.
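The resolution order this comment asks for can be sketched as a filtered fallback chain. Everything here is hypothetical (`SketchProvider`, `resolvePrivateProvider`, and the boolean capability flag); the real resolver also checks Foundation Models and an environment provider, and uses `privateInferenceProvider(_:)` as its filter.

```swift
/// Hypothetical provider model with a single capability flag.
struct SketchProvider: Equatable {
    let name: String
    let supportsPrivateInference: Bool
}

/// Walks the configured providers in priority order and returns the first
/// one that reports private inference, or nil if none qualifies.
func resolvePrivateProvider(
    explicit: SketchProvider?,
    defaultProvider: SketchProvider?,
    cloudProvider: SketchProvider?
) -> SketchProvider? {
    // cloudProvider is the last fallback, and passes the same filter as the rest.
    let candidates = [explicit, defaultProvider, cloudProvider]
    return candidates
        .compactMap { $0 }
        .first(where: { $0.supportsPrivateInference })
}
```

With only a privacy-capable cloud provider configured, the chain returns it instead of failing.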


chatgpt-codex-connector · 2026-04-30T23:10:05Z

+                await withCheckedContinuation { continuation in
+                    waiters[key, default: []].append(continuation)
+                }


Honor cancellation while waiting for default-memory session lock

This wait path parks canceled tasks in waiters and resumes them later without any cancellation check. If a run is canceled while queued behind another session, it can still wake up, claim the run slot, and perform side effects (including memory clearing and startup work) before cancellation is observed deeper in execution. The wait needs to be cancellation-aware (or immediately re-check cancellation after resume).


Fix #1 (P1): `resolvedPrivateInferenceProvider` now consults `Swarm.cloudProvider` after `Swarm.defaultProvider`, mirroring the non-private resolver. The same `privateInferenceProvider(_:)` capability filter ensures we only return it if it reports `.privateInference`. Previously, an operator who configured a privacy-capable provider via `Swarm.configure(cloudProvider: ...)` would still hit `inferenceProviderUnavailable` when `privacyRequired = true`, even though a compliant provider was registered.

Fix #2 (P2): `DefaultMemorySessionTracker.beginRun` is now cancellation-aware:
- signature changed to `async throws -> Bool`
- waiters re-keyed by per-call UUID so cancellation can target the exact parked task without disturbing siblings
- continuation type widened to `CheckedContinuation<Void, Error>` so cancellation can resume with `CancellationError`
- `withTaskCancellationHandler` wakes a parked task immediately on cancel rather than letting it sleep until the holder calls `endRun`
- `Task.checkCancellation()` at top of wait loop *and* after exit, closing the race where `endRun` resumes a continuation just as the parent task is cancelled

Without these, a cancelled task could wake up later, claim the run slot, and perform memory-clear / lifecycle side effects before the cancellation surfaced deeper in execution.

Caller in `runInternal` updated to `try await` the new throwing signature. Regression tests in `Tests/SwarmTests/Agents/AgentCodexReviewFixesTests.swift` cover both fixes.

Verification: swift build, swift test (1692 tests, was 1689), swiftlint --strict, swiftformat --lint, all green.
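A simplified sketch of the cancellation-aware wait described in Fix #2. `SessionGate` is a hypothetical single-slot actor: it drops the per-session keying and the `Bool` return of the real `DefaultMemorySessionTracker.beginRun`, and keeps only the UUID-keyed waiters, the cancellation handler, and the two `Task.checkCancellation()` calls.

```swift
import Foundation

/// Hypothetical single-slot gate; not the framework's tracker.
actor SessionGate {
    private var busy = false
    private var waiters: [UUID: CheckedContinuation<Void, Error>] = [:]

    func beginRun() async throws {
        while busy {
            try Task.checkCancellation()          // do not park if already cancelled
            let id = UUID()                       // lets cancellation target this waiter only
            try await withTaskCancellationHandler {
                try await withCheckedThrowingContinuation {
                    (continuation: CheckedContinuation<Void, Error>) in
                    waiters[id] = continuation
                }
            } onCancel: {
                // Runs off the actor; hop back in to resume the parked waiter.
                Task { await self.cancelWaiter(id) }
            }
        }
        try Task.checkCancellation()              // woken by endRun but cancelled meanwhile
        busy = true
    }

    func endRun() {
        busy = false
        if let next = waiters.first {
            waiters[next.key] = nil
            next.value.resume()
        }
    }

    private func cancelWaiter(_ id: UUID) {
        // Removing before resuming guarantees each continuation resumes once.
        waiters.removeValue(forKey: id)?.resume(throwing: CancellationError())
    }
}
```

Because the actor serializes `endRun` and `cancelWaiter`, whichever runs first removes the continuation from the dictionary, so a waiter is never resumed twice.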

Resolves conflicts:
- Sources/Swarm/Agents/Agent.swift: keep both `resolvedPrivateInferenceProvider` (Codex Fix #1) and `transformedInferenceProvider` (OpenTelemetry tracing from #77). Wrap each return in the private resolver through the new transform helper so privacy-required runs also receive OpenTelemetry instrumentation.
- docs/reference/overview.md: keep validated /reference/* link paths (verified by `npm run docs:build`); adopt the OpenTelemetry mention in the Observability row and the MultiProvider mention in the Providers row from origin/main.
- docs/reference/api-catalog.md: bump scanned-source-file count from 152 to 153 to account for the new InferenceProviderMetadata.swift file introduced by the merged OpenTelemetry work, satisfying DocumentationFreshnessTests.

Verification: swift build, swift test (1699 tests, was 1692), swiftlint --strict, swiftformat --lint, all green.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b3ef47d44


chatgpt-codex-connector · 2026-05-01T00:15:49Z

+            try? await Task.sleep(for: duration)
+            processingTaskRef.task?.cancel()
            continuation.finish(throwing: AgentError.timeout(duration: duration))


Return on cancelled timeout task before finishing with timeout

The timeout worker now uses try? await Task.sleep(for:), which swallows CancellationError and still executes continuation.finish(throwing: AgentError.timeout(...)). When the upstream stream completes first, it cancels timeoutTask; this cancelled timeout task can still race and attempt to terminate the stream as a timeout, producing an incorrect timeout failure for a run that actually completed normally. Treat cancellation as an early return in the timeout task (instead of try?) so only real deadline expiry emits the timeout error.


Codex review on 2b3ef47 flagged that the timeout body used `try? await Task.sleep(for:)`, which swallows `CancellationError`. When the upstream stream completes first, `processingTask` cancels `timeoutTask`; with `try?`, the cancelled timeout body would still run to completion, calling `processingTaskRef.task?.cancel()` (harmless no-op) and then `continuation.finish(throwing: AgentError.timeout(...))` in a tight race against the upstream's natural `continuation.finish()`. The first finish wins, so a run that completed normally could surface as a timeout error.

Switching back to plain `try` lets `CancellationError` propagate out of the Task body, so the timeout side-effects only run when the deadline actually expired (sleep returned successfully). The added comment documents the contract so future edits don't reintroduce the regression.

Test: existing `timeoutCancelsUpstream` regression test still passes. It exercises the real-deadline path, which is unchanged.
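The difference between the two forms can be reproduced in isolation. This is a hypothetical demo (`FiredFlag` and `sideEffectRuns` are illustration-only names), where setting the flag stands in for finishing the stream with a timeout error.

```swift
import Foundation

/// Lock-guarded flag recording whether the post-sleep side effect ran.
final class FiredFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var value = false
    func set() { lock.lock(); defer { lock.unlock() }; value = true }
    var isSet: Bool { lock.lock(); defer { lock.unlock() }; return value }
}

/// Starts a sleeper, cancels it immediately, and reports whether the code
/// after the sleep still executed.
func sideEffectRuns(swallowingCancellation: Bool) async -> Bool {
    let fired = FiredFlag()
    let task = Task {
        if swallowingCancellation {
            // CancellationError is discarded, so execution falls through.
            try? await Task.sleep(for: .seconds(5))
        } else {
            // CancellationError propagates and exits the task body here.
            try await Task.sleep(for: .seconds(5))
        }
        fired.set()
    }
    task.cancel()
    _ = await task.result   // wait for the body to finish either way
    return fired.isSet
}
```

Calling `sideEffectRuns(swallowingCancellation: true)` reports that the side effect ran despite the cancellation, while the plain `try` variant reports that it did not.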

christopherkarani added 2 commits April 25, 2026 20:38

Update Hive integration to 0.2.0

e6a31b4

chatgpt-codex-connector Bot reviewed Apr 30, 2026


christopherkarani added 2 commits May 1, 2026 03:03

chatgpt-codex-connector Bot reviewed May 1, 2026


christopherkarani merged commit eedae7a into main May 1, 2026
0 of 4 checks passed

christopherkarani merged 5 commits into main from audit/fix-identified-issues
