Apply 18 audit fixes: DocC, concurrency, tests, lint#78
Conversation
Resolves all P1-P3 findings from the framework audit: P1 — Materially misleading public surface - Rewrite Swarm.swift namespace DocC quick-start (was promoting deprecated Agent.Builder + non-throwing init). - Untrack docs/prompts/*.md that leaked past .gitignore. - Add provider-resolution sentence to README multi-agent example. P2 — Latent risks and weak coverage - Stream timeout now cancels upstream processingTask so token/network consumption stops when the consumer-visible timeout fires. - Remove @unchecked Sendable from Workflow.Step and MergeStrategy (Swift 6.2 auto-derives correctly given AgentRuntime: Sendable). - Document MultiProvider snapshot semantics: in-flight streams capture provider at call time and are not redirected by setModel(). - Document MCPClient cache-dedup invariants (actor isolation makes the flagged race benign; comment guards against future re-flagging). - Document ParallelToolExecutor dual-validation contract. - Add canonical Tests/SwarmTests/Workflow/WorkflowTests.swift covering sequential, all 4 parallel merge strategies, route, repeatUntil, and composition. Strengthen CapabilityShowcaseTests to assert per-scenario invariants and family coverage instead of trusting status flag. - Replace Task.sleep with Task.yield in MockInferenceProvider streaming. - Add streaming default-case clarifying comment in README. - Add timeout regression test asserting upstream onTermination fires. P3 — Polish - defer { lock.unlock() } symmetry in CapabilitySnapshot, ProviderSnapshot, and TimedOperationCoordinator. - @_disfavoredOverload on existential-tool Agent inits so autocomplete prefers the typed [some Tool] path. - Linux/FoundationModels callout in README. - Register svg as xml alias in VitePress config to silence build warnings. Verification: swift build, swift test (1689 tests, was 1676), swiftlint --strict, swiftformat --lint, npm run docs:build, SWARM_INCLUDE_DEMO=1 swift build — all green. Also bundles pre-existing in-flight tests, lint configs, package.json, docs updates, and CI tweaks that were uncommitted in the working tree at audit time.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f434e9561
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let globalProvider = await Swarm.defaultProvider, | ||
| let provider = privateInferenceProvider(globalProvider) | ||
| { | ||
| return provider | ||
| } |
There was a problem hiding this comment.
Include cloudProvider in privacy-required resolution path
When privacyRequired is enabled, this resolver checks Foundation Models, explicit provider, environment provider, and Swarm.defaultProvider, then immediately fails; it never considers Swarm.cloudProvider. That regresses valid setups where the only configured private-capable provider is the cloud provider (especially tool/handoff flows), causing inferenceProviderUnavailable even though a compliant provider exists.
Useful? React with 👍 / 👎.
| await withCheckedContinuation { continuation in | ||
| waiters[key, default: []].append(continuation) | ||
| } |
There was a problem hiding this comment.
Honor cancellation while waiting for default-memory session lock
This wait path parks canceled tasks in waiters and resumes them later without any cancellation check. If a run is canceled while queued behind another session, it can still wake up, claim the run slot, and perform side effects (including memory clearing and startup work) before cancellation is observed deeper in execution. The wait needs to be cancellation-aware (or immediately re-check cancellation after resume).
Useful? React with 👍 / 👎.
Fix #1 (P1) — `resolvedPrivateInferenceProvider` now consults `Swarm.cloudProvider` after `Swarm.defaultProvider`, mirroring the non-private resolver. The same `privateInferenceProvider(_:)` capability filter ensures we only return it if it reports `.privateInference`. Previously, an operator who configured a privacy-capable provider via `Swarm.configure(cloudProvider: ...)` would still hit `inferenceProviderUnavailable` when `privacyRequired = true`, even though a compliant provider was registered. Fix #2 (P2) — `DefaultMemorySessionTracker.beginRun` is now cancellation- aware: - signature changed to `async throws -> Bool` - waiters re-keyed by per-call UUID so cancellation can target the exact parked task without disturbing siblings - continuation type widened to `CheckedContinuation<Void, Error>` so cancellation can resume with `CancellationError` - `withTaskCancellationHandler` wakes a parked task immediately on cancel rather than letting it sleep until the holder calls `endRun` - `Task.checkCancellation()` at top of wait loop *and* after exit, closing the race where `endRun` resumes a continuation just as the parent task is cancelled Without these, a cancelled task could wake up later, claim the run slot, and perform memory-clear / lifecycle side effects before the cancellation surfaced deeper in execution. Caller in `runInternal` updated to `try await` the new throwing signature. Regression tests in `Tests/SwarmTests/Agents/AgentCodexReviewFixesTests.swift` cover both fixes. Verification: swift build, swift test (1692 tests, was 1689), swiftlint --strict, swiftformat --lint — all green.
Resolves conflicts: - Sources/Swarm/Agents/Agent.swift — keep both `resolvedPrivateInferenceProvider` (Codex Fix #1) and `transformedInferenceProvider` (OpenTelemetry tracing from #77). Wrap each return in the private resolver through the new transform helper so privacy-required runs also receive OpenTelemetry instrumentation. - docs/reference/overview.md — keep validated /reference/* link paths (verified by `npm run docs:build`); adopt the OpenTelemetry mention in the Observability row and the MultiProvider mention in the Providers row from origin/main. - docs/reference/api-catalog.md — bump scanned-source-file count from 152 to 153 to account for the new InferenceProviderMetadata.swift file introduced by the merged OpenTelemetry work, satisfying DocumentationFreshnessTests. Verification: swift build, swift test (1699 tests, was 1692), swiftlint --strict, swiftformat --lint — all green.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b3ef47d44
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try? await Task.sleep(for: duration) | ||
| processingTaskRef.task?.cancel() | ||
| continuation.finish(throwing: AgentError.timeout(duration: duration)) |
There was a problem hiding this comment.
Return on cancelled timeout task before finishing with timeout
The timeout worker now uses try? await Task.sleep(for:), which swallows CancellationError and still executes continuation.finish(throwing: AgentError.timeout(...)). When the upstream stream completes first, it cancels timeoutTask; this cancelled timeout task can still race and attempt to terminate the stream as a timeout, producing an incorrect timeout failure for a run that actually completed normally. Treat cancellation as an early return in the timeout task (instead of try?) so only real deadline expiry emits the timeout error.
Useful? React with 👍 / 👎.
Codex review on 2b3ef47 flagged that the timeout body used `try? await Task.sleep(for:)`, which swallows `CancellationError`. When the upstream stream completes first, `processingTask` cancels `timeoutTask`; with `try?`, the cancelled timeout body would still run to completion — calling `processingTaskRef.task?.cancel()` (harmless no-op) and then `continuation.finish(throwing: AgentError.timeout(...))` in a tight race against the upstream's natural `continuation.finish()`. The first finish wins, so a run that completed normally could surface as a timeout error. Switching back to plain `try` lets `CancellationError` propagate out of the Task body, so the timeout side-effects only run when the deadline actually expired (sleep returned successfully). The added comment documents the contract so future edits don't reintroduce the regression. Test: existing `timeoutCancelsUpstream` regression test still passes — it exercises the real-deadline path, which is unchanged.
Summary
Resolves all P1, P2, and P3 findings from the framework audit. Three buckets:
Swarm.swiftadvertised the deprecatedAgent.Builder()pattern with non-throwing initializers. The first thing every Xcode user sees was wrong. Now matches the V3 trailing-closure canonical..timeout(after:)on a stream finished the consumer continuation but left the upstreamprocessingTaskrunning, leaking tokens / network requests past the user-visible timeout.Workflow.StepandMergeStrategycarried@unchecked Sendablethat Swift 6.2 no longer needs (auto-derivation works). Lock-without-defer asymmetries hardened.Tests/SwarmTests/Workflow/only containedWorkflowDurableTests.swift. The canonical workflow API (sequential, parallel + every merge strategy, route, repeatUntil, composition) had no anchored test suite — coverage was scattered across integration and regression tests. Now anchored inWorkflowTests.swift.Also bundles pre-existing in-flight maintenance work that was uncommitted in the working tree at audit time (additional regression tests,
package.jsonfor VitePress,.swiftformat,.swiftlint.yml, CI tweaks). The audit-specific changes are listed below; the rest are routine maintenance the worktree was already carrying.What changed (audit fixes)
P1 — Materially misleading public surface
Sources/Swarm/Swarm.swift— Rewrote namespace DocC. RemovedAgent.Builder()reference; both code blocks nowtrycorrectly.docs/prompts/*.md—git rm --cachedon two files that leaked past.gitignoredespite the rule.README.md— Added "Each agent resolves its own provider" sentence under the multi-agent pipeline example.P2 — Latent risks
Sources/Swarm/Core/StreamOperations.swift—.timeout(after:)now cancels the upstreamprocessingTaskwhen the timeout fires. Introduces a small privateTimeoutTaskRef(NSLock-guarded, scoped to the function) so the timeout task can hold a forward reference. Without this, upstream LLM calls / network requests continued past the consumer-visible timeout.Sources/Swarm/Workflow/Workflow.swift— Removed@unchecked SendablefromWorkflow.StepandMergeStrategy. Swift 6.2 auto-derivesSendablesinceAgentRuntime: Sendableand the closures are@Sendable. The@uncheckedsuppression is gone — the compiler now enforces.Sources/Swarm/Providers/MultiProvider.swift— Documented snapshot semantics onnonisolated public func stream(...): provider is captured at call time,setModel()does not redirect in-flight streams.Sources/Swarm/MCP/MCPClient.swift— Documented dedup invariants. Actor isolation makes the originally-flagged race benign; the comment guards future readers from re-flagging.Sources/Swarm/Tools/ParallelToolExecutor.swift— Documented dual-validation contract (pre-pass for fast-fail, per-call re-validation in the task group; both layers protect).Tests/SwarmTests/Workflow/WorkflowTests.swift(new) — Canonical 12-test suite covering sequential, parallel ×.structured/.indexed/.first/.custom, route, repeatUntil, composition.Tests/SwarmCapabilityShowcaseTests/CapabilityShowcaseTests.swift— Strengthened to assert per-scenario invariants (non-emptyid/name/summary/families) and that every required capability family is covered by at least one passing scenario, not just trusting the showcase's self-reported status.Tests/SwarmTests/Mocks/MockInferenceProvider.swift— ReplacedTask.sleep(for: .milliseconds(1))withTask.yield()for deterministic streaming.Tests/SwarmTests/Core/StreamOperationsTests.swift— AddedtimeoutCancelsUpstreamregression test asserting upstreamonTerminationfires within 500ms of the timeout — the deterministic signal that proves processingTask was cancelled.README.md— Streaming example now includes.lifecycle(.failed)case and a comment explaining whatdefault: breakelides.P3 — Polish
Sources/Swarm/Providers/MultiProvider.swift+Sources/Swarm/Agents/Agent.swift—defer { lock.unlock() }symmetry inCapabilitySnapshot.store,ProviderSnapshot.store,TimedOperationCoordinator.{install, setOperationTask, setTimeoutTask}.Sources/Swarm/Agents/Agent.swift—@_disfavoredOverloadon the two existential-tool initializers (tools: [any AnyJSONTool]) so autocomplete prefers the typedtools: [some Tool]path.README.md— Linux / FoundationModels callout near the on-device example.docs/.vitepress/config.ts— Registeredsvgas an alias forxmlin shiki config; eliminates 6× build-time warnings.What changed (in-flight worktree work also bundled)
These were uncommitted in the worktree before the audit and are being committed alongside:
.swiftformat,.swiftlint.yml, scoped to Sources/Testspackage.json,package-lock.json(VitePress build pipeline)scripts/ci/verify-remote-release.shFrameworkDXRegressionTests.swift,DocumentationFreshnessTests.swift,HandoffBehaviorTests.swift,MemoryClearGenerationTests.swift,VectorMemorySemanticRetrievalTests.swift,SemanticCompactorToolBehaviorTests.swift,WebSearchToolBehaviorTests.swift,ReadmeProviderCompileTests.swift,AgentRunGuardrailBehaviorTests.swift,PromptStringTestSupport.swiftExamples/CodeReviewer/Sources/CodeReviewer/main.swift(replacesPlaceholder.swift).github/workflows/Package.swiftswift-syntax range wideningdocs/guide/,docs/reference/Why
@unchecked Sendablewas historical scar tissue masking conformance Swift 6.2 can now prove.WorkflowTestssuite gives maintainers a single anchor for parallel/route/repeatUntil regressions instead of chasing scenario tests.Verification
swift buildSWARM_INCLUDE_DEMO=1 swift buildswift test --no-parallelswiftlint --strict --config .swiftlint.yml Sources Testsswiftformat Sources Tests --lint --config .swiftformatnpm run docs:buildTest plan
SWARM_HIVE_RUNTIME=1 SWARM_INCLUDE_HIVE=1swift run SwarmCapabilityShowcase matrixruns end-to-endimport Swarmshows the new V3 quick-startdocs/prompts/*.mdno longer ingit ls-filesReviewer notes
.tool(.completed(let call, _))won't compile" claim was a false positive — Swift allows positional binding of labeled enum cases.Issue.recordsites don't fail tests" claim was wrong —Issue.record(_:)is the Swift Testing equivalent ofXCTFail. Existing tests are correct; converting to#expect(throws:)is a separate stylistic cleanup, deferred..agents/directory in the worktree is intentionally not committed (local AI tooling config, analogous to the gitignored.claude/).Audit report
Full audit (findings, evidence, validation) is in the session transcript that produced these fixes.