Skip to content

Apply 18 audit fixes: DocC, concurrency, tests, lint#78

Merged
christopherkarani merged 5 commits into
mainfrom
audit/fix-identified-issues
May 1, 2026
Merged

Apply 18 audit fixes: DocC, concurrency, tests, lint#78
christopherkarani merged 5 commits into
mainfrom
audit/fix-identified-issues

Conversation

@christopherkarani
Copy link
Copy Markdown
Owner

Summary

Resolves all P1, P2, and P3 findings from the framework audit. Three buckets:

  1. DocC + README drift — the namespace-level DocC in Swarm.swift advertised the deprecated Agent.Builder() pattern with non-throwing initializers. The first thing every Xcode user sees was wrong. Now matches the V3 trailing-closure canonical.
  2. Latent concurrency risks.timeout(after:) on a stream finished the consumer continuation but left the upstream processingTask running, leaking tokens / network requests past the user-visible timeout. Workflow.Step and MergeStrategy carried @unchecked Sendable that Swift 6.2 no longer needs (auto-derivation works). Lock-without-defer asymmetries hardened.
  3. Coverage gapsTests/SwarmTests/Workflow/ only contained WorkflowDurableTests.swift. The canonical workflow API (sequential, parallel + every merge strategy, route, repeatUntil, composition) had no anchored test suite — coverage was scattered across integration and regression tests. Now anchored in WorkflowTests.swift.

Also bundles pre-existing in-flight maintenance work that was uncommitted in the working tree at audit time (additional regression tests, package.json for VitePress, .swiftformat, .swiftlint.yml, CI tweaks). The audit-specific changes are listed below; the rest are routine maintenance the worktree was already carrying.

What changed (audit fixes)

P1 — Materially misleading public surface

  • Sources/Swarm/Swarm.swift — Rewrote namespace DocC. Removed Agent.Builder() reference; both code blocks now try correctly.
  • docs/prompts/*.mdgit rm --cached on two files that leaked past .gitignore despite the rule.
  • README.md — Added "Each agent resolves its own provider" sentence under the multi-agent pipeline example.

P2 — Latent risks

  • Sources/Swarm/Core/StreamOperations.swift.timeout(after:) now cancels the upstream processingTask when the timeout fires. Introduces a small private TimeoutTaskRef (NSLock-guarded, scoped to the function) so the timeout task can hold a forward reference. Without this, upstream LLM calls / network requests continued past the consumer-visible timeout.
  • Sources/Swarm/Workflow/Workflow.swift — Removed @unchecked Sendable from Workflow.Step and MergeStrategy. Swift 6.2 auto-derives Sendable since AgentRuntime: Sendable and the closures are @Sendable. The @unchecked suppression is gone — the compiler now enforces.
  • Sources/Swarm/Providers/MultiProvider.swift — Documented snapshot semantics on nonisolated public func stream(...): provider is captured at call time, setModel() does not redirect in-flight streams.
  • Sources/Swarm/MCP/MCPClient.swift — Documented dedup invariants. Actor isolation makes the originally-flagged race benign; the comment guards future readers from re-flagging.
  • Sources/Swarm/Tools/ParallelToolExecutor.swift — Documented dual-validation contract (pre-pass for fast-fail, per-call re-validation in the task group; both layers protect).
  • Tests/SwarmTests/Workflow/WorkflowTests.swift (new) — Canonical 12-test suite covering sequential, parallel × .structured/.indexed/.first/.custom, route, repeatUntil, composition.
  • Tests/SwarmCapabilityShowcaseTests/CapabilityShowcaseTests.swift — Strengthened to assert per-scenario invariants (non-empty id/name/summary/families) and that every required capability family is covered by at least one passing scenario, not just trusting the showcase's self-reported status.
  • Tests/SwarmTests/Mocks/MockInferenceProvider.swift — Replaced Task.sleep(for: .milliseconds(1)) with Task.yield() for deterministic streaming.
  • Tests/SwarmTests/Core/StreamOperationsTests.swift — Added timeoutCancelsUpstream regression test asserting upstream onTermination fires within 500ms of the timeout — the deterministic signal that proves processingTask was cancelled.
  • README.md — Streaming example now includes .lifecycle(.failed) case and a comment explaining what default: break elides.

P3 — Polish

  • Sources/Swarm/Providers/MultiProvider.swift + Sources/Swarm/Agents/Agent.swiftdefer { lock.unlock() } symmetry in CapabilitySnapshot.store, ProviderSnapshot.store, TimedOperationCoordinator.{install, setOperationTask, setTimeoutTask}.
  • Sources/Swarm/Agents/Agent.swift@_disfavoredOverload on the two existential-tool initializers (tools: [any AnyJSONTool]) so autocomplete prefers the typed tools: [some Tool] path.
  • README.md — Linux / FoundationModels callout near the on-device example.
  • docs/.vitepress/config.ts — Registered svg as an alias for xml in shiki config; eliminates 6× build-time warnings.

What changed (in-flight worktree work also bundled)

These were uncommitted in the worktree before the audit and are being committed alongside:

  • .swiftformat, .swiftlint.yml, scoped to Sources/Tests
  • package.json, package-lock.json (VitePress build pipeline)
  • scripts/ci/verify-remote-release.sh
  • New regression tests: FrameworkDXRegressionTests.swift, DocumentationFreshnessTests.swift, HandoffBehaviorTests.swift, MemoryClearGenerationTests.swift, VectorMemorySemanticRetrievalTests.swift, SemanticCompactorToolBehaviorTests.swift, WebSearchToolBehaviorTests.swift, ReadmeProviderCompileTests.swift, AgentRunGuardrailBehaviorTests.swift, PromptStringTestSupport.swift
  • Examples/CodeReviewer/Sources/CodeReviewer/main.swift (replaces Placeholder.swift)
  • CI workflow updates in .github/workflows/
  • Package.swift swift-syntax range widening
  • Doc updates in docs/guide/, docs/reference/

Why

  • The audit found that the framework's first-impression DocC inside Xcode advertised a deprecated pattern. That's the highest-leverage single fix.
  • Stream timeout was leaking upstream consumption — billing / rate-limit risk on real providers.
  • Workflow @unchecked Sendable was historical scar tissue masking conformance Swift 6.2 can now prove.
  • Adding a focused WorkflowTests suite gives maintainers a single anchor for parallel/route/repeatUntil regressions instead of chasing scenario tests.

Verification

Gate Result
swift build ✅ pass (12.6s)
SWARM_INCLUDE_DEMO=1 swift build ✅ pass
swift test --no-parallel 1689 tests in 232 suites (was 1676/231)
swiftlint --strict --config .swiftlint.yml Sources Tests ✅ pass
swiftformat Sources Tests --lint --config .swiftformat ✅ pass
npm run docs:build ✅ pass — SVG warnings gone

Test plan

  • Verify CI green on macOS-15 + Ubuntu (Linux) jobs
  • Confirm Hive integration tests pass with SWARM_HIVE_RUNTIME=1 SWARM_INCLUDE_HIVE=1
  • Confirm swift run SwarmCapabilityShowcase matrix runs end-to-end
  • Spot-check Xcode quick-help on import Swarm shows the new V3 quick-start
  • Confirm docs/prompts/*.md no longer in git ls-files

Reviewer notes

  • Subagent A's "lock-without-defer is P1 deadlock" claim was a false positive — Swift assignment cannot throw. The fix is style symmetry, downgraded to P3.
  • Subagent B's ".tool(.completed(let call, _)) won't compile" claim was a false positive — Swift allows positional binding of labeled enum cases.
  • Subagent C's "150 Issue.record sites don't fail tests" claim was wrongIssue.record(_:) is the Swift Testing equivalent of XCTFail. Existing tests are correct; converting to #expect(throws:) is a separate stylistic cleanup, deferred.
  • The Clock-injection refactor (deterministic CircuitBreaker / RetryPolicy tests) and the XCTest → Swift Testing migration are deferred as larger cleanups — not included here.
  • .agents/ directory in the worktree is intentionally not committed (local AI tooling config, analogous to the gitignored .claude/).

Audit report

Full audit (findings, evidence, validation) is in the session transcript that produced these fixes.

Resolves all P1-P3 findings from the framework audit:

P1 — Materially misleading public surface
- Rewrite Swarm.swift namespace DocC quick-start (was promoting deprecated
  Agent.Builder + non-throwing init).
- Untrack docs/prompts/*.md that leaked past .gitignore.
- Add provider-resolution sentence to README multi-agent example.

P2 — Latent risks and weak coverage
- Stream timeout now cancels upstream processingTask so token/network
  consumption stops when the consumer-visible timeout fires.
- Remove @unchecked Sendable from Workflow.Step and MergeStrategy
  (Swift 6.2 auto-derives correctly given AgentRuntime: Sendable).
- Document MultiProvider snapshot semantics: in-flight streams capture
  provider at call time and are not redirected by setModel().
- Document MCPClient cache-dedup invariants (actor isolation makes the
  flagged race benign; comment guards against future re-flagging).
- Document ParallelToolExecutor dual-validation contract.
- Add canonical Tests/SwarmTests/Workflow/WorkflowTests.swift covering
  sequential, all 4 parallel merge strategies, route, repeatUntil, and
  composition. Strengthen CapabilityShowcaseTests to assert per-scenario
  invariants and family coverage instead of trusting status flag.
- Replace Task.sleep with Task.yield in MockInferenceProvider streaming.
- Add streaming default-case clarifying comment in README.
- Add timeout regression test asserting upstream onTermination fires.

P3 — Polish
- defer { lock.unlock() } symmetry in CapabilitySnapshot, ProviderSnapshot,
  and TimedOperationCoordinator.
- @_disfavoredOverload on existential-tool Agent inits so autocomplete
  prefers the typed [some Tool] path.
- Linux/FoundationModels callout in README.
- Register svg as xml alias in VitePress config to silence build warnings.

Verification: swift build, swift test (1689 tests, was 1676), swiftlint
--strict, swiftformat --lint, npm run docs:build, SWARM_INCLUDE_DEMO=1
swift build — all green.

Also bundles pre-existing in-flight tests, lint configs, package.json,
docs updates, and CI tweaks that were uncommitted in the working tree
at audit time.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f434e9561

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1003 to +1007
if let globalProvider = await Swarm.defaultProvider,
let provider = privateInferenceProvider(globalProvider)
{
return provider
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include cloudProvider in privacy-required resolution path

When privacyRequired is enabled, this resolver checks Foundation Models, explicit provider, environment provider, and Swarm.defaultProvider, then immediately fails; it never considers Swarm.cloudProvider. That regresses valid setups where the only configured private-capable provider is the cloud provider (especially tool/handoff flows), causing inferenceProviderUnavailable even though a compliant provider exists.

Useful? React with 👍 / 👎.

Comment thread Sources/Swarm/Agents/Agent.swift Outdated
Comment on lines +748 to +750
await withCheckedContinuation { continuation in
waiters[key, default: []].append(continuation)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor cancellation while waiting for default-memory session lock

This wait path parks canceled tasks in waiters and resumes them later without any cancellation check. If a run is canceled while queued behind another session, it can still wake up, claim the run slot, and perform side effects (including memory clearing and startup work) before cancellation is observed deeper in execution. The wait needs to be cancellation-aware (or immediately re-check cancellation after resume).

Useful? React with 👍 / 👎.

Fix #1 (P1) — `resolvedPrivateInferenceProvider` now consults
`Swarm.cloudProvider` after `Swarm.defaultProvider`, mirroring the
non-private resolver. The same `privateInferenceProvider(_:)` capability
filter ensures we only return it if it reports `.privateInference`.
Previously, an operator who configured a privacy-capable provider via
`Swarm.configure(cloudProvider: ...)` would still hit
`inferenceProviderUnavailable` when `privacyRequired = true`, even though
a compliant provider was registered.

Fix #2 (P2) — `DefaultMemorySessionTracker.beginRun` is now cancellation-
aware:

- signature changed to `async throws -> Bool`
- waiters re-keyed by per-call UUID so cancellation can target the exact
  parked task without disturbing siblings
- continuation type widened to `CheckedContinuation<Void, Error>` so
  cancellation can resume with `CancellationError`
- `withTaskCancellationHandler` wakes a parked task immediately on cancel
  rather than letting it sleep until the holder calls `endRun`
- `Task.checkCancellation()` at top of wait loop *and* after exit, closing
  the race where `endRun` resumes a continuation just as the parent task
  is cancelled

Without these, a cancelled task could wake up later, claim the run slot,
and perform memory-clear / lifecycle side effects before the cancellation
surfaced deeper in execution.

Caller in `runInternal` updated to `try await` the new throwing signature.

Regression tests in `Tests/SwarmTests/Agents/AgentCodexReviewFixesTests.swift`
cover both fixes.

Verification: swift build, swift test (1692 tests, was 1689), swiftlint
--strict, swiftformat --lint — all green.
Resolves conflicts:
- Sources/Swarm/Agents/Agent.swift — keep both `resolvedPrivateInferenceProvider`
  (Codex Fix #1) and `transformedInferenceProvider` (OpenTelemetry tracing
  from #77). Wrap each return in the private resolver through the new
  transform helper so privacy-required runs also receive OpenTelemetry
  instrumentation.
- docs/reference/overview.md — keep validated /reference/* link paths
  (verified by `npm run docs:build`); adopt the OpenTelemetry mention in
  the Observability row and the MultiProvider mention in the Providers row
  from origin/main.
- docs/reference/api-catalog.md — bump scanned-source-file count from 152
  to 153 to account for the new InferenceProviderMetadata.swift file
  introduced by the merged OpenTelemetry work, satisfying
  DocumentationFreshnessTests.

Verification: swift build, swift test (1699 tests, was 1692), swiftlint
--strict, swiftformat --lint — all green.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b3ef47d44

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 448 to 450
try? await Task.sleep(for: duration)
processingTaskRef.task?.cancel()
continuation.finish(throwing: AgentError.timeout(duration: duration))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return on cancelled timeout task before finishing with timeout

The timeout worker now uses try? await Task.sleep(for:), which swallows CancellationError and still executes continuation.finish(throwing: AgentError.timeout(...)). When the upstream stream completes first, it cancels timeoutTask; this cancelled timeout task can still race and attempt to terminate the stream as a timeout, producing an incorrect timeout failure for a run that actually completed normally. Treat cancellation as an early return in the timeout task (instead of try?) so only real deadline expiry emits the timeout error.

Useful? React with 👍 / 👎.

Codex review on 2b3ef47 flagged that the timeout body used
`try? await Task.sleep(for:)`, which swallows `CancellationError`. When
the upstream stream completes first, `processingTask` cancels
`timeoutTask`; with `try?`, the cancelled timeout body would still run
to completion — calling `processingTaskRef.task?.cancel()` (harmless
no-op) and then `continuation.finish(throwing: AgentError.timeout(...))`
in a tight race against the upstream's natural `continuation.finish()`.
The first finish wins, so a run that completed normally could surface
as a timeout error.

Switching back to plain `try` lets `CancellationError` propagate out of
the Task body, so the timeout side-effects only run when the deadline
actually expired (sleep returned successfully). The added comment
documents the contract so future edits don't reintroduce the regression.

Test: existing `timeoutCancelsUpstream` regression test still passes —
it exercises the real-deadline path, which is unchanged.
@christopherkarani christopherkarani merged commit eedae7a into main May 1, 2026
0 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant