Testing

Alan Wizemann edited this page Apr 20, 2026 · 4 revisions

Honest disclosure: Scarf's test coverage is minimal. The scarfTests/ and scarfUITests/ targets exist but contain placeholder tests only. The project has historically relied on dogfooding — the maintainer runs Scarf against their own daily Hermes install and against a remote dogfooding host.

This page documents what's in place and where contributions would help most.

Frameworks

When tests are added, the standard is Swift Testing (@Suite / @Test macros), not XCTest. Per the project conventions (CLAUDE.md):

  • Use @Suite and @Test macros for all new tests.
  • Protocol-oriented services for testability — the ServerTransport protocol is the obvious mocking seam.
  • No timing-dependent tests: use polling with early exit, not Task.sleep + assertion.
  • Singleton state isolation: call cleanup methods + await Task.yield() before assertions.
  • No print() in production code — use os.Logger. print() is fine in #Preview and test helpers.
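The polling rule above can be sketched as a small async helper. `waitUntil` and `Flag` are hypothetical names used for illustration, not existing Scarf APIs. The helper re-checks a condition at a fixed interval and returns as soon as it holds. A fast machine finishes early, and a slow CI runner gets the full budget instead of a fixed sleep that is either wasteful or flaky.

```swift
// Hypothetical test helper (not a Scarf API): check `condition` up to
// `attempts + 1` times, sleeping `intervalNanoseconds` between checks,
// and return true as soon as it holds.
func waitUntil(
    attempts: Int = 50,
    intervalNanoseconds: UInt64 = 20_000_000,  // 20 ms between checks
    _ condition: () async -> Bool
) async -> Bool {
    for _ in 0..<attempts {
        if await condition() { return true }  // early exit on success
        try? await Task.sleep(nanoseconds: intervalNanoseconds)
    }
    return await condition()  // final check once the budget is spent
}

// Hypothetical state that some background work flips.
actor Flag {
    private(set) var isSet = false
    func set() { isSet = true }
}

let flag = Flag()
Task {
    try? await Task.sleep(nanoseconds: 50_000_000)  // simulated async work
    await flag.set()
}
let becameSet = await waitUntil { await flag.isSet }
print("flag set:", becameSet)
```

In a Swift Testing suite, the returned Bool would typically be passed to an assertion such as `#expect`.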

Running

xcodebuild test -project scarf/scarf.xcodeproj -scheme scarf

Or in Xcode: ⌘U.

What would be high-value to add

If you're looking for a contribution, these are the gaps that would matter most:

  1. ServerTransport mock + LocalTransport smoke tests — every service depends on transport, so a MockTransport unlocks unit-testing all of them.
  2. HermesEnvService round-trip tests — non-destructive .env editing has tricky comment / blank-line preservation; would benefit from regression coverage.
  3. ACPClient JSON-RPC framing tests — feed canned JSON-RPC byte streams and assert events emitted.
  4. HermesPathSet path-resolution tests — local vs. remote home, binary hint precedence.
  5. HermesConfig decoding tests — load representative config.yaml fixtures and check field mapping.
  6. SSHTransport shell-quoting tests: shellQuote and remotePathArg are pure, correctness-critical functions.
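For item 1, here is a minimal sketch of the mocking seam, assuming a simplified protocol. `FileTransport`, `MockTransport`, and `NoteAppender` are hypothetical names. Scarf's real `ServerTransport` requirements are not listed on this page and likely cover more, such as process execution. The point is that a service depending only on the protocol can run against an in-memory dictionary instead of the file system or SSH.

```swift
// Hypothetical, simplified stand-in for ServerTransport; the real
// protocol's requirements are not documented here.
protocol FileTransport {
    func readFile(_ path: String) throws -> String
    func writeFile(_ path: String, contents: String) throws
}

enum MockTransportError: Error, Equatable {
    case fileNotFound(String)
}

// In-memory mock: "files" live in a dictionary, and every write is
// recorded so tests can assert on what a service tried to change.
final class MockTransport: FileTransport {
    var files: [String: String] = [:]
    private(set) var writes: [String] = []

    func readFile(_ path: String) throws -> String {
        guard let contents = files[path] else {
            throw MockTransportError.fileNotFound(path)
        }
        return contents
    }

    func writeFile(_ path: String, contents: String) throws {
        writes.append(path)
        files[path] = contents
    }
}

// Hypothetical service under test: it only talks to the protocol,
// so it never touches the real file system directly.
struct NoteAppender {
    let transport: any FileTransport

    func append(_ line: String, to path: String) throws {
        let existing = (try? transport.readFile(path)) ?? ""
        let separator = (existing.isEmpty || existing.hasSuffix("\n")) ? "" : "\n"
        try transport.writeFile(path, contents: existing + separator + line + "\n")
    }
}
```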
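For item 2, the property worth pinning down is that an edit touches only the target assignment line. The function below is a hypothetical sketch, not `HermesEnvService`'s actual code. It assumes plain `KEY=value` lines with no `export` prefix, no quoting, and no inline comments.

```swift
// Hypothetical helper: set `key` to `value` in .env text, rewriting only the
// first matching assignment line. Comments, blank lines, ordering, and the
// presence or absence of a trailing newline are preserved. A missing key is
// appended at the end.
func settingEnvValue(_ value: String, forKey key: String, in text: String) -> String {
    var lines = text.split(separator: "\n", omittingEmptySubsequences: false).map(String.init)
    let prefix = key + "="
    if let index = lines.firstIndex(where: { line in
        line.drop(while: { $0 == " " || $0 == "\t" }).hasPrefix(prefix)
    }) {
        lines[index] = prefix + value
    } else if lines.last == "" {
        // Text ends with a newline (or is empty): insert before the final empty element.
        lines.insert(prefix + value, at: lines.count - 1)
    } else {
        lines.append(prefix + value)
    }
    return lines.joined(separator: "\n")
}
```

A regression suite would pair fixtures like this with the exact expected output. It would also assert that rewriting a key to its current value returns the input unchanged.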
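For item 3, the framing layer can be tested on its own, before any JSON decoding. This sketch assumes newline-delimited JSON-RPC messages on the agent's stdout. `LineFramer` is a hypothetical name, not `ACPClient`'s API. Buffering raw bytes until a newline means that a message split across two reads still comes out whole, and so does a multi-byte UTF-8 character.

```swift
// Hypothetical framer: accumulate raw stdout bytes and return one string per
// newline-terminated message, keeping any partial tail for the next chunk.
struct LineFramer {
    private var buffer: [UInt8] = []

    mutating func feed(_ chunk: [UInt8]) -> [String] {
        buffer.append(contentsOf: chunk)
        var messages: [String] = []
        while let newline = buffer.firstIndex(of: UInt8(ascii: "\n")) {
            let line = buffer[..<newline]
            buffer.removeSubrange(...newline)  // drop the line and its newline
            if !line.isEmpty {                 // ignore blank lines
                messages.append(String(decoding: line, as: UTF8.self))
            }
        }
        return messages
    }
}
```

Decoding each emitted line into events is the next layer. Feeding the same canned stream in several chunk sizes is a cheap way to catch buffering bugs.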
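For item 4, injecting the file-existence check keeps path resolution a pure function that tests can drive with a fixed set of "present" paths. `resolveBinary` is hypothetical and assumes that an explicit binary hint wins over default search locations when it exists. The real precedence in `HermesPathSet` may differ, and the example paths are illustrative only.

```swift
// Hypothetical resolver: an existing explicit hint takes precedence;
// otherwise the first existing candidate wins. `fileExists` is injected so
// tests never touch the real file system.
func resolveBinary(
    hint: String?,
    candidates: [String],
    fileExists: (String) -> Bool
) -> String? {
    if let hint, fileExists(hint) { return hint }
    return candidates.first(where: fileExists)
}
```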
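For item 6, a common POSIX technique wraps an argument in single quotes and rewrites each embedded `'` as `'\''`. This closes the quote, emits an escaped literal quote, and reopens. `shellQuoteSketch` is a hypothetical stand-in. Scarf's own `shellQuote` may use a different scheme, and `remotePathArg` is not modeled here.

```swift
// Hypothetical POSIX-style quoting: everything inside single quotes is
// literal to the shell, so only the single quote itself needs special care.
func shellQuoteSketch(_ argument: String) -> String {
    var quoted = "'"
    for character in argument {
        if character == "'" {
            quoted += #"'\''"#  // close quote, escaped literal quote, reopen
        } else {
            quoted.append(character)
        }
    }
    quoted.append("'")
    return quoted
}
```

A stronger check passes the quoted string through `/bin/sh` and compares the echoed result with the original argument.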

Manual verification flows

For any change that touches behavior, here's the manual checklist the maintainer runs before tagging a release:

  • Open a local window — Dashboard loads, Sessions browser populates, Memory editor opens.
  • Open a remote window — same Dashboard / Sessions / Memory, but against the dogfooding host.
  • Send a Rich Chat message — receives a streamed response, reasoning shows if the model emits it.
  • Edit and save a memory file — change appears in Hermes on next agent turn.
  • Run a Cron job — appears in the Cron view, has correct delivery channel.
  • Toggle a tool in Tools — hermes tools enable/disable runs and the dot color updates.

Why so little automated coverage?

The app is a thin GUI over Hermes: most behavior depends on (a) the OS file system, (b) SQLite, (c) SSH, and (d) a long-running subprocess speaking JSON-RPC. Mocking these well is non-trivial, and historically the cost has been higher than the bug rate justified. The ServerTransport protocol now makes mocking cheaper, so the gap is finally worth filling.


Last updated: 2026-04-20 — Scarf v2.0.1
