A personal, always-on macOS AI assistant in the shape of the Iron Man HUD — native Swift brain, R3F particle-ring face, voice-first, locally-grounded.
Jarvis is a hybrid native + embedded-web macOS app that runs as an ambient
menu-bar presence. A Swift host owns the operating system (mic, camera,
clipboard, AppleScript, hotkeys, persistent storage) and an agent loop that
streams Claude Opus 4.7 with Ollama as a first-class local fallback.
A WKWebView mounts a React + React Three Fiber HUD — a single pulsing
particle ring plus a chat panel — purely as the rendering surface. Tools
ride on MCP (Model Context Protocol). Voice (wake-word, STT, TTS) and
memory (SQLite + FTS5 + sqlite-vec, mem0-style extraction) are fully
local: no cloud TTS/STT, no remote embeddings.
Scope: built for one user on one machine — the author. There is no multi-tenancy, no auth, and no commercial framing. The repo is public so the code is portable across the author's machines and legible to AI coding agents; outside readers are welcome to lift the agent loop, typed JSON Bus, MCP runtime, voice pipeline, R3F HUD scaffold, or the eighteen boundary-gate linters if useful — but the product is not pitched at a general audience.
Status: alpha / under active development.
developis0.1.0. The nine GSD phases shipped; the project is now in an audit-and-stabilize window plus a v1.0 typed-API migration. See Roadmap and GitHub Issues for what is done versus in flight.
Screenshots are TBD — when there is something worth showing, it lands in
assets/screenshots/ per the shot list in that
directory's README.md.
- HUD particle ring across the four agent states (idle / listening / thinking / speaking)
- Status panel (menu-bar → Status…) with the live boot-health probe set
- Dev Overlay tabs (Turns / Tools / Bus / Logs)
- Non-dismissible degradation banner (critical-severity probe failure)
A two-process, one-bridge shape: Swift owns truth, the webview is a pure view, the Bus is the only legal way they talk. Subsystems live in single-purpose SPM packages; 18 boundary-gate scripts keep the isolation honest.
flowchart TB
User[User<br/>voice / text / hotkey]
subgraph Swift["Swift host (AppKit + SwiftUI)"]
AppDel[AppDelegate<br/>install DAG + lifecycle]
Orch[AgentOrchestrator<br/>tool-loop + streaming]
Bridge[WebviewBridge<br/>typed JSON Bus]
Voice[Voice<br/>wake / STT / TTS]
Vision[Vision<br/>camera + frame-attach]
Memory[Memory<br/>SQLite + vec + mem0]
MCP[MCPRuntime<br/>helpers + in-proc tools]
Replay[Replay<br/>turn-by-turn audit log]
Boot[BootHealth<br/>NOT FAKED probes]
end
LLM["LLMProvider<br/>Anthropic Opus 4.7 / Ollama"]
subgraph Web["WKWebView (React + R3F)"]
Ring[Particle ring]
Chat[Chat panel]
DevOv[Dev Overlay]
end
User --> Voice
User --> Bridge
AppDel --> Vision
AppDel --> Orch
AppDel --> Voice
AppDel --> Boot
Voice --> Orch
Orch <--> LLM
Orch --> MCP
Orch --> Memory
Orch --> Vision
Orch --> Replay
Orch --> Bridge
Boot --> Bridge
Bridge <-->|Bus protocol| Web
The deep architecture doc is ARCHITECTURE.md (subsystem
deep-dives, anti-patterns, navigation cheat sheet). The v1.0 typed-API
contract that is currently being migrated to lives at
.planning/architecture/JARVIS-API-DESIGN-v0.2.md;
the rollout phases are in
.planning/architecture/JARVIS-API-MIGRATION-PLAN.md.
Every subsystem is one SPM package under packages/. The Bus
sits in the middle as the only legal Swift↔JS channel.
| Subsystem | What it does | Package |
|---|---|---|
| Agent loop | Claude Opus 4.7 + Ollama via LLMProvider; tool-call loop; streaming events to a single broadcaster |
packages/AgentCore |
| Bus | Typed JSON message bus over WKScriptMessageHandler; protocol-versioned; harness-parity-checked |
packages/Bus |
| Voice | openWakeWord (hey_jarvis) + Silero VAD v6.2.1 + SpeechAnalyzer STT + AVSpeech / Orpheus TTS |
packages/Voice |
| Vision | AVCaptureSession camera capture + frame-attach to a turn; T2 presence pipeline deferred | packages/Vision |
| Memory | SQLite with FTS5 + sqlite-vec (statically linked) + mem0 ADD/UPDATE/NOOP extractor on Qwen 2.5-Coder |
packages/Memory |
| MCP | Model Context Protocol runtime — separately codesigned helper apps under Contents/Helpers/ plus in-process tools |
packages/MCP |
| Replay | Per-turn audit log to a local SQLite — user input, LLM output, tool calls, tool results | packages/Replay |
| BootHealth | Eight NOT FAKED probes feeding a strict-mode severity rollup that drives HUD chrome + agent preamble | packages/AgentCore (BootHealth*) |
| HUD | React + R3F particle ring + chat panel + Dev Overlay tabs | webview/ |
Supporting packages: Config, DevOverlay,
Harness (Mock*/Stub*/Fake* doubles per the test naming
convention in CLAUDE.md), Keychain, Logging,
Shell, VoiceLog.
The Swift host installs subsystems in a fixed DAG in
AppDelegate.applicationWillFinishLaunching. Order is load-bearing —
packages/AgentCore's orchestrator constructor takes the VisionRouter (so
Vision installs first), Voice's adapters require the orchestrator and the
turn-transcript store (so Agent installs before Voice). The
scripts/check-install-order.sh gate
asserts the spawn-line ordering AND that voiceInstallTask awaits
agentInstallTask?.value (a strictly stronger guarantee than spawn-order
alone).
flowchart LR
Boot([applicationWillFinishLaunching]) --> Vision[installVision]
Boot --> Memory[installMemory]
Boot --> MCP[installMCP]
Boot --> Replay[installReplay]
Vision --> Agent[installAgent]
Memory --> Agent
MCP --> Agent
Replay --> Agent
Agent --> Voice[installVoice]
Voice --> Probes[registerBootHealthProbes]
Probes --> Bridge[Webview handshake]
Bridge --> Ready([HUD ready])
Requires Xcode 16+ on macOS 26 Tahoe (Apple Silicon). The Hardened
Runtime needs com.apple.security.cs.allow-jit for the WKWebView JIT;
SpeechAnalyzer needs com.apple.developer.speech-recognition-assets.
Both are wired in App/Jarvis.entitlements — do not strip them.
SPM packages (fast, offline, the default dev loop):
swift test --package-path packages/AgentCore
swift test --package-path packages/Bus
swift test --package-path packages/Voice
swift test --package-path packages/Memory
# … one per package; full list in CLAUDE.md → Build / test.Single-test filter: swift test --package-path packages/<Pkg> --filter <Suite>/<test>.
App target — swift test does not compile the App target under Xcode
26 (the xctest harness is upstream-broken on ad-hoc Debug bundles), so use
the wrapper that drives xcodebuild build -configuration Debug:
bash scripts/check-app-builds.shWebview (HUD bundle):
bash scripts/build-webview.sh
# or, during dev:
cd webview && pnpm install && pnpm --filter @jarvis/hud dev
cd webview && pnpm --filter @jarvis/hud test # vitestBoundary gates (18 grep-based architectural invariants — run before pushing):
for s in scripts/check-*.sh; do bash "$s" || break; doneThe gates encode invariants the type system can't — single-writer
HUDState, single-consumer orchestrator events, no evaluateJavaScript
outside the Bus, vision isolation, presence-bus disallowed in TTS path,
embedding-dim literal pinned, install order, applescript-confirmation,
Bus protocol version + harness parity, no leftover stubs, no modal
presentation, and a handful more.
Real-hardware / real-network env-flag gates (interactive, opt-in):
JARVIS_REAL_MODELS=1 swift test --package-path packages/Memory --filter MemoryRegressionCorpusTests
JARVIS_REAL_CAMERA=1 swift test --package-path packages/Vision --filter CameraCaptureRealHardwareTests
JARVIS_REAL_MODELS=1 swift test --package-path packages/Voice --filter OrpheusTTFATestsEvery user turn flows through a single broadcaster and fans out to the subscribers that need it. If a subsystem is degraded, the orchestrator injects a system-prompt preamble so the agent doesn't promise to remember things the memory store can't actually persist.
flowchart LR
In[User input<br/>text / voice transcript] --> Orch[AgentOrchestrator]
Health[BootHealth.rollup] -->|degradation preamble| Orch
Orch --> Provider[LLMProvider.stream]
Provider --> Events[(streaming events<br/>token / thinking / tool-use / stop)]
Events --> Bcast[Broadcaster<br/>single consumer]
Bcast --> Tx[Transcript]
Bcast --> Mem[Memory extractor]
Bcast --> Dev[DevOverlay]
Bcast --> Voice[Voice / TTS]
Bcast --> Bus[WebviewBridge → HUD]
Orch --> Tools[MCP tool calls]
Tools --> Provider
Eight probes run on a strict-mode escalation ladder:
ok → soft → loud → critical. .unknown(reason:) is honest absence; a
probe never returns .ok when the dependency is actually broken. Critical
failures turn the menu-bar icon red, raise a non-dismissible HUD
banner, and inject the degradation preamble into the agent's system
prompt.
flowchart LR
M[MemoryBootHealthProbe<br/>vec0 + FTS5 reachable]
A[AnthropicBootHealthProbe<br/>keychain has API key]
O[OllamaBootHealthProbe<br/>service reachable + model loaded]
Vc[VoiceBootHealthProbe<br/>audio graph alive]
Vi[VisionBootHealthProbe<br/>AVCapture authorized]
Mc[MCPBootHealthProbe<br/>helper apps + in-proc tools]
R[ReplayBootHealthProbe<br/>WAL writable]
W[WebviewBootHealthProbe<br/>handshake completed]
Roll[BootHealthOrchestrator.rollup]
Icon[Menu-bar icon<br/>red on critical]
Banner[Non-dismissible banner]
Pre[Agent system-prompt<br/>degradation preamble]
M --> Roll
A --> Roll
O --> Roll
Vc --> Roll
Vi --> Roll
Mc --> Roll
R --> Roll
W --> Roll
Roll --> Icon
Roll --> Banner
Roll --> Pre
A few opinions, encoded into the codebase as gates and conventions:
- In-code documentation only. No sprawling
/docsthat goes stale the day after it's written. The code teaches itself or it doesn't get taught. The full philosophy lives in.planning/code-commenting-guide.md. This README is the one exception — it is documentation of the documentation strategy, which is allowed. - NOT FAKED probes. Every subsystem has a live-probe boot-health
check.
.unknown(reason:)is the honest answer when we can't tell;.okrequires evidence. The audit-and-stabilize discipline came from finding too many "wired but dead" subsystems that looked installed but had no observable signal. - Strict-mode escalation. Soft failures are logged; loud failures surface in the Status panel; critical failures take over the chrome and the agent's system prompt. The user always knows when Jarvis is degraded.
- 18 boundary gates encode architectural invariants the type system
can't (single-writer state, isolated subsystems, no rogue
evaluateJavaScript, pinned embedding dim, install order, …). - Three-name test-double taxonomy.
Mock*records + scripts;Stub*returns canned data;Fake*is a behaviorally-realistic alternative implementation. Pick by role, not by sibling. - Audit-and-stabilize cadence. Multi-agent audits land under
.planning/audit-YYYY-MM-DD/(most recent:audit-2026-05-12) and surface wired-but-dead bugs as GitHub Issues with severity + area labels. - GitHub Issues is canonical. All bugs, audit findings, and migration
work live as issues.
docs/TODO.mdis historical only.
The roadmap is owned by GitHub Milestones — user-outcome-focused, not subsystem-focused:
| Milestone | Theme |
|---|---|
| v0.1 — It actually works | Closes the CRITICAL audit findings that break basic capabilities (voice, vision, facts, dialogs). Some get re-fixed during the API migration; v0.1 ships working NOW. |
| v0.2 — Observable + stable | Everything CRITICAL/HIGH from the 2026-05-12 audits that isn't a basic-capability blocker; tightens the observability surface; closes wired-but-dead gaps. |
| v0.5 — API M-0..M-3 | Scaffold JarvisAPI package, dual-version Bus handshake, then land Self / Settings / Diagnostics typed surfaces. HUD state moves to Diagnostics. |
| v0.8 — API M-4..M-7 | Memory / Vision / Voice / Turn typed surfaces. Per-turn OutboundBatcher partitioning. Webview retires the sessionHistory push. Bus protocol bumps to 2.4.0 strict after a 3-day soak. |
| v1.0 — Shipping | Code-commenting style applied to load-bearing files. Stale plan-id gate. Doc reconciliation. First-launch experience polish. |
| v1.1 — Backlog | Orpheus TTS tier 2 activation. WhisperKit STT fallback. Voice Log menu. Vision T2 sidecar. Presence pipeline. Advanced polish. |
Day-to-day session handoff still lives in
docs/SESSION-STATE.md; milestone-level state in
.planning/STATE.md. Per-phase artifacts under
.planning/phases/<N>/.
| Path | Purpose |
|---|---|
App/ |
Swift host — AppDelegate, MenuBar, HUD window, Settings, Wizard, MCP runtime wiring |
packages/ |
14 Swift packages (SPM) — one per subsystem |
webview/ |
pnpm workspace — @jarvis/bus (TS mirror of the Swift Bus) + @jarvis/hud (R3F bundle) |
mcp-servers/ |
Out-of-process MCP helper apps — mcp-time, mcp-clipboard, mcp-applescript |
scripts/ |
Build/codesign scripts and the 18 boundary-gate linters |
Resources/ |
Bundled assets — voice ONNX models, HUD bundle target, app icon source |
docs/ |
Live cross-session state — SESSION-STATE.md, archived TODO.md |
.planning/ |
GSD workflow artifacts — phase plans, audit reports, research syntheses, architecture |
assets/ |
Screenshots + diagrams (populated opportunistically) |
ARCHITECTURE.md— deep architecture docCONTRIBUTING.md— dev environment setup, GSD workflow, commit conventionsCLAUDE.md— settled architectural decisions, AI-agent collaboration conventions, issue-tracking policy.planning/code-commenting-guide.md— the in-code documentation philosophy.planning/architecture/JARVIS-API-DESIGN-v0.2.md— v1.0 typed-API contract (locked).planning/architecture/JARVIS-API-MIGRATION-PLAN.md— M-0..M-7 rolloutdocs/SESSION-STATE.md— current session state
This is a solo project. If you've stumbled across it and want to discuss
something, the welcoming move is to
open a GitHub issue.
Style, commit conventions, and the GSD workflow live in
CONTRIBUTING.md.
Jarvis is released under the MIT License. See
.planning/decisions/license.md for
the rationale (solo personal project; no patent strategy or SaaS
surface to protect; MIT considered against Apache-2.0 / MPL-2.0 /
AGPL-3.0 and chosen for minimum friction).
Jarvis stands on a stack of excellent open-source work. The most load-bearing dependencies:
LLM + agents
- Anthropic Messages API — Claude Opus 4.7 as the primary reasoning model.
- Ollama — local LLM runtime; Qwen 2.5-Coder for tool-
calling,
nomic-embed-textfor memory embeddings. - Model Context Protocol Swift SDK
— typed
Client/Server/StdioTransportso we don't roll our own JSON-RPC framing.
Voice (fully local)
- openWakeWord —
hey_jarvispretrained model running through ONNX Runtime. - Silero VAD v6.2.1 — voice activity detection.
onnxruntime-swift-package-manager— wake-word + VAD inference in-process.argmax-oss-swift— WhisperKit STT fallback + TTSKit streaming alternative.mlx-audio-swift— Orpheus TTS via MLX, no Python sidecar.
Memory
- SQLite + FTS5 — keyword search and the canonical store.
sqlite-vec(viasqlitevecSPM) — statically-linked vec0 vector extension.
HUD
- React Three Fiber + Three.js — particle ring, holographic chrome, agent-state-reactive shaders.
- pnpm + Vite — webview workspace
- bundle.
Swift platform
swift-log— structured logging feeding the Dev Overlay's Logs tab.swift-nio— async networking primitives under MCP and HTTP providers.yams— YAML config parsing.
The full dependency graph lives in each package's Package.resolved.