Skip to content

KofTwentyTwo/Jarvis

Repository files navigation

Jarvis

A personal, always-on macOS AI assistant in the shape of the Iron Man HUD — native Swift brain, R3F particle-ring face, voice-first, locally-grounded.

License: MIT Last commit Issues Platform: macOS 26 Swift 6

Jarvis is a hybrid native + embedded-web macOS app that runs as an ambient menu-bar presence. A Swift host owns the operating system (mic, camera, clipboard, AppleScript, hotkeys, persistent storage) and an agent loop that streams Claude Opus 4.7 with Ollama as a first-class local fallback. A WKWebView mounts a React + React Three Fiber HUD — a single pulsing particle ring plus a chat panel — purely as the rendering surface. Tools ride on MCP (Model Context Protocol). Voice (wake-word, STT, TTS) and memory (SQLite + FTS5 + sqlite-vec, mem0-style extraction) are fully local: no cloud TTS/STT, no remote embeddings.

Scope: built for one user on one machine — the author. There is no multi-tenancy, no auth, and no commercial framing. The repo is public so the code is portable across the author's machines and legible to AI coding agents; outside readers are welcome to lift the agent loop, typed JSON Bus, MCP runtime, voice pipeline, R3F HUD scaffold, or the eighteen boundary-gate linters if useful — but the product is not pitched at a general audience.

Status: alpha / under active development. develop is 0.1.0. The nine GSD phases shipped; the project is now in an audit-and-stabilize window plus a v1.0 typed-API migration. See Roadmap and GitHub Issues for what is done versus in flight.

Screenshots

Screenshots are TBD — when there is something worth showing, it lands in assets/screenshots/ per the shot list in that directory's README.md.

  • HUD particle ring across the four agent states (idle / listening / thinking / speaking)
  • Status panel (menu-bar → Status…) with the live boot-health probe set
  • Dev Overlay tabs (Turns / Tools / Bus / Logs)
  • Non-dismissible degradation banner (critical-severity probe failure)

Architecture at a glance

A two-process, one-bridge shape: Swift owns truth, the webview is a pure view, the Bus is the only legal way they talk. Subsystems live in single-purpose SPM packages; 18 boundary-gate scripts keep the isolation honest.

flowchart TB
    User[User<br/>voice / text / hotkey]

    subgraph Swift["Swift host (AppKit + SwiftUI)"]
        AppDel[AppDelegate<br/>install DAG + lifecycle]
        Orch[AgentOrchestrator<br/>tool-loop + streaming]
        Bridge[WebviewBridge<br/>typed JSON Bus]
        Voice[Voice<br/>wake / STT / TTS]
        Vision[Vision<br/>camera + frame-attach]
        Memory[Memory<br/>SQLite + vec + mem0]
        MCP[MCPRuntime<br/>helpers + in-proc tools]
        Replay[Replay<br/>turn-by-turn audit log]
        Boot[BootHealth<br/>NOT FAKED probes]
    end

    LLM["LLMProvider<br/>Anthropic Opus 4.7 / Ollama"]

    subgraph Web["WKWebView (React + R3F)"]
        Ring[Particle ring]
        Chat[Chat panel]
        DevOv[Dev Overlay]
    end

    User --> Voice
    User --> Bridge
    AppDel --> Vision
    AppDel --> Orch
    AppDel --> Voice
    AppDel --> Boot
    Voice --> Orch
    Orch <--> LLM
    Orch --> MCP
    Orch --> Memory
    Orch --> Vision
    Orch --> Replay
    Orch --> Bridge
    Boot --> Bridge
    Bridge <-->|Bus protocol| Web
Loading

The deep architecture doc is ARCHITECTURE.md (subsystem deep-dives, anti-patterns, navigation cheat sheet). The v1.0 typed-API contract that is currently being migrated to lives at .planning/architecture/JARVIS-API-DESIGN-v0.2.md; the rollout phases are in .planning/architecture/JARVIS-API-MIGRATION-PLAN.md.

Subsystems

Every subsystem is one SPM package under packages/. The Bus sits in the middle as the only legal Swift↔JS channel.

Subsystem What it does Package
Agent loop Claude Opus 4.7 + Ollama via LLMProvider; tool-call loop; streaming events to a single broadcaster packages/AgentCore
Bus Typed JSON message bus over WKScriptMessageHandler; protocol-versioned; harness-parity-checked packages/Bus
Voice openWakeWord (hey_jarvis) + Silero VAD v6.2.1 + SpeechAnalyzer STT + AVSpeech / Orpheus TTS packages/Voice
Vision AVCaptureSession camera capture + frame-attach to a turn; T2 presence pipeline deferred packages/Vision
Memory SQLite with FTS5 + sqlite-vec (statically linked) + mem0 ADD/UPDATE/NOOP extractor on Qwen 2.5-Coder packages/Memory
MCP Model Context Protocol runtime — separately codesigned helper apps under Contents/Helpers/ plus in-process tools packages/MCP
Replay Per-turn audit log to a local SQLite — user input, LLM output, tool calls, tool results packages/Replay
BootHealth Eight NOT FAKED probes feeding a strict-mode severity rollup that drives HUD chrome + agent preamble packages/AgentCore (BootHealth*)
HUD React + R3F particle ring + chat panel + Dev Overlay tabs webview/

Supporting packages: Config, DevOverlay, Harness (Mock*/Stub*/Fake* doubles per the test naming convention in CLAUDE.md), Keychain, Logging, Shell, VoiceLog.

Install order

The Swift host installs subsystems in a fixed DAG in AppDelegate.applicationWillFinishLaunching. Order is load-bearingpackages/AgentCore's orchestrator constructor takes the VisionRouter (so Vision installs first), Voice's adapters require the orchestrator and the turn-transcript store (so Agent installs before Voice). The scripts/check-install-order.sh gate asserts the spawn-line ordering AND that voiceInstallTask awaits agentInstallTask?.value (a strictly stronger guarantee than spawn-order alone).

flowchart LR
    Boot([applicationWillFinishLaunching]) --> Vision[installVision]
    Boot --> Memory[installMemory]
    Boot --> MCP[installMCP]
    Boot --> Replay[installReplay]
    Vision --> Agent[installAgent]
    Memory --> Agent
    MCP --> Agent
    Replay --> Agent
    Agent --> Voice[installVoice]
    Voice --> Probes[registerBootHealthProbes]
    Probes --> Bridge[Webview handshake]
    Bridge --> Ready([HUD ready])
Loading

Build & run

Requires Xcode 16+ on macOS 26 Tahoe (Apple Silicon). The Hardened Runtime needs com.apple.security.cs.allow-jit for the WKWebView JIT; SpeechAnalyzer needs com.apple.developer.speech-recognition-assets. Both are wired in App/Jarvis.entitlements — do not strip them.

SPM packages (fast, offline, the default dev loop):

swift test --package-path packages/AgentCore
swift test --package-path packages/Bus
swift test --package-path packages/Voice
swift test --package-path packages/Memory
# … one per package; full list in CLAUDE.md → Build / test.

Single-test filter: swift test --package-path packages/<Pkg> --filter <Suite>/<test>.

App targetswift test does not compile the App target under Xcode 26 (the xctest harness is upstream-broken on ad-hoc Debug bundles), so use the wrapper that drives xcodebuild build -configuration Debug:

bash scripts/check-app-builds.sh

Webview (HUD bundle):

bash scripts/build-webview.sh
# or, during dev:
cd webview && pnpm install && pnpm --filter @jarvis/hud dev
cd webview && pnpm --filter @jarvis/hud test    # vitest

Boundary gates (18 grep-based architectural invariants — run before pushing):

for s in scripts/check-*.sh; do bash "$s" || break; done

The gates encode invariants the type system can't — single-writer HUDState, single-consumer orchestrator events, no evaluateJavaScript outside the Bus, vision isolation, presence-bus disallowed in TTS path, embedding-dim literal pinned, install order, applescript-confirmation, Bus protocol version + harness parity, no leftover stubs, no modal presentation, and a handful more.

Real-hardware / real-network env-flag gates (interactive, opt-in):

JARVIS_REAL_MODELS=1 swift test --package-path packages/Memory --filter MemoryRegressionCorpusTests
JARVIS_REAL_CAMERA=1 swift test --package-path packages/Vision --filter CameraCaptureRealHardwareTests
JARVIS_REAL_MODELS=1 swift test --package-path packages/Voice --filter OrpheusTTFATests

Agent turn lifecycle

Every user turn flows through a single broadcaster and fans out to the subscribers that need it. If a subsystem is degraded, the orchestrator injects a system-prompt preamble so the agent doesn't promise to remember things the memory store can't actually persist.

flowchart LR
    In[User input<br/>text / voice transcript] --> Orch[AgentOrchestrator]
    Health[BootHealth.rollup] -->|degradation preamble| Orch
    Orch --> Provider[LLMProvider.stream]
    Provider --> Events[(streaming events<br/>token / thinking / tool-use / stop)]
    Events --> Bcast[Broadcaster<br/>single consumer]
    Bcast --> Tx[Transcript]
    Bcast --> Mem[Memory extractor]
    Bcast --> Dev[DevOverlay]
    Bcast --> Voice[Voice / TTS]
    Bcast --> Bus[WebviewBridge → HUD]
    Orch --> Tools[MCP tool calls]
    Tools --> Provider
Loading

Boot-health probes

Eight probes run on a strict-mode escalation ladder: ok → soft → loud → critical. .unknown(reason:) is honest absence; a probe never returns .ok when the dependency is actually broken. Critical failures turn the menu-bar icon red, raise a non-dismissible HUD banner, and inject the degradation preamble into the agent's system prompt.

flowchart LR
    M[MemoryBootHealthProbe<br/>vec0 + FTS5 reachable]
    A[AnthropicBootHealthProbe<br/>keychain has API key]
    O[OllamaBootHealthProbe<br/>service reachable + model loaded]
    Vc[VoiceBootHealthProbe<br/>audio graph alive]
    Vi[VisionBootHealthProbe<br/>AVCapture authorized]
    Mc[MCPBootHealthProbe<br/>helper apps + in-proc tools]
    R[ReplayBootHealthProbe<br/>WAL writable]
    W[WebviewBootHealthProbe<br/>handshake completed]
    Roll[BootHealthOrchestrator.rollup]
    Icon[Menu-bar icon<br/>red on critical]
    Banner[Non-dismissible banner]
    Pre[Agent system-prompt<br/>degradation preamble]
    M --> Roll
    A --> Roll
    O --> Roll
    Vc --> Roll
    Vi --> Roll
    Mc --> Roll
    R --> Roll
    W --> Roll
    Roll --> Icon
    Roll --> Banner
    Roll --> Pre
Loading

Project culture

A few opinions, encoded into the codebase as gates and conventions:

  • In-code documentation only. No sprawling /docs that goes stale the day after it's written. The code teaches itself or it doesn't get taught. The full philosophy lives in .planning/code-commenting-guide.md. This README is the one exception — it is documentation of the documentation strategy, which is allowed.
  • NOT FAKED probes. Every subsystem has a live-probe boot-health check. .unknown(reason:) is the honest answer when we can't tell; .ok requires evidence. The audit-and-stabilize discipline came from finding too many "wired but dead" subsystems that looked installed but had no observable signal.
  • Strict-mode escalation. Soft failures are logged; loud failures surface in the Status panel; critical failures take over the chrome and the agent's system prompt. The user always knows when Jarvis is degraded.
  • 18 boundary gates encode architectural invariants the type system can't (single-writer state, isolated subsystems, no rogue evaluateJavaScript, pinned embedding dim, install order, …).
  • Three-name test-double taxonomy. Mock* records + scripts; Stub* returns canned data; Fake* is a behaviorally-realistic alternative implementation. Pick by role, not by sibling.
  • Audit-and-stabilize cadence. Multi-agent audits land under .planning/audit-YYYY-MM-DD/ (most recent: audit-2026-05-12) and surface wired-but-dead bugs as GitHub Issues with severity + area labels.
  • GitHub Issues is canonical. All bugs, audit findings, and migration work live as issues. docs/TODO.md is historical only.

Roadmap

The roadmap is owned by GitHub Milestones — user-outcome-focused, not subsystem-focused:

Milestone Theme
v0.1It actually works Closes the CRITICAL audit findings that break basic capabilities (voice, vision, facts, dialogs). Some get re-fixed during the API migration; v0.1 ships working NOW.
v0.2Observable + stable Everything CRITICAL/HIGH from the 2026-05-12 audits that isn't a basic-capability blocker; tightens the observability surface; closes wired-but-dead gaps.
v0.5API M-0..M-3 Scaffold JarvisAPI package, dual-version Bus handshake, then land Self / Settings / Diagnostics typed surfaces. HUD state moves to Diagnostics.
v0.8API M-4..M-7 Memory / Vision / Voice / Turn typed surfaces. Per-turn OutboundBatcher partitioning. Webview retires the sessionHistory push. Bus protocol bumps to 2.4.0 strict after a 3-day soak.
v1.0Shipping Code-commenting style applied to load-bearing files. Stale plan-id gate. Doc reconciliation. First-launch experience polish.
v1.1Backlog Orpheus TTS tier 2 activation. WhisperKit STT fallback. Voice Log menu. Vision T2 sidecar. Presence pipeline. Advanced polish.

Day-to-day session handoff still lives in docs/SESSION-STATE.md; milestone-level state in .planning/STATE.md. Per-phase artifacts under .planning/phases/<N>/.

Repository layout

Path Purpose
App/ Swift host — AppDelegate, MenuBar, HUD window, Settings, Wizard, MCP runtime wiring
packages/ 14 Swift packages (SPM) — one per subsystem
webview/ pnpm workspace — @jarvis/bus (TS mirror of the Swift Bus) + @jarvis/hud (R3F bundle)
mcp-servers/ Out-of-process MCP helper apps — mcp-time, mcp-clipboard, mcp-applescript
scripts/ Build/codesign scripts and the 18 boundary-gate linters
Resources/ Bundled assets — voice ONNX models, HUD bundle target, app icon source
docs/ Live cross-session state — SESSION-STATE.md, archived TODO.md
.planning/ GSD workflow artifacts — phase plans, audit reports, research syntheses, architecture
assets/ Screenshots + diagrams (populated opportunistically)

Key documents

Contributing

This is a solo project. If you've stumbled across it and want to discuss something, the welcoming move is to open a GitHub issue. Style, commit conventions, and the GSD workflow live in CONTRIBUTING.md.

License

Jarvis is released under the MIT License. See .planning/decisions/license.md for the rationale (solo personal project; no patent strategy or SaaS surface to protect; MIT considered against Apache-2.0 / MPL-2.0 / AGPL-3.0 and chosen for minimum friction).

Acknowledgments

Jarvis stands on a stack of excellent open-source work. The most load-bearing dependencies:

LLM + agents

Voice (fully local)

Memory

  • SQLite + FTS5 — keyword search and the canonical store.
  • sqlite-vec (via sqlitevec SPM) — statically-linked vec0 vector extension.

HUD

Swift platform

  • swift-log — structured logging feeding the Dev Overlay's Logs tab.
  • swift-nio — async networking primitives under MCP and HTTP providers.
  • yams — YAML config parsing.

The full dependency graph lives in each package's Package.resolved.

About

Personal always-on macOS AI assistant in the shape of the Iron Man HUD — native Swift brain, R3F particle-ring face, voice-first, locally-grounded.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors