SPAR — Sim Portable Autonomy Runtime

As robot policies scale from hand-coded behaviors to foundation models, a command boundary becomes the last line of defense you can actually reason about: a VLA that decodes motor commands from raw video is opaque, but the commands it produces are not.

SPAR is a C++ runtime built around a command authority boundary: an untrusted mission layer proposes commands, and a trusted runtime monitor vets every one before it reaches the vehicle controller. The architecture holds two things fixed by construction across two substitution axes. Swapping the observation source (simulation vs. hardware) and swapping the behavior (hand-coded vs. learned) both leave the surrounding pipeline bit-identical. The policy is the only free variable.

Behaviors are swapped through the boundary from the most predictable to the most adversarial: a hand-coded baseline, a conventional RL policy (SAC via KinematicBackend), and foundation-model actors (Cosmos-based VLAs), the hardest free variable. The SAC baseline is built and available; it validates the pipeline and establishes catch fractions before Cosmos data collection begins. Cosmos is the stress test for the boundary, not a separate research track.

Research Questions

Can a fixed runtime monitor reliably catch dangerous commands regardless of what policy produces them and how does that guarantee hold up as the policy changes?

When an operator delegates a behavior to a learned policy, what does the monitor catch that the operator can't anticipate?

The four-way partition below is how the boundary is evaluated. Every catch is attributed to a layer:

Class	Caught by
Architectural-only	Runtime monitor (bounds, rate-of-change, temporal-window)
Behavioral-only	Behavioral monitor: any success-calibrated, conformal-thresholded detector (slot)
Both	Both layers
Neither	The uncatchable residual: explicit boundary of what monitoring can do

The boundary is the contribution; the partition is how it's validated. The catch fraction across these classes shows whether the guarantee holds as the policy changes, and where the two layers catch different things.

The behavioral row is a slot. It takes any detector that calibrates on successful rollouts and thresholds with conformal prediction, which is what the recent work (STAC, FAIL-Detect, FIPER) converged on; those ship as reference plug-ins. Phase 4b adds a Cosmos-specific prediction-divergence detector as one more.

Architecture

        sensor sources (async, different rates/latencies)
                              │
                              ▼
┌─────────────────────────────────────────────┐
│  Observation assembler      (TRUSTED)       │
│  ring buffers · latest-as-of-t · staleness  │
└────────────────────┬────────────────────────┘
                     │ time-coherent WorldState (one per tick)
                     ▼
┌─────────────────────────────────────────────┐
│  Mission layer              (UNTRUSTED)     │  ◀── operator selects active behavior
│  Hand-coded behaviors · learned nodes       │
└────────────────────┬────────────────────────┘
                     │ command stream
                     ▼
┌─────────────────────────────────────────────┐
│  Runtime monitor            (TRUSTED)       │
│  bounds · rate-of-change · staleness        │
│  approve → adapter   reject → safe fallback │
└────────────────────┬────────────────────────┘
                     │ approved command, or defined safe fallback on reject
                     ▼
┌─────────────────────────────────────────────┐
│  Adapter → ArduPilot        (TRUSTED)       │
│  single writer to controller                │
└─────────────────────────────────────────────┘

Observation assembler. The same assembler, the same latest_at_or_before(t) lookup, and the same staleness math run in simulation and on the rover; only the source implementation changes. For hand-coded behaviors and the SAC RL baseline, the assembled snapshot is the policy's primary input. For vision-based policies (Cosmos, Phase 4a), raw video frames flow through a separate path, but the assembler still runs, because the monitor needs assembled state (pose age, speed) to gate every command regardless of what produced it.

Runtime monitor. Sees every command before the adapter. Enforces per-sample bounds, rate-of-change limits, and temporal-window invariants derived from ArduPilot's accepted command envelope. Every decision is logged with the triggered invariant and an invariant_flags bitmask so the post-mortem question is always answerable: did no applicable invariant exist, or did one exist and the monitor miss it?

Rejection contract. A rejected command is never silently dropped. The monitor puts a defined safe fallback in its place and logs the rejection with the triggered invariant_flags. The fallback is what keeps the vehicle safe; a flagged command does nothing on its own. It's configurable per deployment: hold-last-approved, commanded-stop/zero, or hand-off to an ArduPilot failsafe mode.

Single-writer adapter. The only module that links against the controller transport. Bypass is architecturally impossible, not merely prohibited.

Current State and Roadmap

Single process, no ROS. Each phase has a gate condition; do not start the next phase until the gate is met.

Phase	What	Gate
0 ✓	SITL bug fixes (type_mask, heartbeat, UDP bind, capture timestamps); dead code cuts	Stable SITL telemetry: pose staleness < 200 ms on consecutive ticks
1 ✓	`KinematicBackend` + `TransportDegradation` + `OnnxNavigateNode` + SAC training + eval harness	Architectural catch fractions computable from a session log produced by an ONNX policy run
2	`MissionExecutor` + JSON mission loader + per-task policy assignment	Catch fractions computable per (task, policy) pair from session log
3	Temporal-window invariants (oscillation, sustained max-rate, jerk) + MCAP logging	At least one temporal invariant catches a failure class per-sample invariants miss
4a	`VideoRecorder` + `ObservationBuffer` + Cosmos-trained actor via `CosmosNavigateNode`	Architectural catch fractions for Cosmos actor vs. Phase 2 SAC fractions
4b	Cosmos prediction-divergence behavioral monitor	All four catch fractions computed for a Cosmos-trained policy

Phases 0 and 1 are complete. Phase 1 gate closed 2026-06-22: SAC policy trained via SparEnv, exported to ONNX, and run end-to-end through OnnxNavigateNode. Verified catch fractions by scenario: hand-coded baseline 0.003, pose_dropout 0.435, pose_jitter 0.053, policy_exploit 0.999, SAC ONNX baseline 0.014 (mission success, neither = 0). The trained policy produces slightly rougher commands than the hand-coded node; the monitor catches them without blocking mission completion.

Phase 4a prerequisite: Before starting Phase 4a, benchmark the candidate Cosmos actor's inference latency on target hardware and determine its output format (autoregressive vs. diffusion/chunking). If inference > 30 ms or the model outputs action chunks, docs/action-chunking.md describes the required architectural changes and must be implemented first. This decision cannot be deferred past Phase 4a start.

Out of Scope

These are correct designs for a deployed product. They are not research prerequisites and are not scheduled until Phase 3 produces results worth deploying.

Tower integration (docs/future/radio_icd.md): UDP multicast, fleet state, mailbox, multi-robot coordination
SAR / high-level behaviors (docs/future/high-level-behaviors.md): CommandBundle, BT composites, coverage planning
CarlaBackend / IsaacBackend: not needed: the fidelity ladder is kinematic → real rover video, not kinematic → photorealistic sim
Optimal planning, terrain models, probability maps: SPAR executes plans; it does not compute them
Formal verification / safety certification: SPAR's evaluation is empirical. It is not a formal safety case

Repository Layout

SPAR/
├── shared/
│   ├── contracts/           # BTNode, GoalContext, CommandStream,
│   │                        # MonitorDecision, WorldState, SimulatorBackend
│   └── assembler/           # ObservationAssembler (ring buffers + snapshot)
├── spar_rover/
│   ├── main.cpp             # tick thread (20 Hz)
│   ├── monitor/             # RuntimeMonitor (authority boundary enforcement)
│   ├── mission/             # NavigateNode, OnnxNavigateNode
│   ├── adapter/             # ArdupilotAdapter (MAVLink; disabled in kinematic mode)
│   └── sim/                 # KinematicBackend, TransportDegradation
├── python/                  # pybind11 bindings + SparEnv gym wrapper
└── docs/
    ├── architecture.md      # assembler, authority boundary, data flow, constraints
    ├── mission-design.md    # Task struct, MissionExecutor, session log schema
    └── future/              # radio_icd.md, high-level-behaviors.md (post-Phase-3)

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

The venv must be active when building the Python bindings so CMake links against the same interpreter used at runtime.

Building

Requires CMake 3.22+, a C++20 compiler.

Flag	Default	Effect
`SPAR_BACKEND`	`ardupilot`	`ardupilot` = live MAVLink; `kinematic` = in-process bicycle model
`SPAR_ENABLE_MAVLINK`	OFF	Enable real MAVLink transport (requires `third_party/mavlink` submodule)
`SPAR_ENABLE_ONNXRUNTIME`	OFF	Enable ONNX learned node (`OnnxNavigateNode`)
`SPAR_ENABLE_PYTHON`	OFF	Build pybind11 gym wrapper (requires `SPAR_BACKEND=kinematic` + pybind11)
`SPAR_ENABLE_EXPLOIT_NODE`	OFF	Build `ExploitNode` for `policy_exploit` scenario (eval only; never on in deployment builds)

Runtime environment variables:

Variable	Default	Effect
`SPAR_SCENARIO`	`baseline`	Label written to the session log header (`# scenario: <value>`). Set to the active failure scenario name so logs are unambiguous. See `docs/eval-protocol.md` for defined scenario names.

Kinematic mode (no SITL, no hardware required):

cmake -B build -DSPAR_BACKEND=kinematic -DSPAR_ENABLE_EXPLOIT_NODE=ON
cmake --build build
SPAR_SCENARIO=baseline ./build/spar_rover/spar_rover   # log written to /tmp/spar_*.log

Kinematic mode + Python bindings (SAC training):

cmake -B build -DSPAR_BACKEND=kinematic -DSPAR_ENABLE_PYTHON=ON
cmake --build build --target spar_bindings
python python/train_sac.py

Run all four eval scenarios and print catch-fraction table:

cd python && python eval_harness.py --build-dir ../build

ArduPilot stub mode (adapter runs but writes are no-ops; fixed pose injected):

cmake -B build
cmake --build build
./build/spar_rover/spar_rover

Running Against ArduPilot SITL

(MAVLink submodule required: clone third_party/mavlink, then build with SPAR_ENABLE_MAVLINK=ON)

# Start ArduPilot Rover SITL
sim_vehicle.py -v Rover --console

# Run SPAR
./build/spar_rover/spar_rover
# Session log written to /tmp/spar_<session_id>.log

Documentation

Architecture: observation assembler, authority boundary, data flow, monitor design, architectural constraints
RL Pipeline: SAC baseline, Cosmos behavior-cloning path, Phase 4a prerequisites (VideoRecorder, ObservationBuffer, action chunking decision gate), build status
Eval Protocol: named failure scenarios, eval harness, session log parser, catch-fraction table
Mission Design: Task struct, MissionExecutor design, session log schema
Future: Tower radio ICD, high-level behaviors (implement after Phase 3)

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.vscode		.vscode
docs		docs
python		python
shared		shared
spar_rover		spar_rover
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SPAR — Sim Portable Autonomy Runtime

Research Questions

Architecture

Current State and Roadmap

Out of Scope

Repository Layout

Setup

Building

Running Against ArduPilot SITL

Documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SPAR — Sim Portable Autonomy Runtime

Research Questions

Architecture

Current State and Roadmap

Out of Scope

Repository Layout

Setup

Building

Running Against ArduPilot SITL

Documentation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages