v0.9.0-rc.25

github-actions released this 27 May 19:54 · 64 commits to main since this release
v0.9.0-rc.25
6c57b76

Persistent multiplexed sync streams for Automerge (ADR-063 closure); closes peat-mesh#175 and peat-mesh#177. Three landed PRs (peat-mesh#176, peat-mesh#178, peat-mesh#180) replace the legacy per-message-stream sync path with one persistent multiplexed bi-stream per peer. A per-channel writer task fed by an mpsc queue leaves API-level concurrency unconstrained while serialising bytes on the wire. The architectural decision is recorded in peat ADR-063.

Lab UAT against peat-sim/experiments/7n-dual-c2's sweep-telemetry-rate.sh on the shaped 256 kbps / 100 ms link (single-platform emitter, 3 trials per rate) — all three delivery thresholds now pass:

Rate     rc.24 (legacy fallback)   rc.25 (persistent path)   peat-mesh#175 threshold
1 Hz     100% / 100%               100% / 100%               ≥ 99.5% ✓
10 Hz    94.6% / 94.6%             98.9% / 100%              ≥ 99.0% ✓
25 Hz    85.3% / 85.5%             98.8% / 100%              ≥ 95.0% ✓

Added — persistent sync channel wire-up (peat-mesh#176, peat-mesh#178)

  • AutomergeBackend::with_iroh wires SyncChannelManager unconditionally. Every consumer (peat-protocol, peat-node, peat-sim) now takes the persistent multiplexed bi-stream per peer; the legacy per-message-stream path is retained only as a reconnect-window fallback per ADR-063 §D4. The strong Arc<SyncChannelManager> is owned as a field on AutomergeBackend so that the coordinator's Weak upgrades for the lifetime of the backend. The alternative, relying on the construction-scope local, produced the silent fall-through bug that peat-mesh#178's QA review caught: every consumer ran the legacy path even though the wire-up appeared correct.
  • SyncChannel send path: replaced the per-peer Mutex<Option<SendStream>> with mpsc::Sender<Vec<u8>> + a dedicated writer task per channel. User-visible send() is non-blocking (returns when the frame is queued, not when written), eliminating fan-out contention on the user-facing path. Bounded mpsc capacity (1024 frames) flows backpressure to callers when the writer can't keep up.
  • Wire-format symmetry on the v2 framing. SyncChannel::receive_loop rewritten to call AutomergeSyncCoordinator::receive_sync_payload_from_stream in a loop, replacing the pre-#175 [1 marker][4 len][N data] reader that desynchronized on the first byte of every frame (the sender writes [2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload]). handle_incoming_sync_stream wrapped in a frame loop with ReadExactError::FinishedEarly(0) as the graceful-close signal.
  • AutomergeSyncCoordinator::dispatch_received_payload: extracted as the single source of truth for the per-msg-type dispatch table (Delta / StateSnapshot / Tombstone / TombstoneBatch / Batch / NegentropyInit / NegentropyResponse). Both the accept-side persistent loop and the connect-side SyncChannel::receive_loop route through it.
  • AutomergeSyncCoordinator::has_persistent_sync_channel(&self) -> bool: pub accessor that returns whether the coordinator's Weak<SyncChannelManager> currently upgrades. Used by the regression-pin integration test to detect dropped-Arc regressions directly rather than only via downstream delivery (legacy fallback also delivers).
  • Reconnect-race generation counter (peat-mesh#178 QA [WARNING]): each SyncChannel carries an Arc<AtomicU64> writer generation. spawn_writer captures its own generation at spawn time and only writes channel state if that generation still matches the channel's current generation. Stale writers from prior reconnect incarnations log and exit without touching channel state, rather than clobbering a freshly installed Connected with Reconnecting.
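
The v2 frame layout quoted in the wire-format item above ([2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload]) can be sketched with the standard library alone. This is an illustration, not the peat-mesh code: the names Frame, encode_frame, read_frame and read_all_frames are hypothetical, the big-endian length prefixes are an assumption (the notes do not state byte order), and a plain std::io::Read stands in for the QUIC receive stream. A clean end of stream on a frame boundary is treated as a graceful close, loosely mirroring the FinishedEarly(0) signal.

```rust
use std::io::{self, Read};

/// One decoded frame. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct Frame {
    pub doc_key: String,
    pub msg_type: u8,
    pub payload: Vec<u8>,
}

/// Encode as [2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload].
/// ASSUMPTION: length prefixes are big-endian; the release notes do not say.
pub fn encode_frame(doc_key: &str, msg_type: u8, payload: &[u8]) -> io::Result<Vec<u8>> {
    if doc_key.len() > u16::MAX as usize || payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "field too long"));
    }
    let mut out = Vec::with_capacity(2 + doc_key.len() + 1 + 4 + payload.len());
    out.extend_from_slice(&(doc_key.len() as u16).to_be_bytes());
    out.extend_from_slice(doc_key.as_bytes());
    out.push(msg_type);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

/// Read one frame. Ok(None) means the stream ended exactly on a frame
/// boundary (graceful close); ending mid-frame is an error.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Frame>> {
    let mut first = [0u8; 1];
    if r.read(&mut first)? == 0 {
        return Ok(None); // zero bytes before a new frame: clean close
    }
    let mut second = [0u8; 1];
    r.read_exact(&mut second)?;
    let key_len = u16::from_be_bytes([first[0], second[0]]) as usize;

    let mut key = vec![0u8; key_len];
    r.read_exact(&mut key)?;
    let doc_key =
        String::from_utf8(key).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    let mut msg_type = [0u8; 1];
    r.read_exact(&mut msg_type)?;

    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len) as usize];
    r.read_exact(&mut payload)?;

    Ok(Some(Frame { doc_key, msg_type: msg_type[0], payload }))
}

/// Frame loop: keep reading until the graceful-close signal.
pub fn read_all_frames<R: Read>(r: &mut R) -> io::Result<Vec<Frame>> {
    let mut frames = Vec::new();
    while let Some(frame) = read_frame(r)? {
        frames.push(frame);
    }
    Ok(frames)
}
```

A reader that expects a different header, such as the pre-#175 [1 marker][4 len][N data] layout, would interpret the first doc_key_len byte as its marker and lose framing immediately, which is why both ends need to share one decoder.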

Added — regression tests

  • sync_channel_send_writer_task_delivers_concurrent_batches (peat-mesh#178, automerge_sync.rs::tests::persistent_stream_wire_format_peat_mesh_175): 8 concurrent SyncChannel::send calls; asserts every batch's DeltaSync entry materialises on the responder. Catches future regressions to the writer-task pipeline (re-introducing the per-peer mutex, capacity bugs, reconnect-race state clobbering).
  • Per-variant dispatch pins (peat-mesh#178, closes peat-mesh#177): round-trip pins for StateSnapshot, Tombstone, TombstoneBatch (counter-only; in-memory store no-ops put_tombstone), plus a combined parse-side pin for NegentropyInit / NegentropyResponse. Combined with the prior Batch and DeltaSync pins from peat-mesh#176, the 7-arm dispatch table is now end-to-end covered.
  • tests/two_backend_persistent_sync_e2e.rs (peat-mesh#180): integration test that builds two real AutomergeBackend::with_iroh instances, cross-populates their iroh MemoryLookups, asserts the persistent channel is wired via has_persistent_sync_channel() on both sides, and exercises a full A → B sync round-trip. Catches the peat-mesh#178 BLOCKER class (dropped Arc<SyncChannelManager>) directly in CI rather than via lab UAT only. Verified to fail when the strong-ref field is removed from AutomergeBackend, with the diagnostic pointing straight at the BLOCKER context. Loopback convergence ~101 ms; 30 s timeout budget.

Fixed

  • peat-mesh-node binary's manual SyncChannelManager wiring at src/bin/peat-mesh-node.rs:293-297 had the same latent dropped-Arc bug as the pre-fix AutomergeBackend::new — the local was moved into set_channel_manager which downgraded to Weak and dropped the strong ref. Fixed analogously by rebinding to _channel_manager for the lifetime of main().
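
The dropped-Arc failure mode described above is easy to reproduce in isolation. The sketch below is a minimal model under stated assumptions, not the real types: ChannelManager and Coordinator are hypothetical stand-ins for SyncChannelManager and AutomergeSyncCoordinator, and only the method names set_channel_manager and has_persistent_sync_channel are borrowed from these notes.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for SyncChannelManager.
pub struct ChannelManager;

/// Hypothetical stand-in for the coordinator: it only holds a Weak.
pub struct Coordinator {
    manager: Mutex<Weak<ChannelManager>>,
}

impl Coordinator {
    pub fn new() -> Self {
        Coordinator { manager: Mutex::new(Weak::new()) }
    }

    /// Takes the Arc by value, keeps only a Weak, and drops the Arc on return.
    pub fn set_channel_manager(&self, manager: Arc<ChannelManager>) {
        *self.manager.lock().unwrap() = Arc::downgrade(&manager);
    }

    /// True only while some strong reference is still alive elsewhere.
    pub fn has_persistent_sync_channel(&self) -> bool {
        self.manager.lock().unwrap().upgrade().is_some()
    }
}

/// Buggy wiring: the only strong reference is moved into the setter and
/// dropped there, so the Weak can never upgrade again.
pub fn wire_buggy() -> Coordinator {
    let coordinator = Coordinator::new();
    let channel_manager = Arc::new(ChannelManager);
    coordinator.set_channel_manager(channel_manager);
    coordinator
}

/// Fixed wiring: hand the setter a clone and return the original so the
/// caller can keep it alive for as long as the coordinator is in use.
pub fn wire_fixed() -> (Coordinator, Arc<ChannelManager>) {
    let coordinator = Coordinator::new();
    let channel_manager = Arc::new(ChannelManager);
    coordinator.set_channel_manager(Arc::clone(&channel_manager));
    (coordinator, channel_manager)
}
```

With wire_fixed, binding the second tuple element to a named variable such as _channel_manager keeps the strong reference alive until the end of the enclosing scope. Binding it to a bare _ would drop it at once and reintroduce the bug.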

Documentation

  • SyncChannel::send doc-comment now states the delivery contract: Ok(()) means "enqueued, will be best-effort delivered" (not "written to the wire"). DeltaSync / Batch recover via Automerge vector-clock; Tombstone / StateSnapshot are at-most-once-eventually with silent drops across reconnects. Caller-side retry is flagged as future work.
  • SEND_QUEUE_CAPACITY doc-comment clarifies the bound is per-FRAME-count (1024 frames ≈ 512 KB per peer for the typical workload), not per-byte: StateSnapshot payloads carry full doc.save() blobs and can push per-peer memory to gigabytes under snapshot pressure. Sized-Semaphore byte accounting flagged as the right next step if that pressure becomes a deployment concern.
  • bytes_sent doc-comment notes the counter is incremented at enqueue, not at write; bounded drift per reconnect cycle.
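
The three documented behaviours above (Ok means enqueued, the bound counts frames rather than bytes, and bytes_sent moves at enqueue time) can be seen together in a small model. This is a sketch under assumptions: std::sync::mpsc::sync_channel stands in for whatever bounded mpsc the crate actually uses, SendQueue and worst_case_queued_bytes are hypothetical names, and try_send is used so that a full queue is visible as an error instead of a blocked caller.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical model of the user-facing half of a send queue.
pub struct SendQueue {
    tx: SyncSender<Vec<u8>>,
    /// Incremented when a frame is queued, not when it is written.
    pub bytes_sent: u64,
}

impl SendQueue {
    /// `capacity_frames` bounds the number of queued frames, whatever their size.
    pub fn new(capacity_frames: usize) -> (Self, Receiver<Vec<u8>>) {
        let (tx, rx) = sync_channel(capacity_frames);
        (SendQueue { tx, bytes_sent: 0 }, rx)
    }

    /// Ok(()) means "queued for the writer", not "written to the wire".
    pub fn try_enqueue(&mut self, frame: Vec<u8>) -> Result<(), TrySendError<Vec<u8>>> {
        let len = frame.len() as u64;
        self.tx.try_send(frame)?;
        self.bytes_sent += len;
        Ok(())
    }
}

/// Upper bound on queued bytes for a frame-count bound.
pub fn worst_case_queued_bytes(capacity_frames: usize, frame_bytes: usize) -> usize {
    capacity_frames * frame_bytes
}
```

For scale, 1024 frames at roughly 512 bytes each (the frame size implied by the "≈ 512 KB" figure) is 524,288 bytes, while 1024 frames of 1 MiB each would be 1 GiB. That gap is why byte accounting is named as the next step if snapshot pressure becomes a deployment concern.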

Out of scope (acknowledged follow-up)

  • Workload-characterisation gap (peat-mesh#175 remains partly open): the lab script runs telemetry concurrently with a 1 MB attachment download on the shaped 256 kbps link, so the combined offered load (~42 KB/s) exceeds the link capacity (256 kbps = 32 KB/s). The thresholds in this release pass on the platform-3-only emitter case; the 5-platform concurrent case is bandwidth-bound, not sync-coordinator-bound. Workload-side characterisation is now an isolated follow-up.
  • Persistent-stream registration on the accept side (ADR-063 §D3, "send half held in per-peer channel state on accept side as well"): the bidirectional QUIC stream's send half is already reusable by handlers like NegentropyResponse via dispatch_received_payload's send argument; full accept-side SyncChannelManager registration is deferred to a separate slice.
  • Reconnect-mid-send fault injection (peat-mesh#178 QA [SUGGESTION]): the generation-counter fix is defended-by-construction; a deterministic mid-send fault-injection regression pin requires QUIC-stream-level fault tooling and is deferred.
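
The generation-counter guard referred to above can be modelled without any networking. In this sketch, Channel, WriterHandle, and mark_reconnecting are hypothetical names, the "writer" is a plain handle instead of a spawned task, and the state lock is an assumption made so that the compare-and-write is atomic in the model; the real implementation may order these steps differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChannelState {
    Connected,
    Reconnecting,
}

/// Hypothetical model of a channel with a writer generation.
pub struct Channel {
    generation: Arc<AtomicU64>,
    state: Arc<Mutex<ChannelState>>,
}

/// What a writer captures when it is started.
pub struct WriterHandle {
    my_generation: u64,
    generation: Arc<AtomicU64>,
    state: Arc<Mutex<ChannelState>>,
}

impl Channel {
    pub fn new() -> Self {
        Channel {
            generation: Arc::new(AtomicU64::new(0)),
            state: Arc::new(Mutex::new(ChannelState::Reconnecting)),
        }
    }

    /// Install a new writer: bump the generation and mark the channel Connected.
    pub fn spawn_writer(&self) -> WriterHandle {
        let mut state = self.state.lock().unwrap();
        let my_generation = self.generation.fetch_add(1, Ordering::SeqCst) + 1;
        *state = ChannelState::Connected;
        WriterHandle {
            my_generation,
            generation: Arc::clone(&self.generation),
            state: Arc::clone(&self.state),
        }
    }

    pub fn state(&self) -> ChannelState {
        *self.state.lock().unwrap()
    }
}

impl WriterHandle {
    /// Called when this writer's stream fails. Returns false, leaving the
    /// state untouched, if a newer writer has been installed since.
    pub fn mark_reconnecting(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        if self.generation.load(Ordering::SeqCst) != self.my_generation {
            return false; // stale writer: exit without clobbering
        }
        *state = ChannelState::Reconnecting;
        true
    }
}
```

Calling spawn_writer twice and then mark_reconnecting on the first handle leaves the channel Connected, which is the property the fix relies on. The model does not exercise a failure in the middle of a write, which is the part that needs the deferred fault-injection tooling.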