v0.9.0-rc.25
Persistent multiplexed sync streams for Automerge — ADR-063 closure — closes peat-mesh#175 and peat-mesh#177. Three landed PRs (peat-mesh#176, peat-mesh#178, peat-mesh#180) replace the legacy per-message-stream sync path with one persistent multiplexed bi-stream per peer, driven by a writer-task + mpsc design that holds API-level concurrency unconstrained while serialising bytes on the wire. Architectural decision lives in peat ADR-063.
Lab UAT against peat-sim/experiments/7n-dual-c2's sweep-telemetry-rate.sh on the shaped 256 kbps / 100 ms link (single-platform emitter, 3 trials per rate) — all three delivery thresholds now pass:
| Rate | rc.24 (legacy fallback) | rc.25 (persistent path) | peat-mesh#175 threshold |
|---|---|---|---|
| 1 Hz | 100% / 100% | 100% / 100% | ≥ 99.5% ✓ |
| 10 Hz | 94.6% / 94.6% | 98.9% / 100% | ≥ 99.0% ✓ |
| 25 Hz | 85.3% / 85.5% | 98.8% / 100% | ≥ 95.0% ✓ |
Added — persistent sync channel wire-up (peat-mesh#176, peat-mesh#178)
- `AutomergeBackend::with_iroh` wires `SyncChannelManager` unconditionally. Every consumer (peat-protocol, peat-node, peat-sim) now takes the persistent multiplexed bi-stream per peer; the legacy per-message-stream path is retained only as a reconnect-window fallback per ADR-063 §D4. The strong `Arc<SyncChannelManager>` is owned as a field on `AutomergeBackend` so the coordinator's `Weak` upgrades for the lifetime of the backend (the alternative, relying on the construction-scope local, produced the silent fall-through bug that peat-mesh#178's QA review caught: every consumer ran the legacy path even though the wire-up appeared correct).
- `SyncChannel` send path: replaced the per-peer `Mutex<Option<SendStream>>` with `mpsc::Sender<Vec<u8>>` + a dedicated writer task per channel. User-visible `send()` is non-blocking (returns when the frame is queued, not when written), eliminating fan-out contention on the user-facing path. Bounded mpsc capacity (1024 frames) flows backpressure to callers when the writer can't keep up.
- Wire-format symmetry on the v2 framing. `SyncChannel::receive_loop` rewritten to call `AutomergeSyncCoordinator::receive_sync_payload_from_stream` in a loop, replacing the pre-#175 `[1 marker][4 len][N data]` reader that desynchronized on the first byte of every frame (the sender writes `[2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload]`). `handle_incoming_sync_stream` wrapped in a frame loop with `ReadExactError::FinishedEarly(0)` as the graceful-close signal.
- `AutomergeSyncCoordinator::dispatch_received_payload`: extracted as the single source of truth for the per-msg-type dispatch table (`Delta` / `StateSnapshot` / `Tombstone` / `TombstoneBatch` / `Batch` / `NegentropyInit` / `NegentropyResponse`). Both the accept-side persistent loop and the connect-side `SyncChannel::receive_loop` route through it.
- `AutomergeSyncCoordinator::has_persistent_sync_channel(&self) -> bool`: `pub` accessor that returns whether the coordinator's `Weak<SyncChannelManager>` currently upgrades. Used by the regression-pin integration test to detect dropped-`Arc` regressions directly rather than only via downstream delivery (legacy fallback also delivers).
- Reconnect-race generation counter (peat-mesh#178 QA [WARNING]): each `SyncChannel` carries an `Arc<AtomicU64>` writer generation. `spawn_writer` captures its own generation at spawn time and only writes channel state if its generation still matches the channel's current; stale writers from prior reconnect incarnations log-and-exit silently rather than clobbering a freshly-installed `Connected` with `Reconnecting`.
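The v2 framing named in the wire-format item can be illustrated with a small synchronous codec. This is a sketch under stated assumptions, not the peat-mesh implementation: the `Frame` struct, the function names, the big-endian length prefixes, and the UTF-8 doc key are all hypothetical; only the field order `[2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload]` comes from the notes above. A clean end of stream at a frame boundary plays the role that `ReadExactError::FinishedEarly(0)` plays in the real loop.

```rust
use std::io::{self, Read, Write};

/// One decoded frame. Field names are illustrative, not taken from peat-mesh.
#[derive(Debug, PartialEq)]
pub struct Frame {
    pub doc_key: String,
    pub msg_type: u8,
    pub payload: Vec<u8>,
}

/// Writes `[2 doc_key_len][N doc_key][1 msg_type][4 payload_len][payload]`.
/// Big-endian length prefixes are an assumption of this sketch.
pub fn write_frame<W: Write>(w: &mut W, frame: &Frame) -> io::Result<()> {
    let key = frame.doc_key.as_bytes();
    if key.len() > u16::MAX as usize || frame.payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "field too long"));
    }
    w.write_all(&(key.len() as u16).to_be_bytes())?;
    w.write_all(key)?;
    w.write_all(&[frame.msg_type])?;
    w.write_all(&(frame.payload.len() as u32).to_be_bytes())?;
    w.write_all(&frame.payload)
}

/// Reads one frame. Returns `Ok(None)` when the stream ends exactly at a
/// frame boundary (graceful close) and an error when it ends mid-frame.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Frame>> {
    let mut key_len = [0u8; 2];
    let mut filled = 0;
    while filled < key_len.len() {
        let n = match r.read(&mut key_len[filled..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            if filled == 0 {
                return Ok(None); // zero bytes read: peer closed between frames
            }
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        filled += n;
    }
    let mut key = vec![0u8; u16::from_be_bytes(key_len) as usize];
    r.read_exact(&mut key)?;
    let mut msg_type = [0u8; 1];
    r.read_exact(&mut msg_type)?;
    let mut payload_len = [0u8; 4];
    r.read_exact(&mut payload_len)?;
    let mut payload = vec![0u8; u32::from_be_bytes(payload_len) as usize];
    r.read_exact(&mut payload)?;
    let doc_key =
        String::from_utf8(key).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(Some(Frame { doc_key, msg_type: msg_type[0], payload }))
}

/// Frame loop: keeps reading until the graceful-close signal.
pub fn read_all_frames<R: Read>(r: &mut R) -> io::Result<Vec<Frame>> {
    let mut frames = Vec::new();
    while let Some(frame) = read_frame(r)? {
        frames.push(frame);
    }
    Ok(frames)
}
```

Because the reader consumes exactly the fields the writer emits, several frames can share one stream without drifting. A reader that expects a different header layout loses alignment at the first byte, which is the failure the pre-#175 reader exhibited.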
Added — regression tests
- `sync_channel_send_writer_task_delivers_concurrent_batches` (peat-mesh#178, `automerge_sync.rs::tests::persistent_stream_wire_format_peat_mesh_175`): 8 concurrent `SyncChannel::send` calls; asserts every batch's `DeltaSync` entry materialises on the responder. Catches future regressions to the writer-task pipeline (re-introducing the per-peer mutex, capacity bugs, reconnect-race state clobbering).
- Per-variant dispatch pins (peat-mesh#178, closes peat-mesh#177): round-trip pins for `StateSnapshot`, `Tombstone`, `TombstoneBatch` (counter-only; in-memory store no-ops `put_tombstone`), plus a combined parse-side pin for `NegentropyInit` / `NegentropyResponse`. Combined with the prior `Batch` and `DeltaSync` pins from peat-mesh#176, the 7-arm dispatch table is now end-to-end covered.
- `tests/two_backend_persistent_sync_e2e.rs` (peat-mesh#180): integration test that builds two real `AutomergeBackend::with_iroh` instances, cross-populates their iroh `MemoryLookup`s, asserts the persistent channel is wired via `has_persistent_sync_channel()` on both sides, and exercises a full A → B sync round-trip. Catches the peat-mesh#178 BLOCKER class (dropped `Arc<SyncChannelManager>`) directly in CI rather than via lab UAT only. Verified to fail when the strong-ref field is removed from `AutomergeBackend`, with the diagnostic pointing straight at the BLOCKER context. Loopback convergence ~101 ms; 30 s timeout budget.
Fixed
- `peat-mesh-node` binary's manual `SyncChannelManager` wiring at `src/bin/peat-mesh-node.rs:293-297` had the same latent dropped-`Arc` bug as the pre-fix `AutomergeBackend::new`: the local was moved into `set_channel_manager`, which downgraded to `Weak` and dropped the strong ref. Fixed analogously by rebinding to `_channel_manager` for the lifetime of `main()`.
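The dropped-`Arc` failure mode is easy to reproduce in isolation. The following is a minimal sketch with stand-in types: `ChannelManager`, `Coordinator`, `Backend`, and `wire_and_drop` are hypothetical, and the by-value signature of `set_channel_manager` is an assumption inferred from the description above. Only the `Arc`/`Weak` behaviour itself is standard-library fact.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Stand-in for the real manager type (hypothetical).
pub struct ChannelManager;

/// Stand-in coordinator that keeps only a weak handle to the manager.
#[derive(Default)]
pub struct Coordinator {
    manager: Mutex<Weak<ChannelManager>>,
}

impl Coordinator {
    /// Takes the `Arc` by value, stores a `Weak`, and drops the argument on return.
    pub fn set_channel_manager(&self, manager: Arc<ChannelManager>) {
        *self.manager.lock().unwrap() = Arc::downgrade(&manager);
    }

    /// True only while some strong reference to the manager is still alive.
    pub fn has_persistent_sync_channel(&self) -> bool {
        self.manager.lock().unwrap().upgrade().is_some()
    }
}

/// Buggy shape: the only strong reference is moved into the setter and dies there,
/// so the stored `Weak` can never upgrade.
pub fn wire_and_drop(coordinator: &Coordinator) {
    let manager = Arc::new(ChannelManager);
    coordinator.set_channel_manager(manager);
}

/// Fixed shape: the owner keeps a strong reference as a field for its whole lifetime.
pub struct Backend {
    pub coordinator: Coordinator,
    _channel_manager: Arc<ChannelManager>,
}

impl Backend {
    pub fn new() -> Self {
        let manager = Arc::new(ChannelManager);
        let coordinator = Coordinator::default();
        coordinator.set_channel_manager(Arc::clone(&manager));
        Backend { coordinator, _channel_manager: manager }
    }
}
```

In the buggy shape nothing fails loudly: the upgrade simply returns `None` and callers take whatever fallback exists, which is why an explicit accessor such as `has_persistent_sync_channel` is useful as a test hook.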
Documentation
- `SyncChannel::send` doc-comment now contracts the delivery semantics: `Ok(())` = "enqueued, will be best-effort delivered" (not "written to the wire"). `DeltaSync` / `Batch` recover via Automerge vector-clock; `Tombstone` / `StateSnapshot` are at-most-once-eventually with silent drops across reconnects; caller-side retry flagged as future work.
- `SEND_QUEUE_CAPACITY` doc-comment clarifies the bound is per-FRAME-count (1024 frames ≈ 512 KB per peer for the typical workload), not per-byte: `StateSnapshot` payloads carry full `doc.save()` blobs and can push per-peer memory to gigabytes under snapshot pressure. Sized-`Semaphore` byte accounting flagged as the right next step if that pressure becomes a deployment concern.
- `bytes_sent` doc-comment notes the counter is incremented at enqueue, not at write; bounded drift per reconnect cycle.
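The documented semantics (a frame-count bound, `Ok(())` meaning "enqueued", `bytes_sent` counted at enqueue) can be sketched with standard-library primitives. This is an illustration only: it uses an OS thread and `std::sync::mpsc::sync_channel` as stand-ins for the async writer task and mpsc channel, and `QueuedChannel` and its methods are hypothetical names. Only the constant name `SEND_QUEUE_CAPACITY` and its value of 1024 come from the notes above.

```rust
use std::io::Write;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// The bound counts frames, not bytes, so large frames still use large memory.
pub const SEND_QUEUE_CAPACITY: usize = 1024;

/// Cloneable send handle; every clone feeds the same single writer.
#[derive(Clone)]
pub struct QueuedChannel {
    tx: SyncSender<Vec<u8>>,
    bytes_sent: Arc<AtomicU64>,
}

impl QueuedChannel {
    /// Spawns the one writer that owns `sink`. Joining the handle returns the sink.
    pub fn spawn<W: Write + Send + 'static>(mut sink: W) -> (QueuedChannel, JoinHandle<W>) {
        let (tx, rx) = sync_channel::<Vec<u8>>(SEND_QUEUE_CAPACITY);
        let writer = thread::spawn(move || {
            // Only this thread touches the sink, so frames never interleave.
            for frame in rx {
                if sink.write_all(&frame).is_err() {
                    break; // frames still queued are lost: Ok(()) only meant "enqueued"
                }
            }
            sink
        });
        (QueuedChannel { tx, bytes_sent: Arc::new(AtomicU64::new(0)) }, writer)
    }

    /// Returns once the frame is queued. Waits only when the queue already
    /// holds SEND_QUEUE_CAPACITY frames, which is the backpressure path.
    pub fn send(&self, frame: Vec<u8>) -> Result<(), &'static str> {
        let len = frame.len() as u64;
        self.tx.send(frame).map_err(|_| "writer has exited")?;
        // Counted at enqueue, not at write, so it can run ahead of the wire.
        self.bytes_sent.fetch_add(len, Ordering::Relaxed);
        Ok(())
    }

    pub fn bytes_sent(&self) -> u64 {
        self.bytes_sent.load(Ordering::Relaxed)
    }
}
```

With a frame-count bound, worst-case queue memory is the capacity times the largest frame, which is why 1024 small deltas and 1024 full snapshots differ by orders of magnitude. A byte-based bound would require tracking queued bytes separately, as the notes suggest.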
Out of scope (acknowledged follow-up)
- Workload-characterisation gap (peat-mesh#175 remains partly open): the lab script runs telemetry concurrently with a 1 MB attachment download on the shaped 256 kbps link, so the combined offered load (~42 KB/s) exceeds capacity (32 KB/s). The thresholds in this release pass on the platform-3-only emitter case; the 5-platform concurrent case is bandwidth-bound, not sync-coordinator-bound, so workload-side characterisation is now an isolated follow-up.
- Persistent-stream registration on the accept side (ADR-063 §D3, "send half held in per-peer channel state on accept side as well"): the bidirectional QUIC stream's send half is already reusable by handlers like `NegentropyResponse` via `dispatch_received_payload`'s `send` argument; full accept-side `SyncChannelManager` registration is deferred to a separate slice.
- Reconnect-mid-send fault injection (peat-mesh#178 QA [SUGGESTION]): the generation-counter fix is defended-by-construction; a deterministic mid-send fault-injection regression pin requires QUIC-stream-level fault tooling and is deferred.
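For readers unfamiliar with the generation-counter guard referred to in the last item, the idea can be sketched as follows. All names here (`GuardedChannel`, `WriterHandle`, `mark_reconnecting`) are hypothetical, and the sketch holds a mutex across the check and the write for simplicity; the notes state only that each channel carries an `Arc<AtomicU64>` generation that `spawn_writer` captures at spawn time.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChannelState {
    Connected,
    Reconnecting,
}

pub struct GuardedChannel {
    pub state: Mutex<ChannelState>,
    pub generation: Arc<AtomicU64>,
}

/// What one writer incarnation remembers about when it was spawned.
pub struct WriterHandle {
    my_generation: u64,
}

impl GuardedChannel {
    pub fn new() -> Self {
        GuardedChannel {
            state: Mutex::new(ChannelState::Reconnecting),
            generation: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Called on every (re)connect: bumps the generation, installs `Connected`,
    /// and hands the new writer the generation it belongs to.
    pub fn spawn_writer(&self) -> WriterHandle {
        let mut state = self.state.lock().unwrap();
        let my_generation = self.generation.fetch_add(1, Ordering::SeqCst) + 1;
        *state = ChannelState::Connected;
        WriterHandle { my_generation }
    }

    /// Called by a writer whose stream failed. A stale writer (one from an
    /// earlier incarnation) leaves the state untouched and returns false.
    pub fn mark_reconnecting(&self, writer: &WriterHandle) -> bool {
        let mut state = self.state.lock().unwrap();
        if self.generation.load(Ordering::SeqCst) != writer.my_generation {
            return false;
        }
        *state = ChannelState::Reconnecting;
        true
    }
}
```

A deterministic test of this guard needs no network at all: spawn two writers in sequence and have the first one report failure. Exercising the same path mid-send on a live stream is the part that needs the fault tooling mentioned above.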