Conversation
wip(cli): qr codes
refactor(core): perf work
refactor(ci): branch cache strategy fix
refactor(iac): avoid retain on ollama pvc
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the relevant setting. Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough

Large refactor replacing AmCtx-centric APIs with a SharedBigRepo abstraction; introduces BigRepo, repo-integrated change listeners and per-doc brokers, partition-aware sync (protocol, node, peer, store), full-sync workers (doc/blob), blob indexing, wide app/CLI/FFI migrations, Cargo/CI tweaks, and many tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant BigRepo as BigRepo
    participant Broker as DocChangeBroker
    participant Switch as ChangeListenerManager
    participant Listener as Listener
    Client->>BigRepo: create_doc(initial_content)
    BigRepo->>Broker: spawn_doc_listener(doc_id)
    Broker-->>BigRepo: DocChangeBrokerHandle
    Client->>BigRepo: apply_change(doc_id, patch)
    BigRepo->>Broker: notify_heads_changed(doc_id, new_heads, origin)
    Broker->>Broker: compute diffs -> patches
    Broker->>Switch: send Vec<BigRepoChangeNotification>
    Switch->>Listener: filter & dispatch matching notifications
    Listener->>Client: call on_change callback
```
```mermaid
sequenceDiagram
    actor PeerA
    actor PeerB
    participant SyncNode as SyncNode
    participant Store as SyncStore
    participant BigRepo as BigRepo
    participant PeerWorker as PeerSyncWorker
    PeerA->>SyncNode: register_local_peer(peer_key)
    SyncNode->>BigRepo: list_partitions_for_peer(peer_key)
    BigRepo-->>SyncNode: partitions
    PeerWorker->>SyncNode: ListPartitions RPC
    SyncNode->>Store: get_partition_member_events_for_peer(req)
    Store-->>SyncNode: events
    SyncNode-->>PeerWorker: stream events (SubPartitions)
    PeerWorker->>BigRepo: import_doc / apply events
    PeerWorker->>PeerA: ack / progress events
```
```mermaid
sequenceDiagram
    participant FullSync as FullSyncWorker
    participant BlobWorker as BlobSyncWorker
    participant BlobsRepo as BlobsRepo
    participant Endpoint as Endpoint
    FullSync->>BlobWorker: spawn_blob_sync_worker(hash, peers)
    BlobWorker->>BlobsRepo: has_hash(hash)?
    alt present locally
        BlobWorker->>FullSync: mark_synced
    else present in store
        BlobWorker->>BlobsRepo: put_from_store(hash)
        BlobWorker->>FullSync: mark_synced
    else needs download
        BlobWorker->>Endpoint: request_download(hash)
        Endpoint-->>BlobWorker: transfer progress/completion
        BlobWorker->>BlobsRepo: put(downloaded)
        BlobWorker->>FullSync: mark_synced
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs
feat(am_utils_rs): big repo changes system
refactor(am_utils_rs): sync manual cleanup
```rust
struct DocChangeBroker {
    heads_listeners: Arc<Mutex<Vec<HeadListener>>>,
    handle: DocHandle,
}
```

Comment: Consider using `BigRepoHandle` here for consistency.
```rust
impl BigRepo {
    pub async fn boot_with_repo(repo: samod::Repo, config: BigRepoConfig) -> Res<Arc<Self>> {
```

Comment: We probably want to hide samod itself as an impl detail. Audit usages of samod API items across the big repo API surface and evaluate what we can get rid of.
dman-os left a comment:
Create a separate partition to create our app and drawer doc?
Not sure I understand the need of such a file structure.
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
.github/workflows/builds.yml (1)
271-271: ⚠️ Potential issue | 🟠 Major

Update `actions/upload-artifact` to v7.

The workflow uses `actions/upload-artifact@v6` across multiple jobs. Version 7.0.0 was released on February 26, 2026, and is now the latest major version. Update to `actions/upload-artifact@v7` to access the latest features, improvements, and security updates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `.github/workflows/builds.yml` at line 271: update the GitHub Actions step that references the actions/upload-artifact action to use the new major version tag; replace all occurrences of actions/upload-artifact@v6 with actions/upload-artifact@v7 in the workflow (search for the literal string "actions/upload-artifact@v6" and update each usage to "actions/upload-artifact@v7").

src/daybook_core/rt/switch.rs (1)
151-201: ⚠️ Potential issue | 🟠 Major

Do not swallow init replay failures.

Using `.unwrap_or_log()` here can silently skip initialization events and start with inconsistent switch state.

Proposed fix:

```diff
-            worker
-                .track_event_heads(&SwitchEvent::Plugs(Arc::clone(&event)))
-                .await
-                .unwrap_or_log();
-            worker
-                .dispatch_to_listeners(&SwitchEvent::Plugs(event))
-                .await
-                .unwrap_or_log();
+            worker
+                .track_event_heads(&SwitchEvent::Plugs(Arc::clone(&event)))
+                .await?;
+            worker
+                .dispatch_to_listeners(&SwitchEvent::Plugs(event))
+                .await?;
```

Apply the same replacement for the drawer/dispatch/config init loops in this block.
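The difference between the two patterns can be sketched std-only, with a hypothetical `Worker` standing in for the switch worker (names here are illustrative, not the real API): swallowing errors lets init "succeed" with inconsistent state, while `?` aborts and surfaces the failure.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the switch worker's replay state.
struct Worker {
    seen: HashSet<u32>,
}

impl Worker {
    // Tracking can fail, e.g. on a malformed replicated event.
    fn track_event(&mut self, id: u32) -> Result<(), String> {
        if id == 0 {
            return Err("malformed event".into());
        }
        self.seen.insert(id);
        Ok(())
    }

    // The `unwrap_or_log` pattern: errors are dropped, so init
    // "succeeds" having silently skipped events.
    fn init_swallowing(&mut self, events: &[u32]) {
        for &id in events {
            let _ = self.track_event(id); // error discarded
        }
    }

    // Propagating with `?` aborts init on the first failure instead.
    fn init_propagating(&mut self, events: &[u32]) -> Result<(), String> {
        for &id in events {
            self.track_event(id)?;
        }
        Ok(())
    }
}

fn main() {
    let mut w = Worker { seen: HashSet::new() };
    w.init_swallowing(&[1, 0, 2]); // event 0 silently dropped
    assert_eq!(w.seen.len(), 2);

    let mut w2 = Worker { seen: HashSet::new() };
    // The failure now surfaces to the caller during init.
    assert!(w2.init_propagating(&[1, 0, 2]).is_err());
}
```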
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/rt/switch.rs` around lines 151-201: the init loops for Drawer, Dispatch and Config currently call track_event_heads(...).await.unwrap_or_log() and dispatch_to_listeners(...).await.unwrap_or_log(), which swallows errors; change those calls to propagate failures instead (e.g. remove unwrap_or_log and use ? on the awaited Result) so any error from worker.track_event_heads or worker.dispatch_to_listeners surfaces during init for SwitchEvent::Drawer, SwitchEvent::Dispatch and SwitchEvent::Config (mirror the fix used for the plugs loop).

src/daybook_core/plugs.rs (1)
917-930: ⚠️ Potential issue | 🟠 Major

Don't `expect()` replicated plug hydration to succeed.

This path is handling remote automerge state. A missing hydrated node here is version drift/corruption, not a programming assertion, so `expect(ERROR_INVALID_PATCH)` will take down the repo worker on bad peer data instead of surfacing an error.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/plugs.rs` around lines 917-930: the code currently calls expect(ERROR_INVALID_PATCH) after hydrate_path_at_heads in the PlugsEvent::PlugAdded/PlugChanged handling (the call to self.big_repo.hydrate_path_at_heads::<Versioned<ThroughJson<Arc<manifest::PlugManifest>>>>(...)), which will panic on missing data; change this to explicitly handle the Option result and return an error with eyre::bail! instead of panicking. Check if the await? yields None and call eyre::bail! with a clear message including id and heads, so the function returns an error rather than killing the repo worker.

src/daybook_core/rt/dispatch.rs (2)
239-250: ⚠️ Potential issue | 🟠 Major

Keep `wflow_to_dispatch` consistent when applying remote events.

The replicated add/delete path only updates `active_dispatches`; `get_by_wflow_job()` and deletion cleanup rely on `wflow_to_dispatch`, so remote dispatches never become discoverable by job id and replicated deletions can leave stale index entries behind.

Possible fix:

```diff
 DispatchEvent::DispatchAdded { id, heads } => {
     // Hydrate the new dispatch at heads
     let (new_versioned, _) = self
         .big_repo
         .hydrate_path_at_heads::<Versioned<ThroughJson<Arc<ActiveDispatch>>>>(
             &self.app_doc_id,
             &heads.0,
             automerge::ROOT,
             vec![
                 DispatchStore::prop().into(),
                 "active_dispatches".into(),
                 autosurgeon::Prop::Key(id.clone().into()),
             ],
         )
         .await?
         .expect(ERROR_INVALID_PATCH);
     self.store
         .mutate_sync(|store| {
+            match &new_versioned.val.0.deets {
+                ActiveDispatchDeets::Wflow { wflow_job_id, .. } => {
+                    store.wflow_to_dispatch.insert(wflow_job_id.clone(), id.clone());
+                }
+            }
+            store.cancelled_dispatches.remove(id);
             store.active_dispatches.insert(id.clone(), new_versioned);
         })
         .await?;
 }
 DispatchEvent::DispatchDeleted { id, .. } => {
     self.store
         .mutate_sync(|store| {
-            store.active_dispatches.remove(id);
+            if let Some(old) = store.active_dispatches.remove(id) {
+                match &old.val.0.deets {
+                    ActiveDispatchDeets::Wflow { wflow_job_id, .. } => {
+                        store.wflow_to_dispatch.remove(wflow_job_id);
+                    }
+                }
+            }
+            store.cancelled_dispatches.remove(id);
         })
         .await?;
 }
```
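The underlying invariant (a primary map and a secondary index mutated in lockstep) can be sketched std-only; `Store`, `add`, and `remove` below are hypothetical stand-ins, not the real dispatch store API.

```rust
use std::collections::HashMap;

// Hypothetical dispatch store: a primary map plus a secondary index
// that must stay consistent with it.
#[derive(Default)]
struct Store {
    active: HashMap<String, String>,    // dispatch_id -> payload
    job_index: HashMap<String, String>, // job_id -> dispatch_id
}

impl Store {
    fn add(&mut self, dispatch_id: &str, job_id: &str, payload: &str) {
        self.active.insert(dispatch_id.to_string(), payload.to_string());
        // Update the secondary index in the same mutation.
        self.job_index.insert(job_id.to_string(), dispatch_id.to_string());
    }

    fn remove(&mut self, dispatch_id: &str) {
        if self.active.remove(dispatch_id).is_some() {
            // Drop the matching index entry so no stale mapping survives.
            self.job_index.retain(|_, v| v.as_str() != dispatch_id);
        }
    }
}

fn main() {
    let mut s = Store::default();
    s.add("d1", "job-a", "payload");
    assert_eq!(s.job_index.get("job-a").map(String::as_str), Some("d1"));
    s.remove("d1");
    // No stale entry is left behind after deletion.
    assert!(s.job_index.get("job-a").is_none());
}
```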
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 239 - 250, The remote event handling updates only active_dispatches and misses updating the wflow_to_dispatch index, so remote adds are not discoverable by get_by_wflow_job() and remote deletes leave stale entries; inside the DispatchEvent::DispatchAdded branch (the mutate_sync call that inserts into active_dispatches) also insert the corresponding mapping into store.wflow_to_dispatch (using the dispatch's workflow/job key used by get_by_wflow_job()), and inside DispatchEvent::DispatchDeleted (the mutate_sync call that removes from active_dispatches) remove the same mapping from store.wflow_to_dispatch to keep the two maps consistent and avoid stale index entries.
224-237: ⚠️ Potential issue | 🟠 Major

Handle replicated dispatch drift with `Err`, not `panic!()`.

Both of these branches are driven by remote CRDT state. Right now a missing hydrated dispatch or a non-add update from another device will crash the notification worker instead of surfacing drift as an error.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

```diff
-                out.push(if vtag.version.is_nil() {
-                    DispatchEvent::DispatchAdded {
-                        id: dispatch_id.clone(),
-                        heads: dispatch_heads,
-                    }
-                } else {
-                    panic!("dispatch update detected")
-                });
+                if vtag.version.is_nil() {
+                    out.push(DispatchEvent::DispatchAdded {
+                        id: dispatch_id.clone(),
+                        heads: dispatch_heads,
+                    });
+                } else {
+                    eyre::bail!("unexpected dispatch update for active_dispatches/{dispatch_id}");
+                }
```

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

Also applies to: 300-311
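A minimal std-only sketch of the same idea, using a plain `Result<_, String>` in place of eyre (the `RemoteEvent` enum and `handle` function are hypothetical): unexpected variants from a peer become an `Err` the caller can handle, rather than a panic that kills the worker.

```rust
// Hypothetical replicated event: external peers may send variants
// this node does not expect.
enum RemoteEvent {
    Added { id: u32 },
    Updated { id: u32 },
}

fn handle(ev: &RemoteEvent) -> Result<u32, String> {
    match ev {
        RemoteEvent::Added { id } => Ok(*id),
        // panic!() here would crash the worker on bad peer data;
        // returning Err surfaces the drift instead (eyre::bail! in
        // the real codebase).
        RemoteEvent::Updated { id } => {
            Err(format!("unexpected dispatch update for {id}"))
        }
    }
}

fn main() {
    assert_eq!(handle(&RemoteEvent::Added { id: 7 }), Ok(7));
    // The drift is an error, not an abort.
    assert!(handle(&RemoteEvent::Updated { id: 7 }).is_err());
}
```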
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/rt/dispatch.rs` around lines 224-237: the code currently uses expect()/panic!() when a hydrated dispatch is missing or when a non-add update is observed (locations around the hydrate_path_at_heads call that returns Versioned<ThroughJson<Arc<ActiveDispatch>>> and the branches referenced at lines ~300-311); change these to return a proper eyre error instead of panicking: replace the .expect(ERROR_INVALID_PATCH) (and any panic!/unimplemented!() in the non-add-update handling) with eyre::bail!(...) or eyre::eyre!(...) that includes context (dispatch id, heads, and a short message about replicated drift) so the error surfaces to the caller rather than aborting the worker.

src/daybook_core/tables.rs (1)
497-510: ⚠️ Potential issue | 🟠 Major

Avoid `expect()` on replicated table state.

Each of these hydrations is driven by remote automerge patches. A missing node here is version drift/data corruption, not a programming assertion, so this should return `Err` instead of panicking.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

Apply the same change to the window/tab/panel/table branches in this match.

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

Also applies to: 527-540, 558-571, 589-602
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/tables.rs` around lines 497-510: replace the panic-inducing expect on hydrated replicated state with an error return using eyre::bail! so missing nodes return Err rather than panic: in the hydrate_path_at_heads::<Versioned<Window>> call (and the analogous hydrate calls for tab/panel/table branches) remove the .expect(ERROR_INVALID_PATCH) and instead check for None and eyre::bail!(ERROR_INVALID_PATCH) so the function returns an Err; target the call sites using big_repo.hydrate_path_at_heads::<Versioned<Window>> (and similar generic calls), the ERROR_INVALID_PATCH constant, and the path components TablesStore::prop(), "windows"/other collection names, and autosurgeon::Prop::Key(...) when applying the same change to each branch.

src/daybook_cli/main.rs (1)
706-715: ⚠️ Potential issue | 🟠 Major

Don't mark the clone initialized before the second sync phase succeeds.

After `mark_repo_initialized()` returns, later opens will treat the repo as healthy even though only the bootstrap docs are guaranteed to exist locally. If `wait_for_full_sync(...)` times out or fails, the user is left with an initialized-but-partial clone.

Also applies to: 778-784
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_cli/main.rs` around lines 706-715: the code currently calls daybook_core::app::globals::set_init_state(...) and daybook_core::repo::mark_repo_initialized(&destination) before the second sync phase, which can mark a partial clone as initialized; move/defer those calls so they execute only after wait_for_full_sync(...) completes successfully (i.e., after the second sync/verification step succeeds), keeping big_repo_stop.stop().await? handling and error propagation unchanged so failures still stop/cleanup without marking the repo initialized; update both the block around set_init_state/mark_repo_initialized and the analogous block that covers wait_for_full_sync (the later 778-784 region) to follow the same ordering.

src/daybook_core/config.rs (1)
206-213: ⚠️ Potential issue | 🟠 Major

Don't panic the config watcher on remote document drift.

`notifs_loop()` is consuming replicated document data, but these `expect(...)` calls turn version drift or a malformed remote patch into a dead background watcher. This path should surface an error and exit cleanly instead of panicking on external input. Based on learnings: when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking.

Also applies to: 286-296
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/config.rs` around lines 206-213: the spawn of the background watcher currently calls repo.notifs_loop(...).await.expect(...), which will panic on malformed/remote-driven data; replace the expect-based panic with proper error propagation and graceful exit: have the async task await repo.notifs_loop(...) and map any Err to an eyre::bail! or return an Err from the spawned future so the task fails cleanly (do not call expect). Apply the same change to the other similar spawn block mentioned (the block around lines 286-296) so both tokio::spawn closures propagate errors from notifs_loop and any validation paths instead of panicking.

src/daybook_core/sync.rs (1)
322-333: ⚠️ Potential issue | 🟠 Major

Ignore late finish signals for untracked connections.

The duplicate-connection path stops a `RepoConnection` before it is inserted into `active_samod_peers`, and stopped connections can still emit `ConnFinishSignal`s. `expect("connection finished for unknown conn_id")` turns that normal race into a panic in the sync loop.

Also applies to: 367-377
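The defensive lookup can be sketched std-only with a `HashMap` standing in for the connection table (the names below are hypothetical): an unknown id on a finish signal is treated as a benign race, not a bug.

```rust
use std::collections::HashMap;

// Hypothetical connection table; finish signals can race with removal,
// so an unknown id is a normal condition.
fn handle_finish(active: &mut HashMap<u64, String>, conn_id: u64) {
    match active.remove(&conn_id) {
        Some(peer) => println!("connection to {peer} finished"),
        // expect() here would panic the sync loop on a benign race;
        // just ignore (or debug-log) the late signal instead.
        None => println!("late finish for untracked conn {conn_id}, ignoring"),
    }
}

fn main() {
    let mut active = HashMap::new();
    active.insert(1, "peer-a".to_string());
    handle_finish(&mut active, 1); // known connection
    handle_finish(&mut active, 1); // duplicate/late signal: no panic
    assert!(active.is_empty());
}
```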
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync.rs` around lines 322 - 333, In handle_incoming_am_conn, the code currently assumes a finish signal always maps to a tracked connection and uses expect, which panics for stopped-but-untracked connections; change the finish-signal handling so that when a ConnFinishSignal refers to an endpoint_id not present in active_samod_peers you simply ignore (or log at debug) the signal instead of panicking. Locate the branch that looks up endpoint_id in active_samod_peers (and the matching block after conn.stop()) and replace the expect("connection finished for unknown conn_id") with a fallible lookup that handles the missing key gracefully; apply the same defensive change to the similar finish-signal handling in the other block that mirrors this logic.
🟠 Major comments (19)
src/daybook_core/sync/stress.rs-28-28 (1)
28-28: ⚠️ Potential issue | 🟠 Major

`std::env::set_var` is unsafe in multi-threaded contexts.

This test uses `#[tokio::test(flavor = "multi_thread")]`, but `std::env::set_var` is not thread-safe and can cause undefined behavior when called concurrently with other code reading environment variables. Since Rust 1.66, this generates a warning, and it's been marked as unsafe in more recent editions.

Consider setting this environment variable before the test runtime starts, or use a different mechanism for test configuration.
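One std-only sketch of the "different mechanism" route: store the test configuration in a process-wide `OnceLock` instead of mutating the environment at all. The `TestConfig` struct and `test_config` accessor are hypothetical, not the project's API.

```rust
use std::sync::OnceLock;

// Hypothetical test configuration; a OnceLock avoids mutating the
// environment from a multi-threaded runtime entirely.
struct TestConfig {
    disable_keyring: bool,
}

fn test_config() -> &'static TestConfig {
    static CONFIG: OnceLock<TestConfig> = OnceLock::new();
    // First caller wins; every later call (from any thread) sees the
    // same value, so this is safe to hit concurrently.
    CONFIG.get_or_init(|| TestConfig { disable_keyring: true })
}

fn main() {
    assert!(test_config().disable_keyring);
    assert!(test_config().disable_keyring); // idempotent
}
```

Code that currently reads `DAYB_DISABLE_KEYRING` would need to consult this accessor instead, which is the config-parameter refactor the comment below describes.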
🛠️ Possible alternatives

```diff
-        std::env::set_var("DAYB_DISABLE_KEYRING", "1");
+        // Set via test harness or use a configuration struct instead of env var
+        // Or ensure this is set before spawning the async runtime
```

Alternatively, if the code reading `DAYB_DISABLE_KEYRING` can be refactored to accept a config parameter, that would be the safest approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/sync/stress.rs` at line 28: the call to std::env::set_var("DAYB_DISABLE_KEYRING", "1") inside the async test annotated with #[tokio::test(flavor = "multi_thread")] is unsafe in multi-threaded contexts; remove that call from the async test and instead initialize the env var before the Tokio runtime starts — e.g., create a static std::sync::Once (e.g., static INIT: Once) and call INIT.call_once(|| std::env::set_var("DAYB_DISABLE_KEYRING","1")) from a synchronous test setup function or module-level initializer, or alternatively change the test to #[tokio::test(flavor = "current_thread")] if single-threaded execution is acceptable; update the stress test to rely on this pre-runtime initialization rather than calling std::env::set_var inside the async function.

src/daybook_core/build.rs-4-6 (1)
4-6: ⚠️ Potential issue | 🟠 Major

Emit `rerun-if-env-changed` for the skip flag.

Line 4 changes build behavior via `DAYBOOK_SKIP_WFLOW_BUILD`, but Cargo is not told to rerun when that env var changes. This can leave stale build-script outputs when the flag is toggled between builds.

Suggested fix:

```diff
 fn main() -> Result<(), Box<dyn std::error::Error>> {
+    println!("cargo:rerun-if-env-changed=DAYBOOK_SKIP_WFLOW_BUILD");
     if std::env::var("DAYBOOK_SKIP_WFLOW_BUILD").as_deref() == Ok("1") {
         return Ok(());
     }
```
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/build.rs` around lines 4-6: in build.rs, before you check DAYBOOK_SKIP_WFLOW_BUILD (the if block that returns Ok(())), emit a Cargo directive to rerun the build script when that env var changes by printing "cargo:rerun-if-env-changed=DAYBOOK_SKIP_WFLOW_BUILD"; place this println! immediately before the if that tests std::env::var("DAYBOOK_SKIP_WFLOW_BUILD") so toggling the flag will cause Cargo to rerun the build script rather than leaving stale outputs.

src/daybook_core/drawer/facet_recovery.rs-55-58 (1)
55-58: ⚠️ Potential issue | 🟠 Major

Do not silently drop unrecoverable `updatedAt` entries.

At line 56, `hash_for_opid` returning `None` is ignored, which can return incomplete head recovery without surfacing an error.

💡 Proposed fix:

```diff
     for ii in 0..length {
-        if let Some((_, exid)) = get(doc, &updated_at_list, ii, read_heads)? {
-            if let Some(hash) = doc.hash_for_opid(&exid) {
-                recovered.push(hash);
-            }
-        }
+        let (_, exid) = get(doc, &updated_at_list, ii, read_heads)?
+            .ok_or_else(|| eyre::eyre!("updatedAt entry missing at index {ii} for facet: {facet_key}"))?;
+        let hash = doc
+            .hash_for_opid(&exid)
+            .ok_or_else(|| eyre::eyre!("could not recover hash for updatedAt entry at index {ii} for facet: {facet_key}"))?;
+        recovered.push(hash);
     }
```

Based on learnings: when validating external automerge/dmeta inputs, use `eyre::bail!()` error paths for drift instead of silently accepting unexpected states.
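The all-or-nothing shape of that loop can also be expressed with `collect::<Result<...>>`, sketched std-only here (the `recover_hashes` helper and `Option<u64>` entries are hypothetical stand-ins for the opid-to-hash lookups): the first unrecoverable entry aborts the whole recovery instead of being skipped.

```rust
// Hypothetical recovery step: each entry must resolve to a hash; a None
// is drift in external data and should fail the whole recovery.
fn recover_hashes(entries: &[Option<u64>]) -> Result<Vec<u64>, String> {
    entries
        .iter()
        .enumerate()
        .map(|(i, e)| (*e).ok_or_else(|| format!("unrecoverable entry at index {i}")))
        .collect() // the first Err aborts and is returned
}

fn main() {
    assert_eq!(recover_hashes(&[Some(1), Some(2)]), Ok(vec![1, 2]));
    // A missing hash surfaces as an error, not a silently shorter result.
    assert!(recover_hashes(&[Some(1), None]).is_err());
}
```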
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/drawer/facet_recovery.rs` around lines 55-58: the code silently skips unrecoverable updatedAt entries when doc.hash_for_opid(&exid) returns None, leading to incomplete recovery; update the logic in the block that calls get(doc, &updated_at_list, ii, read_heads) so that if doc.hash_for_opid(&exid) yields None you return an error (e.g., eyre::bail! or eyre::Report) instead of ignoring it — reference the get(...) call, the exid variable, doc.hash_for_opid(&exid), and the recovered vector to locate the change and replace the silent drop with an eyre::bail! providing context about the missing hash/drift.

src/daybook_core/blobs.rs-29-30 (1)
29-30: ⚠️ Potential issue | 🟠 Major

Deduplicated references need more than one source path.

Reference metadata is keyed by content hash but only stores one `source_path`. A second `put_path_reference()` for identical bytes overwrites the first path, and `get_path()` then fails as soon as the last-written file disappears even if an older source still exists and still hashes correctly.

Please preserve multiple candidates here, or at least avoid replacing a still-live reference with another one. Please also add a regression test with two distinct paths that hash identically.

As per coding guidelines: "When working with Rust, use small tests in addition to `cargo clippy` to validate ideas."

Also applies to: 118-125, 185-188
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/blobs.rs` around lines 29-30: the current reference metadata stores only one source_path (fields src_local_user_path and source_path), so calling put_path_reference() twice for identical content overwrites the prior candidate and get_path() can fail if the last-written path is removed; change the metadata to preserve multiple candidate paths (e.g., Vec<String> or a Set) instead of a single source_path, update put_path_reference() to append/merge without discarding live existing paths and update get_path() to prefer an existing live path from the candidates, and add a regression test that writes two distinct filesystem paths with identical bytes and asserts get_path() still finds a remaining path after removing the other; update any code that reads/writes the metadata to handle the new collection type and ensure serialization/deserialization (if any) is adjusted accordingly.

src/daybook_core/blobs.rs-70-95 (1)
70-95: ⚠️ Potential issue | 🟠 Major

Always clean up `source_snapshot`, even on errors.

Both `put_path_copy()` and `put_path_reference()` only remove the snapshot on the success tail. Any failure after `create_source_snapshot()` (hashing, metadata I/O, or iroh ingest) leaks a full-sized temp file under `root/snapshots`.

Wrap the body in `let result = async { ... }.await; let _ = tokio::fs::remove_file(&source_snapshot).await; result` so cleanup runs on both success and error.

Also applies to: 110-129
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/blobs.rs` around lines 70-95: the snapshot file created by create_source_snapshot is only removed on the success path, leaking large temp files on errors; modify put_path_copy (and similarly put_path_reference around lines showing create_source_snapshot/use of source_snapshot) to wrap the existing body that operates on source_snapshot in an inner async block (e.g., let result = async { ... }.await), then unconditionally attempt to remove the snapshot with let _ = tokio::fs::remove_file(&source_snapshot).await; after that and return result so cleanup runs whether the inner block succeeded or returned an error; ensure you keep existing uses of hash, object_paths, build_meta, write_meta, ingest_path_with_iroh, and atomic_copy_file inside the inner block.

src/xtask/main.rs-709-713 (1)
709-713: ⚠️ Potential issue | 🟠 Major

Add a timeout when waiting for the cancellation report.

This await can block forever if the server-side probe task never emits a report.

Proposed fix:

```diff
-    let (elapsed, send_ok) = report_rx
-        .recv()
-        .await
-        .ok_or_eyre("probe report missing after client cancellation")?;
+    let (elapsed, send_ok) = tokio::time::timeout(Duration::from_secs(5), report_rx.recv())
+        .await
+        .wrap_err("timed out waiting for probe report after client cancellation")?
+        .ok_or_eyre("probe report missing after client cancellation")?;
```
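The synchronous std analogue of wrapping the receive in `tokio::time::timeout` is `Receiver::recv_timeout`, sketched below (the `wait_for_report` helper is hypothetical): a bounded wait turns a missing report into an error instead of a hang.

```rust
use std::sync::mpsc;
use std::time::Duration;

// std-only analogue of `tokio::time::timeout(limit, report_rx.recv())`:
// the wait is bounded, so a missing report becomes an Err.
fn wait_for_report(rx: &mpsc::Receiver<u64>, limit: Duration) -> Result<u64, String> {
    rx.recv_timeout(limit)
        .map_err(|_| "probe report missing after client cancellation".to_string())
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(42).unwrap();
    assert_eq!(wait_for_report(&rx, Duration::from_millis(100)), Ok(42));
    // No further sender activity: the wait ends after the deadline
    // instead of blocking forever.
    assert!(wait_for_report(&rx, Duration::from_millis(10)).is_err());
}
```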
Verify each finding against the current code and only fix it if needed. In `src/xtask/main.rs` around lines 709-713: the code awaits report_rx.recv().await without a timeout, which can hang if the probe task never sends a report; wrap the recv().await call in a Tokio timeout (tokio::time::timeout) with a reasonable Duration and map a timeout error to the existing eyre error path (e.g., produce a "probe report missing after client cancellation (timed out)" error) so the surrounding let (elapsed, send_ok) = ...? logic handles both normal and timed-out cases; update imports if needed (tokio::time::Duration / timeout) and keep the same variable names (report_rx, elapsed, send_ok) and error handling flow.

src/am_utils_rs/repo.rs-171-188 (1)
171-188: ⚠️ Potential issue | 🟠 Major

Global and pre-create listeners never attach a broker.

This only acquires a `DocChangeBrokerLease` when `filter.doc_id` is set and the doc already exists. If there is no other lifecycle hook calling `ensure_change_broker`, global listeners and listeners registered before an import/create will miss remote `DocChanged` events.

Remote change notifications flow exclusively through brokers spawned during `add_doc_with_remote_heads_callback`, which send to the `change_tx` channel consumed by the switchboard. Global listeners (`doc_id: None`) registered in `add_change_listener` receive no broker leases and thus depend entirely on brokers existing for documents, but brokers are only created when a specific doc_id is requested or when `ensure_change_broker` is called explicitly. Documents that lack either fail to emit remote events to global listeners.
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 171-188: the add_change_listener implementation only acquires a DocChangeBrokerLease when filter.doc_id exists and the doc is already present, so global listeners and listeners added before a doc is created never get broker leases and miss remote DocChanged events; fix by ensuring brokers are attached for relevant docs and future docs: 1) when filter.doc_id is None (global listener) iterate current documents (via the same mechanism used by find_doc_handle/add_doc_with_remote_heads_callback) and call ensure_change_broker for each to collect broker leases; 2) when filter.doc_id is Some but the doc does not yet exist, register a one-shot hook or piggyback on add_doc_with_remote_heads_callback so that ensure_change_broker is called and its lease appended to the returned registration once the doc is created; use the existing functions ensure_change_broker, find_doc_handle, add_doc_with_remote_heads_callback and change_manager.add_listener to locate insertion points.

src/am_utils_rs/repo.rs-109-123 (1)
109-123: ⚠️ Potential issue | 🟠 Major

Reject non-UTF-8 disk paths instead of lossy-converting them.

Both the samod storage root and the SQLite sidecar URL are built with lossy path formatting. If `path` contains invalid UTF-8 sequences, the code can silently open a different repository or database than requested.

🛠️ Suggested guard:

```diff
 StorageConfig::Disk { path } => {
     std::fs::create_dir_all(&path).wrap_err_with(|| {
         format!("Failed to create storage directory: {}", path.display())
     })?;
-    let repo = repo.with_storage(samod::storage::TokioFilesystemStorage::new(
-        path.to_string_lossy().as_ref(),
-    ));
+    let storage_path = path
+        .to_str()
+        .ok_or_else(|| ferr!("repo storage path must be valid UTF-8: {}", path.display()))?;
+    let repo =
+        repo.with_storage(samod::storage::TokioFilesystemStorage::new(storage_path));
     let loaded = if let Some(policy) = announce_policy {
         repo.with_announce_policy(policy).load().await
     } else {
         repo.load().await
     };
-    let sqlite_url = format!("sqlite://{}", path.join("big_repo.sqlite").display());
+    let sqlite_path = path.join("big_repo.sqlite");
+    let sqlite_url = format!(
+        "sqlite://{}",
+        sqlite_path.to_str().ok_or_else(|| {
+            ferr!("sqlite path must be valid UTF-8: {}", sqlite_path.display())
+        })?
+    );
     (loaded, sqlite_url)
 }
```
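The core of the guard is `Path::to_str()`, which returns `None` for non-UTF-8 paths where `to_string_lossy()` would silently substitute replacement characters. A std-only sketch (the `storage_url` helper is hypothetical):

```rust
use std::path::Path;

// Validate the path up front instead of lossy-converting it: to_str()
// returns None for invalid UTF-8, which becomes a hard error here.
fn storage_url(path: &Path) -> Result<String, String> {
    let p = path
        .to_str()
        .ok_or_else(|| format!("storage path must be valid UTF-8: {}", path.display()))?;
    Ok(format!("sqlite://{p}/big_repo.sqlite"))
}

fn main() {
    let url = storage_url(Path::new("/tmp/repo")).unwrap();
    assert_eq!(url, "sqlite:///tmp/repo/big_repo.sqlite");
}
```

On Unix, a path built from invalid UTF-8 bytes (via `OsStrExt::from_bytes`) would make `to_str()` return `None` and trip the error path, which is exactly the silent-misroute case the comment describes.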
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 109-123: the code currently uses path.to_string_lossy() and path.join(...).display() which silently accept non-UTF-8 paths; instead validate and reject non-UTF-8 early: call path.to_str().ok_or_else(...) with an error like "storage path contains invalid UTF-8" and return the error before constructing the repo or sqlite URL, then pass the validated &str into repo.with_storage(TokioFilesystemStorage::new(...)) and use a UTF-8-safe string when building sqlite_url (e.g., construct the sidecar path with path.join("big_repo.sqlite") and ensure its to_str() is valid before formatting "sqlite://..."); update the block around repo.with_storage and sqlite_url to use these checks and propagate the error.

src/am_utils_rs/repo.rs-412-414 (1)
412-414: ⚠️ Potential issue | 🟠 Major

Save and restore the actor after the scoped write.

When `set_actor()` is called with `Some(actor_id)`, it mutates the document's default actor persistently. Since the same `samod::DocHandle` is reused across multiple `with_document()` calls (as seen in repeated `flush()` calls on the same document), the modified actor leaks into subsequent writes on that handle. Wrap the `set_actor()` call and transaction in a save/restore pattern to ensure the original actor is restored once the operation completes.
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 412-414: the code calls doc.set_actor(actor.clone()) which mutates the DocHandle's default actor and leaks into subsequent with_document()/flush() calls; before setting, capture the current actor (e.g., via doc.actor() or the appropriate getter on samod::DocHandle), then set the new actor only for the scoped write and restore the saved actor in a finally/Drop scope (or after the transaction) so the original actor is reinstated; apply this save/restore pattern around the set_actor(actor.clone()) call and the surrounding transaction/with_document() usage to avoid leaking the actor across uses of the same doc handle.

src/daybook_core/index/doc_blobs.rs-140-148 (1)
140-148:⚠️ Potential issue | 🟠 MajorEmit
Deletedwhen reindexing removes the doc from the blob index.
reindex_doc()can delete all rows for a document when it disappears, loses its Blob facets, or resolves to an empty hash set, but theUpsertworker path still publishesDocBlobsIndexEvent::Updated. Subscribers will miss evictions on blob-removal updates and can retain stale memberships.Possible fix
```diff
-    async fn handle_worker_item(&self, item: DocBlobsIndexWorkItem) -> Res<()> {
+    async fn handle_worker_item(&self, item: DocBlobsIndexWorkItem) -> Res<()> {
         match item {
             DocBlobsIndexWorkItem::Upsert { doc_id, heads } => {
-                self.reindex_doc(&doc_id, &heads).await?;
-                self.registry
-                    .notify([DocBlobsIndexEvent::Updated { doc_id }]);
+                let still_indexed = self.reindex_doc(&doc_id, &heads).await?;
+                self.registry.notify([if still_indexed {
+                    DocBlobsIndexEvent::Updated { doc_id }
+                } else {
+                    DocBlobsIndexEvent::Deleted { doc_id }
+                }]);
             }
             DocBlobsIndexWorkItem::DeleteDoc { doc_id } => {
                 self.delete_doc(&doc_id).await?;
                 self.registry
                     .notify([DocBlobsIndexEvent::Deleted { doc_id }]);
             }
         }
         Ok(())
     }

-    pub async fn reindex_doc(&self, doc_id: &DocId, heads: &ChangeHashSet) -> Res<()> {
+    pub async fn reindex_doc(&self, doc_id: &DocId, heads: &ChangeHashSet) -> Res<bool> {
         let Some(facet_keys) = self.drawer_repo.facet_keys_at_heads(doc_id, heads).await? else {
             self.delete_doc(doc_id).await?;
-            return Ok(());
+            return Ok(false);
         };
         let selected_blob_keys: Vec<FacetKey> = facet_keys
             .into_iter()
             .filter(|facet_key| facet_key.tag == WellKnownFacetTag::Blob.into())
             .collect();
         if selected_blob_keys.is_empty() {
             self.delete_doc(doc_id).await?;
-            return Ok(());
+            return Ok(false);
         };
         // ...
-        self.reindex_doc_hashes(doc_id, heads, &hashes).await
+        self.reindex_doc_hashes(doc_id, heads, &hashes).await?;
+        Ok(!hashes.is_empty())
     }
```

Also applies to: 154-165, 173-188, 197-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 140 - 148, The Upsert handling currently always emits DocBlobsIndexEvent::Updated even when reindex_doc() removed all blob rows; change the logic in the DocBlobsIndexWorkItem::Upsert match arm to examine reindex_doc()'s result (or make reindex_doc return an enum/boolean indicating "removed/empty" vs "present") and call registry.notify with DocBlobsIndexEvent::Deleted { doc_id } when the doc was evicted (or has zero blob memberships) and DocBlobsIndexEvent::Updated { doc_id } otherwise; apply the same pattern to the other Upsert-like paths mentioned (the blocks around the other occurrences) so subscribers receive Deleted events on complete eviction instead of only Updated.

src/daybook_core/sync/full.rs-1027-1039 (1)
1027-1039: ⚠️ Potential issue | 🟠 Major — Skip stale queued peer-session refreshes.

`peer_sessions_to_refresh` is filled asynchronously, so the peer can disconnect before this batch runs. The current `expect(...)` calls will panic the worker on an endpoint that has already been removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1027 - 1039, batch_refresh_peer_sessions currently uses expect(...) on conn_by_peer and known_peer_set which can panic if peer_sessions_to_refresh contains stale endpoint_ids; change to gracefully skip entries that no longer exist by replacing the expect calls with safe lookups (e.g., conn_by_peer.get(&endpoint_id) and then known_peer_set.get_mut(conn_id)) and continue the loop when either lookup returns None, then only call refresh_peer_partition_session when both lookups succeed; keep ERROR_IMPOSSIBLE for truly impossible invariants elsewhere but do not use it here for stale async-driven queue items.

src/daybook_core/sync/full.rs-1197-1203 (1)
1197-1203: ⚠️ Potential issue | 🟠 Major — Clear the pending samod state when a peer deletes a doc.

You increment `samod_pending_docs` when the first sync request for a `(peer, doc)` arrives, but the `DocDeleted` branch only queues `docs_to_stop`. That leaves the entry behind in `samod_doc_set`, so the peer can stay permanently stale even though there is no sync work left.

Also applies to: 1228-1243
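The cleanup described above can be sketched with plain collections standing in for the worker's bookkeeping; all type and field names here are assumptions for illustration, not the real `full.rs` types:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the per-peer sync bookkeeping.
struct PeerState {
    samod_pending_docs: u64,
}

fn handle_doc_deleted(
    peers: &mut HashMap<u32, PeerState>,
    samod_doc_set: &mut HashSet<(u32, String)>,
    conn_id: u32,
    doc_id: &str,
) {
    // Only decrement if the (peer, doc) pair was actually tracked, and
    // saturate so a stray duplicate delete can never underflow the counter.
    if samod_doc_set.remove(&(conn_id, doc_id.to_string())) {
        if let Some(peer) = peers.get_mut(&conn_id) {
            peer.samod_pending_docs = peer.samod_pending_docs.saturating_sub(1);
        }
    }
}

fn main() {
    let mut peers = HashMap::from([(1u32, PeerState { samod_pending_docs: 1 })]);
    let mut doc_set = HashSet::from([(1u32, "doc-a".to_string())]);
    handle_doc_deleted(&mut peers, &mut doc_set, 1, "doc-a");
    assert_eq!(peers[&1u32].samod_pending_docs, 0);
    // A second delete for the same doc is a no-op, not an underflow.
    handle_doc_deleted(&mut peers, &mut doc_set, 1, "doc-a");
    assert_eq!(peers[&1u32].samod_pending_docs, 0);
}
```

Keying the decrement off the `remove()` result keeps the counter and the set consistent even under duplicated or reordered delete events.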
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1197 - 1203, When handling DocDeleted (the branch that currently only pushes into docs_to_stop), also remove the (endpoint_id, doc_id) entry from the samod_doc_set and clear any pending counters: if self.conn_by_peer.get(&endpoint_id) yields a conn_id and self.known_peer_set.get_mut(&conn_id) yields peer_state, decrement peer_state.samod_pending_docs (use saturating_sub(1)) or otherwise adjust it to reflect the removal; then delete the key from self.samod_doc_set so the peer cannot remain stale. Ensure this mirrors the logic used where samod_pending_docs is incremented (the block referencing conn_by_peer, known_peer_set and samod_pending_docs) and call refresh_peer_fully_synced_state(endpoint_id).await? as appropriate.

src/daybook_core/sync/full.rs-1297-1304 (1)
1297-1304: ⚠️ Potential issue | 🟠 Major — Treat a missing local doc as a terminal sync condition, not a hard error.

`RequestDocSync` comes from remote state and can race with local deletion. If the doc is gone by the time this boots, `batch_boot_docs()` errors out and the outer worker task unwrap takes down full sync instead of reusing the existing `handle_doc_missing_local()` cleanup path.

src/am_utils_rs/repo/partition.rs-424-430 (1)
424-430: ⚠️ Potential issue | 🟠 Major — Clamp event page sizes on the server.

`req.limit` is fully client-controlled and only floored at 1, so a single remote request can force very large SQL scans and allocations for every requested partition. This should be capped to a small server-side maximum before querying.

Also applies to: 447-453
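The cap is a one-liner once a server-side constant exists; `MAX_PAGE_LIMIT = 256` below is an assumed value, not one taken from the codebase:

```rust
// Assumed server-side ceiling; tune to whatever the store can serve cheaply.
const MAX_PAGE_LIMIT: u64 = 256;

/// Floor at 1 so a zero request still makes progress, and cap at
/// MAX_PAGE_LIMIT so a remote peer cannot force huge scans or allocations.
fn effective_page_limit(requested: u64) -> usize {
    requested.clamp(1, MAX_PAGE_LIMIT) as usize
}

fn main() {
    assert_eq!(effective_page_limit(0), 1);
    assert_eq!(effective_page_limit(64), 64);
    assert_eq!(effective_page_limit(u64::MAX), 256);
}
```

The same clamped value should then feed both the SQL `LIMIT` and any `Vec::with_capacity` pre-allocation, so a hostile `limit` can never drive either.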
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 424 - 430, Clamp the client-provided page size before any work: replace the current unbounded floor of req.limit with a server-side capped value (e.g. compute a limit as req.limit.max(1).min(MAX_PAGE_LIMIT) and cast to usize) and use that capped limit when calling load_member_partition_page and in the other similar block (the second occurrence around load_member_partition_page call at lines ~447-453). Define or reuse a constant like MAX_PAGE_LIMIT in this module so the cap is obvious and apply the same capped limit variable (not raw req.limit) for allocations (Vec capacity) and all SQL/query calls.

src/daybook_core/sync/full.rs-783-798 (1)
783-798: ⚠️ Potential issue | 🟠 Major — Don't crash the worker if the snapshot caller disappears.

If the requester is cancelled after sending `GetPeerSyncSnapshot`, this `expect(...)` will panic the full-sync worker. The `SetPeer`/`DelPeer` branches already treat a dropped caller as harmless; this branch should do the same.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 783 - 798, The GetPeerSyncSnapshot handler currently calls resp.send(snapshot).expect(...), which will panic the full-sync worker if the requester was dropped; change it to treat a dropped caller as harmless like the SetPeer/DelPeer branches do by sending the snapshot and ignoring any send error (e.g., perform the send and discard the Result or check is_err() and do nothing). Update the Msg::GetPeerSyncSnapshot branch to not unwrap or expect the send result so the worker doesn't crash when the snapshot caller disappears.

src/am_utils_rs/repo/changes.rs-410-428 (1)
410-428: ⚠️ Potential issue | 🟠 Major — Invoke listeners after releasing the manager locks.

The switchboard currently holds `listeners`/`local_listeners` while running arbitrary callbacks. A callback that registers, drops, or otherwise re-enters the manager can deadlock the notifier, and a slow callback blocks every other listener behind the same mutex.

Also applies to: 430-449
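The snapshot-then-dispatch pattern the comment asks for can be shown with a minimal stand-in for the switchboard (names are illustrative, not the real `changes.rs` types):

```rust
use std::sync::{Arc, Mutex};

type Callback = Arc<dyn Fn(&str) + Send + Sync>;

// Minimal stand-in for the change-listener switchboard.
struct ListenerManager {
    listeners: Mutex<Vec<Callback>>,
}

impl ListenerManager {
    fn register(&self, cb: Callback) {
        self.listeners.lock().unwrap().push(cb);
    }

    fn notify(&self, event: &str) {
        // Snapshot the callbacks while the lock is held...
        let snapshot: Vec<Callback> = self.listeners.lock().unwrap().clone();
        // ...then invoke them with the lock released, so a callback that
        // re-enters the manager (register/drop) cannot deadlock it, and a
        // slow callback does not serialize unrelated registrations.
        for cb in snapshot {
            cb(event);
        }
    }

    fn listener_count(&self) -> usize {
        self.listeners.lock().unwrap().len()
    }
}

fn main() {
    let mgr = Arc::new(ListenerManager { listeners: Mutex::new(Vec::new()) });
    let inner = Arc::clone(&mgr);
    mgr.register(Arc::new(move |_event| {
        // Re-entrant registration: would deadlock if notify held the lock here.
        inner.register(Arc::new(|_| {}));
    }));
    mgr.notify("doc-changed");
    assert_eq!(mgr.listener_count(), 2);
}
```

Cloning `Arc`s is cheap, so the snapshot costs one short critical section; filtering (`notification_matches_filter`) can happen while building the snapshot so only the callback invocation runs unlocked.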
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/changes.rs` around lines 410 - 428, The code is invoking listener callbacks while holding the listeners mutex, risking deadlocks and blocking; change the logic in the SwitchboardInput::Remote branch (and the similar block at 430-449) to first collect the callbacks and their corresponding notification vectors while the lock is held (e.g., build a Vec of (listener_on_change_clone, notifications) or clone the listener objects/closures and their relevant_notifications), then drop the listeners lock before iterating that collection to call each (listener.on_change)(notifications); ensure you still use notification_matches_filter and preserve listener.filter when building the collection so all work that touches self.listeners happens under the lock and only the actual callback invocation runs after the lock is released.

src/daybook_core/sync/full.rs-1729-1733 (1)
1729-1733: ⚠️ Potential issue | 🟠 Major — Ignore stale blob-backoff messages.

Blob workers report back over an async channel, so a backoff message can arrive after the blob was already marked synced or stopped during a partition refresh. `expect("blob backoff requested by non-active worker")` turns that race into a panic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1729 - 1733, Replace the panic-causing remove/expect with a safe conditional: check self.active_blobs.remove(&hash) with if let Some(active) = ... { await active.stop_token.stop().await?; } else { return Ok(()); } (or otherwise early-return/ignore) so stale blob-backoff messages for non-active workers are ignored instead of panicking; keep using active.stop_token.stop().await? to propagate stop errors when present.

src/am_utils_rs/repo/partition.rs-802-814 (1)
802-814: ⚠️ Potential issue | 🟠 Major — Make txid allocation atomic.

This is a split read/modify/write against a shared counter. With pooled concurrent writers, two transactions can observe the same `next_txid` and then race into a lost update or busy/snapshot failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 802 - 814, The alloc_txid function performs a non-atomic read/modify/write on big_repo_meta (key META_NEXT_TXID_KEY) and must be made atomic; change alloc_txid to perform the increment and fetch inside a single atomic DB operation or a transaction with a write lock: either use a single SQL statement that increments and returns the new value (e.g. "UPDATE big_repo_meta SET value = CAST(value AS INTEGER)+1 WHERE key = ? RETURNING value" and subtract 1 to get the allocated id) or begin a transaction with an immediate write lock (BEGIN IMMEDIATE or sqlx transaction via conn.begin().await), then SELECT the current value, UPDATE it, and commit before returning the old value; update references in alloc_txid accordingly to use the transaction/RETURNING path and ensure errors roll back.

src/daybook_core/sync/full.rs-126-158 (1)
126-158: ⚠️ Potential issue | 🟠 Major — Fix the current Clippy gate failure on `SyncProgressMsg`.

CI is already red here: Clippy flags `enum_variant_names` because every variant starts with `Blob`. Rename the variants or add a targeted allow if the prefix is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 126 - 158, Clippy flags the SyncProgressMsg enum for enum_variant_names because every variant is prefixed with "Blob"; either rename the variants to remove the redundant "Blob" prefix (e.g., BlobWorkerStarted -> WorkerStarted, BlobDownloadStarted -> DownloadStarted, etc.) or add a targeted allow attribute to suppress the lint (place #[allow(clippy::enum_variant_names)] directly above the SyncProgressMsg declaration) so CI stops failing; update any matching pattern usages (match arms, constructors) to the new variant names if you rename them.
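If the shared prefix is intentional, the targeted-allow route looks like this; the variant names below are illustrative placeholders, not the actual `SyncProgressMsg` variants:

```rust
// A targeted allow keeps an intentional shared prefix while silencing the
// `enum_variant_names` lint for this one type only. The `dead_code` allow is
// just so the standalone sketch compiles cleanly.
#[allow(clippy::enum_variant_names, dead_code)]
enum SyncProgressMsg {
    BlobQueued,
    BlobStarted,
    BlobFinished,
}

fn main() {
    let msg = SyncProgressMsg::BlobStarted;
    assert!(matches!(msg, SyncProgressMsg::BlobStarted));
}
```

Scoping the attribute to the enum (rather than the module or crate) keeps the lint active everywhere else.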
🟡 Minor comments (8)
AGENTS.md-35-35 (1)
35-35: ⚠️ Potential issue | 🟡 Minor — Fix two small wording issues in guidance text.
These read as typos and reduce clarity.
✏️ Suggested patch
```diff
-  - Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actors based program like this.
+  - Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actor-based program like this.
 ...
-> You're begged not to be lazy wth solutions.
+> You're begged not to be lazy with solutions.
```

Also applies to: 96-96
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 35, Replace the phrasing "an actors based program" with the corrected hyphenated form "an actor-based program" in the AGENTS.md guidance text (update the occurrence at the line containing "Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actors based program like this." and the duplicate at the other location). Ensure both instances use "actor-based" (hyphenated, singular "actor") for clarity and consistency.

src/am_utils_rs/repo/iroh.rs-308-311 (1)
308-311: ⚠️ Potential issue | 🟡 Minor — Same typos repeated: "absruptly" → "abruptly", "befor" → "before".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 308 - 311, Fix the typos in the user-facing/error strings in the iroh connection-close path: change "absruptly" to "abruptly" in the close reason passed to connection.close and change "befor" to "before" in the formatted AcceptError message that includes endpoint_id (the block creating iroh::protocol::AcceptError::from_boxed and calling connection.close with the reason string). Ensure both strings are corrected to read "abruptly" and "before" respectively.

src/am_utils_rs/repo/iroh.rs-264-268 (1)
264-268: ⚠️ Potential issue | 🟡 Minor — Same typos in accept handler: "absruptly" → "abruptly", "befor" → "before".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 264 - 268, Fix the typos in the accept handler error messages: change "absruptly" to "abruptly" and "befor" to "before" inside the string literals used in the log and in the AcceptError construction (look for the message "samod connection stream ended absruptly..." and the formatted message "connection stream ended befor disconnect with {endpoint_id}" in the accept handler in iroh.rs). Ensure both the logged message and the AcceptError::from_boxed(...) string use the corrected spellings.

src/am_utils_rs/repo/iroh.rs-108-108 (1)
108-108: ⚠️ Potential issue | 🟡 Minor — Same typo: "absruptly" → "abruptly".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` at line 108, Fix the typo in the connection close message: in the conn.close(...) call (the b"samod connection stream ended absruptly" literal), change "absruptly" to "abruptly" so the byte string becomes b"samod connection stream ended abruptly".

src/am_utils_rs/repo/iroh.rs-80-84 (1)
80-84: ⚠️ Potential issue | 🟡 Minor — Fix typos in error messages.
Multiple typos in these error messages: "absruptly" → "abruptly", "befor" → "before".
📝 Proposed fix
```diff
-conn.close(500u32.into(), b"samod connection stream ended absruptly");
+conn.close(500u32.into(), b"samod connection stream ended abruptly");
 eyre::bail!(
-    "connection stream ended befor disconnect"
+    "connection stream ended before disconnect"
 );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 80 - 84, Fix the typos in the error strings around the connection close and bail calls: update the message passed to conn.close(500u32.into(), ...) to "samod connection stream ended abruptly" and change the eyre::bail! invocation to "connection stream ended before disconnect" so the strings in the conn.close call and eyre::bail! are spelled correctly; locate these in the iroh.rs function where conn.close and eyre::bail! are used together.

src/am_utils_rs/repo/iroh.rs-115-115 (1)
115-115: ⚠️ Potential issue | 🟡 Minor — Typo: "atttempts" → "attempts".
📝 Proposed fix
```diff
-reason: last_reason.unwrap_or_else(|| "max re-connention atttempts reached".into()),
+reason: last_reason.unwrap_or_else(|| "max reconnection attempts reached".into()),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` at line 115, The error message string passed in last_reason.unwrap_or_else(|| "...".into()) contains a typo; update the fallback literal to read "max reconnection attempts reached" (or similar correct wording) so the code using last_reason.unwrap_or_else(|| "...".into()) uses the corrected phrase.

src/am_utils_rs/repo/iroh.rs-129-132 (1)
129-132: ⚠️ Potential issue | 🟡 Minor — Avoid `.unwrap()` on the spawned future; log errors instead.

Same issue as in `node.rs`: the spawned task will panic if an error occurs.

🐛 Proposed fix
```diff
 let join_handle = tokio::spawn(
     async {
-        fut.await.unwrap();
+        if let Err(err) = fut.await {
+            warn!(?err, "iroh connector task exited with error");
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 129 - 132, The spawned task in iroh.rs currently calls fut.await.unwrap() (inside the tokio::spawn block), which will panic on errors; replace the unwrap with proper error handling by awaiting the future and matching its Result (e.g., if let Err(e) / match fut.await) and logging the error via the project's logger/tracing (instead of panicking); keep the spawn and join_handle usage but handle Err(e) from fut.await and call error!/tracing::error with a contextual message and the error value.

src/am_utils_rs/sync/node.rs-130-130 (1)
130-130: ⚠️ Potential issue | 🟡 Minor — Avoid `.unwrap()` on the spawned future; log errors instead.

The current code will panic if the worker loop returns an error. This could cause unexpected crashes in production.
🐛 Proposed fix
```diff
-let join_handle = tokio::spawn(async { fut.await.unwrap() });
+let join_handle = tokio::spawn(async {
+    if let Err(err) = fut.await {
+        warn!(?err, "SyncNodeWorker exited with error");
+    }
+});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/node.rs` at line 130, The spawned task currently calls fut.await.unwrap() which will panic on error; change the tokio::spawn closure to await the future and handle the Result instead of unwrapping — e.g. replace tokio::spawn(async { fut.await.unwrap() }) with an async block that matches on fut.await (or uses if let Err(e) = fut.await) and logs the error via the crate's logger (e.g. tracing::error! or log::error!) including context about the worker/loop, then return/exit gracefully; keep the variable names join_handle, tokio::spawn, and fut.await in place so you edit the same spawn site in node.rs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28c3fd47-7945-491f-bf23-6f2723d4ea25
⛔ Files ignored due to path filters (5)
- `Cargo.lock` is excluded by `!**/*.lock` and included by `**/*`
- `docs/DEVDOC/log.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
- `docs/DEVDOC/todo.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
- `flake.lock` is excluded by `!**/*.lock` and included by `**/*`
- `src/daybook_compose/composeApp/src/commonMain/kotlin/org/example/daybook/uniffi/daybook_ffi.kt` is excluded by `!src/daybook_compose/composeApp/src/commonMain/kotlin/org/example/daybook/uniffi/**` and included by `**/*`
📒 Files selected for processing (72)
- `.github/workflows/builds.yml`
- `AGENTS.md`
- `Cargo.toml`
- `src/am_utils_rs/Cargo.toml`
- `src/am_utils_rs/changes.rs`
- `src/am_utils_rs/codecs.rs`
- `src/am_utils_rs/iroh.rs`
- `src/am_utils_rs/lib.rs`
- `src/am_utils_rs/repo.rs`
- `src/am_utils_rs/repo/changes.rs`
- `src/am_utils_rs/repo/changes/broker.rs`
- `src/am_utils_rs/repo/iroh.rs`
- `src/am_utils_rs/repo/iroh/codec.rs`
- `src/am_utils_rs/repo/partition.rs`
- `src/am_utils_rs/sync.rs`
- `src/am_utils_rs/sync/node.rs`
- `src/am_utils_rs/sync/peer.rs`
- `src/am_utils_rs/sync/protocol.rs`
- `src/am_utils_rs/sync/store.rs`
- `src/daybook_cli/Cargo.toml`
- `src/daybook_cli/main.rs`
- `src/daybook_core/Cargo.toml`
- `src/daybook_core/app.rs`
- `src/daybook_core/blobs.rs`
- `src/daybook_core/build.rs`
- `src/daybook_core/config.rs`
- `src/daybook_core/drawer.rs`
- `src/daybook_core/drawer/facet_recovery.rs`
- `src/daybook_core/drawer/types.rs`
- `src/daybook_core/e2e.rs`
- `src/daybook_core/e2e/doc_created_wflow.rs`
- `src/daybook_core/index.rs`
- `src/daybook_core/index/doc_blobs.rs`
- `src/daybook_core/index/facet_ref.rs`
- `src/daybook_core/local_state.rs`
- `src/daybook_core/plugs.rs`
- `src/daybook_core/repo.rs`
- `src/daybook_core/repos.rs`
- `src/daybook_core/rt.rs`
- `src/daybook_core/rt/dispatch.rs`
- `src/daybook_core/rt/switch.rs`
- `src/daybook_core/rt/triage.rs`
- `src/daybook_core/rt/wash_plugin.rs`
- `src/daybook_core/stores.rs`
- `src/daybook_core/sync.rs`
- `src/daybook_core/sync/bootstrap.rs`
- `src/daybook_core/sync/full.rs`
- `src/daybook_core/sync/full/blob_worker.rs`
- `src/daybook_core/sync/full/doc_worker.rs`
- `src/daybook_core/sync/stress.rs`
- `src/daybook_core/sync/tests/stress.rs`
- `src/daybook_core/tables.rs`
- `src/daybook_core/tincans.rs`
- `src/daybook_ffi/repos/blobs.rs`
- `src/daybook_ffi/repos/config.rs`
- `src/daybook_ffi/repos/dispatch.rs`
- `src/daybook_ffi/repos/drawer.rs`
- `src/daybook_ffi/repos/plugs.rs`
- `src/daybook_ffi/repos/tables.rs`
- `src/daybook_ffi/rt.rs`
- `src/daybook_fuse/lib.rs`
- `src/daybook_server/main.rs`
- `src/daybook_types/Cargo.toml`
- `src/daybook_types/doc.rs`
- `src/daybook_types/lib.rs`
- `src/daybook_types/test.rs`
- `src/daybook_types/wit.rs`
- `src/infra/ollama.tf`
- `src/utils_rs/lib.rs`
- `src/utils_rs/testing.rs`
- `src/xtask/Cargo.toml`
- `src/xtask/main.rs`
💤 Files with no reviewable changes (4)
- src/daybook_core/repos.rs
- src/am_utils_rs/iroh.rs
- src/daybook_core/local_state.rs
- src/am_utils_rs/changes.rs
```rust
    Ok(exists == 1)
}

pub async fn list_partitions_for_peer(&self, _peer: &PeerKey) -> Res<Vec<PartitionSummary>>
```
Actually enforce peer scoping on this API surface.
`peer` is ignored everywhere here, and `get_docs_full_in_partitions()` trusts a caller-supplied partition list. As written, any connected peer can enumerate partition IDs, request member/doc pages for arbitrary partitions, subscribe to those partitions, and then fetch the underlying docs, which leaves the `AccessDenied` / `DocAccessDenied` paths effectively unused.
Also applies to: 419-423, 442-446, 465-469, 553-558
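The membership check the finding calls for can be sketched with an in-memory index; the real store would derive this from its SQL tables, and every name below (`PartitionAcl`, `check_access`, etc.) is an assumption for illustration:

```rust
use std::collections::{HashMap, HashSet};

type PeerKey = String;
type PartitionId = String;

#[derive(Debug, PartialEq)]
enum PartitionError {
    AccessDenied,
}

// Hypothetical in-memory membership index keyed by peer.
struct PartitionAcl {
    by_peer: HashMap<PeerKey, HashSet<PartitionId>>,
}

impl PartitionAcl {
    /// Derive the allowed set server-side instead of trusting the caller.
    fn list_partitions_for_peer(&self, peer: &str) -> Vec<PartitionId> {
        let mut out: Vec<PartitionId> = self
            .by_peer
            .get(peer)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        out.sort();
        out
    }

    /// Reject any requested partition the peer is not a member of.
    fn check_access(&self, peer: &str, requested: &[PartitionId]) -> Result<(), PartitionError> {
        let allowed = self.by_peer.get(peer);
        for partition in requested {
            if !allowed.is_some_and(|set| set.contains(partition)) {
                return Err(PartitionError::AccessDenied);
            }
        }
        Ok(())
    }
}

fn main() {
    let acl = PartitionAcl {
        by_peer: HashMap::from([(
            "peer-a".to_string(),
            HashSet::from(["p1".to_string(), "p2".to_string()]),
        )]),
    };
    assert_eq!(acl.list_partitions_for_peer("peer-a"), vec!["p1", "p2"]);
    assert!(acl.check_access("peer-a", &["p1".to_string()]).is_ok());
    assert_eq!(
        acl.check_access("peer-b", &["p1".to_string()]),
        Err(PartitionError::AccessDenied)
    );
}
```

Running every page, subscription, and doc fetch through the same `check_access` gate keeps the `AccessDenied` / `DocAccessDenied` paths meaningful.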
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/repo/partition.rs` at line 372, The API is ignoring the
supplied PeerKey (e.g. in list_partitions_for_peer) and trusting caller-supplied
partition lists (e.g. get_docs_full_in_partitions), allowing any peer to
enumerate, subscribe, and fetch arbitrary partitions; fix this by enforcing peer
scoping: in list_partitions_for_peer, filter and return only partitions that the
supplied PeerKey is a member of using the internal membership/index data
structures; in get_docs_full_in_partitions and the other affected methods (the
ones around the referenced ranges) validate each requested partition against
that same membership check and reject with AccessDenied/DocAccessDenied if the
peer is not authorized instead of silently using the list, and do not accept
caller-provided partition lists unchecked—derive allowed partitions server-side
and apply permission checks before returning summaries, pages, subscriptions, or
docs.
```rust
fn object_paths(&self, hash: &str) -> Res<ObjectPaths> {
    if !hash.is_ascii() || hash.len() < 4 {
        eyre::bail!("invalid blob hash: {hash}");
    }
    let Some(l0) = hash.get(0..2) else {
        eyre::bail!("invalid blob hash: {hash}");
    };
    let Some(l1) = hash.get(2..4) else {
        eyre::bail!("invalid blob hash: {hash}");
    };
    let dir = self.root.join("objects").join(l0).join(l1);
    Ok(ObjectPaths {
        blob: dir.join(format!("{hash}.blob")),
        meta: dir.join(format!("{hash}.meta")),
        dir,
    })
```
Reject non-hash input before building filesystem paths.

`object_paths()` only checks ASCII and length, then joins raw hash bytes into directories and filenames. A caller can smuggle separators or an absolute prefix through here, and `get_path()` / `put_from_store()` will read or create paths outside `self.root`.
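A standalone validation predicate shows why restricting the alphabet is enough: hex digits can never encode `/`, `\`, `.` or a drive prefix, so any path built purely from them stays under the root. (The function name and minimum length are illustrative; the repo's proposed guard reuses `daybook_hash_to_iroh_hash` instead.)

```rust
/// Accept only ASCII-hex hashes of a plausible length, so the value can never
/// contain path separators, "..", or an absolute prefix.
fn is_valid_blob_hash(hash: &str) -> bool {
    hash.len() >= 4 && hash.chars().all(|c| c.is_ascii_hexdigit())
}

fn main() {
    assert!(is_valid_blob_hash("deadbeef"));
    assert!(!is_valid_blob_hash("../etc/passwd")); // traversal rejected
    assert!(!is_valid_blob_hash("ab/cd"));         // separator rejected
    assert!(!is_valid_blob_hash("ab"));            // too short to shard
}
```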
🔒 Proposed guard
```diff
 fn object_paths(&self, hash: &str) -> Res<ObjectPaths> {
-    if !hash.is_ascii() || hash.len() < 4 {
-        eyre::bail!("invalid blob hash: {hash}");
-    }
+    daybook_hash_to_iroh_hash(hash)
+        .map_err(|_| eyre::eyre!("invalid blob hash: {hash}"))?;
     let Some(l0) = hash.get(0..2) else {
         eyre::bail!("invalid blob hash: {hash}");
     };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/daybook_core/blobs.rs` around lines 255 - 270, The object_paths function
currently trusts the raw hash and can be abused to escape self.root; validate
and canonicalize the hash before building paths by rejecting any hash containing
non-hex characters or path separators (e.g. '/', '\\'), disallowing '.' or any
characters that could form relative/absolute paths, and enforce the minimum
length check already present; update object_paths (and ensure callers like
get_path and put_from_store use it) to only accept hashes where
hash.chars().all(|c| c.is_ascii_hexdigit()) (and no separator/period characters)
so the returned ObjectPaths are guaranteed to be under self.root.
Actionable comments posted: 14
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
src/daybook_core/config.rs (1)
273-281: ⚠️ Potential issue | 🟠 Major — Don't panic on drifted replicated config state.

`hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH)` is consuming synced Automerge data, so a missing or reshaped remote `config` path can still crash the worker. Handle that `None` case as a normal error or logged rejection instead of panicking on external input. Based on learnings, "When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/config.rs` around lines 273 - 281, The current call to hydrate_path_at_heads(...).await?.expect(ERROR_INVALID_PATCH) can panic if the replicated config path is missing or reshaped; replace the expect panic with explicit error handling that returns an eyre error (e.g., eyre::bail! or Err(eyre::eyre!(...))) or logs and returns a controlled error instead. Locate the chain off big_repo.hydrate_path_at_heads::<ConfigStore>(&self.app_doc_id, &heads, automerge::ROOT, vec![ConfigStore::prop().into()]) and handle the Option result from .await? by matching None to an eyre::bail! with a clear message referencing ERROR_INVALID_PATCH (or similar) so external/replicated input no longer causes a panic.

src/daybook_core/repo.rs (1)
204-235: ⚠️ Potential issue | 🟠 Major — Stop `big_repo` on any post-boot failure.

Once `BigRepo::boot` succeeds, every later `?` here can exit before `big_repo_stop` is stored in `RepoCtx`. A failed connector spawn, globals init, init dance, or marker write will leave the repo running in the background after `open_inner` returns `Err`. Add a cleanup block that calls `big_repo_stop.stop().await` before propagating those errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/repo.rs` around lines 204 - 235, After booting BigRepo via am_utils_rs::BigRepo::boot you must ensure big_repo_stop is awaited on failure paths to avoid leaving the repo running; modify open_inner so that any early-returning ? after boot (e.g. errors from big_repo.spawn_ws_connector, init_from_globals, Self::run_repo_init_dance, mark_repo_initialized, etc.) first calls big_repo_stop.stop().await (or equivalent) before propagating the error, and ensure that on the successful path big_repo_stop is stored in RepoCtx as before; in short, add a cleanup/guard that calls big_repo_stop.stop().await on all error exits after BigRepo::boot but before RepoCtx takes ownership of big_repo_stop.

src/daybook_core/plugs.rs (1)
903-915: ⚠️ Potential issue | 🟠 Major — Don't panic on drifted replicated plug manifests.

This hydration is consuming synced manifest data, but `expect(ERROR_INVALID_PATCH)` will still take down the plugs worker if a remote doc shape drifts or the manifest path disappears. Return or log a normal error here instead of panicking on external input. Based on learnings, "When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/plugs.rs` around lines 903 - 915, The code currently calls expect(ERROR_INVALID_PATCH) after hydrate_path_at_heads(...) which will panic if the manifest path is missing; replace that panic with an eyre::bail! (or return Err(eyre::eyre(...))) path: match the Option returned by hydrate_path_at_heads for None and return a descriptive error using eyre::bail! including context values (e.g., self.app_doc_id, id.clone(), heads.0) so the worker returns an error instead of panicking; make this change in the block that uses self.big_repo.hydrate_path_at_heads::<Versioned<ThroughJson<Arc<manifest::PlugManifest>>>> and remove expect(ERROR_INVALID_PATCH).

src/daybook_cli/main.rs (2)
702-711: ⚠️ Potential issue | 🔴 Critical — Don't finalize the clone before the full-sync pass succeeds.

`mark_repo_initialized()` now runs before the second `RepoCtx::open(...)` / `wait_for_full_sync(...)` phase. If that later phase fails, the destination is left flagged as initialized even though it only contains the bootstrap docs, so subsequent CLI runs will open a partial clone as if it were healthy. Delay the marker or roll it back on failure.

Also applies to: 714-789
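The ordering fix reduces to a simple invariant: the marker write must be the last fallible-free step. A minimal sketch with synchronous stand-ins (the phase names are assumptions, not the CLI's real functions):

```rust
// Illustrative phases of the clone; names are assumptions, not the real API.
fn bootstrap_docs() -> Result<(), String> {
    Ok(())
}

fn wait_for_full_sync(succeed: bool) -> Result<(), String> {
    if succeed { Ok(()) } else { Err("full sync failed".into()) }
}

/// Only flip the initialized marker after every phase has succeeded, so a
/// failed sync can never leave a partial clone looking healthy.
fn clone_repo(initialized: &mut bool, sync_ok: bool) -> Result<(), String> {
    bootstrap_docs()?;
    wait_for_full_sync(sync_ok)?;
    *initialized = true;
    Ok(())
}

fn main() {
    let mut initialized = false;
    assert!(clone_repo(&mut initialized, false).is_err());
    assert!(!initialized); // failure leaves the marker unset

    assert!(clone_repo(&mut initialized, true).is_ok());
    assert!(initialized);
}
```

The rollback alternative (write early, unmark in an error path) works too, but is strictly harder to get right if the process dies between the write and the rollback.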
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_cli/main.rs` around lines 702 - 711, The code marks the destination repo initialized (mark_repo_initialized) before the second clone/full-sync phase (RepoCtx::open and wait_for_full_sync) completes; move the call to daybook_core::repo::mark_repo_initialized(&destination).await? so it runs only after RepoCtx::open(...)/wait_for_full_sync(...) have succeeded, or if you prefer, wrap the full-sync phase in a try/catch and roll back the initialized marker on any failure (i.e., call an unmark/cleanup path) to ensure a failed full-sync doesn't leave the repo flagged as initialized; adjust the sequence around daybook_core::app::globals::set_init_state, RepoCtx::open, wait_for_full_sync, and big_repo_stop.stop accordingly.
1192-1206: ⚠️ Potential issue | 🔴 Critical — `exec ... --branch` is inverted and can panic.

The explicit-branch path currently errors when the branch exists, then Line 1206 blindly `unwrap()`s the missing-branch lookup. That makes valid `--branch` values unusable and turns invalid ones into a process crash; resolve the lookup with `ok_or_else(...)` (or negate the condition) and return an error instead.

Possible fix
```diff
             Some(val) => {
-                if branches.branches.contains_key(val) {
+                if !branches.branches.contains_key(val) {
                     eyre::bail!("branch not found for doc: {doc_id} - {val}");
                 }
                 daybook_types::doc::BranchPath::from(val.as_str())
             }
             None => {
                 let Some(branch) = branches.main_branch_path() else {
                     eyre::bail!("no branches found on doc: {doc_id}");
                 };
                 branch
             }
         };
-        let heads = branches.branches.get(&branch_path.to_string()).unwrap();
+        let branch_key = branch_path.to_string();
+        let heads = branches
+            .branches
+            .get(&branch_key)
+            .ok_or_else(|| eyre::eyre!("branch not found for doc: {doc_id} - {branch_key}"))?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_cli/main.rs` around lines 1192 - 1206, The branch existence check is inverted and the subsequent branches.branches.get(...).unwrap() can panic; in the match that builds branch_path (the block using branch: Some(val) => { if branches.branches.contains_key(val) { eyre::bail!(...) } ... }), flip the contains_key condition (or remove the negation) so it bails when the branch is missing, and replace the final branches.branches.get(&branch_path.to_string()).unwrap() with a safe lookup that returns an error (e.g., .get(...).ok_or_else(|| eyre::eyre!("branch not found for doc: {doc_id} - {}", branch_path)) ?) to return a proper eyre error instead of panicking; keep use of daybook_types::doc::BranchPath::from and branches.main_branch_path() as-is.

src/daybook_core/tables.rs (1)
411-418: ⚠️ Potential issue | 🔴 Critical

Don't `expect()` remote table entities at final heads.

`TablesRepo::notifs_loop` hydrates add/change events at the batch's final `heads`. A remote batch can legitimately contain add/change plus delete for the same id, which makes `hydrate_path_at_heads(...)` return `None` here and currently panics the notif worker; even returning `Err` would still tear the task down because the spawn wrapper does `.expect("error handling notifs")`. Handle this as a recoverable skip/log path inside the loop instead of taking the repo down.

Based on learnings: When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!().
Also applies to: 478-500, 511-531, 541-562, 572-593
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/tables.rs` around lines 411 - 418, The notif worker currently treats hydrate_path_at_heads(...) returning None as a fatal error (leading to panic via expect in the tokio::spawn wrapper), so update TablesRepo::notifs_loop to treat a None from hydrate_path_at_heads (or other remote-validation failures) as a recoverable case: log a warning with context (id, batch heads, source) and continue the loop instead of returning Err or panicking; when you need to propagate validation failures from external inputs use eyre::bail!() inside validation helpers but handle those errors in notifs_loop by logging and skipping the problematic notif rather than terminating the task; also remove the .expect("error handling notifs") in the spawn wrapper (worker_handle) and replace it with non-panicking error handling (log the error and exit the task gracefully) and apply the same change to the other notif-handling sites mentioned (the other notifs_loop invocation blocks).

src/daybook_core/rt/dispatch.rs (1)
137-144: ⚠️ Potential issue | 🔴 Critical

Handle remote dispatch churn without panicking.

This repo still has two panic paths on shared automerge input: updates on an existing dispatch hit `panic!("dispatch update detected")`, and add/remove churn at the same final `heads` makes `hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH)` blow up on `DispatchAdded`. These need to be handled as recoverable branches inside the loop; returning `Err` alone still panics because the spawned worker wraps `notifs_loop` with `.expect("error handling notifs")`.

Based on learnings: When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!().
Also applies to: 205-235, 267-309
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 137 - 144, Replace panic! and unwrap/expect paths with eyre::bail!/error handling and avoid panicking in the spawned worker: in notifs_loop replace the panic!("dispatch update detected") (and any other panic!/unimplemented!()) with eyre::bail!() including context (e.g., "dispatch update detected for dispatch_id=..."), and change hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH) to propagate a Result (use ? or map_err(|e| eyre::eyre!("{}: {}", ERROR_INVALID_PATCH, e))) so validation failures return Err rather than panicking; finally, update the tokio::spawn wrapper around repo.notifs_loop (the place calling repo.notifs_loop(notif_rx, cancel_token).await.expect("error handling notifs")) to remove the .expect and handle the Result explicitly (log the eyre::Report on Err and exit the task cleanly) so returned errors are recoverable and do not cause a panic.
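The "recoverable skip instead of panic" pattern asked for in the two findings above can be shown in isolation. This is a simplified sketch, not the repo's actual worker: `Notif` and `handle_batch` are hypothetical, and a missing `hydrated` value models `hydrate_path_at_heads(...)` returning `None` for a remote batch.

```rust
// Hypothetical event shape: `hydrated == None` models hydration finding
// nothing at the batch's final heads (e.g. add + delete in one remote batch).
struct Notif {
    id: u32,
    hydrated: Option<String>,
}

// Process a batch, treating missing hydration as a recoverable skip rather
// than a panic; returns (handled, skipped) so callers can observe drops.
fn handle_batch(batch: Vec<Notif>) -> (Vec<String>, usize) {
    let mut handled = Vec::new();
    let mut skipped = 0;
    for notif in batch {
        match notif.hydrated {
            Some(value) => handled.push(value),
            None => {
                // In a real worker this would be a warning log carrying the
                // id and batch heads; the loop keeps going either way.
                skipped += 1;
                let _ = notif.id;
            }
        }
    }
    (handled, skipped)
}
```

The same shape applies to the spawn wrapper: instead of `.expect("error handling notifs")`, log the returned error and let the task exit cleanly.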
♻️ Duplicate comments (2)
src/daybook_core/blobs.rs (1)
349-374: ⚠️ Potential issue | 🟠 Major

Reuse the overwrite-safe rename logic in `atomic_copy_file()`.

`put_path_copy()` still does a preflight `try_exists()` before calling this helper, so two writers of the same hash can race. The loser will hit the bare `tokio::fs::rename(&temp, dest)` here, which still fails on Windows once the winner has created `dest`; give this helper the same replace/retry path as `atomic_write()`.

Does `tokio::fs::rename` overwrite an existing destination on Windows, or does it fail while Unix replaces atomically?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/blobs.rs` around lines 349 - 374, atomic_copy_file currently does a plain tokio::fs::rename(&temp, dest) which races on Windows when dest is created by another writer; change atomic_copy_file to use the same overwrite/retry strategy as atomic_write: attempt tokio::fs::rename(&temp, dest) inside a short retry loop, and if rename fails due to the destination already existing (or a platform-specific error), remove the existing dest (tokio::fs::remove_file(dest).await?), then retry rename once (ensuring errors are propagated if retry fails); make sure to still sync the parent directory via self.sync_dir(dir).await and clean up the temp file on unrecoverable errors — reference atomic_copy_file, atomic_write, and put_path_copy to align behavior.

src/am_utils_rs/repo/partition.rs (1)
375-376: ⚠️ Potential issue | 🔴 Critical

Enforce peer scoping before partition list/read/subscribe operations.

`_peer` is ignored, and request paths trust caller-provided partition IDs. This leaves list/events/subscription/doc retrieval open to unauthorized partition access unless all callers pre-filter correctly.

🔐 Suggested hardening pattern
```diff
-    pub async fn list_partitions_for_peer(&self, _peer: &PeerKey) -> Res<Vec<PartitionSummary>> {
+    pub async fn list_partitions_for_peer(&self, peer: &PeerKey) -> Res<Vec<PartitionSummary>> {
+        let allowed = self.authorized_partitions_for_peer(peer).await?;
+        if allowed.is_empty() {
+            return Ok(Vec::new());
+        }
         // existing query filtered by allowed set
     }
 @@
-    pub async fn get_partition_member_events_for_peer(
-        &self,
-        _peer: &PeerKey,
+    pub async fn get_partition_member_events_for_peer(
+        &self,
+        peer: &PeerKey,
         req: &GetPartitionMemberEventsRequest,
     ) -> Res<GetPartitionMemberEventsResponse> {
+        self.assert_peer_allowed_partitions(peer, req.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
 @@
-    pub async fn get_partition_doc_events_for_peer(
-        &self,
-        _peer: &PeerKey,
+    pub async fn get_partition_doc_events_for_peer(
+        &self,
+        peer: &PeerKey,
         req: &GetPartitionDocEventsRequest,
     ) -> Res<GetPartitionDocEventsResponse> {
+        self.assert_peer_allowed_partitions(peer, req.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
 @@
     pub async fn subscribe_partition_events_for_peer(
         &self,
         peer: &PeerKey,
         reqs: &SubPartitionsRequest,
         capacity: usize,
     ) -> Res<tokio::sync::mpsc::Receiver<SubscriptionItem>> {
+        self.assert_peer_allowed_partitions(peer, reqs.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
```

Also applies to: 423-467, 582-610
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 375 - 376, list_partitions_for_peer currently ignores the _peer argument and trusts caller-supplied partition IDs; update this and the other affected methods (e.g., list_partitions_for_peer, any public list/read/subscribe methods that accept a PeerKey or partition IDs) to enforce peer scoping by using the PeerKey to fetch the set of partitions the peer is allowed to access and filtering incoming requests against that set before returning PartitionSummary or performing reads/subscriptions; specifically, locate the function list_partitions_for_peer and replace the ignored _peer usage with a lookup of allowed partition IDs (via your existing ownership/permission store or an added helper), filter the partitions you would return or operate on to only those owned/allowed for that PeerKey, and return an empty/permission-denied result for any requests that would otherwise access partitions outside that allowed set.
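The core of the suggested `assert_peer_allowed_partitions` helper is a set-membership check over the peer's allowed partitions. A minimal sketch, with hypothetical names and plain `&str` partition ids standing in for `PartitionId`:

```rust
use std::collections::HashSet;

// Hypothetical guard: every requested partition id must be in the peer's
// allowed set, otherwise the whole request is rejected before any read,
// subscription, or doc retrieval happens.
fn assert_peer_allowed<'a>(
    allowed: &HashSet<&'a str>,
    requested: impl IntoIterator<Item = &'a str>,
) -> Result<(), String> {
    for partition in requested {
        if !allowed.contains(partition) {
            return Err(format!("peer not authorized for partition {partition}"));
        }
    }
    Ok(())
}
```

Rejecting the whole request on the first unauthorized partition (rather than silently filtering) makes misconfigured callers visible instead of returning partial results.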
🟡 Minor comments (11)
src/wflow_webui/Cargo.toml-63-63 (1)
63-63: ⚠️ Potential issue | 🟡 Minor

Add `--cfg tokio_unstable` to enable the tracing instrumentation.

Line 63 correctly adds the `tracing` feature to `tokio`, and it is compatible with `rt-multi-thread`. However, the `tracing` instrumentation in Tokio is experimental and requires compiling with `--cfg tokio_unstable` to actually emit runtime/task events. Without this configuration, the feature flag will be present but non-functional. Ensure this cfg is set in your build configuration (e.g., in `.cargo/config.toml` or via RUSTFLAGS).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/wflow_webui/Cargo.toml` at line 63, The tokio dependency in Cargo.toml enables the "tracing" feature but requires compiling with the tokio_unstable cfg to emit runtime/task tracing; update the build config to pass --cfg tokio_unstable (for example by adding the flag in .cargo/config.toml or exporting RUSTFLAGS) so the tracing instrumentation for the tokio = { workspace = true, features = ["rt-multi-thread", "tracing"], optional = true } dependency is actually enabled at compile time.

src/daybook_ffi/Cargo.toml-50-50 (1)
50-50: ⚠️ Potential issue | 🟡 Minor

Tokio's `tracing` feature is unstable; ensure `--cfg tokio_unstable` is configured.

The `tracing` feature in Tokio is an unstable feature that requires `--cfg tokio_unstable` to work. The workspace has this configured in `.cargo/config.toml` under `[build]`, so the change is compatible. This feature enables Tokio runtime instrumentation/events for tools like `tokio-console`, which aligns with the existing `tracing`/`tracing-subscriber` dependencies in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_ffi/Cargo.toml` at line 50, The tokio dependency in Cargo.toml enables the unstable "tracing" feature (the line with tokio = { workspace = true, features = ["rt-multi-thread","sync", "macros", "fs", "tracing"] }), which requires the compiler cfg flag; ensure the build is configured to pass --cfg tokio_unstable (add or verify in .cargo/config.toml under [build] that rustflags includes "--cfg" "tokio_unstable"), or alternatively gate the tracing feature behind a stable feature flag and only enable it when the cfg is set; update config or gating so building with the tokio "tracing" feature succeeds.

src/mltools/Cargo.toml-37-37 (1)
37-37: ⚠️ Potential issue | 🟡 Minor

Note the unstable requirement for tokio's `tracing` feature on Line 37.

Adding tokio's `tracing` feature is syntactically correct, but be aware that this feature is experimental in Tokio 1.x and requires compiling with `--cfg tokio_unstable` for the tracing integration to actually be enabled. Without that compiler flag, specifying `"tracing"` in Cargo.toml alone won't activate the feature: it will be silently ignored due to Tokio's conditional dependency configuration.

Verify that your build setup or CI pipeline includes this flag if you intend to use Tokio's tracing instrumentation (e.g., for `tokio-console`).
Verify each finding against the current code and only fix it if needed. In `@src/mltools/Cargo.toml` at line 37, The Cargo.toml declares tokio = { workspace = true, features = ["rt", "tracing"] } but Tokio's "tracing" feature is unstable and requires compiling with --cfg tokio_unstable to actually enable it; either ensure your build/CI adds the compiler flag (e.g., RUSTFLAGS="--cfg tokio_unstable" or pass --cfg tokio_unstable to rustc in your pipeline) when building/running code that relies on tokio's tracing integration (tokio feature "tracing"), or remove the "tracing" feature from the tokio entry to avoid a silent no-op if you don't intend to enable the unstable tracing support.

src/daybook_core/sync/stress.rs-70-83 (1)
70-83: ⚠️ Potential issue | 🟡 Minor

Use unique event indices in transfer phase.

At Line 82, every transfer-phase event uses the same `idx` (`EVENT_COUNT`), which reduces mutation diversity (e.g., repeated `"branch-mod-64"` writes).

🎯 Suggested fix
```diff
-for _ in 0..(EVENT_COUNT / 2) {
+for transfer_idx in 0..(EVENT_COUNT / 2) {
 @@
-    let _ = apply_event(node, kind, EVENT_COUNT, &mut rng).await?;
+    let _ = apply_event(node, kind, EVENT_COUNT + transfer_idx, &mut rng).await?;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 70 - 83, The transfer-phase loop currently calls apply_event(node, kind, EVENT_COUNT, &mut rng) which passes the same EVENT_COUNT index for every event; change this to generate or derive a unique event index per iteration (e.g., use the loop index, a counter, or rng.gen_range(0..EVENT_COUNT)) and pass that variable instead of EVENT_COUNT so each transfer uses a distinct idx; update the loop in src/daybook_core/sync/stress.rs where active, node, kind are chosen to compute let idx = ... and call apply_event(node, kind, idx, &mut rng).

AGENTS.md-112-113 (1)
112-113: ⚠️ Potential issue | 🟡 Minor

Typo: "mantain" → "maintain".
✏️ Proposed fix
```diff
-Human auditors of this repo don't really read test code or mantain it.
+Human auditors of this repo don't really read test code or maintain it.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 112 - 113, Fix the spelling typo in the AGENTS.md sentence: change "mantain" to "maintain" in the line that reads "Human auditors of this repo don't really read test code or mantain it." — update that string in AGENTS.md so it reads "maintain" to correct the typo.

docs/dict.md-19-19 (1)
19-19: ⚠️ Potential issue | 🟡 Minor

Typo: "peices" → "pieces".
✏️ Proposed fix
```diff
-Documents are mainly made up of facets which are JSON objects describing the different peices of the document.
+Documents are mainly made up of facets which are JSON objects describing the different pieces of the document.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 19, Replace the typo "peices" with the correct spelling "pieces" in the sentence "Documents are mainly made up of facets which are JSON objects describing the different peices of the document." inside docs/dict.md so it reads "...describing the different pieces of the document."

docs/dict.md-94-94 (1)
94-94: ⚠️ Potential issue | 🟡 Minor

Typo: "refefrence" → "reference".
✏️ Proposed fix
```diff
-If the heads set is empty, it's assumed that the referred to facet exists at the same change hash of the facet holding the refefrence.
+If the heads set is empty, it's assumed that the referred-to facet exists at the same change hash of the facet holding the reference.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 94, Fix the typo "refefrence" to "reference" in the sentence that reads "If the heads set is empty, it's assumed that the referred to facet exists at the same change hash of the facet holding the refefrence." Update the word near the phrase "facet holding the reference" so the sentence reads correctly and keep surrounding punctuation and capitalization unchanged.

docs/dict.md-10-11 (1)
10-11: ⚠️ Potential issue | 🟡 Minor

Minor grammar issues.
Line 10: "weather" → "whether"
Line 11: Sentence appears incomplete: "whenever a working connection between devices" — consider "whenever there is a working connection between devices" or similar.

✏️ Proposed fix
```diff
-Changes can be made on any replicas of the same repo, weather it is to add a new document or to modify an existing one.
-Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices.
+Changes can be made on any replicas of the same repo, whether it is to add a new document or to modify an existing one.
+Daybook ensures that these changes are synced to other replicas whenever there is a working connection between devices.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 10 - 11, Fix the two grammar issues in the two sentences shown: change "weather" to "whether" and adjust word order and completeness — e.g., rewrite the first sentence to "Changes can be made on any replica of the same repo, whether it is to add a new document or to modify an existing one." and rewrite the second sentence to "Daybook then ensures that these changes are synced to other replicas whenever there is a working connection between devices." Update the exact phrases in the doc to match this corrected wording.

src/am_utils_rs/repo/iroh.rs-145-148 (1)
145-148: ⚠️ Potential issue | 🟡 Minor

Potential panic in spawned task hides error context.

The `.unwrap()` on `fut.await` will panic if the connection handling fails, potentially crashing the task without useful error context. Consider logging the error or using `.unwrap_or_log()` for consistency with other error handling in this codebase.

Suggested fix
```diff
 let join_handle = tokio::spawn(
     async {
-        fut.await.unwrap();
+        fut.await.unwrap_or_log();
     }
     .instrument(tracing::info_span!(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 145 - 148, The spawned task currently calls fut.await.unwrap() inside tokio::spawn (creating join_handle), which can panic and lose error context; change the task to await the future and handle Err by logging the error (e.g., use fut.await.map_err(|e| { /* log with processLogger/tracing/error! */ }) or match on fut.await and call a logger) instead of unwrap so failures are recorded; update the closure used in tokio::spawn (the block that references fut and produces join_handle) to log the error with the repository's standard logger/unwrap_or_log helper and avoid panicking.

src/am_utils_rs/sync/store.rs-163-163 (1)
163-163: ⚠️ Potential issue | 🟡 Minor

Spawned task panics on error.

Similar to the iroh.rs issue, the `.unwrap()` on the spawned future will panic without context if an error occurs. Consider using `.unwrap_or_log()` for better observability.

Suggested fix
```diff
-    let join_handle = tokio::spawn(async { fut.await.unwrap() });
+    let join_handle = tokio::spawn(async { fut.await.unwrap_or_log() });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/store.rs` at line 163, The spawned task currently calls fut.await.unwrap() inside tokio::spawn which will panic without context; update the spawn closure to handle errors instead of unwrap by calling fut.await.unwrap_or_log() (or otherwise match/map the Result and log the error) so the spawned task logs failures rather than panics — look for the tokio::spawn invocation that assigns join_handle and replace the inner fut.await.unwrap() with fut.await.unwrap_or_log() or an explicit error logging branch.

src/daybook_core/plugs.rs-1723-1747 (1)
1723-1747: ⚠️ Potential issue | 🟡 Minor

`setup_repo` fixture leaks background repo workers.

This fixture drops both `BigRepoStopToken` and `RepoStopToken` without calling `stop()`. Neither type has a `Drop` implementation, so cleanup only occurs if `stop()` is explicitly called. The background workers spawned by `BigRepo::boot()` and `PlugsRepo::load()` will continue running after the fixture returns, affecting all 12+ tests that use `setup_repo`.

Return both stop tokens from the fixture so tests can properly shut them down:

```rust
async fn setup_repo() -> Res<(SharedBigRepo, Arc<PlugsRepo>, DocumentId, tempfile::TempDir, BigRepoStopToken, RepoStopToken)>
```

Then update each test to call `.stop()` on the tokens before completion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/plugs.rs` around lines 1723 - 1747, The setup_repo fixture currently drops background stop tokens returned by BigRepo::boot and PlugsRepo::load, leaking workers; change setup_repo's signature to return the stop tokens (e.g. add BigRepoStopToken and RepoStopToken to the returned tuple) by capturing the second values from BigRepo::boot(...) and PlugsRepo::load(...), and update all tests that call setup_repo to receive those tokens and call .stop() on each token before test completion (use the BigRepoStopToken.stop() and RepoStopToken.stop() methods). Ensure the returned tuple ordering and types (SharedBigRepo, Arc<PlugsRepo>, DocumentId, tempfile::TempDir, BigRepoStopToken, RepoStopToken) match callers.
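The ownership shape being asked for can be sketched without the real repo types. `StopToken` and `setup_repo` below are simplified stand-ins (an atomic flag in place of the actual worker-cancellation machinery); the point is that the fixture hands both tokens back so the test, not the fixture, owns shutdown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Minimal stand-in for BigRepoStopToken/RepoStopToken: stop() flips a shared
// flag that a background worker would observe and exit on.
#[derive(Clone)]
struct StopToken(Arc<AtomicBool>);

impl StopToken {
    fn new() -> Self {
        StopToken(Arc::new(AtomicBool::new(false)))
    }
    fn stop(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_stopped(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

// The fixture returns its stop tokens instead of dropping them, so each test
// can shut the workers down explicitly before completing.
fn setup_repo() -> (String, StopToken, StopToken) {
    ("repo".to_string(), StopToken::new(), StopToken::new())
}
```

Without a `Drop` impl on the tokens, returning them is the only way tests can guarantee the spawned workers actually terminate.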
🧹 Nitpick comments (13)
src/daybook_core/drawer/facet_recovery.rs (1)
9-15: Add targeted tests for `recover_facet_heads_at` and drift/error branches.

The new `read_heads` branch (`get_at` + `length_at`) is currently unexercised in this file’s tests. Please add:
- a historical-heads test for `recover_facet_heads_at`, and
Errviabail!().✅ Suggested test additions
```diff
 #[cfg(test)]
 mod tests {
     use super::*;
 @@
     fn test_recover_facet_heads_merge() -> Res<()> {
 @@
         Ok(())
     }
+
+    #[test]
+    fn test_recover_facet_heads_at_historical_heads() -> Res<()> {
+        let mut doc = Automerge::new();
+        let facet_key = FacetKey::from(WellKnownFacetTag::Note);
+        let facet_key_str = facet_key.to_string();
+        let dmeta_key = format!("{}/main", WellKnownFacetTag::Dmeta.as_str());
+
+        let mut tx = doc.transaction();
+        let facets_id = tx.put_object(automerge::ROOT, "facets", ObjType::Map)?;
+        let dmeta_id = tx.put_object(&facets_id, &dmeta_key, ObjType::Map)?;
+        let dmeta_facets_id = tx.put_object(&dmeta_id, "facets", ObjType::Map)?;
+        let facet_meta_id = tx.put_object(&dmeta_facets_id, &facet_key_str, ObjType::Map)?;
+        let updated_at_id = tx.put_object(&facet_meta_id, "updatedAt", ObjType::List)?;
+        tx.insert(&updated_at_id, 0, 1000i64)?;
+        let first_hash = tx.commit().0.expect("commit should produce hash");
+
+        let historical_heads = doc.get_heads();
+
+        let mut tx2 = doc.transaction();
+        let facets = tx2.get(automerge::ROOT, "facets")?.unwrap().1;
+        let dmeta = tx2.get(&facets, &dmeta_key)?.unwrap().1;
+        let dmeta_facets = tx2.get(&dmeta, "facets")?.unwrap().1;
+        let facet_meta = tx2.get(&dmeta_facets, &facet_key_str)?.unwrap().1;
+        let updated_at = tx2.get(&facet_meta, "updatedAt")?.unwrap().1;
+        tx2.insert(&updated_at, 1, 1001i64)?;
+        tx2.commit();
+
+        let heads_at_old = recover_facet_heads_at(&doc, &facet_key, &historical_heads)?;
+        assert_eq!(heads_at_old, vec![first_hash]);
+        Ok(())
+    }
+
+    #[test]
+    fn test_recover_facet_heads_drifted_shape_returns_err() {
+        let doc = Automerge::new();
+        let facet_key = FacetKey::from(WellKnownFacetTag::Note);
+        let res = recover_facet_heads(&doc, &facet_key);
+        assert!(res.is_err());
+    }
 }
```

Based on learnings: “When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() …; consider adding unit tests that simulate external input drift and ensure
bail!() paths are exercised.” and as per coding guidelines: “When working with Rust, use small tests in addition to `cargo clippy` to validate ideas.”

Also applies to: 50-53
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/drawer/facet_recovery.rs` around lines 9 - 15, The test suite lacks coverage for the new read_heads branch used by recover_facet_heads_at (which calls recover_facet_heads_inner with Some(heads)) and the external-drift/error path that should bail!(); add two unit tests: one "historical-heads" test that constructs an Automerge doc and facet metadata to exercise the get_at + length_at branch via recover_facet_heads_at and asserts the expected Vec<ChangeHash> result, and one "malformed/external-drift" test that feeds a deliberately malformed dmeta/Automerge shape to recover_facet_heads_at (or recover_facet_heads_inner via same call pattern) and asserts that the call returns an Err (triggering eyre::bail!()); use the function names recover_facet_heads_at, recover_facet_heads_inner, and the read_heads logic (get_at and length_at) to locate where to construct inputs and assert error handling.

src/daybook_core/sync/stress.rs (1)
110-133: Add a focused invariant test for topology generation.
`generate_connected_edges` is foundational for this stress suite. A small deterministic test (fixed seeds) for connectedness, canonical edge ordering, and no self-edges would make failures easier to localize.

As per coding guidelines: “When working with Rust, use small tests in addition to
`cargo clippy` to validate ideas.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 110 - 133, Add a small deterministic unit test for the generate_connected_edges function that uses fixed StdRng seeds to validate the topology invariants: ensure the returned Vec<(usize, usize)> contains no self-edges (a != b), each tuple is canonical (first < second consistent with canon_edge), and the undirected graph is connected given NODE_COUNT (use a simple BFS/DFS on the edges to verify all nodes reachable). Run the test for a couple of different fixed seeds to catch RNG-dependent bugs and place it near other tests for sync/stress; reference generate_connected_edges, canon_edge and NODE_COUNT to locate the code.

src/am_utils_rs/Cargo.toml (1)
49-56: Consider workspace-level version management for consistency.
`camino`, `irpc`, `tokio-tungstenite`, and `futures-buffered` are pinned directly here rather than via workspace. If these are used across multiple crates, consider promoting them to workspace dependencies for easier version management.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/Cargo.toml` around lines 49 - 56, Several dependencies (camino, irpc, tokio-tungstenite, futures-buffered) are pinned locally instead of managed at workspace level; move their version declarations to the workspace Cargo.toml and mark them as workspace dependencies here (or remove the explicit version and use workspace = true) so all crates share a single source of truth; specifically add entries for camino = "1.2.2", irpc = "0.12.0" (with features = ["derive"] if needed), tokio-tungstenite and futures-buffered into the workspace [dependencies] (or [patch.crates-io] if pinning), then update this crate’s Cargo.toml to reference them via workspace = true or make them optional here as needed (adjust features like "sqlite"/"runtime-tokio" only where required).

src/am_utils_rs/sync/protocol.rs (2)
36-42: Redundant `has_more` field in `PartitionCursorPage`.

The FIXME on line 40 correctly identifies that `has_more` may be redundant when `next_cursor: Option<CursorIndex>` already indicates whether more data exists. Consider removing `has_more` to simplify the API, unless there's a specific use case where `next_cursor` is `Some` but `has_more` is `false`.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/protocol.rs` around lines 36 - 42, The PartitionCursorPage struct contains a redundant has_more boolean because next_cursor: Option<CursorIndex> already expresses whether there is a next page; remove the has_more field from the struct definition (PartitionCursorPage) and update any serialization/deserialization, constructors, uses, and tests that reference has_more to rely solely on next_cursor.is_some() (or equivalent) instead; if there are places that previously set has_more independently, replace those with has_more = next_cursor.is_some() logic or remove the parameter entirely to keep the API consistent.
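The "derive `has_more` from `next_cursor`" idea can be shown with a simplified page type. `Page` and `page_of` below are illustrative names, not the crate's actual `PartitionCursorPage` API:

```rust
// Simplified page shape: next_cursor alone signals whether a follow-up
// request is needed, so a separate has_more bool would be redundant.
struct Page<T> {
    items: Vec<T>,
    next_cursor: Option<u64>,
}

impl<T> Page<T> {
    fn has_more(&self) -> bool {
        self.next_cursor.is_some()
    }
}

// Paginate a slice with a fixed page size, setting next_cursor only when
// items remain past this page.
fn page_of(items: &[u32], cursor: u64, page_size: usize) -> Page<u32> {
    let start = cursor as usize;
    let end = (start + page_size).min(items.len());
    Page {
        items: items[start..end].to_vec(),
        next_cursor: (end < items.len()).then(|| end as u64),
    }
}
```

With this shape the two fields can never disagree, which removes the "`next_cursor` is `Some` but `has_more` is `false`" case by construction.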
5-9: Acknowledged FIXME: Consider efficiency improvements for frequently cloned types.

The FIXME note about using `Arc<str>` or `[u8; 32]` for `PartitionId` and `PeerKey` is valid. These types are cloned frequently in RPC paths. Consider addressing this in a follow-up to reduce allocation overhead.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/protocol.rs` around lines 5 - 9, The PartitionId and PeerKey type aliases are currently String and are cloned frequently; change them to a cheaper, shareable representation such as Arc<str> (or a fixed-size [u8; 32] newtype if keys are fixed-length) to reduce allocation overhead: replace pub type PartitionId = String; and pub type PeerKey = String; with pub type PartitionId = Arc<str>; pub type PeerKey = Arc<str>; (or define a Key([u8;32]) newtype and use that) and then update all code paths that construct or clone these types to use Arc::from(s) or Key::from_bytes, add From<&str>/From<String> impls and adjust serde/serialization derives (or implement Serialize/Deserialize) so conversions remain ergonomic and cloning becomes cheap for CursorIndex leave as u64.

src/daybook_core/index/doc_blobs.rs (2)
442-452: Test timeout of 60 seconds may be excessive for CI.

The `wait_for_hash` helper uses a 60-second timeout. While this is generous for reliability, it could slow down test failure feedback. Consider a shorter timeout (e.g., 10-15 seconds) with the option to increase via environment variable for slow CI environments.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 442 - 452, The wait_for_hash helper uses a hardcoded 60s deadline which slows CI; change the timeout in wait_for_hash (function) from 60s to a shorter default (e.g., 10–15s) and make it configurable via an environment variable (e.g., DOCBLOBS_WAIT_TIMEOUT_SECS) so DocBlobsIndexRepo tests can override in slow CI; update the deadline construction and document that list_hashes_for_doc polling still uses the same 50ms sleep interval and ensure the env value is parsed safely with a fallback to the new default.
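The env-overridable default can be a few lines. This is a sketch, and the `DOCBLOBS_WAIT_TIMEOUT_SECS` variable name is the assumption from the prompt above, not existing code; `timeout_from` is split out so the parsing is testable without touching the process environment.

```rust
use std::time::Duration;

// Parse an optional override, falling back to a short 15s default so test
// failures surface quickly; non-numeric values also fall back safely.
fn timeout_from(raw: Option<&str>) -> Duration {
    raw.and_then(|value| value.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(15))
}

// Slow CI environments can raise the deadline without a code change.
fn wait_timeout() -> Duration {
    timeout_from(std::env::var("DOCBLOBS_WAIT_TIMEOUT_SECS").ok().as_deref())
}
```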
202-233: Potential inefficiency: DELETE followed by INSERT for reindex.

The `reindex_doc_hashes` method deletes all existing entries for a doc and re-inserts them. For documents with many blob references where only one changes, this could be inefficient. However, for the expected use case (small number of blobs per doc), this simplicity is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 202 - 233, The reindex_doc_hashes function currently deletes all rows for a doc then re-inserts every hash which is inefficient for docs where only a subset of blob refs changed; instead, query existing blob_hashes for the given doc_id from doc_blob_refs, compute the set difference against the provided hashes to derive to_delete and to_insert, execute a single DELETE ... WHERE doc_id = ? AND blob_hash IN (...) for to_delete, and use QueryBuilder to insert only to_insert (and update origin_heads for rows that remain if heads changed); keep references to reindex_doc_hashes, doc_blob_refs, QueryBuilder, hashes, heads and serialized_heads when locating and changing the logic.

src/daybook_server/main.rs (2)
102-103: Unused channel sender.
`_doc_setup_req_tx` is created but never used (underscore prefix indicates intentional non-use). If the doc setup request flow is no longer needed, consider removing the channel entirely to reduce code complexity.

Suggested cleanup
- let (_doc_setup_req_tx, mut doc_setup_req_rx) = - mpsc::unbounded_channel::<DocSetupRequest>(); + let (doc_setup_req_tx, mut doc_setup_req_rx) = + mpsc::unbounded_channel::<DocSetupRequest>();Or if the entire doc_setup flow is deprecated, remove the channel, worker, and related code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_server/main.rs` around lines 102 - 103, The sender variable _doc_setup_req_tx is created but never used; either remove the entire doc_setup channel and related flow or wire it up: delete the let (_doc_setup_req_tx, mut doc_setup_req_rx) = mpsc::unbounded_channel::<DocSetupRequest>(); line and any code that references doc_setup_req_rx/DocSetupRequest if the doc setup flow is deprecated, or keep the channel but use the sender (rename to doc_setup_req_tx) where DocSetupRequest messages should be sent (e.g., from the component that initiates doc setup) and ensure receivers (doc_setup_req_rx) are polled; update related worker/task creation and imports accordingly.
104-111: BigRepo boot uses hardcoded path.

The storage path `/tmp/samod-sync` is hardcoded. Consider making this configurable via environment variable or config for production deployments.

Suggested improvement

```diff
 let (big_repo, _big_repo_stop) = am_utils_rs::BigRepo::boot(am_utils_rs::repo::Config {
     peer_id: "daybook_sync".to_string(),
     storage: am_utils_rs::repo::StorageConfig::Disk {
-        path: "/tmp/samod-sync".into(),
+        path: utils_rs::get_env_var("SAMOD_STORAGE_PATH")
+            .unwrap_or_else(|_| "/tmp/samod-sync".to_string())
+            .into(),
         big_repo_sqlite_url: None,
     },
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_server/main.rs` around lines 104 - 111, The BigRepo boot call currently hardcodes the storage path "/tmp/samod-sync" in the am_utils_rs::repo::StorageConfig::Disk config; change this to read from a configurable source (e.g., an environment variable like DAYBOOK_SYNC_PATH or a config struct) and fall back to "/tmp/samod-sync" if not set, then pass that variable into the path field in the BigRepo::boot invocation so deployments can override the storage location without code changes.

src/am_utils_rs/repo/iroh.rs (1)
269-271: Consider propagating error instead of panicking on acceptor creation.

The `.expect("error making acceptor")` will panic if acceptor creation fails. Since this involves external peer connections, returning an error via `AcceptError` would be more robust and consistent with the error handling pattern used elsewhere in this function.

Suggested fix

```diff
 let acceptor = repo
     .make_acceptor(
         Url::parse(&format!(
             "{CONN_URL_SCHEME}:{}",
             utils_rs::hash::encode_base58_multibase(endpoint_id)
         ))
         .expect(ERROR_IMPOSSIBLE),
     )
-    .expect("error making acceptor");
+    .map_err(|err| {
+        let msg = format!("error making acceptor for {endpoint_id}: {err:?}");
+        connection.close(500u32.into(), msg.as_bytes());
+        iroh::protocol::AcceptError::from_boxed(Box::from(msg))
+    })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 269 - 271, Replace the panic-causing .expect("error making acceptor") with proper error propagation: change the acceptor creation call that currently chains .expect(ERROR_IMPOSSIBLE) and .expect("error making acceptor") to return a Result and propagate errors into the function's AcceptError path (e.g., use ? or .map_err(|e| AcceptError::new(...)) to convert the underlying error into AcceptError). Keep ERROR_IMPOSSIBLE logic as needed but ensure the final failure is returned rather than panicked so callers receive an AcceptError instead of a panic.

src/am_utils_rs/sync/store.rs (1)
173-197: Schema lacks foreign key constraint.

The `sync_partition_cursors` table references `peer_key` but doesn't have a foreign key constraint to `sync_known_peers`. This is intentional based on the comment on lines 220-221 (cursors are preserved across unregister/register), but could lead to orphaned cursor rows if peers are never re-registered. Consider adding a cleanup mechanism or documenting this behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/store.rs` around lines 173 - 197, The schema in ensure_schema defines sync_partition_cursors referencing peer_key but lacks a foreign key to sync_known_peers, which can leave orphaned cursor rows; fix by adding a cleanup path instead of changing current preservation behavior: implement a function (e.g., remove_peer_cursors or cleanup_orphaned_cursors) that deletes rows from sync_partition_cursors for a given peer_key (or deletes rows whose peer_key is not present in sync_known_peers) and call it from the peer-unregister flow (where peers are removed) or wire it into a periodic maintenance task; keep ensure_schema and table definitions unchanged but ensure unregister_peer (or equivalent) invokes the new cleanup routine or document this behavior clearly.

src/am_utils_rs/repo/changes.rs (1)
196-264: De-duplicate broker spawn logic in `add_doc_listener`.

Both map-entry branches repeat nearly identical broker/lease creation code. Extracting a shared helper will reduce drift risk.
♻️ Refactor sketch
```diff
+fn build_doc_broker_lease(
+    &self,
+    handle: DocHandle,
+) -> Res<Arc<DocChangeBrokerLease>> {
+    let (broker, stop_token) = broker::spawn_doc_listener(
+        handle,
+        self.cancel_token.child_token(),
+        self.change_tx.clone(),
+        self.head_tx.clone(),
+        Arc::new({
+            let listeners = Arc::clone(&self.listeners);
+            move |doc_id, origin| {
+                let listeners = listeners.lock().expect(ERROR_MUTEX);
+                listeners.iter().any(|listener| {
+                    let doc_ok = listener
+                        .filter
+                        .doc_id
+                        .as_ref()
+                        .map(|target| target.doc_id == *doc_id)
+                        .unwrap_or(true);
+                    let origin_ok = listener
+                        .filter
+                        .origin
+                        .as_ref()
+                        .map(|target| origin_matches_filter(origin, *target))
+                        .unwrap_or(true);
+                    doc_ok && origin_ok
+                })
+            }
+        }),
+    )?;
+    Ok(Arc::new(DocChangeBrokerLease {
+        handle: Arc::new(broker),
+        stop_token: std::sync::Mutex::new(Some(stop_token)),
+    }))
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/changes.rs` around lines 196 - 264, The two branches in add_doc_listener duplicate broker and lease creation; extract that logic into a helper (e.g., spawn_doc_change_lease or make_doc_broker_lease) that takes the shared inputs (handle, self.cancel_token.child_token(), self.change_tx.clone(), self.head_tx.clone(), Arc::clone(&self.listeners)) and returns Result<Arc<DocChangeBrokerLease>, Error> so both Entry::Occupied and Entry::Vacant call it, insert Arc::downgrade(&lease) into the map, and return Ok(lease); ensure the helper builds the same closure (locking listeners with ERROR_MUTEX), forwards the ? from broker::spawn_doc_listener, wraps broker and stop_token into DocChangeBrokerLease with Arc/Mutex as before, and preserves ownership/cloning semantics.

src/daybook_core/sync/full.rs (1)
1660-1679: Avoid retry/backoff churn when there are zero blob peers.

At line 1662, retry state is advanced even though no peer can satisfy the download. Consider deferring scheduling until peer availability changes, instead of burning retry budget.
♻️ Suggested simplification
```diff
 if peers.is_empty() {
-    let prior = prior_pending.unwrap_or(PendingBlobSyncState {
-        attempt_no: 0,
-        last_backoff: Duration::from_millis(0),
-        last_attempt_at: std::time::Instant::now(),
-        due_at: std::time::Instant::now(),
-    });
-    let delay = next_backoff_delay(prior.last_backoff, Duration::from_millis(500));
-    self.pending_blobs.insert(
-        hash.clone(),
-        PendingBlobSyncState {
-            attempt_no: prior.attempt_no + 1,
-            last_backoff: delay,
-            last_attempt_at: std::time::Instant::now(),
-            due_at: std::time::Instant::now() + delay,
-        },
-    );
+    self.pending_blobs.entry(hash.clone()).or_insert_with(|| PendingBlobSyncState {
+        attempt_no: 0,
+        last_backoff: Duration::from_millis(0),
+        last_attempt_at: std::time::Instant::now(),
+        due_at: std::time::Instant::now(),
+    });
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1660 - 1679, The code currently advances retry/backoff when peers.is_empty() by inserting a new PendingBlobSyncState (using prior_pending.unwrap_or and next_backoff_delay) which burns retry budget even though no peers can satisfy the download; change the branch so that when peers.is_empty() you do not advance or replace pending state—i.e., do not call self.pending_blobs.insert(...) and instead leave prior_pending untouched (or skip scheduling entirely) so retries are only advanced when there are actual peers; use the existing symbols peers, prior_pending, PendingBlobSyncState, next_backoff_delay, pending_blobs, and hash to locate and implement this guard.
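For context on the diff above, here is a minimal sketch of what a `next_backoff_delay`-style helper might do. The real function's body is not shown in the review, so the seed-then-double-with-cap behavior here is an assumption:

```rust
use std::time::Duration;

// Assumed semantics: first attempt waits `base`, subsequent attempts double
// the previous delay up to a fixed cap. The 30s cap is illustrative only.
fn next_backoff_delay(last: Duration, base: Duration) -> Duration {
    const CAP: Duration = Duration::from_secs(30);
    if last.is_zero() {
        return base;
    }
    (last * 2).min(CAP)
}

fn main() {
    let base = Duration::from_millis(500);
    let d1 = next_backoff_delay(Duration::ZERO, base);
    let d2 = next_backoff_delay(d1, base);
    assert_eq!(d1, Duration::from_millis(500));
    assert_eq!(d2, Duration::from_millis(1000));
    // The cap keeps a long-failing blob from backing off unboundedly.
    assert_eq!(next_backoff_delay(Duration::from_secs(60), base), Duration::from_secs(30));
}
```

Under these assumptions, advancing this state while `peers` is empty spends the early, short delays on attempts that cannot succeed, which is exactly the churn the finding flags.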
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 71dec451-8b05-425c-80be-af38191d3fab
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock` and included by `**/*`
- `docs/DEVDOC/todo.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
📒 Files selected for processing (50)
- `AGENTS.md`
- `CONTRIBUTING.md`
- `README.md`
- `docs/README.md`
- `docs/dict.md`
- `src/am_utils_rs/Cargo.toml`
- `src/am_utils_rs/repo.rs`
- `src/am_utils_rs/repo/changes.rs`
- `src/am_utils_rs/repo/changes/broker.rs`
- `src/am_utils_rs/repo/iroh.rs`
- `src/am_utils_rs/repo/partition.rs`
- `src/am_utils_rs/sync.rs`
- `src/am_utils_rs/sync/node.rs`
- `src/am_utils_rs/sync/peer.rs`
- `src/am_utils_rs/sync/protocol.rs`
- `src/am_utils_rs/sync/store.rs`
- `src/api_utils_rs/Cargo.toml`
- `src/daybook_cli/Cargo.toml`
- `src/daybook_cli/main.rs`
- `src/daybook_core/blobs.rs`
- `src/daybook_core/build.rs`
- `src/daybook_core/config.rs`
- `src/daybook_core/drawer.rs`
- `src/daybook_core/drawer/facet_recovery.rs`
- `src/daybook_core/e2e.rs`
- `src/daybook_core/index/doc_blobs.rs`
- `src/daybook_core/plugs.rs`
- `src/daybook_core/repo.rs`
- `src/daybook_core/rt/dispatch.rs`
- `src/daybook_core/stores.rs`
- `src/daybook_core/sync.rs`
- `src/daybook_core/sync/bootstrap.rs`
- `src/daybook_core/sync/full.rs`
- `src/daybook_core/sync/full/doc_worker.rs`
- `src/daybook_core/sync/stress.rs`
- `src/daybook_core/tables.rs`
- `src/daybook_ffi/Cargo.toml`
- `src/daybook_fuse/lib.rs`
- `src/daybook_server/Cargo.toml`
- `src/daybook_server/main.rs`
- `src/daybook_types/doc.rs`
- `src/mltools/Cargo.toml`
- `src/pauperfuse/Cargo.toml`
- `src/tests_http/Cargo.toml`
- `src/utils_rs/Cargo.toml`
- `src/utils_rs/lib.rs`
- `src/wash_plugin_pglite/Cargo.toml`
- `src/wflow/Cargo.toml`
- `src/wflow_webui/Cargo.toml`
- `src/xtask/Cargo.toml`
✅ Files skipped from review due to trivial changes (1)
- docs/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- src/daybook_core/build.rs
- src/daybook_cli/Cargo.toml
- src/xtask/Cargo.toml
```rust
fn map_store_err(err: eyre::Report) -> PartitionSyncError {
    map_repo_err(err)
}

fn map_repo_err(err: eyre::Report) -> PartitionSyncError {
    PartitionSyncError::Internal {
        message: err.to_string(),
    }
}
```
Don't send raw internal errors back to peers.
err.to_string() forwards the full eyre report over the RPC boundary. That can disclose local paths, SQL details, or other internals to remote peers. Log the detailed error locally and return a stable, sanitized protocol message instead.
Suggested sanitization
```diff
 fn map_repo_err(err: eyre::Report) -> PartitionSyncError {
+    warn!(?err, "sync node repo error");
     PartitionSyncError::Internal {
-        message: err.to_string(),
+        message: "internal sync node error".into(),
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/sync/node.rs` around lines 335 - 342, The current
map_store_err/map_repo_err functions leak the full eyre::Report to peers by
using err.to_string(); instead, log or report the full error locally (e.g. via
tracing::error! or log::error! with the err/Debug) and return a stable,
sanitized PartitionSyncError payload (for example a fixed message like "internal
server error" or an opaque error code) from PartitionSyncError::Internal so no
local paths/SQL/details are sent over RPC; update map_store_err and map_repo_err
to perform local logging of err and construct the sanitized message instead of
err.to_string().
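A std-only sketch of the sanitization pattern the comment asks for: keep the detailed report in local logs, send peers a stable opaque message. `PartitionSyncError` is mimicked with a plain struct here, and the `eprintln!` stands in for the real `warn!` call:

```rust
// Hypothetical stand-in for the protocol error type.
#[derive(Debug, PartialEq)]
struct PartitionSyncError {
    message: String,
}

fn map_repo_err(err: &dyn std::fmt::Debug) -> PartitionSyncError {
    // Full detail stays local; real code would use warn!(?err, ...).
    eprintln!("sync node repo error: {err:?}");
    PartitionSyncError {
        message: "internal sync node error".into(),
    }
}

fn main() {
    let io_err = std::io::Error::new(
        std::io::ErrorKind::Other,
        "/home/user/db.sqlite: constraint failed",
    );
    let wire = map_repo_err(&io_err);
    // Nothing from the local report leaks into the protocol payload.
    assert_eq!(wire.message, "internal sync node error");
    assert!(!wire.message.contains("sqlite"));
}
```

The same shape extends to an opaque error code if peers need to distinguish failure classes without seeing internals.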
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/daybook_types/doc.rs (1)
83-90: ⚠️ Potential issue | 🔴 Critical

Make `Dmeta.actors` backward-compatible.

Adding `actors` as a required field means older docs without it now fail to deserialize, so mixed-version devices can no longer hydrate the same `Dmeta`. Give it `#[serde(default)]` so missing maps deserialize as empty. Based on learnings: data structures in automerge documents (like dmeta metadata) are external input that can have version drift across devices.

Suggested fix
```diff
 Dmeta struct {
     pub id: DocId,
     // FIXME: unix timestamp codec
     pub created_at: Timestamp,
     pub updated_at: Vec<Timestamp>,
+    #[serde(default)]
     pub actors: HashMap<String, UserMeta>,
     pub facet_uuids: HashMap<Uuid, FacetKey>,
     pub facets: HashMap<FacetKey, FacetMeta>
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_types/doc.rs` around lines 83 - 90, The Dmeta struct's actors field is required and will fail deserialization for older docs; update the actors field (the HashMap<String, UserMeta> member on struct Dmeta in doc.rs) to derive a default for serde by adding #[serde(default)] so missing actors maps deserialize as empty, ensuring backward compatibility across mixed-version devices.

src/daybook_core/repo.rs (1)
210-241: ⚠️ Potential issue | 🟠 Major

Shut `big_repo_stop` down on every post-boot failure path.

Once `BigRepo::boot()` succeeds, any later error here (`spawn_ws_connector`, `init_from_globals`, `run_repo_init_dance`, or `mark_repo_initialized`) returns early while the stop token is still just a local variable. That leaks the repo task tree and any opened resources whenever `open_inner()` fails after boot. Wrap the rest of the initialization in a fallible block and call `big_repo_stop.stop().await` before propagating the error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/repo.rs` around lines 210 - 241, After successful BigRepo::boot (which yields big_repo and big_repo_stop) ensure big_repo_stop.stop().await is invoked on every error path that occurs after boot: wrap the subsequent fallible operations (spawn_ws_connector, init_from_globals, cloning doc_app/doc_drawer, run_repo_init_dance, mark_repo_initialized) in a try/fallible block and on any early return or propagated error call big_repo_stop.stop().await before returning the error; do the same if any awaited spawn_ws_connector or init_from_globals call fails so the BigRepo task tree and resources are always cleanly shut down.

src/daybook_core/rt/dispatch.rs (1)
298-313: ⚠️ Potential issue | 🟠 Major

Don't panic on synced dispatch updates.

A non-nil `vtag` can arrive from another peer or an older client. Panicking here tears down the notification worker instead of surfacing a recoverable compatibility path. Please emit an update/upsert event or return an error with context instead. Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 298 - 313, The code currently panics when a non-nil VersionTag is encountered in src/daybook_core/rt/dispatch.rs (the match that calls VersionTag::hydrate_bytes produces vtag and then panics with "dispatch update detected"); instead, handle synced/remote updates by emitting an update/upsert DispatchEvent (e.g., a DispatchUpdated or DispatchUpsert variant) or return a contextual error via eyre::bail!() so the worker isn't torn down. Modify the branch after vtag.version.is_nil() to produce the appropriate DispatchEvent variant carrying id and heads (and any needed vtag info) or replace the panic! with eyre::bail!("unexpected dispatch update: {:?}", vtag) so callers can handle it; ensure you touch the branch that currently constructs DispatchAdded and the else that contains the panic and VersionTag::hydrate_bytes usage.

src/daybook_core/config.rs (1)
344-364: ⚠️ Potential issue | 🟠 Major

`facet_display` and `users` writes still never emit `ConfigEvent::Changed`.

The `patch.path.len() == 2` guard only matches direct `Versioned` fields. Entry-level writes under `facet_display` and `users` are one segment deeper, so `set_facet_display_hint()` and remote user-meta updates fall through this matcher and listeners never refresh.

💡 Targeted fix

```diff
-    } if patch.path.len() == 2 && key == "vtag" => {
+    } if key == "vtag" => {
         let Some((_obj, automerge::Prop::Map(section_key))) = patch.path.get(1) else {
             return Ok(());
         };
+        let is_supported_path = match section_key.as_ref() {
+            "facet_display" | "users" => patch.path.len() == 3,
+            "mltools" | "global_props_doc_id" => patch.path.len() == 2,
+            _ => false,
+        };
+        if !is_supported_path {
+            return Ok(());
+        }
         if !matches!(
             val,
             automerge::Value::Scalar(scalar)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/config.rs` around lines 344 - 364, The current matcher only triggers when patch.path.len() == 2, so deeper entry-level writes under the "facet_display" and "users" maps are skipped; change the guard to allow any depth (e.g., patch.path.len() >= 2) or otherwise accept path lengths >2 when key == "vtag" and the second path segment (section_key from patch.path.get(1)) matches "facet_display" or "users", so that the automerge::PatchAction::PutMap branch pushes ConfigEvent::Changed (affecting set_facet_display_hint and remote user-meta updates). Ensure you still validate val is a Bytes scalar as before.
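The depth rule this finding asks for can be sketched std-only. Paths are modeled as plain segment slices; the section names come from the finding, while the function name and the slice encoding are illustrative:

```rust
// Decide whether a patch at `path` with property `key` should emit
// ConfigEvent::Changed. Index 1 plays the role of patch.path.get(1).
fn should_emit(path: &[&str], key: &str) -> bool {
    if key != "vtag" {
        return false;
    }
    match path.get(1) {
        // Entry-level maps: writes land one segment deeper.
        Some(&"facet_display") | Some(&"users") => path.len() == 3,
        // Direct Versioned fields keep the two-segment shape.
        Some(&"mltools") | Some(&"global_props_doc_id") => path.len() == 2,
        _ => false,
    }
}

fn main() {
    // Entry-level writes one segment deeper now match...
    assert!(should_emit(&["cfg", "facet_display", "some_facet"], "vtag"));
    assert!(should_emit(&["cfg", "users", "alice"], "vtag"));
    // ...while direct Versioned fields keep the old two-segment shape.
    assert!(should_emit(&["cfg", "mltools"], "vtag"));
    // The map container itself, and unknown sections, stay silent.
    assert!(!should_emit(&["cfg", "users"], "vtag"));
    assert!(!should_emit(&["cfg", "unknown", "x"], "vtag"));
}
```

Encoding the per-section depth in one match arm keeps the two path shapes from silently diverging again.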
♻️ Duplicate comments (3)
src/daybook_core/sync/stress.rs (2)
174-203: ⚠️ Potential issue | 🟠 Major

Rollback partially opened test resources on setup failures.

Both loops start async resources and then `bail!`/`?` before the matching `shutdown()`/`stop()`. A clone-validation failure skips `ctx.shutdown()`, and a later `open_sync_node` failure leaves every earlier node running. That can poison the rest of the test process. Apply the same scoped-cleanup pattern here too.

Also applies to: 212-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 174 - 203, The clone validation path can early-return without calling ctx.shutdown(), leaving resources running; update the RepoCtx::open usage so that any partially opened ctx is always cleaned up on error (e.g., create a short-lived guard or scope that calls ctx.shutdown().await in Drop or use a try/finally-like pattern: perform validations after open and if any check fails call ctx.shutdown().await before returning the error). Apply the same pattern for other resources opened later (e.g., open_sync_node) so every opened resource (ctx, sync nodes) is shutdown/stopped on all error paths.
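The scoped-cleanup pattern can be sketched with a small drop guard (a synchronous stand-in; the real code would await `ctx.shutdown()`/`node.stop()` in the teardown path, and the type name here is hypothetical):

```rust
use std::cell::Cell;

// A guard that runs every deferred shutdown action unless setup completed
// and disarm() was called.
struct Cleanup<'a> {
    actions: Vec<Box<dyn FnMut() + 'a>>,
    armed: bool,
}

impl<'a> Cleanup<'a> {
    fn new() -> Self {
        Self { actions: Vec::new(), armed: true }
    }
    fn defer(&mut self, f: impl FnMut() + 'a) {
        self.actions.push(Box::new(f));
    }
    fn disarm(&mut self) {
        self.armed = false;
    }
}

impl<'a> Drop for Cleanup<'a> {
    fn drop(&mut self) {
        if self.armed {
            // Shut down in reverse order, like nested scopes would.
            for f in self.actions.iter_mut().rev() {
                f();
            }
        }
    }
}

fn main() {
    let stopped = Cell::new(0);
    {
        let mut cleanup = Cleanup::new();
        cleanup.defer(|| stopped.set(stopped.get() + 1));
        cleanup.defer(|| stopped.set(stopped.get() + 1));
        // A bail!/? here drops `cleanup` while still armed...
    }
    // ...so both earlier resources get stopped.
    assert_eq!(stopped.get(), 2);

    {
        let mut cleanup = Cleanup::new();
        cleanup.defer(|| stopped.set(stopped.get() + 10));
        cleanup.disarm(); // setup fully succeeded; keep resources running
    }
    assert_eq!(stopped.get(), 2);
}
```

With the guard in place, every `?` between `open` and the final `disarm()` automatically unwinds the resources opened so far.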
99-101: ⚠️ Potential issue | 🟠 Major

Don't let teardown `stop()` hide the real failure.

These `stop().await?` calls run after a stored `result`, so the first shutdown failure replaces the actual test/init error and aborts cleanup of the remaining nodes. Make teardown best-effort here and return the original `result` afterward.

Also applies to: 208-209
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 99 - 101, The teardown loop currently calls node.stop().await? which can overwrite and return a teardown error instead of the original test/init `result`; change both occurrences (the loop over `nodes` where `node.stop().await?` is used, including the duplicate at lines ~208-209) to run shutdown as best-effort: iterate through `nodes.into_iter().flatten()`, call `node.stop().await` but do not propagate the first stop error—capture/log any stop errors (or convert them to warnings) and continue stopping remaining nodes, then return the original `result` (not the teardown error); ensure the code references the existing `nodes` iterator, `node.stop().await`, and the `result` variable so the original outcome is preserved while still attempting full cleanup.

src/daybook_types/doc.rs (1)
371-381: ⚠️ Potential issue | 🟠 Major

`new()` and the accessors no longer agree on the path layout.

`new("dev", Some("plug"), Some("routine"))` yields `/dev/plug/routine`, but `plug()` reads segment 3 and `routine()` reads segment 4, so the accessor pair only works for `for_plug_routine()`-style paths. That silently misidentifies plug/routine on the default constructor.

Also applies to: 406-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_types/doc.rs` around lines 371 - 381, UserPath::new builds "/device/plug/routine" but the accessors plug() and routine() expect the "for_plug_routine()" layout (they read segments 3 and 4), so update new (and the similar constructor around the 406-419 region) to construct the same path shape as for_plug_routine(): start with the root, push the "for" segment (or whatever literal that for_plug_routine uses), then push device, then optional plug, then optional routine so plug() and routine() index the same segments; adjust the same logic in the other constructor to match.
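The invariant the finding asks for is a round-trip between constructor and accessors. A minimal sketch (segment positions here are illustrative, not the real `UserPath` layout, which the prompt says should follow `for_plug_routine()`):

```rust
// Toy UserPath: whatever layout new() writes, plug() and routine() must
// read the same segments back.
struct UserPath {
    segments: Vec<String>,
}

impl UserPath {
    fn new(device: &str, plug: Option<&str>, routine: Option<&str>) -> Self {
        let mut segments = vec![device.to_string()];
        if let Some(p) = plug {
            segments.push(p.to_string());
        }
        if let Some(r) = routine {
            segments.push(r.to_string());
        }
        Self { segments }
    }

    fn plug(&self) -> Option<&str> {
        self.segments.get(1).map(|s| s.as_str())
    }

    fn routine(&self) -> Option<&str> {
        self.segments.get(2).map(|s| s.as_str())
    }
}

fn main() {
    let path = UserPath::new("dev", Some("plug"), Some("routine"));
    // Round-trip: accessors index the same layout the constructor wrote.
    assert_eq!(path.plug(), Some("plug"));
    assert_eq!(path.routine(), Some("routine"));
    assert_eq!(UserPath::new("dev", None, None).plug(), None);
}
```

A round-trip test like the one in `main` would have caught the off-by-one between the default constructor and the `for_plug_routine()`-style accessors.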
🧹 Nitpick comments (5)
docs/dict.md (1)
11-11: Tighten wording/typos in core definitions to avoid ambiguity.

There are several wording issues in key conceptual lines (e.g., lines 11, 15, 44, 51, 76, 89) that make the model harder to parse quickly for contributors.
Suggested wording edits
```diff
-Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices.
+Daybook ensures these changes are synced to other replicas whenever there is a working connection between devices.

-Documents are a the main units of information in a repo and have a unique ids.
+Documents are the main units of information in a repo and each has a unique id.

-CRDTs are a family of data structures allowing concurrent edits acoss devices that can then be resolved to a final, merged state in a repeatable and unsupervised manner.
+CRDTs are a family of data structures allowing concurrent edits across devices that can be resolved to a final merged state in a repeatable, unsupervised manner.

-A change can be taught of as a single git commit with a hash used to refer to it, the change hash.
+A change can be thought of as a single git commit with a hash used to refer to it (the change hash).

-It's also the gatekeeper for all docs weather it's reads or writes.
+It's also the gatekeeper for all docs, whether for reads or writes.

-As a special convention, `self` can be used instead of the document id to indicates that the reference is to the same doc containing the facet making holding the URL.
+As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.
```

Also applies to: 15-15, 44-45, 51-52, 76-76, 89-89
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 11, Rewrite the awkward sentence "Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices." to a clear, grammatical form such as "Daybook ensures that these changes are synced to other replicas whenever a working connection exists between devices." Also scan and tighten the other flagged lines (original lines referenced around the same paragraphs) to fix typos, remove extraneous words, and make definitions concise and unambiguous (e.g., clarify subject/verb order, replace informal words like "then"/"whenever" with precise phrasing, and ensure consistent tense and punctuation across the core definitions in the same section).src/daybook_core/sync/full/blob_worker.rs (2)
364-368: Consider distinguishing log levels for dropped lifecycle vs progress messages.

Using `try_send` with backpressure is appropriate for progress reporting. However, when `BlobWorkerStarted` or `BlobWorkerFinished` messages are dropped, it may complicate debugging worker lifecycle issues. Consider logging dropped lifecycle messages at `warn` level while keeping intermediate progress drops at `debug`.

Alternatively, if all `SyncProgressMsg` variants are truly non-critical for correctness (with authoritative state managed via the `Msg` channel), the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full/blob_worker.rs` around lines 364 - 368, In send_progress, distinguish lifecycle messages from intermediate progress: when sync_progress_tx.try_send(msg) fails, match on the SyncProgressMsg value (check for BlobWorkerStarted and BlobWorkerFinished variants) and log those dropped lifecycle messages with tracing::warn!(?err, hash = %self.hash, ...) while keeping other variants logged at debug as currently done; locate this behavior around sync_progress_tx.try_send and the send_progress async fn to implement the conditional logging.
176-176: Consider moving import to file level.

The `use futures::StreamExt;` import inside the async block works but is unconventional. Moving it to the top of the file (or relying on `super::*` if it's re-exported there) improves consistency and discoverability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full/blob_worker.rs` at line 176, The local import use futures::StreamExt; found inside the async block in blob_worker.rs should be moved to the module/file scope: remove the inline import and add use futures::StreamExt; at the top of the file (or import it via super::* if already re-exported) so that the StreamExt trait is visible consistently across functions such as the async block where it was used; ensure there are no duplicate imports and that the trait remains in scope for methods like .next() or other StreamExt extensions.src/am_utils_rs/sync/peer.rs (1)
322-322: Remove the placeholder `XXX` log before merge.

This logs the full partition list at info level with a throwaway message, which will add noisy startup logs in a hot path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` at line 322, The placeholder info!(?partitions, "XXX"); is logging the entire partitions list at info level; remove this noisy line or change it to a lower log level and a meaningful message (e.g., debug!("partitions: {} entries", partitions.len()) or debug!(partitions = ?partitions, "partitions snapshot") if full content is needed for debugging). Update the call to either delete info!(?partitions, "XXX"); or replace it with a debug-level log that avoids verbose startup noise while still conveying useful context, referencing the info!(?partitions, "XXX") call and the partitions variable.src/daybook_core/sync/stress.rs (1)
466-469: Give `CreateBranch` a collision-free branch path.

The `/stress/{0..32}` namespace is small relative to the run size, so this event will often revisit an existing branch instead of exercising branch creation. Using `idx` and/or a random suffix keeps this path higher-signal.

♻️ Suggested tweak

```diff
-let new_branch = daybook_types::doc::BranchPath::from(format!(
-    "/stress/{}",
-    rng.random_range(0..32)
-));
+let new_branch = daybook_types::doc::BranchPath::from(format!(
+    "/stress/{idx}-{}",
+    rng.random::<u64>()
+));
```
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 466 - 469, The branch path for CreateBranch uses a small namespace "/stress/{0..32}" which causes collisions; update the construction of new_branch (created via daybook_types::doc::BranchPath::from) to include the loop index (idx) and/or a random suffix from rng (e.g., combine idx and rng.random_range or a hex/random token) so each generated path is effectively unique (e.g., "/stress/{idx}-{rand}"); ensure the same variable new_branch is used where CreateBranch is emitted so branch creation events target collision-free paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/dict.md`:
- Line 86: Replace the indented code example with a fenced code block: wrap the
URL db+facet://self/org.example.daybook.title/main?at=hash1|hash2 inside triple
backticks (```) so it becomes a proper Markdown fenced code block, ensuring
there's a blank line before and after the fenced block to satisfy markdownlint
MD046.
- Around line 28-38: Example object literals in the docs are missing trailing
commas which makes them invalid JavaScript; update the snippets (notably the
object under "org.example.daybook.dmeta/main" and the other similar examples) by
adding the missing commas after property entries (e.g., add a comma after the id
property value and after any other single-line properties shown) so every
property in those example object literals is comma-separated and the snippets
are valid JS.
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 199-203: The select branch handling worker.samod_ack_rx.recv()
currently calls eyre::bail! when the channel is closed, which converts samod
backpressure into a bootstrap failure; instead, change the behavior so that when
recv yields None you do not abort bootstrap but pause the affected partition and
return control so other partitions can continue (e.g., set a paused flag or
requeue the RequestSamodSync state tied to doc_cursor and skip advancing it), or
await/loop until ack slots free before proceeding; apply the same change pattern
in RequestSamodSync-related logic (references: worker.samod_ack_rx.recv(),
worker.handle_samod_ack(ack), RequestSamodSync, doc_cursor) and the other
similar sites mentioned (around the 450-481 and 648-668 blocks) so pending samod
acks cause partition pause/retry instead of eyre::bail!.
- Around line 584-607: The code currently inserts an "empty slot" into
unresolved_by_partition before calling reduce_doc_event which lets a later
successful cursor (N+1) advance and drop a failed cursor (N); change the
sentinel semantics so we record a blocking/unresolved state instead of a no-op
slot: replace the current entry insertion prior to reduce_doc_event with a mark
that this partition+cursor is blocked (e.g., an explicit enum state or a
distinct value) and only clear/pop that blocked state when
apply_doc_event_decision successfully handles that cursor; update logic in
note_doc_cursor_seen, reduce_doc_event handling, apply_doc_event_decision, and
any code that reads unresolved_by_partition (the branches around
PartitionDocEventDeets::DocChanged and the later pop logic) to treat the blocked
state as non-ready; add a small regression test that simulates a failure at
cursor N (reduce_doc_event returns Err) then a success at N+1 and asserts that N
remains unresolved (blocked) until N is retried and succeeds.
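The blocked-cursor semantics this prompt describes can be sketched as a tiny state machine. The enum, struct, and method names are illustrative, not the real `unresolved_by_partition` types:

```rust
use std::collections::HashMap;

// A failed cursor N records an explicit Blocked state; a later success at
// N+1 cannot clear it. Only a retry of N itself can.
#[derive(Debug, PartialEq)]
enum CursorState {
    Blocked,
    Resolved,
}

#[derive(Default)]
struct Partition {
    states: HashMap<u64, CursorState>,
}

impl Partition {
    // Stand-in for reduce_doc_event + apply_doc_event_decision.
    fn apply(&mut self, cursor: u64, outcome: Result<(), ()>) {
        let state = match outcome {
            Ok(()) => CursorState::Resolved,
            Err(()) => CursorState::Blocked,
        };
        self.states.insert(cursor, state);
    }

    // The partition may only advance past `cursor` once it resolved.
    fn can_advance_past(&self, cursor: u64) -> bool {
        self.states.get(&cursor) == Some(&CursorState::Resolved)
    }
}

fn main() {
    let mut p = Partition::default();
    p.apply(7, Err(())); // cursor N fails
    p.apply(8, Ok(())); // cursor N+1 succeeds
    assert!(!p.can_advance_past(7)); // N stays blocked
    p.apply(7, Ok(())); // retry of N succeeds
    assert!(p.can_advance_past(7));
}
```

The `main` above is essentially the regression test the prompt requests: fail N, succeed N+1, and assert N is still not advanceable.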
- Around line 729-745: The code currently takes the first element from
response.docs (response.docs.into_iter().next()) without verifying it matches
the requested doc_id; update this to search response.docs for an entry whose
doc_id equals the requested doc_id (the doc_id variable passed into
GetDocsFullRequest) and use that matched doc, otherwise eyre::bail! that the
remote did not return the requested doc; keep the subsequent parsing of
doc.doc_id (parse::<samod::DocumentId>()) and its ferr! mapping unchanged so the
error path remains the same.
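The defensive lookup this prompt describes boils down to matching on the requested id instead of trusting ordering. A sketch with a stand-in `Doc` type (the real response type and error machinery differ):

```rust
// Stand-in for one entry of response.docs.
#[derive(Debug, PartialEq)]
struct Doc {
    doc_id: String,
    payload: Vec<u8>,
}

// Find the entry whose doc_id matches what we asked for, instead of taking
// response.docs.into_iter().next() blindly. Real code would eyre::bail!.
fn pick_requested_doc(docs: Vec<Doc>, requested: &str) -> Result<Doc, String> {
    docs.into_iter()
        .find(|d| d.doc_id == requested)
        .ok_or_else(|| format!("remote did not return requested doc {requested}"))
}

fn main() {
    let docs = vec![
        Doc { doc_id: "other".into(), payload: vec![] },
        Doc { doc_id: "wanted".into(), payload: vec![1] },
    ];
    // The match wins even though "wanted" is not first in the response.
    let doc = pick_requested_doc(docs, "wanted").unwrap();
    assert_eq!(doc.doc_id, "wanted");

    // An empty or mismatched response becomes an explicit error.
    assert!(pick_requested_doc(vec![], "wanted").is_err());
}
```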
In `@src/daybook_core/blobs.rs`:
- Around line 77-80: The copy path is racy: put_path_copy() checks try_exists()
then calls atomic_copy_file(), but atomic_copy_file() performs a plain rename
which can fail if another concurrent writer created the target; modify
atomic_copy_file() to treat a rename failure with ErrorKind::AlreadyExists (or
an OS-specific "file exists") as success (i.e., return Ok(())) the same way
atomic_write() does, so duplicate ingests become idempotent; apply the same
change to the other copy site that also calls atomic_copy_file() (the second
occurrence around the other blob-ingest function) to ensure both paths handle
AlreadyExists gracefully.
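A minimal sketch of the tolerant rename, assuming the copy-then-rename shape described above (`atomic_copy_file` here is a stand-in, not the repo's actual function; note that on POSIX `rename` silently replaces the target, so `ErrorKind::AlreadyExists` mainly surfaces on platforms or filesystems that refuse to clobber):

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

// Copy to a temp file, then rename into place. A concurrent writer that
// already produced the target is treated as success (idempotent ingest).
fn atomic_copy_file(src: &Path, dst: &Path) -> std::io::Result<()> {
    let tmp = dst.with_extension("tmp");
    fs::copy(src, &tmp)?;
    match fs::rename(&tmp, dst) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == ErrorKind::AlreadyExists => {
            let _ = fs::remove_file(&tmp); // another writer won the race
            Ok(())
        }
        Err(err) => Err(err),
    }
}
```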
- Around line 127-145: The read-modify-write of metadata (using build_meta,
read_meta, write_meta and ingest_path_with_iroh) races when two
put_path_reference calls run concurrently and can drop source_paths; serialize
the merge per hash by acquiring a per-hash lock (e.g., an async Mutex keyed by
the blob hash or a file lock on object_paths.meta) before calling read_meta,
merging source_paths, updating meta, writing it back, performing
ingest_path_with_iroh, setting meta.iroh_ingested and writing final meta, and
then release the lock so concurrent calls cannot interleave the read/merge/write
sequence.
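The per-hash serialization above can be sketched with a lock registry keyed by blob hash; `HashLocks` is a hypothetical helper (the real code would likely use an async-aware mutex such as `tokio::sync::Mutex` around the read/merge/write of `object_paths.meta`):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// One lock per blob hash: concurrent read-merge-write cycles on the same
// metadata cannot interleave, while different hashes proceed in parallel.
#[derive(Default)]
struct HashLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl HashLocks {
    fn lock_for(&self, hash: &str) -> Arc<Mutex<()>> {
        self.locks
            .lock()
            .unwrap()
            .entry(hash.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}
```

Callers would acquire `lock_for(hash)` before `read_meta`, hold it across the merge, `write_meta`, and iroh ingest, and drop it afterwards.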
In `@src/daybook_core/plugs.rs`:
- Around line 810-811: The listener registered via
PlugsStore::register_change_listener(&big_repo, &app_doc_id, vec![]) is
currently unfiltered so local changes are emitted twice (once via add() calling
self.registry.notify(...) and again when BigRepo emits DocChanged), so change
the registration to filter out BigRepoChangeOrigin::Local (or add that origin to
the vec! filter) for the big_repo listeners referenced here and at the similar
block around lines 985-994; alternatively remove the direct
self.registry.notify(...) call in add() so that notifs_loop alone emits
events—use the symbols PlugsStore::register_change_listener,
BigRepoChangeOrigin::Local, add(), self.registry.notify(...), notifs_loop,
DocChanged and PlugsEvent to locate and implement the fix.
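The origin filter described above amounts to a small predicate; `ChangeOrigin` is a hypothetical mirror of `am_utils_rs::repo::BigRepoChangeOrigin`, shown only to illustrate the dedup rule:

```rust
#[derive(Debug, Clone, PartialEq)]
enum ChangeOrigin {
    Local,
    Remote { peer_id: String },
}

// Emit only remote-origin notifications: local mutations were already
// notified synchronously by add(), so re-emitting them from the listener
// loop would double every event.
fn should_emit(origin: &ChangeOrigin, exclude_peer: Option<&str>) -> bool {
    match origin {
        ChangeOrigin::Local => false,
        ChangeOrigin::Remote { peer_id } => exclude_peer != Some(peer_id.as_str()),
    }
}
```

The same predicate would apply to the analogous listener registrations in `dispatch.rs` and `tables.rs`.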
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 269-278: The current filter only skips the excluded remote peer
but still lets local-origin patches through, causing double notifications
(because add()/remove() already notify synchronously); update the origin check
so that any local-origin change (match
am_utils_rs::repo::BigRepoChangeOrigin::Local or origin == None if applicable)
returns early as well as the existing remote exclude check — modify the pattern
matching around `origin` (the block that currently matches
`BigRepoChangeOrigin::Remote { peer_id, .. }`) to also match and short-circuit
for `BigRepoChangeOrigin::Local` (and/or None) and keep the existing
`exclude_peer` remote comparison; apply the same change for the code paths that
emit DispatchAdded/DispatchDeleted so local mutations aren’t re-emitted from
`notifs_loop`.
In `@src/daybook_core/sync/full.rs`:
- Around line 886-911: The code is incorrectly overwriting
PeerSyncState.partitions with only big-repo partitions from
desired_big_repo_partitions_for_peer(), which must remain the full membership
set; instead, stop mutating peer_state.partitions here and update the
per-session partition set stored in peer_partition_sessions (or the Session
struct) so session-level desired_parts are tracked separately. Concretely:
replace assignments like peer_state.partitions = desired_parts and
peer_state.partitions.clear() with updates to session.partitions (or
insert/modify the entry in self.peer_partition_sessions to reflect
desired_parts), insert/remove session IDs in self.peer_sessions_to_refresh as
before, and apply the same change to the analogous block referenced at 920-934
so PeerSyncState.partitions remains the authoritative full membership only
updated by SetPeer/DelPeer message handling.
- Around line 300-309: The current startup only seeds worker.known_blob_set with
PartitionKey::DocBlobsFullSync which allows delete/update handlers (which
consult worker.doc_to_known_hashes) to mistakenly drop the partition bit for
shared hashes or for pre-existing docs because doc_to_known_hashes is empty; fix
by making the blob-tracking per-doc: populate worker.doc_to_known_hashes at
startup (for each doc that references a hash from
worker.doc_blobs_index_repo.list_all_hashes()) or replace known_blob_set entries
with a per-doc refcount map (e.g., hash -> map<DocId, count>) so that the
delete/update paths consult per-doc counts before removing
PartitionKey::DocBlobsFullSync, or as a simpler alternative check the live index
(list_all_hashes) before clearing the partition bit in the delete/update
handlers (ensure PartitionKey::DocBlobsFullSync is only cleared if the hash is
truly absent from the global index); update the delete/update handlers and any
uses of known_blob_set and doc_to_known_hashes accordingly.
In `@src/daybook_core/tables.rs`:
- Around line 396-398: The listener is receiving local mutations as well as the
eager local notifications, causing duplicate TablesEvent emissions; in the call
to TablesStore::register_change_listener(&big_repo, &app_doc_id, vec![]) (and
similarly at the other location) restrict the origin filter so local-origin
changes are ignored—i.e., pass a filter that excludes BigRepoChangeOrigin::Local
or, inside the notifs_loop that handles ticket/notif_rx from big_repo,
explicitly drop notifications whose origin is BigRepoChangeOrigin::Local before
calling self.registry.notify; alternatively, stop emitting the eager local
notifications from the mutator paths (the direct self.registry.notify calls) so
events are only emitted once. Ensure you update both the
register_change_listener usage and the notifs_loop handler (and the mutator
notify calls) to avoid double notifications.
In `@src/daybook_types/doc.rs`:
- Around line 390-404: The constructors for for_repo and for_plug_routine
currently build paths via format! which allows multi-segment values and
double-slashes; instead validate each input segment (repo_scope, user_id,
device_id, plug_id, routine_id) is non-empty and contains no '/' or path
traversal (e.g. ".."), then build the UserPath by starting from base_user_path
(or UserPath::root for the plug routine) and calling push() for each validated
segment, and finally call parse() or return the constructed UserPath; update
error messages to indicate which segment is invalid and ensure base_user_path ==
"/" is handled by using push rather than string concatenation.
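The validate-then-push approach can be sketched as follows; `validate_segment` and `build_user_path` are hypothetical helpers operating on plain strings, standing in for the real `UserPath` constructors:

```rust
// Each segment must be non-empty and free of separators and traversal tokens.
fn validate_segment(name: &str, value: &str) -> Result<(), String> {
    if value.is_empty() {
        return Err(format!("{name} segment is empty"));
    }
    if value.contains('/') || value == ".." || value == "." {
        return Err(format!(
            "{name} segment {value:?} contains a path separator or traversal"
        ));
    }
    Ok(())
}

// Build by pushing validated segments instead of format!-style concatenation,
// so a base of "/" cannot produce double slashes.
fn build_user_path(base: &str, segments: &[(&str, &str)]) -> Result<String, String> {
    let mut path = if base == "/" { String::new() } else { base.to_string() };
    for (name, value) in segments {
        validate_segment(name, value)?;
        path.push('/');
        path.push_str(value);
    }
    Ok(path)
}
```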
---
Outside diff comments:
In `@src/daybook_core/config.rs`:
- Around line 344-364: The current matcher only triggers when patch.path.len()
== 2, so deeper entry-level writes under the "facet_display" and "users" maps
are skipped; change the guard to allow any depth (e.g., patch.path.len() >= 2)
or otherwise accept path lengths >2 when key == "vtag" and the second path
segment (section_key from patch.path.get(1)) matches "facet_display" or "users",
so that the automerge::PatchAction::PutMap branch pushes ConfigEvent::Changed
(affecting set_facet_display_hint and remote user-meta updates). Ensure you
still validate val is a Bytes scalar as before.
In `@src/daybook_core/repo.rs`:
- Around line 210-241: After successful BigRepo::boot (which yields big_repo and
big_repo_stop) ensure big_repo_stop.stop().await is invoked on every error path
that occurs after boot: wrap the subsequent fallible operations
(spawn_ws_connector, init_from_globals, cloning doc_app/doc_drawer,
run_repo_init_dance, mark_repo_initialized) in a try/fallible block and on any
early return or propagated error call big_repo_stop.stop().await before
returning the error; do the same if any awaited spawn_ws_connector or
init_from_globals call fails so the BigRepo task tree and resources are always
cleanly shut down.
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 298-313: The code currently panics when a non-nil VersionTag is
encountered in src/daybook_core/rt/dispatch.rs (the match that calls
VersionTag::hydrate_bytes produces vtag and then panics with "dispatch update
detected"); instead, handle synced/remote updates by emitting an update/upsert
DispatchEvent (e.g., a DispatchUpdated or DispatchUpsert variant) or return a
contextual error via eyre::bail!() so the worker isn't torn down. Modify the
branch after vtag.version.is_nil() to produce the appropriate DispatchEvent
variant carrying id and heads (and any needed vtag info) or replace the panic!
with eyre::bail!("unexpected dispatch update: {:?}", vtag) so callers can handle
it; ensure you touch the branch that currently constructs DispatchAdded and the
else that contains the panic and VersionTag::hydrate_bytes usage.
In `@src/daybook_types/doc.rs`:
- Around line 83-90: The Dmeta struct's actors field is required and will fail
deserialization for older docs; update the actors field (the HashMap<String,
UserMeta> member on struct Dmeta in doc.rs) to derive a default for serde by
adding #[serde(default)] so missing actors maps deserialize as empty, ensuring
backward compatibility across mixed-version devices.
---
Duplicate comments:
In `@src/daybook_core/sync/stress.rs`:
- Around line 174-203: The clone validation path can early-return without
calling ctx.shutdown(), leaving resources running; update the RepoCtx::open
usage so that any partially opened ctx is always cleaned up on error (e.g.,
create a short-lived guard or scope that calls ctx.shutdown().await in Drop or
use a try/finally-like pattern: perform validations after open and if any check
fails call ctx.shutdown().await before returning the error). Apply the same
pattern for other resources opened later (e.g., open_sync_node) so every opened
resource (ctx, sync nodes) is shutdown/stopped on all error paths.
- Around line 99-101: The teardown loop currently calls node.stop().await? which
can overwrite and return a teardown error instead of the original test/init
`result`; change both occurrences (the loop over `nodes` where
`node.stop().await?` is used, including the duplicate at lines ~208-209) to run
shutdown as best-effort: iterate through `nodes.into_iter().flatten()`, call
`node.stop().await` but do not propagate the first stop error—capture/log any
stop errors (or convert them to warnings) and continue stopping remaining nodes,
then return the original `result` (not the teardown error); ensure the code
references the existing `nodes` iterator, `node.stop().await`, and the `result`
variable so the original outcome is preserved while still attempting full
cleanup.
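The best-effort teardown can be sketched like this; `Node` and its fallible `stop()` are stand-ins for the real sync-node handles in `stress.rs`:

```rust
struct Node {
    id: u32,
    fail_stop: bool,
}

impl Node {
    fn stop(self) -> Result<(), String> {
        if self.fail_stop {
            Err(format!("node {} failed to stop", self.id))
        } else {
            Ok(())
        }
    }
}

// Stop every node, logging (not propagating) teardown failures, and return
// the original test outcome so a cleanup error never masks the real result.
fn teardown(nodes: Vec<Option<Node>>, result: Result<(), String>) -> Result<(), String> {
    for node in nodes.into_iter().flatten() {
        if let Err(err) = node.stop() {
            eprintln!("warning: teardown error ignored: {err}");
        }
    }
    result
}
```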
In `@src/daybook_types/doc.rs`:
- Around line 371-381: UserPath::new builds "/device/plug/routine" but the
accessors plug() and routine() expect the "for_plug_routine()" layout (they read
segments 3 and 4), so update new (and the similar constructor around the 406-419
region) to construct the same path shape as for_plug_routine(): start with the
root, push the "for" segment (or whatever literal that for_plug_routine uses),
then push device, then optional plug, then optional routine so plug() and
routine() index the same segments; adjust the same logic in the other
constructor to match.
---
Nitpick comments:
In `@docs/dict.md`:
- Line 11: Rewrite the awkward sentence "Daybook ensures then that these changes
are synced to other replicas whenever a working connection between devices." to
a clear, grammatical form such as "Daybook ensures that these changes are synced
to other replicas whenever a working connection exists between devices." Also
scan and tighten the other flagged lines (original lines referenced around the
same paragraphs) to fix typos, remove extraneous words, and make definitions
concise and unambiguous (e.g., clarify subject/verb order, replace informal
words like "then"/"whenever" with precise phrasing, and ensure consistent tense
and punctuation across the core definitions in the same section).
In `@src/am_utils_rs/sync/peer.rs`:
- Line 322: The placeholder info!(?partitions, "XXX"); is logging the entire
partitions list at info level; remove this noisy line or change it to a lower
log level and a meaningful message (e.g., debug!("partitions: {} entries",
partitions.len()) or debug!(partitions = ?partitions, "partitions snapshot") if
full content is needed for debugging). Update the call to either delete
info!(?partitions, "XXX"); or replace it with a debug-level log that avoids
verbose startup noise while still conveying useful context, referencing the
info!(?partitions, "XXX") call and the partitions variable.
In `@src/daybook_core/sync/full/blob_worker.rs`:
- Around line 364-368: In send_progress, distinguish lifecycle messages from
intermediate progress: when sync_progress_tx.try_send(msg) fails, match on the
SyncProgressMsg value (check for BlobWorkerStarted and BlobWorkerFinished
variants) and log those dropped lifecycle messages with tracing::warn!(?err,
hash = %self.hash, ...) while keeping other variants logged at debug as
currently done; locate this behavior around sync_progress_tx.try_send and the
send_progress async fn to implement the conditional logging.
- Line 176: The local import use futures::StreamExt; found inside the async
block in blob_worker.rs should be moved to the module/file scope: remove the
inline import and add use futures::StreamExt; at the top of the file (or import
it via super::* if already re-exported) so that the StreamExt trait is visible
consistently across functions such as the async block where it was used; ensure
there are no duplicate imports and that the trait remains in scope for methods
like .next() or other StreamExt extensions.
In `@src/daybook_core/sync/stress.rs`:
- Around line 466-469: The branch path for CreateBranch uses a small namespace
"/stress/{0..32}" which causes collisions; update the construction of new_branch
(created via daybook_types::doc::BranchPath::from) to include the loop index
(idx) and/or a random suffix from rng (e.g., combine idx and rng.random_range or
a hex/random token) so each generated path is effectively unique (e.g.,
"/stress/{idx}-{rand}"); ensure the same variable new_branch is used where
CreateBranch is emitted so branch creation events target collision-free paths.
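Combining the loop index with a random token makes the namespace effectively collision-free; a hedged sketch (the real code would feed this string into `daybook_types::doc::BranchPath::from`):

```rust
// Unique stress-branch path: index guarantees uniqueness within one run,
// the random token guards against collisions across runs.
fn unique_branch_path(idx: usize, rand_token: u32) -> String {
    format!("/stress/{idx}-{rand_token:08x}")
}
```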
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c4365a0-65b0-4be0-aafa-74d9e482947c
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by `!**/*.lock` and included by `**/*`
📒 Files selected for processing (23)
- Cargo.toml
- docs/dict.md
- src/am_utils_rs/Cargo.toml
- src/am_utils_rs/repo/changes.rs
- src/am_utils_rs/sync/peer.rs
- src/daybook_core/Cargo.toml
- src/daybook_core/blobs.rs
- src/daybook_core/config.rs
- src/daybook_core/drawer.rs
- src/daybook_core/index/doc_blobs.rs
- src/daybook_core/plugs.rs
- src/daybook_core/repo.rs
- src/daybook_core/rt/dispatch.rs
- src/daybook_core/stores.rs
- src/daybook_core/sync/bootstrap.rs
- src/daybook_core/sync/full.rs
- src/daybook_core/sync/full/blob_worker.rs
- src/daybook_core/sync/stress.rs
- src/daybook_core/tables.rs
- src/daybook_ffi/Cargo.toml
- src/daybook_server/Cargo.toml
- src/daybook_types/doc.rs
- src/utils_rs/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/daybook_ffi/Cargo.toml
```rust
self.note_doc_cursor_seen(&event.partition_id, event.cursor);
let decision = match self.reduce_doc_event(event).await {
    Ok(decision) => decision,
    Err(err) => {
        if let PartitionDocEventDeets::DocChanged { doc_id, .. } = &event.deets {
            let entry = out
                .unresolved_by_partition
                .entry(event.partition_id.clone())
                .or_insert(0);
            *entry = entry.saturating_add(1);
            warn!(
                ?err,
                partition = %event.partition_id,
                doc_id,
                cursor = event.cursor,
                "failed reducing remote doc event"
            );
            continue;
        }
        return Err(err);
    }
};
self.apply_doc_event_decision(event, decision, &mut out)
    .await?;
```
Unresolved doc events can be skipped permanently.
Line 584 inserts an empty slot before reduction/import succeeds. If a doc change then hits the unresolved paths at Lines 587-601 or Lines 685-699, that cursor still looks "ready", and Lines 927-937 will pop it the next time a later event succeeds. A failure at cursor N followed by a success at cursor N+1 can therefore advance the stored cursor to N+1 and drop N forever. Unresolved cursors need a blocking state, not an empty slot. Please add a small regression test for this ordering.
As per coding guidelines: "When working with Rust, use small tests in addition to cargo clippy to validate ideas."
Also applies to: 685-699, 921-937
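The blocking-sentinel semantics can be sketched with stdlib types; `SlotState` and `advance_cursor` are hypothetical stand-ins for whatever `unresolved_by_partition` actually stores:

```rust
use std::collections::BTreeMap;

// A cursor slot is either resolved or explicitly blocked; no more "empty slot"
// that looks ready before reduction has succeeded.
#[derive(Debug, Clone, PartialEq)]
enum SlotState {
    Ready,
    Blocked,
}

// Advance the stored cursor only across a contiguous prefix of Ready slots.
// A Blocked slot stops the scan, so a failed cursor N can never be skipped
// just because cursor N+1 succeeded later.
fn advance_cursor(slots: &BTreeMap<u64, SlotState>, mut cursor: u64) -> u64 {
    while let Some(state) = slots.get(&(cursor + 1)) {
        if *state == SlotState::Blocked {
            break;
        }
        cursor += 1;
    }
    cursor
}
```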
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/sync/peer.rs` around lines 584 - 607, The code currently
inserts an "empty slot" into unresolved_by_partition before calling
reduce_doc_event which lets a later successful cursor (N+1) advance and drop a
failed cursor (N); change the sentinel semantics so we record a
blocking/unresolved state instead of a no-op slot: replace the current entry
insertion prior to reduce_doc_event with a mark that this partition+cursor is
blocked (e.g., an explicit enum state or a distinct value) and only clear/pop
that blocked state when apply_doc_event_decision successfully handles that
cursor; update logic in note_doc_cursor_seen, reduce_doc_event handling,
apply_doc_event_decision, and any code that reads unresolved_by_partition (the
branches around PartitionDocEventDeets::DocChanged and the later pop logic) to
treat the blocked state as non-ready; add a small regression test that simulates
a failure at cursor N (reduce_doc_event returns Err) then a success at N+1 and
asserts that N remains unresolved (blocked) until N is retried and succeeds.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/daybook_core/rt/dispatch.rs (1)
312-323: ⚠️ Potential issue | 🟡 Minor
Consider using `eyre::bail!()` instead of `panic!()` for dispatch update detection.
If a dispatch update patch arrives from an external peer (e.g., due to version drift or a future protocol change), this `panic!` will crash the application. Per the retrieved learnings, external input validation should use `eyre::bail!()` to handle version drift safely.
🛡️ Proposed fix
```diff
 out.push(if vtag.version.is_nil() {
     DispatchEvent::DispatchAdded {
         id: dispatch_id.clone(),
         heads: dispatch_heads,
     }
 } else {
-    panic!("dispatch update detected")
-    // DispatchEvent::DispatchUpdated {
-    //     id: dispatch_id.clone(),
-    //     heads: dispatch_heads,
-    // }
+    // Dispatch updates are not currently supported; log and skip
+    warn!(dispatch_id = %dispatch_id, "ignoring unexpected dispatch update from external source");
+    return Ok(());
 });
```
Based on learnings: "When validating data that comes from external inputs (such as automerge documents), use eyre::bail!() to return errors rather than panicking."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 312 - 323, Replace the panic!("dispatch update detected") with an eyre::bail!() so external/peer-driven dispatch update patches return an error instead of aborting; import eyre::bail!, return a Result from the surrounding function (or propagate the error) where out.push(...) is created, and use a clear error message like "dispatch update detected" (you can later decide whether to handle DispatchEvent::DispatchUpdated or treat it as an error); keep the existing DispatchEvent::DispatchAdded branch and reference vtag.version.is_nil(), dispatch_id, and dispatch_heads when making the decision.
♻️ Duplicate comments (3)
docs/dict.md (2)
28-38: ⚠️ Potential issue | 🟡 Minor
Fix invalid JavaScript object examples (missing commas).
At line 33, the property is missing a trailing comma, making this snippet invalid JS that will fail if copied.
📝 Proposed fix
```diff
 "org.example.daybook.dmeta/main": {
-  id: "<the id that the drawer knows it by>"
+  id: "<the id that the drawer knows it by>",
   createdAt: "timestamp",
   updatedAt: "timestamp",
   // more metadata
```
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 28 - 38, The example JSON-like object in docs/dict.md has missing trailing comma(s) making it invalid JS; update the snippet so each property entry is comma-separated—specifically ensure a comma after the "org.example.daybook.path/main" entry and after the closing brace for the "org.example.daybook.dmeta/main" object (and any other entries that follow) so the object with keys "org.example.daybook.title/main", "org.example.daybook.path/main", and "org.example.daybook.dmeta/main" is valid JavaScript.
109-125: ⚠️ Potential issue | 🟡 Minor
Fix invalid JavaScript object examples (missing commas).
At lines 114 and 121, properties are missing trailing commas, making these snippets invalid JS object literals that will fail if copied.
📝 Proposed fix
```diff
 "org.example.daybook.blob/main": {
   mime: "image/png",
   lengthOctets: 1024,
-  digest: "<hash>"
+  digest: "<hash>",
   inlineBase64: "small blobs are stored inline as base64 strings but not all blobs",
   urls: [ "db+blob://<digest>" ],
 },
 "org.example.daybook.imagemetadata/main": {
-  facetRef: "db+facet://self/org.example.daybook.blob/main"
+  facetRef: "db+facet://self/org.example.daybook.blob/main",
   refHeads: [],
 }
```
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 109 - 125, The example objects for "org.example.daybook.blob/main" and "org.example.daybook.imagemetadata/main" have missing trailing commas that make the JavaScript invalid; update the literal for "org.example.daybook.blob/main" to include a comma after the mime property value and ensure the urls array line ends with a comma, and update the "org.example.daybook.imagemetadata/main" object to include a trailing comma after the refHeads property so all object properties (mime, lengthOctets, digest, inlineBase64, urls, facetRef, refHeads) are properly comma-separated.
src/am_utils_rs/sync/peer.rs (1)
460-491: ⚠️ Potential issue | 🟠 Major
Pause ack-gated bootstrap pages instead of aborting them.
When `apply_doc_events()` queues `RequestDocSync`, `doc_cursor` is supposed to stay put until a Samod ack arrives. If that page also reports `has_more = true`, Line 486 turns that expected wait into a hard failure before the select loop can ever drain `samod_ack_rx`. A partition with more than `DEFAULT_EVENT_PAGE_LIMIT` already-local docs still dies here.
🛠️ Minimal fix
```diff
 self.emit_doc_sync_status(synced_docs, unresolved);
-if any_more && next_cursor == cursor.doc_cursor {
+let waiting_on_samod_acks = self
+    .doc_cursor_state
+    .get(part)
+    .is_some_and(|state| {
+        state.slots.values().any(|slot| {
+            matches!(
+                slot,
+                CursorSlotState::Pending(pending_doc_ids)
+                    if !pending_doc_ids.is_empty()
+            )
+        })
+    });
+if any_more && next_cursor == cursor.doc_cursor && waiting_on_samod_acks {
+    debug!(
+        partition = %part,
+        cursor = ?next_cursor,
+        "pausing bootstrap pagination until samod acks advance the cursor"
+    );
+    break;
+}
+if any_more && next_cursor == cursor.doc_cursor {
     eyre::bail!(
```
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` around lines 460 - 491, The code currently treats a page with has_more=true and an unchanged doc_cursor as an error, but apply_doc_events() may legitimately queue a RequestDocSync and keep doc_cursor steady until a Samod ack arrives; instead of eyre::bail! in the block checking any_more && next_cursor == cursor.doc_cursor, change behavior to pause advancing this partition's bootstrap (do not error) so the select loop can receive from samod_ack_rx: detect this stalled-but-ack-gated condition in the place around apply_doc_events()/next_cursor/doc_cursor, set a local flag or break out of the partition loop (keeping unresolved_total/synced_docs state) to defer further cursor advancement, and ensure emit_doc_sync_status is still called; reference apply_doc_events, RequestDocSync, doc_cursor, samod_ack_rx and emit_doc_sync_status when making the change.
🧹 Nitpick comments (5)
AGENTS.md (1)
112-119: Rephrase test-policy framing to avoid lowering quality expectations.
Line 112 can be interpreted as “tests are less important.” Consider reframing this section to keep the same intent (gardening/improving tests) without implying reduced rigor.
🧹 Suggested rewrite
```diff
-Human auditors of this repo don't really read test code or mantain it.
-So when working on tests, take a gardening hand to it and assume they've been less strictly audited in the past.
+Test code is often less consistently audited than production code.
+When touching tests, proactively improve them: prune shallow tests, reduce duplication, and strengthen assertions.
 See a useless or too shallow a test? Flag it for removal.
 Are tests repeating too much setup code? Consider using TDD or common cleanup code.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 112 - 119, The sentence "Human auditors of this repo don't really read test code or mantain it." in AGENTS.md should be rephrased to avoid implying tests are unimportant; replace it with a constructive guideline that encourages contributors to proactively improve and maintain tests (e.g., emphasize "when working on tests, assume they need cleanup and strengthen them" and suggest concrete actions like removing shallow tests, consolidating setup, and using snapshot/TDD macros), and update surrounding lines mentioning "gardening" and tools (snapshot tests, TDD, macros) to frame these as practices to preserve test quality rather than tolerate lower standards.
docs/dict.md (1)
94-94: Consider rewording for clarity.
"the facet making holding the URL" is awkward. Consider "the facet that holds the URL" or "the facet containing the URL".
📝 Proposed fix
```diff
-As a special convention, `self` can be used instead of the document id to indicates that the reference is to the same doc containing the facet making holding the URL.
+As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 94, Reword the sentence describing the special convention for `self`: replace the awkward phrase "the facet making holding the URL" with a clearer phrase such as "the facet that holds the URL" or "the facet containing the URL", and change "indicates" to "indicate" so the clause reads properly (e.g., "As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.").
src/daybook_core/index/doc_blobs.rs (2)
178-182: Consider logging when facets are unexpectedly missing.
`unwrap_or_default()` silently treats a `None` result as an empty map. If `selected_blob_keys` was non-empty but `get_at_heads_with_facets_arc` returns `None`, this might indicate an unexpected state worth logging for debugging purposes.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 178 - 182, The code currently calls self.drawer_repo.get_at_heads_with_facets_arc(...).await? and immediately unwrap_or_default into facets, which hides when the repo returned None; change this to explicitly match the Option result from get_at_heads_with_facets_arc (or keep the interim variable) and if it is None while selected_blob_keys is non-empty, emit a warning/error log containing doc_id, heads and selected_blob_keys (and any repo error context) before falling back to an empty map; keep the existing behavior of using an empty map when None but ensure the unexpected None is logged for debugging (referencing get_at_heads_with_facets_arc, facets, selected_blob_keys, doc_id and heads).
131-133: Index on `doc_id` may be redundant.
Since `doc_id` is the first column of the composite primary key `(doc_id, blob_hash)`, SQLite can use the primary key index for queries filtering by `doc_id` alone. The separate index on line 131 may be unnecessary overhead.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 131 - 133, The CREATE INDEX call for idx_doc_blob_refs_doc_id in doc_blobs.rs is redundant because doc_id is the leading column of the composite primary key (doc_id, blob_hash); remove the sqlx::query("CREATE INDEX IF NOT EXISTS idx_doc_blob_refs_doc_id ON doc_blob_refs(doc_id)")...execute(db_pool).await?; statement to avoid unnecessary overhead, or if you decide to keep it for a documented performance reason, add a clear comment above it referencing doc_blob_refs, doc_id, and blob_hash explaining why the extra index is required and add a test/benchmark proving the benefit.src/am_utils_rs/sync/peer.rs (1)
332-332: Remove the `XXX` info log before merging.
This logs the full partition list on every worker start at `info` and looks like leftover debugging.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` at line 332, Remove the leftover info-level debug log that prints the full partition list: delete or change the call to info!(?partitions, "XXX"); in peer.rs (the logging invocation referencing the partitions variable and the info! macro). If you want to keep the diagnostic, convert it to debug!(?partitions, "partitions") or otherwise redact sensitive details; otherwise simply remove the line before merging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 103-107: Fix the spelling mistakes in the governance text of
AGENTS.md: replace "sensebilites" with "sensibilities", "tires" with "tries",
and "mantain" with "maintain"; search the newly added governance paragraph
containing phrases like "Avoid cheating through hacks that violate common
sense..." and the nearby lines referenced as 112-113 to ensure all occurrences
are corrected and update surrounding wording if necessary for grammatical
consistency.
In `@docs/dict.md`:
- Line 44: In the sentence starting "CRDTs are a family of data structures
allowing concurrent edits acoss devices..." update the misspelled word "acoss"
to "across" so the line reads "...allowing concurrent edits across devices...";
locate that exact sentence in docs/dict.md and correct the typo.
- Line 21: Replace the awkward phrase in the sentence beginning "The facet tag
indicates expected schema of the value of under that key." by removing "of" and
adding "the" where needed so it reads: "The facet tag indicates the expected
schema of the value under that key." Update that exact sentence in docs/dict.md
to use this corrected phrasing.
- Line 11: The sentence in docs/dict.md is incomplete; update the sentence
"Daybook ensures then that these changes are synced to other replicas whenever a
working connection between devices." to include the missing verb—e.g., change it
to "whenever a working connection between devices exists" or "whenever a working
connection is established between devices" so the sentence reads properly;
locate the sentence text "Daybook ensures then that these changes are synced to
other replicas whenever a working connection between devices." and replace it
accordingly.
- Line 76: Fix the typo "weather" to "whether" in the sentence "It's also the
gatekeeper for all docs weather it's reads or writes." within docs/dict.md;
search for that sentence (or the word "weather") and replace it with "whether"
and also consider correcting surrounding grammar ("it's reads" → "its reads" or
rephrase) to ensure the sentence reads correctly.
- Line 51: Fix the typo in docs/dict.md where "taught" is used instead of
"thought"; change "taught" to "thought" so the sentence reads "A change can
be thought of as a single git commit with a hash used to refer to it, the
change hash."
- Line 15: Fix the typo in the sentence "Documents are a the main units of
information in a repo and have a unique ids." by removing the extra "a" so it
reads "Documents are the main units of information in a repo" and also remove
the extraneous "a" before "unique ids" (resulting clause: "and have unique
ids"); update that exact line accordingly.
In `@src/am_utils_rs/sync.rs`:
- Around line 767-790: The test currently waits only for
dst.partition_member_count(&p_ok) to reach 1, which can happen before docs are
synced; change the wait condition to also poll
dst_store.get_partition_cursor(peer_key.clone(), p_ok.clone()) and require that
the returned ok_cursor.doc_cursor.is_some() before proceeding. Update the
wait_until closure (the one that currently calls partition_member_count) to
fetch the partition cursor for p_ok and return Ok(count == 1 &&
ok_cursor.doc_cursor.is_some()), so the test only continues once the p_ok
document cursor has advanced. Ensure you reference the existing wait_until,
dst.partition_member_count, dst_store.get_partition_cursor, and p_ok symbols
when making this change.
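A std-only sketch of the strengthened wait condition described above. The `wait_until`, `partition_member_count`, and `get_partition_cursor` names stand in for the test's real helpers; the `Cursor` struct is hypothetical:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the partition cursor returned by the store.
struct Cursor {
    doc_cursor: Option<u64>,
}

// Stand-ins for the test's real lookups against `dst` / `dst_store`.
fn partition_member_count() -> usize {
    1
}
fn get_partition_cursor() -> Cursor {
    Cursor { doc_cursor: Some(7) }
}

// Generic poll loop in the spirit of the test's `wait_until` helper:
// retry the predicate until it passes or the deadline expires.
fn wait_until(timeout: Duration, mut pred: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if pred() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}

// The suggested condition: require BOTH partition membership AND an
// advanced doc cursor before the test proceeds.
fn partition_synced() -> bool {
    let count = partition_member_count();
    let cursor = get_partition_cursor();
    count == 1 && cursor.doc_cursor.is_some()
}
```

The point of the combined predicate is that membership alone can become true before any document bytes have synced, while `doc_cursor.is_some()` only holds once the doc cursor has advanced.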
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 69-72: PeerSyncWorkerHandle exposes optional progress_rx and
events_rx so sending should not panic when receivers are dropped; replace all
`.expect(ERROR_CHANNEL)` uses on broadcast::Sender::send with non-panicking
sends (e.g., `let _ = events_tx.send(...)` / `let _ = progress_tx.send(...)`) so
the worker continues if there are zero listeners—apply this change to the three
events_tx send sites for Bootstrapped, LiveReady, and AbnormalExit and the three
progress_tx send sites inside emit_phase_started, emit_phase_finished, and
emit_doc_sync_status.
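The non-panicking send pattern can be sketched with a std `mpsc` channel in place of the `tokio::sync::broadcast` channel the worker actually uses (the `WorkerEvent` enum and `emit` helper are illustrative, not the crate's API):

```rust
use std::sync::mpsc;

// Illustrative stand-in for the worker's event payloads.
#[derive(Debug, Clone)]
enum WorkerEvent {
    Bootstrapped,
    LiveReady,
    AbnormalExit,
}

// `send` returns Err when every receiver has been dropped. Since the
// handle's receivers are optional, ignore the error instead of calling
// `.expect(ERROR_CHANNEL)`, so the worker keeps running with zero listeners.
fn emit(tx: &mpsc::Sender<WorkerEvent>, event: WorkerEvent) {
    let _ = tx.send(event);
}
```

With `tokio::sync::broadcast` the same `let _ = tx.send(...)` shape applies; the only difference is that broadcast `send` fails when there are currently no subscribers rather than when the single receiver is gone.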
In `@src/daybook_core/index/doc_blobs.rs`:
- Around line 444-451: The code currently calls index_repo.enqueue_delete(...)
when self.drawer_repo.get_facet_keys_if_latest(...) returns None (a stale
event); instead, mirror the DocAdded behavior and skip stale events by returning
early without deleting. Change the else branch for get_facet_keys_if_latest(id,
&branch_path, heads, drawer_heads) to simply return Ok(outcome) (no
enqueue_delete), leaving index data intact so a newer event can reconcile state.
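The skip-stale behavior can be sketched as follows; `get_facet_keys_if_latest`, `Outcome`, and the boolean parameter are simplified stand-ins for the real repo call and its head/branch arguments:

```rust
// Hypothetical mirror of the DocAdded handling: when the facet lookup
// reports the event is stale (None), leave the index untouched.
#[derive(Debug, PartialEq)]
enum Outcome {
    Indexed,
    SkippedStale,
}

// Stand-in for drawer_repo.get_facet_keys_if_latest(id, branch, heads, ...).
fn get_facet_keys_if_latest(is_latest: bool) -> Option<Vec<String>> {
    if is_latest {
        Some(vec!["blob".to_string()])
    } else {
        None
    }
}

fn handle_event(is_latest: bool) -> Outcome {
    match get_facet_keys_if_latest(is_latest) {
        Some(_keys) => Outcome::Indexed,
        // Stale event: return early WITHOUT enqueue_delete, so the index
        // data survives and a newer event can reconcile state later.
        None => Outcome::SkippedStale,
    }
}
```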
In `@src/daybook_core/sync/full.rs`:
- Around line 1582-1592: remove_blob_tracking_if_unused currently deletes
bookkeeping for unreferenced hashes but worker callbacks like
BlobSyncMarkedSynced and BlobSyncRequestBackoff can later reinsert state for
those hashes; fix by adding a guard in those worker handlers (or any code that
mutates synced_blobs, pending_blobs, blobs_to_boot) to first check that
self.known_blob_set.contains(hash) (or a generation/version token if you prefer)
and return/ignore the callback if the hash is not present, ensuring no
reinsertion of removed hashes by functions handling BlobSyncMarkedSynced /
BlobSyncRequestBackoff and other blob worker callbacks.
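A minimal sketch of the proposed guard, assuming a membership check against `known_blob_set` (the `BlobState` struct and string hashes are hypothetical; the real code keys on blob hash types):

```rust
use std::collections::HashSet;

// Hypothetical container mirroring the worker's bookkeeping fields.
struct BlobState {
    known_blob_set: HashSet<String>,
    synced_blobs: HashSet<String>,
}

impl BlobState {
    // Handler for a late BlobSyncMarkedSynced-style callback.
    fn on_blob_sync_marked_synced(&mut self, hash: &str) {
        // Guard: ignore callbacks for hashes whose tracking was already
        // removed by remove_blob_tracking_if_unused, so state for removed
        // hashes cannot be reinserted.
        if !self.known_blob_set.contains(hash) {
            return;
        }
        self.synced_blobs.insert(hash.to_string());
    }
}
```

A generation/version token, as the comment suggests, would achieve the same effect while also catching the remove-then-re-add case that a plain membership check cannot distinguish.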
- Around line 1188-1193: The current parse::<DocumentId>() call in the full-sync
handling path should not propagate a parse error up and kill the worker;
instead, detect a malformed peer doc_id, log a warning/debug with the offending
peer_key and raw doc_id, and drop the message by returning Ok(()) so the worker
continues; update the block around parse::<DocumentId>() and the surrounding
logic that uses self.endpoint_for_peer_key(&peer_key) to early-return Ok(()) on
parse failures (and similarly at the other sites you noted), replacing map_err
propagation with a non-fatal drop-and-log behavior.
---
Outside diff comments:
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 312-323: Replace the panic!("dispatch update detected") with an
eyre::bail!() so external/peer-driven dispatch update patches return an error
instead of aborting; import eyre::bail!, return a Result from the surrounding
function (or propagate the error) where out.push(...) is created, and use a
clear error message like "dispatch update detected" (you can later decide
whether to handle DispatchEvent::DispatchUpdated or treat it as an error); keep
the existing DispatchEvent::DispatchAdded branch and reference
vtag.version.is_nil(), dispatch_id, and dispatch_heads when making the decision.
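The panic-to-error change can be sketched with a plain `String` error in place of `eyre` (which the crate actually uses); the two-variant `DispatchEvent` enum here is a simplification of the real one:

```rust
// Simplified stand-in for the dispatch event type.
#[derive(Debug)]
enum DispatchEvent {
    DispatchAdded,
    DispatchUpdated,
}

fn handle(event: DispatchEvent) -> Result<(), String> {
    match event {
        // Keep the existing DispatchAdded handling.
        DispatchEvent::DispatchAdded => Ok(()),
        // Previously: panic!("dispatch update detected"). A peer-driven
        // update patch should surface as an error the caller can handle
        // (with eyre, this would be `bail!("dispatch update detected")`).
        DispatchEvent::DispatchUpdated => Err("dispatch update detected".to_string()),
    }
}
```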
---
Duplicate comments:
In `@docs/dict.md`:
- Around line 28-38: The example JSON-like object in docs/dict.md has missing
trailing comma(s) making it invalid JS; update the snippet so each property
entry is comma-separated—specifically ensure a comma after the
"org.example.daybook.path/main" entry and after the closing brace for the
"org.example.daybook.dmeta/main" object (and any other entries that follow) so
the object with keys "org.example.daybook.title/main",
"org.example.daybook.path/main", and "org.example.daybook.dmeta/main" is valid
JavaScript.
- Around line 109-125: The example objects for "org.example.daybook.blob/main"
and "org.example.daybook.imagemetadata/main" have missing trailing commas that
make the JavaScript invalid; update the literal for
"org.example.daybook.blob/main" to include a comma after the mime property value
and ensure the urls array line ends with a comma, and update the
"org.example.daybook.imagemetadata/main" object to include a trailing comma
after the refHeads property so all object properties (mime, lengthOctets,
digest, inlineBase64, urls, facetRef, refHeads) are properly comma-separated.
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 460-491: The code currently treats a page with has_more=true and
an unchanged doc_cursor as an error, but apply_doc_events() may legitimately
queue a RequestDocSync and keep doc_cursor steady until a Samod ack arrives;
instead of eyre::bail! in the block checking any_more && next_cursor ==
cursor.doc_cursor, change behavior to pause advancing this partition's bootstrap
(do not error) so the select loop can receive from samod_ack_rx: detect this
stalled-but-ack-gated condition in the place around
apply_doc_events()/next_cursor/doc_cursor, set a local flag or break out of the
partition loop (keeping unresolved_total/synced_docs state) to defer further
cursor advancement, and ensure emit_doc_sync_status is still called; reference
apply_doc_events, RequestDocSync, doc_cursor, samod_ack_rx and
emit_doc_sync_status when making the change.
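The stalled-but-ack-gated decision can be sketched as a pure function; the `Step` enum, numeric cursors, and `advance_partition` name are illustrative stand-ins for the real control flow around `apply_doc_events` and `doc_cursor`:

```rust
#[derive(Debug, PartialEq)]
enum Step {
    // Keep advancing with the new cursor.
    Continue { cursor: u64 },
    // Pause this partition's bootstrap (do NOT error) so the select loop
    // can receive the pending Samod ack on samod_ack_rx.
    DeferUntilAck,
}

fn advance_partition(has_more: bool, cursor: u64, next_cursor: u64) -> Step {
    if has_more && next_cursor == cursor {
        // A page with has_more=true and an unchanged cursor can mean a
        // RequestDocSync is queued and the cursor is gated on an ack;
        // defer instead of bailing.
        return Step::DeferUntilAck;
    }
    Step::Continue { cursor: next_cursor }
}
```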
---
Nitpick comments:
In `@AGENTS.md`:
- Around line 112-119: The sentence "Human auditors of this repo don't really
read test code or mantain it." in AGENTS.md should be rephrased to avoid
implying tests are unimportant; replace it with a constructive guideline that
encourages contributors to proactively improve and maintain tests (e.g.,
emphasize "when working on tests, assume they need cleanup and strengthen them"
and suggest concrete actions like removing shallow tests, consolidating setup,
and using snapshot/TDD macros), and update surrounding lines mentioning
"gardening" and tools (snapshot tests, TDD, macros) to frame these as practices
to preserve test quality rather than tolerate lower standards.
In `@docs/dict.md`:
- Line 94: Reword the sentence describing the special convention for `self`:
replace the awkward phrase "the facet making holding the URL" with a clearer
phrase such as "the facet that holds the URL" or "the facet containing the URL",
and change "indicates" to "indicate" so the clause reads properly (e.g., "As a
special convention, `self` can be used instead of the document id to indicate
that the reference is to the same doc containing the facet that holds the
URL.").
In `@src/am_utils_rs/sync/peer.rs`:
- Line 332: Remove the leftover info-level debug log that prints the full
partition list: delete or change the call to info!(?partitions, "XXX"); in
peer.rs (the logging invocation referencing the partitions variable and the
info! macro). If you want to keep the diagnostic, convert it to
debug!(?partitions, "partitions") or otherwise redact sensitive details;
otherwise simply remove the line before merging.
In `@src/daybook_core/index/doc_blobs.rs`:
- Around line 178-182: The code currently calls
self.drawer_repo.get_at_heads_with_facets_arc(...).await? and immediately
unwrap_or_default into facets, which hides when the repo returned None; change
this to explicitly match the Option result from get_at_heads_with_facets_arc (or
keep the interim variable) and if it is None while selected_blob_keys is
non-empty, emit a warning/error log containing doc_id, heads and
selected_blob_keys (and any repo error context) before falling back to an empty
map; keep the existing behavior of using an empty map when None but ensure the
unexpected None is logged for debugging (referencing
get_at_heads_with_facets_arc, facets, selected_blob_keys, doc_id and heads).
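A std-only sketch of the explicit `Option` handling, with `eprintln!` standing in for the crate's `warn!` macro and string maps standing in for the real facet types:

```rust
use std::collections::HashMap;

// Sketch: surface the unexpected-None case instead of silently hiding it
// behind unwrap_or_default. `result` stands in for the value returned by
// get_at_heads_with_facets_arc.
fn resolve_facets(
    result: Option<HashMap<String, String>>,
    selected_blob_keys: &[String],
    doc_id: &str,
) -> HashMap<String, String> {
    match result {
        Some(facets) => facets,
        None => {
            // Only worth a warning when we actually expected blob facets.
            if !selected_blob_keys.is_empty() {
                eprintln!(
                    "no facets at heads for doc {doc_id} despite selected blob keys {selected_blob_keys:?}"
                );
            }
            // Keep the existing behavior: fall back to an empty map.
            HashMap::new()
        }
    }
}
```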
- Around line 131-133: The CREATE INDEX call for idx_doc_blob_refs_doc_id in
doc_blobs.rs is redundant because doc_id is the leading column of the composite
primary key (doc_id, blob_hash); remove the sqlx::query("CREATE INDEX IF NOT
EXISTS idx_doc_blob_refs_doc_id ON
doc_blob_refs(doc_id)")...execute(db_pool).await?; statement to avoid
unnecessary overhead, or if you decide to keep it for a documented performance
reason, add a clear comment above it referencing doc_blob_refs, doc_id, and
blob_hash explaining why the extra index is required and add a test/benchmark
proving the benefit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bb3f9b40-a4f8-4a19-bd67-8b84b38ea985
📒 Files selected for processing (12)
- AGENTS.md
- docs/dict.md
- src/am_utils_rs/sync.rs
- src/am_utils_rs/sync/peer.rs
- src/daybook_core/blobs.rs
- src/daybook_core/drawer.rs
- src/daybook_core/index/doc_blobs.rs
- src/daybook_core/plugs.rs
- src/daybook_core/rt/dispatch.rs
- src/daybook_core/sync/full.rs
- src/daybook_core/tables.rs
- src/daybook_types/doc.rs
```rust
let doc_id = doc_id
    .parse::<DocumentId>()
    .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
let Some(endpoint_id) = self.endpoint_for_peer_key(&peer_key) else {
    debug!(peer_key, %doc_id, "dropping samod doc sync request for unknown peer");
    return Ok(());
```
Do not let malformed peer doc_id kill the full-sync worker.
doc_id.parse::<DocumentId>() errors from peer input currently propagate, and this can terminate the worker task via fut.await.unwrap() at Line 415. Treat malformed peer payloads as non-fatal and drop them with a warning.
💡 Suggested fix

```diff
@@
-        let doc_id = doc_id
-            .parse::<DocumentId>()
-            .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
+        let doc_id = match doc_id.parse::<DocumentId>() {
+            Ok(doc_id) => doc_id,
+            Err(err) => {
+                warn!(peer_key, raw_doc_id = %doc_id, ?err, "dropping samod doc sync request with invalid doc id");
+                return Ok(());
+            }
+        };
@@
-        let doc_id = doc_id
-            .parse::<DocumentId>()
-            .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
+        let doc_id = match doc_id.parse::<DocumentId>() {
+            Ok(doc_id) => doc_id,
+            Err(err) => {
+                warn!(peer_key, raw_doc_id = %doc_id, ?err, "dropping samod doc-deleted request with invalid doc id");
+                return Ok(());
+            }
+        };
```

Based on learnings: "When validating data that comes from external inputs … use eyre::bail!() to return errors rather than panicking … handle version drift across devices safely."
Also applies to: 1243-1248, 413-416
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/daybook_core/sync/full.rs` around lines 1188 - 1193, The current
parse::<DocumentId>() call in the full-sync handling path should not propagate a
parse error up and kill the worker; instead, detect a malformed peer doc_id, log
a warning/debug with the offending peer_key and raw doc_id, and drop the message
by returning Ok(()) so the worker continues; update the block around
parse::<DocumentId>() and the surrounding logic that uses
self.endpoint_for_peer_key(&peer_key) to early-return Ok(()) on parse failures
(and similarly at the other sites you noted), replacing map_err propagation with
a non-fatal drop-and-log behavior.