Conversation
wip(cli): qr codes
refactor(core): perf work
refactor(ci): branch cache strategy fix
refactor(iac): avoid retain on ollama pvc
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the relevant setting. Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough

Large refactor replacing AmCtx-centric APIs with a SharedBigRepo abstraction; introduces BigRepo, repo-integrated change listeners and per-doc brokers, partition-aware sync (protocol, node, peer, store), full-sync workers (doc/blob), blob indexing, wide app/CLI/FFI migrations, Cargo/CI tweaks, and many tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant BigRepo as BigRepo
    participant Broker as DocChangeBroker
    participant Switch as ChangeListenerManager
    participant Listener as Listener
    Client->>BigRepo: create_doc(initial_content)
    BigRepo->>Broker: spawn_doc_listener(doc_id)
    Broker-->>BigRepo: DocChangeBrokerHandle
    Client->>BigRepo: apply_change(doc_id, patch)
    BigRepo->>Broker: notify_heads_changed(doc_id, new_heads, origin)
    Broker->>Broker: compute diffs -> patches
    Broker->>Switch: send Vec<BigRepoChangeNotification>
    Switch->>Listener: filter & dispatch matching notifications
    Listener->>Client: call on_change callback
```
```mermaid
sequenceDiagram
    actor PeerA
    actor PeerB
    participant SyncNode as SyncNode
    participant Store as SyncStore
    participant BigRepo as BigRepo
    participant PeerWorker as PeerSyncWorker
    PeerA->>SyncNode: register_local_peer(peer_key)
    SyncNode->>BigRepo: list_partitions_for_peer(peer_key)
    BigRepo-->>SyncNode: partitions
    PeerWorker->>SyncNode: ListPartitions RPC
    SyncNode->>Store: get_partition_member_events_for_peer(req)
    Store-->>SyncNode: events
    SyncNode-->>PeerWorker: stream events (SubPartitions)
    PeerWorker->>BigRepo: import_doc / apply events
    PeerWorker->>PeerA: ack / progress events
```
```mermaid
sequenceDiagram
    participant FullSync as FullSyncWorker
    participant BlobWorker as BlobSyncWorker
    participant BlobsRepo as BlobsRepo
    participant Endpoint as Endpoint
    FullSync->>BlobWorker: spawn_blob_sync_worker(hash, peers)
    BlobWorker->>BlobsRepo: has_hash(hash)?
    alt present locally
        BlobWorker->>FullSync: mark_synced
    else present in store
        BlobWorker->>BlobsRepo: put_from_store(hash)
        BlobWorker->>FullSync: mark_synced
    else needs download
        BlobWorker->>Endpoint: request_download(hash)
        Endpoint-->>BlobWorker: transfer progress/completion
        BlobWorker->>BlobsRepo: put(downloaded)
        BlobWorker->>FullSync: mark_synced
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs
feat(am_utils_rs): big repo changes system
refactor(am_utils_rs): sync manual cleanup
```rust
struct DocChangeBroker {
    heads_listeners: Arc<Mutex<Vec<HeadListener>>>,
    handle: DocHandle,
}
```

Comment: Consider using `BigRepoHandle` here for consistency.
```rust
impl BigRepo {
    pub async fn boot_with_repo(repo: samod::Repo, config: BigRepoConfig) -> Res<Arc<Self>> {
```

Comment: We probably want to hide samod itself as an impl detail. Audit usages of samod API items across the big repo API surface and evaluate what we can get rid of.
dman-os left a comment:
Create a separate partition to create our app and drawer doc?
Not sure I understand the need of such a file structure.
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
.github/workflows/builds.yml (1)
271-271: ⚠️ Potential issue | 🟠 Major

Update `actions/upload-artifact` to v7.

The workflow uses `actions/upload-artifact@v6` across multiple jobs. Version 7.0.0 was released on February 26, 2026, and is now the latest major version. Update to `actions/upload-artifact@v7` to access the latest features, improvements, and security updates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `.github/workflows/builds.yml` at line 271: update the GitHub Actions step that references the actions/upload-artifact action to use the new major version tag; replace all occurrences of actions/upload-artifact@v6 with actions/upload-artifact@v7 in the workflow (search for the literal string "actions/upload-artifact@v6" and update each usage to "actions/upload-artifact@v7").

src/daybook_core/rt/switch.rs (1)
151-201: ⚠️ Potential issue | 🟠 Major

Do not swallow init replay failures.

Using `.unwrap_or_log()` here can silently skip initialization events and start with inconsistent switch state.

Proposed fix:

```diff
-            worker
-                .track_event_heads(&SwitchEvent::Plugs(Arc::clone(&event)))
-                .await
-                .unwrap_or_log();
-            worker
-                .dispatch_to_listeners(&SwitchEvent::Plugs(event))
-                .await
-                .unwrap_or_log();
+            worker
+                .track_event_heads(&SwitchEvent::Plugs(Arc::clone(&event)))
+                .await?;
+            worker
+                .dispatch_to_listeners(&SwitchEvent::Plugs(event))
+                .await?;
```

Apply the same replacement for the drawer/dispatch/config init loops in this block.
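The difference between the two patterns can be sketched std-only, with a hypothetical `Worker` standing in for the switch worker (names here are illustrative, not the real API): swallowing errors lets init "succeed" with inconsistent state, while `?` aborts and surfaces the failure.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the switch worker's replay state.
struct Worker {
    seen: HashSet<u32>,
}

impl Worker {
    // Tracking can fail, e.g. on a malformed replicated event.
    fn track_event(&mut self, id: u32) -> Result<(), String> {
        if id == 0 {
            return Err("malformed event".into());
        }
        self.seen.insert(id);
        Ok(())
    }

    // The `unwrap_or_log` pattern: errors are dropped, so init
    // "succeeds" having silently skipped events.
    fn init_swallowing(&mut self, events: &[u32]) {
        for &id in events {
            let _ = self.track_event(id); // error discarded
        }
    }

    // Propagating with `?` aborts init on the first failure instead.
    fn init_propagating(&mut self, events: &[u32]) -> Result<(), String> {
        for &id in events {
            self.track_event(id)?;
        }
        Ok(())
    }
}

fn main() {
    let mut w = Worker { seen: HashSet::new() };
    w.init_swallowing(&[1, 0, 2]); // event 0 silently dropped
    assert_eq!(w.seen.len(), 2);

    let mut w2 = Worker { seen: HashSet::new() };
    // The failure now surfaces to the caller during init.
    assert!(w2.init_propagating(&[1, 0, 2]).is_err());
}
```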
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/rt/switch.rs` around lines 151-201: the init loops for Drawer, Dispatch and Config currently call track_event_heads(...).await.unwrap_or_log() and dispatch_to_listeners(...).await.unwrap_or_log(), which swallows errors; change those calls to propagate failures instead (e.g. remove unwrap_or_log and use ? on the awaited Result) so any error from worker.track_event_heads or worker.dispatch_to_listeners surfaces during init for SwitchEvent::Drawer, SwitchEvent::Dispatch and SwitchEvent::Config (mirror the fix used for the plugs loop).

src/daybook_core/plugs.rs (1)
917-930: ⚠️ Potential issue | 🟠 Major

Don't `expect()` replicated plug hydration to succeed.

This path is handling remote automerge state. A missing hydrated node here is version drift/corruption, not a programming assertion, so `expect(ERROR_INVALID_PATCH)` will take down the repo worker on bad peer data instead of surfacing an error.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/plugs.rs` around lines 917-930: the code currently calls expect(ERROR_INVALID_PATCH) after hydrate_path_at_heads in the PlugsEvent::PlugAdded/PlugChanged handling (the call to self.big_repo.hydrate_path_at_heads::<Versioned<ThroughJson<Arc<manifest::PlugManifest>>>>(...)), which will panic on missing data; change this to explicitly handle the Option result and return an error with eyre::bail! instead of panicking. Check if the await? yields None and call eyre::bail! with a clear message including id and heads, so the function returns an error rather than killing the repo worker.

src/daybook_core/rt/dispatch.rs (2)
239-250: ⚠️ Potential issue | 🟠 Major

Keep `wflow_to_dispatch` consistent when applying remote events.

The replicated add/delete path only updates `active_dispatches`; `get_by_wflow_job()` and deletion cleanup rely on `wflow_to_dispatch`, so remote dispatches never become discoverable by job id and replicated deletions can leave stale index entries behind.

Possible fix:

```diff
 DispatchEvent::DispatchAdded { id, heads } => {
     // Hydrate the new dispatch at heads
     let (new_versioned, _) = self
         .big_repo
         .hydrate_path_at_heads::<Versioned<ThroughJson<Arc<ActiveDispatch>>>>(
             &self.app_doc_id,
             &heads.0,
             automerge::ROOT,
             vec![
                 DispatchStore::prop().into(),
                 "active_dispatches".into(),
                 autosurgeon::Prop::Key(id.clone().into()),
             ],
         )
         .await?
         .expect(ERROR_INVALID_PATCH);
     self.store
         .mutate_sync(|store| {
+            match &new_versioned.val.0.deets {
+                ActiveDispatchDeets::Wflow { wflow_job_id, .. } => {
+                    store.wflow_to_dispatch.insert(wflow_job_id.clone(), id.clone());
+                }
+            }
+            store.cancelled_dispatches.remove(id);
             store.active_dispatches.insert(id.clone(), new_versioned);
         })
         .await?;
 }
 DispatchEvent::DispatchDeleted { id, .. } => {
     self.store
         .mutate_sync(|store| {
-            store.active_dispatches.remove(id);
+            if let Some(old) = store.active_dispatches.remove(id) {
+                match &old.val.0.deets {
+                    ActiveDispatchDeets::Wflow { wflow_job_id, .. } => {
+                        store.wflow_to_dispatch.remove(wflow_job_id);
+                    }
+                }
+            }
+            store.cancelled_dispatches.remove(id);
         })
         .await?;
 }
```
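The underlying invariant (a primary map and a secondary index mutated in lockstep) can be sketched std-only; `Store`, `add`, and `remove` below are hypothetical stand-ins, not the real dispatch store API.

```rust
use std::collections::HashMap;

// Hypothetical dispatch store: a primary map plus a secondary index
// that must stay consistent with it.
#[derive(Default)]
struct Store {
    active: HashMap<String, String>,    // dispatch_id -> payload
    job_index: HashMap<String, String>, // job_id -> dispatch_id
}

impl Store {
    fn add(&mut self, dispatch_id: &str, job_id: &str, payload: &str) {
        self.active.insert(dispatch_id.to_string(), payload.to_string());
        // Update the secondary index in the same mutation.
        self.job_index.insert(job_id.to_string(), dispatch_id.to_string());
    }

    fn remove(&mut self, dispatch_id: &str) {
        if self.active.remove(dispatch_id).is_some() {
            // Drop the matching index entry so no stale mapping survives.
            self.job_index.retain(|_, v| v.as_str() != dispatch_id);
        }
    }
}

fn main() {
    let mut s = Store::default();
    s.add("d1", "job-a", "payload");
    assert_eq!(s.job_index.get("job-a").map(String::as_str), Some("d1"));
    s.remove("d1");
    // No stale entry is left behind after deletion.
    assert!(s.job_index.get("job-a").is_none());
}
```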
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 239 - 250, The remote event handling updates only active_dispatches and misses updating the wflow_to_dispatch index, so remote adds are not discoverable by get_by_wflow_job() and remote deletes leave stale entries; inside the DispatchEvent::DispatchAdded branch (the mutate_sync call that inserts into active_dispatches) also insert the corresponding mapping into store.wflow_to_dispatch (using the dispatch's workflow/job key used by get_by_wflow_job()), and inside DispatchEvent::DispatchDeleted (the mutate_sync call that removes from active_dispatches) remove the same mapping from store.wflow_to_dispatch to keep the two maps consistent and avoid stale index entries.
224-237: ⚠️ Potential issue | 🟠 Major

Handle replicated dispatch drift with `Err`, not `panic!()`.

Both of these branches are driven by remote CRDT state. Right now a missing hydrated dispatch or a non-add update from another device will crash the notification worker instead of surfacing drift as an error.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

```diff
-                out.push(if vtag.version.is_nil() {
-                    DispatchEvent::DispatchAdded {
-                        id: dispatch_id.clone(),
-                        heads: dispatch_heads,
-                    }
-                } else {
-                    panic!("dispatch update detected")
-                });
+                if vtag.version.is_nil() {
+                    out.push(DispatchEvent::DispatchAdded {
+                        id: dispatch_id.clone(),
+                        heads: dispatch_heads,
+                    });
+                } else {
+                    eyre::bail!("unexpected dispatch update for active_dispatches/{dispatch_id}");
+                }
```

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

Also applies to: 300-311
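A minimal std-only sketch of the same idea, using a plain `Result<_, String>` in place of eyre (the `RemoteEvent` enum and `handle` function are hypothetical): unexpected variants from a peer become an `Err` the caller can handle, rather than a panic that kills the worker.

```rust
// Hypothetical replicated event: external peers may send variants
// this node does not expect.
enum RemoteEvent {
    Added { id: u32 },
    Updated { id: u32 },
}

fn handle(ev: &RemoteEvent) -> Result<u32, String> {
    match ev {
        RemoteEvent::Added { id } => Ok(*id),
        // panic!() here would crash the worker on bad peer data;
        // returning Err surfaces the drift instead (eyre::bail! in
        // the real codebase).
        RemoteEvent::Updated { id } => {
            Err(format!("unexpected dispatch update for {id}"))
        }
    }
}

fn main() {
    assert_eq!(handle(&RemoteEvent::Added { id: 7 }), Ok(7));
    // The drift is an error, not an abort.
    assert!(handle(&RemoteEvent::Updated { id: 7 }).is_err());
}
```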
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/rt/dispatch.rs` around lines 224-237: the code currently uses expect()/panic!() when a hydrated dispatch is missing or when a non-add update is observed (locations around the hydrate_path_at_heads call that returns Versioned<ThroughJson<Arc<ActiveDispatch>>> and the branches referenced at lines ~300-311); change these to return a proper eyre error instead of panicking: replace the .expect(ERROR_INVALID_PATCH) (and any panic!/unimplemented!() in the non-add-update handling) with eyre::bail!(...) or eyre::eyre!(...) that includes context (dispatch id, heads, and a short message about replicated drift) so the error surfaces to the caller rather than aborting the worker.

src/daybook_core/tables.rs (1)
497-510: ⚠️ Potential issue | 🟠 Major

Avoid `expect()` on replicated table state.

Each of these hydrations is driven by remote automerge patches. A missing node here is version drift/data corruption, not a programming assertion, so this should return `Err` instead of panicking.

Possible fix:

```diff
-            .await?
-            .expect(ERROR_INVALID_PATCH);
+            .await?
+            .ok_or_eyre(ERROR_INVALID_PATCH)?;
```

Apply the same change to the window/tab/panel/table branches in this match.

Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking (`panic!()`) or leaving `unimplemented!()`.

Also applies to: 527-540, 558-571, 589-602
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/tables.rs` around lines 497-510: replace the panic-inducing expect on hydrated replicated state with an error return using eyre::bail! so missing nodes return Err rather than panic: in the hydrate_path_at_heads::<Versioned<Window>> call (and the analogous hydrate calls for tab/panel/table branches) remove the .expect(ERROR_INVALID_PATCH) and instead check for None and eyre::bail!(ERROR_INVALID_PATCH) so the function returns an Err; target the call sites using big_repo.hydrate_path_at_heads::<Versioned<Window>> (and similar generic calls), the ERROR_INVALID_PATCH constant, and the path components TablesStore::prop(), "windows"/other collection names, and autosurgeon::Prop::Key(...) when applying the same change to each branch.

src/daybook_cli/main.rs (1)
706-715: ⚠️ Potential issue | 🟠 Major

Don't mark the clone initialized before the second sync phase succeeds.

After `mark_repo_initialized()` returns, later opens will treat the repo as healthy even though only the bootstrap docs are guaranteed to exist locally. If `wait_for_full_sync(...)` times out or fails, the user is left with an initialized-but-partial clone.

Also applies to: 778-784
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_cli/main.rs` around lines 706-715: the code currently calls daybook_core::app::globals::set_init_state(...) and daybook_core::repo::mark_repo_initialized(&destination) before the second sync phase, which can mark a partial clone as initialized; move/defer those calls so they execute only after wait_for_full_sync(...) completes successfully (i.e., after the second sync/verification step succeeds), keeping big_repo_stop.stop().await? handling and error propagation unchanged so failures still stop/cleanup without marking the repo initialized; update both the block around set_init_state/mark_repo_initialized and the analogous block that covers wait_for_full_sync (the later 778-784 region) to follow the same ordering.

src/daybook_core/config.rs (1)
206-213: ⚠️ Potential issue | 🟠 Major

Don't panic the config watcher on remote document drift.

`notifs_loop()` is consuming replicated document data, but these `expect(...)` calls turn version drift or a malformed remote patch into a dead background watcher. This path should surface an error and exit cleanly instead of panicking on external input. Based on learnings: when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking.

Also applies to: 286-296
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/config.rs` around lines 206-213: the spawn of the background watcher currently calls repo.notifs_loop(...).await.expect(...), which will panic on malformed/remote-driven data; replace the expect-based panic with proper error propagation and graceful exit: have the async task await repo.notifs_loop(...) and map any Err to an eyre::bail! or return an Err from the spawned future so the task fails cleanly (do not call expect). Apply the same change to the other similar spawn block mentioned (the block around lines 286-296) so both tokio::spawn closures propagate errors from notifs_loop and any validation paths instead of panicking.

src/daybook_core/sync.rs (1)
322-333: ⚠️ Potential issue | 🟠 Major

Ignore late finish signals for untracked connections.

The duplicate-connection path stops a `RepoConnection` before it is inserted into `active_samod_peers`, and stopped connections can still emit `ConnFinishSignal`s. `expect("connection finished for unknown conn_id")` turns that normal race into a panic in the sync loop.

Also applies to: 367-377
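The defensive lookup can be sketched std-only with a `HashMap` standing in for the connection table (the names below are hypothetical): an unknown id on a finish signal is treated as a benign race, not a bug.

```rust
use std::collections::HashMap;

// Hypothetical connection table; finish signals can race with removal,
// so an unknown id is a normal condition.
fn handle_finish(active: &mut HashMap<u64, String>, conn_id: u64) {
    match active.remove(&conn_id) {
        Some(peer) => println!("connection to {peer} finished"),
        // expect() here would panic the sync loop on a benign race;
        // just ignore (or debug-log) the late signal instead.
        None => println!("late finish for untracked conn {conn_id}, ignoring"),
    }
}

fn main() {
    let mut active = HashMap::new();
    active.insert(1, "peer-a".to_string());
    handle_finish(&mut active, 1); // known connection
    handle_finish(&mut active, 1); // duplicate/late signal: no panic
    assert!(active.is_empty());
}
```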
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync.rs` around lines 322 - 333, In handle_incoming_am_conn, the code currently assumes a finish signal always maps to a tracked connection and uses expect, which panics for stopped-but-untracked connections; change the finish-signal handling so that when a ConnFinishSignal refers to an endpoint_id not present in active_samod_peers you simply ignore (or log at debug) the signal instead of panicking. Locate the branch that looks up endpoint_id in active_samod_peers (and the matching block after conn.stop()) and replace the expect("connection finished for unknown conn_id") with a fallible lookup that handles the missing key gracefully; apply the same defensive change to the similar finish-signal handling in the other block that mirrors this logic.
🟠 Major comments (19)
src/daybook_core/sync/stress.rs-28-28 (1)
28-28: ⚠️ Potential issue | 🟠 Major

`std::env::set_var` is unsafe in multi-threaded contexts.

This test uses `#[tokio::test(flavor = "multi_thread")]`, but `std::env::set_var` is not thread-safe and can cause undefined behavior when called concurrently with other code reading environment variables. Since Rust 1.66, this generates a warning, and it's been marked as unsafe in more recent editions.

Consider setting this environment variable before the test runtime starts, or use a different mechanism for test configuration.
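One std-only sketch of the "different mechanism" route: store the test configuration in a process-wide `OnceLock` instead of mutating the environment at all. The `TestConfig` struct and `test_config` accessor are hypothetical, not the project's API.

```rust
use std::sync::OnceLock;

// Hypothetical test configuration; a OnceLock avoids mutating the
// environment from a multi-threaded runtime entirely.
struct TestConfig {
    disable_keyring: bool,
}

fn test_config() -> &'static TestConfig {
    static CONFIG: OnceLock<TestConfig> = OnceLock::new();
    // First caller wins; every later call (from any thread) sees the
    // same value, so this is safe to hit concurrently.
    CONFIG.get_or_init(|| TestConfig { disable_keyring: true })
}

fn main() {
    assert!(test_config().disable_keyring);
    assert!(test_config().disable_keyring); // idempotent
}
```

Code that currently reads `DAYB_DISABLE_KEYRING` would need to consult this accessor instead, which is the config-parameter refactor the comment below describes.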
🛠️ Possible alternatives

```diff
-        std::env::set_var("DAYB_DISABLE_KEYRING", "1");
+        // Set via test harness or use a configuration struct instead of env var
+        // Or ensure this is set before spawning the async runtime
```

Alternatively, if the code reading `DAYB_DISABLE_KEYRING` can be refactored to accept a config parameter, that would be the safest approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/sync/stress.rs` at line 28: the call to std::env::set_var("DAYB_DISABLE_KEYRING", "1") inside the async test annotated with #[tokio::test(flavor = "multi_thread")] is unsafe in multi-threaded contexts; remove that call from the async test and instead initialize the env var before the Tokio runtime starts — e.g., create a static std::sync::Once (e.g., static INIT: Once) and call INIT.call_once(|| std::env::set_var("DAYB_DISABLE_KEYRING","1")) from a synchronous test setup function or module-level initializer, or alternatively change the test to #[tokio::test(flavor = "current_thread")] if single-threaded execution is acceptable; update the stress test to rely on this pre-runtime initialization rather than calling std::env::set_var inside the async function.

src/daybook_core/build.rs-4-6 (1)
4-6: ⚠️ Potential issue | 🟠 Major

Emit `rerun-if-env-changed` for the skip flag.

Line 4 changes build behavior via `DAYBOOK_SKIP_WFLOW_BUILD`, but Cargo is not told to rerun when that env var changes. This can leave stale build-script outputs when the flag is toggled between builds.

Suggested fix:

```diff
 fn main() -> Result<(), Box<dyn std::error::Error>> {
+    println!("cargo:rerun-if-env-changed=DAYBOOK_SKIP_WFLOW_BUILD");
     if std::env::var("DAYBOOK_SKIP_WFLOW_BUILD").as_deref() == Ok("1") {
         return Ok(());
     }
```
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/build.rs` around lines 4-6: in build.rs, before you check DAYBOOK_SKIP_WFLOW_BUILD (the if block that returns Ok(())), emit a Cargo directive to rerun the build script when that env var changes by printing "cargo:rerun-if-env-changed=DAYBOOK_SKIP_WFLOW_BUILD"; place this println! immediately before the if that tests std::env::var("DAYBOOK_SKIP_WFLOW_BUILD") so toggling the flag will cause Cargo to rerun the build script rather than leaving stale outputs.

src/daybook_core/drawer/facet_recovery.rs-55-58 (1)
55-58: ⚠️ Potential issue | 🟠 Major

Do not silently drop unrecoverable `updatedAt` entries.

At line 56, `hash_for_opid` returning `None` is ignored, which can return incomplete head recovery without surfacing an error.

💡 Proposed fix:

```diff
     for ii in 0..length {
-        if let Some((_, exid)) = get(doc, &updated_at_list, ii, read_heads)? {
-            if let Some(hash) = doc.hash_for_opid(&exid) {
-                recovered.push(hash);
-            }
-        }
+        let (_, exid) = get(doc, &updated_at_list, ii, read_heads)?
+            .ok_or_else(|| eyre::eyre!("updatedAt entry missing at index {ii} for facet: {facet_key}"))?;
+        let hash = doc
+            .hash_for_opid(&exid)
+            .ok_or_else(|| eyre::eyre!("could not recover hash for updatedAt entry at index {ii} for facet: {facet_key}"))?;
+        recovered.push(hash);
     }
```

Based on learnings: when validating external automerge/dmeta inputs, use `eyre::bail!()` error paths for drift instead of silently accepting unexpected states.
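The all-or-nothing shape of that loop can also be expressed with `collect::<Result<...>>`, sketched std-only here (the `recover_hashes` helper and `Option<u64>` entries are hypothetical stand-ins for the opid-to-hash lookups): the first unrecoverable entry aborts the whole recovery instead of being skipped.

```rust
// Hypothetical recovery step: each entry must resolve to a hash; a None
// is drift in external data and should fail the whole recovery.
fn recover_hashes(entries: &[Option<u64>]) -> Result<Vec<u64>, String> {
    entries
        .iter()
        .enumerate()
        .map(|(i, e)| (*e).ok_or_else(|| format!("unrecoverable entry at index {i}")))
        .collect() // the first Err aborts and is returned
}

fn main() {
    assert_eq!(recover_hashes(&[Some(1), Some(2)]), Ok(vec![1, 2]));
    // A missing hash surfaces as an error, not a silently shorter result.
    assert!(recover_hashes(&[Some(1), None]).is_err());
}
```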
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/drawer/facet_recovery.rs` around lines 55-58: the code silently skips unrecoverable updatedAt entries when doc.hash_for_opid(&exid) returns None, leading to incomplete recovery; update the logic in the block that calls get(doc, &updated_at_list, ii, read_heads) so that if doc.hash_for_opid(&exid) yields None you return an error (e.g., eyre::bail! or eyre::Report) instead of ignoring it — reference the get(...) call, the exid variable, doc.hash_for_opid(&exid), and the recovered vector to locate the change and replace the silent drop with an eyre::bail! providing context about the missing hash/drift.

src/daybook_core/blobs.rs-29-30 (1)
29-30: ⚠️ Potential issue | 🟠 Major

Deduplicated references need more than one source path.

Reference metadata is keyed by content hash but only stores one `source_path`. A second `put_path_reference()` for identical bytes overwrites the first path, and `get_path()` then fails as soon as the last-written file disappears even if an older source still exists and still hashes correctly.

Please preserve multiple candidates here, or at least avoid replacing a still-live reference with another one. Please also add a regression test with two distinct paths that hash identically.

As per coding guidelines: "When working with Rust, use small tests in addition to `cargo clippy` to validate ideas."

Also applies to: 118-125, 185-188
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/blobs.rs` around lines 29-30: the current reference metadata stores only one source_path (fields src_local_user_path and source_path), so calling put_path_reference() twice for identical content overwrites the prior candidate and get_path() can fail if the last-written path is removed; change the metadata to preserve multiple candidate paths (e.g., Vec<String> or a Set) instead of a single source_path, update put_path_reference() to append/merge without discarding live existing paths and update get_path() to prefer an existing live path from the candidates, and add a regression test that writes two distinct filesystem paths with identical bytes and asserts get_path() still finds a remaining path after removing the other; update any code that reads/writes the metadata to handle the new collection type and ensure serialization/deserialization (if any) is adjusted accordingly.

src/daybook_core/blobs.rs-70-95 (1)
70-95: ⚠️ Potential issue | 🟠 Major

Always clean up `source_snapshot`, even on errors.

Both `put_path_copy()` and `put_path_reference()` only remove the snapshot on the success tail. Any failure after `create_source_snapshot()` (hashing, metadata I/O, or iroh ingest) leaks a full-sized temp file under `root/snapshots`.

Wrap the body in `let result = async { ... }.await; let _ = tokio::fs::remove_file(&source_snapshot).await; result` so cleanup runs on both success and error.

Also applies to: 110-129
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/daybook_core/blobs.rs` around lines 70-95: the snapshot file created by create_source_snapshot is only removed on the success path, leaking large temp files on errors; modify put_path_copy (and similarly put_path_reference around lines showing create_source_snapshot/use of source_snapshot) to wrap the existing body that operates on source_snapshot in an inner async block (e.g., let result = async { ... }.await), then unconditionally attempt to remove the snapshot with let _ = tokio::fs::remove_file(&source_snapshot).await; after that and return result so cleanup runs whether the inner block succeeded or returned an error; ensure you keep existing uses of hash, object_paths, build_meta, write_meta, ingest_path_with_iroh, and atomic_copy_file inside the inner block.

src/xtask/main.rs-709-713 (1)
709-713: ⚠️ Potential issue | 🟠 Major

Add a timeout when waiting for the cancellation report.

This await can block forever if the server-side probe task never emits a report.

Proposed fix:

```diff
-    let (elapsed, send_ok) = report_rx
-        .recv()
-        .await
-        .ok_or_eyre("probe report missing after client cancellation")?;
+    let (elapsed, send_ok) = tokio::time::timeout(Duration::from_secs(5), report_rx.recv())
+        .await
+        .wrap_err("timed out waiting for probe report after client cancellation")?
+        .ok_or_eyre("probe report missing after client cancellation")?;
```
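The synchronous std analogue of wrapping the receive in `tokio::time::timeout` is `Receiver::recv_timeout`, sketched below (the `wait_for_report` helper is hypothetical): a bounded wait turns a missing report into an error instead of a hang.

```rust
use std::sync::mpsc;
use std::time::Duration;

// std-only analogue of `tokio::time::timeout(limit, report_rx.recv())`:
// the wait is bounded, so a missing report becomes an Err.
fn wait_for_report(rx: &mpsc::Receiver<u64>, limit: Duration) -> Result<u64, String> {
    rx.recv_timeout(limit)
        .map_err(|_| "probe report missing after client cancellation".to_string())
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(42).unwrap();
    assert_eq!(wait_for_report(&rx, Duration::from_millis(100)), Ok(42));
    // No further sender activity: the wait ends after the deadline
    // instead of blocking forever.
    assert!(wait_for_report(&rx, Duration::from_millis(10)).is_err());
}
```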
Verify each finding against the current code and only fix it if needed. In `src/xtask/main.rs` around lines 709-713: the code awaits report_rx.recv().await without a timeout, which can hang if the probe task never sends a report; wrap the recv().await call in a Tokio timeout (tokio::time::timeout) with a reasonable Duration and map a timeout error to the existing eyre error path (e.g., produce a "probe report missing after client cancellation (timed out)" error) so the surrounding let (elapsed, send_ok) = ...? logic handles both normal and timed-out cases; update imports if needed (tokio::time::Duration / timeout) and keep the same variable names (report_rx, elapsed, send_ok) and error handling flow.

src/am_utils_rs/repo.rs-171-188 (1)
171-188: ⚠️ Potential issue | 🟠 Major

Global and pre-create listeners never attach a broker.

This only acquires a `DocChangeBrokerLease` when `filter.doc_id` is set and the doc already exists. If there is no other lifecycle hook calling `ensure_change_broker`, global listeners and listeners registered before an import/create will miss remote `DocChanged` events.

Remote change notifications flow exclusively through brokers spawned during `add_doc_with_remote_heads_callback`, which send to the `change_tx` channel consumed by the switchboard. Global listeners (`doc_id: None`) registered in `add_change_listener` receive no broker leases and thus depend entirely on brokers existing for documents, but brokers are only created when a specific doc_id is requested or when `ensure_change_broker` is called explicitly. Documents that lack either fail to emit remote events to global listeners.
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 171-188: the add_change_listener implementation only acquires a DocChangeBrokerLease when filter.doc_id exists and the doc is already present, so global listeners and listeners added before a doc is created never get broker leases and miss remote DocChanged events; fix by ensuring brokers are attached for relevant docs and future docs: 1) when filter.doc_id is None (global listener) iterate current documents (via the same mechanism used by find_doc_handle/add_doc_with_remote_heads_callback) and call ensure_change_broker for each to collect broker leases; 2) when filter.doc_id is Some but the doc does not yet exist, register a one-shot hook or piggyback on add_doc_with_remote_heads_callback so that ensure_change_broker is called and its lease appended to the returned registration once the doc is created; use the existing functions ensure_change_broker, find_doc_handle, add_doc_with_remote_heads_callback and change_manager.add_listener to locate insertion points.

src/am_utils_rs/repo.rs-109-123 (1)
109-123: ⚠️ Potential issue | 🟠 Major

Reject non-UTF-8 disk paths instead of lossy-converting them.

Both the samod storage root and the SQLite sidecar URL are built with lossy path formatting. If `path` contains invalid UTF-8 sequences, the code can silently open a different repository or database than requested.

🛠️ Suggested guard:

```diff
 StorageConfig::Disk { path } => {
     std::fs::create_dir_all(&path).wrap_err_with(|| {
         format!("Failed to create storage directory: {}", path.display())
     })?;
-    let repo = repo.with_storage(samod::storage::TokioFilesystemStorage::new(
-        path.to_string_lossy().as_ref(),
-    ));
+    let storage_path = path
+        .to_str()
+        .ok_or_else(|| ferr!("repo storage path must be valid UTF-8: {}", path.display()))?;
+    let repo =
+        repo.with_storage(samod::storage::TokioFilesystemStorage::new(storage_path));
     let loaded = if let Some(policy) = announce_policy {
         repo.with_announce_policy(policy).load().await
     } else {
         repo.load().await
     };
-    let sqlite_url = format!("sqlite://{}", path.join("big_repo.sqlite").display());
+    let sqlite_path = path.join("big_repo.sqlite");
+    let sqlite_url = format!(
+        "sqlite://{}",
+        sqlite_path.to_str().ok_or_else(|| {
+            ferr!("sqlite path must be valid UTF-8: {}", sqlite_path.display())
+        })?
+    );
     (loaded, sqlite_url)
 }
```
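The core of the guard is `Path::to_str()`, which returns `None` for non-UTF-8 paths where `to_string_lossy()` would silently substitute replacement characters. A std-only sketch (the `storage_url` helper is hypothetical):

```rust
use std::path::Path;

// Validate the path up front instead of lossy-converting it: to_str()
// returns None for invalid UTF-8, which becomes a hard error here.
fn storage_url(path: &Path) -> Result<String, String> {
    let p = path
        .to_str()
        .ok_or_else(|| format!("storage path must be valid UTF-8: {}", path.display()))?;
    Ok(format!("sqlite://{p}/big_repo.sqlite"))
}

fn main() {
    let url = storage_url(Path::new("/tmp/repo")).unwrap();
    assert_eq!(url, "sqlite:///tmp/repo/big_repo.sqlite");
}
```

On Unix, a path built from invalid UTF-8 bytes (via `OsStrExt::from_bytes`) would make `to_str()` return `None` and trip the error path, which is exactly the silent-misroute case the comment describes.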
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 109-123: the code currently uses path.to_string_lossy() and path.join(...).display() which silently accept non-UTF-8 paths; instead validate and reject non-UTF-8 early: call path.to_str().ok_or_else(...) with an error like "storage path contains invalid UTF-8" and return the error before constructing the repo or sqlite URL, then pass the validated &str into repo.with_storage(TokioFilesystemStorage::new(...)) and use a UTF-8-safe string when building sqlite_url (e.g., construct the sidecar path with path.join("big_repo.sqlite") and ensure its to_str() is valid before formatting "sqlite://..."); update the block around repo.with_storage and sqlite_url to use these checks and propagate the error.

src/am_utils_rs/repo.rs-412-414 (1)
412-414: ⚠️ Potential issue | 🟠 Major

Save and restore the actor after the scoped write.

When `set_actor()` is called with `Some(actor_id)`, it mutates the document's default actor persistently. Since the same `samod::DocHandle` is reused across multiple `with_document()` calls (as seen in repeated `flush()` calls on the same document), the modified actor leaks into subsequent writes on that handle. Wrap the `set_actor()` call and transaction in a save/restore pattern to ensure the original actor is restored once the operation completes.
Verify each finding against the current code and only fix it if needed. In `src/am_utils_rs/repo.rs` around lines 412-414: the code calls doc.set_actor(actor.clone()) which mutates the DocHandle's default actor and leaks into subsequent with_document()/flush() calls; before setting, capture the current actor (e.g., via doc.actor() or the appropriate getter on samod::DocHandle), then set the new actor only for the scoped write and restore the saved actor in a finally/Drop scope (or after the transaction) so the original actor is reinstated; apply this save/restore pattern around the set_actor(actor.clone()) call and the surrounding transaction/with_document() usage to avoid leaking the actor across uses of the same doc handle.

src/daybook_core/index/doc_blobs.rs-140-148 (1)
140-148:⚠️ Potential issue | 🟠 MajorEmit
Deletedwhen reindexing removes the doc from the blob index.
reindex_doc()can delete all rows for a document when it disappears, loses its Blob facets, or resolves to an empty hash set, but theUpsertworker path still publishesDocBlobsIndexEvent::Updated. Subscribers will miss evictions on blob-removal updates and can retain stale memberships.Possible fix
```diff
-    async fn handle_worker_item(&self, item: DocBlobsIndexWorkItem) -> Res<()> {
+    async fn handle_worker_item(&self, item: DocBlobsIndexWorkItem) -> Res<()> {
         match item {
             DocBlobsIndexWorkItem::Upsert { doc_id, heads } => {
-                self.reindex_doc(&doc_id, &heads).await?;
-                self.registry
-                    .notify([DocBlobsIndexEvent::Updated { doc_id }]);
+                let still_indexed = self.reindex_doc(&doc_id, &heads).await?;
+                self.registry.notify([if still_indexed {
+                    DocBlobsIndexEvent::Updated { doc_id }
+                } else {
+                    DocBlobsIndexEvent::Deleted { doc_id }
+                }]);
             }
             DocBlobsIndexWorkItem::DeleteDoc { doc_id } => {
                 self.delete_doc(&doc_id).await?;
                 self.registry
                     .notify([DocBlobsIndexEvent::Deleted { doc_id }]);
             }
         }
         Ok(())
     }

-    pub async fn reindex_doc(&self, doc_id: &DocId, heads: &ChangeHashSet) -> Res<()> {
+    pub async fn reindex_doc(&self, doc_id: &DocId, heads: &ChangeHashSet) -> Res<bool> {
         let Some(facet_keys) = self.drawer_repo.facet_keys_at_heads(doc_id, heads).await? else {
             self.delete_doc(doc_id).await?;
-            return Ok(());
+            return Ok(false);
         };
         let selected_blob_keys: Vec<FacetKey> = facet_keys
             .into_iter()
             .filter(|facet_key| facet_key.tag == WellKnownFacetTag::Blob.into())
             .collect();
         if selected_blob_keys.is_empty() {
             self.delete_doc(doc_id).await?;
-            return Ok(());
+            return Ok(false);
         };
         // ...
-        self.reindex_doc_hashes(doc_id, heads, &hashes).await
+        self.reindex_doc_hashes(doc_id, heads, &hashes).await?;
+        Ok(!hashes.is_empty())
     }
```

Also applies to: 154-165, 173-188, 197-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 140 - 148, The Upsert handling currently always emits DocBlobsIndexEvent::Updated even when reindex_doc() removed all blob rows; change the logic in the DocBlobsIndexWorkItem::Upsert match arm to examine reindex_doc()'s result (or make reindex_doc return an enum/boolean indicating "removed/empty" vs "present") and call registry.notify with DocBlobsIndexEvent::Deleted { doc_id } when the doc was evicted (or has zero blob memberships) and DocBlobsIndexEvent::Updated { doc_id } otherwise; apply the same pattern to the other Upsert-like paths mentioned (the blocks around the other occurrences) so subscribers receive Deleted events on complete eviction instead of only Updated.

src/daybook_core/sync/full.rs-1027-1039 (1)
1027-1039: ⚠️ Potential issue | 🟠 Major — Skip stale queued peer-session refreshes.

`peer_sessions_to_refresh` is filled asynchronously, so the peer can disconnect before this batch runs. The current `expect(...)` calls will panic the worker on an endpoint that has already been removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1027 - 1039, batch_refresh_peer_sessions currently uses expect(...) on conn_by_peer and known_peer_set which can panic if peer_sessions_to_refresh contains stale endpoint_ids; change to gracefully skip entries that no longer exist by replacing the expect calls with safe lookups (e.g., conn_by_peer.get(&endpoint_id) and then known_peer_set.get_mut(conn_id)) and continue the loop when either lookup returns None, then only call refresh_peer_partition_session when both lookups succeed; keep ERROR_IMPOSSIBLE for truly impossible invariants elsewhere but do not use it here for stale async-driven queue items.

src/daybook_core/sync/full.rs-1197-1203 (1)
1197-1203: ⚠️ Potential issue | 🟠 Major — Clear the pending samod state when a peer deletes a doc.

You increment `samod_pending_docs` when the first sync request for a `(peer, doc)` arrives, but the `DocDeleted` branch only queues `docs_to_stop`. That leaves the entry behind in `samod_doc_set`, so the peer can stay permanently stale even though there is no sync work left.

Also applies to: 1228-1243
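The cleanup described above can be sketched with plain collections standing in for the worker's bookkeeping; all type and field names here are assumptions for illustration, not the real `full.rs` types:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the per-peer sync bookkeeping.
struct PeerState {
    samod_pending_docs: u64,
}

fn handle_doc_deleted(
    peers: &mut HashMap<u32, PeerState>,
    samod_doc_set: &mut HashSet<(u32, String)>,
    conn_id: u32,
    doc_id: &str,
) {
    // Only decrement if the (peer, doc) pair was actually tracked, and
    // saturate so a stray duplicate delete can never underflow the counter.
    if samod_doc_set.remove(&(conn_id, doc_id.to_string())) {
        if let Some(peer) = peers.get_mut(&conn_id) {
            peer.samod_pending_docs = peer.samod_pending_docs.saturating_sub(1);
        }
    }
}

fn main() {
    let mut peers = HashMap::from([(1u32, PeerState { samod_pending_docs: 1 })]);
    let mut doc_set = HashSet::from([(1u32, "doc-a".to_string())]);
    handle_doc_deleted(&mut peers, &mut doc_set, 1, "doc-a");
    assert_eq!(peers[&1u32].samod_pending_docs, 0);
    // A second delete for the same doc is a no-op, not an underflow.
    handle_doc_deleted(&mut peers, &mut doc_set, 1, "doc-a");
    assert_eq!(peers[&1u32].samod_pending_docs, 0);
}
```

Keying the decrement off the `remove()` result keeps the counter and the set consistent even under duplicated or reordered delete events.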
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1197 - 1203, When handling DocDeleted (the branch that currently only pushes into docs_to_stop), also remove the (endpoint_id, doc_id) entry from the samod_doc_set and clear any pending counters: if self.conn_by_peer.get(&endpoint_id) yields a conn_id and self.known_peer_set.get_mut(&conn_id) yields peer_state, decrement peer_state.samod_pending_docs (use saturating_sub(1)) or otherwise adjust it to reflect the removal; then delete the key from self.samod_doc_set so the peer cannot remain stale. Ensure this mirrors the logic used where samod_pending_docs is incremented (the block referencing conn_by_peer, known_peer_set and samod_pending_docs) and call refresh_peer_fully_synced_state(endpoint_id).await? as appropriate.

src/daybook_core/sync/full.rs-1297-1304 (1)
1297-1304: ⚠️ Potential issue | 🟠 Major — Treat a missing local doc as a terminal sync condition, not a hard error.

`RequestDocSync` comes from remote state and can race with local deletion. If the doc is gone by the time this boots, `batch_boot_docs()` errors out and the outer worker task unwrap takes down full sync instead of reusing the existing `handle_doc_missing_local()` cleanup path.

src/am_utils_rs/repo/partition.rs-424-430 (1)
424-430: ⚠️ Potential issue | 🟠 Major — Clamp event page sizes on the server.

`req.limit` is fully client-controlled and only floored at 1, so a single remote request can force very large SQL scans and allocations for every requested partition. This should be capped to a small server-side maximum before querying.

Also applies to: 447-453
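The cap is a one-liner once a server-side constant exists; `MAX_PAGE_LIMIT = 256` below is an assumed value, not one taken from the codebase:

```rust
// Assumed server-side ceiling; tune to whatever the store can serve cheaply.
const MAX_PAGE_LIMIT: u64 = 256;

/// Floor at 1 so a zero request still makes progress, and cap at
/// MAX_PAGE_LIMIT so a remote peer cannot force huge scans or allocations.
fn effective_page_limit(requested: u64) -> usize {
    requested.clamp(1, MAX_PAGE_LIMIT) as usize
}

fn main() {
    assert_eq!(effective_page_limit(0), 1);
    assert_eq!(effective_page_limit(64), 64);
    assert_eq!(effective_page_limit(u64::MAX), 256);
}
```

The same clamped value should then feed both the SQL `LIMIT` and any `Vec::with_capacity` pre-allocation, so a hostile `limit` can never drive either.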
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 424 - 430, Clamp the client-provided page size before any work: replace the current unbounded floor of req.limit with a server-side capped value (e.g. compute a limit as req.limit.max(1).min(MAX_PAGE_LIMIT) and cast to usize) and use that capped limit when calling load_member_partition_page and in the other similar block (the second occurrence around load_member_partition_page call at lines ~447-453). Define or reuse a constant like MAX_PAGE_LIMIT in this module so the cap is obvious and apply the same capped limit variable (not raw req.limit) for allocations (Vec capacity) and all SQL/query calls.

src/daybook_core/sync/full.rs-783-798 (1)
783-798: ⚠️ Potential issue | 🟠 Major — Don't crash the worker if the snapshot caller disappears.

If the requester is cancelled after sending `GetPeerSyncSnapshot`, this `expect(...)` will panic the full-sync worker. The `SetPeer`/`DelPeer` branches already treat a dropped caller as harmless; this branch should do the same.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 783 - 798, The GetPeerSyncSnapshot handler currently calls resp.send(snapshot).expect(...), which will panic the full-sync worker if the requester was dropped; change it to treat a dropped caller as harmless like the SetPeer/DelPeer branches do by sending the snapshot and ignoring any send error (e.g., perform the send and discard the Result or check is_err() and do nothing). Update the Msg::GetPeerSyncSnapshot branch to not unwrap or expect the send result so the worker doesn't crash when the snapshot caller disappears.

src/am_utils_rs/repo/changes.rs-410-428 (1)
410-428: ⚠️ Potential issue | 🟠 Major — Invoke listeners after releasing the manager locks.

The switchboard currently holds `listeners`/`local_listeners` while running arbitrary callbacks. A callback that registers, drops, or otherwise re-enters the manager can deadlock the notifier, and a slow callback blocks every other listener behind the same mutex.

Also applies to: 430-449
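The snapshot-then-dispatch pattern the comment asks for can be shown with a minimal stand-in for the switchboard (names are illustrative, not the real `changes.rs` types):

```rust
use std::sync::{Arc, Mutex};

type Callback = Arc<dyn Fn(&str) + Send + Sync>;

// Minimal stand-in for the change-listener switchboard.
struct ListenerManager {
    listeners: Mutex<Vec<Callback>>,
}

impl ListenerManager {
    fn register(&self, cb: Callback) {
        self.listeners.lock().unwrap().push(cb);
    }

    fn notify(&self, event: &str) {
        // Snapshot the callbacks while the lock is held...
        let snapshot: Vec<Callback> = self.listeners.lock().unwrap().clone();
        // ...then invoke them with the lock released, so a callback that
        // re-enters the manager (register/drop) cannot deadlock it, and a
        // slow callback does not serialize unrelated registrations.
        for cb in snapshot {
            cb(event);
        }
    }

    fn listener_count(&self) -> usize {
        self.listeners.lock().unwrap().len()
    }
}

fn main() {
    let mgr = Arc::new(ListenerManager { listeners: Mutex::new(Vec::new()) });
    let inner = Arc::clone(&mgr);
    mgr.register(Arc::new(move |_event| {
        // Re-entrant registration: would deadlock if notify held the lock here.
        inner.register(Arc::new(|_| {}));
    }));
    mgr.notify("doc-changed");
    assert_eq!(mgr.listener_count(), 2);
}
```

Cloning `Arc`s is cheap, so the snapshot costs one short critical section; filtering (`notification_matches_filter`) can happen while building the snapshot so only the callback invocation runs unlocked.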
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/changes.rs` around lines 410 - 428, The code is invoking listener callbacks while holding the listeners mutex, risking deadlocks and blocking; change the logic in the SwitchboardInput::Remote branch (and the similar block at 430-449) to first collect the callbacks and their corresponding notification vectors while the lock is held (e.g., build a Vec of (listener_on_change_clone, notifications) or clone the listener objects/closures and their relevant_notifications), then drop the listeners lock before iterating that collection to call each (listener.on_change)(notifications); ensure you still use notification_matches_filter and preserve listener.filter when building the collection so all work that touches self.listeners happens under the lock and only the actual callback invocation runs after the lock is released.

src/daybook_core/sync/full.rs-1729-1733 (1)
1729-1733: ⚠️ Potential issue | 🟠 Major — Ignore stale blob-backoff messages.

Blob workers report back over an async channel, so a backoff message can arrive after the blob was already marked synced or stopped during a partition refresh. `expect("blob backoff requested by non-active worker")` turns that race into a panic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1729 - 1733, Replace the panic-causing remove/expect with a safe conditional: check self.active_blobs.remove(&hash) with if let Some(active) = ... { await active.stop_token.stop().await?; } else { return Ok(()); } (or otherwise early-return/ignore) so stale blob-backoff messages for non-active workers are ignored instead of panicking; keep using active.stop_token.stop().await? to propagate stop errors when present.

src/am_utils_rs/repo/partition.rs-802-814 (1)
802-814: ⚠️ Potential issue | 🟠 Major — Make txid allocation atomic.

This is a split read/modify/write against a shared counter. With pooled concurrent writers, two transactions can observe the same `next_txid` and then race into a lost update or busy/snapshot failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 802 - 814, The alloc_txid function performs a non-atomic read/modify/write on big_repo_meta (key META_NEXT_TXID_KEY) and must be made atomic; change alloc_txid to perform the increment and fetch inside a single atomic DB operation or a transaction with a write lock: either use a single SQL statement that increments and returns the new value (e.g. "UPDATE big_repo_meta SET value = CAST(value AS INTEGER)+1 WHERE key = ? RETURNING value" and subtract 1 to get the allocated id) or begin a transaction with an immediate write lock (BEGIN IMMEDIATE or sqlx transaction via conn.begin().await), then SELECT the current value, UPDATE it, and commit before returning the old value; update references in alloc_txid accordingly to use the transaction/RETURNING path and ensure errors roll back.

src/daybook_core/sync/full.rs-126-158 (1)
126-158: ⚠️ Potential issue | 🟠 Major — Fix the current Clippy gate failure on `SyncProgressMsg`.

CI is already red here: Clippy flags `enum_variant_names` because every variant starts with `Blob`. Rename the variants or add a targeted allow if the prefix is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 126 - 158, Clippy flags the SyncProgressMsg enum for enum_variant_names because every variant is prefixed with "Blob"; either rename the variants to remove the redundant "Blob" prefix (e.g., BlobWorkerStarted -> WorkerStarted, BlobDownloadStarted -> DownloadStarted, etc.) or add a targeted allow attribute to suppress the lint (place #[allow(clippy::enum_variant_names)] directly above the SyncProgressMsg declaration) so CI stops failing; update any matching pattern usages (match arms, constructors) to the new variant names if you rename them.
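If the shared prefix is intentional, the targeted-allow route looks like this; the variant names below are illustrative placeholders, not the actual `SyncProgressMsg` variants:

```rust
// A targeted allow keeps an intentional shared prefix while silencing the
// `enum_variant_names` lint for this one type only. The `dead_code` allow is
// just so the standalone sketch compiles cleanly.
#[allow(clippy::enum_variant_names, dead_code)]
enum SyncProgressMsg {
    BlobQueued,
    BlobStarted,
    BlobFinished,
}

fn main() {
    let msg = SyncProgressMsg::BlobStarted;
    assert!(matches!(msg, SyncProgressMsg::BlobStarted));
}
```

Scoping the attribute to the enum (rather than the module or crate) keeps the lint active everywhere else.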
🟡 Minor comments (8)
AGENTS.md-35-35 (1)
35-35: ⚠️ Potential issue | 🟡 Minor — Fix two small wording issues in guidance text.
These read as typos and reduce clarity.
✏️ Suggested patch
```diff
-  - Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actors based program like this.
+  - Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actor-based program like this.
 ...
-> You're begged not to be lazy wth solutions.
+> You're begged not to be lazy with solutions.
```

Also applies to: 96-96
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 35, Replace the phrasing "an actors based program" with the corrected hyphenated form "an actor-based program" in the AGENTS.md guidance text (update the occurrence at the line containing "Concealing channel errors hides lifecycle and liveness issues that are hard to diagnose in an actors based program like this." and the duplicate at the other location). Ensure both instances use "actor-based" (hyphenated, singular "actor") for clarity and consistency.

src/am_utils_rs/repo/iroh.rs-308-311 (1)
308-311: ⚠️ Potential issue | 🟡 Minor — Same typos repeated: "absruptly" → "abruptly", "befor" → "before".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 308 - 311, Fix the typos in the user-facing/error strings in the iroh connection-close path: change "absruptly" to "abruptly" in the close reason passed to connection.close and change "befor" to "before" in the formatted AcceptError message that includes endpoint_id (the block creating iroh::protocol::AcceptError::from_boxed and calling connection.close with the reason string). Ensure both strings are corrected to read "abruptly" and "before" respectively.

src/am_utils_rs/repo/iroh.rs-264-268 (1)
264-268: ⚠️ Potential issue | 🟡 Minor — Same typos in accept handler: "absruptly" → "abruptly", "befor" → "before".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 264 - 268, Fix the typos in the accept handler error messages: change "absruptly" to "abruptly" and "befor" to "before" inside the string literals used in the log and in the AcceptError construction (look for the message "samod connection stream ended absruptly..." and the formatted message "connection stream ended befor disconnect with {endpoint_id}" in the accept handler in iroh.rs). Ensure both the logged message and the AcceptError::from_boxed(...) string use the corrected spellings.

src/am_utils_rs/repo/iroh.rs-108-108 (1)
108-108: ⚠️ Potential issue | 🟡 Minor — Same typo: "absruptly" → "abruptly".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` at line 108, Fix the typo in the connection close message: in the conn.close(...) call (the b"samod connection stream ended absruptly" literal), change "absruptly" to "abruptly" so the byte string becomes b"samod connection stream ended abruptly".

src/am_utils_rs/repo/iroh.rs-80-84 (1)
80-84: ⚠️ Potential issue | 🟡 Minor — Fix typos in error messages.
Multiple typos in these error messages: "absruptly" → "abruptly", "befor" → "before".
📝 Proposed fix
```diff
-conn.close(500u32.into(), b"samod connection stream ended absruptly");
+conn.close(500u32.into(), b"samod connection stream ended abruptly");
 eyre::bail!(
-    "connection stream ended befor disconnect"
+    "connection stream ended before disconnect"
 );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 80 - 84, Fix the typos in the error strings around the connection close and bail calls: update the message passed to conn.close(500u32.into(), ...) to "samod connection stream ended abruptly" and change the eyre::bail! invocation to "connection stream ended before disconnect" so the strings in the conn.close call and eyre::bail! are spelled correctly; locate these in the iroh.rs function where conn.close and eyre::bail! are used together.

src/am_utils_rs/repo/iroh.rs-115-115 (1)
115-115: ⚠️ Potential issue | 🟡 Minor — Typo: "atttempts" → "attempts".
📝 Proposed fix
```diff
-reason: last_reason.unwrap_or_else(|| "max re-connention atttempts reached".into()),
+reason: last_reason.unwrap_or_else(|| "max reconnection attempts reached".into()),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` at line 115, The error message string passed in last_reason.unwrap_or_else(|| "...".into()) contains a typo; update the fallback literal to read "max reconnection attempts reached" (or similar correct wording) so the code using last_reason.unwrap_or_else(|| "...".into()) uses the corrected phrase.

src/am_utils_rs/repo/iroh.rs-129-132 (1)
129-132: ⚠️ Potential issue | 🟡 Minor — Avoid `.unwrap()` on the spawned future; log errors instead.

Same issue as in `node.rs`: the spawned task will panic if an error occurs.

🐛 Proposed fix
```diff
 let join_handle = tokio::spawn(
     async {
-        fut.await.unwrap();
+        if let Err(err) = fut.await {
+            warn!(?err, "iroh connector task exited with error");
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 129 - 132, The spawned task in iroh.rs currently calls fut.await.unwrap() (inside the tokio::spawn block), which will panic on errors; replace the unwrap with proper error handling by awaiting the future and matching its Result (e.g., if let Err(e) / match fut.await) and logging the error via the project's logger/tracing (instead of panicking); keep the spawn and join_handle usage but handle Err(e) from fut.await and call error!/tracing::error with a contextual message and the error value.

src/am_utils_rs/sync/node.rs-130-130 (1)
130-130: ⚠️ Potential issue | 🟡 Minor — Avoid `.unwrap()` on the spawned future; log errors instead.

The current code will panic if the worker loop returns an error. This could cause unexpected crashes in production.
🐛 Proposed fix
```diff
-let join_handle = tokio::spawn(async { fut.await.unwrap() });
+let join_handle = tokio::spawn(async {
+    if let Err(err) = fut.await {
+        warn!(?err, "SyncNodeWorker exited with error");
+    }
+});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/node.rs` at line 130, The spawned task currently calls fut.await.unwrap() which will panic on error; change the tokio::spawn closure to await the future and handle the Result instead of unwrapping — e.g. replace tokio::spawn(async { fut.await.unwrap() }) with an async block that matches on fut.await (or uses if let Err(e) = fut.await) and logs the error via the crate's logger (e.g. tracing::error! or log::error!) including context about the worker/loop, then return/exit gracefully; keep the variable names join_handle, tokio::spawn, and fut.await in place so you edit the same spawn site in node.rs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28c3fd47-7945-491f-bf23-6f2723d4ea25
⛔ Files ignored due to path filters (5)
- `Cargo.lock` is excluded by `!**/*.lock` and included by `**/*`
- `docs/DEVDOC/log.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
- `docs/DEVDOC/todo.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
- `flake.lock` is excluded by `!**/*.lock` and included by `**/*`
- `src/daybook_compose/composeApp/src/commonMain/kotlin/org/example/daybook/uniffi/daybook_ffi.kt` is excluded by `!src/daybook_compose/composeApp/src/commonMain/kotlin/org/example/daybook/uniffi/**` and included by `**/*`
📒 Files selected for processing (72)
- `.github/workflows/builds.yml`
- `AGENTS.md`
- `Cargo.toml`
- `src/am_utils_rs/Cargo.toml`
- `src/am_utils_rs/changes.rs`
- `src/am_utils_rs/codecs.rs`
- `src/am_utils_rs/iroh.rs`
- `src/am_utils_rs/lib.rs`
- `src/am_utils_rs/repo.rs`
- `src/am_utils_rs/repo/changes.rs`
- `src/am_utils_rs/repo/changes/broker.rs`
- `src/am_utils_rs/repo/iroh.rs`
- `src/am_utils_rs/repo/iroh/codec.rs`
- `src/am_utils_rs/repo/partition.rs`
- `src/am_utils_rs/sync.rs`
- `src/am_utils_rs/sync/node.rs`
- `src/am_utils_rs/sync/peer.rs`
- `src/am_utils_rs/sync/protocol.rs`
- `src/am_utils_rs/sync/store.rs`
- `src/daybook_cli/Cargo.toml`
- `src/daybook_cli/main.rs`
- `src/daybook_core/Cargo.toml`
- `src/daybook_core/app.rs`
- `src/daybook_core/blobs.rs`
- `src/daybook_core/build.rs`
- `src/daybook_core/config.rs`
- `src/daybook_core/drawer.rs`
- `src/daybook_core/drawer/facet_recovery.rs`
- `src/daybook_core/drawer/types.rs`
- `src/daybook_core/e2e.rs`
- `src/daybook_core/e2e/doc_created_wflow.rs`
- `src/daybook_core/index.rs`
- `src/daybook_core/index/doc_blobs.rs`
- `src/daybook_core/index/facet_ref.rs`
- `src/daybook_core/local_state.rs`
- `src/daybook_core/plugs.rs`
- `src/daybook_core/repo.rs`
- `src/daybook_core/repos.rs`
- `src/daybook_core/rt.rs`
- `src/daybook_core/rt/dispatch.rs`
- `src/daybook_core/rt/switch.rs`
- `src/daybook_core/rt/triage.rs`
- `src/daybook_core/rt/wash_plugin.rs`
- `src/daybook_core/stores.rs`
- `src/daybook_core/sync.rs`
- `src/daybook_core/sync/bootstrap.rs`
- `src/daybook_core/sync/full.rs`
- `src/daybook_core/sync/full/blob_worker.rs`
- `src/daybook_core/sync/full/doc_worker.rs`
- `src/daybook_core/sync/stress.rs`
- `src/daybook_core/sync/tests/stress.rs`
- `src/daybook_core/tables.rs`
- `src/daybook_core/tincans.rs`
- `src/daybook_ffi/repos/blobs.rs`
- `src/daybook_ffi/repos/config.rs`
- `src/daybook_ffi/repos/dispatch.rs`
- `src/daybook_ffi/repos/drawer.rs`
- `src/daybook_ffi/repos/plugs.rs`
- `src/daybook_ffi/repos/tables.rs`
- `src/daybook_ffi/rt.rs`
- `src/daybook_fuse/lib.rs`
- `src/daybook_server/main.rs`
- `src/daybook_types/Cargo.toml`
- `src/daybook_types/doc.rs`
- `src/daybook_types/lib.rs`
- `src/daybook_types/test.rs`
- `src/daybook_types/wit.rs`
- `src/infra/ollama.tf`
- `src/utils_rs/lib.rs`
- `src/utils_rs/testing.rs`
- `src/xtask/Cargo.toml`
- `src/xtask/main.rs`
💤 Files with no reviewable changes (4)
- src/daybook_core/repos.rs
- src/am_utils_rs/iroh.rs
- src/daybook_core/local_state.rs
- src/am_utils_rs/changes.rs
```rust
    Ok(exists == 1)
}

pub async fn list_partitions_for_peer(&self, _peer: &PeerKey) -> Res<Vec<PartitionSummary>>
```
Actually enforce peer scoping on this API surface.
`peer` is ignored everywhere here, and `get_docs_full_in_partitions()` trusts a caller-supplied partition list. As written, any connected peer can enumerate partition IDs, request member/doc pages for arbitrary partitions, subscribe to those partitions, and then fetch the underlying docs, which leaves the `AccessDenied` / `DocAccessDenied` paths effectively unused.
Also applies to: 419-423, 442-446, 465-469, 553-558
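The membership check the finding calls for can be sketched with an in-memory index; the real store would derive this from its SQL tables, and every name below (`PartitionAcl`, `check_access`, etc.) is an assumption for illustration:

```rust
use std::collections::{HashMap, HashSet};

type PeerKey = String;
type PartitionId = String;

#[derive(Debug, PartialEq)]
enum PartitionError {
    AccessDenied,
}

// Hypothetical in-memory membership index keyed by peer.
struct PartitionAcl {
    by_peer: HashMap<PeerKey, HashSet<PartitionId>>,
}

impl PartitionAcl {
    /// Derive the allowed set server-side instead of trusting the caller.
    fn list_partitions_for_peer(&self, peer: &str) -> Vec<PartitionId> {
        let mut out: Vec<PartitionId> = self
            .by_peer
            .get(peer)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        out.sort();
        out
    }

    /// Reject any requested partition the peer is not a member of.
    fn check_access(&self, peer: &str, requested: &[PartitionId]) -> Result<(), PartitionError> {
        let allowed = self.by_peer.get(peer);
        for partition in requested {
            if !allowed.is_some_and(|set| set.contains(partition)) {
                return Err(PartitionError::AccessDenied);
            }
        }
        Ok(())
    }
}

fn main() {
    let acl = PartitionAcl {
        by_peer: HashMap::from([(
            "peer-a".to_string(),
            HashSet::from(["p1".to_string(), "p2".to_string()]),
        )]),
    };
    assert_eq!(acl.list_partitions_for_peer("peer-a"), vec!["p1", "p2"]);
    assert!(acl.check_access("peer-a", &["p1".to_string()]).is_ok());
    assert_eq!(
        acl.check_access("peer-b", &["p1".to_string()]),
        Err(PartitionError::AccessDenied)
    );
}
```

Running every page, subscription, and doc fetch through the same `check_access` gate keeps the `AccessDenied` / `DocAccessDenied` paths meaningful.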
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/repo/partition.rs` at line 372, The API is ignoring the
supplied PeerKey (e.g. in list_partitions_for_peer) and trusting caller-supplied
partition lists (e.g. get_docs_full_in_partitions), allowing any peer to
enumerate, subscribe, and fetch arbitrary partitions; fix this by enforcing peer
scoping: in list_partitions_for_peer, filter and return only partitions that the
supplied PeerKey is a member of using the internal membership/index data
structures; in get_docs_full_in_partitions and the other affected methods (the
ones around the referenced ranges) validate each requested partition against
that same membership check and reject with AccessDenied/DocAccessDenied if the
peer is not authorized instead of silently using the list, and do not accept
caller-provided partition lists unchecked—derive allowed partitions server-side
and apply permission checks before returning summaries, pages, subscriptions, or
docs.
```rust
fn object_paths(&self, hash: &str) -> Res<ObjectPaths> {
    if !hash.is_ascii() || hash.len() < 4 {
        eyre::bail!("invalid blob hash: {hash}");
    }
    let Some(l0) = hash.get(0..2) else {
        eyre::bail!("invalid blob hash: {hash}");
    };
    let Some(l1) = hash.get(2..4) else {
        eyre::bail!("invalid blob hash: {hash}");
    };
    let dir = self.root.join("objects").join(l0).join(l1);
    Ok(ObjectPaths {
        blob: dir.join(format!("{hash}.blob")),
        meta: dir.join(format!("{hash}.meta")),
        dir,
    })
```
Reject non-hash input before building filesystem paths.

`object_paths()` only checks ASCII and length, then joins raw hash bytes into directories and filenames. A caller can smuggle separators or an absolute prefix through here, and `get_path()` / `put_from_store()` will read or create paths outside `self.root`.
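A standalone validation predicate shows why restricting the alphabet is enough: hex digits can never encode `/`, `\`, `.` or a drive prefix, so any path built purely from them stays under the root. (The function name and minimum length are illustrative; the repo's proposed guard reuses `daybook_hash_to_iroh_hash` instead.)

```rust
/// Accept only ASCII-hex hashes of a plausible length, so the value can never
/// contain path separators, "..", or an absolute prefix.
fn is_valid_blob_hash(hash: &str) -> bool {
    hash.len() >= 4 && hash.chars().all(|c| c.is_ascii_hexdigit())
}

fn main() {
    assert!(is_valid_blob_hash("deadbeef"));
    assert!(!is_valid_blob_hash("../etc/passwd")); // traversal rejected
    assert!(!is_valid_blob_hash("ab/cd"));         // separator rejected
    assert!(!is_valid_blob_hash("ab"));            // too short to shard
}
```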
🔒 Proposed guard
```diff
 fn object_paths(&self, hash: &str) -> Res<ObjectPaths> {
-    if !hash.is_ascii() || hash.len() < 4 {
-        eyre::bail!("invalid blob hash: {hash}");
-    }
+    daybook_hash_to_iroh_hash(hash)
+        .map_err(|_| eyre::eyre!("invalid blob hash: {hash}"))?;
     let Some(l0) = hash.get(0..2) else {
         eyre::bail!("invalid blob hash: {hash}");
     };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/daybook_core/blobs.rs` around lines 255 - 270, The object_paths function
currently trusts the raw hash and can be abused to escape self.root; validate
and canonicalize the hash before building paths by rejecting any hash containing
non-hex characters or path separators (e.g. '/', '\\'), disallowing '.' or any
characters that could form relative/absolute paths, and enforce the minimum
length check already present; update object_paths (and ensure callers like
get_path and put_from_store use it) to only accept hashes where
hash.chars().all(|c| c.is_ascii_hexdigit()) (and no separator/period characters)
so the returned ObjectPaths are guaranteed to be under self.root.
Actionable comments posted: 14
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
src/daybook_core/config.rs (1)
273-281: ⚠️ Potential issue | 🟠 Major — Don't panic on drifted replicated config state.

`hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH)` is consuming synced Automerge data, so a missing or reshaped remote `config` path can still crash the worker. Handle that `None` case as a normal error or logged rejection instead of panicking on external input. Based on learnings, "When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/config.rs` around lines 273 - 281, The current call to hydrate_path_at_heads(...).await?.expect(ERROR_INVALID_PATCH) can panic if the replicated config path is missing or reshaped; replace the expect panic with explicit error handling that returns an eyre error (e.g., eyre::bail! or Err(eyre::eyre!(...))) or logs and returns a controlled error instead. Locate the chain off big_repo.hydrate_path_at_heads::<ConfigStore>(&self.app_doc_id, &heads, automerge::ROOT, vec![ConfigStore::prop().into()]) and handle the Option result from .await? by matching None to an eyre::bail! with a clear message referencing ERROR_INVALID_PATCH (or similar) so external/replicated input no longer causes a panic.

src/daybook_core/repo.rs (1)
204-235: ⚠️ Potential issue | 🟠 Major — Stop `big_repo` on any post-boot failure.

Once `BigRepo::boot` succeeds, every later `?` here can exit before `big_repo_stop` is stored in `RepoCtx`. A failed connector spawn, globals init, init dance, or marker write will leave the repo running in the background after `open_inner` returns `Err`. Add a cleanup block that calls `big_repo_stop.stop().await` before propagating those errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/repo.rs` around lines 204 - 235, After booting BigRepo via am_utils_rs::BigRepo::boot you must ensure big_repo_stop is awaited on failure paths to avoid leaving the repo running; modify open_inner so that any early-returning ? after boot (e.g. errors from big_repo.spawn_ws_connector, init_from_globals, Self::run_repo_init_dance, mark_repo_initialized, etc.) first calls big_repo_stop.stop().await (or equivalent) before propagating the error, and ensure that on the successful path big_repo_stop is stored in RepoCtx as before; in short, add a cleanup/guard that calls big_repo_stop.stop().await on all error exits after BigRepo::boot but before RepoCtx takes ownership of big_repo_stop.

src/daybook_core/plugs.rs (1)
903-915: ⚠️ Potential issue | 🟠 Major — Don't panic on drifted replicated plug manifests.

This hydration is consuming synced manifest data, but `expect(ERROR_INVALID_PATCH)` will still take down the plugs worker if a remote doc shape drifts or the manifest path disappears. Return or log a normal error here instead of panicking on external input. Based on learnings, "When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/plugs.rs` around lines 903 - 915, The code currently calls expect(ERROR_INVALID_PATCH) after hydrate_path_at_heads(...) which will panic if the manifest path is missing; replace that panic with an eyre::bail! (or return Err(eyre::eyre(...))) path: match the Option returned by hydrate_path_at_heads for None and return a descriptive error using eyre::bail! including context values (e.g., self.app_doc_id, id.clone(), heads.0) so the worker returns an error instead of panicking; make this change in the block that uses self.big_repo.hydrate_path_at_heads::<Versioned<ThroughJson<Arc<manifest::PlugManifest>>>> and remove expect(ERROR_INVALID_PATCH).

src/daybook_cli/main.rs (2)
702-711: ⚠️ Potential issue | 🔴 Critical — Don't finalize the clone before the full-sync pass succeeds.

`mark_repo_initialized()` now runs before the second `RepoCtx::open(...)` / `wait_for_full_sync(...)` phase. If that later phase fails, the destination is left flagged as initialized even though it only contains the bootstrap docs, so subsequent CLI runs will open a partial clone as if it were healthy. Delay the marker or roll it back on failure.

Also applies to: 714-789
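The ordering fix reduces to a simple invariant: the marker write must be the last fallible-free step. A minimal sketch with synchronous stand-ins (the phase names are assumptions, not the CLI's real functions):

```rust
// Illustrative phases of the clone; names are assumptions, not the real API.
fn bootstrap_docs() -> Result<(), String> {
    Ok(())
}

fn wait_for_full_sync(succeed: bool) -> Result<(), String> {
    if succeed { Ok(()) } else { Err("full sync failed".into()) }
}

/// Only flip the initialized marker after every phase has succeeded, so a
/// failed sync can never leave a partial clone looking healthy.
fn clone_repo(initialized: &mut bool, sync_ok: bool) -> Result<(), String> {
    bootstrap_docs()?;
    wait_for_full_sync(sync_ok)?;
    *initialized = true;
    Ok(())
}

fn main() {
    let mut initialized = false;
    assert!(clone_repo(&mut initialized, false).is_err());
    assert!(!initialized); // failure leaves the marker unset

    assert!(clone_repo(&mut initialized, true).is_ok());
    assert!(initialized);
}
```

The rollback alternative (write early, unmark in an error path) works too, but is strictly harder to get right if the process dies between the write and the rollback.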
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_cli/main.rs` around lines 702 - 711, The code marks the destination repo initialized (mark_repo_initialized) before the second clone/full-sync phase (RepoCtx::open and wait_for_full_sync) completes; move the call to daybook_core::repo::mark_repo_initialized(&destination).await? so it runs only after RepoCtx::open(...)/wait_for_full_sync(...) have succeeded, or if you prefer, wrap the full-sync phase in a try/catch and roll back the initialized marker on any failure (i.e., call an unmark/cleanup path) to ensure a failed full-sync doesn't leave the repo flagged as initialized; adjust the sequence around daybook_core::app::globals::set_init_state, RepoCtx::open, wait_for_full_sync, and big_repo_stop.stop accordingly.
1192-1206: ⚠️ Potential issue | 🔴 Critical — `exec ... --branch` is inverted and can panic.

The explicit-branch path currently errors when the branch exists, then Line 1206 blindly `unwrap()`s the missing-branch lookup. That makes valid `--branch` values unusable and turns invalid ones into a process crash; resolve the lookup with `ok_or_else(...)` (or negate the condition) and return an error instead.

Possible fix
```diff
             Some(val) => {
-                if branches.branches.contains_key(val) {
+                if !branches.branches.contains_key(val) {
                     eyre::bail!("branch not found for doc: {doc_id} - {val}");
                 }
                 daybook_types::doc::BranchPath::from(val.as_str())
             }
             None => {
                 let Some(branch) = branches.main_branch_path() else {
                     eyre::bail!("no branches found on doc: {doc_id}");
                 };
                 branch
             }
         };
-        let heads = branches.branches.get(&branch_path.to_string()).unwrap();
+        let branch_key = branch_path.to_string();
+        let heads = branches
+            .branches
+            .get(&branch_key)
+            .ok_or_else(|| eyre::eyre!("branch not found for doc: {doc_id} - {branch_key}"))?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_cli/main.rs` around lines 1192 - 1206, The branch existence check is inverted and the subsequent branches.branches.get(...).unwrap() can panic; in the match that builds branch_path (the block using branch: Some(val) => { if branches.branches.contains_key(val) { eyre::bail!(...) } ... }), flip the contains_key condition (or remove the negation) so it bails when the branch is missing, and replace the final branches.branches.get(&branch_path.to_string()).unwrap() with a safe lookup that returns an error (e.g., .get(...).ok_or_else(|| eyre::eyre!("branch not found for doc: {doc_id} - {}", branch_path)) ?) to return a proper eyre error instead of panicking; keep use of daybook_types::doc::BranchPath::from and branches.main_branch_path() as-is.

src/daybook_core/tables.rs (1)
411-418: ⚠️ Potential issue | 🔴 Critical

Don't `expect()` remote table entities at final heads.

`TablesRepo::notifs_loop` hydrates add/change events at the batch's final `heads`. A remote batch can legitimately contain add/change plus delete for the same id, which makes `hydrate_path_at_heads(...)` return `None` here and currently panics the notif worker; even returning `Err` would still tear the task down because the spawn wrapper does `.expect("error handling notifs")`. Handle this as a recoverable skip/log path inside the loop instead of taking the repo down.

Based on learnings: When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!().
Also applies to: 478-500, 511-531, 541-562, 572-593
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/tables.rs` around lines 411 - 418, The notif worker currently treats hydrate_path_at_heads(...) returning None as a fatal error (leading to panic via expect in the tokio::spawn wrapper), so update TablesRepo::notifs_loop to treat a None from hydrate_path_at_heads (or other remote-validation failures) as a recoverable case: log a warning with context (id, batch heads, source) and continue the loop instead of returning Err or panicking; when you need to propagate validation failures from external inputs use eyre::bail!() inside validation helpers but handle those errors in notifs_loop by logging and skipping the problematic notif rather than terminating the task; also remove the .expect("error handling notifs") in the spawn wrapper (worker_handle) and replace it with non-panicking error handling (log the error and exit the task gracefully) and apply the same change to the other notif-handling sites mentioned (the other notifs_loop invocation blocks).

src/daybook_core/rt/dispatch.rs (1)
137-144: ⚠️ Potential issue | 🔴 Critical

Handle remote dispatch churn without panicking.

This repo still has two panic paths on shared automerge input: updates on an existing dispatch hit `panic!("dispatch update detected")`, and add/remove churn at the same final `heads` makes `hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH)` blow up on `DispatchAdded`. These need to be handled as recoverable branches inside the loop; returning `Err` alone still panics because the spawned worker wraps `notifs_loop` with `.expect("error handling notifs")`.

Based on learnings: When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() to return errors rather than panicking (panic!()) or leaving unimplemented!().
Also applies to: 205-235, 267-309
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 137 - 144, Replace panic! and unwrap/expect paths with eyre::bail!/error handling and avoid panicking in the spawned worker: in notifs_loop replace the panic!("dispatch update detected") (and any other panic!/unimplemented!()) with eyre::bail!() including context (e.g., "dispatch update detected for dispatch_id=..."), and change hydrate_path_at_heads(...).expect(ERROR_INVALID_PATCH) to propagate a Result (use ? or map_err(|e| eyre::eyre!("{}: {}", ERROR_INVALID_PATCH, e))) so validation failures return Err rather than panicking; finally, update the tokio::spawn wrapper around repo.notifs_loop (the place calling repo.notifs_loop(notif_rx, cancel_token).await.expect("error handling notifs")) to remove the .expect and handle the Result explicitly (log the eyre::Report on Err and exit the task cleanly) so returned errors are recoverable and do not cause a panic.
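The "recoverable skip instead of panic" pattern asked for in the two findings above can be shown in isolation. This is a simplified sketch, not the repo's actual worker: `Notif` and `handle_batch` are hypothetical, and a missing `hydrated` value models `hydrate_path_at_heads(...)` returning `None` for a remote batch.

```rust
// Hypothetical event shape: `hydrated == None` models hydration finding
// nothing at the batch's final heads (e.g. add + delete in one remote batch).
struct Notif {
    id: u32,
    hydrated: Option<String>,
}

// Process a batch, treating missing hydration as a recoverable skip rather
// than a panic; returns (handled, skipped) so callers can observe drops.
fn handle_batch(batch: Vec<Notif>) -> (Vec<String>, usize) {
    let mut handled = Vec::new();
    let mut skipped = 0;
    for notif in batch {
        match notif.hydrated {
            Some(value) => handled.push(value),
            None => {
                // In a real worker this would be a warning log carrying the
                // id and batch heads; the loop keeps going either way.
                skipped += 1;
                let _ = notif.id;
            }
        }
    }
    (handled, skipped)
}
```

The same shape applies to the spawn wrapper: instead of `.expect("error handling notifs")`, log the returned error and let the task exit cleanly.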
♻️ Duplicate comments (2)
src/daybook_core/blobs.rs (1)
349-374: ⚠️ Potential issue | 🟠 Major

Reuse the overwrite-safe rename logic in `atomic_copy_file()`.

`put_path_copy()` still does a preflight `try_exists()` before calling this helper, so two writers of the same hash can race. The loser will hit the bare `tokio::fs::rename(&temp, dest)` here, which still fails on Windows once the winner has created `dest`; give this helper the same replace/retry path as `atomic_write()`.

Does `tokio::fs::rename` overwrite an existing destination on Windows, or does it fail while Unix replaces atomically?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/blobs.rs` around lines 349 - 374, atomic_copy_file currently does a plain tokio::fs::rename(&temp, dest) which races on Windows when dest is created by another writer; change atomic_copy_file to use the same overwrite/retry strategy as atomic_write: attempt tokio::fs::rename(&temp, dest) inside a short retry loop, and if rename fails due to the destination already existing (or a platform-specific error), remove the existing dest (tokio::fs::remove_file(dest).await?), then retry rename once (ensuring errors are propagated if retry fails); make sure to still sync the parent directory via self.sync_dir(dir).await and clean up the temp file on unrecoverable errors — reference atomic_copy_file, atomic_write, and put_path_copy to align behavior.

src/am_utils_rs/repo/partition.rs (1)
375-376: ⚠️ Potential issue | 🔴 Critical

Enforce peer scoping before partition list/read/subscribe operations.

`_peer` is ignored, and request paths trust caller-provided partition IDs. This leaves list/events/subscription/doc retrieval open to unauthorized partition access unless all callers pre-filter correctly.

🔐 Suggested hardening pattern
```diff
-    pub async fn list_partitions_for_peer(&self, _peer: &PeerKey) -> Res<Vec<PartitionSummary>> {
+    pub async fn list_partitions_for_peer(&self, peer: &PeerKey) -> Res<Vec<PartitionSummary>> {
+        let allowed = self.authorized_partitions_for_peer(peer).await?;
+        if allowed.is_empty() {
+            return Ok(Vec::new());
+        }
         // existing query filtered by allowed set
     }
 @@
-    pub async fn get_partition_member_events_for_peer(
-        &self,
-        _peer: &PeerKey,
+    pub async fn get_partition_member_events_for_peer(
+        &self,
+        peer: &PeerKey,
         req: &GetPartitionMemberEventsRequest,
     ) -> Res<GetPartitionMemberEventsResponse> {
+        self.assert_peer_allowed_partitions(peer, req.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
 @@
-    pub async fn get_partition_doc_events_for_peer(
-        &self,
-        _peer: &PeerKey,
+    pub async fn get_partition_doc_events_for_peer(
+        &self,
+        peer: &PeerKey,
         req: &GetPartitionDocEventsRequest,
     ) -> Res<GetPartitionDocEventsResponse> {
+        self.assert_peer_allowed_partitions(peer, req.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
 @@
     pub async fn subscribe_partition_events_for_peer(
         &self,
         peer: &PeerKey,
         reqs: &SubPartitionsRequest,
         capacity: usize,
     ) -> Res<tokio::sync::mpsc::Receiver<SubscriptionItem>> {
+        self.assert_peer_allowed_partitions(peer, reqs.partitions.iter().map(|p| &p.partition_id)).await?;
         ...
     }
```

Also applies to: 423-467, 582-610
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/partition.rs` around lines 375 - 376, list_partitions_for_peer currently ignores the _peer argument and trusts caller-supplied partition IDs; update this and the other affected methods (e.g., list_partitions_for_peer, any public list/read/subscribe methods that accept a PeerKey or partition IDs) to enforce peer scoping by using the PeerKey to fetch the set of partitions the peer is allowed to access and filtering incoming requests against that set before returning PartitionSummary or performing reads/subscriptions; specifically, locate the function list_partitions_for_peer and replace the ignored _peer usage with a lookup of allowed partition IDs (via your existing ownership/permission store or an added helper), filter the partitions you would return or operate on to only those owned/allowed for that PeerKey, and return an empty/permission-denied result for any requests that would otherwise access partitions outside that allowed set.
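The core of the suggested `assert_peer_allowed_partitions` helper is a set-membership check over the peer's allowed partitions. A minimal sketch, with hypothetical names and plain `&str` partition ids standing in for `PartitionId`:

```rust
use std::collections::HashSet;

// Hypothetical guard: every requested partition id must be in the peer's
// allowed set, otherwise the whole request is rejected before any read,
// subscription, or doc retrieval happens.
fn assert_peer_allowed<'a>(
    allowed: &HashSet<&'a str>,
    requested: impl IntoIterator<Item = &'a str>,
) -> Result<(), String> {
    for partition in requested {
        if !allowed.contains(partition) {
            return Err(format!("peer not authorized for partition {partition}"));
        }
    }
    Ok(())
}
```

Rejecting the whole request on the first unauthorized partition (rather than silently filtering) makes misconfigured callers visible instead of returning partial results.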
🟡 Minor comments (11)
src/wflow_webui/Cargo.toml-63-63 (1)
63-63: ⚠️ Potential issue | 🟡 Minor

Add `--cfg tokio_unstable` to enable the tracing instrumentation.

Line 63 correctly adds the `tracing` feature to `tokio`, and it is compatible with `rt-multi-thread`. However, the `tracing` instrumentation in Tokio is experimental and requires compiling with `--cfg tokio_unstable` to actually emit runtime/task events. Without this configuration, the feature flag will be present but non-functional. Ensure this cfg is set in your build configuration (e.g., in `.cargo/config.toml` or via RUSTFLAGS).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/wflow_webui/Cargo.toml` at line 63, The tokio dependency in Cargo.toml enables the "tracing" feature but requires compiling with the tokio_unstable cfg to emit runtime/task tracing; update the build config to pass --cfg tokio_unstable (for example by adding the flag in .cargo/config.toml or exporting RUSTFLAGS) so the tracing instrumentation for the tokio = { workspace = true, features = ["rt-multi-thread", "tracing"], optional = true } dependency is actually enabled at compile time.

src/daybook_ffi/Cargo.toml-50-50 (1)
50-50: ⚠️ Potential issue | 🟡 Minor

Tokio's `tracing` feature is unstable; ensure `--cfg tokio_unstable` is configured.

The `tracing` feature in Tokio is an unstable feature that requires `--cfg tokio_unstable` to work. The workspace has this configured in `.cargo/config.toml` under `[build]`, so the change is compatible. This feature enables Tokio runtime instrumentation/events for tools like `tokio-console`, which aligns with the existing `tracing`/`tracing-subscriber` dependencies in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_ffi/Cargo.toml` at line 50, The tokio dependency in Cargo.toml enables the unstable "tracing" feature (the line with tokio = { workspace = true, features = ["rt-multi-thread","sync", "macros", "fs", "tracing"] }), which requires the compiler cfg flag; ensure the build is configured to pass --cfg tokio_unstable (add or verify in .cargo/config.toml under [build] that rustflags includes "--cfg" "tokio_unstable"), or alternatively gate the tracing feature behind a stable feature flag and only enable it when the cfg is set; update config or gating so building with the tokio "tracing" feature succeeds.

src/mltools/Cargo.toml-37-37 (1)
37-37: ⚠️ Potential issue | 🟡 Minor

Note the unstable requirement for tokio's `tracing` feature on Line 37.

Adding tokio's `tracing` feature is syntactically correct, but be aware that this feature is experimental in Tokio 1.x and requires compiling with `--cfg tokio_unstable` for the tracing integration to actually be enabled. Without that compiler flag, specifying `"tracing"` in Cargo.toml alone won't activate the feature: it will be silently ignored due to Tokio's conditional dependency configuration.

Verify that your build setup or CI pipeline includes this flag if you intend to use Tokio's tracing instrumentation (e.g., for `tokio-console`).
Verify each finding against the current code and only fix it if needed. In `@src/mltools/Cargo.toml` at line 37, The Cargo.toml declares tokio = { workspace = true, features = ["rt", "tracing"] } but Tokio's "tracing" feature is unstable and requires compiling with --cfg tokio_unstable to actually enable it; either ensure your build/CI adds the compiler flag (e.g., RUSTFLAGS="--cfg tokio_unstable" or pass --cfg tokio_unstable to rustc in your pipeline) when building/running code that relies on tokio's tracing integration (tokio feature "tracing"), or remove the "tracing" feature from the tokio entry to avoid a silent no-op if you don't intend to enable the unstable tracing support.

src/daybook_core/sync/stress.rs-70-83 (1)
70-83: ⚠️ Potential issue | 🟡 Minor

Use unique event indices in transfer phase.

At Line 82, every transfer-phase event uses the same `idx` (`EVENT_COUNT`), which reduces mutation diversity (e.g., repeated `"branch-mod-64"` writes).

🎯 Suggested fix
```diff
-for _ in 0..(EVENT_COUNT / 2) {
+for transfer_idx in 0..(EVENT_COUNT / 2) {
 @@
-    let _ = apply_event(node, kind, EVENT_COUNT, &mut rng).await?;
+    let _ = apply_event(node, kind, EVENT_COUNT + transfer_idx, &mut rng).await?;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 70 - 83, The transfer-phase loop currently calls apply_event(node, kind, EVENT_COUNT, &mut rng) which passes the same EVENT_COUNT index for every event; change this to generate or derive a unique event index per iteration (e.g., use the loop index, a counter, or rng.gen_range(0..EVENT_COUNT)) and pass that variable instead of EVENT_COUNT so each transfer uses a distinct idx; update the loop in src/daybook_core/sync/stress.rs where active, node, kind are chosen to compute let idx = ... and call apply_event(node, kind, idx, &mut rng).

AGENTS.md-112-113 (1)
112-113: ⚠️ Potential issue | 🟡 Minor

Typo: "mantain" → "maintain".
✏️ Proposed fix
```diff
-Human auditors of this repo don't really read test code or mantain it.
+Human auditors of this repo don't really read test code or maintain it.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 112 - 113, Fix the spelling typo in the AGENTS.md sentence: change "mantain" to "maintain" in the line that reads "Human auditors of this repo don't really read test code or mantain it." — update that string in AGENTS.md so it reads "maintain" to correct the typo.

docs/dict.md-19-19 (1)
19-19: ⚠️ Potential issue | 🟡 Minor

Typo: "peices" → "pieces".
✏️ Proposed fix
```diff
-Documents are mainly made up of facets which are JSON objects describing the different peices of the document.
+Documents are mainly made up of facets which are JSON objects describing the different pieces of the document.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 19, Replace the typo "peices" with the correct spelling "pieces" in the sentence "Documents are mainly made up of facets which are JSON objects describing the different peices of the document." inside docs/dict.md so it reads "...describing the different pieces of the document."

docs/dict.md-94-94 (1)
94-94: ⚠️ Potential issue | 🟡 Minor

Typo: "refefrence" → "reference".
✏️ Proposed fix
```diff
-If the heads set is empty, it's assumed that the referred to facet exists at the same change hash of the facet holding the refefrence.
+If the heads set is empty, it's assumed that the referred-to facet exists at the same change hash of the facet holding the reference.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 94, Fix the typo "refefrence" to "reference" in the sentence that reads "If the heads set is empty, it's assumed that the referred to facet exists at the same change hash of the facet holding the refefrence." Update the word near the phrase "facet holding the reference" so the sentence reads correctly and keep surrounding punctuation and capitalization unchanged.

docs/dict.md-10-11 (1)
10-11: ⚠️ Potential issue | 🟡 Minor

Minor grammar issues.
Line 10: "weather" → "whether"
Line 11: Sentence appears incomplete: "whenever a working connection between devices" — consider "whenever there is a working connection between devices" or similar.

✏️ Proposed fix
```diff
-Changes can be made on any replicas of the same repo, weather it is to add a new document or to modify an existing one.
-Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices.
+Changes can be made on any replicas of the same repo, whether it is to add a new document or to modify an existing one.
+Daybook ensures that these changes are synced to other replicas whenever there is a working connection between devices.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 10 - 11, Fix the two grammar issues in the two sentences shown: change "weather" to "whether" and adjust word order and completeness — e.g., rewrite the first sentence to "Changes can be made on any replica of the same repo, whether it is to add a new document or to modify an existing one." and rewrite the second sentence to "Daybook then ensures that these changes are synced to other replicas whenever there is a working connection between devices." Update the exact phrases in the doc to match this corrected wording.

src/am_utils_rs/repo/iroh.rs-145-148 (1)
145-148: ⚠️ Potential issue | 🟡 Minor

Potential panic in spawned task hides error context.

The `.unwrap()` on `fut.await` will panic if the connection handling fails, potentially crashing the task without useful error context. Consider logging the error or using `.unwrap_or_log()` for consistency with other error handling in this codebase.

Suggested fix
```diff
 let join_handle = tokio::spawn(
     async {
-        fut.await.unwrap();
+        fut.await.unwrap_or_log();
     }
     .instrument(tracing::info_span!(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 145 - 148, The spawned task currently calls fut.await.unwrap() inside tokio::spawn (creating join_handle), which can panic and lose error context; change the task to await the future and handle Err by logging the error (e.g., use fut.await.map_err(|e| { /* log with processLogger/tracing/error! */ }) or match on fut.await and call a logger) instead of unwrap so failures are recorded; update the closure used in tokio::spawn (the block that references fut and produces join_handle) to log the error with the repository's standard logger/unwrap_or_log helper and avoid panicking.

src/am_utils_rs/sync/store.rs-163-163 (1)
163-163: ⚠️ Potential issue | 🟡 Minor

Spawned task panics on error.

Similar to the iroh.rs issue, the `.unwrap()` on the spawned future will panic without context if an error occurs. Consider using `.unwrap_or_log()` for better observability.

Suggested fix
```diff
-    let join_handle = tokio::spawn(async { fut.await.unwrap() });
+    let join_handle = tokio::spawn(async { fut.await.unwrap_or_log() });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/store.rs` at line 163, The spawned task currently calls fut.await.unwrap() inside tokio::spawn which will panic without context; update the spawn closure to handle errors instead of unwrap by calling fut.await.unwrap_or_log() (or otherwise match/map the Result and log the error) so the spawned task logs failures rather than panics — look for the tokio::spawn invocation that assigns join_handle and replace the inner fut.await.unwrap() with fut.await.unwrap_or_log() or an explicit error logging branch.

src/daybook_core/plugs.rs-1723-1747 (1)
1723-1747: ⚠️ Potential issue | 🟡 Minor

`setup_repo` fixture leaks background repo workers.

This fixture drops both `BigRepoStopToken` and `RepoStopToken` without calling `stop()`. Neither type has a `Drop` implementation, so cleanup only occurs if `stop()` is explicitly called. The background workers spawned by `BigRepo::boot()` and `PlugsRepo::load()` will continue running after the fixture returns, affecting all 12+ tests that use `setup_repo`.

Return both stop tokens from the fixture so tests can properly shut them down:

```rust
async fn setup_repo() -> Res<(SharedBigRepo, Arc<PlugsRepo>, DocumentId, tempfile::TempDir, BigRepoStopToken, RepoStopToken)>
```

Then update each test to call `.stop()` on the tokens before completion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/plugs.rs` around lines 1723 - 1747, The setup_repo fixture currently drops background stop tokens returned by BigRepo::boot and PlugsRepo::load, leaking workers; change setup_repo's signature to return the stop tokens (e.g. add BigRepoStopToken and RepoStopToken to the returned tuple) by capturing the second values from BigRepo::boot(...) and PlugsRepo::load(...), and update all tests that call setup_repo to receive those tokens and call .stop() on each token before test completion (use the BigRepoStopToken.stop() and RepoStopToken.stop() methods). Ensure the returned tuple ordering and types (SharedBigRepo, Arc<PlugsRepo>, DocumentId, tempfile::TempDir, BigRepoStopToken, RepoStopToken) match callers.
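The ownership shape being asked for can be sketched without the real repo types. `StopToken` and `setup_repo` below are simplified stand-ins (an atomic flag in place of the actual worker-cancellation machinery); the point is that the fixture hands both tokens back so the test, not the fixture, owns shutdown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Minimal stand-in for BigRepoStopToken/RepoStopToken: stop() flips a shared
// flag that a background worker would observe and exit on.
#[derive(Clone)]
struct StopToken(Arc<AtomicBool>);

impl StopToken {
    fn new() -> Self {
        StopToken(Arc::new(AtomicBool::new(false)))
    }
    fn stop(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_stopped(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

// The fixture returns its stop tokens instead of dropping them, so each test
// can shut the workers down explicitly before completing.
fn setup_repo() -> (String, StopToken, StopToken) {
    ("repo".to_string(), StopToken::new(), StopToken::new())
}
```

Without a `Drop` impl on the tokens, returning them is the only way tests can guarantee the spawned workers actually terminate.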
🧹 Nitpick comments (13)
src/daybook_core/drawer/facet_recovery.rs (1)
9-15: Add targeted tests for `recover_facet_heads_at` and drift/error branches.

The new `read_heads` branch (`get_at` + `length_at`) is currently unexercised in this file’s tests. Please add:
- a historical-heads test for `recover_facet_heads_at`, and
Errviabail!().✅ Suggested test additions
```diff
 #[cfg(test)]
 mod tests {
     use super::*;
 @@
     fn test_recover_facet_heads_merge() -> Res<()> {
 @@
         Ok(())
     }
+
+    #[test]
+    fn test_recover_facet_heads_at_historical_heads() -> Res<()> {
+        let mut doc = Automerge::new();
+        let facet_key = FacetKey::from(WellKnownFacetTag::Note);
+        let facet_key_str = facet_key.to_string();
+        let dmeta_key = format!("{}/main", WellKnownFacetTag::Dmeta.as_str());
+
+        let mut tx = doc.transaction();
+        let facets_id = tx.put_object(automerge::ROOT, "facets", ObjType::Map)?;
+        let dmeta_id = tx.put_object(&facets_id, &dmeta_key, ObjType::Map)?;
+        let dmeta_facets_id = tx.put_object(&dmeta_id, "facets", ObjType::Map)?;
+        let facet_meta_id = tx.put_object(&dmeta_facets_id, &facet_key_str, ObjType::Map)?;
+        let updated_at_id = tx.put_object(&facet_meta_id, "updatedAt", ObjType::List)?;
+        tx.insert(&updated_at_id, 0, 1000i64)?;
+        let first_hash = tx.commit().0.expect("commit should produce hash");
+
+        let historical_heads = doc.get_heads();
+
+        let mut tx2 = doc.transaction();
+        let facets = tx2.get(automerge::ROOT, "facets")?.unwrap().1;
+        let dmeta = tx2.get(&facets, &dmeta_key)?.unwrap().1;
+        let dmeta_facets = tx2.get(&dmeta, "facets")?.unwrap().1;
+        let facet_meta = tx2.get(&dmeta_facets, &facet_key_str)?.unwrap().1;
+        let updated_at = tx2.get(&facet_meta, "updatedAt")?.unwrap().1;
+        tx2.insert(&updated_at, 1, 1001i64)?;
+        tx2.commit();
+
+        let heads_at_old = recover_facet_heads_at(&doc, &facet_key, &historical_heads)?;
+        assert_eq!(heads_at_old, vec![first_hash]);
+        Ok(())
+    }
+
+    #[test]
+    fn test_recover_facet_heads_drifted_shape_returns_err() {
+        let doc = Automerge::new();
+        let facet_key = FacetKey::from(WellKnownFacetTag::Note);
+        let res = recover_facet_heads(&doc, &facet_key);
+        assert!(res.is_err());
+    }
 }
```

Based on learnings: “When validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use eyre::bail!() …; consider adding unit tests that simulate external input drift and ensure
bail!() paths are exercised.” and as per coding guidelines: “When working with Rust, use small tests in addition to `cargo clippy` to validate ideas.”

Also applies to: 50-53
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/drawer/facet_recovery.rs` around lines 9 - 15, The test suite lacks coverage for the new read_heads branch used by recover_facet_heads_at (which calls recover_facet_heads_inner with Some(heads)) and the external-drift/error path that should bail!(); add two unit tests: one "historical-heads" test that constructs an Automerge doc and facet metadata to exercise the get_at + length_at branch via recover_facet_heads_at and asserts the expected Vec<ChangeHash> result, and one "malformed/external-drift" test that feeds a deliberately malformed dmeta/Automerge shape to recover_facet_heads_at (or recover_facet_heads_inner via same call pattern) and asserts that the call returns an Err (triggering eyre::bail!()); use the function names recover_facet_heads_at, recover_facet_heads_inner, and the read_heads logic (get_at and length_at) to locate where to construct inputs and assert error handling.

src/daybook_core/sync/stress.rs (1)
110-133: Add a focused invariant test for topology generation.
`generate_connected_edges` is foundational for this stress suite. A small deterministic test (fixed seeds) for connectedness, canonical edge ordering, and no self-edges would make failures easier to localize.

As per coding guidelines: “When working with Rust, use small tests in addition to
`cargo clippy` to validate ideas.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 110 - 133, Add a small deterministic unit test for the generate_connected_edges function that uses fixed StdRng seeds to validate the topology invariants: ensure the returned Vec<(usize, usize)> contains no self-edges (a != b), each tuple is canonical (first < second consistent with canon_edge), and the undirected graph is connected given NODE_COUNT (use a simple BFS/DFS on the edges to verify all nodes reachable). Run the test for a couple of different fixed seeds to catch RNG-dependent bugs and place it near other tests for sync/stress; reference generate_connected_edges, canon_edge and NODE_COUNT to locate the code.

src/am_utils_rs/Cargo.toml (1)
49-56: Consider workspace-level version management for consistency.
`camino`, `irpc`, `tokio-tungstenite`, and `futures-buffered` are pinned directly here rather than via workspace. If these are used across multiple crates, consider promoting them to workspace dependencies for easier version management.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/Cargo.toml` around lines 49 - 56, Several dependencies (camino, irpc, tokio-tungstenite, futures-buffered) are pinned locally instead of managed at workspace level; move their version declarations to the workspace Cargo.toml and mark them as workspace dependencies here (or remove the explicit version and use workspace = true) so all crates share a single source of truth; specifically add entries for camino = "1.2.2", irpc = "0.12.0" (with features = ["derive"] if needed), tokio-tungstenite and futures-buffered into the workspace [dependencies] (or [patch.crates-io] if pinning), then update this crate’s Cargo.toml to reference them via workspace = true or make them optional here as needed (adjust features like "sqlite"/"runtime-tokio" only where required).

src/am_utils_rs/sync/protocol.rs (2)
36-42: Redundant `has_more` field in `PartitionCursorPage`.

The FIXME on line 40 correctly identifies that `has_more` may be redundant when `next_cursor: Option<CursorIndex>` already indicates whether more data exists. Consider removing `has_more` to simplify the API, unless there's a specific use case where `next_cursor` is `Some` but `has_more` is `false`.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/protocol.rs` around lines 36 - 42, The PartitionCursorPage struct contains a redundant has_more boolean because next_cursor: Option<CursorIndex> already expresses whether there is a next page; remove the has_more field from the struct definition (PartitionCursorPage) and update any serialization/deserialization, constructors, uses, and tests that reference has_more to rely solely on next_cursor.is_some() (or equivalent) instead; if there are places that previously set has_more independently, replace those with has_more = next_cursor.is_some() logic or remove the parameter entirely to keep the API consistent.
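The "derive `has_more` from `next_cursor`" idea can be shown with a simplified page type. `Page` and `page_of` below are illustrative names, not the crate's actual `PartitionCursorPage` API:

```rust
// Simplified page shape: next_cursor alone signals whether a follow-up
// request is needed, so a separate has_more bool would be redundant.
struct Page<T> {
    items: Vec<T>,
    next_cursor: Option<u64>,
}

impl<T> Page<T> {
    fn has_more(&self) -> bool {
        self.next_cursor.is_some()
    }
}

// Paginate a slice with a fixed page size, setting next_cursor only when
// items remain past this page.
fn page_of(items: &[u32], cursor: u64, page_size: usize) -> Page<u32> {
    let start = cursor as usize;
    let end = (start + page_size).min(items.len());
    Page {
        items: items[start..end].to_vec(),
        next_cursor: (end < items.len()).then(|| end as u64),
    }
}
```

With this shape the two fields can never disagree, which removes the "`next_cursor` is `Some` but `has_more` is `false`" case by construction.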
5-9: Acknowledged FIXME: Consider efficiency improvements for frequently cloned types.

The FIXME note about using `Arc<str>` or `[u8; 32]` for `PartitionId` and `PeerKey` is valid. These types are cloned frequently in RPC paths. Consider addressing this in a follow-up to reduce allocation overhead.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/protocol.rs` around lines 5 - 9, The PartitionId and PeerKey type aliases are currently String and are cloned frequently; change them to a cheaper, shareable representation such as Arc<str> (or a fixed-size [u8; 32] newtype if keys are fixed-length) to reduce allocation overhead: replace pub type PartitionId = String; and pub type PeerKey = String; with pub type PartitionId = Arc<str>; pub type PeerKey = Arc<str>; (or define a Key([u8;32]) newtype and use that) and then update all code paths that construct or clone these types to use Arc::from(s) or Key::from_bytes, add From<&str>/From<String> impls and adjust serde/serialization derives (or implement Serialize/Deserialize) so conversions remain ergonomic and cloning becomes cheap for CursorIndex leave as u64.

src/daybook_core/index/doc_blobs.rs (2)
442-452: Test timeout of 60 seconds may be excessive for CI.

The `wait_for_hash` helper uses a 60-second timeout. While this is generous for reliability, it could slow down test failure feedback. Consider a shorter timeout (e.g., 10-15 seconds) with the option to increase via environment variable for slow CI environments.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 442 - 452, The wait_for_hash helper uses a hardcoded 60s deadline which slows CI; change the timeout in wait_for_hash (function) from 60s to a shorter default (e.g., 10–15s) and make it configurable via an environment variable (e.g., DOCBLOBS_WAIT_TIMEOUT_SECS) so DocBlobsIndexRepo tests can override in slow CI; update the deadline construction and document that list_hashes_for_doc polling still uses the same 50ms sleep interval and ensure the env value is parsed safely with a fallback to the new default.
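The env-overridable default can be a few lines. This is a sketch, and the `DOCBLOBS_WAIT_TIMEOUT_SECS` variable name is the assumption from the prompt above, not existing code; `timeout_from` is split out so the parsing is testable without touching the process environment.

```rust
use std::time::Duration;

// Parse an optional override, falling back to a short 15s default so test
// failures surface quickly; non-numeric values also fall back safely.
fn timeout_from(raw: Option<&str>) -> Duration {
    raw.and_then(|value| value.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(15))
}

// Slow CI environments can raise the deadline without a code change.
fn wait_timeout() -> Duration {
    timeout_from(std::env::var("DOCBLOBS_WAIT_TIMEOUT_SECS").ok().as_deref())
}
```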
202-233: Potential inefficiency: DELETE followed by INSERT for reindex.

The `reindex_doc_hashes` method deletes all existing entries for a doc and re-inserts them. For documents with many blob references where only one changes, this could be inefficient. However, for the expected use case (small number of blobs per doc), this simplicity is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 202 - 233, The reindex_doc_hashes function currently deletes all rows for a doc then re-inserts every hash which is inefficient for docs where only a subset of blob refs changed; instead, query existing blob_hashes for the given doc_id from doc_blob_refs, compute the set difference against the provided hashes to derive to_delete and to_insert, execute a single DELETE ... WHERE doc_id = ? AND blob_hash IN (...) for to_delete, and use QueryBuilder to insert only to_insert (and update origin_heads for rows that remain if heads changed); keep references to reindex_doc_hashes, doc_blob_refs, QueryBuilder, hashes, heads and serialized_heads when locating and changing the logic.

src/daybook_server/main.rs (2)
102-103: Unused channel sender.
`_doc_setup_req_tx` is created but never used (underscore prefix indicates intentional non-use). If the doc setup request flow is no longer needed, consider removing the channel entirely to reduce code complexity.

Suggested cleanup
- let (_doc_setup_req_tx, mut doc_setup_req_rx) = - mpsc::unbounded_channel::<DocSetupRequest>(); + let (doc_setup_req_tx, mut doc_setup_req_rx) = + mpsc::unbounded_channel::<DocSetupRequest>();Or if the entire doc_setup flow is deprecated, remove the channel, worker, and related code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_server/main.rs` around lines 102 - 103, The sender variable _doc_setup_req_tx is created but never used; either remove the entire doc_setup channel and related flow or wire it up: delete the let (_doc_setup_req_tx, mut doc_setup_req_rx) = mpsc::unbounded_channel::<DocSetupRequest>(); line and any code that references doc_setup_req_rx/DocSetupRequest if the doc setup flow is deprecated, or keep the channel but use the sender (rename to doc_setup_req_tx) where DocSetupRequest messages should be sent (e.g., from the component that initiates doc setup) and ensure receivers (doc_setup_req_rx) are polled; update related worker/task creation and imports accordingly.
104-111: BigRepo boot uses hardcoded path.

The storage path `/tmp/samod-sync` is hardcoded. Consider making this configurable via environment variable or config for production deployments.

Suggested improvement

```diff
 let (big_repo, _big_repo_stop) = am_utils_rs::BigRepo::boot(am_utils_rs::repo::Config {
     peer_id: "daybook_sync".to_string(),
     storage: am_utils_rs::repo::StorageConfig::Disk {
-        path: "/tmp/samod-sync".into(),
+        path: utils_rs::get_env_var("SAMOD_STORAGE_PATH")
+            .unwrap_or_else(|_| "/tmp/samod-sync".to_string())
+            .into(),
         big_repo_sqlite_url: None,
     },
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_server/main.rs` around lines 104 - 111, The BigRepo boot call currently hardcodes the storage path "/tmp/samod-sync" in the am_utils_rs::repo::StorageConfig::Disk config; change this to read from a configurable source (e.g., an environment variable like DAYBOOK_SYNC_PATH or a config struct) and fall back to "/tmp/samod-sync" if not set, then pass that variable into the path field in the BigRepo::boot invocation so deployments can override the storage location without code changes.

src/am_utils_rs/repo/iroh.rs (1)
269-271: Consider propagating error instead of panicking on acceptor creation.

The `.expect("error making acceptor")` will panic if acceptor creation fails. Since this involves external peer connections, returning an error via `AcceptError` would be more robust and consistent with the error handling pattern used elsewhere in this function.

Suggested fix

```diff
 let acceptor = repo
     .make_acceptor(
         Url::parse(&format!(
             "{CONN_URL_SCHEME}:{}",
             utils_rs::hash::encode_base58_multibase(endpoint_id)
         ))
         .expect(ERROR_IMPOSSIBLE),
     )
-    .expect("error making acceptor");
+    .map_err(|err| {
+        let msg = format!("error making acceptor for {endpoint_id}: {err:?}");
+        connection.close(500u32.into(), msg.as_bytes());
+        iroh::protocol::AcceptError::from_boxed(Box::from(msg))
+    })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/iroh.rs` around lines 269 - 271, Replace the panic-causing .expect("error making acceptor") with proper error propagation: change the acceptor creation call that currently chains .expect(ERROR_IMPOSSIBLE) and .expect("error making acceptor") to return a Result and propagate errors into the function's AcceptError path (e.g., use ? or .map_err(|e| AcceptError::new(...)) to convert the underlying error into AcceptError). Keep ERROR_IMPOSSIBLE logic as needed but ensure the final failure is returned rather than panicked so callers receive an AcceptError instead of a panic.

src/am_utils_rs/sync/store.rs (1)
173-197: Schema lacks foreign key constraint.

The `sync_partition_cursors` table references `peer_key` but doesn't have a foreign key constraint to `sync_known_peers`. This is intentional based on the comment on lines 220-221 (cursors are preserved across unregister/register), but could lead to orphaned cursor rows if peers are never re-registered. Consider adding a cleanup mechanism or documenting this behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/store.rs` around lines 173 - 197, The schema in ensure_schema defines sync_partition_cursors referencing peer_key but lacks a foreign key to sync_known_peers, which can leave orphaned cursor rows; fix by adding a cleanup path instead of changing current preservation behavior: implement a function (e.g., remove_peer_cursors or cleanup_orphaned_cursors) that deletes rows from sync_partition_cursors for a given peer_key (or deletes rows whose peer_key is not present in sync_known_peers) and call it from the peer-unregister flow (where peers are removed) or wire it into a periodic maintenance task; keep ensure_schema and table definitions unchanged but ensure unregister_peer (or equivalent) invokes the new cleanup routine or document this behavior clearly.

src/am_utils_rs/repo/changes.rs (1)
196-264: De-duplicate broker spawn logic in `add_doc_listener`.

Both map-entry branches repeat nearly identical broker/lease creation code. Extracting a shared helper will reduce drift risk.
♻️ Refactor sketch
```diff
+fn build_doc_broker_lease(
+    &self,
+    handle: DocHandle,
+) -> Res<Arc<DocChangeBrokerLease>> {
+    let (broker, stop_token) = broker::spawn_doc_listener(
+        handle,
+        self.cancel_token.child_token(),
+        self.change_tx.clone(),
+        self.head_tx.clone(),
+        Arc::new({
+            let listeners = Arc::clone(&self.listeners);
+            move |doc_id, origin| {
+                let listeners = listeners.lock().expect(ERROR_MUTEX);
+                listeners.iter().any(|listener| {
+                    let doc_ok = listener
+                        .filter
+                        .doc_id
+                        .as_ref()
+                        .map(|target| target.doc_id == *doc_id)
+                        .unwrap_or(true);
+                    let origin_ok = listener
+                        .filter
+                        .origin
+                        .as_ref()
+                        .map(|target| origin_matches_filter(origin, *target))
+                        .unwrap_or(true);
+                    doc_ok && origin_ok
+                })
+            }
+        }),
+    )?;
+    Ok(Arc::new(DocChangeBrokerLease {
+        handle: Arc::new(broker),
+        stop_token: std::sync::Mutex::new(Some(stop_token)),
+    }))
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/repo/changes.rs` around lines 196 - 264, The two branches in add_doc_listener duplicate broker and lease creation; extract that logic into a helper (e.g., spawn_doc_change_lease or make_doc_broker_lease) that takes the shared inputs (handle, self.cancel_token.child_token(), self.change_tx.clone(), self.head_tx.clone(), Arc::clone(&self.listeners)) and returns Result<Arc<DocChangeBrokerLease>, Error> so both Entry::Occupied and Entry::Vacant call it, insert Arc::downgrade(&lease) into the map, and return Ok(lease); ensure the helper builds the same closure (locking listeners with ERROR_MUTEX), forwards the ? from broker::spawn_doc_listener, wraps broker and stop_token into DocChangeBrokerLease with Arc/Mutex as before, and preserves ownership/cloning semantics.

src/daybook_core/sync/full.rs (1)
1660-1679: Avoid retry/backoff churn when there are zero blob peers.

At line 1662, retry state is advanced even though no peer can satisfy the download. Consider deferring scheduling until peer availability changes, instead of burning retry budget.
♻️ Suggested simplification
```diff
 if peers.is_empty() {
-    let prior = prior_pending.unwrap_or(PendingBlobSyncState {
-        attempt_no: 0,
-        last_backoff: Duration::from_millis(0),
-        last_attempt_at: std::time::Instant::now(),
-        due_at: std::time::Instant::now(),
-    });
-    let delay = next_backoff_delay(prior.last_backoff, Duration::from_millis(500));
-    self.pending_blobs.insert(
-        hash.clone(),
-        PendingBlobSyncState {
-            attempt_no: prior.attempt_no + 1,
-            last_backoff: delay,
-            last_attempt_at: std::time::Instant::now(),
-            due_at: std::time::Instant::now() + delay,
-        },
-    );
+    self.pending_blobs.entry(hash.clone()).or_insert_with(|| PendingBlobSyncState {
+        attempt_no: 0,
+        last_backoff: Duration::from_millis(0),
+        last_attempt_at: std::time::Instant::now(),
+        due_at: std::time::Instant::now(),
+    });
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full.rs` around lines 1660 - 1679, The code currently advances retry/backoff when peers.is_empty() by inserting a new PendingBlobSyncState (using prior_pending.unwrap_or and next_backoff_delay) which burns retry budget even though no peers can satisfy the download; change the branch so that when peers.is_empty() you do not advance or replace pending state—i.e., do not call self.pending_blobs.insert(...) and instead leave prior_pending untouched (or skip scheduling entirely) so retries are only advanced when there are actual peers; use the existing symbols peers, prior_pending, PendingBlobSyncState, next_backoff_delay, pending_blobs, and hash to locate and implement this guard.
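For context on the diff above, here is a minimal sketch of what a `next_backoff_delay`-style helper might do. The real function's body is not shown in the review, so the seed-then-double-with-cap behavior here is an assumption:

```rust
use std::time::Duration;

// Assumed semantics: first attempt waits `base`, subsequent attempts double
// the previous delay up to a fixed cap. The 30s cap is illustrative only.
fn next_backoff_delay(last: Duration, base: Duration) -> Duration {
    const CAP: Duration = Duration::from_secs(30);
    if last.is_zero() {
        return base;
    }
    (last * 2).min(CAP)
}

fn main() {
    let base = Duration::from_millis(500);
    let d1 = next_backoff_delay(Duration::ZERO, base);
    let d2 = next_backoff_delay(d1, base);
    assert_eq!(d1, Duration::from_millis(500));
    assert_eq!(d2, Duration::from_millis(1000));
    // The cap keeps a long-failing blob from backing off unboundedly.
    assert_eq!(next_backoff_delay(Duration::from_secs(60), base), Duration::from_secs(30));
}
```

Under these assumptions, advancing this state while `peers` is empty spends the early, short delays on attempts that cannot succeed, which is exactly the churn the finding flags.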
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 71dec451-8b05-425c-80be-af38191d3fab
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock` and included by `**/*`
- `docs/DEVDOC/todo.md` is excluded by `!docs/DEVDOC/**` and included by `**/*`
📒 Files selected for processing (50)
- `AGENTS.md`
- `CONTRIBUTING.md`
- `README.md`
- `docs/README.md`
- `docs/dict.md`
- `src/am_utils_rs/Cargo.toml`
- `src/am_utils_rs/repo.rs`
- `src/am_utils_rs/repo/changes.rs`
- `src/am_utils_rs/repo/changes/broker.rs`
- `src/am_utils_rs/repo/iroh.rs`
- `src/am_utils_rs/repo/partition.rs`
- `src/am_utils_rs/sync.rs`
- `src/am_utils_rs/sync/node.rs`
- `src/am_utils_rs/sync/peer.rs`
- `src/am_utils_rs/sync/protocol.rs`
- `src/am_utils_rs/sync/store.rs`
- `src/api_utils_rs/Cargo.toml`
- `src/daybook_cli/Cargo.toml`
- `src/daybook_cli/main.rs`
- `src/daybook_core/blobs.rs`
- `src/daybook_core/build.rs`
- `src/daybook_core/config.rs`
- `src/daybook_core/drawer.rs`
- `src/daybook_core/drawer/facet_recovery.rs`
- `src/daybook_core/e2e.rs`
- `src/daybook_core/index/doc_blobs.rs`
- `src/daybook_core/plugs.rs`
- `src/daybook_core/repo.rs`
- `src/daybook_core/rt/dispatch.rs`
- `src/daybook_core/stores.rs`
- `src/daybook_core/sync.rs`
- `src/daybook_core/sync/bootstrap.rs`
- `src/daybook_core/sync/full.rs`
- `src/daybook_core/sync/full/doc_worker.rs`
- `src/daybook_core/sync/stress.rs`
- `src/daybook_core/tables.rs`
- `src/daybook_ffi/Cargo.toml`
- `src/daybook_fuse/lib.rs`
- `src/daybook_server/Cargo.toml`
- `src/daybook_server/main.rs`
- `src/daybook_types/doc.rs`
- `src/mltools/Cargo.toml`
- `src/pauperfuse/Cargo.toml`
- `src/tests_http/Cargo.toml`
- `src/utils_rs/Cargo.toml`
- `src/utils_rs/lib.rs`
- `src/wash_plugin_pglite/Cargo.toml`
- `src/wflow/Cargo.toml`
- `src/wflow_webui/Cargo.toml`
- `src/xtask/Cargo.toml`
✅ Files skipped from review due to trivial changes (1)
- docs/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- src/daybook_core/build.rs
- src/daybook_cli/Cargo.toml
- src/xtask/Cargo.toml
```rust
fn map_store_err(err: eyre::Report) -> PartitionSyncError {
    map_repo_err(err)
}

fn map_repo_err(err: eyre::Report) -> PartitionSyncError {
    PartitionSyncError::Internal {
        message: err.to_string(),
    }
}
```
Don't send raw internal errors back to peers.
err.to_string() forwards the full eyre report over the RPC boundary. That can disclose local paths, SQL details, or other internals to remote peers. Log the detailed error locally and return a stable, sanitized protocol message instead.
Suggested sanitization
```diff
 fn map_repo_err(err: eyre::Report) -> PartitionSyncError {
+    warn!(?err, "sync node repo error");
     PartitionSyncError::Internal {
-        message: err.to_string(),
+        message: "internal sync node error".into(),
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/sync/node.rs` around lines 335 - 342, The current
map_store_err/map_repo_err functions leak the full eyre::Report to peers by
using err.to_string(); instead, log or report the full error locally (e.g. via
tracing::error! or log::error! with the err/Debug) and return a stable,
sanitized PartitionSyncError payload (for example a fixed message like "internal
server error" or an opaque error code) from PartitionSyncError::Internal so no
local paths/SQL/details are sent over RPC; update map_store_err and map_repo_err
to perform local logging of err and construct the sanitized message instead of
err.to_string().
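A std-only sketch of the sanitization pattern the comment asks for: keep the detailed report in local logs, send peers a stable opaque message. `PartitionSyncError` is mimicked with a plain struct here, and the `eprintln!` stands in for the real `warn!` call:

```rust
// Hypothetical stand-in for the protocol error type.
#[derive(Debug, PartialEq)]
struct PartitionSyncError {
    message: String,
}

fn map_repo_err(err: &dyn std::fmt::Debug) -> PartitionSyncError {
    // Full detail stays local; real code would use warn!(?err, ...).
    eprintln!("sync node repo error: {err:?}");
    PartitionSyncError {
        message: "internal sync node error".into(),
    }
}

fn main() {
    let io_err = std::io::Error::new(
        std::io::ErrorKind::Other,
        "/home/user/db.sqlite: constraint failed",
    );
    let wire = map_repo_err(&io_err);
    // Nothing from the local report leaks into the protocol payload.
    assert_eq!(wire.message, "internal sync node error");
    assert!(!wire.message.contains("sqlite"));
}
```

The same shape extends to an opaque error code if peers need to distinguish failure classes without seeing internals.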
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/daybook_types/doc.rs (1)
83-90: ⚠️ Potential issue | 🔴 Critical

Make `Dmeta.actors` backward-compatible.

Adding `actors` as a required field means older docs without it now fail to deserialize, so mixed-version devices can no longer hydrate the same `Dmeta`. Give it `#[serde(default)]` so missing maps deserialize as empty. Based on learnings: data structures in automerge documents (like dmeta metadata) are external input that can have version drift across devices.

Suggested fix
```diff
 Dmeta struct {
     pub id: DocId,
     // FIXME: unix timestamp codec
     pub created_at: Timestamp,
     pub updated_at: Vec<Timestamp>,
+    #[serde(default)]
     pub actors: HashMap<String, UserMeta>,
     pub facet_uuids: HashMap<Uuid, FacetKey>,
     pub facets: HashMap<FacetKey, FacetMeta>
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_types/doc.rs` around lines 83 - 90, The Dmeta struct's actors field is required and will fail deserialization for older docs; update the actors field (the HashMap<String, UserMeta> member on struct Dmeta in doc.rs) to derive a default for serde by adding #[serde(default)] so missing actors maps deserialize as empty, ensuring backward compatibility across mixed-version devices.

src/daybook_core/repo.rs (1)
210-241: ⚠️ Potential issue | 🟠 Major

Shut `big_repo_stop` down on every post-boot failure path.

Once `BigRepo::boot()` succeeds, any later error here (`spawn_ws_connector`, `init_from_globals`, `run_repo_init_dance`, or `mark_repo_initialized`) returns early while the stop token is still just a local variable. That leaks the repo task tree and any opened resources whenever `open_inner()` fails after boot. Wrap the rest of the initialization in a fallible block and call `big_repo_stop.stop().await` before propagating the error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/repo.rs` around lines 210 - 241, After successful BigRepo::boot (which yields big_repo and big_repo_stop) ensure big_repo_stop.stop().await is invoked on every error path that occurs after boot: wrap the subsequent fallible operations (spawn_ws_connector, init_from_globals, cloning doc_app/doc_drawer, run_repo_init_dance, mark_repo_initialized) in a try/fallible block and on any early return or propagated error call big_repo_stop.stop().await before returning the error; do the same if any awaited spawn_ws_connector or init_from_globals call fails so the BigRepo task tree and resources are always cleanly shut down.

src/daybook_core/rt/dispatch.rs (1)
298-313: ⚠️ Potential issue | 🟠 Major

Don't panic on synced dispatch updates.

A non-nil `vtag` can arrive from another peer or an older client. Panicking here tears down the notification worker instead of surfacing a recoverable compatibility path. Please emit an update/upsert event or return an error with context instead. Based on learnings, when validating data that comes from external inputs (such as automerge documents and related metadata like dmeta), use `eyre::bail!()` to return errors rather than panicking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 298 - 313, The code currently panics when a non-nil VersionTag is encountered in src/daybook_core/rt/dispatch.rs (the match that calls VersionTag::hydrate_bytes produces vtag and then panics with "dispatch update detected"); instead, handle synced/remote updates by emitting an update/upsert DispatchEvent (e.g., a DispatchUpdated or DispatchUpsert variant) or return a contextual error via eyre::bail!() so the worker isn't torn down. Modify the branch after vtag.version.is_nil() to produce the appropriate DispatchEvent variant carrying id and heads (and any needed vtag info) or replace the panic! with eyre::bail!("unexpected dispatch update: {:?}", vtag) so callers can handle it; ensure you touch the branch that currently constructs DispatchAdded and the else that contains the panic and VersionTag::hydrate_bytes usage.

src/daybook_core/config.rs (1)
344-364: ⚠️ Potential issue | 🟠 Major

`facet_display` and `users` writes still never emit `ConfigEvent::Changed`.

The `patch.path.len() == 2` guard only matches direct `Versioned` fields. Entry-level writes under `facet_display` and `users` are one segment deeper, so `set_facet_display_hint()` and remote user-meta updates fall through this matcher and listeners never refresh.

💡 Targeted fix

```diff
-    } if patch.path.len() == 2 && key == "vtag" => {
+    } if key == "vtag" => {
         let Some((_obj, automerge::Prop::Map(section_key))) = patch.path.get(1) else {
             return Ok(());
         };
+        let is_supported_path = match section_key.as_ref() {
+            "facet_display" | "users" => patch.path.len() == 3,
+            "mltools" | "global_props_doc_id" => patch.path.len() == 2,
+            _ => false,
+        };
+        if !is_supported_path {
+            return Ok(());
+        }
         if !matches!(
             val,
             automerge::Value::Scalar(scalar)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/config.rs` around lines 344 - 364, The current matcher only triggers when patch.path.len() == 2, so deeper entry-level writes under the "facet_display" and "users" maps are skipped; change the guard to allow any depth (e.g., patch.path.len() >= 2) or otherwise accept path lengths >2 when key == "vtag" and the second path segment (section_key from patch.path.get(1)) matches "facet_display" or "users", so that the automerge::PatchAction::PutMap branch pushes ConfigEvent::Changed (affecting set_facet_display_hint and remote user-meta updates). Ensure you still validate val is a Bytes scalar as before.
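The depth rule this finding asks for can be sketched std-only. Paths are modeled as plain segment slices; the section names come from the finding, while the function name and the slice encoding are illustrative:

```rust
// Decide whether a patch at `path` with property `key` should emit
// ConfigEvent::Changed. Index 1 plays the role of patch.path.get(1).
fn should_emit(path: &[&str], key: &str) -> bool {
    if key != "vtag" {
        return false;
    }
    match path.get(1) {
        // Entry-level maps: writes land one segment deeper.
        Some(&"facet_display") | Some(&"users") => path.len() == 3,
        // Direct Versioned fields keep the two-segment shape.
        Some(&"mltools") | Some(&"global_props_doc_id") => path.len() == 2,
        _ => false,
    }
}

fn main() {
    // Entry-level writes one segment deeper now match...
    assert!(should_emit(&["cfg", "facet_display", "some_facet"], "vtag"));
    assert!(should_emit(&["cfg", "users", "alice"], "vtag"));
    // ...while direct Versioned fields keep the old two-segment shape.
    assert!(should_emit(&["cfg", "mltools"], "vtag"));
    // The map container itself, and unknown sections, stay silent.
    assert!(!should_emit(&["cfg", "users"], "vtag"));
    assert!(!should_emit(&["cfg", "unknown", "x"], "vtag"));
}
```

Encoding the per-section depth in one match arm keeps the two path shapes from silently diverging again.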
♻️ Duplicate comments (3)
src/daybook_core/sync/stress.rs (2)
174-203: ⚠️ Potential issue | 🟠 Major

Rollback partially opened test resources on setup failures.

Both loops start async resources and then `bail!`/`?` before the matching `shutdown()`/`stop()`. A clone-validation failure skips `ctx.shutdown()`, and a later `open_sync_node` failure leaves every earlier node running. That can poison the rest of the test process. Apply the same scoped-cleanup pattern here too.

Also applies to: 212-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 174 - 203, The clone validation path can early-return without calling ctx.shutdown(), leaving resources running; update the RepoCtx::open usage so that any partially opened ctx is always cleaned up on error (e.g., create a short-lived guard or scope that calls ctx.shutdown().await in Drop or use a try/finally-like pattern: perform validations after open and if any check fails call ctx.shutdown().await before returning the error). Apply the same pattern for other resources opened later (e.g., open_sync_node) so every opened resource (ctx, sync nodes) is shutdown/stopped on all error paths.
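The scoped-cleanup pattern can be sketched with a small drop guard (a synchronous stand-in; the real code would await `ctx.shutdown()`/`node.stop()` in the teardown path, and the type name here is hypothetical):

```rust
use std::cell::Cell;

// A guard that runs every deferred shutdown action unless setup completed
// and disarm() was called.
struct Cleanup<'a> {
    actions: Vec<Box<dyn FnMut() + 'a>>,
    armed: bool,
}

impl<'a> Cleanup<'a> {
    fn new() -> Self {
        Self { actions: Vec::new(), armed: true }
    }
    fn defer(&mut self, f: impl FnMut() + 'a) {
        self.actions.push(Box::new(f));
    }
    fn disarm(&mut self) {
        self.armed = false;
    }
}

impl<'a> Drop for Cleanup<'a> {
    fn drop(&mut self) {
        if self.armed {
            // Shut down in reverse order, like nested scopes would.
            for f in self.actions.iter_mut().rev() {
                f();
            }
        }
    }
}

fn main() {
    let stopped = Cell::new(0);
    {
        let mut cleanup = Cleanup::new();
        cleanup.defer(|| stopped.set(stopped.get() + 1));
        cleanup.defer(|| stopped.set(stopped.get() + 1));
        // A bail!/? here drops `cleanup` while still armed...
    }
    // ...so both earlier resources get stopped.
    assert_eq!(stopped.get(), 2);

    {
        let mut cleanup = Cleanup::new();
        cleanup.defer(|| stopped.set(stopped.get() + 10));
        cleanup.disarm(); // setup fully succeeded; keep resources running
    }
    assert_eq!(stopped.get(), 2);
}
```

With the guard in place, every `?` between `open` and the final `disarm()` automatically unwinds the resources opened so far.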
99-101: ⚠️ Potential issue | 🟠 Major

Don't let teardown `stop()` hide the real failure.

These `stop().await?` calls run after a stored `result`, so the first shutdown failure replaces the actual test/init error and aborts cleanup of the remaining nodes. Make teardown best-effort here and return the original `result` afterward.

Also applies to: 208-209
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 99 - 101, The teardown loop currently calls node.stop().await? which can overwrite and return a teardown error instead of the original test/init `result`; change both occurrences (the loop over `nodes` where `node.stop().await?` is used, including the duplicate at lines ~208-209) to run shutdown as best-effort: iterate through `nodes.into_iter().flatten()`, call `node.stop().await` but do not propagate the first stop error—capture/log any stop errors (or convert them to warnings) and continue stopping remaining nodes, then return the original `result` (not the teardown error); ensure the code references the existing `nodes` iterator, `node.stop().await`, and the `result` variable so the original outcome is preserved while still attempting full cleanup.

src/daybook_types/doc.rs (1)
371-381: ⚠️ Potential issue | 🟠 Major

`new()` and the accessors no longer agree on the path layout.

`new("dev", Some("plug"), Some("routine"))` yields `/dev/plug/routine`, but `plug()` reads segment 3 and `routine()` reads segment 4, so the accessor pair only works for `for_plug_routine()`-style paths. That silently misidentifies plug/routine on the default constructor.

Also applies to: 406-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_types/doc.rs` around lines 371 - 381, UserPath::new builds "/device/plug/routine" but the accessors plug() and routine() expect the "for_plug_routine()" layout (they read segments 3 and 4), so update new (and the similar constructor around the 406-419 region) to construct the same path shape as for_plug_routine(): start with the root, push the "for" segment (or whatever literal that for_plug_routine uses), then push device, then optional plug, then optional routine so plug() and routine() index the same segments; adjust the same logic in the other constructor to match.
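The invariant the finding asks for is a round-trip between constructor and accessors. A minimal sketch (segment positions here are illustrative, not the real `UserPath` layout, which the prompt says should follow `for_plug_routine()`):

```rust
// Toy UserPath: whatever layout new() writes, plug() and routine() must
// read the same segments back.
struct UserPath {
    segments: Vec<String>,
}

impl UserPath {
    fn new(device: &str, plug: Option<&str>, routine: Option<&str>) -> Self {
        let mut segments = vec![device.to_string()];
        if let Some(p) = plug {
            segments.push(p.to_string());
        }
        if let Some(r) = routine {
            segments.push(r.to_string());
        }
        Self { segments }
    }

    fn plug(&self) -> Option<&str> {
        self.segments.get(1).map(|s| s.as_str())
    }

    fn routine(&self) -> Option<&str> {
        self.segments.get(2).map(|s| s.as_str())
    }
}

fn main() {
    let path = UserPath::new("dev", Some("plug"), Some("routine"));
    // Round-trip: accessors index the same layout the constructor wrote.
    assert_eq!(path.plug(), Some("plug"));
    assert_eq!(path.routine(), Some("routine"));
    assert_eq!(UserPath::new("dev", None, None).plug(), None);
}
```

A round-trip test like the one in `main` would have caught the off-by-one between the default constructor and the `for_plug_routine()`-style accessors.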
🧹 Nitpick comments (5)
docs/dict.md (1)
11-11: Tighten wording/typos in core definitions to avoid ambiguity.

There are several wording issues in key conceptual lines (e.g., lines 11, 15, 44, 51, 76, 89) that make the model harder to parse quickly for contributors.
Suggested wording edits
```diff
-Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices.
+Daybook ensures these changes are synced to other replicas whenever there is a working connection between devices.

-Documents are a the main units of information in a repo and have a unique ids.
+Documents are the main units of information in a repo and each has a unique id.

-CRDTs are a family of data structures allowing concurrent edits acoss devices that can then be resolved to a final, merged state in a repeatable and unsupervised manner.
+CRDTs are a family of data structures allowing concurrent edits across devices that can be resolved to a final merged state in a repeatable, unsupervised manner.

-A change can be taught of as a single git commit with a hash used to refer to it, the change hash.
+A change can be thought of as a single git commit with a hash used to refer to it (the change hash).

-It's also the gatekeeper for all docs weather it's reads or writes.
+It's also the gatekeeper for all docs, whether for reads or writes.

-As a special convention, `self` can be used instead of the document id to indicates that the reference is to the same doc containing the facet making holding the URL.
+As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.
```

Also applies to: 15-15, 44-45, 51-52, 76-76, 89-89
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 11, Rewrite the awkward sentence "Daybook ensures then that these changes are synced to other replicas whenever a working connection between devices." to a clear, grammatical form such as "Daybook ensures that these changes are synced to other replicas whenever a working connection exists between devices." Also scan and tighten the other flagged lines (original lines referenced around the same paragraphs) to fix typos, remove extraneous words, and make definitions concise and unambiguous (e.g., clarify subject/verb order, replace informal words like "then"/"whenever" with precise phrasing, and ensure consistent tense and punctuation across the core definitions in the same section).src/daybook_core/sync/full/blob_worker.rs (2)
364-368: Consider distinguishing log levels for dropped lifecycle vs progress messages.

Using `try_send` with backpressure is appropriate for progress reporting. However, when `BlobWorkerStarted` or `BlobWorkerFinished` messages are dropped, it may complicate debugging worker lifecycle issues. Consider logging dropped lifecycle messages at `warn` level while keeping intermediate progress drops at `debug`.

Alternatively, if all `SyncProgressMsg` variants are truly non-critical for correctness (with authoritative state managed via the `Msg` channel), the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full/blob_worker.rs` around lines 364 - 368, In send_progress, distinguish lifecycle messages from intermediate progress: when sync_progress_tx.try_send(msg) fails, match on the SyncProgressMsg value (check for BlobWorkerStarted and BlobWorkerFinished variants) and log those dropped lifecycle messages with tracing::warn!(?err, hash = %self.hash, ...) while keeping other variants logged at debug as currently done; locate this behavior around sync_progress_tx.try_send and the send_progress async fn to implement the conditional logging.
176-176: Consider moving import to file level.

The `use futures::StreamExt;` import inside the async block works but is unconventional. Moving it to the top of the file (or relying on `super::*` if it's re-exported there) improves consistency and discoverability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/full/blob_worker.rs` at line 176, The local import use futures::StreamExt; found inside the async block in blob_worker.rs should be moved to the module/file scope: remove the inline import and add use futures::StreamExt; at the top of the file (or import it via super::* if already re-exported) so that the StreamExt trait is visible consistently across functions such as the async block where it was used; ensure there are no duplicate imports and that the trait remains in scope for methods like .next() or other StreamExt extensions.src/am_utils_rs/sync/peer.rs (1)
322-322: Remove the placeholder `XXX` log before merge.

This logs the full partition list at info level with a throwaway message, which will add noisy startup logs in a hot path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` at line 322, The placeholder info!(?partitions, "XXX"); is logging the entire partitions list at info level; remove this noisy line or change it to a lower log level and a meaningful message (e.g., debug!("partitions: {} entries", partitions.len()) or debug!(partitions = ?partitions, "partitions snapshot") if full content is needed for debugging). Update the call to either delete info!(?partitions, "XXX"); or replace it with a debug-level log that avoids verbose startup noise while still conveying useful context, referencing the info!(?partitions, "XXX") call and the partitions variable.src/daybook_core/sync/stress.rs (1)
466-469: Give `CreateBranch` a collision-free branch path.

The `/stress/{0..32}` namespace is small relative to the run size, so this event will often revisit an existing branch instead of exercising branch creation. Using `idx` and/or a random suffix keeps this path higher-signal.

♻️ Suggested tweak

```diff
-let new_branch = daybook_types::doc::BranchPath::from(format!(
-    "/stress/{}",
-    rng.random_range(0..32)
-));
+let new_branch = daybook_types::doc::BranchPath::from(format!(
+    "/stress/{idx}-{}",
+    rng.random::<u64>()
+));
```
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/sync/stress.rs` around lines 466 - 469, The branch path for CreateBranch uses a small namespace "/stress/{0..32}" which causes collisions; update the construction of new_branch (created via daybook_types::doc::BranchPath::from) to include the loop index (idx) and/or a random suffix from rng (e.g., combine idx and rng.random_range or a hex/random token) so each generated path is effectively unique (e.g., "/stress/{idx}-{rand}"); ensure the same variable new_branch is used where CreateBranch is emitted so branch creation events target collision-free paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/dict.md`:
- Line 86: Replace the indented code example with a fenced code block: wrap the
URL db+facet://self/org.example.daybook.title/main?at=hash1|hash2 inside triple
backticks (```) so it becomes a proper Markdown fenced code block, ensuring
there's a blank line before and after the fenced block to satisfy markdownlint
MD046.
- Around line 28-38: Example object literals in the docs are missing trailing
commas which makes them invalid JavaScript; update the snippets (notably the
object under "org.example.daybook.dmeta/main" and the other similar examples) by
adding the missing commas after property entries (e.g., add a comma after the id
property value and after any other single-line properties shown) so every
property in those example object literals is comma-separated and the snippets
are valid JS.
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 199-203: The select branch handling worker.samod_ack_rx.recv()
currently calls eyre::bail! when the channel is closed, which converts samod
backpressure into a bootstrap failure; instead, change the behavior so that when
recv yields None you do not abort bootstrap but pause the affected partition and
return control so other partitions can continue (e.g., set a paused flag or
requeue the RequestSamodSync state tied to doc_cursor and skip advancing it), or
await/loop until ack slots free before proceeding; apply the same change pattern
in RequestSamodSync-related logic (references: worker.samod_ack_rx.recv(),
worker.handle_samod_ack(ack), RequestSamodSync, doc_cursor) and the other
similar sites mentioned (around the 450-481 and 648-668 blocks) so pending samod
acks cause partition pause/retry instead of eyre::bail!.
- Around line 584-607: The code currently inserts an "empty slot" into
unresolved_by_partition before calling reduce_doc_event which lets a later
successful cursor (N+1) advance and drop a failed cursor (N); change the
sentinel semantics so we record a blocking/unresolved state instead of a no-op
slot: replace the current entry insertion prior to reduce_doc_event with a mark
that this partition+cursor is blocked (e.g., an explicit enum state or a
distinct value) and only clear/pop that blocked state when
apply_doc_event_decision successfully handles that cursor; update logic in
note_doc_cursor_seen, reduce_doc_event handling, apply_doc_event_decision, and
any code that reads unresolved_by_partition (the branches around
PartitionDocEventDeets::DocChanged and the later pop logic) to treat the blocked
state as non-ready; add a small regression test that simulates a failure at
cursor N (reduce_doc_event returns Err) then a success at N+1 and asserts that N
remains unresolved (blocked) until N is retried and succeeds.
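The blocked-cursor semantics this prompt describes can be sketched as a tiny state machine. The enum, struct, and method names are illustrative, not the real `unresolved_by_partition` types:

```rust
use std::collections::HashMap;

// A failed cursor N records an explicit Blocked state; a later success at
// N+1 cannot clear it. Only a retry of N itself can.
#[derive(Debug, PartialEq)]
enum CursorState {
    Blocked,
    Resolved,
}

#[derive(Default)]
struct Partition {
    states: HashMap<u64, CursorState>,
}

impl Partition {
    // Stand-in for reduce_doc_event + apply_doc_event_decision.
    fn apply(&mut self, cursor: u64, outcome: Result<(), ()>) {
        let state = match outcome {
            Ok(()) => CursorState::Resolved,
            Err(()) => CursorState::Blocked,
        };
        self.states.insert(cursor, state);
    }

    // The partition may only advance past `cursor` once it resolved.
    fn can_advance_past(&self, cursor: u64) -> bool {
        self.states.get(&cursor) == Some(&CursorState::Resolved)
    }
}

fn main() {
    let mut p = Partition::default();
    p.apply(7, Err(())); // cursor N fails
    p.apply(8, Ok(())); // cursor N+1 succeeds
    assert!(!p.can_advance_past(7)); // N stays blocked
    p.apply(7, Ok(())); // retry of N succeeds
    assert!(p.can_advance_past(7));
}
```

The `main` above is essentially the regression test the prompt requests: fail N, succeed N+1, and assert N is still not advanceable.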
- Around line 729-745: The code currently takes the first element from
response.docs (response.docs.into_iter().next()) without verifying it matches
the requested doc_id; update this to search response.docs for an entry whose
doc_id equals the requested doc_id (the doc_id variable passed into
GetDocsFullRequest) and use that matched doc, otherwise eyre::bail! that the
remote did not return the requested doc; keep the subsequent parsing of
doc.doc_id (parse::<samod::DocumentId>()) and its ferr! mapping unchanged so the
error path remains the same.
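The defensive lookup this prompt describes boils down to matching on the requested id instead of trusting ordering. A sketch with a stand-in `Doc` type (the real response type and error machinery differ):

```rust
// Stand-in for one entry of response.docs.
#[derive(Debug, PartialEq)]
struct Doc {
    doc_id: String,
    payload: Vec<u8>,
}

// Find the entry whose doc_id matches what we asked for, instead of taking
// response.docs.into_iter().next() blindly. Real code would eyre::bail!.
fn pick_requested_doc(docs: Vec<Doc>, requested: &str) -> Result<Doc, String> {
    docs.into_iter()
        .find(|d| d.doc_id == requested)
        .ok_or_else(|| format!("remote did not return requested doc {requested}"))
}

fn main() {
    let docs = vec![
        Doc { doc_id: "other".into(), payload: vec![] },
        Doc { doc_id: "wanted".into(), payload: vec![1] },
    ];
    // The match wins even though "wanted" is not first in the response.
    let doc = pick_requested_doc(docs, "wanted").unwrap();
    assert_eq!(doc.doc_id, "wanted");

    // An empty or mismatched response becomes an explicit error.
    assert!(pick_requested_doc(vec![], "wanted").is_err());
}
```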
In `@src/daybook_core/blobs.rs`:
- Around line 77-80: The copy path is racy: put_path_copy() checks try_exists()
then calls atomic_copy_file(), but atomic_copy_file() performs a plain rename
which can fail if another concurrent writer created the target; modify
atomic_copy_file() to treat a rename failure with ErrorKind::AlreadyExists (or
an OS-specific "file exists") as success (i.e., return Ok(())) the same way
atomic_write() does, so duplicate ingests become idempotent; apply the same
change to the other copy site that also calls atomic_copy_file() (the second
occurrence around the other blob-ingest function) to ensure both paths handle
AlreadyExists gracefully.
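A minimal sketch of the tolerant rename, assuming the copy-then-rename shape described above (`atomic_copy_file` here is a stand-in, not the repo's actual function; note that on POSIX `rename` silently replaces the target, so `ErrorKind::AlreadyExists` mainly surfaces on platforms or filesystems that refuse to clobber):

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

// Copy to a temp file, then rename into place. A concurrent writer that
// already produced the target is treated as success (idempotent ingest).
fn atomic_copy_file(src: &Path, dst: &Path) -> std::io::Result<()> {
    let tmp = dst.with_extension("tmp");
    fs::copy(src, &tmp)?;
    match fs::rename(&tmp, dst) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == ErrorKind::AlreadyExists => {
            let _ = fs::remove_file(&tmp); // another writer won the race
            Ok(())
        }
        Err(err) => Err(err),
    }
}
```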
- Around line 127-145: The read-modify-write of metadata (using build_meta,
read_meta, write_meta and ingest_path_with_iroh) races when two
put_path_reference calls run concurrently and can drop source_paths; serialize
the merge per hash by acquiring a per-hash lock (e.g., an async Mutex keyed by
the blob hash or a file lock on object_paths.meta) before calling read_meta,
merging source_paths, updating meta, writing it back, performing
ingest_path_with_iroh, setting meta.iroh_ingested and writing final meta, and
then release the lock so concurrent calls cannot interleave the read/merge/write
sequence.
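The per-hash serialization above can be sketched with a lock registry keyed by blob hash; `HashLocks` is a hypothetical helper (the real code would likely use an async-aware mutex such as `tokio::sync::Mutex` around the read/merge/write of `object_paths.meta`):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// One lock per blob hash: concurrent read-merge-write cycles on the same
// metadata cannot interleave, while different hashes proceed in parallel.
#[derive(Default)]
struct HashLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl HashLocks {
    fn lock_for(&self, hash: &str) -> Arc<Mutex<()>> {
        self.locks
            .lock()
            .unwrap()
            .entry(hash.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}
```

Callers would acquire `lock_for(hash)` before `read_meta`, hold it across the merge, `write_meta`, and iroh ingest, and drop it afterwards.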
In `@src/daybook_core/plugs.rs`:
- Around line 810-811: The listener registered via
PlugsStore::register_change_listener(&big_repo, &app_doc_id, vec![]) is
currently unfiltered so local changes are emitted twice (once via add() calling
self.registry.notify(...) and again when BigRepo emits DocChanged), so change
the registration to filter out BigRepoChangeOrigin::Local (or add that origin to
the vec! filter) for the big_repo listeners referenced here and at the similar
block around lines 985-994; alternatively remove the direct
self.registry.notify(...) call in add() so that notifs_loop alone emits
events—use the symbols PlugsStore::register_change_listener,
BigRepoChangeOrigin::Local, add(), self.registry.notify(...), notifs_loop,
DocChanged and PlugsEvent to locate and implement the fix.
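The origin filter described above amounts to a small predicate; `ChangeOrigin` is a hypothetical mirror of `am_utils_rs::repo::BigRepoChangeOrigin`, shown only to illustrate the dedup rule:

```rust
#[derive(Debug, Clone, PartialEq)]
enum ChangeOrigin {
    Local,
    Remote { peer_id: String },
}

// Emit only remote-origin notifications: local mutations were already
// notified synchronously by add(), so re-emitting them from the listener
// loop would double every event.
fn should_emit(origin: &ChangeOrigin, exclude_peer: Option<&str>) -> bool {
    match origin {
        ChangeOrigin::Local => false,
        ChangeOrigin::Remote { peer_id } => exclude_peer != Some(peer_id.as_str()),
    }
}
```

The same predicate would apply to the analogous listener registrations in `dispatch.rs` and `tables.rs`.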
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 269-278: The current filter only skips the excluded remote peer
but still lets local-origin patches through, causing double notifications
(because add()/remove() already notify synchronously); update the origin check
so that any local-origin change (match
am_utils_rs::repo::BigRepoChangeOrigin::Local or origin == None if applicable)
returns early as well as the existing remote exclude check — modify the pattern
matching around `origin` (the block that currently matches
`BigRepoChangeOrigin::Remote { peer_id, .. }`) to also match and short-circuit
for `BigRepoChangeOrigin::Local` (and/or None) and keep the existing
`exclude_peer` remote comparison; apply the same change for the code paths that
emit DispatchAdded/DispatchDeleted so local mutations aren’t re-emitted from
`notifs_loop`.
In `@src/daybook_core/sync/full.rs`:
- Around line 886-911: The code is incorrectly overwriting
PeerSyncState.partitions with only big-repo partitions from
desired_big_repo_partitions_for_peer(), which must remain the full membership
set; instead, stop mutating peer_state.partitions here and update the
per-session partition set stored in peer_partition_sessions (or the Session
struct) so session-level desired_parts are tracked separately. Concretely:
replace assignments like peer_state.partitions = desired_parts and
peer_state.partitions.clear() with updates to session.partitions (or
insert/modify the entry in self.peer_partition_sessions to reflect
desired_parts), insert/remove session IDs in self.peer_sessions_to_refresh as
before, and apply the same change to the analogous block referenced at 920-934
so PeerSyncState.partitions remains the authoritative full membership only
updated by SetPeer/DelPeer message handling.
- Around line 300-309: The current startup only seeds worker.known_blob_set with
PartitionKey::DocBlobsFullSync which allows delete/update handlers (which
consult worker.doc_to_known_hashes) to mistakenly drop the partition bit for
shared hashes or for pre-existing docs because doc_to_known_hashes is empty; fix
by making the blob-tracking per-doc: populate worker.doc_to_known_hashes at
startup (for each doc that references a hash from
worker.doc_blobs_index_repo.list_all_hashes()) or replace known_blob_set entries
with a per-doc refcount map (e.g., hash -> map<DocId, count>) so that the
delete/update paths consult per-doc counts before removing
PartitionKey::DocBlobsFullSync, or as a simpler alternative check the live index
(list_all_hashes) before clearing the partition bit in the delete/update
handlers (ensure PartitionKey::DocBlobsFullSync is only cleared if the hash is
truly absent from the global index); update the delete/update handlers and any
uses of known_blob_set and doc_to_known_hashes accordingly.
In `@src/daybook_core/tables.rs`:
- Around line 396-398: The listener is receiving local mutations as well as the
eager local notifications, causing duplicate TablesEvent emissions; in the call
to TablesStore::register_change_listener(&big_repo, &app_doc_id, vec![]) (and
similarly at the other location) restrict the origin filter so local-origin
changes are ignored—i.e., pass a filter that excludes BigRepoChangeOrigin::Local
or, inside the notifs_loop that handles ticket/notif_rx from big_repo,
explicitly drop notifications whose origin is BigRepoChangeOrigin::Local before
calling self.registry.notify; alternatively, stop emitting the eager local
notifications from the mutator paths (the direct self.registry.notify calls) so
events are only emitted once. Ensure you update both the
register_change_listener usage and the notifs_loop handler (and the mutator
notify calls) to avoid double notifications.
In `@src/daybook_types/doc.rs`:
- Around line 390-404: The constructors for for_repo and for_plug_routine
currently build paths via format! which allows multi-segment values and
double-slashes; instead validate each input segment (repo_scope, user_id,
device_id, plug_id, routine_id) is non-empty and contains no '/' or path
traversal (e.g. ".."), then build the UserPath by starting from base_user_path
(or UserPath::root for the plug routine) and calling push() for each validated
segment, and finally call parse() or return the constructed UserPath; update
error messages to indicate which segment is invalid and ensure base_user_path ==
"/" is handled by using push rather than string concatenation.
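The validate-then-push approach can be sketched as follows; `validate_segment` and `build_user_path` are hypothetical helpers operating on plain strings, standing in for the real `UserPath` constructors:

```rust
// Each segment must be non-empty and free of separators and traversal tokens.
fn validate_segment(name: &str, value: &str) -> Result<(), String> {
    if value.is_empty() {
        return Err(format!("{name} segment is empty"));
    }
    if value.contains('/') || value == ".." || value == "." {
        return Err(format!(
            "{name} segment {value:?} contains a path separator or traversal"
        ));
    }
    Ok(())
}

// Build by pushing validated segments instead of format!-style concatenation,
// so a base of "/" cannot produce double slashes.
fn build_user_path(base: &str, segments: &[(&str, &str)]) -> Result<String, String> {
    let mut path = if base == "/" { String::new() } else { base.to_string() };
    for (name, value) in segments {
        validate_segment(name, value)?;
        path.push('/');
        path.push_str(value);
    }
    Ok(path)
}
```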
---
Outside diff comments:
In `@src/daybook_core/config.rs`:
- Around line 344-364: The current matcher only triggers when patch.path.len()
== 2, so deeper entry-level writes under the "facet_display" and "users" maps
are skipped; change the guard to allow any depth (e.g., patch.path.len() >= 2)
or otherwise accept path lengths >2 when key == "vtag" and the second path
segment (section_key from patch.path.get(1)) matches "facet_display" or "users",
so that the automerge::PatchAction::PutMap branch pushes ConfigEvent::Changed
(affecting set_facet_display_hint and remote user-meta updates). Ensure you
still validate val is a Bytes scalar as before.
In `@src/daybook_core/repo.rs`:
- Around line 210-241: After successful BigRepo::boot (which yields big_repo and
big_repo_stop) ensure big_repo_stop.stop().await is invoked on every error path
that occurs after boot: wrap the subsequent fallible operations
(spawn_ws_connector, init_from_globals, cloning doc_app/doc_drawer,
run_repo_init_dance, mark_repo_initialized) in a try/fallible block and on any
early return or propagated error call big_repo_stop.stop().await before
returning the error; do the same if any awaited spawn_ws_connector or
init_from_globals call fails so the BigRepo task tree and resources are always
cleanly shut down.
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 298-313: The code currently panics when a non-nil VersionTag is
encountered in src/daybook_core/rt/dispatch.rs (the match that calls
VersionTag::hydrate_bytes produces vtag and then panics with "dispatch update
detected"); instead, handle synced/remote updates by emitting an update/upsert
DispatchEvent (e.g., a DispatchUpdated or DispatchUpsert variant) or return a
contextual error via eyre::bail!() so the worker isn't torn down. Modify the
branch after vtag.version.is_nil() to produce the appropriate DispatchEvent
variant carrying id and heads (and any needed vtag info) or replace the panic!
with eyre::bail!("unexpected dispatch update: {:?}", vtag) so callers can handle
it; ensure you touch the branch that currently constructs DispatchAdded and the
else that contains the panic and VersionTag::hydrate_bytes usage.
In `@src/daybook_types/doc.rs`:
- Around line 83-90: The Dmeta struct's actors field is required and will fail
deserialization for older docs; update the actors field (the HashMap<String,
UserMeta> member on struct Dmeta in doc.rs) to derive a default for serde by
adding #[serde(default)] so missing actors maps deserialize as empty, ensuring
backward compatibility across mixed-version devices.
---
Duplicate comments:
In `@src/daybook_core/sync/stress.rs`:
- Around line 174-203: The clone validation path can early-return without
calling ctx.shutdown(), leaving resources running; update the RepoCtx::open
usage so that any partially opened ctx is always cleaned up on error (e.g.,
create a short-lived guard or scope that calls ctx.shutdown().await in Drop or
use a try/finally-like pattern: perform validations after open and if any check
fails call ctx.shutdown().await before returning the error). Apply the same
pattern for other resources opened later (e.g., open_sync_node) so every opened
resource (ctx, sync nodes) is shutdown/stopped on all error paths.
- Around line 99-101: The teardown loop currently calls node.stop().await? which
can overwrite and return a teardown error instead of the original test/init
`result`; change both occurrences (the loop over `nodes` where
`node.stop().await?` is used, including the duplicate at lines ~208-209) to run
shutdown as best-effort: iterate through `nodes.into_iter().flatten()`, call
`node.stop().await` but do not propagate the first stop error—capture/log any
stop errors (or convert them to warnings) and continue stopping remaining nodes,
then return the original `result` (not the teardown error); ensure the code
references the existing `nodes` iterator, `node.stop().await`, and the `result`
variable so the original outcome is preserved while still attempting full
cleanup.
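The best-effort teardown can be sketched like this; `Node` and its fallible `stop()` are stand-ins for the real sync-node handles in `stress.rs`:

```rust
struct Node {
    id: u32,
    fail_stop: bool,
}

impl Node {
    fn stop(self) -> Result<(), String> {
        if self.fail_stop {
            Err(format!("node {} failed to stop", self.id))
        } else {
            Ok(())
        }
    }
}

// Stop every node, logging (not propagating) teardown failures, and return
// the original test outcome so a cleanup error never masks the real result.
fn teardown(nodes: Vec<Option<Node>>, result: Result<(), String>) -> Result<(), String> {
    for node in nodes.into_iter().flatten() {
        if let Err(err) = node.stop() {
            eprintln!("warning: teardown error ignored: {err}");
        }
    }
    result
}
```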
In `@src/daybook_types/doc.rs`:
- Around line 371-381: UserPath::new builds "/device/plug/routine" but the
accessors plug() and routine() expect the "for_plug_routine()" layout (they read
segments 3 and 4), so update new (and the similar constructor around the 406-419
region) to construct the same path shape as for_plug_routine(): start with the
root, push the "for" segment (or whatever literal that for_plug_routine uses),
then push device, then optional plug, then optional routine so plug() and
routine() index the same segments; adjust the same logic in the other
constructor to match.
---
Nitpick comments:
In `@docs/dict.md`:
- Line 11: Rewrite the awkward sentence "Daybook ensures then that these changes
are synced to other replicas whenever a working connection between devices." to
a clear, grammatical form such as "Daybook ensures that these changes are synced
to other replicas whenever a working connection exists between devices." Also
scan and tighten the other flagged lines (original lines referenced around the
same paragraphs) to fix typos, remove extraneous words, and make definitions
concise and unambiguous (e.g., clarify subject/verb order, replace informal
words like "then"/"whenever" with precise phrasing, and ensure consistent tense
and punctuation across the core definitions in the same section).
In `@src/am_utils_rs/sync/peer.rs`:
- Line 322: The placeholder info!(?partitions, "XXX"); is logging the entire
partitions list at info level; remove this noisy line or change it to a lower
log level and a meaningful message (e.g., debug!("partitions: {} entries",
partitions.len()) or debug!(partitions = ?partitions, "partitions snapshot") if
full content is needed for debugging). Update the call to either delete
info!(?partitions, "XXX"); or replace it with a debug-level log that avoids
verbose startup noise while still conveying useful context, referencing the
info!(?partitions, "XXX") call and the partitions variable.
In `@src/daybook_core/sync/full/blob_worker.rs`:
- Around line 364-368: In send_progress, distinguish lifecycle messages from
intermediate progress: when sync_progress_tx.try_send(msg) fails, match on the
SyncProgressMsg value (check for BlobWorkerStarted and BlobWorkerFinished
variants) and log those dropped lifecycle messages with tracing::warn!(?err,
hash = %self.hash, ...) while keeping other variants logged at debug as
currently done; locate this behavior around sync_progress_tx.try_send and the
send_progress async fn to implement the conditional logging.
- Line 176: The local import use futures::StreamExt; found inside the async
block in blob_worker.rs should be moved to the module/file scope: remove the
inline import and add use futures::StreamExt; at the top of the file (or import
it via super::* if already re-exported) so that the StreamExt trait is visible
consistently across functions such as the async block where it was used; ensure
there are no duplicate imports and that the trait remains in scope for methods
like .next() or other StreamExt extensions.
In `@src/daybook_core/sync/stress.rs`:
- Around line 466-469: The branch path for CreateBranch uses a small namespace
"/stress/{0..32}" which causes collisions; update the construction of new_branch
(created via daybook_types::doc::BranchPath::from) to include the loop index
(idx) and/or a random suffix from rng (e.g., combine idx and rng.random_range or
a hex/random token) so each generated path is effectively unique (e.g.,
"/stress/{idx}-{rand}"); ensure the same variable new_branch is used where
CreateBranch is emitted so branch creation events target collision-free paths.
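Combining the loop index with a random token makes the namespace effectively collision-free; a hedged sketch (the real code would feed this string into `daybook_types::doc::BranchPath::from`):

```rust
// Unique stress-branch path: index guarantees uniqueness within one run,
// the random token guards against collisions across runs.
fn unique_branch_path(idx: usize, rand_token: u32) -> String {
    format!("/stress/{idx}-{rand_token:08x}")
}
```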
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c4365a0-65b0-4be0-aafa-74d9e482947c
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by `!**/*.lock` and included by `**/*`
📒 Files selected for processing (23)
- Cargo.toml
- docs/dict.md
- src/am_utils_rs/Cargo.toml
- src/am_utils_rs/repo/changes.rs
- src/am_utils_rs/sync/peer.rs
- src/daybook_core/Cargo.toml
- src/daybook_core/blobs.rs
- src/daybook_core/config.rs
- src/daybook_core/drawer.rs
- src/daybook_core/index/doc_blobs.rs
- src/daybook_core/plugs.rs
- src/daybook_core/repo.rs
- src/daybook_core/rt/dispatch.rs
- src/daybook_core/stores.rs
- src/daybook_core/sync/bootstrap.rs
- src/daybook_core/sync/full.rs
- src/daybook_core/sync/full/blob_worker.rs
- src/daybook_core/sync/stress.rs
- src/daybook_core/tables.rs
- src/daybook_ffi/Cargo.toml
- src/daybook_server/Cargo.toml
- src/daybook_types/doc.rs
- src/utils_rs/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/daybook_ffi/Cargo.toml
```rust
self.note_doc_cursor_seen(&event.partition_id, event.cursor);
let decision = match self.reduce_doc_event(event).await {
    Ok(decision) => decision,
    Err(err) => {
        if let PartitionDocEventDeets::DocChanged { doc_id, .. } = &event.deets {
            let entry = out
                .unresolved_by_partition
                .entry(event.partition_id.clone())
                .or_insert(0);
            *entry = entry.saturating_add(1);
            warn!(
                ?err,
                partition = %event.partition_id,
                doc_id,
                cursor = event.cursor,
                "failed reducing remote doc event"
            );
            continue;
        }
        return Err(err);
    }
};
self.apply_doc_event_decision(event, decision, &mut out)
    .await?;
```
Unresolved doc events can be skipped permanently.
Line 584 inserts an empty slot before reduction/import succeeds. If a doc change then hits the unresolved paths at Lines 587-601 or Lines 685-699, that cursor still looks "ready", and Lines 927-937 will pop it the next time a later event succeeds. A failure at cursor N followed by a success at cursor N+1 can therefore advance the stored cursor to N+1 and drop N forever. Unresolved cursors need a blocking state, not an empty slot. Please add a small regression test for this ordering.
As per coding guidelines: "When working with Rust, use small tests in addition to cargo clippy to validate ideas."
Also applies to: 685-699, 921-937
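The blocking-sentinel semantics can be sketched with stdlib types; `SlotState` and `advance_cursor` are hypothetical stand-ins for whatever `unresolved_by_partition` actually stores:

```rust
use std::collections::BTreeMap;

// A cursor slot is either resolved or explicitly blocked; no more "empty slot"
// that looks ready before reduction has succeeded.
#[derive(Debug, Clone, PartialEq)]
enum SlotState {
    Ready,
    Blocked,
}

// Advance the stored cursor only across a contiguous prefix of Ready slots.
// A Blocked slot stops the scan, so a failed cursor N can never be skipped
// just because cursor N+1 succeeded later.
fn advance_cursor(slots: &BTreeMap<u64, SlotState>, mut cursor: u64) -> u64 {
    while let Some(state) = slots.get(&(cursor + 1)) {
        if *state == SlotState::Blocked {
            break;
        }
        cursor += 1;
    }
    cursor
}
```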
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/am_utils_rs/sync/peer.rs` around lines 584 - 607, The code currently
inserts an "empty slot" into unresolved_by_partition before calling
reduce_doc_event which lets a later successful cursor (N+1) advance and drop a
failed cursor (N); change the sentinel semantics so we record a
blocking/unresolved state instead of a no-op slot: replace the current entry
insertion prior to reduce_doc_event with a mark that this partition+cursor is
blocked (e.g., an explicit enum state or a distinct value) and only clear/pop
that blocked state when apply_doc_event_decision successfully handles that
cursor; update logic in note_doc_cursor_seen, reduce_doc_event handling,
apply_doc_event_decision, and any code that reads unresolved_by_partition (the
branches around PartitionDocEventDeets::DocChanged and the later pop logic) to
treat the blocked state as non-ready; add a small regression test that simulates
a failure at cursor N (reduce_doc_event returns Err) then a success at N+1 and
asserts that N remains unresolved (blocked) until N is retried and succeeds.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/daybook_core/rt/dispatch.rs (1)
312-323: ⚠️ Potential issue | 🟡 Minor
Consider using `eyre::bail!()` instead of `panic!()` for dispatch update detection.
If a dispatch update patch arrives from an external peer (e.g., due to version drift or a future protocol change), this `panic!` will crash the application. Per the retrieved learnings, external input validation should use `eyre::bail!()` to handle version drift safely.
🛡️ Proposed fix
```diff
 out.push(if vtag.version.is_nil() {
     DispatchEvent::DispatchAdded {
         id: dispatch_id.clone(),
         heads: dispatch_heads,
     }
 } else {
-    panic!("dispatch update detected")
-    // DispatchEvent::DispatchUpdated {
-    //     id: dispatch_id.clone(),
-    //     heads: dispatch_heads,
-    // }
+    // Dispatch updates are not currently supported; log and skip
+    warn!(dispatch_id = %dispatch_id, "ignoring unexpected dispatch update from external source");
+    return Ok(());
 });
```
Based on learnings: "When validating data that comes from external inputs (such as automerge documents), use eyre::bail!() to return errors rather than panicking."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/rt/dispatch.rs` around lines 312 - 323, Replace the panic!("dispatch update detected") with an eyre::bail!() so external/peer-driven dispatch update patches return an error instead of aborting; import eyre::bail!, return a Result from the surrounding function (or propagate the error) where out.push(...) is created, and use a clear error message like "dispatch update detected" (you can later decide whether to handle DispatchEvent::DispatchUpdated or treat it as an error); keep the existing DispatchEvent::DispatchAdded branch and reference vtag.version.is_nil(), dispatch_id, and dispatch_heads when making the decision.
♻️ Duplicate comments (3)
docs/dict.md (2)
28-38: ⚠️ Potential issue | 🟡 Minor
Fix invalid JavaScript object examples (missing commas).
At line 33, the property is missing a trailing comma, making this snippet invalid JS that will fail if copied.
📝 Proposed fix
```diff
 "org.example.daybook.dmeta/main": {
-  id: "<the id that the drawer knows it by>"
+  id: "<the id that the drawer knows it by>",
   createdAt: "timestamp",
   updatedAt: "timestamp",
   // more metadata
```
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 28 - 38, The example JSON-like object in docs/dict.md has missing trailing comma(s) making it invalid JS; update the snippet so each property entry is comma-separated—specifically ensure a comma after the "org.example.daybook.path/main" entry and after the closing brace for the "org.example.daybook.dmeta/main" object (and any other entries that follow) so the object with keys "org.example.daybook.title/main", "org.example.daybook.path/main", and "org.example.daybook.dmeta/main" is valid JavaScript.
109-125: ⚠️ Potential issue | 🟡 Minor
Fix invalid JavaScript object examples (missing commas).
At lines 114 and 121, properties are missing trailing commas, making these snippets invalid JS object literals that will fail if copied.
📝 Proposed fix
```diff
 "org.example.daybook.blob/main": {
   mime: "image/png",
   lengthOctets: 1024,
-  digest: "<hash>"
+  digest: "<hash>",
   inlineBase64: "small blobs are stored inline as base64 strings but not all blobs",
   urls: [ "db+blob://<digest>" ],
 },
 "org.example.daybook.imagemetadata/main": {
-  facetRef: "db+facet://self/org.example.daybook.blob/main"
+  facetRef: "db+facet://self/org.example.daybook.blob/main",
   refHeads: [],
 }
```
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` around lines 109 - 125, The example objects for "org.example.daybook.blob/main" and "org.example.daybook.imagemetadata/main" have missing trailing commas that make the JavaScript invalid; update the literal for "org.example.daybook.blob/main" to include a comma after the mime property value and ensure the urls array line ends with a comma, and update the "org.example.daybook.imagemetadata/main" object to include a trailing comma after the refHeads property so all object properties (mime, lengthOctets, digest, inlineBase64, urls, facetRef, refHeads) are properly comma-separated.
src/am_utils_rs/sync/peer.rs (1)
460-491: ⚠️ Potential issue | 🟠 Major
Pause ack-gated bootstrap pages instead of aborting them.
When `apply_doc_events()` queues `RequestDocSync`, `doc_cursor` is supposed to stay put until a Samod ack arrives. If that page also reports `has_more = true`, Line 486 turns that expected wait into a hard failure before the select loop can ever drain `samod_ack_rx`. A partition with more than `DEFAULT_EVENT_PAGE_LIMIT` already-local docs still dies here.
🛠️ Minimal fix
```diff
 self.emit_doc_sync_status(synced_docs, unresolved);
-if any_more && next_cursor == cursor.doc_cursor {
+let waiting_on_samod_acks = self
+    .doc_cursor_state
+    .get(part)
+    .is_some_and(|state| {
+        state.slots.values().any(|slot| {
+            matches!(
+                slot,
+                CursorSlotState::Pending(pending_doc_ids)
+                    if !pending_doc_ids.is_empty()
+            )
+        })
+    });
+if any_more && next_cursor == cursor.doc_cursor && waiting_on_samod_acks {
+    debug!(
+        partition = %part,
+        cursor = ?next_cursor,
+        "pausing bootstrap pagination until samod acks advance the cursor"
+    );
+    break;
+}
+if any_more && next_cursor == cursor.doc_cursor {
     eyre::bail!(
```
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` around lines 460 - 491, The code currently treats a page with has_more=true and an unchanged doc_cursor as an error, but apply_doc_events() may legitimately queue a RequestDocSync and keep doc_cursor steady until a Samod ack arrives; instead of eyre::bail! in the block checking any_more && next_cursor == cursor.doc_cursor, change behavior to pause advancing this partition's bootstrap (do not error) so the select loop can receive from samod_ack_rx: detect this stalled-but-ack-gated condition in the place around apply_doc_events()/next_cursor/doc_cursor, set a local flag or break out of the partition loop (keeping unresolved_total/synced_docs state) to defer further cursor advancement, and ensure emit_doc_sync_status is still called; reference apply_doc_events, RequestDocSync, doc_cursor, samod_ack_rx and emit_doc_sync_status when making the change.
🧹 Nitpick comments (5)
AGENTS.md (1)
112-119: Rephrase test-policy framing to avoid lowering quality expectations.
Line 112 can be interpreted as “tests are less important.” Consider reframing this section to keep the same intent (gardening/improving tests) without implying reduced rigor.
🧹 Suggested rewrite
```diff
-Human auditors of this repo don't really read test code or mantain it.
-So when working on tests, take a gardening hand to it and assume they've been less strictly audited in the past.
+Test code is often less consistently audited than production code.
+When touching tests, proactively improve them: prune shallow tests, reduce duplication, and strengthen assertions.
 See a useless or too shallow a test? Flag it for removal.
 Are tests repeating too much setup code? Consider using TDD or common cleanup code.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 112 - 119, The sentence "Human auditors of this repo don't really read test code or mantain it." in AGENTS.md should be rephrased to avoid implying tests are unimportant; replace it with a constructive guideline that encourages contributors to proactively improve and maintain tests (e.g., emphasize "when working on tests, assume they need cleanup and strengthen them" and suggest concrete actions like removing shallow tests, consolidating setup, and using snapshot/TDD macros), and update surrounding lines mentioning "gardening" and tools (snapshot tests, TDD, macros) to frame these as practices to preserve test quality rather than tolerate lower standards.
docs/dict.md (1)
94-94: Consider rewording for clarity.
"the facet making holding the URL" is awkward. Consider "the facet that holds the URL" or "the facet containing the URL".
📝 Proposed fix
```diff
-As a special convention, `self` can be used instead of the document id to indicates that the reference is to the same doc containing the facet making holding the URL.
+As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dict.md` at line 94, Reword the sentence describing the special convention for `self`: replace the awkward phrase "the facet making holding the URL" with a clearer phrase such as "the facet that holds the URL" or "the facet containing the URL", and change "indicates" to "indicate" so the clause reads properly (e.g., "As a special convention, `self` can be used instead of the document id to indicate that the reference is to the same doc containing the facet that holds the URL.").
src/daybook_core/index/doc_blobs.rs (2)
178-182: Consider logging when facets are unexpectedly missing.
`unwrap_or_default()` silently treats a `None` result as an empty map. If `selected_blob_keys` was non-empty but `get_at_heads_with_facets_arc` returns `None`, this might indicate an unexpected state worth logging for debugging purposes.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 178 - 182, The code currently calls self.drawer_repo.get_at_heads_with_facets_arc(...).await? and immediately unwrap_or_default into facets, which hides when the repo returned None; change this to explicitly match the Option result from get_at_heads_with_facets_arc (or keep the interim variable) and if it is None while selected_blob_keys is non-empty, emit a warning/error log containing doc_id, heads and selected_blob_keys (and any repo error context) before falling back to an empty map; keep the existing behavior of using an empty map when None but ensure the unexpected None is logged for debugging (referencing get_at_heads_with_facets_arc, facets, selected_blob_keys, doc_id and heads).
131-133: Index on `doc_id` may be redundant.
Since `doc_id` is the first column of the composite primary key `(doc_id, blob_hash)`, SQLite can use the primary key index for queries filtering by `doc_id` alone. The separate index on line 131 may be unnecessary overhead.
Verify each finding against the current code and only fix it if needed. In `@src/daybook_core/index/doc_blobs.rs` around lines 131 - 133, The CREATE INDEX call for idx_doc_blob_refs_doc_id in doc_blobs.rs is redundant because doc_id is the leading column of the composite primary key (doc_id, blob_hash); remove the sqlx::query("CREATE INDEX IF NOT EXISTS idx_doc_blob_refs_doc_id ON doc_blob_refs(doc_id)")...execute(db_pool).await?; statement to avoid unnecessary overhead, or if you decide to keep it for a documented performance reason, add a clear comment above it referencing doc_blob_refs, doc_id, and blob_hash explaining why the extra index is required and add a test/benchmark proving the benefit.src/am_utils_rs/sync/peer.rs (1)
332-332: Remove the `XXX` info log before merging.
This logs the full partition list on every worker start at `info` and looks like leftover debugging.
Verify each finding against the current code and only fix it if needed. In `@src/am_utils_rs/sync/peer.rs` at line 332, Remove the leftover info-level debug log that prints the full partition list: delete or change the call to info!(?partitions, "XXX"); in peer.rs (the logging invocation referencing the partitions variable and the info! macro). If you want to keep the diagnostic, convert it to debug!(?partitions, "partitions") or otherwise redact sensitive details; otherwise simply remove the line before merging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 103-107: Fix the spelling mistakes in the governance text of
AGENTS.md: replace "sensebilites" with "sensibilities", "tires" with "tries",
and "mantain" with "maintain"; search the newly added governance paragraph
containing phrases like "Avoid cheating through hacks that violate common
sense..." and the nearby lines referenced as 112-113 to ensure all occurrences
are corrected and update surrounding wording if necessary for grammatical
consistency.
In `@docs/dict.md`:
- Line 44: In the sentence starting "CRDTs are a family of data structures
allowing concurrent edits acoss devices..." update the misspelled word "acoss"
to "across" so the line reads "...allowing concurrent edits across devices...";
locate that exact sentence in docs/dict.md and correct the typo.
- Line 21: Replace the awkward phrase in the sentence beginning "The facet tag
indicates expected schema of the value of under that key." by removing "of" and
adding "the" where needed so it reads: "The facet tag indicates the expected
schema of the value under that key." Update that exact sentence in docs/dict.md
to use this corrected phrasing.
- Line 11: The sentence in docs/dict.md is incomplete; update the sentence
"Daybook ensures then that these changes are synced to other replicas whenever a
working connection between devices." to include the missing verb—e.g., change it
to "whenever a working connection between devices exists" or "whenever a working
connection is established between devices" so the sentence reads properly;
locate the sentence text "Daybook ensures then that these changes are synced to
other replicas whenever a working connection between devices." and replace it
accordingly.
- Line 76: Fix the typo "weather" to "whether" in the sentence "It's also the
gatekeeper for all docs weather it's reads or writes." within docs/dict.md;
search for that sentence (or the word "weather") and replace it with "whether"
and also consider correcting surrounding grammar ("it's reads" → "its reads" or
rephrase) to ensure the sentence reads correctly.
- Line 51: Fix the typo in docs/dict.md where "taught" is used instead of
"thought"; change "taught" to "thought" so the sentence reads "A change can
be thought of as a single git commit with a hash used to refer to it, the
change hash."
- Line 15: Fix the typo in the sentence "Documents are a the main units of
information in a repo and have a unique ids." by removing the extra "a" so it
reads "Documents are the main units of information in a repo" and also remove
the extraneous "a" before "unique ids" (resulting clause: "and have unique
ids"); update that exact line accordingly.
In `@src/am_utils_rs/sync.rs`:
- Around line 767-790: The test currently waits only for
dst.partition_member_count(&p_ok) to reach 1, which can happen before docs are
synced; change the wait condition to also poll
dst_store.get_partition_cursor(peer_key.clone(), p_ok.clone()) and require that
the returned ok_cursor.doc_cursor.is_some() before proceeding. Update the
wait_until closure (the one that currently calls partition_member_count) to
fetch the partition cursor for p_ok and return Ok(count == 1 &&
ok_cursor.doc_cursor.is_some()), so the test only continues once the p_ok
document cursor has advanced. Ensure you reference the existing wait_until,
dst.partition_member_count, dst_store.get_partition_cursor, and p_ok symbols
when making this change.
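A std-only sketch of the strengthened wait condition described above. The `wait_until`, `partition_member_count`, and `get_partition_cursor` names stand in for the test's real helpers; the `Cursor` struct is hypothetical:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the partition cursor returned by the store.
struct Cursor {
    doc_cursor: Option<u64>,
}

// Stand-ins for the test's real lookups against `dst` / `dst_store`.
fn partition_member_count() -> usize {
    1
}
fn get_partition_cursor() -> Cursor {
    Cursor { doc_cursor: Some(7) }
}

// Generic poll loop in the spirit of the test's `wait_until` helper:
// retry the predicate until it passes or the deadline expires.
fn wait_until(timeout: Duration, mut pred: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if pred() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}

// The suggested condition: require BOTH partition membership AND an
// advanced doc cursor before the test proceeds.
fn partition_synced() -> bool {
    let count = partition_member_count();
    let cursor = get_partition_cursor();
    count == 1 && cursor.doc_cursor.is_some()
}
```

The point of the combined predicate is that membership alone can become true before any document bytes have synced, while `doc_cursor.is_some()` only holds once the doc cursor has advanced.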
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 69-72: PeerSyncWorkerHandle exposes optional progress_rx and
events_rx so sending should not panic when receivers are dropped; replace all
`.expect(ERROR_CHANNEL)` uses on broadcast::Sender::send with non-panicking
sends (e.g., `let _ = events_tx.send(...)` / `let _ = progress_tx.send(...)`) so
the worker continues if there are zero listeners—apply this change to the three
events_tx send sites for Bootstrapped, LiveReady, and AbnormalExit and the three
progress_tx send sites inside emit_phase_started, emit_phase_finished, and
emit_doc_sync_status.
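The non-panicking send pattern can be sketched with a std `mpsc` channel in place of the `tokio::sync::broadcast` channel the worker actually uses (the `WorkerEvent` enum and `emit` helper are illustrative, not the crate's API):

```rust
use std::sync::mpsc;

// Illustrative stand-in for the worker's event payloads.
#[derive(Debug, Clone)]
enum WorkerEvent {
    Bootstrapped,
    LiveReady,
    AbnormalExit,
}

// `send` returns Err when every receiver has been dropped. Since the
// handle's receivers are optional, ignore the error instead of calling
// `.expect(ERROR_CHANNEL)`, so the worker keeps running with zero listeners.
fn emit(tx: &mpsc::Sender<WorkerEvent>, event: WorkerEvent) {
    let _ = tx.send(event);
}
```

With `tokio::sync::broadcast` the same `let _ = tx.send(...)` shape applies; the only difference is that broadcast `send` fails when there are currently no subscribers rather than when the single receiver is gone.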
In `@src/daybook_core/index/doc_blobs.rs`:
- Around line 444-451: The code currently calls index_repo.enqueue_delete(...)
when self.drawer_repo.get_facet_keys_if_latest(...) returns None (a stale
event); instead, mirror the DocAdded behavior and skip stale events by returning
early without deleting. Change the else branch for get_facet_keys_if_latest(id,
&branch_path, heads, drawer_heads) to simply return Ok(outcome) (no
enqueue_delete), leaving index data intact so a newer event can reconcile state.
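The skip-stale behavior can be sketched as follows; `get_facet_keys_if_latest`, `Outcome`, and the boolean parameter are simplified stand-ins for the real repo call and its head/branch arguments:

```rust
// Hypothetical mirror of the DocAdded handling: when the facet lookup
// reports the event is stale (None), leave the index untouched.
#[derive(Debug, PartialEq)]
enum Outcome {
    Indexed,
    SkippedStale,
}

// Stand-in for drawer_repo.get_facet_keys_if_latest(id, branch, heads, ...).
fn get_facet_keys_if_latest(is_latest: bool) -> Option<Vec<String>> {
    if is_latest {
        Some(vec!["blob".to_string()])
    } else {
        None
    }
}

fn handle_event(is_latest: bool) -> Outcome {
    match get_facet_keys_if_latest(is_latest) {
        Some(_keys) => Outcome::Indexed,
        // Stale event: return early WITHOUT enqueue_delete, so the index
        // data survives and a newer event can reconcile state later.
        None => Outcome::SkippedStale,
    }
}
```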
In `@src/daybook_core/sync/full.rs`:
- Around line 1582-1592: remove_blob_tracking_if_unused currently deletes
bookkeeping for unreferenced hashes but worker callbacks like
BlobSyncMarkedSynced and BlobSyncRequestBackoff can later reinsert state for
those hashes; fix by adding a guard in those worker handlers (or any code that
mutates synced_blobs, pending_blobs, blobs_to_boot) to first check that
self.known_blob_set.contains(hash) (or a generation/version token if you prefer)
and return/ignore the callback if the hash is not present, ensuring no
reinsertion of removed hashes by functions handling BlobSyncMarkedSynced /
BlobSyncRequestBackoff and other blob worker callbacks.
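A minimal sketch of the proposed guard, assuming a membership check against `known_blob_set` (the `BlobState` struct and string hashes are hypothetical; the real code keys on blob hash types):

```rust
use std::collections::HashSet;

// Hypothetical container mirroring the worker's bookkeeping fields.
struct BlobState {
    known_blob_set: HashSet<String>,
    synced_blobs: HashSet<String>,
}

impl BlobState {
    // Handler for a late BlobSyncMarkedSynced-style callback.
    fn on_blob_sync_marked_synced(&mut self, hash: &str) {
        // Guard: ignore callbacks for hashes whose tracking was already
        // removed by remove_blob_tracking_if_unused, so state for removed
        // hashes cannot be reinserted.
        if !self.known_blob_set.contains(hash) {
            return;
        }
        self.synced_blobs.insert(hash.to_string());
    }
}
```

A generation/version token, as the comment suggests, would achieve the same effect while also catching the remove-then-re-add case that a plain membership check cannot distinguish.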
- Around line 1188-1193: The current parse::<DocumentId>() call in the full-sync
handling path should not propagate a parse error up and kill the worker;
instead, detect a malformed peer doc_id, log a warning/debug with the offending
peer_key and raw doc_id, and drop the message by returning Ok(()) so the worker
continues; update the block around parse::<DocumentId>() and the surrounding
logic that uses self.endpoint_for_peer_key(&peer_key) to early-return Ok(()) on
parse failures (and similarly at the other sites you noted), replacing map_err
propagation with a non-fatal drop-and-log behavior.
---
Outside diff comments:
In `@src/daybook_core/rt/dispatch.rs`:
- Around line 312-323: Replace the panic!("dispatch update detected") with an
eyre::bail!() so external/peer-driven dispatch update patches return an error
instead of aborting; import eyre::bail!, return a Result from the surrounding
function (or propagate the error) where out.push(...) is created, and use a
clear error message like "dispatch update detected" (you can later decide
whether to handle DispatchEvent::DispatchUpdated or treat it as an error); keep
the existing DispatchEvent::DispatchAdded branch and reference
vtag.version.is_nil(), dispatch_id, and dispatch_heads when making the decision.
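The panic-to-error change can be sketched with a plain `String` error in place of `eyre` (which the crate actually uses); the two-variant `DispatchEvent` enum here is a simplification of the real one:

```rust
// Simplified stand-in for the dispatch event type.
#[derive(Debug)]
enum DispatchEvent {
    DispatchAdded,
    DispatchUpdated,
}

fn handle(event: DispatchEvent) -> Result<(), String> {
    match event {
        // Keep the existing DispatchAdded handling.
        DispatchEvent::DispatchAdded => Ok(()),
        // Previously: panic!("dispatch update detected"). A peer-driven
        // update patch should surface as an error the caller can handle
        // (with eyre, this would be `bail!("dispatch update detected")`).
        DispatchEvent::DispatchUpdated => Err("dispatch update detected".to_string()),
    }
}
```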
---
Duplicate comments:
In `@docs/dict.md`:
- Around line 28-38: The example JSON-like object in docs/dict.md has missing
trailing comma(s) making it invalid JS; update the snippet so each property
entry is comma-separated—specifically ensure a comma after the
"org.example.daybook.path/main" entry and after the closing brace for the
"org.example.daybook.dmeta/main" object (and any other entries that follow) so
the object with keys "org.example.daybook.title/main",
"org.example.daybook.path/main", and "org.example.daybook.dmeta/main" is valid
JavaScript.
- Around line 109-125: The example objects for "org.example.daybook.blob/main"
and "org.example.daybook.imagemetadata/main" have missing trailing commas that
make the JavaScript invalid; update the literal for
"org.example.daybook.blob/main" to include a comma after the mime property value
and ensure the urls array line ends with a comma, and update the
"org.example.daybook.imagemetadata/main" object to include a trailing comma
after the refHeads property so all object properties (mime, lengthOctets,
digest, inlineBase64, urls, facetRef, refHeads) are properly comma-separated.
In `@src/am_utils_rs/sync/peer.rs`:
- Around line 460-491: The code currently treats a page with has_more=true and
an unchanged doc_cursor as an error, but apply_doc_events() may legitimately
queue a RequestDocSync and keep doc_cursor steady until a Samod ack arrives;
instead of eyre::bail! in the block checking any_more && next_cursor ==
cursor.doc_cursor, change behavior to pause advancing this partition's bootstrap
(do not error) so the select loop can receive from samod_ack_rx: detect this
stalled-but-ack-gated condition in the place around
apply_doc_events()/next_cursor/doc_cursor, set a local flag or break out of the
partition loop (keeping unresolved_total/synced_docs state) to defer further
cursor advancement, and ensure emit_doc_sync_status is still called; reference
apply_doc_events, RequestDocSync, doc_cursor, samod_ack_rx and
emit_doc_sync_status when making the change.
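The stalled-but-ack-gated decision can be sketched as a pure function; the `Step` enum, numeric cursors, and `advance_partition` name are illustrative stand-ins for the real control flow around `apply_doc_events` and `doc_cursor`:

```rust
#[derive(Debug, PartialEq)]
enum Step {
    // Keep advancing with the new cursor.
    Continue { cursor: u64 },
    // Pause this partition's bootstrap (do NOT error) so the select loop
    // can receive the pending Samod ack on samod_ack_rx.
    DeferUntilAck,
}

fn advance_partition(has_more: bool, cursor: u64, next_cursor: u64) -> Step {
    if has_more && next_cursor == cursor {
        // A page with has_more=true and an unchanged cursor can mean a
        // RequestDocSync is queued and the cursor is gated on an ack;
        // defer instead of bailing.
        return Step::DeferUntilAck;
    }
    Step::Continue { cursor: next_cursor }
}
```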
---
Nitpick comments:
In `@AGENTS.md`:
- Around line 112-119: The sentence "Human auditors of this repo don't really
read test code or mantain it." in AGENTS.md should be rephrased to avoid
implying tests are unimportant; replace it with a constructive guideline that
encourages contributors to proactively improve and maintain tests (e.g.,
emphasize "when working on tests, assume they need cleanup and strengthen them"
and suggest concrete actions like removing shallow tests, consolidating setup,
and using snapshot/TDD macros), and update surrounding lines mentioning
"gardening" and tools (snapshot tests, TDD, macros) to frame these as practices
to preserve test quality rather than tolerate lower standards.
In `@docs/dict.md`:
- Line 94: Reword the sentence describing the special convention for `self`:
replace the awkward phrase "the facet making holding the URL" with a clearer
phrase such as "the facet that holds the URL" or "the facet containing the URL",
and change "indicates" to "indicate" so the clause reads properly (e.g., "As a
special convention, `self` can be used instead of the document id to indicate
that the reference is to the same doc containing the facet that holds the
URL.").
In `@src/am_utils_rs/sync/peer.rs`:
- Line 332: Remove the leftover info-level debug log that prints the full
partition list: delete or change the call to info!(?partitions, "XXX"); in
peer.rs (the logging invocation referencing the partitions variable and the
info! macro). If you want to keep the diagnostic, convert it to
debug!(?partitions, "partitions") or otherwise redact sensitive details;
otherwise simply remove the line before merging.
In `@src/daybook_core/index/doc_blobs.rs`:
- Around line 178-182: The code currently calls
self.drawer_repo.get_at_heads_with_facets_arc(...).await? and immediately
unwrap_or_default into facets, which hides when the repo returned None; change
this to explicitly match the Option result from get_at_heads_with_facets_arc (or
keep the interim variable) and if it is None while selected_blob_keys is
non-empty, emit a warning/error log containing doc_id, heads and
selected_blob_keys (and any repo error context) before falling back to an empty
map; keep the existing behavior of using an empty map when None but ensure the
unexpected None is logged for debugging (referencing
get_at_heads_with_facets_arc, facets, selected_blob_keys, doc_id and heads).
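A std-only sketch of the explicit `Option` handling, with `eprintln!` standing in for the crate's `warn!` macro and string maps standing in for the real facet types:

```rust
use std::collections::HashMap;

// Sketch: surface the unexpected-None case instead of silently hiding it
// behind unwrap_or_default. `result` stands in for the value returned by
// get_at_heads_with_facets_arc.
fn resolve_facets(
    result: Option<HashMap<String, String>>,
    selected_blob_keys: &[String],
    doc_id: &str,
) -> HashMap<String, String> {
    match result {
        Some(facets) => facets,
        None => {
            // Only worth a warning when we actually expected blob facets.
            if !selected_blob_keys.is_empty() {
                eprintln!(
                    "no facets at heads for doc {doc_id} despite selected blob keys {selected_blob_keys:?}"
                );
            }
            // Keep the existing behavior: fall back to an empty map.
            HashMap::new()
        }
    }
}
```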
- Around line 131-133: The CREATE INDEX call for idx_doc_blob_refs_doc_id in
doc_blobs.rs is redundant because doc_id is the leading column of the composite
primary key (doc_id, blob_hash); remove the sqlx::query("CREATE INDEX IF NOT
EXISTS idx_doc_blob_refs_doc_id ON
doc_blob_refs(doc_id)")...execute(db_pool).await?; statement to avoid
unnecessary overhead, or if you decide to keep it for a documented performance
reason, add a clear comment above it referencing doc_blob_refs, doc_id, and
blob_hash explaining why the extra index is required and add a test/benchmark
proving the benefit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bb3f9b40-a4f8-4a19-bd67-8b84b38ea985
📒 Files selected for processing (12)
- AGENTS.md
- docs/dict.md
- src/am_utils_rs/sync.rs
- src/am_utils_rs/sync/peer.rs
- src/daybook_core/blobs.rs
- src/daybook_core/drawer.rs
- src/daybook_core/index/doc_blobs.rs
- src/daybook_core/plugs.rs
- src/daybook_core/rt/dispatch.rs
- src/daybook_core/sync/full.rs
- src/daybook_core/tables.rs
- src/daybook_types/doc.rs
```rust
let doc_id = doc_id
    .parse::<DocumentId>()
    .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
let Some(endpoint_id) = self.endpoint_for_peer_key(&peer_key) else {
    debug!(peer_key, %doc_id, "dropping samod doc sync request for unknown peer");
    return Ok(());
```
Do not let malformed peer doc_id kill the full-sync worker.
doc_id.parse::<DocumentId>() errors from peer input currently propagate, and this can terminate the worker task via fut.await.unwrap() at Line 415. Treat malformed peer payloads as non-fatal and drop them with a warning.
💡 Suggested fix

```diff
@@
-        let doc_id = doc_id
-            .parse::<DocumentId>()
-            .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
+        let doc_id = match doc_id.parse::<DocumentId>() {
+            Ok(doc_id) => doc_id,
+            Err(err) => {
+                warn!(peer_key, raw_doc_id = %doc_id, ?err, "dropping samod doc sync request with invalid doc id");
+                return Ok(());
+            }
+        };
@@
-        let doc_id = doc_id
-            .parse::<DocumentId>()
-            .map_err(|err| ferr!("invalid doc id '{}': {err}", doc_id))?;
+        let doc_id = match doc_id.parse::<DocumentId>() {
+            Ok(doc_id) => doc_id,
+            Err(err) => {
+                warn!(peer_key, raw_doc_id = %doc_id, ?err, "dropping samod doc-deleted request with invalid doc id");
+                return Ok(());
+            }
+        };
```

Based on learnings: "When validating data that comes from external inputs … use eyre::bail!() to return errors rather than panicking … handle version drift across devices safely."
Also applies to: 1243-1248, 413-416
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/daybook_core/sync/full.rs` around lines 1188 - 1193, The current
parse::<DocumentId>() call in the full-sync handling path should not propagate a
parse error up and kill the worker; instead, detect a malformed peer doc_id, log
a warning/debug with the offending peer_key and raw doc_id, and drop the message
by returning Ok(()) so the worker continues; update the block around
parse::<DocumentId>() and the surrounding logic that uses
self.endpoint_for_peer_key(&peer_key) to early-return Ok(()) on parse failures
(and similarly at the other sites you noted), replacing map_err propagation with
a non-fatal drop-and-log behavior.