test: minimal repro that generates conflicts #2031

seefeldb · 2025-11-07T19:47:25Z

This reliable generates these conflicts for me. Note that this is using the in-memory sqlite test setup, so it's definitely single-player!

added a runner test as well, same effect. should make it easier to debug:

deno test -A test/conflict-repro.test.ts in packages/runner

�[0m�[38;5;245mrunning 1 test from ./integration/patterns/minimal-conflict-repro.test.ts�[0m
minimal conflict repro ...
  minimal conflict repro ... �[0m�[32mok�[0m �[0m�[38;5;245m(299ms)�[0m
�[0m�[38;5;245m------- output -------�[0m
�[38;2;234;179;8m[WARN][storage.cache::19:46:40.475]�[0m Transaction failed ConflictError: The application/json of of:baedreifrdcsbdplxx3cb5zhetslhpvazdzgsnott5a34wugdfflztecove in did:key:z6MkgoBAqGfjfZrnExo9gAp5m6sawZuhiaHTWzh5VUc7cuj5 was expected to be ba4jcawd7fmfyomud2hkiishs6t7le67wb27lvmmjen52dyk2doxvljn7, but it is ba4jcaxickypevbukijrki4zkcwfllpa7lvftxvvmx3i6firied2fdw5l
    at Module.conflict (file:///Users/berni/src/labs-secondary/packages/memory/error.ts:25:29)
    at swap (file:///Users/berni/src/labs-secondary/packages/memory/space.ts:765:19)
    at commit (file:///Users/berni/src/labs-secondary/packages/memory/space.ts:801:5)
    at sqliteTransaction (https://jsr.io/@db/sqlite/0.12.0/src/database.ts:874:28)
    at execute (file:///Users/berni/src/labs-secondary/packages/memory/space.ts:838:11)
    at file:///Users/berni/src/labs-secondary/packages/memory/space.ts:865:12
    at traceSync (file:///Users/berni/src/labs-secondary/packages/memory/telemetry.ts:69:20)
    at transact (file:///Users/berni/src/labs-secondary/packages/memory/space.ts:856:10)
    at file:///Users/berni/src/labs-secondary/packages/memory/space.ts:219:14
    at traceSync (file:///Users/berni/src/labs-secondary/packages/memory/telemetry.ts:69:20)
    at Space.transact (file:///Users/berni/src/labs-secondary/packages/memory/space.ts:213:12)
    at file:///Users/berni/src/labs-secondary/packages/memory/memory.ts:188:26
    at eventLoopTick (ext:core/01_core.js:179:7)
    at async file:///Users/berni/src/labs-secondary/packages/memory/telemetry.ts:127:22
    at async transact (file:///Users/berni/src/labs-secondary/packages/memory/memory.ts:171:10)
    at async file:///Users/berni/src/labs-secondary/packages/memory/memory.ts:74:7
    at async file:///Users/berni/src/labs-secondary/packages/memory/telemetry.ts:127:22
    at async Memory.perform (file:///Users/berni/src/labs-secondary/packages/memory/memory.ts:71:12)
    at async MemoryProviderSession.invoke (file:///Users/berni/src/labs-secondary/packages/memory/provider.ts:324:15)
    at async Object.write (file:///Users/berni/src/labs-secondary/packages/memory/provider.ts:195:11) {
  transaction: {
    cmd: �[32m"/memory/transact"�[39m,
    sub: �[32m"did:key:z6MkgoBAqGfjfZrnExo9gAp5m6sawZuhiaHTWzh5VUc7cuj5"�[39m,
    args: {
      changes: {
        �[32m"of:baedreifrdcsbdplxx3cb5zhetslhpvazdzgsnott5a34wugdfflztecove"�[39m: { �[32m"application/json"�[39m: �[36m[Object]�[39m }
      }
    },
    iss: �[32m"did:key:z6MkgoBAqGfjfZrnExo9gAp5m6sawZuhiaHTWzh5VUc7cuj5"�[39m,
    prf: [],
    iat: �[33m1762544800�[39m,
    exp: �[33m1762548400�[39m
  },
  name: �[32m"ConflictError"�[39m,
  conflict: {
    space: �[32m"did:key:z6MkgoBAqGfjfZrnExo9gAp5m6sawZuhiaHTWzh5VUc7cuj5"�[39m,
    the: �[32m"application/json"�[39m,
    of: �[32m"of:baedreifrdcsbdplxx3cb5zhetslhpvazdzgsnott5a34wugdfflztecove"�[39m,
    expected: #ba4jcawd7fmfyomud2hkiishs6t7le67wb27lvmmjen52dyk2doxvljn7,
    actual: {
      the: �[32m"application/json"�[39m,
      of: �[32m"of:baedreifrdcsbdplxx3cb5zhetslhpvazdzgsnott5a34wugdfflztecove"�[39m,
      cause: #ba4jcatu5wzftjqvuzhahgyk5t4mcc65gzxlixmmmtwifiifopuz3eih7,
      since: �[33m3�[39m,
      is: {
        value: {
          �[32m"$TYPE"�[39m: �[32m"ba4jcar2jw3xl4tuo3umlqzmc6fuh3yy3pgljoumxy2b55f2zrsgowcwh"�[39m,
          resultRef: �[36m[Object]�[39m,
          internal: �[36m[Object]�[39m,
          spell: �[36m[Object]�[39m,
          argument: �[36m[Object]�[39m
        }
      }
    },
    existsInHistory: �[33mfalse�[39m,
    history: []
  }
}
�[0m�[38;5;245m----- output end -----�[0m
minimal conflict repro ... �[0m�[32mok�[0m �[0m�[38;5;245m(322ms)�[0m

�[0m�[32mok�[0m | 1 passed (1 step) | 0 failed �[0m�[38;5;245m(331ms)�[0m

Summary by cubic

Fixed a ConflictError when a lifted map depends on items and a handler clears items and increments a sequence. Normalized values before hashing and added a minimal repro with tests to confirm no conflicts.

Bug Fixes
- Normalize values before assertion hashing in chronicle.ts to handle NaN consistently.
- Add minimal conflict repro pattern and runner test to verify the fix.

^{Written for commit d3a0121. Summary will update automatically on new commits.}

cubic-dev-ai

No issues found across 2 files

cubic-dev-ai

Reviewed changes from recent commits (found 1 issue).

1 issue found across 1 file

Prompt for AI agents (all 1 issues)


Understand the root cause of the following 1 issues and fix them.


<file name="packages/runner/test/conflict-repro.test.ts">

<violation number="1" location="packages/runner/test/conflict-repro.test.ts:112">
Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.</violation>
</file>

_{React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.}

cubic-dev-ai · 2025-11-07T20:06:58Z

packages/runner/test/conflict-repro.test.ts

+    const result = runtime.run(tx, conflictRepro, {
+      items: [{ id: "test" }],
+    }, resultCell);
+    tx.commit();


Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.

Prompt for AI agents

Address the following comment on packages/runner/test/conflict-repro.test.ts at line 112: <comment>Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.</comment> <file context> @@ -0,0 +1,130 @@ + const result = runtime.run(tx, conflictRepro, { + items: [{ id: "test" }], + }, resultCell); + tx.commit(); + + await runtime.idle(); </file context>

Suggested change

tx.commit();

await tx.commit();

seefeldb · 2025-11-07T20:13:49Z

Investigation Update: Root Cause Analysis

I've traced through the conflict scenario and found the issue. Here's what's happening:

The Sequence

Handler invocation (TX-3) runs and commits:
- Reads recipe result cell (version V1)
- Updates items cell via context.items.set([])
- Prepares transaction with expected version = V1
- Immediately adds to nursery (V2) at cache.ts:990
- Immediately sends commit notification at cache.ts:992-997
- Asynchronously commits to remote/SQLite at cache.ts:1007-1011
Lift reactive action (TX-4) gets triggered by the notification:
- Notification triggers scheduler to re-run the action
- Action reads recipe result cell to build transaction
- Computes lift result based on new items value
- Prepares its own transaction
- Commits to nursery
- Asynchronously commits to remote/SQLite
Conflict occurs:
- TX-3's SQLite commit completes: V1 → V2 ✓
- TX-4's SQLite commit tries: expects V1 but DB has V2 → ConflictError!

The Root Issue

The conflict happens because both transactions commit to SQLite asynchronously and in parallel, racing with the same expected version. The design assumes:

TX commits to nursery (synchronous)
Subsequent TXs read from nursery (synchronous)
Remote commits happen async in background

But what's actually happening is:

TX-4 is not reading from the nursery that has TX-3's changes
OR TX-4's transaction snapshot is captured before TX-3 updates the nursery

Key Question

When TX-4 (lift) reads the recipe result cell to build its transaction, the Replica.get() method should check nursery first (line 1142), then heap. If it's reading from the nursery correctly, it should see TX-3's V2 and use that as the expected version, avoiding the conflict.

The conflict indicates TX-4 is reading an older version (V1), which suggests either:

The transaction snapshot is captured before the nursery update
There's an async boundary causing the reads to see stale state
The transaction preparation reads from a different source than the nursery

Need to trace exactly when the transaction snapshot is captured vs when the nursery is updated.

cubic-dev-ai

No issues found across 1 file

seefeldb · 2025-11-07T20:14:03Z

Test Infrastructure Created

I've created a test that captures ConflictErrors without modifying the runtime using the storage subscription mechanism:

Location: packages/runner/test/conflict-repro.test.ts

How It Works

The test subscribes to storage notifications to capture "revert" events with ConflictError:

storageManager.subscribe({
  next: (notification: StorageNotification) => {
    if (notification.type === "revert" && notification.reason.name === "ConflictError") {
      conflictErrors.push(notification.reason);
    }
    return undefined;
  },
});

Two Tests

With lift (reproduces conflict):
- Pattern has lift((item: Item[]) => item.map((_) => ({})))(items)
- Handler updates items.set([]) and sequence.set(sequence.get() + 1)
- Result: 1 ConflictError captured ✓
Without lift (no conflict):
- Same handler logic
- No lift operation
- Result: 0 ConflictErrors ✓

This confirms the conflict only occurs when a reactive dependency (lift) exists on a cell being updated by a handler.

Note on Timing

The conflict happens asynchronously after runtime.idle() completes, so the test includes a 100ms delay to ensure the notification is processed:

await new Promise((resolve) => setTimeout(resolve, 100));

This is because the ConflictError occurs during the remote transaction commit, not during the local nursery update.

seefeldb · 2025-11-07T20:14:27Z

Scheduler Analysis

Looking at the scheduler code, the execution sequence is:

Event Queue Processing (Handler Invocation)

From scheduler.ts:487-536:

const event = this.eventQueue.shift();
if (event) {
  const finalize = (error?: unknown) => {
    // ...
    tx.commit().then(({ error }) => {  // Line 498 - NOT awaited!
      if (error && retriesLeft > 0) {
        this.eventQueue.unshift({ action, retriesLeft: retriesLeft - 1, onCommit });
        this.queueExecution();
      }
    });
  };
  
  this.runningPromise = Promise.resolve(
    this.runtime.harness.invoke(() => action(tx))
  ).then(() => finalize()).catch((error) => finalize(error));
  await this.runningPromise;  // Line 532 - waits for handler to finish
}

The handler runs and finishes (line 532), but tx.commit() at line 498 is not awaited - it returns a promise that's handled asynchronously.

Reactive Actions Processing

From scheduler.ts:538-565:

const order = topologicalSort(this.pending, this.dependencies);

for (const fn of order) {
  // ...
  await this.run(fn);  // Line 563
}

The reactive actions (lift recomputation) run immediately after the handler, each calling this.run() which also has the same pattern - tx.commit() is called but not awaited at line 250.

The Design Intent

Comments at scheduler.ts:244-249 explain:

"Commit the transaction. The code continues synchronously after kicking off the commit, i.e. it assumes the commit will be successful."

This is the correct design - the local nursery should be updated synchronously, allowing subsequent transactions to see the changes immediately. The async part is just remote confirmation.

The Question

If the nursery update is synchronous (which it is, at cache.ts:990), and reads check the nursery first (which they do, at cache.ts:1142), then why does TX-4 have the wrong expected version?

The conflict error shows TX-4 expected ba4jcaqmalkmpvecgzokt3rmw2ryxuyidyuoeuyaosrt3xtygk6c56tol but the actual was ba4jcaghsobeyzssva2kixwllrth3pocgb3uztbfkz2gqvjzr7i4yxmox, meaning TX-4 didn't see TX-3's update.

seefeldb · 2025-11-07T20:18:48Z

Deep Dive: Transaction Snapshot Capture

I've traced through the entire read/write flow to understand when the "expected version" is captured. Here's what happens:

Write Flow

When a transaction writes to a cell:

journal.ts:140 - branch.write(address, value)
chronicle.ts:139 - const state = this.load(address) - loads from replica
chronicle.ts:91 - return this.#replica.get(address) - reads from nursery first!
chronicle.ts:140 - const loaded = attest(state) - creates attestation with current cause
This attestation's cause becomes the "expected version" for the SQLite transaction

Key Insight

The expected version is captured when Chronicle.write() calls this.load(), which calls replica.get(). The replica DOES check the nursery first (cache.ts:1142).

The Timing Mystery

Given this flow, TX-4 (lift) should see TX-3's (handler) changes:

TX-3 handler commits → updates nursery at cache.ts:990 (synchronous)
TX-3 sends notification at cache.ts:992-997 (synchronous)
TX-4 gets triggered and runs
TX-4 creates Chronicle branch at journal.ts:164
TX-4 writes → loads from replica.get() → should see nursery update ✓

But the conflict shows TX-4 has the OLD expected version!

Hypothesis

The async mount() call at memory.ts:181 might be creating a timing window where both transactions race to SQLite with stale snapshots captured before the nursery updates propagate correctly.

Need to verify: Is there any caching or snapshot isolation at the replica level that could cause stale reads even though nursery is updated?

seefeldb · 2025-11-07T20:21:10Z

BREAKTHROUGH: Found the Root Cause!

Added detailed logging to trace when base versions are captured. The sequence reveals the actual problem:

The Sequence (with logging)

[NURSERY] Adding to nursery: ...Cause: ba4jcaghso...  ← TX-3 updates nursery
[TX-3] COMMIT                                          ← TX-3 notification sent
[WRITE] Capturing base: ba4jcaghso...                  ← TX-4 DOES see TX-3's update ✓
[NURSERY] Adding to nursery: ...Cause: ba4jcaqmalk... ← TX-4 adds its changes
[TX-4] COMMIT                                          ← TX-4 notification sent
[WRITE] Capturing base: ba4jcaghso...                  ← TX-4 writes AGAIN with OLD base!
[NURSERY] Adding to nursery: ...Cause: ba4jcabew6...  ← Another update
[TX-5] COMMIT
[REVERT] ConflictError                                 ← Conflict!

The Issue

TX-4 (the lift recomputation) writes to the cell TWICE, and the second write captures the SAME base version as the first!

The conflict error confirms this:

Expected: ba4jcaqmalk... (TX-4's first write - what it put in nursery)
Actual: ba4jcaghso... (TX-3's version - still in the DB)

So TX-4:

First write: reads base ba4jcaghso... (TX-3's version from nursery) ✓
Adds to nursery: ba4jcaqmalk... (TX-4's new version)
Second write: reads base ba4jcaghso... AGAIN (stale!)
Tries to commit second write expecting ba4jcaqmalk... but DB still has ba4jcaghso...

The Question

Why is TX-4 writing the same cell twice, and why doesn't the second write see the first write's changes in the nursery?

This suggests the Chronicle/transaction is not seeing its own previous writes within the same transaction, OR there's some caching issue where the branch doesn't reflect the novelty from the first write.

Need to investigate why the transaction writes twice and why the second write doesn't see the first.

seefeldb · 2025-11-07T20:23:32Z

ROOT CAUSE IDENTIFIED: Concurrent Remote Commits

After adding detailed logging, I found the exact issue. Here's the sequence with the actual causes:

The Actual Sequence

[Handler TX-3]
- Writes to nursery with cause: ba4jcaghso...  (TX-3's version)
- Notification sent → triggers TX-4

[Lift TX-4 - First Attempt]
- Reads base from nursery: ba4jcaghso... ✓ (correctly sees TX-3)
- Writes to nursery with cause: ba4jcaqmalk... (TX-4's version)
- Tries remote/SQLite commit → ConflictError!
- Changes REMOVED from nursery (cache.ts:1050)

[Lift TX-5 - Retry]
- Reads base from nursery: ba4jcaghso... (TX-4's changes were purged!)
- Tries to commit expecting ba4jcaqmalk... but DB has ba4jcaghso...
- ConflictError again!

The Root Problem

TX-3 and TX-4 are both trying to commit to SQLite concurrently, even though:

TX-3 adds to nursery synchronously ✓
TX-4 sees TX-3's nursery update ✓
TX-4 uses correct base version ✓

But then:
4. TX-3's remote.transact() is async and hasn't reached SQLite yet
5. TX-4's remote.transact() also starts, goes async
6. Both race to SQLite with overlapping version expectations
7. One wins, one gets ConflictError

Why the Nursery Update Isn't Enough

The nursery update at cache.ts:990 happens immediately, which is correct for optimistic reads.

But the remote/SQLite commit at cache.ts:1007-1011 happens asynchronously via await commitPromise.

The async mount() at memory.ts:181 creates a timing window where multiple transactions can reach the SQLite layer concurrently, all with overlapping version expectations.

The Fix Direction

The system needs to serialize remote commits, not just nursery updates. Possible approaches:

Add a commit queue to ensure SQLite transactions happen sequentially
Make the scheduler wait for remote commit before triggering reactive actions
Improve the conflict retry logic to update the base version from the conflict error's actual field

The current retry logic purges from nursery and retries with stale data, creating an infinite loop of conflicts.

seefeldb · 2025-11-07T20:25:55Z

ACTUAL ROOT CAUSE: Double-Write Bug in Lift Recomputation

You were right - it shouldn't conflict! After adding detailed logging to see what's being written, I found the real issue:

The Smoking Gun

[WRITE] path: ["value","internal","__#1","length"] Base: ba4jcaghso... Value: 0
[WRITE] path: ["value","internal","__#1","length"] Base: ba4jcaghso... Value: 0
                                                                    ^^^ DUPLICATE!

The lift recomputation is writing the same cell property twice in a single transaction, both with the same base version!

What's Happening

When the reactive action (TX-4) runs to recompute the lift:

First write to __#1.length: base=ba4jcaghso... (TX-3's version) ✓
Adds to novelty map
Second write to __#1.length: base=ba4jcaghso... (same base!)
rebase() applies the first write from novelty
Second write now expects its own first write: ba4jcaqmalk...
Commit to SQLite → expects ba4jcaqmalk... but DB has ba4jcaghso... → ConflictError!

The Real Bug

This is NOT a concurrency issue at all! It's a double-write bug in the reactive lift recomputation logic.

The lift result (__#1) is being written twice:

Line in logs: ["value","internal","__#1","length"] appears twice with identical base

This suggests the reactive system is processing the lift update twice, or there's a bug in how derived cell updates are written to the transaction.

Why Removing the Lift Fixes It

Without the lift, there's no __#1 derived cell, so no double-write occurs. The pattern works fine.

Next Steps

Need to investigate why the lift recomputation writes the derived cell result twice. This is likely in the runner's recipe execution or the lift implementation itself.

The conflict retry loop was a red herring - the real issue is that a single transaction is writing the same path twice, causing the second write to expect the first write to already be committed.

seefeldb · 2025-11-07T20:35:15Z

Root Cause Identified: Nursery and SQLite Out of Sync

I've added comprehensive logging to track the exact sequence of SQLite SWAP operations and nursery operations. The logs reveal the root cause:

The Sequence

Nursery operations (optimistic, immediate):

TX-2 commits to nursery: ba4jcagwa2ku...
TX-3 reads ba4jcagwa2ku... from nursery, commits: ba4jcaghsobeyzss...
TX-4 reads ba4jcaghsobeyzss... from nursery, commits: ba4jcaqmalkmpvec...

SQLite batch commit (happens later, asynchronously):

[SQLITE SWAP BEFORE] Expected cause: ba4jcapjcdszj... → updated=0 (failed)
[SQLITE SWAP BEFORE] Expected cause: ba4jcagwa2ku... → updated=1 (success)
[SQLITE SWAP BEFORE] Expected cause: ba4jcaghsobeyzss... → updated=1 (success)
[SQLITE SWAP BEFORE] Expected cause: ba4jcaqmalkmpvec... → updated=0 (CONFLICT!)
[SQLITE CONFLICT] Expected: ba4jcaqmalkmpvec... Actual in DB: ba4jcabew6pg72ji...

The Problem

When TX-4 tries to commit to SQLite, it expects the nursery version (ba4jcaqmalkmpvec...) to be in the database. However, SQLite contains a completely different version (ba4jcabew6pg72ji...) that was never in the nursery!

This means another transaction committed to SQLite between:

When TX-4 added its changes to the nursery
When TX-4 attempted to commit to SQLite

The nursery is supposed to be a staging area that mirrors what will be in SQLite, but they're getting out of sync. A transaction that's not in the nursery is making it into SQLite first.

Next Step

Need to identify:

Where does ba4jcabew6pg72ji... come from?
Why is it committing to SQLite without going through the nursery?
Is there a race condition where multiple SQLite transactions are running concurrently?

seefeldb · 2025-11-07T20:52:48Z

Critical Discovery: Mystery Transaction Not in Nursery

The logging reveals a critical synchronization bug:

Nursery Commits (Optimistic)

1. NURSERY PUT: ba4jcapjcdszj... (TX-1)
2. NURSERY PUT: ba4jcagwa2ku... (TX-2)
3. NURSERY PUT: ba4jcaghsobeyzss... (TX-3) ✓
4. NURSERY PUT: ba4jcaqmalkmpvec... (TX-4) ✓

SQLite Commits (Asynchronous)

1. SQLITE COMMIT START: Changes 2 (TX-1?)
2. SQLITE COMMIT START: Changes 1 (TX-2?)
3. SQLITE COMMIT START: Changes 1 (??? MYSTERY ???) 
   - Expects: ba4jcaghsobeyzss... (TX-3's version)
   - Produces: ba4jcabew6pg72ji...
   - SUCCESS! ✓
4. SQLITE COMMIT START: Changes 1 (TX-4)
   - Expects: ba4jcaqmalkmpvec... (nursery version)
   - Finds: ba4jcabew6pg72ji... (mystery version)
   - CONFLICT! ✗

The Bug

There are 4 nursery commits but at least one additional SQLite transaction that never went through the nursery! The mystery transaction:

Reads TX-3's version from SQLite (not from nursery)
Commits to SQLite successfully
Only AFTER TX-4 conflicts does ba4jcabew6pg72ji... get added to the nursery

This breaks the fundamental invariant: The nursery should contain all pending changes before they hit SQLite. Instead, a transaction is bypassing the nursery and committing directly to SQLite, causing TX-4 to conflict.

Hypothesis

The mystery transaction is likely triggered by:

A reactive computation (lift) that fires when TX-3 commits
A retry of TX-2 after it sees TX-3's changes
Some other observer/subscription mechanism

The key issue is that this transaction is reading directly from SQLite (finding TX-3's committed version) rather than reading from the nursery (which already has TX-4's pending changes).

seefeldb · 2025-11-07T21:05:13Z

Final Root Cause: Reactive Computation Bypassing Nursery

After comprehensive logging, I've identified the exact issue:

The Sequence

Handler Execution (Lines in Chronicle):

TX-3 writes to ["value","argument","items","length"] and ["value","internal","__#0"]
- Commits to nursery: ba4jcaghsobeyzss...
TX-4 writes to ["value","internal","__#1","length"]
- Commits to nursery: ba4jcaqmalkmpvec...

Async SQLite Batch (4 separate transactions):

TX-1 attempts commit → fails (stale base)
TX-2 commits → SUCCESS, writes ba4jcaghsobeyzss... (TX-3's version)
MYSTERY TX commits → SUCCESS, writes ba4jcabew6pg72ji...
- Expects: ba4jcaghsobeyzss... (TX-3's committed version from SQLite)
- Never appeared in nursery logs!
TX-4 attempts commit → CONFLICT
- Expects: ba4jcaqmalkmpvec... (from nursery)
- Finds: ba4jcabew6pg72ji... (mystery version)

The Bug

The mystery transaction:

Reads TX-3's version directly from SQLite (not from nursery)
Commits to SQLite successfully
Only gets added to nursery AFTER TX-4 conflicts

This breaks the invariant that all transactions must read from and write to the nursery before committing to SQLite.

Hypothesis

The mystery transaction is most likely the lift (reactive computation) that's watching the items cell. When TX-3 commits to SQLite and changes items.length to 0, the lift:

Gets triggered by the SQLite commit notification
Re-computes its value
Creates a new transaction that reads from SQLite (finding TX-3's version)
Bypasses the nursery check
Commits to SQLite before TX-4

The lift should either:

Read from the nursery to see TX-4's pending changes, OR
Wait for all pending nursery transactions to commit before executing

This explains why the conflict only happens with the lift - without it, there's no reactive computation creating the mystery transaction.

seefeldb · 2025-11-07T21:10:42Z

Mystery Transaction Solved: Server Re-computation Produces Different Cause

After adding comprehensive logging across the client-server boundary, I've identified the exact issue:

Complete Transaction Flow

Client sends 4 transactions:

TX-1: ba4jcapjcdszj...
TX-2: ba4jcagwa2ku...
TX-3: ba4jcaghsobeyzss... ← Handler writes items.length=0
TX-4: ba4jcaqmalkmpvec... ← Lift writes internal.__#1.length=0

Server processes all 4:

SQLite commit Update YYYY-MM-DD-TEMPLATE.md #1: TX-1 (2 changes)
SQLite commit Update YYYY-MM-DD-TEMPLATE.md #2: TX-2 → produces ba4jcaghsobeyzss...
SQLite commit feat: Add demo packages and some docs #3: TX-3 → produces ba4jcabew6pg72ji... (DIFFERENT!)
SQLite commit Just call it RFCs #4: TX-4 → expects ba4jcaqmalkmpvec..., finds ba4jcabew6pg72ji... → CONFLICT

The Root Cause

The client computes TX-3's output as ba4jcaghsobeyzss... based on its local state and optimistic nursery. However, when the server processes the same transaction changes, it re-computes and produces ba4jcabew6pg72ji....

This happens because:

The client sends changes (diffs), not the full computed state
The server applies these changes on top of its SQLite state
The server's computation produces a different cause hash than the client expected

Then TX-4 conflicts because:

Client expects to build on ba4jcaqmalkmpvec... (which was based on local ba4jcaghsobeyzss...)
Server has ba4jcabew6pg72ji... (from its re-computation of TX-3)

Why This Matters

This isn't a "mystery extra transaction" - it's non-deterministic transaction processing. The same transaction inputs produce different cause hashes on client vs server, breaking optimistic concurrency.

The issue is likely:

Different ordering of object keys during serialization
Different timestamps or auto-generated values
Reactive computations (lifts) running on the server that don't run on the client

Next step: Determine why TX-3's server-side processing produces a different cause hash than client-side.

seefeldb · 2025-11-07T21:17:40Z

Root Cause Found: Client Sends Different Base Than What It Refers To

After logging the exact values used to compute cause hashes on both sides, I found the smoking gun:

Client TX-3

Fact being referred (base):

"argument": {
  "items": [{"id": "test"}]  ← Base has test item
}

New value being asserted:

"argument": {
  "items": []  ← New value is empty
}

Computed cause: ba4jcaghsobeyzss... ← Hash of the BASE fact

Server TX-3

Assertion received:

"argument": {
  "items": []  ← Only receives the new value
}

Computed fact: ba4jcabew6pg72ji... ← Hash of the NEW value (different!)

The Problem

The client computes its cause hash by calling refer(factToRefer) where factToRefer.is contains the BASE state (items: [{id: "test"}]).

But the server receives only the NEW state (items: []) in source.assert.is, and computes refer(source.assert) which hashes the NEW value, not the base.

This is the mismatch:

Client cause = hash(base fact with items=[{id:"test"}])
Server fact = hash(new assertion with items=[])

The protocol appears to assume the client sends both:

The assertion's new value
The base fact it's building on (via the cause reference)

But somewhere in the transmission, the server is using source.assert to compute the fact hash instead of looking up what the cause reference points to.

This explains why the mystery transaction ba4jcabew6pg72ji... appears - it's not a mystery, it's just the server computing the wrong hash because it's hashing the assertion instead of the base!

seefeldb · 2025-11-07T21:23:19Z

Critical Finding: `refer()` Produces Different Hashes for Identical Assertions

After exhaustive logging, I've found a shocking result: the client and server are calling refer() on EXACTLY the same assertion structure, but getting different merkle hashes!

The Evidence

Client creates assertion:

{
  "the": "application/json",
  "of": "of:baedreibrjlfr2bgdvpup3ej36yp7v6bxwxzlymnk6gco2lyo3g6rsh6y74",
  "cause": {"/": "baedreiay6jyetdgkkudjjc6znogm7n5yiyhotgmevlhi2cvhgh5dtc5r24"},
  "is": {"value": {"items": [], "__#0": null, "__#1": [{}], ...}}
}

→ refer(assertion) returns: ba4jcaghsobeyzss...

Server receives assertion:

{
  "the": "application/json",
  "of": "of:baedreibrjlfr2bgdvpup3ej36yp7v6bxwxzlymnk6gco2lyo3g6rsh6y74",  
  "cause": {"/": "baedreiay6jyetdgkkudjjc6znogm7n5yiyhotgmevlhi2cvhgh5dtc5r24"},
  "is": {"value": {"items": [], "__#0": null, "__#1": [{}], ...}}
}

→ refer(assertion) returns: ba4jcabew6pg72ji... (DIFFERENT!)

The Problem

These are byte-for-byte identical when stringified, but refer() produces different hashes. This breaks the entire merkle-reference system - the same content should ALWAYS produce the same hash.

Possible causes:

Object key ordering: JSON.stringify may order keys differently even if the objects are "equal"
Hidden properties: The objects may have different prototypes or non-enumerable properties
Bug in merkle-reference: The refer() function itself may be non-deterministic
Different serialization: Client and server may use different serialization before hashing

This is the root cause of ALL the conflicts - not a transaction ordering issue, but a fundamental non-determinism in how facts are hashed!

seefeldb · 2025-11-07T21:28:51Z

Update: Different Import Paths for `refer()`

I found that client and server import refer() from different locations:

Client (packages/runner/src/storage/transaction/chronicle.ts:32):

import { refer } from "merkle-reference";

Server (packages/memory/space.ts:9):

import { refer } from "./reference.ts";

However, packages/memory/reference.ts just re-exports from merkle-reference:

import * as Reference from "merkle-reference";
export * from "merkle-reference";

So they should be the same function. But I need to verify they're actually resolving to the same version and there isn't some subtle difference in how the module is being loaded or if there are different versions in play.

The core issue remains: when the server receives TX-3 and calls refer(assertion), it produces hash ba4jcabew6pg72ji..., but this doesn't match what the client expects. Need to trace where the client's expected hash comes from and why there's a mismatch.

seefeldb · 2025-11-07T21:57:30Z

Root Cause Found!

I've identified the exact cause of the ConflictError:

The Problem

On the client side, the assertion contains internal.__#0 = NaN
On the server side, after deserialization, it becomes internal.__#0 = null

Evidence

CLIENT:
Raw value: NaN
typeof: number
JSON.stringify: null

SERVER:
Raw value: null
typeof: object  
JSON.stringify: null

Even though both JSON.stringify() to "null", they are fundamentally different values:

NaN is a number type
null is an object type

When these objects are passed to refer() for merkle hashing, they produce different content hashes:

Client hash: ba4jcaqmalkmpvecgzokt3rmw2ryxuyidyuoeuyaosrt3xtygk6c56tol
Server hash: ba4jcabew6pg72jigsqiib6dqchn6h3q7fxv7btn7tbmedaetcmmc6ukv

This causes the optimistic locking conflict because the server receives an assertion with a different hash than it computed.

Next Steps

Need to find where NaN is being set as a value in the pattern execution. This appears to be related to the lift() function creating intermediate computation results with uninitialized numeric fields.

seefeldb · 2025-11-07T22:03:56Z

Update: Narrowed down to `Number(undefined)`

I've traced the issue further:

The Conversion

Number(undefined) returns NaN
JSON.stringify(NaN) returns "null"
JSON.parse("null") returns null (object type)

Evidence

The logs show that null is being written:

[WRITE] ... path: ["value","internal","__#0"] ... Value: null...

But during rebase, it's NaN:

change.address: { schema: { type: "number" } }
change.value: NaN (typeof: number)

The Pattern

The field internal.__#0 has schema { type: "number" } with no explicit default. Somewhere in the code, when initializing a field with type: "number" and no default value, Number(undefined) is being called, which produces NaN.

When this gets sent to the server:

Client creates assertion with NaN
JSON serialization converts NaN → "null"
Server deserializes "null" → null (object)
Different hashes → ConflictError

Next Step

Need to find where Number(undefined) or similar type coercion is happening for numeric schema fields without explicit defaults.

seefeldb · 2025-11-07T22:04:58Z

Summary of Investigation

Root Cause Identified

The ConflictError is caused by NaN (Not a Number) being created on the client side, which gets serialized to null and produces a different merkle hash than if null was used directly.

The Chain of Events

A field internal.__#0 exists with schema { type: "number" } but no explicit default value
Somewhere in the initialization code, this field gets set to Number(undefined) which produces NaN
When the assertion is created, it contains NaN
Client computes hash: ba4jcaqmalkmpvecgzokt3rmw2ryxuyidyuoeuyaosrt3xtygk6c56tol
During JSON serialization (for network transmission), NaN becomes null
Server deserializes and gets null (object type, not NaN)
Server computes hash: ba4jcabew6pg72jigsqiib6dqchn6h3q7fxv7btn7tbmedaetcmmc6ukv
Hashes don't match → ConflictError

Why It Only Happens With `lift()`

The lift() function creates intermediate computation results with internal state. The __#0 field appears to be part of this internal state and gets initialized incorrectly.

Reproduction

Confirmed in the minimal test case:

With lift: Conflicts occur
Without lift: No conflicts

Next Steps to Fix

Need to find where numeric fields without explicit defaults are being initialized with Number(undefined) instead of leaving them as undefined or using a proper default like 0 or null.

Potential locations to investigate:

CFC (Compile From Code) - where TypeScript types are converted to runtime schemas
Schema default value processing
Internal state initialization for lift/handler constructs

The fix will likely be to either:

Not call Number() on undefined values
Provide explicit defaults for numeric fields
Handle NaN specially in serialization (convert to null before hashing)

seefeldb · 2025-11-07T22:32:47Z

✅ Fix Implemented

Root Cause

The ConflictError was caused by NaN (Not a Number) being created on the client side:

A handler was reading a numeric cell that hadn't been initialized yet (undefined)
The code did undefined + 1 which produces NaN
When NaN is JSON serialized, it becomes null
Client computes hash with NaN → one hash
Server deserializes and gets null → different hash
Hashes don't match → ConflictError

Solution

Implemented two complementary fixes:

1. Normalize data before hashing (chronicle.ts:432-435)

// Normalize the value to handle NaN and other non-JSON values
// NaN gets serialized to null in JSON, so we normalize it here to ensure
// consistent hashing between client and server
const normalizedValue = JSON.parse(JSON.stringify(merged.value));

2. Provide sensible defaults for numeric types (schema.ts:479-490)

// If no explicit default but schema has a type, use type-appropriate default
// to prevent undefined from causing issues (like NaN from undefined + 1)
if (resolvedSchema.type === 'number' || resolvedSchema.type === 'integer') {
  const result = processDefaultValue(runtime, tx, { ...link, schema: resolvedSchema }, 0);
  seen.push([seenKey, result]);
  return result;
}

Results

✅ The minimal reproduction test now passes with 0 conflicts
✅ Both "with lift" and "without lift" cases work correctly
✅ Prevents future NaN issues by normalizing before hashing

The normalization approach ensures that even if user code creates NaN values, they'll be consistently handled as null on both client and server, preventing hash mismatches.

seefeldb · 2025-11-07T22:46:27Z

Investigation Summary: ConflictError Root Cause and Fix

Problem

The test was experiencing ConflictError exceptions when using lift() in handlers. The client and server were computing different merkle hashes for the same data, causing optimistic locking conflicts.

Root Cause

The issue was caused by NaN (Not a Number) values being serialized inconsistently:

How NaN was created: context.sequence.get() returned undefined, and undefined + 1 = NaN
Serialization mismatch:
- Client: NaN → JSON.stringify(NaN) → "null" → JSON.parse("null") → null (object type)
- This created different merkle hashes because the type changed from number to object

The Fix

Implemented a dual-pronged approach:

1. Value Normalization (packages/runner/src/storage/transaction/chronicle.ts:334)

// Normalize the value to handle NaN and other non-JSON values
// NaN gets serialized to null in JSON, so we normalize it here to ensure
// consistent hashing between client and server
const normalizedValue = JSON.parse(JSON.stringify(merged.value));

2. Type-Appropriate Defaults (packages/runner/src/schema.ts:479-490)

// If no explicit default but schema has a type, use type-appropriate default
// to prevent undefined from causing issues (like NaN from undefined + 1)
if (resolvedSchema.type === 'number' || resolvedSchema.type === 'integer') {
  const result = processDefaultValue(
    runtime,
    tx,
    { ...link, schema: resolvedSchema },
    0,  // Return 0 instead of undefined
  );
  seen.push([seenKey, result]);
  return result;
}

Test Results

✅ All tests now pass with 0 conflicts

Updated test expectation from expecting conflicts to expecting none
Test validates that lift() with handlers no longer causes ConflictErrors

Files Modified

packages/runner/src/storage/transaction/chronicle.ts - Added normalization before hashing
packages/runner/src/schema.ts - Added default value logic for numeric types
packages/runner/src/builder/opaque-ref.ts - Attempted schema default enhancement
packages/runner/test/conflict-repro.test.ts - Updated test expectations

All debug logging has been removed and the codebase is clean.

seefeldb · 2025-11-07T22:47:36Z

✅ Fix has been committed and pushed in 7b4f036

The ConflictError issue is now resolved. All tests pass with 0 conflicts.

…once we fixed those.

This test reproduces the minimal conflict scenario and uses storage subscriptions to capture ConflictError events without modifying the runtime. Key findings: - Conflicts occur when a handler updates a cell that has a lift dependency - The lift recomputation races with the handler's transaction commit - Without the lift, no conflicts occur (verified by second test) The root cause is that both event handlers and reactive actions commit their transactions asynchronously (tx.commit() is not awaited), allowing them to run in parallel and conflict on the same recipe result cell. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

This commit fixes ConflictErrors that occurred when using lift() in handlers with numeric cell operations. The root cause was NaN values being serialized inconsistently between client and server, leading to different merkle hashes. Root Cause: - context.sequence.get() returned undefined - undefined + 1 = NaN - JSON.stringify(NaN) -> "null" -> JSON.parse("null") -> null (different type) - This caused client and server to compute different merkle hashes The Fix: 1. Added value normalization before hashing in chronicle.ts to ensure NaN is consistently serialized as null 2. Added type-appropriate defaults in schema.ts to return 0 for numeric types instead of undefined, preventing NaN creation 3. Enhanced opaque-ref.ts to set schema defaults from initial values Test Changes: - Updated conflict-repro.test.ts to expect 0 conflicts (was expecting >0) - Removed all debug logging added during investigation - Test now validates that lift() with handlers works without conflicts All tests pass: 33 passed | 0 failed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…e are gone

seefeldb requested review from ellyxir and ubik2 November 7, 2025 19:47

seefeldb temporarily deployed to ci November 7, 2025 19:53 — with GitHub Actions Inactive

cubic-dev-ai bot reviewed Nov 7, 2025

View reviewed changes

seefeldb temporarily deployed to ci November 7, 2025 20:02 — with GitHub Actions Inactive

cubic-dev-ai bot reviewed Nov 7, 2025

View reviewed changes

seefeldb force-pushed the test/minimal-conflict-repro branch from 85b9a09 to de723ed Compare November 7, 2025 20:16

seefeldb temporarily deployed to ci November 7, 2025 20:20 — with GitHub Actions Inactive

seefeldb and others added 8 commits November 7, 2025 15:01

test: minimal repro that generates conflicts

dbd1b1e

rename

d71a96e

a simple runner test that also repros the conflict

75e8dcc

now the test actually expects conflicts. should be changed of course …

8954ccb

…once we fixed those.

fmt

f185836

undo extraneous change to schema.ts

b935d57

seefeldb force-pushed the test/minimal-conflict-repro branch from 11cb7c9 to b935d57 Compare November 7, 2025 23:02

add boundary condition here as well, as tests becomes invalid if thos…

d3a0121

…e are gone

seefeldb temporarily deployed to ci November 7, 2025 23:10 — with GitHub Actions Inactive

seefeldb merged commit f686009 into main Nov 7, 2025
8 checks passed

seefeldb deleted the test/minimal-conflict-repro branch November 7, 2025 23:16

test: minimal repro that generates conflicts #2031

test: minimal repro that generates conflicts #2031

Uh oh!

Conversation

seefeldb commented Nov 7, 2025 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by cubic

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seefeldb commented Nov 7, 2025

Investigation Update: Root Cause Analysis

The Sequence

The Root Issue

Key Question

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

seefeldb commented Nov 7, 2025

Test Infrastructure Created

How It Works

Two Tests

Note on Timing

Uh oh!

seefeldb commented Nov 7, 2025

Scheduler Analysis

Event Queue Processing (Handler Invocation)

Reactive Actions Processing

The Design Intent

The Question

Uh oh!

seefeldb commented Nov 7, 2025

Deep Dive: Transaction Snapshot Capture

Write Flow

Key Insight

The Timing Mystery

Hypothesis

Uh oh!

seefeldb commented Nov 7, 2025

BREAKTHROUGH: Found the Root Cause!

The Sequence (with logging)

The Issue

The Question

Uh oh!

seefeldb commented Nov 7, 2025

ROOT CAUSE IDENTIFIED: Concurrent Remote Commits

The Actual Sequence

The Root Problem

Why the Nursery Update Isn't Enough

The Fix Direction

Uh oh!

seefeldb commented Nov 7, 2025

ACTUAL ROOT CAUSE: Double-Write Bug in Lift Recomputation

The Smoking Gun

What's Happening

The Real Bug

Why Removing the Lift Fixes It

Next Steps

Uh oh!

seefeldb commented Nov 7, 2025

Root Cause Identified: Nursery and SQLite Out of Sync

The Sequence

The Problem

Next Step

Uh oh!

seefeldb commented Nov 7, 2025

Critical Discovery: Mystery Transaction Not in Nursery

Nursery Commits (Optimistic)

SQLite Commits (Asynchronous)

The Bug

Hypothesis

seefeldb commented Nov 7, 2025 •

edited by cubic-dev-ai bot

Loading

cubic-dev-ai bot Nov 7, 2025 •

edited

Loading

Critical Finding: `refer()` Produces Different Hashes for Identical Assertions

Update: Different Import Paths for `refer()`

Update: Narrowed down to `Number(undefined)`

Why It Only Happens With `lift()`