-
Notifications
You must be signed in to change notification settings - Fork 10
test: minimal repro that generates conflicts #2031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 2 files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed changes from recent commits (found 1 issue).
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="packages/runner/test/conflict-repro.test.ts">
<violation number="1" location="packages/runner/test/conflict-repro.test.ts:112">
Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
| const result = runtime.run(tx, conflictRepro, { | ||
| items: [{ id: "test" }], | ||
| }, resultCell); | ||
| tx.commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.
Prompt for AI agents
Address the following comment on packages/runner/test/conflict-repro.test.ts at line 112:
<comment>Await the transaction commit so the test does not continue before the asynchronous operation completes, avoiding race conditions.</comment>
<file context>
@@ -0,0 +1,130 @@
+ const result = runtime.run(tx, conflictRepro, {
+ items: [{ id: "test" }],
+ }, resultCell);
+ tx.commit();
+
+ await runtime.idle();
</file context>
| tx.commit(); | |
| await tx.commit(); |
Investigation Update: Root Cause AnalysisI've traced through the conflict scenario and found the issue. Here's what's happening: The Sequence
The Root IssueThe conflict happens because both transactions commit to SQLite asynchronously and in parallel, racing with the same expected version. The design assumes:
But what's actually happening is:
Key QuestionWhen TX-4 (lift) reads the recipe result cell to build its transaction, the Replica.get() method should check nursery first (line 1142), then heap. If it's reading from the nursery correctly, it should see TX-3's V2 and use that as the expected version, avoiding the conflict. The conflict indicates TX-4 is reading an older version (V1), which suggests either:
Need to trace exactly when the transaction snapshot is captured vs when the nursery is updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 1 file
Test Infrastructure CreatedI've created a test that captures ConflictErrors without modifying the runtime using the storage subscription mechanism: Location: packages/runner/test/conflict-repro.test.ts How It WorksThe test subscribes to storage notifications to capture "revert" events with ConflictError: storageManager.subscribe({
next: (notification: StorageNotification) => {
if (notification.type === "revert" && notification.reason.name === "ConflictError") {
conflictErrors.push(notification.reason);
}
return undefined;
},
});Two Tests
This confirms the conflict only occurs when a reactive dependency (lift) exists on a cell being updated by a handler. Note on TimingThe conflict happens asynchronously after await new Promise((resolve) => setTimeout(resolve, 100));This is because the ConflictError occurs during the remote transaction commit, not during the local nursery update. |
Scheduler AnalysisLooking at the scheduler code, the execution sequence is: Event Queue Processing (Handler Invocation)From scheduler.ts:487-536: const event = this.eventQueue.shift();
if (event) {
const finalize = (error?: unknown) => {
// ...
tx.commit().then(({ error }) => { // Line 498 - NOT awaited!
if (error && retriesLeft > 0) {
this.eventQueue.unshift({ action, retriesLeft: retriesLeft - 1, onCommit });
this.queueExecution();
}
});
};
this.runningPromise = Promise.resolve(
this.runtime.harness.invoke(() => action(tx))
).then(() => finalize()).catch((error) => finalize(error));
await this.runningPromise; // Line 532 - waits for handler to finish
}The handler runs and finishes (line 532), but Reactive Actions ProcessingFrom scheduler.ts:538-565: const order = topologicalSort(this.pending, this.dependencies);
for (const fn of order) {
// ...
await this.run(fn); // Line 563
}The reactive actions (lift recomputation) run immediately after the handler, each calling The Design IntentComments at scheduler.ts:244-249 explain:
This is the correct design - the local nursery should be updated synchronously, allowing subsequent transactions to see the changes immediately. The async part is just remote confirmation. The QuestionIf the nursery update is synchronous (which it is, at cache.ts:990), and reads check the nursery first (which they do, at cache.ts:1142), then why does TX-4 have the wrong expected version? The conflict error shows TX-4 expected |
85b9a09 to
de723ed
Compare
Deep Dive: Transaction Snapshot CaptureI've traced through the entire read/write flow to understand when the "expected version" is captured. Here's what happens: Write FlowWhen a transaction writes to a cell:
Key InsightThe expected version is captured when The Timing MysteryGiven this flow, TX-4 (lift) should see TX-3's (handler) changes:
But the conflict shows TX-4 has the OLD expected version! HypothesisThe async Need to verify: Is there any caching or snapshot isolation at the replica level that could cause stale reads even though nursery is updated? |
BREAKTHROUGH: Found the Root Cause!Added detailed logging to trace when base versions are captured. The sequence reveals the actual problem: The Sequence (with logging)The IssueTX-4 (the lift recomputation) writes to the cell TWICE, and the second write captures the SAME base version as the first! The conflict error confirms this:
So TX-4:
The QuestionWhy is TX-4 writing the same cell twice, and why doesn't the second write see the first write's changes in the nursery? This suggests the Chronicle/transaction is not seeing its own previous writes within the same transaction, OR there's some caching issue where the branch doesn't reflect the novelty from the first write. Need to investigate why the transaction writes twice and why the second write doesn't see the first. |
ROOT CAUSE IDENTIFIED: Concurrent Remote CommitsAfter adding detailed logging, I found the exact issue. Here's the sequence with the actual causes: The Actual SequenceThe Root ProblemTX-3 and TX-4 are both trying to commit to SQLite concurrently, even though:
But then: Why the Nursery Update Isn't EnoughThe nursery update at cache.ts:990 happens immediately, which is correct for optimistic reads. But the remote/SQLite commit at cache.ts:1007-1011 happens asynchronously via The async The Fix DirectionThe system needs to serialize remote commits, not just nursery updates. Possible approaches:
The current retry logic purges from nursery and retries with stale data, creating an infinite loop of conflicts. |
ACTUAL ROOT CAUSE: Double-Write Bug in Lift RecomputationYou were right - it shouldn't conflict! After adding detailed logging to see what's being written, I found the real issue: The Smoking GunThe lift recomputation is writing the same cell property twice in a single transaction, both with the same base version! What's HappeningWhen the reactive action (TX-4) runs to recompute the lift:
The Real BugThis is NOT a concurrency issue at all! It's a double-write bug in the reactive lift recomputation logic. The lift result (
This suggests the reactive system is processing the lift update twice, or there's a bug in how derived cell updates are written to the transaction. Why Removing the Lift Fixes ItWithout the lift, there's no Next StepsNeed to investigate why the lift recomputation writes the derived cell result twice. This is likely in the runner's recipe execution or the lift implementation itself. The conflict retry loop was a red herring - the real issue is that a single transaction is writing the same path twice, causing the second write to expect the first write to already be committed. |
Root Cause Identified: Nursery and SQLite Out of SyncI've added comprehensive logging to track the exact sequence of SQLite SWAP operations and nursery operations. The logs reveal the root cause: The SequenceNursery operations (optimistic, immediate):
SQLite batch commit (happens later, asynchronously): The ProblemWhen TX-4 tries to commit to SQLite, it expects the nursery version ( This means another transaction committed to SQLite between:
The nursery is supposed to be a staging area that mirrors what will be in SQLite, but they're getting out of sync. A transaction that's not in the nursery is making it into SQLite first. Next StepNeed to identify:
|
Critical Discovery: Mystery Transaction Not in NurseryThe logging reveals a critical synchronization bug: Nursery Commits (Optimistic)SQLite Commits (Asynchronous)The BugThere are 4 nursery commits but at least one additional SQLite transaction that never went through the nursery! The mystery transaction:
This breaks the fundamental invariant: The nursery should contain all pending changes before they hit SQLite. Instead, a transaction is bypassing the nursery and committing directly to SQLite, causing TX-4 to conflict. HypothesisThe mystery transaction is likely triggered by:
The key issue is that this transaction is reading directly from SQLite (finding TX-3's committed version) rather than reading from the nursery (which already has TX-4's pending changes). |
Final Root Cause: Reactive Computation Bypassing NurseryAfter comprehensive logging, I've identified the exact issue: The SequenceHandler Execution (Lines in Chronicle):
Async SQLite Batch (4 separate transactions):
The BugThe mystery transaction:
This breaks the invariant that all transactions must read from and write to the nursery before committing to SQLite. HypothesisThe mystery transaction is most likely the lift (reactive computation) that's watching the
The lift should either:
This explains why the conflict only happens with the lift - without it, there's no reactive computation creating the mystery transaction. |
Mystery Transaction Solved: Server Re-computation Produces Different CauseAfter adding comprehensive logging across the client-server boundary, I've identified the exact issue: Complete Transaction FlowClient sends 4 transactions:
Server processes all 4:
The Root CauseThe client computes TX-3's output as This happens because:
Then TX-4 conflicts because:
Why This MattersThis isn't a "mystery extra transaction" - it's non-deterministic transaction processing. The same transaction inputs produce different cause hashes on client vs server, breaking optimistic concurrency. The issue is likely:
Next step: Determine why TX-3's server-side processing produces a different cause hash than client-side. |
Root Cause Found: Client Sends Different Base Than What It Refers ToAfter logging the exact values used to compute cause hashes on both sides, I found the smoking gun: Client TX-3Fact being referred (base): "argument": {
"items": [{"id": "test"}] ← Base has test item
}New value being asserted: "argument": {
"items": [] ← New value is empty
}Computed cause: Server TX-3Assertion received: "argument": {
"items": [] ← Only receives the new value
}Computed fact: The ProblemThe client computes its cause hash by calling But the server receives only the NEW state ( This is the mismatch:
The protocol appears to assume the client sends both:
But somewhere in the transmission, the server is using This explains why the mystery transaction |
Critical Finding:
|
Update: Different Import Paths for
|
Root Cause Found!I've identified the exact cause of the ConflictError: The ProblemOn the client side, the assertion contains EvidenceEven though both
When these objects are passed to
This causes the optimistic locking conflict because the server receives an assertion with a different hash than it computed. Next StepsNeed to find where |
Update: Narrowed down to
|
Summary of InvestigationRoot Cause IdentifiedThe ConflictError is caused by The Chain of Events
Why It Only Happens With
|
✅ Fix ImplementedRoot CauseThe ConflictError was caused by
SolutionImplemented two complementary fixes: 1. Normalize data before hashing (chronicle.ts:432-435) // Normalize the value to handle NaN and other non-JSON values
// NaN gets serialized to null in JSON, so we normalize it here to ensure
// consistent hashing between client and server
const normalizedValue = JSON.parse(JSON.stringify(merged.value));2. Provide sensible defaults for numeric types (schema.ts:479-490) // If no explicit default but schema has a type, use type-appropriate default
// to prevent undefined from causing issues (like NaN from undefined + 1)
if (resolvedSchema.type === 'number' || resolvedSchema.type === 'integer') {
const result = processDefaultValue(runtime, tx, { ...link, schema: resolvedSchema }, 0);
seen.push([seenKey, result]);
return result;
}Results
The normalization approach ensures that even if user code creates |
Investigation Summary: ConflictError Root Cause and FixProblemThe test was experiencing Root CauseThe issue was caused by
The FixImplemented a dual-pronged approach: 1. Value Normalization (packages/runner/src/storage/transaction/chronicle.ts:334)// Normalize the value to handle NaN and other non-JSON values
// NaN gets serialized to null in JSON, so we normalize it here to ensure
// consistent hashing between client and server
const normalizedValue = JSON.parse(JSON.stringify(merged.value));2. Type-Appropriate Defaults (packages/runner/src/schema.ts:479-490)// If no explicit default but schema has a type, use type-appropriate default
// to prevent undefined from causing issues (like NaN from undefined + 1)
if (resolvedSchema.type === 'number' || resolvedSchema.type === 'integer') {
const result = processDefaultValue(
runtime,
tx,
{ ...link, schema: resolvedSchema },
0, // Return 0 instead of undefined
);
seen.push([seenKey, result]);
return result;
}Test Results✅ All tests now pass with 0 conflicts
Files Modified
All debug logging has been removed and the codebase is clean. |
|
✅ Fix has been committed and pushed in 7b4f036 The ConflictError issue is now resolved. All tests pass with 0 conflicts. |
…once we fixed those.
This test reproduces the minimal conflict scenario and uses storage subscriptions to capture ConflictError events without modifying the runtime. Key findings: - Conflicts occur when a handler updates a cell that has a lift dependency - The lift recomputation races with the handler's transaction commit - Without the lift, no conflicts occur (verified by second test) The root cause is that both event handlers and reactive actions commit their transactions asynchronously (tx.commit() is not awaited), allowing them to run in parallel and conflict on the same recipe result cell. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes ConflictErrors that occurred when using lift() in handlers
with numeric cell operations. The root cause was NaN values being serialized
inconsistently between client and server, leading to different merkle hashes.
Root Cause:
- context.sequence.get() returned undefined
- undefined + 1 = NaN
- JSON.stringify(NaN) -> "null" -> JSON.parse("null") -> null (different type)
- This caused client and server to compute different merkle hashes
The Fix:
1. Added value normalization before hashing in chronicle.ts to ensure NaN
is consistently serialized as null
2. Added type-appropriate defaults in schema.ts to return 0 for numeric
types instead of undefined, preventing NaN creation
3. Enhanced opaque-ref.ts to set schema defaults from initial values
Test Changes:
- Updated conflict-repro.test.ts to expect 0 conflicts (was expecting >0)
- Removed all debug logging added during investigation
- Test now validates that lift() with handlers works without conflicts
All tests pass: 33 passed | 0 failed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
11cb7c9 to
b935d57
Compare
This reliable generates these conflicts for me. Note that this is using the in-memory sqlite test setup, so it's definitely single-player!
added a runner test as well, same effect. should make it easier to debug:
deno test -A test/conflict-repro.test.tsin packages/runnerSummary by cubic
Fixed a ConflictError when a lifted map depends on items and a handler clears items and increments a sequence. Normalized values before hashing and added a minimal repro with tests to confirm no conflicts.
Written for commit d3a0121. Summary will update automatically on new commits.