
refactor the reconciliation process to be simpler and more maintainable #634

@samwillis

Description

Reconciliation Refactor

The current reconciliation pipeline has accumulated complexity after several rounds of fixes for truncate behaviour, optimistic rebasing, and duplicate-suppression bugs. Maintaining it has become risky: much of the flow is synchronous/CPU-bound but still entwined with batching flags and per-key delete replays for truncates, making the code brittle even without heavy IO. This plan aligns the system around a simpler contract—explicit truncate events, a single optimistic rebase path, and a shared change stream—so we can reason about the data flow, reduce footguns, and unblock future improvements.

1. Goals & Scope

  • Reduce reconciliation complexity while preserving correctness guarantees: no missing/phantom events, consistent indexes, stable lifecycle state.
  • Unify the change stream for live queries and index maintenance, adding explicit truncate semantics.
  • Keep optimistic UI responsive with a predictable, easy-to-audit rebasing routine.
  • Make the observable behaviour for insert/update/delete compatible with today’s tests, while documenting any intentional relaxations (e.g. allowing extra update noise when values truly change).

Out of scope: rewriting transaction orchestration, changing persistence APIs, or altering subscription filtering beyond what the new event contract requires.

2. Change-Event Contract

  1. Extend ChangeMessage.type to insert | update | delete | truncate.
    • truncate has shape { type: 'truncate', metadata? } with no key, value, or previousValue.
    • insert/update/delete retain their current fields.
  2. CollectionChangesManager.emitEvents and CollectionSubscription.filterAndFlipChanges treat truncate as a stream-wide reset:
    • emit it immediately (bypass deferral—don't wait for pending sync transactions),
    • skip key-based flipping logic,
    • allow subsequent inserts in the same batch (from the same commit),
    • expect subscriptions to clear caches and live queries to resnapshot on receiving truncate.
  3. Indexes gain a reset() entry point so a single truncate clears their internal state before replaying subsequent inserts.
  4. Live-query consumers clear their local caches and resnapshot on truncate rather than depending on synthetic delete batches. Add regression tests for this behaviour.
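
A minimal TypeScript sketch of the extended contract described above. The type and helper names here are illustrative, not the actual exports; only the fields named in the issue (`type`, `key`, `value`, `previousValue`, `metadata`) are taken from the plan:

```typescript
// Row-level changes keep their current fields.
type RowChange<TKey, TValue> = {
  type: "insert" | "update" | "delete";
  key: TKey;
  value: TValue;
  previousValue?: TValue; // present for update/delete
  metadata?: unknown;
};

// Truncate is a stream-wide reset: no key, value, or previousValue.
type TruncateChange = {
  type: "truncate";
  metadata?: unknown;
};

type ChangeMessage<TKey, TValue> = RowChange<TKey, TValue> | TruncateChange;

// Consumers narrow on `type` to decide whether to clear caches and resnapshot.
function isStreamReset<TKey, TValue>(
  msg: ChangeMessage<TKey, TValue>
): msg is TruncateChange {
  return msg.type === "truncate";
}
```

The discriminated union keeps truncate handling explicit at every consumer: any switch over `type` that forgets the new variant fails to type-check under exhaustiveness checking.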

3. Core State & Simplifications

  • Retain syncedData, syncedMetadata, transactions, optimisticUpserts, optimisticDeletes, and size.
  • Drop persistent recentlySyncedKeys/lastSyncContext. Callers pass syncedKeys into rebaseOptimistic; for sync commits this set equals the keys present in syncedDiff.
  • During a sync commit build syncedDiff: Map<TKey, { before?: TOutput; after?: TOutput }> while applying server operations. This map drives event generation, duplicate suppression, and provides syncedKeys for the optimistic pass.
  • Capture visibleBeforeByKey per commit (by calling this.get(key) before applying changes). No long-lived map required.
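
The per-commit diff capture could be sketched as below. `recordDiff` is a hypothetical helper, and the keep-first-`before` behaviour is an assumption inferred from the plan (`before` should reflect pre-commit visibility even if a key is touched twice in one commit):

```typescript
type Diff<T> = { before?: T; after?: T };

// Record one server operation's effect in syncedDiff, using the user-visible
// value captured (via this.get(key)) before the commit started mutating state.
function recordDiff<TKey, T>(
  syncedDiff: Map<TKey, Diff<T>>,
  visibleBeforeByKey: Map<TKey, T | undefined>,
  key: TKey,
  after: T | undefined
): void {
  const existing = syncedDiff.get(key);
  // Assumption: when the same key is touched twice in a commit, keep the
  // first captured `before` so the pair spans the whole commit.
  const before = existing ? existing.before : visibleBeforeByKey.get(key);
  syncedDiff.set(key, { before, after });
}
```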

4. High-Level Flow

4.1 Local transaction changes

  1. Transaction mutates → onTransactionStateChange fires.
  2. shouldBatchEvents = pendingSyncedTransactions.length > 0 (unchanged heuristic).
  3. Invoke rebaseOptimistic({ reason: 'local-change', truncate: false, syncedKeys: emptySet, syncedDiff: null }).
  4. Using the returned events: update size, call indexes.updateIndexes(events), then emit (forceEmit = shouldBatchEvents === false).
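
Steps 1–4 above could be sketched as follows. The dependency names (`rebaseOptimistic`, `updateIndexes`, `emit`) stand in for the real collaborators and are assumptions:

```typescript
type RebaseArgs = {
  reason: "local-change" | "sync-commit";
  truncate: boolean;
  syncedKeys: Set<string>;
  syncedDiff: Map<string, { before?: unknown; after?: unknown }> | null;
};

function onTransactionStateChange(deps: {
  pendingSyncedTransactions: unknown[];
  rebaseOptimistic(args: RebaseArgs): { events: unknown[] };
  updateIndexes(events: unknown[]): void;
  emit(events: unknown[], forceEmit: boolean): void;
}): void {
  // Batch while sync transactions are pending (unchanged heuristic).
  const shouldBatchEvents = deps.pendingSyncedTransactions.length > 0;
  const { events } = deps.rebaseOptimistic({
    reason: "local-change",
    truncate: false,
    syncedKeys: new Set(),
    syncedDiff: null,
  });
  deps.updateIndexes(events);
  deps.emit(events, !shouldBatchEvents); // forceEmit only when nothing pending
}
```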

4.2 Sync transaction commit

  1. Collect & capture
    • Collapse committed pending sync transactions into an ordered list of operations plus truncateSeen.
    • Build visibleBeforeByKey by calling this.get(key) for each touched key before mutating state.
  2. Apply authoritative changes
    • If truncateSeen, push a truncate event into syncEvents, clear syncedData, syncedMetadata, and syncedKeys, and mark that indexes must reset() before processing subsequent events.
    • For each operation:
      • Update syncedMetadata and syncedData (respecting rowUpdateMode).
      • Record { before, after } in syncedDiff using visibleBeforeByKey and the mutated authoritative value.
      • Add key to syncedKeys.
      • If key has an optimistic overlay (in optimisticDeletes or optimisticUpserts): skip creating a syncEvent. The rebase in phase 3 will handle this key since it's in syncedKeys and will generate the appropriate event for the user-visible change.
      • Otherwise push { type, key, value: type === 'delete' ? before : after, previousValue: before } into syncEvents.
  3. Reapply optimistic layer & emit once
    • Call rebaseOptimistic({ reason: 'sync-commit', truncate: truncateSeen, syncedKeys, syncedDiff }) to rebuild optimistic state and obtain optimisticEvents.
    • Concatenate syncEvents + optimisticEvents into a single ordered batch.
    • Use that batch to: (a) reset indexes if needed, (b) call indexes.updateIndexes(batch), (c) adjust size (truncate ⇒ reset to 0 first, then apply per-event deltas), and finally (d) emit with forceEmit = true.
    • Lifecycle hooks (markReady, first-commit callbacks, GC timers) run after the combined batch is emitted.
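
Phases 1–2 of the commit could be sketched as below. This is a simplified stand-in: `applySyncOps` and its shapes are illustrative, metadata handling is omitted, and `visibleBeforeByKey` is approximated by reading `syncedData` before mutation (valid here only for keys without an optimistic overlay):

```typescript
type Ev =
  | { type: "truncate" }
  | { type: "insert" | "update" | "delete"; key: string; value: unknown; previousValue?: unknown };

function applySyncOps(
  syncedData: Map<string, unknown>,
  optimisticKeys: Set<string>, // keys with an optimistic overlay
  ops: Array<{ type: "insert" | "update" | "delete"; key: string; value?: unknown }>,
  truncateSeen: boolean
): { syncEvents: Ev[]; syncedKeys: Set<string> } {
  const syncEvents: Ev[] = [];
  const syncedKeys = new Set<string>();
  if (truncateSeen) {
    // Stream-wide reset: truncate leads the batch, authoritative state clears.
    syncEvents.push({ type: "truncate" });
    syncedData.clear();
  }
  for (const op of ops) {
    const before = syncedData.get(op.key); // stand-in for visibleBeforeByKey
    if (op.type === "delete") syncedData.delete(op.key);
    else syncedData.set(op.key, op.value);
    syncedKeys.add(op.key);
    // Keys with an optimistic overlay are emitted by the rebase pass instead.
    if (optimisticKeys.has(op.key)) continue;
    syncEvents.push(
      op.type === "delete"
        ? { type: "delete", key: op.key, value: before, previousValue: before }
        : { type: op.type, key: op.key, value: op.value, previousValue: before }
    );
  }
  return { syncEvents, syncedKeys };
}
```

Phase 3 then concatenates these `syncEvents` with the events returned from `rebaseOptimistic` and emits the batch once.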

5. rebaseOptimistic Specification

Inputs

  • reason: 'local-change' | 'sync-commit' (diagnostic only).
  • truncate: boolean.
  • syncedKeys: Set<TKey> (empty for local-only recomputes).
  • syncedDiff: Map<TKey, { before?: TOutput; after?: TOutput }> or null.

Algorithm

  1. Snapshot previous optimistic state:

    • If truncate, treat previousUpserts/previousDeletes as empty.
    • Otherwise reference the current optimisticUpserts/optimisticDeletes (no cloning yet).
  2. Seed new maps:

    • nextUpserts starts as a shallow clone of previousUpserts, excluding keys in syncedKeys (because these keys' authoritative base just changed and must be recomputed).
    • nextDeletes clones previousDeletes and drops entries in syncedKeys for the same reason.
    • Keys NOT in syncedKeys preserve their object references, keeping the "same ref ⇒ no event" fast path.
  3. Apply active transactions (in createdAt order via SortedMap).

    • Optimization: when reason === 'sync-commit', only reprocess mutations whose keys are in syncedKeys (their authoritative base changed). Keys outside that set were already cloned verbatim in step 2.
    • Inserts/updates: merge with authoritative base (syncedData) when needed and write into nextUpserts. We do not run a deep equality check afterwards—the merged object becomes the new reference. Trade-off: this can emit extra update events compared to today’s deep-equals suppression, but it removes one of the most expensive comparisons from the hot path. Existing tests will highlight any unacceptable noise.
    • Deletes: remove from nextUpserts, add to nextDeletes.

    Frequency note: rebaseOptimistic runs after each transaction mutate/cleanup and directly after each sync commit. By restricting the sync-commit case to syncedKeys, we avoid full rescans of unrelated optimistic data.

  4. Swap this.optimisticUpserts/this.optimisticDeletes with the new maps.

  5. Derive events using the matrix below. Evaluate authoritativeAfter(key) as syncedDiff?.get(key)?.after ?? this.syncedData.get(key) and authoritativeBefore(key) as syncedDiff?.get(key)?.before ?? this.syncedData.get(key).

    Note: For sync commits, syncedDiff contains the captured before from visibleBeforeByKey; for local changes, both fall back to the unchanged this.syncedData.

| Previous \ New | New upsert present | New delete present | Not in new optimistic state |
| --- | --- | --- | --- |
| Previous upsert present | • same ref ⇒ no event<br>• different ref ⇒ update (value: new upsert, previousValue: old upsert) | delete (value: previous upsert) | • key ∈ syncedKeys AND previous upsert === authoritativeAfter ⇒ no event<br>• authoritativeAfter exists ⇒ update (value: authoritativeAfter, previousValue: previous upsert)<br>• no authoritative ⇒ delete (value: previous upsert) |
| Previous delete present | insert (value: new upsert) | no event | • authoritativeAfter exists ⇒ insert (value: authoritativeAfter)<br>• no authoritative ⇒ no event |
| Not previously optimistic | • authoritativeBefore exists ⇒ update (value: new upsert, previousValue: authoritativeBefore)<br>• no authoritativeBefore ⇒ insert (value: new upsert) | • authoritativeBefore exists ⇒ delete (value: authoritativeBefore)<br>• no authoritativeBefore ⇒ no event | no event |

Notes:

  • The matrix shows which event type to emit and what values to use based on the before/after state
  • "Same ref" comparison uses === (reference equality)
  • The deduplication check for synced keys (row 1, col 3) prevents duplicate events when an optimistic change matches what the server committed
  • "no event" cases occur when there's no user-visible change (undefined → undefined, or same reference)
  6. Return { events, touchedKeys: allTouched }, where allTouched is the union of all keys from previousUpserts, nextUpserts, previousDeletes, nextDeletes, and syncedKeys. The caller uses this for index updates and size deltas.
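
The matrix can be evaluated per key roughly as follows. `deriveEvent` and its argument names are hypothetical; `authBefore`/`authAfter` correspond to authoritativeBefore/authoritativeAfter in the text, and all comparisons are reference equality, matching the "same ref ⇒ no event" fast path:

```typescript
type Row = { [k: string]: unknown };
type OptimisticEvent =
  | { type: "insert"; value: Row }
  | { type: "update"; value: Row; previousValue: Row }
  | { type: "delete"; value: Row }
  | null;

function deriveEvent(args: {
  prevUpsert?: Row;
  prevDelete: boolean;
  nextUpsert?: Row;
  nextDelete: boolean;
  inSyncedKeys: boolean;
  authBefore?: Row;
  authAfter?: Row;
}): OptimisticEvent {
  const { prevUpsert, prevDelete, nextUpsert, nextDelete, inSyncedKeys, authBefore, authAfter } = args;
  if (prevUpsert !== undefined) {
    if (nextUpsert !== undefined) {
      if (nextUpsert === prevUpsert) return null; // same ref ⇒ no event
      return { type: "update", value: nextUpsert, previousValue: prevUpsert };
    }
    if (nextDelete) return { type: "delete", value: prevUpsert };
    // Dropped from optimistic state: dedupe against the server commit first.
    if (inSyncedKeys && prevUpsert === authAfter) return null;
    if (authAfter !== undefined) return { type: "update", value: authAfter, previousValue: prevUpsert };
    return { type: "delete", value: prevUpsert };
  }
  if (prevDelete) {
    if (nextUpsert !== undefined) return { type: "insert", value: nextUpsert };
    if (nextDelete) return null;
    return authAfter !== undefined ? { type: "insert", value: authAfter } : null;
  }
  // Not previously optimistic.
  if (nextUpsert !== undefined) {
    return authBefore !== undefined
      ? { type: "update", value: nextUpsert, previousValue: authBefore }
      : { type: "insert", value: nextUpsert };
  }
  if (nextDelete) {
    return authBefore !== undefined ? { type: "delete", value: authBefore } : null;
  }
  return null;
}
```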

6. Metadata, Indexes, and Size

  • Metadata updates stay in the authoritative apply phase. truncate clears the entire map; inserts/updates refresh entries; deletes remove them.
  • Index workflow per emission:
    1. Inspect the batch; if it contains a truncate, call indexes.reset() once before applying subsequent events.
    2. Call indexes.updateIndexes(events) with the full ordered batch (local or sync).
  • size maintenance mirrors the index workflow:
    • When a batch contains truncate, reset size = 0 before applying deltas.
    • For each event in the batch: insert ⇒ +1, delete ⇒ −1, update ⇒ 0.
    • Apply the net delta once per batch (local or sync) before emitting to listeners.
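
The size rule above could be sketched as a pure helper (`nextSize` is a hypothetical name; it assumes any truncate event sits first in the batch, as the emission rules require):

```typescript
// Compute the collection size after applying one event batch.
function nextSize(current: number, batch: Array<{ type: string }>): number {
  // A truncate resets the size to 0 before per-event deltas are applied.
  let size = batch.some((e) => e.type === "truncate") ? 0 : current;
  for (const e of batch) {
    if (e.type === "insert") size += 1;
    else if (e.type === "delete") size -= 1;
    // update and truncate contribute no delta
  }
  return size;
}
```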

7. Duplicate Suppression Workflow

  • Authoritative phase: we omit per-key sync events when there is an active optimistic overlay, ensuring the optimistic layer owns emission for those keys.
  • Optimistic phase: during matrix evaluation we skip events for syncedKeys when the optimistic state now matches the authoritative after. This covers cases where a server commit completes an optimistic mutation without re-emitting the same payload.
  • Combined effect: when the server value differs from the optimistic one, the optimistic pass emits the change; when they match, neither phase emits duplicates. Tests should include examples such as an optimistic insert followed by an identical server insert.

8. Testing & Validation

  1. Unit tests for rebaseOptimistic covering:
    • Insert/update/delete permutations, rollback scenarios, truncate rebuild.
    • Behaviour when syncedKeys is empty vs populated.
    • Cases where optimistic overlays exist during sync commits to confirm the deduplication flow described above.
  2. Integration tests for sync flow:
    • Server-only updates.
    • Server updates clashing with optimistic inserts and deletes.
    • Truncate followed by rebuild: subscribers see [truncate, …inserts], indexes reset once, size returns to match the new data.
  3. Regression tests for transaction rollback and concurrent mutations to confirm we still emit the correct rebased sequences.
  4. Subscription tests verifying live queries clear and resnapshot on truncate.

9. Implementation Order

  1. Land the event-contract updates (ChangeMessage type, subscription/index handling, live-query truncate support) with accompanying tests.
  2. Refactor live query/subscription plumbing to honour truncate (see §10) while keeping existing reconciliation logic.
  3. Replace the reconciliation internals (phased commit + new rebaseOptimistic) once the new event contract and live-query plumbing are in place.

This sequencing keeps the blast radius manageable: we teach every consumer about truncate first, then simplify the reconciliation logic.

10. Live Query & Subscription Refactor

CollectionChangesManager adjustments

  • Detect truncate in every emission. When present, bypass batching (forceEmit semantics) and deliver the event immediately after notifying subscribers to reset state.
  • Introduce a handleTruncate() helper that iterates active subscriptions before emitting. This helper should:
    • call each subscription’s handleTruncate hook (see below),
    • reuse the existing cleanup() logic for batching (clear batchedEvents, set shouldBatchEvents = false) instead of duplicating that reset code,
    • treat the emission as a synchronous flush (no microtasks) so subsequent inserts flow through as a fresh stream.
  • When truncate is emitted alongside other changes in the same batch, ensure the truncate event appears first so subscribers reset before applying the new inserts.
  • Keep the reset logic synchronous—no microtasks between handleTruncate() and event emission—so there is no window where consumers see post-truncate inserts before their local caches clear.

CollectionSubscription adjustments

  • Add a handleTruncate() method that reuses the same state-reset logic we hit during unsubscribe/cleanup (factor that code into a shared helper):
    • clear sentKeys,
    • reset snapshotSent and loadedInitialState flags so the next snapshot behaves like an initial load,
    • forward the truncate change unfiltered to the callback (skip filterAndFlipChanges).
  • Update emitEvents to handle truncate specially when present in a batch:
    • call handleTruncate() once,
    • emit the batch as-is (truncate should already be first), bypassing standard flip/filter logic,
    • subsequent inserts in the same batch flow through unfiltered since subscribers just reset their state.
  • Ensure ordered snapshot helpers (requestSnapshot, requestLimitedSnapshot) remain valid by recognising a truncate-induced reset and re-requesting data when necessary.
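
A sketch of the shared reset, assuming the field names listed above (`sentKeys`, `snapshotSent`, `loadedInitialState`); the real subscription class carries more state than this:

```typescript
// Minimal stand-in for the subscription state touched by a truncate.
class SubscriptionState {
  sentKeys = new Set<string>();
  snapshotSent = false;
  loadedInitialState = false;

  // Shared reset, intended to be reused by both truncate handling and
  // unsubscribe/cleanup so the two paths cannot drift apart.
  handleTruncate(): void {
    this.sentKeys.clear();
    this.snapshotSent = false;
    this.loadedInitialState = false;
  }
}
```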

Live query consumers

  • The Collection surface that exposes live queries should react to truncate by:
    • invoking the existing cleanup/reset helpers (the same ones used when the collection is GC’d) to clear cached results or memoized selectors,
    • prompting a resubscription/snapshot if auto-loading is enabled,
    • exposing the raw truncate event to consumers who wish to show loading indicators.
  • Update docs and typings so downstream code knows ChangeMessage['type'] includes truncate with no key/value payload.

Testing checklist

  • Unit test CollectionSubscription.emitEvents covering truncate-only, truncate+insert, and subsequent snapshot requests.
  • Integration test change streams to ensure indexes, subscribers, and live queries all observe [truncate, insert…] and rehydrate correctly.
  • Regression test batching logic: pending sync transactions should not defer a truncate emission.

Appendix A – Legacy Code to Retire

Once the refactor lands, these pieces of the current implementation become obsolete and should be removed or folded into the new flow:

  • CollectionStateManager.recentlySyncedKeys and all logic manipulating it (state.ts:51, 272-280, 780-787, 835-838). Per-call syncedKeys replaces this set.
  • CollectionStateManager.preSyncVisibleState and capturePreSyncVisibleState (state.ts:50, 464-475, 777-858). Visibility is now captured inline during each commit.
  • CollectionStateManager.isCommittingSyncTransactions guard (state.ts:52, 219-222, 657-659). Sequential processing makes the flag redundant.
  • Legacy commitPendingTransactions body (state.ts:411-792), including truncate reapply logic and completedOptimisticOps dedupe (state.ts:480-757). The new three-phase commit replaces it outright.
  • CollectionStateManager.recomputeOptimisticState, collectOptimisticChanges, and getPreviousValue helpers (state.ts:216-408). These give way to the new rebaseOptimistic implementation.
  • CollectionStateManager.calculateSize helper (state.ts:333-346). Size will be maintained via per-event deltas.
  • deepEquals import and reconciler-only comparisons (state.ts:1, 718-751). Remove or relocate if no other call sites remain.
  • CollectionChangesManager batching state beyond a simple flush trigger (changes.ts:24, 57-86, 161-164). Single-batch emission lets us collapse this.
  • Transaction hook that eagerly calls commitPendingTransactions when sync transactions exist (transactions.ts:396-420). After the refactor, touchCollection should only trigger the optimistic rebase.
  • Tests or utilities referencing the removed fields (e.g. recentlySyncedKeys, preSyncVisibleState). Update or drop them alongside the code changes.

Take care to audit tests for references to these structures so they are updated or removed alongside the implementation changes.

Appendix B – Critical Comparison

Advantages of the proposed design

  • Unified change stream: explicit truncate events and a single optimistic rebase pipeline simplify reasoning and remove the need for ad-hoc per-key delete batches.
  • Deterministic batching: emitting one consolidated batch per sync/local change eliminates race-prone batching flags and microtask juggling.
  • Lower state surface area: transient structures (recentlySyncedKeys, preSyncVisibleState) disappear, reducing the chance of stale bookkeeping bugs.
  • Testability: clearer phases and smaller helpers make it easier to unit-test rebase logic, truncate handling, and change emission in isolation.
  • Explicit contracts: downstream consumers (indexes, live queries) receive a documented truncate signal instead of relying on incidental behaviour.

Drawbacks / risks introduced

  • Behavioural changes: accepting extra update noise (no deep-equality guard) may break assumptions in downstream consumers; we gain performance and simplicity but risk potential UI churn.
  • Plan complexity: the new logic still requires careful coordination between sync events and optimistic rebase; implementing the matrix correctly is non-trivial and mistakes could regress visibility.
  • Transition cost: migrating live queries, indexes, and tests to the new event contract demands significant refactoring before the benefits appear.
  • Synchronous flush requirement: ensuring truncate resets run synchronously tightens timing constraints; missing a synchronous path could reintroduce the race conditions we are trying to avoid.
  • New abstractions: introducing shared helpers (e.g., handleTruncate) adds indirection; if not carefully factored, we may end up with another layer of hard-to-follow logic.

Advantages of the current implementation

  • Battle tested: the existing code reflects multiple rounds of bug fixes and is currently passing all production tests.
  • Granular deduplication: deep-equality checks and multiple suppression layers avoid emitting redundant updates, keeping downstream churn low.
  • Incremental rollouts: the current structure supports partial fixes without rewriting the whole pipeline, which can be safer for urgent patches.

Drawbacks of the current implementation

  • Diffuse responsibilities: reconcile logic spans recomputeOptimisticState, commitPendingTransactions, and transaction hooks, making reasoning difficult.
  • Hidden coupling: batching flags (shouldBatchEvents, isCommittingSyncTransactions) and microtask clears (recentlySyncedKeys) act as implicit coordination mechanisms that are easy to break.
  • Truncate fragility: simulating truncates through per-key deletes causes large event bursts and requires careful reapplication of optimistic state, a common source of bugs.
  • State sprawl: numerous maps/sets must stay in sync; drift between them leads to subtle visibility issues.
  • Testing blind spots: with responsibilities split across several entry points, it is difficult to isolate behaviour in tests, leading to regression risks.

The choice is between living with today’s complexity and ad-hoc fixes, or investing in a clearer contract that may introduce short-term churn (extra updates, migration cost) but promises simpler reasoning and future enhancements.
