Skip to content

feat(repl): use bufferedMap for all subscriptions#823

Merged
morgo merged 9 commits into
block:mainfrom
morgo:use-buffered-map-everywhere
May 5, 2026
Merged

feat(repl): use bufferedMap for all subscriptions#823
morgo merged 9 commits into
block:mainfrom
morgo:use-buffered-map-everywhere

Conversation

@morgo
Copy link
Copy Markdown
Collaborator

@morgo morgo commented May 5, 2026

Summary

Replaces the deltaQueue subscription type with a unified bufferedMap for all replication subscriptions. The bufferedMap handles both memory-comparable and non-memory-comparable PKs:

  • Memory-comparable PKs — LWW map dedup during copy and post-copy.
  • Non-memory-comparable PKs — FIFO queue full-time by default (preserves the previous deltaQueue performance characteristics and keeps the queue path hot in CI). Opt-in via --force-enable-buffered-map to use map-during-copy / queue-post-copy.

SetWatermarkOptimization now takes a context.Context and returns an error because toggling can drain the outgoing store inline. pkg/migration/runner.go and pkg/move/runner.go were updated accordingly.

The old pkg/repl/subscription_queue.go and its tests are removed; their coverage is folded into subscription_buffered_test.go.

Fixes

Fixes #746 — every replication change now captures a full row image from the binary log via the bufferedMap path. The previous deltaMap/deltaQueue split could produce partial-image deltas for some PK shapes; with bufferedMap-everywhere, the row image is the source of truth on both the insert and delete sides of an LWW write.

Fixes #607move operations now accept tables with non-memory-comparable primary keys (e.g. VARCHAR with a CI collation). The tableCompatibilityCheck gate that rejected those tables has been removed, since the bufferedMap FIFO queue mode handles them correctly. Verified by TestMoveWithVarcharPK, which moves a VARCHAR(64) PK table with utf8mb4_0900_ai_ci collation under 4 concurrent INSERT/UPDATE/DELETE writers — once with the default (queue full-time) and once with --force-enable-buffered-map (map during copy, queue post-copy). Both modes complete successfully and source/target row counts match.

Docs

docs/migrate.md and docs/move.md document the new --force-enable-buffered-map flag, including the rationale (collation-sensitive PK ≠ map-key equality) and the experimental warning (correctness relies on the post-cutover checksum).

Test plan

  • go build ./... passes
  • go vet ./... passes
  • go test ./pkg/repl/... passes (including merged-in upstream tests TestBufferedMapConcurrentHasChanged, TestBufferedMapKeyOverwriteDedupes, TestBufferedMapHasChangedNilAndEmpty, TestBufferedMapKeyAboveWatermarkCounters, plus the four ported TestBufferedMapQueueFlush* tests)
  • go test ./pkg/move/... passes including TestMoveWithVarcharPK (both queue-full-time and force-enable-buffered-map subtests)
  • Full CI green

morgo and others added 7 commits May 5, 2026 04:52
…erywhere

# Conflicts:
#	pkg/repl/subscription_buffered.go
#	pkg/repl/subscription_buffered_test.go
Backfills four scenarios from the deleted subscription_queue_test.go that
were not already covered by the queue-mode tests in subscription_buffered_test.go:

- TestBufferedMapQueueFlushEmpty: Flush on an empty queue is a no-op.
- TestBufferedMapQueueFlushBatchSizeLimit: targetBatchSize=2 with 5 same-type
  events forces flushQueueLocked to split a same-type segment across multiple
  applier calls; end state must still have all rows applied.
- TestBufferedMapQueueFlushUnderLock: queue flush with a held TableLock,
  exercising the cutover path.
- TestBufferedMapQueueConcurrentFlush: interleaves HasChanged from a goroutine
  with repeated Flush calls to catch races between queue append and queue drain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the new flag to docs/migrate.md (full description) and
docs/move.md (short cross-reference). Both note the default is
the safer FIFO-queue-full-time behaviour for non-memory-comparable
PKs and that enabling the optimization is experimental and relies
on the post-cutover checksum for correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the tableCompatibilityCheck gate that rejected source tables with
VARCHAR / collation-sensitive primary keys. With deltaQueue gone and
bufferedMap routing those subscriptions through its FIFO queue mode, the
replication path is correct for non-memory-comparable PKs in moves too —
the queue replays binlog events in order, and the target's collation-aware
uniqueness produces the correct end state.

Verified by TestMoveWithVarcharPK: a real move of a VARCHAR(64) PK table
with utf8mb4_0900_ai_ci collation under 4 concurrent INSERT/UPDATE/DELETE
writers, run twice — once with the default (queue full-time) and once
with --force-enable-buffered-map (map during copy, queue post-copy).
Both modes complete successfully and source/target row counts match.

Also tidies a stylistic redundancy in pkg/repl/client.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dead code left over from the deltaQueue removal — golangci-lint's
unused linter caught it on CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Briefly explains that the codebase previously had separate deltaMap and
deltaQueue subscription types, and that they were unified into a single
bufferedMap implementation as part of the fix for issue block#746 (binlog
vs. visibility race in the deltaMap's SELECT-at-flush path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR unifies replication change tracking around a single subscription implementation (bufferedMap) for all tables, removing the old deltaQueue type and extending bufferedMap with an internal FIFO mode for non-memory-comparable primary keys. It also propagates a new --force-enable-buffered-map flag through migration/move flows, and updates watermark toggling to be context-aware and error-returning because toggling may now drain pending writes inline.

Changes:

  • Remove deltaQueue and route all subscriptions through bufferedMap, adding an internal FIFO queue mode for non-memory-comparable PKs.
  • Change SetWatermarkOptimization to SetWatermarkOptimization(ctx, enabled) error across repl/migration/move, since toggling may drain pending data via the applier.
  • Add and document --force-enable-buffered-map (Migration/Move) and lift the move table compatibility gate that previously rejected non-memory-comparable PKs.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/repl/utils.go Removes deltaQueue-specific SQL helper no longer needed.
pkg/repl/subscription.go Updates subscription contract/docs for unified bufferedMap and new watermark toggle signature.
pkg/repl/subscription_queue.go Deletes the old deltaQueue implementation.
pkg/repl/subscription_queue_test.go Removes deltaQueue tests (coverage moved to buffered tests).
pkg/repl/subscription_buffered.go Adds internal FIFO queue mode, routing rules, drain-on-toggle logic, and updated flush behavior.
pkg/repl/subscription_buffered_test.go Expands tests to cover queue routing, FIFO semantics, batch splitting, under-lock flush, and transitions.
pkg/repl/README.md Updates replication documentation to describe bufferedMap-only design and FIFO fallback.
pkg/repl/client.go Adds ForceEnableBufferedMap config plumb-through; changes SetWatermarkOptimization to ctx+error and updates subscription selection logic.
pkg/repl/client_test.go Updates tests for the new watermark toggle signature and removes deltaQueue-specific coverage.
pkg/move/runner.go Passes ForceEnableBufferedMap into repl client and handles watermark toggle errors.
pkg/move/runner_test.go Adds an end-to-end move test for VARCHAR PK tables under concurrent writes (both routing modes).
pkg/move/move.go Adds --force-enable-buffered-map flag for move.
pkg/move/check/table_compatibility.go Removes “memory-comparable PK” requirement; keeps “must have PK”.
pkg/move/check/table_compatibility_test.go Updates tests to accept VARCHAR PKs for move compatibility checks.
pkg/migration/runner.go Handles watermark toggle errors and passes ForceEnableBufferedMap into repl client.
pkg/migration/resume_test.go Updates resume test for new watermark toggle signature.
pkg/migration/migration.go Adds --force-enable-buffered-map flag for migrate with detailed rationale.
docs/move.md Documents --force-enable-buffered-map for move.
docs/migrate.md Documents --force-enable-buffered-map for migrate with experimental warning and rationale.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/repl/subscription_buffered.go
Comment thread pkg/repl/subscription.go Outdated
Comment thread pkg/repl/client.go
morgo and others added 2 commits May 5, 2026 12:04
…yWithConcurrentWrites

PR block#821 dropped the composite_unbuffered and composite_buffered subtests
when deltaQueue was retired, since the unified bufferedMap implementation
hadn't shipped yet. Restore them now: the composite (id, x_token VARCHAR)
PK is non-memory-comparable, which routes the bufferedMap subscription
through its FIFO queue mode for the entire run — exactly the path that
needs cutover-atomicity coverage under concurrent writes.

We deliberately skip a force-enable-buffered-map=true variant: with the
default (false), the queue runs full-time across copy and post-copy
phases, which exercises the queue path strictly more than the
optimization. The default is the higher-coverage variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two threads:

- pkg/repl/subscription.go: rewrite the Subscription doc comment to
  describe both routing policies for non-memory-comparable PKs (queue
  full-time vs. map-during-copy). The previous wording suggested the
  queue was only used "once watermark optimization is disabled", which
  is only true for forceEnableBufferedMap=true.

- pkg/repl/client.go: snapshot the subscription set under c.Lock and
  release the lock before calling sub.SetWatermarkOptimization on each
  one. The drain inside the toggle can do synchronous applier writes,
  and holding c.Lock across that would block processRowsEvent for
  unrelated tables (relevant for moves with many subscriptions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@morgo morgo marked this pull request as ready for review May 5, 2026 19:25
@morgo morgo merged commit a17227b into block:main May 5, 2026
11 of 12 checks passed
@morgo morgo deleted the use-buffered-map-everywhere branch May 5, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

flaky test: TestCutoverAtomicityWithConcurrentWrites Support non-memory comparable PKs in move tables operations

3 participants