Skip to content

test: log applier path and copier chunk completions at Debug for #746#759

Merged
morgo merged 2 commits into
block:mainfrom
morgo:diagnostic-logging-746
May 1, 2026
Merged

test: log applier path and copier chunk completions at Debug for #746#759
morgo merged 2 commits into
block:mainfrom
morgo:diagnostic-logging-746

Conversation

@morgo
Copy link
Copy Markdown
Collaborator

@morgo morgo commented May 1, 2026

Summary

  • Adds Debug-level logs at the decision points in the binlog applier path and the copier that BUG.md (flaky test: TestCutoverAtomicityWithConcurrentWrites #746) needs to discriminate between the remaining hypotheses.
  • Bumps pkg/{migration,repl,copier} test log level to Debug in TestMain so the new logs reach CI output. Production behavior unchanged — Debug stays filtered there.
  • Investigative scaffolding for issue flaky test: TestCutoverAtomicityWithConcurrentWrites #746 — the log sites should stay useful long-term; the test-level Debug bump can be reverted once root-caused.

Why

Round one (#750) confirmed TestCutoverAtomicityWithConcurrentWrites loses rows mid-keyspace (e.g. id=118 missing while id=157 is present), ruling out the cutover-window race and pointing at the copy-phase delta-map / watermark interaction. The next CI failure needs to answer:

(a) Did HasChanged(K) ever fire? Was it dropped by KeyAboveHighWatermark?
(b) Was K ever in the delta map? Skipped repeatedly by KeyBelowLowWatermark?
(c) When the REPLACE INTO ... SELECT ... for K ran, what was its affected_rows?
(d) Which copy chunk's range covered K? What was that chunk's affected_rows?

This patch makes (a)-(d) printable in CI logs. Debug is the right level — it's what these decisions would warrant in any case, and there's no reason to filter them in tests.

What's logged (at Debug)

  • pkg/repl/subscription_map.go
    • HasChanged — every add to the delta map, and every drop by KeyAboveHighWatermark.
    • deltaMap.Flush — every key skipped by KeyBelowLowWatermark; per-statement affected_rows on the parallel path; stmt_count/keys_flushed on the under-lock path.
  • pkg/copier/{unbuffered,buffered}.go
    • One log per chunk completion with table, range, affected_rows, duration.

Test plan

  • go build ./... — clean.
  • go vet ./pkg/repl/ ./pkg/copier/ ./pkg/migration/ — clean.
  • go test ./pkg/repl/ ./pkg/copier/ — pass.
  • go test -run TestCutoverAtomicityWithConcurrentWrites ./pkg/migration/ — pass (2.4s) and emits the expected debug lines (HasChanged added to delta map, Flush stmt executed table=... affected_rows=... stmt=..., CopyChunk completed chunk=... affected_rows=...).

🤖 Generated with Claude Code

…k#746

The first round of diagnostics (block#750) confirmed that
TestCutoverAtomicityWithConcurrentWrites is losing rows mid-keyspace
(e.g. id=118 with id=157 still present), ruling out the cutover-window
race and pointing at the copy-phase delta-map / watermark interaction.
The next round needs per-key traces to discriminate between (a)
HasChanged silently dropping the binlog event, (b) the key being
parked in the delta map and skipped repeatedly, or (c) the REPLACE
statement running with affected_rows=0.

Adds Debug-level logs at the relevant decision points and bumps the
test log level to Debug in the affected packages so they appear in CI:

  pkg/repl/subscription_map.go:
    - HasChanged: log adds and above-high-watermark drops with key.
    - deltaMap.Flush: log not-below-low-watermark skips, per-statement
      affected_rows on the parallel path, and the keys_flushed count
      on the under-lock path.

  pkg/copier/{unbuffered,buffered}.go:
    - Log each chunk completion with table, range, affected_rows, and
      duration so a missing PK can be matched back to the chunk that
      should have copied it.

  pkg/{migration,repl,copier}/*_test.go (TestMain):
    - slog.SetLogLoggerLevel(slog.LevelDebug) so these new logs (and
      the existing Debug logs that were already there) reach CI output.
      Production behavior unchanged — Debug stays filtered there.

Verified locally: TestCutoverAtomicityWithConcurrentWrites passes
(2.4s) and emits the expected debug lines for HasChanged adds, Flush
stmt with affected_rows, and CopyChunk completion with chunk range.

Investigative scaffolding only — once block#746 is root-caused the new log
sites will still be useful but the test-level Debug bump can be reverted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@morgo morgo force-pushed the diagnostic-logging-746 branch from f31e486 to 5826db4 Compare May 1, 2026 13:34
@morgo morgo changed the title test: add SPIRIT_REPL_TRACE diagnostics for issue #746 test: log applier path and copier chunk completions at Debug for #746 May 1, 2026
@morgo morgo enabled auto-merge May 1, 2026 14:04
@morgo morgo merged commit 8b978c7 into block:main May 1, 2026
12 of 14 checks passed
@morgo morgo deleted the diagnostic-logging-746 branch May 1, 2026 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants