Skip to content

PHOENIX-7793 Replication Log writer improvements#2433

Merged
tkhurana merged 1 commit into
apache:PHOENIX-7562-feature-newfrom
tkhurana:PHOENIX-7793
Apr 30, 2026
Merged

PHOENIX-7793 Replication Log writer improvements#2433
tkhurana merged 1 commit into
apache:PHOENIX-7562-feature-newfrom
tkhurana:PHOENIX-7793

Conversation

@tkhurana
Copy link
Copy Markdown
Contributor

@tkhurana tkhurana commented Apr 30, 2026

Summary

Rewrites the ReplicationLog writer rotation mechanism to be lock-free and asynchronous, improving throughput on the disruptor consumer hot path.

  • Lock-free writer swap: Replaced ReentrantLock-based synchronous rotateLog() with an AtomicReference<LogFileWriter> pendingWriter staging pattern. LogRotationTask creates new writers on a background thread and stages them in pendingWriter. The disruptor consumer thread atomically drains the staged writer inside apply() before each action, replays any unsynced appends, and continues.
  • Async close: Old writers are submitted to a dedicated closeExecutor for background close, keeping the hot path unblocked.
  • On-demand size rotation: Instead of checking size under a lock in getWriter(), requestRotationIfNeeded() is called post-action and submits a LogRotationTask to the executor if the size threshold is exceeded. An AtomicBoolean rotationRequested prevents duplicate submissions.
  • Round-boundary aligned scheduling: The rotation executor's initialDelay is computed via computeInitialDelay() to align with replication round boundaries. The rotation interval is now derived from replicationRoundDurationSeconds rather than a separate config key.
  • Simplified retry logic in apply(): First failure retries on the same writer (transient). Second failure triggers requestRotation() to create a new writer on the background thread. The retry sleep gives the background thread time to stage it, and the next attempt drains it.
  • Removed RotationReason enum and per-reason metrics: Eliminated TIME/SIZE/ERROR rotation reason tracking. Only total rotation count and failure count remain.
  • closed field upgraded to AtomicBoolean for safe concurrent access without locks.

Test changes

  • Added recreateLogGroup(), overrideConf(), and waitForRotationTick() helpers to ReplicationLogBaseTest
  • TestableLog overrides startRotationExecutor() with a full-round initial delay to prevent flaky boundary-related failures
  • 8 new tests covering: mid-batch replay, undrained pending writer replacement, staged-writer pickup on retry, idle + lease recovery, replay failure retry, error-recovery rotation, on-demand rotation not suppressing scheduled ticks, round-boundary alignment
  • New ReplicationLogTest with unit tests for computeInitialDelay
  • Updated existing tests to use forceRotation() instead of removed rotateLog()

Test plan

  • mvn test -pl phoenix-core -Dtest=ReplicationLogTest
  • mvn test -pl phoenix-core -Dtest=ReplicationLogGroupTest
  • mvn test -pl phoenix-core -Dtest=ReplicationLogDiscoveryForwarderTest
  • Verify no regressions in other replication tests

@tkhurana tkhurana requested review from apurtell and kadirozde April 30, 2026 03:48
Copy link
Copy Markdown
Contributor

@kadirozde kadirozde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, Thanks!

@tkhurana tkhurana merged commit 2fcf0cc into apache:PHOENIX-7562-feature-new Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants