PHOENIX-7793 Addendum: Replication Log writer improvements #2459

Merged
tkhurana merged 1 commit into
apache:PHOENIX-7562-feature-new from
tkhurana:PHOENIX-7562-feature-new
May 5, 2026

@tkhurana tkhurana commented May 5, 2026

Summary

  • On-demand rotation uses a CountDownLatch instead of blind Thread.sleep, waking the retry immediately when a fresh writer is staged
  • Every retry gets a fresh writer (rotation requested on first failure, not just the 2nd)
  • Rotation size clamped to HDFS block size to prevent single-file multi-block writes
  • LogFileFormatWriter syncs header on construction, forcing HDFS block allocation on the rotation thread
  • LogRotationTask catches Throwable (not just IOException) to prevent RuntimeException from silently killing the ScheduledExecutorService
  • Default sync retries reduced from 4 to 1 (2 total attempts), matching the "fast fail to SAF" design
  • calculateSyncTimeout() simplified to derive from hbase.regionserver.wal.sync.timeout plus the ZK session timeout
  • Abort on double-failure: if both SYNC and SAF writes fail, the RS aborts so preWALRestore can re-ship orphaned edits
  • PhoenixWALSyncTimeoutException wraps timeout errors for clearer diagnostics
  • New syncToSafTransitions metric counter
  • Unified ReplicationLog.close(boolean graceful) replaces separate close()/closeOnError(); writer closes are submitted asynchronously with a bounded 10s await
  • LogFileWriter.close() uses AtomicBoolean CAS to prevent concurrent double-close
  • onExit(graceful) in mode implementations simplified to pass the flag through directly
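
The latch-based wait from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `LatchRotationSketch`, `requestRotation`, `awaitFreshWriter`, and the generation counter are hypothetical names, and staging a writer is simulated by incrementing a counter on a single-threaded scheduler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the Phoenix code base.
class LatchRotationSketch {
    private final ScheduledExecutorService rotator =
        Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger writerGeneration = new AtomicInteger();

    /** Asks the rotation thread to stage a new writer; the latch opens once it is staged. */
    CountDownLatch requestRotation(long stagingMillis) {
        CountDownLatch staged = new CountDownLatch(1);
        rotator.schedule(() -> {
            try {
                // Stand-in for opening a new log file and syncing its header.
                writerGeneration.incrementAndGet();
            } finally {
                // Wake any waiting retry as soon as staging finishes.
                staged.countDown();
            }
        }, stagingMillis, TimeUnit.MILLISECONDS);
        return staged;
    }

    /** Retry path: block until the writer is staged or the timeout expires. */
    int awaitFreshWriter(CountDownLatch staged, long timeoutMillis) {
        try {
            if (!staged.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                    "no writer staged within " + timeoutMillis + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting for rotation", e);
        }
        return writerGeneration.get();
    }

    void shutdown() {
        rotator.shutdownNow();
    }
}
```

Compared with a fixed `Thread.sleep`, the retry here returns as soon as `countDown()` runs, and the timeout only bounds the worst case.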

Test plan

  • ReplicationLogGroupTest (39 tests) — validates retry semantics, rotation, abort, timeout, mode transitions
  • LogFileWriterSyncTest — validates sync call ordering after header-sync-on-init change
  • Run tests in a loop to confirm no flaky timing issues
  • Integration tests (ReplicationLogGroupIT) for end-to-end with mini-cluster
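
As an illustration of the kind of loop-style check mentioned above, here is a small sketch that races several threads on a CAS-guarded `close()` and repeats the race many times. It is an assumption-laden stand-in rather than the actual test: `CasCloseSketch`, `underlyingCloses`, and `maxClosesObserved` are hypothetical names, and the real `LogFileWriter` is not used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a writer; not the real LogFileWriter.
class CasCloseSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    final AtomicInteger underlyingCloses = new AtomicInteger();

    void close() {
        // Only the caller that flips false -> true releases the resource.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        underlyingCloses.incrementAndGet(); // stand-in for closing the output stream
    }

    /** Races `threads` callers on close() for `rounds` rounds; returns the highest close count seen. */
    static int maxClosesObserved(int threads, int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int worst = 0;
        try {
            for (int r = 0; r < rounds; r++) {
                CasCloseSketch writer = new CasCloseSketch();
                CountDownLatch start = new CountDownLatch(1);
                CountDownLatch done = new CountDownLatch(threads);
                for (int t = 0; t < threads; t++) {
                    pool.execute(() -> {
                        try {
                            start.await(); // line all callers up on the same gate
                            writer.close();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            done.countDown();
                        }
                    });
                }
                start.countDown(); // release every caller at once
                if (!done.await(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("close() callers did not finish");
                }
                worst = Math.max(worst, writer.underlyingCloses.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return worst;
    }
}
```

Under these assumptions, a result greater than 1 from `maxClosesObserved` would indicate that two callers both ran the underlying close.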

@tkhurana tkhurana force-pushed the PHOENIX-7562-feature-new branch from 741f9b8 to 9b9ec1c on May 5, 2026 21:12
@tkhurana tkhurana merged commit 6931e71 into apache:PHOENIX-7562-feature-new May 5, 2026
