Skip to content

HBASE-30169 Split completed parent region added to RIT list on host RS and master crash#8264

Open
Umeshkumar9414 wants to merge 3 commits into
apache:masterfrom
Umeshkumar9414:HBASE-30169
Open

HBASE-30169 Split completed parent region added to RIT list on host RS and master crash#8264
Umeshkumar9414 wants to merge 3 commits into
apache:masterfrom
Umeshkumar9414:HBASE-30169

Conversation

@Umeshkumar9414
Copy link
Copy Markdown
Contributor

  • Bug: When a host RegionServer and master crashes, regionCrashed() blindly added all hosted regions to RIT, including split-completed parent regions that should never be reassigned.
    • Fix: Added isSplitOrMerged() guard in regionCrashed() to skip split/merged regions. Extracted isSplitOrMerged() and isReplica() helpers to reduce duplication with handleRegionStateNodeOperation().
    • Removed broken isRegionInTransition(RegionInfo) API — RegionInfo.COMPARATOR considers offline/split flags, so lookups failed when those flags differed between the query and stored instance. Replaced
      with encoded-name-based lookup in test utility.
    • Added test testRITWithSplitTableRegion — splits a region, crashes the hosting RS, restarts the master, and verifies the split parent is NOT in the RIT list.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an assignment tracking bug where split/merged parent regions could be incorrectly added to the master’s Regions-In-Transition (RIT) list during RegionServer crash handling, especially across master restarts. This improves correctness of assignment state tracking and adds regression coverage for the crash + master restart scenario.

Changes:

  • Skip adding split/merged (and replica) regions to RIT during regionCrashed, and refactor shared guards into helpers.
  • Remove the broken AssignmentManager.isRegionInTransition(RegionInfo) API and replace test checks with an encoded-name-based lookup helper.
  • Add a regression test that splits a region, expires the hosting RS, restarts the master, and verifies the split parent is not in RIT.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Updates tests to use the new encoded-name-based RIT check utility.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionSplit.java Adds regression test for split-parent not appearing in RIT after RS+master crash/restart; modernizes a few assertions.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/AssignmentTestingUtil.java Introduces encoded-name-based isRegionInTransition helper and updates existing wait helper to use it.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionInTransitionTracker.java Adds split/merged guard in regionCrashed, extracts helper predicates, and removes the public containsKey-based RIT query method.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java Removes the now-deleted isRegionInTransition(RegionInfo) API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +45
// DO NOT USE containsKey() on regionInTransition map as MutableRegionInfo.COMPARATOR considers
// offline/split flags.
Comment on lines +53 to 54
while (!isRegionInTransition(hri, getMaster(util).getAssignmentManager())) {
Threads.sleep(10);
Comment on lines +186 to +195
// As there are only 3 RS, start one more RS before expiring one
UTIL.getHBaseCluster().startRegionServer();

// stop RS holding split parent
UTIL.getHBaseCluster().getMaster().getServerManager().expireServer(targetRS);

// stop master
UTIL.getHBaseCluster().stopMaster(0);
UTIL.getHBaseCluster().waitOnMaster(0);
Thread.sleep(500);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants