Support multiple regions in MIGRATE REGION and multiple DataNodes in REMOVE DATANODE #18046
Merged
MIGRATE REGION previously accepted a single region id. Extend it to accept a comma-separated list of region ids while keeping a single source (FROM) and destination (TO) DataNode:

    MIGRATE REGION 1, 2, 3 FROM 4 TO 5

The listed regions are migrated from the source DataNode to the destination DataNode, and the per-region results are aggregated into the response sub-status, mirroring the existing EXTEND/REMOVE REGION reporting. Both the tree model and table model grammars are updated.

Changes span the grammars (IoTDBSqlParser.g4, RelationalSql.g4), the thrift TMigrateRegionReq (regionId -> list<i32> regionIds), the parsers, the MigrateRegion AST node / MigrateRegionStatement, and ProcedureManager, which now resolves the fixed source/destination DataNodes once and loops over the deduplicated region ids, submitting one RegionMigrateProcedure each.

Adds tree- and table-model parser unit tests and a 1C5D integration test that migrates multiple regions in a single statement.
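The loop described above (deduplicate the region ids, submit one migration per id, collect every per-region result) can be sketched as follows. This is only an illustration under stated assumptions: MultiRegionMigrateSketch, RegionResult and the submitOne callback are hypothetical names, not the actual ProcedureManager code or the TSStatus type.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

// Hypothetical sketch; none of these names exist in IoTDB.
final class MultiRegionMigrateSketch {

  /** Outcome of one per-region submission (stand-in for a sub-status entry). */
  static final class RegionResult {
    final int regionId;
    final boolean ok;
    final String message;

    RegionResult(int regionId, boolean ok, String message) {
      this.regionId = regionId;
      this.ok = ok;
      this.message = message;
    }
  }

  /**
   * Deduplicates the requested region ids (keeping first-seen order), calls
   * submitOne once per remaining id, and returns all per-region results.
   * A failed region does not stop the remaining submissions.
   */
  static List<RegionResult> migrateAll(
      List<Integer> regionIds, IntFunction<RegionResult> submitOne) {
    Set<Integer> deduped = new LinkedHashSet<>(regionIds);
    List<RegionResult> results = new ArrayList<>();
    for (int regionId : deduped) {
      results.add(submitOne.apply(regionId));
    }
    return results;
  }

  /** The overall request succeeds only if every per-region result succeeded. */
  static boolean allSucceeded(List<RegionResult> results) {
    return results.stream().allMatch(r -> r.ok);
  }
}
```

In this sketch the source and destination would be resolved once by the caller and captured by submitOne, so the callback only varies by region id.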
REMOVE DATANODE previously accepted a single DataNode id. Extend it to accept a comma-separated list of DataNode ids in one statement:

    REMOVE DATANODE 3, 4, 5

The removal already flowed through a multi-node path internally (RemoveDataNodeStatement holds a Set, TDataNodeRemoveReq carries a list<TDataNodeLocation>, and RemoveDataNodesProcedure is plural); only the grammar, the parsers, and a 'size() != 1' guard restricted it to one id. Both the tree model and table model grammars now accept a list, the parsers collect every id, and ClusterConfigTaskExecutor.removeDataNode fails the whole statement (instead of silently dropping ids) when any requested id is not a registered DataNode, and rejects an empty set.

Adds tree- and table-model parser unit tests and enables a 1C5D integration test that removes two DataNodes in a single statement.
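The all-or-nothing check described above (reject an empty set, and fail the whole statement if any requested id is not registered) might look roughly like this. It is a sketch under assumptions: RemoveDataNodeValidationSketch, the plain integer id sets, and the error messages are hypothetical and do not reproduce ClusterConfigTaskExecutor.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and messages are illustrative only.
final class RemoveDataNodeValidationSketch {

  /**
   * Returns the deduplicated ids to remove, or throws when the request is
   * empty or names at least one id that is not a registered DataNode.
   * No id is silently dropped: one unknown id rejects the whole request.
   */
  static Set<Integer> validate(Collection<Integer> requestedIds, Set<Integer> registeredIds) {
    Set<Integer> requested = new LinkedHashSet<>(requestedIds);
    if (requested.isEmpty()) {
      throw new IllegalArgumentException("No DataNode id was given");
    }
    // Collect every unknown id so the error can list all of them at once.
    Set<Integer> unknown = new TreeSet<>(requested);
    unknown.removeAll(registeredIds);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException("Not registered DataNode ids: " + unknown);
    }
    return requested;
  }
}
```

Reporting all unknown ids together, rather than stopping at the first, is a design choice of this sketch and not something the commit message states.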
Codecov Report

❌ Patch coverage (figure not preserved). Coverage diff, master vs. #18046:

| Metric | master | #18046 | +/- |
| --- | --- | --- | --- |
| Coverage | 41.40% | 41.52% | +0.11% |
| Complexity | 318 | 318 | |
| Files | 5286 | 5287 | +1 |
| Lines | 369547 | 370788 | +1241 |
| Branches | 47815 | 47990 | +175 |
| Hits | 153008 | 153958 | +950 |
| Misses | 216539 | 216830 | +291 |
- IoTDBMigrateMultiRegionForIoTV1IT: the success predicate used
getRunningRegionMap() (Running-only view) to assert the source replica
had left. The source replica is marked RegionStatus.Removing at the
start of RemoveRegionPeerProcedure and only deleted at its final
REMOVE_REGION_LOCATION_CACHE state, so the Running-only view drops it
prematurely and awaitUntilSuccess returns while the source row is still
present, failing the immediately-following getAllRegionMap() assertion
("Region 0 should have left source DataNode 1"). Mirror the
single-region migrate predicate: require the destination to be Running
and check source absence against the all-status view.
- ProcedureManager#migrateOneRegion: reword the moved
'choose the DataNode which has lowest load' comment so it no longer
contains the TODO keyword that the todo-check CI job rejects on added
lines.
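The corrected success predicate from the first item (destination must be Running, source must be absent from the all-status view) can be sketched as below. This assumes, purely for illustration, that each view maps a region id to the set of DataNode ids holding a replica; the real getRunningRegionMap() / getAllRegionMap() helpers may have a different shape, and MigratePredicateSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the map shapes are an assumption, not the IT helper API.
final class MigratePredicateSketch {

  /**
   * True once the destination hosts a Running replica of the region and the
   * source no longer appears in the all-status view. Checking source absence
   * against the Running-only view would pass too early, because a replica in
   * the Removing state is already missing from that view.
   */
  static boolean migrated(
      int regionId,
      int sourceDataNode,
      int destDataNode,
      Map<Integer, Set<Integer>> runningView,
      Map<Integer, Set<Integer>> allStatusView) {
    boolean destRunning =
        runningView.getOrDefault(regionId, Set.of()).contains(destDataNode);
    boolean sourceGone =
        !allStatusView.getOrDefault(regionId, Set.of()).contains(sourceDataNode);
    return destRunning && sourceGone;
  }
}
```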
CRZbulabula added a commit that referenced this pull request on Jul 1, 2026
…-DN IT

Address review feedback: instead of tolerating a null targetDataNode inside checkExtendRegion / checkReconstructRegion, detect the invalid target at the resolution layer. extendOneRegion and reconstructRegion now resolve the target via getRegisteredDataNodeLocationOrNull and, when the id is not a registered DataNode (a ConfigNode id or a non-existent id), immediately return "Target DataNode <id> does not exist in the cluster" -- mirroring the existing migrateRegion pattern -- so the check methods only ever receive a non-null target. The two check methods are restored to their original form.

Also fix the flaky IoTDBRemoveDataNodeNormalIT.success1C5DRemoveTwoDataNodesUseSQL (added in #18046): it removed two randomly chosen DataNodes and asserted success, but if both hosted a replica of the same consensus group, the ConfigNode correctly rejects the request ("Only one replica of the same consensus group is allowed to be migrated at the same time."). Add selectRemoveDataNodesWithoutRegionConflict, which picks DataNodes whose region sets are pairwise disjoint, and use it whenever more than one DataNode is removed.

Verified locally: extendRegionToInvalidDataNodeTest, reconstructRegionToInvalidDataNodeTest and success1C5DRemoveTwoDataNodesUseSQL all pass.
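One way to pick DataNodes whose region sets are pairwise disjoint is a greedy pass, sketched below. This is not the body of selectRemoveDataNodesWithoutRegionConflict, which is not shown in this conversation; DisjointDataNodeSelectionSketch and the map of DataNode id to hosted region ids are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a greedy disjoint selection; not the test helper itself.
final class DisjointDataNodeSelectionSketch {

  /**
   * Walks DataNodes in ascending id order and keeps a node only if none of its
   * regions is already hosted by a previously chosen node. Each chosen node is
   * disjoint from the union of earlier picks, so all picks are pairwise disjoint.
   */
  static List<Integer> select(Map<Integer, Set<Integer>> regionsByDataNode, int count) {
    List<Integer> chosen = new ArrayList<>();
    Set<Integer> coveredRegions = new HashSet<>();
    for (Map.Entry<Integer, Set<Integer>> entry : new TreeMap<>(regionsByDataNode).entrySet()) {
      if (chosen.size() == count) {
        break;
      }
      if (Collections.disjoint(coveredRegions, entry.getValue())) {
        chosen.add(entry.getKey());
        coveredRegions.addAll(entry.getValue());
      }
    }
    if (chosen.size() < count) {
      throw new IllegalStateException("Could not find " + count + " DataNodes without a shared region");
    }
    return chosen;
  }
}
```

A greedy pass is simple but can miss a valid combination that an exhaustive search would find; for a five-DataNode test cluster either approach is cheap.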



What
Two cluster-management commands that previously accepted a single id now accept a comma-separated list in one statement (the single-id form keeps working):
- MIGRATE REGION 1, 2, 3 FROM 4 TO 5
- REMOVE DATANODE 3, 4, 5
Why
Operators draining or rebalancing nodes routinely act on several regions or DataNodes at once. Today that needs one statement per target. Accepting a list turns these into one-liners and aligns MIGRATE REGION with the multi-region shape already used by EXTEND/REMOVE/RECONSTRUCT REGION.
How
MIGRATE REGION: multiple regions, fixed FROM/TO
- Grammar: migrateRegion / migrateRegionStatement accept regionIds+=INTEGER_LITERAL (COMMA regionIds+=INTEGER_LITERAL)*.
- Thrift TMigrateRegionReq: i32 regionId -> list<i32> regionIds.
- AST node MigrateRegion, statement MigrateRegionStatement: carry List<Integer> regionIds.
- ProcedureManager#migrateRegion: resolves the fixed source/destination DataNodes once, then loops over the deduplicated region ids, submitting one RegionMigrateProcedure per region. Per-region results are aggregated into the TSStatus sub-status, mirroring EXTEND/REMOVE REGION reporting. Unknown source/destination ids fail fast with a clear message.
REMOVE DATANODE: multiple DataNodes
- Grammar: removeDataNode / removeDataNodeStatement accept dataNodeIds+=INTEGER_LITERAL (COMMA dataNodeIds+=INTEGER_LITERAL)*.
- The internal path was already multi-node: RemoveDataNodeStatement holds a Set, TDataNodeRemoveReq carries list<TDataNodeLocation>, and RemoveDataNodesProcedure is plural. Beyond the grammar and parsers, the only restriction was a size() != 1 guard in ClusterConfigTaskExecutor#removeDataNode, now replaced so that the statement fails as a whole (instead of silently dropping ids) when any requested id is not a registered DataNode, and rejects an empty set.
Tests
- Tree- and table-model parser unit tests for MIGRATE REGION and REMOVE DATANODE (4 new test classes).
- IoTDBMigrateMultiRegionForIoTV1IT (1C5D): migrates multiple regions from one DataNode to another in one statement, asserting that every region left the source and joined the destination.
- A 1C5D case in IoTDBRemoveDataNodeNormalIT that removes two DataNodes in a single statement.

Build verified: thrift + ANTLR regeneration, iotdb-core/datanode and iotdb-core/confignode compile, integration-test test-compiles, spotless:check passes, and all new parser unit tests pass.