
Support multiple regions in MIGRATE REGION and multiple DataNodes in REMOVE DATANODE #18046

Merged: CRZbulabula merged 3 commits into master from support-multi-region-migrate on Jun 30, 2026

@CRZbulabula CRZbulabula commented Jun 28, 2026

What

Two cluster-management commands that previously accepted a single id now accept a comma-separated list in one statement (the single-id form keeps working):

-- migrate several regions between the same pair of DataNodes
MIGRATE REGION 1, 2, 3 FROM 4 TO 5

-- remove several DataNodes at once
REMOVE DATANODE 3, 4, 5

Why

Operators draining or rebalancing nodes routinely act on several regions / DataNodes at once. Today that needs one statement per target. Accepting a list makes these one-liners and aligns MIGRATE REGION with the multi-region shape already used by EXTEND / REMOVE / RECONSTRUCT REGION.

How

MIGRATE REGION — multiple regions, fixed FROM/TO

  • Grammar (both data models): migrateRegion / migrateRegionStatement accept regionIds+=INTEGER_LITERAL (COMMA regionIds+=INTEGER_LITERAL)*.
  • Thrift TMigrateRegionReq: i32 regionId -> list<i32> regionIds.
  • Parsers, AST node MigrateRegion, statement MigrateRegionStatement: carry List<Integer> regionIds.
  • ProcedureManager#migrateRegion: resolves the fixed source/destination DataNodes once, then loops over the de-duplicated region ids submitting one RegionMigrateProcedure per region. Per-region results are aggregated into the TSStatus sub-status, mirroring EXTEND / REMOVE REGION reporting. Unknown source/destination ids fail fast with a clear message.
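
The control flow described for ProcedureManager#migrateRegion can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the class MigrateRegionsSketch, the Result type, and the submitOne callback are hypothetical stand-ins for the real ConfigNode types (TSStatus, RegionMigrateProcedure), and continuing with the remaining regions after one submission fails is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch; none of these names are IoTDB classes.
class MigrateRegionsSketch {

  /** Stand-in for an overall status that carries one sub-status per region. */
  static final class Result {
    final boolean ok;
    final String message;
    final List<String> subStatus;

    Result(boolean ok, String message, List<String> subStatus) {
      this.ok = ok;
      this.message = message;
      this.subStatus = subStatus;
    }
  }

  static Result migrateRegions(
      List<Integer> regionIds,
      int fromId,
      int toId,
      Set<Integer> registeredDataNodes,
      IntPredicate submitOne) {
    // Resolve the fixed source and destination once; fail fast on unknown ids.
    if (!registeredDataNodes.contains(fromId)) {
      return new Result(
          false, "Source DataNode " + fromId + " does not exist", new ArrayList<String>());
    }
    if (!registeredDataNodes.contains(toId)) {
      return new Result(
          false, "Destination DataNode " + toId + " does not exist", new ArrayList<String>());
    }
    // De-duplicate while keeping the order the operator wrote.
    Set<Integer> uniqueIds = new LinkedHashSet<>(regionIds);
    List<String> subStatus = new ArrayList<>();
    boolean allOk = true;
    for (int regionId : uniqueIds) {
      // One submission per region (assumption: a failure does not stop the loop).
      boolean ok = submitOne.test(regionId);
      allOk &= ok;
      subStatus.add("region " + regionId + ": " + (ok ? "submitted" : "rejected"));
    }
    return new Result(allOk, allOk ? "all submitted" : "some regions rejected", subStatus);
  }

  public static void main(String[] args) {
    Set<Integer> dataNodes = new HashSet<>(Arrays.asList(4, 5));
    // Pretend region 2 cannot be migrated; the duplicate 2 is submitted only once.
    Result r = migrateRegions(Arrays.asList(1, 2, 2, 3), 4, 5, dataNodes, id -> id != 2);
    System.out.println(r.ok + " " + r.subStatus);
  }
}
```

Keeping the source/destination check outside the loop means an unknown DataNode id rejects the whole statement before any procedure is submitted, while a per-region failure only shows up in that region's sub-status.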

REMOVE DATANODE — multiple DataNodes

  • Grammar (both data models): removeDataNode / removeDataNodeStatement accept dataNodeIds+=INTEGER_LITERAL (COMMA dataNodeIds+=INTEGER_LITERAL)*.
  • Parsers: collect every id (no longer wrap a single id in a singleton list).
  • The rest of the path was already multi-node: RemoveDataNodeStatement holds a Set, TDataNodeRemoveReq carries list<TDataNodeLocation>, and RemoveDataNodesProcedure is plural. The only restriction was a size() != 1 guard in ClusterConfigTaskExecutor#removeDataNode. That guard is replaced by a check that rejects an empty set and fails the whole statement (instead of silently dropping ids) when any requested id is not a registered DataNode.
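
The all-or-nothing validation that replaces the size() != 1 guard might look like the following sketch. The class RemoveDataNodeCheckSketch, the validate method, and the error wording are hypothetical; the real check lives in ClusterConfigTaskExecutor#removeDataNode and works on DataNode locations rather than bare integer ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the IoTDB implementation.
class RemoveDataNodeCheckSketch {

  /** Returns null when the request is acceptable, otherwise one error for the whole statement. */
  static String validate(Set<Integer> requestedIds, Set<Integer> registeredIds) {
    if (requestedIds.isEmpty()) {
      return "No DataNode id was given";
    }
    // Collect every unknown id instead of silently dropping it.
    List<Integer> unknown = new ArrayList<>();
    for (Integer id : requestedIds) {
      if (!registeredIds.contains(id)) {
        unknown.add(id);
      }
    }
    if (!unknown.isEmpty()) {
      return "DataNode(s) " + unknown + " are not registered in the cluster";
    }
    return null;
  }

  public static void main(String[] args) {
    Set<Integer> registered = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
    Set<Integer> valid = new LinkedHashSet<>(Arrays.asList(3, 4, 5));
    Set<Integer> partlyUnknown = new LinkedHashSet<>(Arrays.asList(3, 9));
    System.out.println(validate(valid, registered));
    System.out.println(validate(partlyUnknown, registered));
  }
}
```

Rejecting the statement when even one id is unknown avoids the case where an operator believes three DataNodes were removed but only two were.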

Tests

  • Tree- and table-model parser unit tests for single- and multi-target MIGRATE REGION and REMOVE DATANODE (4 new test classes).
  • IoTDBMigrateMultiRegionForIoTV1IT (1C5D): migrates multiple regions from one DataNode to another in one statement, asserting every region left the source and joined the destination.
  • A 1C5D case in IoTDBRemoveDataNodeNormalIT that removes two DataNodes in a single statement.

Build verified: thrift + ANTLR regeneration, iotdb-core/datanode and iotdb-core/confignode compile, integration-test test-compiles, spotless:check passes, and all new parser unit tests pass.

MIGRATE REGION previously accepted a single region id. Extend it to
accept a comma-separated list of region ids while keeping a single
source (FROM) and destination (TO) DataNode:

  MIGRATE REGION 1, 2, 3 FROM 4 TO 5

The list of regions is migrated from the source DataNode to the
destination DataNode and the per-region results are aggregated into the
response sub-status, mirroring the existing EXTEND/REMOVE REGION
reporting. Both the tree model and table model grammars are updated.

Changes span grammar (IoTDBSqlParser.g4, RelationalSql.g4), the thrift
TMigrateRegionReq (regionId -> list<i32> regionIds), the parsers, the
MigrateRegion AST node / MigrateRegionStatement, and ProcedureManager,
which now resolves the fixed source/destination DataNodes once and loops
over the deduped region ids submitting one RegionMigrateProcedure each.

Adds tree- and table-model parser unit tests and a 1C5D integration
test that migrates multiple regions in a single statement.

REMOVE DATANODE previously accepted a single DataNode id. Extend it to
accept a comma-separated list of DataNode ids in one statement:

  REMOVE DATANODE 3, 4, 5

The removal already flowed through a multi-node path internally
(RemoveDataNodeStatement holds a Set, TDataNodeRemoveReq carries a
list<TDataNodeLocation> and RemoveDataNodesProcedure is plural); only
the grammar, parsers and a 'size() != 1' guard restricted it to one id.

Both the tree model and table model grammars now accept a list, the
parsers collect every id, and ClusterConfigTaskExecutor.removeDataNode
fails the whole statement (instead of silently dropping ids) when any
requested id is not a registered DataNode, and rejects an empty set.

Adds tree- and table-model parser unit tests and enables a 1C5D
integration test that removes two DataNodes in a single statement.
@CRZbulabula CRZbulabula changed the title Support migrating multiple regions in one MIGRATE REGION statement Support multiple regions in MIGRATE REGION and multiple DataNodes in REMOVE DATANODE Jun 28, 2026

codecov Bot commented Jun 28, 2026

Codecov Report

❌ Patch coverage is 15.29412% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.52%. Comparing base (fcd52d7) to head (0b0e591).
⚠️ Report is 7 commits behind head on master.

Files with missing lines | Patch % | Lines
...che/iotdb/confignode/manager/ProcedureManager.java | 0.00% | 63 Missing ⚠️
...ion/config/executor/ClusterConfigTaskExecutor.java | 0.00% | 4 Missing ⚠️
...yengine/plan/relational/sql/ast/MigrateRegion.java | 42.85% | 4 Missing ⚠️
...tion/config/metadata/region/MigrateRegionTask.java | 0.00% | 1 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18046      +/-   ##
============================================
+ Coverage     41.40%   41.52%   +0.11%     
  Complexity      318      318              
============================================
  Files          5286     5287       +1     
  Lines        369547   370788    +1241     
  Branches      47815    47990     +175     
============================================
+ Hits         153008   153958     +950     
- Misses       216539   216830     +291     

- IoTDBMigrateMultiRegionForIoTV1IT: the success predicate used
  getRunningRegionMap() (Running-only view) to assert the source replica
  had left. The source replica is marked RegionStatus.Removing at the
  start of RemoveRegionPeerProcedure and only deleted at its final
  REMOVE_REGION_LOCATION_CACHE state, so the Running-only view drops it
  prematurely and awaitUntilSuccess returns while the source row is still
  present, failing the immediately-following getAllRegionMap() assertion
  ("Region 0 should have left source DataNode 1"). Mirror the
  single-region migrate predicate: require the destination to be Running
  and check source absence against the all-status view.
- ProcedureManager#migrateOneRegion: reword the moved
  'choose the DataNode which has lowest load' comment so it no longer
  contains the TODO keyword that the todo-check CI job rejects on added
  lines.
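
The corrected success predicate from the first item can be sketched as follows. The two maps (DataNode id to region ids) are hypothetical stand-ins for what getRunningRegionMap() and getAllRegionMap() return in the integration-test helpers; their real signatures are not shown in this PR, so treat the shapes here as an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the test predicate; not the IT helper code.
class MigratePredicateSketch {

  static boolean migrationFinished(
      Map<Integer, Set<Integer>> runningRegionsByDataNode,
      Map<Integer, Set<Integer>> allRegionsByDataNode,
      Set<Integer> regionIds,
      int sourceId,
      int destinationId) {
    // The destination must hold every region in the Running-only view.
    Set<Integer> runningOnDestination =
        runningRegionsByDataNode.getOrDefault(destinationId, Collections.<Integer>emptySet());
    // The source must hold none of them in the all-status view, so a replica
    // that is still Removing keeps the predicate false.
    Set<Integer> anyStatusOnSource =
        allRegionsByDataNode.getOrDefault(sourceId, Collections.<Integer>emptySet());
    return runningOnDestination.containsAll(regionIds)
        && Collections.disjoint(anyStatusOnSource, regionIds);
  }

  public static void main(String[] args) {
    Set<Integer> regions = new HashSet<>(Arrays.asList(0, 1));
    Map<Integer, Set<Integer>> running = new HashMap<>();
    running.put(2, regions);
    Map<Integer, Set<Integer>> allStatuses = new HashMap<>(running);
    // Region 0 is still present (Removing) on source DataNode 1.
    allStatuses.put(1, new HashSet<>(Arrays.asList(0)));
    System.out.println(migrationFinished(running, allStatuses, regions, 1, 2));
    allStatuses.remove(1);
    System.out.println(migrationFinished(running, allStatuses, regions, 1, 2));
  }
}
```

Checking source absence against the same all-status view that the follow-up assertion uses removes the window in which the wait returns before the source row is actually deleted.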

@CRZbulabula CRZbulabula merged commit d00ec09 into master Jun 30, 2026
43 of 44 checks passed
@CRZbulabula CRZbulabula deleted the support-multi-region-migrate branch June 30, 2026 16:20
CRZbulabula added a commit that referenced this pull request Jul 1, 2026
…-DN IT

Address review feedback: instead of tolerating a null targetDataNode inside
checkExtendRegion / checkReconstructRegion, detect the invalid target at the
resolution layer. extendOneRegion and reconstructRegion now resolve the target
via getRegisteredDataNodeLocationOrNull and, when the id is not a registered
DataNode (a ConfigNode id or a non-existent id), immediately return
"Target DataNode <id> does not exist in the cluster" -- mirroring the existing
migrateRegion pattern -- so the check methods only ever receive a non-null
target. The two check methods are restored to their original form.

Also fix the flaky IoTDBRemoveDataNodeNormalIT.success1C5DRemoveTwoDataNodesUseSQL
(added in #18046): it removed two randomly chosen DataNodes and asserted success,
but if both hosted a replica of the same consensus group the ConfigNode correctly
rejects the request ("Only one replica of the same consensus group is allowed to
be migrated at the same time."). Add selectRemoveDataNodesWithoutRegionConflict,
which picks DataNodes whose region sets are pairwise disjoint, and use it whenever
more than one DataNode is removed.
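
One way to pick DataNodes whose region sets are pairwise disjoint is a small backtracking search, sketched below. This is an assumption about how selectRemoveDataNodesWithoutRegionConflict could work, not its actual code: the class, the method selectWithoutRegionConflict, and the map of DataNode id to hosted region ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the helper added to the integration test.
class DisjointDataNodeSelectionSketch {

  /** Returns `count` DataNode ids with pairwise disjoint region sets, or an empty list if none exist. */
  static List<Integer> selectWithoutRegionConflict(
      Map<Integer, Set<Integer>> regionsByDataNode, int count) {
    // Sort the ids so the result is deterministic.
    List<Integer> candidates = new ArrayList<>(new TreeSet<>(regionsByDataNode.keySet()));
    List<Integer> chosen = new ArrayList<>();
    boolean found =
        search(candidates, regionsByDataNode, 0, count, chosen, new HashSet<Integer>());
    return found ? chosen : Collections.<Integer>emptyList();
  }

  private static boolean search(
      List<Integer> candidates,
      Map<Integer, Set<Integer>> regionsByDataNode,
      int start,
      int count,
      List<Integer> chosen,
      Set<Integer> usedRegions) {
    if (chosen.size() == count) {
      return true;
    }
    for (int i = start; i < candidates.size(); i++) {
      Integer node = candidates.get(i);
      Set<Integer> regions = regionsByDataNode.get(node);
      if (!Collections.disjoint(usedRegions, regions)) {
        continue; // shares a consensus group with an already chosen DataNode
      }
      chosen.add(node);
      usedRegions.addAll(regions);
      if (search(candidates, regionsByDataNode, i + 1, count, chosen, usedRegions)) {
        return true;
      }
      // Backtrack; removeAll is safe because `regions` was disjoint from usedRegions.
      chosen.remove(chosen.size() - 1);
      usedRegions.removeAll(regions);
    }
    return false;
  }

  public static void main(String[] args) {
    Map<Integer, Set<Integer>> regions = new HashMap<>();
    regions.put(1, new HashSet<>(Arrays.asList(10, 11)));
    regions.put(2, new HashSet<>(Arrays.asList(11, 12)));
    regions.put(3, new HashSet<>(Arrays.asList(12, 13)));
    regions.put(4, new HashSet<>(Arrays.asList(10, 13)));
    regions.put(5, new HashSet<>(Arrays.asList(14)));
    System.out.println(selectWithoutRegionConflict(regions, 2));
  }
}
```

A plain greedy pass can miss a valid combination, so the sketch backtracks; with five DataNodes the search space is tiny. Returning an empty list when no combination exists lets the test skip or fail explicitly rather than issue a request the ConfigNode would reject.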

Verified locally: extendRegionToInvalidDataNodeTest, reconstructRegionToInvalidDataNodeTest
and success1C5DRemoveTwoDataNodesUseSQL all pass.