[remove datanode] Load balancing for removing datanode#15022
[remove datanode] Load balancing for removing datanode#15022HxpSerein wants to merge 4 commits intoapache:masterfrom
Conversation
...rc/main/java/org/apache/iotdb/confignode/manager/load/balancer/region/IDestNodeSelector.java
Show resolved
Hide resolved
| fakeToRealIdMap.keySet().stream() | ||
| .filter(fakeId -> !remainDataNodesFakeId.contains(fakeId)) | ||
| .collect(Collectors.toSet()); | ||
| return availableDataNodeMap.get(alternativeDataNodes.stream().findAny().orElse(null)); |
There was a problem hiding this comment.
What's the meaning of get(null) ?
...test/java/org/apache/iotdb/confignode/manager/load/balancer/region/DestNodeSelectorTest.java
Show resolved
Hide resolved
| allocatedRegionGroups.remove(replicaSet); | ||
| List<TDataNodeLocation> dataNodeLocations = replicaSet.getDataNodeLocations(); | ||
| dataNodeLocations.remove(replicaSetRemoved.get(replicaSet.getRegionId())); | ||
| replicaSet.setDataNodeLocations(dataNodeLocations); |
There was a problem hiding this comment.
do we need to do this as you just remove it from replicaSet.getDataNodeLocations()
There was a problem hiding this comment.
Now allocatedReplicaSets is HashSet, and the hash value of ReplicaSets will change. Removing, updating, and adding them is a safer approach
| for (TRegionReplicaSet replicaSet : replicaSets) { | ||
| replicaSetRemoved.put(replicaSet.getRegionId(), dataNodeLocation); | ||
| } | ||
| migratedReplicas.addAll(replicaSets); |
There was a problem hiding this comment.
It looks like it might repeat when we support removing multiple nodes in the future, right? Maybe we should use a set
| .getClusterSchemaManager() | ||
| .getReplicationFactor(database, consensusGroupType); | ||
| } catch (DatabaseNotExistsException e) { | ||
| LOGGER.error( |
| allocatedRegionGroups.remove(remainReplicaSet); | ||
| List<TDataNodeLocation> dataNodeLocations = remainReplicaSet.getDataNodeLocations(); | ||
| dataNodeLocations.add(selectedDestDataNode.getLocation()); | ||
| remainReplicaSet.setDataNodeLocations(dataNodeLocations); |
There was a problem hiding this comment.
To ensure the correctness and readability of the code, it is recommended to retain the remove and add operations.
| return regionMigrationPlans; | ||
| } | ||
|
|
||
| public List<RegionMigrationPlan> selectMigrationPlans( |
There was a problem hiding this comment.
maybe you can use gpt to refactor and improve the code here
...che/iotdb/confignode/manager/load/balancer/region/PartiteGraphPlacementDestNodeSelector.java
Show resolved
Hide resolved
| break; | ||
| } | ||
| alternativeReplica[replicationFactor - 1] = dataNodeEntry.dataNodeId; | ||
| Pair<Integer, Integer> currentValue = valuation(alternativeReplica); |
There was a problem hiding this comment.
So should we consider the number of regions first instead of divergence?
There was a problem hiding this comment.
Such is the case. Only dataNodeEntries that meet the criteria of dataNodeEntry.freeDiskSpace == minFreeDiskSpace will be considered.
| allocateResult.remove(remainReplicaSet); | ||
| List<TDataNodeLocation> dataNodeLocations = remainReplicaSet.getDataNodeLocations(); | ||
| dataNodeLocations.add(selectedNode.getLocation()); | ||
| remainReplicaSet.setDataNodeLocations(dataNodeLocations); |
| List<TDataNodeLocation> dataNodeLocations = replicaSet.getDataNodeLocations(); | ||
| allocateResult.remove(replicaSet); | ||
| dataNodeLocations.remove(REMOVE_DATANODE_LOCATION); | ||
| replicaSet.setDataNodeLocations(dataNodeLocations); |
| selectedNodeIds.add(selectedNode.getLocation().getDataNodeId()); | ||
| } | ||
|
|
||
| Assert.assertEquals(TEST_DATA_NODE_NUM - 1, selectedNodeIds.size()); |


Description
Add region migration dest datanode selection algorithm for load balancing during removing datanode.
Please see https://apache-iotdb-project.feishu.cn/docx/G5aPdZJ4QozqmkxO9dsc7sEsnTe for detail design.