New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26178 Improve data structure and algorithm for BalanceClusterSt… #3575
Conversation
…ate to improve computation speed for large clusters
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Some comments.
Any idea of what the improvement here is or could be?
@@ -128,11 +128,11 @@ public void postMasterStartupInitialize() { | |||
protected final boolean idleRegionServerExist(BalancerClusterState c){ | |||
boolean isServerExistsWithMoreRegions = false; | |||
boolean isServerExistsWithZeroRegions = false; | |||
for (int[] serverList: c.regionsPerServer){ | |||
if (serverList.length > 1) { | |||
for (ArrayList<Integer> serverList: c.regionsPerServer){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/ ?
@@ -42,13 +42,13 @@ | |||
*/ | |||
int pickRandomRegion(BalancerClusterState cluster, int server, double chanceOfNoSwap) { | |||
// Check to see if this is just a move. | |||
if (cluster.regionsPerServer[server].length == 0 | |||
if (cluster.regionsPerServer.get(server).size() == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/isEmpty()/size() == 0/ ... its faster.
@@ -526,9 +526,9 @@ protected BalanceAction generate(BalancerClusterState cluster) { | |||
thisRegion = pickLowestLocalRegionOnServer(cluster, thisServer); | |||
} | |||
if (thisRegion == -1) { | |||
if (cluster.regionsPerServer[thisServer].length > 0) { | |||
if (cluster.regionsPerServer.get(thisServer).size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/isEmpty()/size()/ and negate the check?
selectedPrimaryIndex = currentPrimary; | ||
currentLargestRandom = currentRandom; | ||
} | ||
Map.Entry<Integer, ArrayList<Integer>> selectedPrimaryIndexEntry = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/ ?
int currentPrimary = -1; | ||
int currentPrimaryIndex = -1; | ||
int selectedPrimaryIndex = -1; | ||
int selectCoHostedRegionPerGroup(HashMap<Integer, ArrayList<Integer>> primariesOfRegionsPerGroup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/ and s/Map/HashMap/?
} | ||
Map.Entry<Integer, ArrayList<Integer>> selectedPrimaryIndexEntry = null; | ||
|
||
// primariesOfRegionsPerGroup is a hashmap of count of primary index on a server. a count > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my info, whats a 'count of primary index on a server'? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the number of co-located region replicas on one group. Let me update the comment.
|
||
Arrays.sort(primariesOfRegions); | ||
HashMap<Integer, ArrayList<Integer>> primariesOfRegions = | ||
new HashMap<Integer, ArrayList<Integer>>(cluster.numRegions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't do this double allocation on a single line... we give each allocation and assign its own line... easier to read.
Left hand size should be Map rather than HashMap? Ditto for List and ArrayLIst.
@@ -80,22 +83,16 @@ protected double cost() { | |||
* @param primariesOfRegions a sorted array of primary regions ids for the regions hosted | |||
* @return a sum of numReplicas-1 squared for each primary region in the group. | |||
*/ | |||
protected final long costPerGroup(int[] primariesOfRegions) { | |||
protected final long costPerGroup(HashMap<Integer, ArrayList<Integer>> primariesOfRegions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Map/HashMap/ and s/List/ArrayList/ ?
|
||
// primariesOfRegionsPerGroup is a hashmap of count of primary index on a server. a count > 1 | ||
// means that replicas are co-hosted | ||
for(Map.Entry<Integer, ArrayList<Integer>> pair : primariesOfRegions.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/
@@ -32,7 +35,7 @@ | |||
"hbase.master.balancer.stochastic.regionReplicaHostCostKey"; | |||
private static final float DEFAULT_REGION_REPLICA_HOST_COST_KEY = 100000; | |||
|
|||
private int[][] primariesOfRegionsPerGroup; | |||
private ArrayList<HashMap<Integer, ArrayList<Integer>>> primariesOfRegionsPerGroup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of List of int [] ?
s/List/ArrayList/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Map/HashMap/
ArrayList<HashMap<Integer, ArrayList<Integer>>> primariesOfRegionsPerHost; | ||
|
||
// rackIndex -> hash map of count by primary region index per server | ||
ArrayList<HashMap<Integer, ArrayList<Integer>>> primariesOfRegionsPerRack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/ here unless you are using methods on ArrayLiist not in List?
regionsPerServer = new ArrayList<ArrayList<Integer>>(numServers); | ||
regionsPerHost = new ArrayList<HashSet<Integer>>(numHosts); | ||
regionsPerRack = new ArrayList<HashSet<Integer>>(numRacks); | ||
primariesOfRegionsPerServer = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numServers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can s/?/HashSet/ ... in all of these.
regionsPerRack = new ArrayList<HashSet<Integer>>(numRacks); | ||
primariesOfRegionsPerServer = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numServers); | ||
primariesOfRegionsPerHost = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numHosts); | ||
primariesOfRegionsPerRack = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numRacks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too?
for (Map.Entry<ServerName, List<RegionInfo>> entry : clusterState.entrySet()) { | ||
regionsPerServer.add(new ArrayList<Integer>(entry.getValue().size())); | ||
primariesOfRegionsPerServer.add( | ||
new HashMap<Integer, ArrayList<Integer>>(entry.getValue().size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Map/HashMap/... etc.
primariesOfRegionsPerRack[i][numRegionPerRackIndex] = primaryIndex; | ||
numRegionPerRackIndex++; | ||
} | ||
private void populateRegionPerLocationFromServer(ArrayList<HashSet<Integer>> regionsPerLocation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/ ?
return; | ||
} | ||
ArrayList list = map.get(key); | ||
if(list == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after if.
ArrayList list = map.get(key); | ||
if(list == null) { | ||
list = new ArrayList(); | ||
map.put(key, list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a single-thread in here?
list.add(value); | ||
} | ||
|
||
public static void removeElement(HashMap<Integer, ArrayList<Integer>> map, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Map/HashMap/ etc.
LOG.warn("remove element from null"); | ||
return; | ||
} | ||
ArrayList list = map.get(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/List/ArrayList/
return; | ||
} | ||
ArrayList list = map.get(key); | ||
if(list != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, except in PrimaryRegionCountSkewCostFunction.regionMoved, I haven't seen any advantages of the changes here.
ArrayList is slower than a plain int[], as int[] is more cache friendly.
And for changing array to HashMap, I do not see any advantages either. Locating an element in array is also O(1), it will be faster than locating a value in HashMap...
So could you please write a simple design doc to describe what you are trying to do here?
@Apache9 Yes, it is a valid concern that ArrayList could be slower than int[]. But it is not a good idea to mix generic and primitives. The goal here is we can save time when we have large collections. Let me comment the pull request code comparison to explain the change of time complexity of the algorithms. |
regionsPerServer[mra.getToServer()] = | ||
addRegion(regionsPerServer[mra.getToServer()], mra.getRegion()); | ||
// remove by value | ||
regionsPerServer.get(mra.getFromServer()).remove(Integer.valueOf(mra.getRegion())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for removing an element by its value
replaceRegion(regionsPerServer[a.getFromServer()], a.getFromRegion(), a.getToRegion()); | ||
regionsPerServer[a.getToServer()] = | ||
replaceRegion(regionsPerServer[a.getToServer()], a.getToRegion(), a.getFromRegion()); | ||
replaceRegion(regionsPerServer.get(a.getFromServer()), a.getFromRegion(), a.getToRegion()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for removing an element by its value
regionsPerServer[a.getToServer()] = | ||
replaceRegion(regionsPerServer[a.getToServer()], a.getToRegion(), a.getFromRegion()); | ||
replaceRegion(regionsPerServer.get(a.getFromServer()), a.getFromRegion(), a.getToRegion()); | ||
replaceRegion(regionsPerServer.get(a.getToServer()), a.getToRegion(), a.getFromRegion()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for removing an element by its value
@@ -572,26 +550,22 @@ public void doAction(BalanceAction action) { | |||
// FindBugs: Having the assert quietens FB BC_UNCONFIRMED_CAST warnings | |||
assert action instanceof AssignRegionAction : action.getClass(); | |||
AssignRegionAction ar = (AssignRegionAction) action; | |||
regionsPerServer[ar.getServer()] = | |||
addRegion(regionsPerServer[ar.getServer()], ar.getRegion()); | |||
regionsPerServer.get(ar.getServer()).add(ar.getRegion()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for adding an element. The old method needs an array copy.
@@ -699,102 +684,57 @@ void regionMoved(int region, int oldServer, int newServer) { | |||
// update for servers | |||
int primary = regionIndexToPrimaryIndex[region]; | |||
if (oldServer >= 0) { | |||
primariesOfRegionsPerServer[oldServer] = | |||
removeRegion(primariesOfRegionsPerServer[oldServer], primary); | |||
removeElement(primariesOfRegionsPerServer.get(oldServer), primary, region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for removing an element by its value. Please notice in the hash map the key is the value of the element of the old int[].
} | ||
primariesOfRegionsPerServer[newServer] = | ||
addRegionSorted(primariesOfRegionsPerServer[newServer], primary); | ||
addElement(primariesOfRegionsPerServer.get(newServer), primary, region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for adding an element by its value. The old implementation takes an array copy.
int newLocation = serverIndexToLocation[newServer]; | ||
if (newLocation != oldLocation) { | ||
regionsPerLocation.get(newLocation).add(region); | ||
addElement(primariesOfRegionsPerLocation.get(newLocation), primary, region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for adding an element. old methods needs array copy.
addElement(primariesOfRegionsPerLocation.get(newLocation), primary, region); | ||
if (oldLocation >= 0) { | ||
regionsPerLocation.get(oldLocation).remove(region); | ||
removeElement(primariesOfRegionsPerLocation.get(oldLocation), primary, region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n)->O(1) for removing an element by its value. Please notice in the hash map the key is the value of the element of the old int[].
costs[newServer] = computeCostForRegionServer(newServer); | ||
if (region == cluster.regionIndexToPrimaryIndex[region]) { | ||
if (oldServer >= 0) { | ||
costs[oldServer]--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(n) -> O(1)
OK, forgot the expand the diff of the first file... Is it just because we need to copy the whole array when adding or removing elements from an array? Is it possible to just make use of something like primitive collections? Use ArrayList and HashMap will cost several times more memories... |
|
||
// we have found the primary id and the set of regions for the region to move. | ||
// now to return one of the secondary | ||
for (Integer regionIndex : selectedPrimaryIndexEntry.getValue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we store the region indexes of all colocated replicas here so we don't need to do O(n) look up on regionIndexToPrimaryIndex.
There is one thing I need to explain. The int[] for all data structures I replaced with either HashSet or HashMap were to store region indexes in sorted order. The look up is always by the value of the array element, not the index, so the access is always in O(n). Because it is sorted, adding a value requires array copy. Removal is the same. I don't like the memory bloating either. Let me explore if I can find sth more memory efficient while serving the needs. |
int rand = ThreadLocalRandom.current().nextInt(cluster.regionsPerServer[server].length); | ||
return cluster.regionsPerServer[server][rand]; | ||
int rand = ThreadLocalRandom.current().nextInt(cluster.regionsPerServer.get(server).size()); | ||
return cluster.regionsPerServer.get(server).get(rand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to keep RegionsPerServer as ArrayList<ArrayList> because we need to get random element while making it compatible with regionsPerHost/Rack.
This is the comment for BalancerClusterStatus
It explains that why we use arrays instead of HashMap here. But looking at the code, we even do not use binary search when locating a value in the array, this is also very costly. Maybe we could implement something like a primitive HashMap to gain both the benefits here. |
Can we run the balancer standalone? Can we feed it scenarios and study it in operation: its performance and its memory usage? It would be fun running scenarios through the balancer in isolation w/ profiler attached. Could you jmh it even? |
Just to say that I like the @Apache9 suggestion of writing out a page on intent and how the change will bring improvement. |
primariesOfRegionsPerServer = new int[numServers][]; | ||
primariesOfRegionsPerHost = new int[numHosts][]; | ||
primariesOfRegionsPerRack = new int[numRacks][]; | ||
regionsPerServer = new ArrayList<ArrayList<Integer>>(numServers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could implement a simple IntArrayList, this could save a lot of memory, so here we could write code like
regionsPerServer = new IntArrayList[numServers];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a two dimensional collection. The same as regionsPerHost and regionPerRack which are changed from int[][] to ArrayList<HashSet> for lookup/insert and delete by value. There are methods that take all three as input. The best I can do is to make all three as array of collections(HashSet or int[]). The biggest concerns I have is to mix generic and primitives and arrays which is generally not recommended. Let me test the perf difference and report back if it is worth it. Thank you.
regionsPerHost = new ArrayList<HashSet<Integer>>(numHosts); | ||
regionsPerRack = new ArrayList<HashSet<Integer>>(numRacks); | ||
primariesOfRegionsPerServer = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numServers); | ||
primariesOfRegionsPerHost = new ArrayList<HashMap<Integer, ArrayList<Integer>>>(numHosts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just be new HashMap[numHosts]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do we really need HashMap here? The key is just a index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key is the value of primary index that we need to look up in the collection and the value of hash map is the list of region indexes of all colocated replicas here so we don't need to do O(n) look up on regionIndexToPrimaryIndex at https://github.com/apache/hbase/blob/master/hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/RegionReplicaCandidateGenerator.java#L72
Close this pr after performance evaluation showed better result with using primitive collection in #3682 |
…ate to improve computation speed for large clusters