HBASE-30134: Improve CacheAwareLoadBalancer to consider low cache ratio when calculating imbalance#8197
HBASE-30134: Improve CacheAwareLoadBalancer to consider low cache ratio when calculating imbalance#8197wchevreuil wants to merge 4 commits intoapache:masterfrom
Conversation
…io when calculating imbalance Change-Id: I0880f137e2e9efe950de021a9e832e240378f9c7
Change-Id: Ie87a60c1a822682786ca6b2bc9ce4b0a86af54ee
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Improves cache-aware balancing by incorporating reported block cache free space and “cold data” region metrics into CacheAwareLoadBalancer’s imbalance calculation, with accompanying protocol/client/server wiring and tests.
Changes:
- Extend ServerLoad/ServerMetrics to report block cache free bytes and per-region cold data size.
- Update CacheAwareLoadBalancer cost computation to consider low cache ratios alongside free cache capacity and cold-data adjustments.
- Add/adjust unit tests and constructors to pass the new metrics through BalancerClusterState.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestCacheAwareLoadBalancerCostFunctions.java | Adds tests and helpers for needsBalance behavior with low cache ratio + free cache space. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionsRecoveryChore.java | Updates ServerMetrics test stub to implement new cold-data API. |
| hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java | Populates ServerLoad with cacheFreeSize and regionColdData. |
| hbase-protocol-shaded/src/main/protobuf/server/ClusterStatus.proto | Adds new ServerLoad fields for cacheFreeSize and regionColdData. |
| hbase-client/src/main/java/org/apache/hadoop/hbase/ServerMetricsBuilder.java | Parses/serializes new metrics into ServerMetrics and ServerLoad. |
| hbase-client/src/main/java/org/apache/hadoop/hbase/ServerMetrics.java | Adds new ServerMetrics APIs for cacheFreeSize and regionColdDataSize. |
| hbase-balancer/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStoreFileTableSkewCostFunction.java | Updates BalancerClusterState ctor usage for new parameter. |
| hbase-balancer/src/test/java/org/apache/hadoop/hbase/master/balancer/CandidateGeneratorTestUtil.java | Updates BalancerClusterState ctor usage for new parameter. |
| hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java | Introduces overridable createState() factory for BalancerClusterState. |
| hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java | Uses cacheFreeSize + cold-data metrics in cache cost heuristics and state creation. |
| hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java | Updates BalancerClusterState ctor usage for new parameter. |
| hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerConditionals.java | Updates BalancerClusterState ctor usage for new parameter. |
| hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java | Stores per-server cache free bytes and exposes observed region cache ratio helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (DataTieringManager.getInstance() != null) { | ||
| DataTieringManager.getInstance().getRegionColdDataSize() |
There was a problem hiding this comment.
No, DataTieringManager singleton instance cannot be null once it has been initialised. There's a resetForTestingOnly method, but that's for test purposes only. Besides, even if it was possible to disable DataTieringManager, we wouldn't want to use the stale instance.
| private static List<ServerName> | ||
| getServersInInsertionOrder(Map<ServerName, List<RegionInfo>> cluster) { | ||
| return new ArrayList<>(cluster.keySet()); | ||
| } |
There was a problem hiding this comment.
Well, we have entire control over the map implementation instance passed from our tests, which is a LinkedHashMap, so no need to worry about this here.
| * Unallocated block cache capacity on this RegionServer, in bytes. | ||
| * Used by the master for cache-aware load balancing (optional). | ||
| */ |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Change-Id: I2d35abc09cae1c0e9a7d758b90772d4a0c2abb93
No description provided.