Affected Version
Apache Druid version: 31.00-32.00
Description
We have encountered an issue with smart segment loading in Apache Druid: the value calculated for numBalancerThreads falls outside the allowed range when the server has a high CPU count (e.g., more than 200 CPUs).
Cluster Size:
- CPU count per server: > 200
- Historical nodes per server: 6
- Configurations in use: smartSegmentLoading=true
Steps to Reproduce the Problem:
1. Deploy Apache Druid on a server with more than 200 CPUs, with smartSegmentLoading set to true.
2. Initiate a data balancing operation.
3. Observe the calculation of numBalancerThreads in the logs.
Error Message:
java.lang.IllegalArgumentException: Number of balancer threads must be in range (0, 100].
Observed Behavior:
With smartSegmentLoading enabled, Druid calculates numBalancerThreads as CPU count / 2 (integer division). On a server with more than 200 CPUs this value can exceed the allowed range of (0, 100] (for example, 256 CPUs gives 128), resulting in the above exception. This prevents the data balancing operation from proceeding.
Additional Source Code:
return Math.max(1, JvmUtils.getRuntimeInfo().getAvailableProcessors() / 2);
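To make the arithmetic concrete, here is a minimal standalone sketch of how that expression interacts with the range check. It is not Druid's code: the class name, the `validate` helper, and passing the processor count as a parameter (instead of calling `JvmUtils`) are assumptions made for illustration. Only the expression and the error message come from this report.

```java
public class BalancerThreadCheck {
    // Upper bound taken from the error message quoted in this report.
    static final int MAX_BALANCER_THREADS = 100;

    // Same expression as quoted above; the processor count is a parameter
    // here (an assumption) so that different machines can be simulated.
    static int computeNumBalancerThreads(int availableProcessors) {
        return Math.max(1, availableProcessors / 2); // integer division
    }

    // Hypothetical stand-in for the range check that produces the exception.
    static void validate(int numBalancerThreads) {
        if (numBalancerThreads <= 0 || numBalancerThreads > MAX_BALANCER_THREADS) {
            throw new IllegalArgumentException(
                "Number of balancer threads must be in range (0, 100].");
        }
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 200, 201, 202, 256}) {
            int threads = computeNumBalancerThreads(cpus);
            try {
                validate(threads);
                System.out.println(cpus + " CPUs -> " + threads + " threads: ok");
            } catch (IllegalArgumentException e) {
                System.out.println(cpus + " CPUs -> " + threads + " threads: " + e.getMessage());
            }
        }
    }
}
```

Because the division truncates, 201 CPUs still yields 100 threads and passes; the check first fails at 202 CPUs (101 threads).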
Proposed Solution:
Remove the hard-coded upper limit of 100 for numBalancerThreads or make it configurable, allowing it to scale dynamically based on the server's CPU count. This would ensure compatibility with high-CPU environments and prevent the exception.
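One possible shape for the "make it configurable" option is sketched below. The class name, the `maxBalancerThreads` parameter, and the choice to clamp the computed value to that maximum are all assumptions for illustration; none of them is an existing Druid API or setting, and clamping is one variant among several that the maintainers might prefer.

```java
public class CappedBalancerThreads {
    // Hypothetical default, matching the limit in the current error message.
    static final int DEFAULT_MAX_BALANCER_THREADS = 100;

    // Computes half the CPU count (at least 1), then clamps it to a
    // caller-supplied maximum instead of failing validation later.
    static int computeNumBalancerThreads(int availableProcessors, int maxBalancerThreads) {
        if (maxBalancerThreads < 1) {
            throw new IllegalArgumentException("maxBalancerThreads must be at least 1");
        }
        int halfOfCpus = Math.max(1, availableProcessors / 2);
        return Math.min(halfOfCpus, maxBalancerThreads);
    }

    public static void main(String[] args) {
        // With the default cap, a 256-CPU server gets 100 threads, not an exception.
        System.out.println(computeNumBalancerThreads(256, DEFAULT_MAX_BALANCER_THREADS));
        // An operator who raises the cap lets the value scale with the CPU count.
        System.out.println(computeNumBalancerThreads(256, 256));
    }
}
```

With a cap of this kind, the default behavior on high-CPU servers would stay within the current range, while a higher configured maximum would allow the thread count to scale.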