Potential Bug in Smart Loading: numBalancerThreads Calculation When Server CPU Count Exceeds 200 #17801

@aiden-sun

Affected Version

Apache Druid version: 31.0.0 - 32.0.0

Description

We have encountered an issue with the smart loading mechanism in Apache Druid, specifically related to the calculation of numBalancerThreads when the server has a high CPU count (e.g., exceeding 200 CPUs).

Cluster Size:

  • CPU count per server: > 200
  • Historical nodes per server: 6

Configurations in Use:

  • smartSegmentLoading=true

Steps to Reproduce the Problem:

  1. Deploy Apache Druid on a server with more than 200 CPUs, ensuring that the smartSegmentLoading configuration is set to true.
  2. Initiate a data balancing operation.
  3. Observe the calculation of numBalancerThreads in the logs.

Error Message:

java.lang.IllegalArgumentException: Number of balancer threads must be in range (0, 100].

Observed Behavior:

When the server's CPU count exceeds 200, Druid calculates numBalancerThreads as CPU count / 2 using integer division. With 202 or more available processors the result is at least 101, which falls outside the allowed range of (0, 100] and results in the above exception. This prevents the data balancing operation from proceeding.

Additional Source Code:

return Math.max(1, JvmUtils.getRuntimeInfo().getAvailableProcessors() / 2);
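
The arithmetic behind the failure can be checked in isolation. The sketch below is a hypothetical stand-alone class, not Druid source code. It takes the processor count as a parameter instead of calling JvmUtils, and `validate` is only a stand-in for the range check that produces the reported message. The class and method names are assumptions made for illustration.

```java
// Hypothetical stand-alone sketch; not taken from the Druid code base.
class BalancerThreadRepro {
    // Upper bound taken from the error message: the valid range is (0, 100].
    static final int MAX_BALANCER_THREADS = 100;

    // Mirrors the expression quoted in the issue, with the processor
    // count passed in so that large machines can be simulated.
    static int computeNumBalancerThreads(int availableProcessors) {
        return Math.max(1, availableProcessors / 2);
    }

    // Stand-in for the range check that raises the reported exception.
    static void validate(int numBalancerThreads) {
        if (numBalancerThreads <= 0 || numBalancerThreads > MAX_BALANCER_THREADS) {
            throw new IllegalArgumentException(
                "Number of balancer threads must be in range (0, 100].");
        }
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 200, 201, 202, 256}) {
            int threads = computeNumBalancerThreads(cpus);
            try {
                validate(threads);
                System.out.println(cpus + " CPUs -> " + threads + " threads: accepted");
            } catch (IllegalArgumentException e) {
                System.out.println(cpus + " CPUs -> " + threads + " threads: rejected");
            }
        }
    }
}
```

Under these assumptions, 201 processors still yield 100 threads and pass the check, while 202 processors yield 101 threads and are rejected.
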
Proposed Solution:

Remove the hard-coded upper limit of 100 for numBalancerThreads or make it configurable, allowing it to scale dynamically based on the server's CPU count. This would ensure compatibility with high-CPU environments and prevent the exception.
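
One possible reading of "make it configurable" is to bound the computed value by a caller-supplied maximum, so that the result never leaves the accepted range. The sketch below illustrates that idea only. The class name, the method signature, and the `maxBalancerThreads` parameter are hypothetical and do not correspond to an existing Druid API or configuration property.

```java
// Hypothetical sketch of a configurable cap; all names are illustrative.
class BalancerThreadCapSketch {
    // Computes half the processor count (minimum 1) and then bounds
    // the result by maxBalancerThreads, an assumed configurable limit.
    static int computeNumBalancerThreads(int availableProcessors, int maxBalancerThreads) {
        if (maxBalancerThreads < 1) {
            throw new IllegalArgumentException("maxBalancerThreads must be at least 1");
        }
        int halfOfProcessors = Math.max(1, availableProcessors / 2);
        return Math.min(halfOfProcessors, maxBalancerThreads);
    }

    public static void main(String[] args) {
        // Uses the JDK's own processor count in place of Druid's JvmUtils.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(computeNumBalancerThreads(cpus, 100));
    }
}
```

With a maximum of 100, a 256-processor server would get 100 threads instead of failing. Raising the maximum to 200 would let the same server use 128.
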
