
Indexing scheduling unbalanced for Kafka source #5747


Description

rdettai (Collaborator)

Describe the bug
When using a SourceToScheduleType::NonSharded source (e.g. Kafka), the current implementation of the indexing scheduler systematically collocates all pipelines of a given source on the same indexer. For the Kafka source, this prevents distributing the indexing load of a given topic across indexers.

Note that this problem was already reported here. The proposed solution of setting a small cpu_capacity does not work because the scheduler scales the capacities to fit the workload before assigning the pipelines to the nodes.

Steps to reproduce (if applicable)
See test in comments.

Expected behavior
Pipelines with high throughputs should be more or less evenly distributed across indexers.

Possible solutions

  1. measure the actual load of each Kafka source (currently hardcoded to 4 CPUs) and use that for scheduling. This increases the risk of entering a rebalancing ping-pong between the control plane and the Kafka rebalancing protocol.
  2. for each source, first limit the maximum number of pipelines that can be assigned to each node according to its unscaled original capacity.
  3. (variant of 2) re-introduce a source parameter like max_num_pipelines_per_indexer so that users can at least manually force the distribution of the load of a given source/topic across nodes. This parameter would be pretty hard to configure properly (and hard to maintain for fluctuating workloads)

EDIT:
4) add a "num cpus per pipeline" parameter to the source, to make it possible to inform Quickwit that some Kafka topics do not require such a large amount of CPU.
5) (variant of 4) add an "average data rate" parameter to the source, which would have the same effect as "num cpus per pipeline" but would be easier for the user to configure (Quickwit internally converts the bandwidth to CPUs); see the sketch after this list.
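
A minimal sketch of the conversion in (5), assuming a pipeline at full capacity (4 CPUs, per the hardcoded value above) sustains a fixed reference throughput. The 20 MB/s constant and the helper name are assumptions for illustration, not the actual Quickwit API:

// Hypothetical helper: derive the scheduler's CPU estimate for a pipeline
// from a user-provided "average data rate" hint.
const PIPELINE_FULL_CAPACITY_MCPU: u32 = 4_000; // a pipeline is 4 CPUs
const PIPELINE_FULL_THROUGHPUT_MB_PER_SEC: u32 = 20; // assumed full-capacity rate

fn estimated_pipeline_mcpu(expected_mb_per_sec: u32) -> u32 {
    // Scale the 4-CPU default linearly with the expected data rate,
    // keeping the estimate strictly positive and at most the full capacity.
    (PIPELINE_FULL_CAPACITY_MCPU * expected_mb_per_sec / PIPELINE_FULL_THROUGHPUT_MB_PER_SEC)
        .clamp(100, PIPELINE_FULL_CAPACITY_MCPU)
}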

Configuration:
Main (but same behavior in 0.8).

Activity

rdettai (Collaborator, Author) commented on Apr 15, 2025

Rationale behind the current behavior:

  • Colocating pipelines for a source on the same node is important to improve merge quality
  • We favor a constant default for the estimated pipeline resource because we'd rather have a stable indexing plan than risk entering a rebalancing ping-pong between the control plane and Kafka
rdettai changed the title from "Indexing scheduling broken for Kafka source" to "Indexing scheduling unbalanced for Kafka source" on Apr 16, 2025

rdettai (Collaborator, Author) commented on Apr 16, 2025

Here is an interesting behavior of the scheduler.

Let's assign one source with four pipelines to a cluster of 4 nodes (8 CPUs each):

let mut problem: SchedulingProblem = SchedulingProblem::with_indexer_cpu_capacities(vec![
    mcpu(8000),
    mcpu(8000),
    mcpu(8000),
    mcpu(8000),
]);
// One source with 4 pipelines, each estimated at the full pipeline capacity (4 CPUs).
problem.add_source(
    4,
    NonZeroU32::new(PIPELINE_FULL_CAPACITY.cpu_millis()).unwrap(),
);
let old_solution = problem.new_solution();
let solution = solve(problem, old_solution);
for assignment in &solution.indexer_assignments {
    println!("{:?}", assignment);
}

Output:

IndexerAssignment { indexer_ord: 0, num_shards_per_source: {0: 1} }
IndexerAssignment { indexer_ord: 1, num_shards_per_source: {0: 1} }
IndexerAssignment { indexer_ord: 2, num_shards_per_source: {0: 1} }
IndexerAssignment { indexer_ord: 3, num_shards_per_source: {0: 1} }

This shows that when the capacity is not exceeded, pipelines are evenly balanced across nodes. We should have colocated at least 2 pipelines per indexer according to the principle stated above:

Colocating pipelines for a source on the same node is important to improve merge quality

Now if we increase the number of sources to 4 (same cluster capacity):

let mut problem: SchedulingProblem = SchedulingProblem::with_indexer_cpu_capacities(vec![
    mcpu(8000),
    mcpu(8000),
    mcpu(8000),
    mcpu(8000),
]);
// Four sources, each with 4 pipelines at full pipeline capacity.
for _ in 0..4 {
    problem.add_source(
        4,
        NonZeroU32::new(PIPELINE_FULL_CAPACITY.cpu_millis()).unwrap(),
    );
}
let old_solution = problem.new_solution();
let solution = solve(problem, old_solution);
for assignment in &solution.indexer_assignments {
    println!("{:?}", assignment);
}

Output:

IndexerAssignment { indexer_ord: 0, num_shards_per_source: {3: 4} }
IndexerAssignment { indexer_ord: 1, num_shards_per_source: {2: 4} }
IndexerAssignment { indexer_ord: 2, num_shards_per_source: {1: 4} }
IndexerAssignment { indexer_ord: 3, num_shards_per_source: {0: 4} }

Each source gets all its partitions assigned to a single node.

hardboiled commented on Apr 22, 2025

Hey @rdettai, would there be a benefit in adding some sort of affinity config for pipelines? If you know that some topics will push a lot more data than others, you could instruct Quickwit indexers to share the load more; whereas if the topics' ingest variance is fairly normalized, the dev would just leave the default config.

rdettai (Collaborator, Author) commented on Apr 23, 2025

Do you have a specific UX in mind? One option would be to specify the expected bandwidth for the topic, but I feel it would be "stressful" for users who know that their workloads vary a lot over time.

The benefit of this PR is that it doesn't add an extra configuration, and it still effectively allows spreading the load. For instance, if you have 8-CPU nodes, setting a source to num_pipeline=8 would spread it across up to 4 nodes (a pipeline is 4 CPUs, so the load is balanced as soon as you have more than 2).
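
For illustration, that scenario can be written with the same test API as the scheduling examples above (setup only, a sketch rather than a verified test):

// Sketch of the num_pipeline=8 scenario: total load is 8 * 4000 mCPU
// = 32,000 mCPU, which cannot fit on fewer than 4 of the 8,000 mCPU nodes,
// so the solver has to spread the source across the cluster.
let mut problem = SchedulingProblem::with_indexer_cpu_capacities(vec![
    mcpu(8000), mcpu(8000), mcpu(8000), mcpu(8000),
]);
problem.add_source(
    8, // num_pipeline = 8
    NonZeroU32::new(PIPELINE_FULL_CAPACITY.cpu_millis()).unwrap(),
);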

fulmicoton-dd (Collaborator) commented on Apr 23, 2025

I think the ticket does not fully describe the problem.
The reason we are collocating pipelines from the same index on the same indexer is to get better pruning at search time.
This is not a bug: this is a feature.

The purpose of the cpu capacity parameter on indexers is to act as a throttle: we accept some load imbalance precisely when it helps improve this pruning. Setting it to 0 has the effect of favoring balance.

Unfortunately this balance is only measured in terms of pipelines. Quickwit has a mechanism to assess the throughput of a pipeline, but we never use it for Kafka. We judge that, even more than ingest v2, Kafka is subject to alternating between a catch-up mode and indexing at the horizon.

So the issue with collocating pipelines from the same index on the same indexer is that the load of Kafka pipelines can be very unequal, which leads to an imbalanced workload across the different nodes.
This problem is not caused by that behavior, it is exacerbated by it.

I advise against (1) for the reason already discussed.

(2) only solves the problem partially (e.g. 4 nodes, 3 indexes with 1 pipeline each indexing at 20 MB/s, and 12 indexes with a throughput of 100 KB/s).

For exhaustiveness: right now I believe we do not know how to run two merges concurrently, so the merge pipeline could rapidly end up being the bottleneck if we had ~3 indexing pipelines for the same index on the same node.
I cannot remember if the control plane already has logic preventing this or not.

I would add:

Solution 4: add a "num cpus per pipeline" parameter, to make it possible to inform Quickwit that some Kafka topics do not require such a large amount of CPU.

daniele-br commented on Apr 28, 2025

Solution 4: add a "num cpus per pipeline" parameter, to make it possible to inform Quickwit that some Kafka topics do not require such a large amount of CPU.

OK, but just to be clear, we're in the opposite situation. We have 1-2 topics that have 70% of the data and a lower amount of data on the other 20 topics. Not sure what implications that has.

fulmicoton-dd (Collaborator) commented on Apr 29, 2025

@daniele-br This was my understanding. You would then have to mark all the topics with a lower amount of data as having a lower amount of data. I assume your comment was about reporting the oddity of having to "special case" the "majority".

rdettai (Collaborator, Author) commented on Apr 29, 2025

(2) only solves the problem partially (e.g. 4 nodes, 3 indexes with 1 pipeline each indexing at 20 MB/s, and 12 indexes with a throughput of 100 KB/s).

@fulmicoton-dd Yes, it does not address this specific workload.

  • Don't you think that adding this extra logic is still worth it? Even if we add an extra parameter (either (4) or (5)), it would still be useful to improve the default behavior when the parameter is not specified by the user. I agree that it makes the scheduling more complex, which is something we might want to avoid, but besides that I think it is a net improvement.
  • If we add an extra parameter to hint at the CPU usage of each pipeline, should we have expected_cpus_per_pipeline: CpuCapacity or expected_throughput_per_pipeline: ByteSize? Exposing the config in terms of CPU implies that the user is more aware of the internals of how pipelines use CPUs, which makes sense if we want to force users to dig deeper before tweaking it. Exposing the config as an expected bandwidth is closer to the user's perspective, and thus easier to understand (see the sketch after this list).
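
As a sketch of how option (5) would plug in, the throughput hint would replace the hardcoded full-capacity estimate passed to the scheduling problem in the examples above. This is hypothetical wiring, using the hypothetical conversion helper sketched under the issue description:

problem.add_source(
    4,
    // Instead of the hardcoded PIPELINE_FULL_CAPACITY, pass the CPU estimate
    // derived from expected_throughput_per_pipeline (here: a 2 MB/s topic).
    NonZeroU32::new(estimated_pipeline_mcpu(2)).unwrap(),
);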

fulmicoton-dd (Collaborator) commented on May 14, 2025

expected_throughput_per_pipeline: ByteSize is probably best.

fulmicoton-dd (Collaborator) commented on May 14, 2025

For (2),

Don't you think that adding this extra logic is still worth it? Even if we add an extra parameter (either (4) or (5)), it would still be useful to improve the default behavior when the parameter is not specified by the user. I agree that it makes the scheduling more complex, which is something we might want to avoid, but besides that I think it is a net improvement.

Not really. If it is not there yet, I would like a non-configurable maximum number of pipelines for a given index on a single node, set to num_pipelines.ceil_div(num_nodes).max(3). Right now, there is a real risk of indexing outpacing the merge pipeline.
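
A minimal sketch of that cap, assuming it would be computed per index at scheduling time (u32::div_ceil is the standard-library spelling of the ceil_div above):

// Hypothetical helper: hard limit on pipelines of one index per node.
// Spreads pipelines evenly across nodes, but never caps below 3.
fn max_pipelines_per_node_for_index(num_pipelines: u32, num_nodes: u32) -> u32 {
    num_pipelines.div_ceil(num_nodes).max(3)
}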

evanxg852000 (Collaborator) commented on May 22, 2025

Hi @rdettai, is there any consensus around this? How about the PR?

rdettai (Collaborator, Author) commented on Jun 10, 2025

Found 2 other issues overlapping with this: #4470 and #4630. I need to rework the PR to limit the number of pipelines per node to 3 instead of using the node capacity.
