When a Kafka node goes down in single-replica topic mode, the Kafka indexing task cannot be started #12385

@wanglihui-git

Affected Version

0.18.1 to 0.22.1

Description

Because of the large data volume in our production environment, our Kafka cluster has to use single-replica topics. When a Kafka node goes down, Kafka indexing tasks cannot be started. A Supervisor that was already running normally keeps running, but after a reset operation it can no longer run either.
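
To make the failure mode concrete, here is a minimal sketch in plain Python of why a single-replica topic loses partitions when one broker goes down. It does not use any Kafka or Druid API: the `Partition` class, the `offline_partitions` helper, the broker ids, and the partition layout are all hypothetical and exist only for illustration. The assumption is that a partition is unavailable once none of its replicas sits on a live broker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    topic: str
    index: int
    replicas: tuple  # ids of the brokers that hold a copy of this partition


def offline_partitions(partitions, live_brokers):
    """Return the partitions that have no replica on any live broker."""
    live = set(live_brokers)
    return [p for p in partitions if not live.intersection(p.replicas)]


# Hypothetical layout: three partitions, replication factor 1.
single_replica = [
    Partition("topic", 0, (1,)),
    Partition("topic", 1, (2,)),
    Partition("topic", 2, (3,)),
]

# The same topic with replication factor 2, for comparison.
two_replicas = [
    Partition("topic", 0, (1, 2)),
    Partition("topic", 1, (2, 3)),
    Partition("topic", 2, (3, 1)),
]

live = [2, 3]  # assume broker 1 is down

print([f"{p.topic}-{p.index}" for p in offline_partitions(single_replica, live)])
print([f"{p.topic}-{p.index}" for p in offline_partitions(two_replicas, live)])
```

In this model the single-replica layout leaves `topic-0` with no live replica, while the two-replica layout keeps every partition reachable. That would be consistent with a consumer timing out while trying to determine the position of one specific partition.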

If this happens in production and the Kafka node cannot be recovered in a short time, how can Druid ingestion be made more reliable?

The following is a screenshot of my test. The error message is: 'Timeout of 60000ms expired before the position for partition topic-0 could be determined'. After a while, the Supervisor's state changed to 'LOST_CONTACT_WITH_STREAM'.
image

Thank you!
