Bulk segment deletion on Historical causes significant delays of handoff from Kafka tasks #12149

@eastcirclek

Affected Version

0.22.1

Description

Since I deleted hundreds of thousands of segments, Kafka ingestion tasks have been failing even after successfully publishing their segments:

2022-01-11T11:14:43,310 INFO [[index_kafka_navi-gps_ae030ea793e6992_omamibla]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published segments: [navi-gps_2022-01-11T09:00:00.000Z_2022-01-11T10:00:00.000Z_2022-01-11T09:11:38.515Z_8, navi-gps_2022-01-11T10:00:00.000Z_2022-01-11T11:00:00.000Z_2022-01-11T10:20:19.334Z_1, navi-gps_2022-01-11T11:00:00.000Z_2022-01-11T12:00:00.000Z_2022-01-11T11:00:00.031Z_7]
2022-01-11T11:14:43,311 INFO [[index_kafka_navi-gps_ae030ea793e6992_omamibla]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Saved sequence metadata to disk: []
2022-01-11T11:15:41,912 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:16:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:17:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:18:41,911 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:19:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:20:41,917 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:21:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:22:41,908 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:23:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:24:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:25:41,912 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:26:41,907 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:27:41,915 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:28:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:29:41,910 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:30:41,914 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:31:41,913 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:32:41,908 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:33:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:34:41,912 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:35:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:36:41,910 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:37:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:38:41,909 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:39:41,907 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:40:41,908 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:41:41,913 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:42:41,908 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:43:41,908 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:44:41,912 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.handoff.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for [3] Segments
2022-01-11T11:44:48,187 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
2022-01-11T11:44:48,188 INFO [Thread-73] org.apache.druid.cli.CliPeon - Running shutdown hook
2022-01-11T11:44:48,189 INFO [Thread-73] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]

The Kafka task waited for handoff for PT30M, which I believe is the default value of completionTimeout in KafkaSupervisorIOConfig.
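As a quick sanity check on the PT30M reading, the elapsed time between the "Published segments" line and the "Triggering JVM shutdown" line can be computed from the two timestamps in the log above. This is only a sketch: the helper name `parse_log_timestamp` is made up for illustration, and the 30-minute value is the assumed completionTimeout, not something read from the cluster configuration.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the task log lines, e.g. 2022-01-11T11:14:43,310
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a timestamp copied from the start of a task log line."""
    return datetime.strptime(ts, LOG_TS_FORMAT)


# Timestamps copied from the log excerpt above.
published_at = parse_log_timestamp("2022-01-11T11:14:43,310")  # "Published segments"
shutdown_at = parse_log_timestamp("2022-01-11T11:44:48,187")   # "Triggering JVM shutdown"

# Assumption: completionTimeout is PT30M (not verified against the supervisor spec).
assumed_completion_timeout = timedelta(minutes=30)

elapsed = shutdown_at - published_at
print("waited for handoff:", elapsed)
print("exceeds assumed timeout:", elapsed >= assumed_completion_timeout)
```

The task was shut down a few seconds after the 30-minute mark, which is consistent with the supervisor giving up once the assumed completionTimeout had elapsed.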

I found that the Historical has been busy unannouncing the deleted segments for tens of hours. The newly published segments seem to be registered and become available only after all of the deleted segments have been unannounced.

Restarting the coordinator seems to force the Historical to register the new segments [1], but that is a workaround rather than a real solution.

[1] https://groups.google.com/g/druid-user/c/h_E3ZDeVHd4
