
Actively reading tasks fail even though pending completion tasks succeed #16727

@hardikbajaj

Affected Version

Druid 25.0.0

Description

We are facing a bug in SeekableStreamSupervisor.java.
We are running a Kafka ingestion supervisor with task replication 2. Task group [3], consuming from partitions [P1, P2], contains two tasks [A1, A2], each consuming from the same partitions (replica tasks). When we submit the supervisor config, [A1, A2] should be added to the pending completion task groups as { groupId: 3, [ TaskGroup[A1, A2] ] }. But we observed logs like:
1. Creating new pending completion task group [3] for discovered task [A1] overlord-0
2. Creating new pending completion task group [3] for discovered task [A2] overlord-0
3. Added discovered task [A2] to existing pending task group [3]
This leaves the pending completion task groups in the state { groupId: 3, [ TaskGroup[A1, A2], TaskGroup[A2] ] }. Logs 1 and 2 were emitted by two different threads, and, likely because the check-then-add on the CopyOnWriteArrayList is not atomic, we end up with two task groups containing duplicate task IDs instead of a single group.
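The broken state can be reproduced deterministically by simulating the interleaving outside of Druid. CopyOnWriteArrayList is thread-safe per operation, but nothing makes the check-then-act "is there a group yet? if not, create one" atomic across two threads. The sketch below uses an illustrative list-of-lists shape in place of Druid's actual TaskGroup class; all names here are ours, not Druid's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class PendingGroupRace {
    // Rebuilds the broken state from the logs; the list-of-lists shape is
    // illustrative and stands in for the per-groupId TaskGroup list.
    static List<List<String>> buildBrokenState() {
        CopyOnWriteArrayList<List<String>> groups = new CopyOnWriteArrayList<>();

        // Simulate the interleaving behind logs 1 and 2: both threads observe
        // an empty list before either one adds its group.
        boolean thread1SeesNoGroup = groups.isEmpty(); // thread 1's check
        boolean thread2SeesNoGroup = groups.isEmpty(); // thread 2's check
        if (thread1SeesNoGroup) {
            groups.add(new ArrayList<>(List.of("A1"))); // log 1: new group for [A1]
        }
        if (thread2SeesNoGroup) {
            groups.add(new ArrayList<>(List.of("A2"))); // log 2: new group for [A2]
        }

        // The per-partition re-discovery of A2 then finds the first group and
        // appends to it (log 3: "Added discovered task [A2] to existing ...").
        List<String> first = groups.get(0);
        if (!first.contains("A2")) {
            first.add("A2");
        }
        return groups;
    }

    public static void main(String[] args) {
        // Ends up as [[A1, A2], [A2]] instead of the intended [[A1, A2]].
        System.out.println(buildBrokenState());
    }
}
```

Under this schedule both threads legitimately pass the emptiness check, so each copy-on-write add succeeds and two groups exist where one was intended.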
We have further root-caused this to the forEach loop here, which, instead of calling addDiscoveredTaskToPendingCompletionTaskGroups once for each of A1 and A2, permutes the tasks with partitions and calls it for A1_P1, A2_P1, A1_P2, A2_P2, causing these duplicate task groups to be created in a lot of cases (almost 17% of cases in our cluster).
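The call-count difference can be made concrete with a small sketch. We model the loop as we read it (discovery invoked once per (task, partition) pair) against what we would expect (once per task); the method and class names below are ours, not Druid's:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DiscoveryCalls {
    // As we read the current loop: discovery runs once per (task, partition)
    // pair, so every replica is "discovered" once per partition it consumes.
    static List<String> permutedCalls(List<String> tasks, List<String> partitions) {
        List<String> calls = new ArrayList<>();
        for (String partition : partitions) {
            for (String task : tasks) {
                calls.add(task + "_" + partition); // e.g. A1_P1
            }
        }
        return calls;
    }

    // What we would expect instead: each task discovered exactly once,
    // regardless of how many partitions it consumes.
    static List<String> perTaskCalls(List<String> tasks) {
        return new ArrayList<>(new LinkedHashSet<>(tasks));
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("A1", "A2");
        List<String> partitions = List.of("P1", "P2");
        System.out.println(permutedCalls(tasks, partitions)); // 4 calls for 2 tasks
        System.out.println(perTaskCalls(tasks));              // 2 calls
    }
}
```

With 2 replicas and 2 partitions the hook fires 4 times instead of 2, which is exactly the extra window in which the race above can create a second group.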
The problem this causes: with [ TaskGroup[A1, A2], TaskGroup[A2] ] in the pending completion group for a single group ID, we see rare cases where, even if A1 succeeds and only A2 fails, the supervisor sets entireTaskGroupFailed = true and not only kills [A1, A2] but kills all actively running tasks as well, causing ingestion lag.
We wanted to have a discussion on a potential fix for this issue, since this bug can cause ingestion-lag incidents whenever we submit a supervisor. We would also like clarity on the following: Is having [ TaskGroup[A1, A2], TaskGroup[A2] ] in the pending completion group expected behaviour? Should the task ID not be permuted with partitions? Should we add synchronization to prevent two replica tasks from being added to two different groups under a single group ID? Or is this a case that checkPendingCompletionTasks() should handle to avoid killing actively running tasks?
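One possible direction for the synchronization option, as a minimal sketch rather than a proposed patch for Druid's actual code: serialize the lookup-or-create so a second replica can never spawn a second group, and make repeated discovery of the same task (via another partition) a no-op. All names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class AtomicDiscovery {
    // Serialize the check-then-act: a task already tracked anywhere is a
    // no-op, a new replica joins the existing group, and only the very
    // first discovery creates a group.
    static void addDiscoveredTask(CopyOnWriteArrayList<List<String>> groups, String taskId) {
        synchronized (groups) {
            for (List<String> group : groups) {
                if (group.contains(taskId)) {
                    return; // re-discovery via another partition is a no-op
                }
            }
            if (!groups.isEmpty()) {
                groups.get(0).add(taskId); // replica joins the existing group
            } else {
                groups.add(new ArrayList<>(List.of(taskId))); // first discovery
            }
        }
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<List<String>> groups = new CopyOnWriteArrayList<>();
        // Even with the permuted per-partition call pattern (each task
        // discovered once per partition), the end state is a single group.
        for (String call : List.of("A1", "A2", "A1", "A2")) {
            addDiscoveredTask(groups, call);
        }
        System.out.println(groups); // [[A1, A2]]
    }
}
```

This trades the lock-free reads of CopyOnWriteArrayList for a short critical section on the discovery path, which seems acceptable given that discovery only happens when the supervisor reconciles tasks.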
Any help on understanding it better would be appreciated.

Please include as much detailed information about the problem as possible.
I have included everything I can in the issue description; if any more logs or debugging are required, I can add them in comments on ad hoc requests.

  • Cluster size
  • Configurations in use
  • Steps to reproduce the problem
  • The error message or stack traces encountered. Providing more context, such as nearby log messages or even entire logs, can be helpful.
  • Any debugging that you have already done
