[fix][broker] Fix PulsarService/BrokerService shutdown when brokerShutdownTimeoutMs=0 #21496

Merged

Conversation

lhotari
Member

@lhotari lhotari commented Nov 1, 2023

Motivation

PulsarService shutdown leaks threads in tests when execution is slow enough that the thread is interrupted before the second phase of the shutdown runs. The issue is hard to reproduce on an ordinary developer machine, where execution completes so quickly that the already-interrupted path is never taken. In CI, there are 2 cores and 4 test processes running concurrently, and the problem occurs fairly often. For example, the thread leak detector reported that ServerCnxTest leaked 144 threads in this build:

Tests in class ServerCnxTest created thread id 10950 with name 'Thread[broker-topic-workers-OrderedExecutor-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10951 with name 'Thread[broker-topic-workers-OrderedExecutor-1-0,5,main]'
Tests in class ServerCnxTest created thread id 10952 with name 'Thread[pulsar-stats-updater-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10953 with name 'Thread[pulsar-inactivity-monitor-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10954 with name 'Thread[pulsar-msg-expiry-monitor-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10955 with name 'Thread[pulsar-compaction-monitor-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10956 with name 'Thread[pulsar-consumed-ledgers-monitor-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 10957 with name 'Thread[pulsar-backlog-quota-checker-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 11122 with name 'Thread[broker-topic-workers-OrderedExecutor-0-0,5,main]'
Tests in class ServerCnxTest created thread id 11123 with name 'Thread[broker-topic-workers-OrderedExecutor-1-0,5,main]'
Tests in class ServerCnxTest created thread id 11124 with name 'Thread[pulsar-stats-updater-OrderedScheduler-0-0,5,main]'
Tests in class ServerCnxTest created thread id 11125 with name 'Thread[pulsar-inactivity-monitor-OrderedScheduler-0-0,5,main]'
...
...
...
Warning: Summary: Tests in class org.apache.pulsar.broker.service.ServerCnxTest created 144 new threads. There are now 154 threads in total.

Modifications

  • Adding the future cancellation timeout logic doesn't make sense when brokerShutdownTimeoutMs=0. Skip adding the timeout logic in that case and refactor the method.
  • Running the second phase on the completion thread blocks a Netty thread. It's better to run the second phase of the shutdown in a new, separate thread.
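
The two modifications above can be illustrated with a minimal, standalone sketch. This is not the code from this PR: the class `ShutdownSketch`, the helpers `withOptionalTimeout` and `runSecondPhaseInNewThread`, and the thread name are hypothetical, and the JDK's `CompletableFuture.orTimeout` (Java 9+) stands in for whatever timeout handling the broker actually uses. The sketch also assumes the second phase should run whether or not the first phase succeeded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the Pulsar code base.
final class ShutdownSketch {

    // Attach a timeout only when one is configured. With timeoutMs <= 0 the
    // future is returned untouched, so no timeout/cancellation logic is added.
    static <T> CompletableFuture<T> withOptionalTimeout(CompletableFuture<T> future, long timeoutMs) {
        if (timeoutMs <= 0) {
            return future;
        }
        return future.orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Run the second phase in a new thread once the first phase completes,
    // so the thread that completes firstPhase (for example an I/O thread)
    // only pays the cost of starting a thread and is never blocked by the work.
    static CompletableFuture<Void> runSecondPhaseInNewThread(CompletableFuture<Void> firstPhase,
                                                             Runnable secondPhase) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        firstPhase.whenComplete((ignored, error) -> {
            // Assumption: the second phase runs even if the first phase failed.
            Thread thread = new Thread(() -> {
                try {
                    secondPhase.run();
                    done.complete(null);
                } catch (Throwable t) {
                    done.completeExceptionally(t);
                }
            }, "shutdown-second-phase");
            thread.start();
        });
        return done;
    }

    public static void main(String[] args) {
        // A timeout of 0 means "no timeout": the same future comes back.
        CompletableFuture<Void> firstPhase = withOptionalTimeout(new CompletableFuture<Void>(), 0);
        CompletableFuture<Void> done = runSecondPhaseInNewThread(firstPhase,
                () -> System.out.println("second phase on " + Thread.currentThread().getName()));
        firstPhase.complete(null);
        done.join();
    }
}
```

Starting a dedicated thread keeps the sketch simple; handing the second phase to an existing executor would work as well, provided that executor is not itself being shut down by the second phase.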

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Contributor

@cbornet cbornet left a comment

LGTM

@lhotari lhotari merged commit 6ab322e into apache:master Nov 1, 2023
46 of 47 checks passed
Technoboy- pushed a commit that referenced this pull request Nov 10, 2023
Technoboy- pushed a commit that referenced this pull request Nov 10, 2023
nborisov pushed a commit to nborisov/pulsar that referenced this pull request Nov 13, 2023
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023