Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-6844: Call shutdown on GlobalStreamThread after all StreamThreads have stopped #4950

Conversation

bbejeck
Copy link
Contributor

@bbejeck bbejeck commented May 1, 2018

Moved the shutdown of GlobalStreamThread to after all StreamThread instances have stopped.

There can be a race condition where shut down is called on a StreamThread then shut down is called on a GlobalStreamThread, but if StreamThread is delayed in shutting down, the GlobalStreamThread can shutdown first.
If the StreamThread tries to access a GlobalStateStore before closing the user can get an exception stating "..Store xxx is currently closed "

Tested by running all current streams tests.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@bbejeck
Copy link
Contributor Author

bbejeck commented May 1, 2018

@bbejeck bbejeck changed the title KAFKA-6684: Call shutdown on all GlobalStreamThread after all StreamThreads have stopped KAFKA-6844: Call shutdown on all GlobalStreamThread after all StreamThreads have stopped May 1, 2018
@bbejeck bbejeck changed the title KAFKA-6844: Call shutdown on all GlobalStreamThread after all StreamThreads have stopped KAFKA-6844: Call shutdown on GlobalStreamThread after all StreamThreads have stopped May 1, 2018
@mjsax mjsax added the streams label May 2, 2018
@mjsax mjsax requested review from mjsax and guozhangwang May 2, 2018 09:33
Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I am wondering, how hard it would be to write an test for this case?

@bbejeck
Copy link
Contributor Author

bbejeck commented May 2, 2018

@mjsax I had similar thoughts, I just ran out of time yesterday. I'll take look at getting a test together.

Copy link
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@guozhangwang
Copy link
Contributor

LGTM!

@bbejeck
Copy link
Contributor Author

bbejeck commented May 3, 2018

added an integration test

@Override
public void process(final String key, final Long value) {
// need to simulate slow processing
Utils.sleep(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better test scenario is to move the logic in close() call, i.e. when the stream thread is being shutdown, and topology is closing, we call processorNode.close() in which we wait for a while and then tries to access the global store. It mimics the case where in closing the store cache is flushed and hence tries to access the global store again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will move

@bbejeck
Copy link
Contributor Author

bbejeck commented May 4, 2018

@guozhangwang updated this

@guozhangwang
Copy link
Contributor

@bbejeck Jenkins failure is due to a recent PR removing the deprecated API.

@bbejeck bbejeck force-pushed the KAFKA-6844_race_condition_between_shutting_down_stream_and_global_threads branch from a09a661 to d6606fb Compare May 4, 2018 22:09
@bbejeck
Copy link
Contributor Author

bbejeck commented May 4, 2018

rebased this

@guozhangwang guozhangwang merged commit f448e49 into apache:trunk May 4, 2018
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…ds have stopped (apache#4950)

Moved the shutdown of GlobalStreamThread to after all StreamThread instances have stopped.

There can be a race condition where shut down is called on a StreamThread then shut down is called on a GlobalStreamThread, but if StreamThread is delayed in shutting down, the GlobalStreamThread can shutdown first.
If the StreamThread tries to access a GlobalStateStore before closing the user can get an exception stating "..Store xxx is currently closed "

Tested by running all current streams tests.

Reviewers: Ted Yu <yuzhihong@gmail.com>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
@bbejeck bbejeck deleted the KAFKA-6844_race_condition_between_shutting_down_stream_and_global_threads branch July 10, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants