Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-10292: Set min.insync.replicas to 1 of __consumer_offsets #9286

Merged

Conversation

cadonna
Copy link
Contributor

@cadonna cadonna commented Sep 14, 2020

The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on
2.5 because in the last stage of the test there is only one broker
left and the offset commit cannot succeed because the
min.insync.replicas of __consumer_offsets is set to 2 and acks is
set to all. This causes a time out and extends the closing of the
Kafka Streams client to beyond the duration passed to the close
method of the client.

This affects especially the 2.5 branch since there Kafka Streams
commits offsets for each task, i.e., close() needs to wait for the
timeout for each task. In 2.6 and trunk the offset commit is done
per thread, so close() does only need to wait for one time out per
stream thread.

I opened this PR on trunk, since the test could also become
flaky on trunk and we want to avoid diverging system tests across
branches.

A more complete solution would be to improve the test by defining
a better success criteria.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on
2.5 because in the last stage of the test there is only one broker
left and the offset commit cannot succeed because the
min.insync.replicas of __consumer_offsets is set to 2 and acks is
set to all. This causes a time out and extends the closing of the
Kafka Streams client to beyond the duration passed to the close
method of the client.

This affects especially the 2.5 branch since there Kafka Streams
commits offsets for each task, i.e., close() needs to wait for the
timeout for each task. In 2.6 and trunk the offset commit is done
per thread, so close() does only need to wait for one time out per
stream thread.

I opened this PR on trunk, since the test could also become
flaky on trunk and we want to avoid diverging system tests across
branches.

A more complete solution would be to improve the test by defining
a better success criteria.
@cadonna
Copy link
Contributor Author

cadonna commented Sep 14, 2020

Call for review: @vvcephei @guozhangwang

@cadonna
Copy link
Contributor Author

cadonna commented Sep 15, 2020

@guozhangwang
Copy link
Contributor

LGTM!

@guozhangwang guozhangwang merged commit a46c07e into apache:trunk Sep 15, 2020
@cadonna
Copy link
Contributor Author

cadonna commented Sep 15, 2020

@guozhangwang, Thank you for merging! Could you please cherry-pick this PR to 2.6 and 2.5?

guozhangwang pushed a commit that referenced this pull request Sep 15, 2020
The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on
2.5 because in the last stage of the test there is only one broker
left and the offset commit cannot succeed because the
min.insync.replicas of __consumer_offsets is set to 2 and acks is
set to all. This causes a time out and extends the closing of the
Kafka Streams client to beyond the duration passed to the close
method of the client.

This affects especially the 2.5 branch since there Kafka Streams
commits offsets for each task, i.e., close() needs to wait for the
timeout for each task. In 2.6 and trunk the offset commit is done
per thread, so close() does only need to wait for one time out per
stream thread.

I opened this PR on trunk, since the test could also become
flaky on trunk and we want to avoid diverging system tests across
branches.

A more complete solution would be to improve the test by defining
a better success criteria.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
guozhangwang pushed a commit that referenced this pull request Sep 15, 2020
The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on
2.5 because in the last stage of the test there is only one broker
left and the offset commit cannot succeed because the
min.insync.replicas of __consumer_offsets is set to 2 and acks is
set to all. This causes a time out and extends the closing of the
Kafka Streams client to beyond the duration passed to the close
method of the client.

This affects especially the 2.5 branch since there Kafka Streams
commits offsets for each task, i.e., close() needs to wait for the
timeout for each task. In 2.6 and trunk the offset commit is done
per thread, so close() does only need to wait for one time out per
stream thread.

I opened this PR on trunk, since the test could also become
flaky on trunk and we want to avoid diverging system tests across
branches.

A more complete solution would be to improve the test by defining
a better success criteria.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
@guozhangwang
Copy link
Contributor

Yeah I've already cherry-picked to 2.6/2.5.

@cadonna
Copy link
Contributor Author

cadonna commented Sep 15, 2020

Thanks a lot!

@cadonna cadonna deleted the AK10292-fix_broker_bounce_system_test branch September 16, 2020 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants