KAFKA-12983: reset needsJoinPrepare flag before rejoining the group #10986
@ableegoldman Thanks for the patch. The change makes sense to me. I wonder if we could add a unit test which would fail without it though. This would avoid regressing in the future. What do you think? |
@ableegoldman Thanks for the patch. I think the original idea behind the implementation was to ensure that each rebalance triggered only one call to |
@hachikuji I think the key idea behind this fix is that, if a rebalance failed with e.g. memberId lost, then conceptually we would just start a new rebalance in which we would call Personally I think this fix is fine -- @ableegoldman if you could just add a unit test for the case of memberId lost during a first rebalance, and check that we would re-trigger |
To clarify, from the perspective of the eager protocol, how would this case look? Would we get multiple calls to |
@hachikuji in the EAGER case, after the first @everyone, I was having trouble getting a unit test that would actually verify this behavior, but I wanted to kick off discussion on the fix ASAP (for obvious reasons), so I opened the PR without one. I do intend to add a test, I just haven't had time to pursue that yet. Suggestions welcome :P |
Ok I realize we actually do have a test that reproduces this already: |
5911911 to 0483d07
Now ready for review @dajac @hachikuji @guozhangwang |
LGTM, thanks.
Two test failures, both |
…10986) The #onJoinPrepare callback is not always invoked before a member (re)joins the group, but only once when it first enters the rebalance. This means that any updates or events that occur during the join phase can be lost in the internal state: for example, clearing the SubscriptionState (and thus the "ownedPartitions" that are used for cooperative rebalancing) after losing its memberId during a rebalance. We should reset the needsJoinPrepare flag inside the resetStateAndRejoin() method. Reviewers: Guozhang Wang <guozhang@apache.org>, Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>
Merged to trunk and cherrypicked to 2.8 & 3.0 (cc @kkonstantine) |
The #onJoinPrepare callback is not always invoked before a member (re)joins the group, but only once when it first enters the rebalance. This means that any updates or events that occur during the join phase can be lost in the internal state: for example, clearing the SubscriptionState (and thus the "ownedPartitions" that are used for cooperative rebalancing) after losing its memberId during a rebalance. We should reset the needsJoinPrepare flag inside the resetStateAndRejoin() method. Should be cherrypicked back to 2.8 at least.
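To make the flag behavior described above concrete, here is a minimal, self-contained sketch. It is a toy model, not the actual coordinator code: only the names needsJoinPrepare, onJoinPrepare and resetStateAndRejoin() come from this PR, while the class JoinPrepareSketch, the resetFlagOnRejoin switch, the prepareCalls counter and the boolean[] of join outcomes are hypothetical and exist purely for illustration. A failed attempt stands in for something like losing the memberId mid-rebalance.

```java
// Toy model of the join loop; NOT the real Kafka coordinator implementation.
class JoinPrepareSketch {
    // Name taken from the PR; true means onJoinPrepare() must run before the next join.
    private boolean needsJoinPrepare = true;

    // Hypothetical switch: true models the behavior proposed in this PR.
    private final boolean resetFlagOnRejoin;

    // Hypothetical counter so the effect can be observed.
    private int prepareCalls = 0;

    JoinPrepareSketch(boolean resetFlagOnRejoin) {
        this.resetFlagOnRejoin = resetFlagOnRejoin;
    }

    // Stand-in for the callback that cleans up state before (re)joining.
    private void onJoinPrepare() {
        prepareCalls++;
    }

    // Stand-in for the reset that happens after a failed join attempt.
    private void resetStateAndRejoin() {
        if (resetFlagOnRejoin) {
            needsJoinPrepare = true; // the proposed change: prepare again before rejoining
        }
    }

    // Each element is one join attempt: false = failed (e.g. memberId lost), true = succeeded.
    void joinGroup(boolean[] attemptSucceeds) {
        for (boolean succeeded : attemptSucceeds) {
            if (needsJoinPrepare) {
                needsJoinPrepare = false;
                onJoinPrepare();
            }
            if (succeeded) {
                needsJoinPrepare = true; // ready for the next rebalance
                return;
            }
            resetStateAndRejoin();
        }
    }

    int prepareCalls() {
        return prepareCalls;
    }

    public static void main(String[] args) {
        boolean[] failThenSucceed = {false, true};

        JoinPrepareSketch withoutFix = new JoinPrepareSketch(false);
        withoutFix.joinGroup(failThenSucceed);
        System.out.println("without reset: " + withoutFix.prepareCalls()); // prints 1

        JoinPrepareSketch withFix = new JoinPrepareSketch(true);
        withFix.joinGroup(failThenSucceed);
        System.out.println("with reset: " + withFix.prepareCalls()); // prints 2
    }
}
```

In this model, without the reset the second attempt skips the prepare step, so anything that changed during the failed attempt is never cleaned up. With the reset, every rejoin after a failure is preceded by its own prepare call.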