
KAFKA-7858: Replace JoinGroup request/response with automated protocol #6419

Merged
merged 1 commit into apache:trunk from abbccdda:join_group_upgrade on Mar 18, 2019

Conversation

abbccdda
Contributor

Prioritizing this migration because it blocks KIP-345 part 1: #6177.

Migrating the JoinGroup protocol to the automated framework will ease the process of adding the group instance id to JoinGroupResponse; a sketch of what such a schema addition could look like follows the checklist below.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)
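
For context, the automated protocol framework defines each request/response in a JSON schema and generates the accessor classes, so adding a field like the group instance id becomes a small schema change. A hedged sketch of roughly what such an addition could look like (the field name, version numbers, and wording here are assumptions, not the merged schema):

{ "name": "GroupInstanceId", "type": "string", "versions": "5+",
  "nullableVersions": "5+", "default": "null",
  "about": "The unique identifier of the consumer instance provided by the end user." }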

@abbccdda abbccdda force-pushed the join_group_upgrade branch 7 times, most recently from 2ee5bdd to b68720a Compare March 11, 2019 04:39
@abbccdda
Contributor Author

@cmccabe @hachikuji Could you give this diff a quick review? It would unblock progress on KIP-345, thanks!

@abbccdda
Contributor Author

cc @vahidhashemian on this thread since the change is pretty straightforward and requires no background context.

@abbccdda
Contributor Author

Retest this please

@abbccdda
Contributor Author

Retest this please

@cmccabe
Contributor

cmccabe commented Mar 15, 2019

@abbccdda: looks good, thanks!

When processing older responses, I think generation ID needs to default to -1 to match the previous behavior. This should be changed in JoinGroupResponse.json.
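
For illustration, a hedged sketch of the kind of JoinGroupResponse.json entry being discussed (the exact layout is an assumption); declaring "default": "-1" makes the generated class initialize the field to -1, matching the previous behavior:

{ "name": "GenerationId", "type": "int32", "versions": "0+", "default": "-1",
  "about": "The generation ID of the group." }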

@@ -202,7 +203,7 @@ public AbstractCoordinator(LogContext logContext,
      */
     protected abstract Map<String, ByteBuffer> performAssignment(String leaderId,
                                                                  String protocol,
-                                                                 Map<String, ByteBuffer> allMemberMetadata);
+                                                                 List<JoinGroupResponseData.JoinGroupResponseMember> allMemberMetadata);
Contributor

Should this be a JoinGroupResponseDataSet?

Contributor Author

I don't see that struct defined anywhere, but I think a list should be fine here.

Contributor

Sorry, I meant JoinGroupResponseMemberSet. Anyway, a list is fine too; I don't feel that strongly about it.
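
For illustration, a minimal sketch of how a performAssignment implementation might rebuild the old memberId-to-metadata map from the generated list (accessor names follow the generated-code conventions and should be treated as assumptions):

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.message.JoinGroupResponseData;

static Map<String, ByteBuffer> toMetadataMap(
        List<JoinGroupResponseData.JoinGroupResponseMember> allMemberMetadata) {
    Map<String, ByteBuffer> memberMetadata = new HashMap<>();
    for (JoinGroupResponseData.JoinGroupResponseMember member : allMemberMetadata) {
        // Generated message classes expose bytes fields as byte[].
        memberMetadata.put(member.memberId(), ByteBuffer.wrap(member.metadata()));
    }
    return memberMetadata;
}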

@abbccdda
Contributor Author

Retest this please

@abbccdda
Contributor Author

@cmccabe @hachikuji mind taking another look?

@abbccdda
Contributor Author

@cmccabe Another look when you get time?

@cmccabe
Contributor

cmccabe commented Mar 18, 2019

LGTM. Thanks, @abbccdda.

@cmccabe cmccabe merged commit 8406f36 into apache:trunk Mar 18, 2019
@abbccdda
Contributor Author

Thanks! @cmccabe

ijuma pushed a commit that referenced this pull request Jul 11, 2019
…p v0 (#7072)

The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3,
we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal
to the session timeout. We lost this logic when we converted the API to use the
generated schema definition (#6419) which uses the default value of -1. The impact
of this is that the group rebalance timeout becomes 0, so rebalances finish immediately
after we enter the PrepareRebalance state and kick out all old members. This causes
consumer groups to enter an endless rebalance loop. This patch restores the old
behavior.

Reviewers: Ismael Juma <ismael@juma.me.uk>
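
A minimal sketch of the restored fallback, written as a hypothetical helper (the actual broker code differs in structure):

// JoinGroup v0 carried no rebalance timeout, so the generated schema leaves the
// field at its default of -1; substitute the session timeout to keep the
// pre-2.3 behavior.
static int effectiveRebalanceTimeoutMs(short version, int rebalanceTimeoutMs, int sessionTimeoutMs) {
    if (version == 0 || rebalanceTimeoutMs < 0)
        return sessionTimeoutMs;
    return rebalanceTimeoutMs;
}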
ijuma pushed a commit to confluentinc/kafka that referenced this pull request Jul 11, 2019
…p v0 (apache#7072)

xiowu0 pushed a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
…ession timeout for JoinGroup v0 (apache#7072)

TICKET = KAFKA-8653
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [b725b3c]
ORIGINAL_DESCRIPTION = (see the description of apache#7072 above)
(cherry picked from commit b725b3c)
singhnama pushed a commit to singhnama/kafka that referenced this pull request Jul 20, 2022
Add more validation during KRPC deserialization

When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some
other things), check that we have at least N bytes remaining before allocating an array of size N.

Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were
remaining. Instead, when reading an individual record in the Raft layer, simply create a
ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in.

Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in
RequestResponseTest.

Co-author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@gmail.com>
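
A hedged sketch of the described check (hypothetical method, not the exact Readable API): verify that the buffer really holds N more bytes before allocating byte[N], so a corrupt length prefix cannot force a huge allocation.

import java.nio.ByteBuffer;

static byte[] readByteArray(ByteBuffer buf, int size) {
    // Fail fast on a bogus length prefix instead of allocating a huge array.
    if (size > buf.remaining()) {
        throw new RuntimeException("Error reading byte array of " + size +
            " bytes; only " + buf.remaining() + " bytes remaining.");
    }
    byte[] data = new byte[size];
    buf.get(data);
    return data;
}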
omkreddy pushed a commit to omkreddy/kafka that referenced this pull request Jul 27, 2022
Add more validation during KRPC deserialization (same change as above)