
KAFKA-7858: Replace JoinGroup request/response with automated protocol #6419

Merged
merged 1 commit into apache:trunk from abbccdda:join_group_upgrade on Mar 18, 2019

Conversation

abbccdda
Contributor

Prioritizing this migration because it blocks KIP-345 part 1: #6177.

Migrating the JoinGroup protocol to the automated framework will ease the process of adding the group instance id to JoinGroupResponse; a sketch of what such a schema addition could look like follows the checklist below.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)
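
For context, the automated protocol framework defines each request/response in a JSON schema and generates the accessor classes, so adding a field like the group instance id becomes a small schema change. A hedged sketch of roughly what such an addition could look like (the field name, version numbers, and wording here are assumptions, not the merged schema):

{ "name": "GroupInstanceId", "type": "string", "versions": "5+",
  "nullableVersions": "5+", "default": "null",
  "about": "The unique identifier of the consumer instance provided by the end user." }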

@abbccdda abbccdda force-pushed the join_group_upgrade branch 7 times, most recently from 2ee5bdd to b68720a Compare March 11, 2019 04:39
@abbccdda
Contributor Author

@cmccabe @hachikuji Could you give this diff a quick review? It would unblock progress on KIP-345, thanks!

@abbccdda
Contributor Author

cc @vahidhashemian on this thread since the change is pretty straightforward and requires no background context.

@abbccdda
Contributor Author

Retest this please

@abbccdda
Contributor Author

Retest this please

@cmccabe
Contributor

cmccabe commented Mar 15, 2019

@abbccdda: looks good, thanks!

When processing older responses, I think generation ID needs to default to -1 to match the previous behavior. This should be changed in JoinGroupResponse.json.
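
For illustration, a hedged sketch of the kind of JoinGroupResponse.json entry being discussed (the exact layout is an assumption); declaring "default": "-1" makes the generated class initialize the field to -1, matching the previous behavior:

{ "name": "GenerationId", "type": "int32", "versions": "0+", "default": "-1",
  "about": "The generation ID of the group." }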

@@ -202,7 +203,7 @@ public AbstractCoordinator(LogContext logContext,
      */
     protected abstract Map<String, ByteBuffer> performAssignment(String leaderId,
                                                                  String protocol,
-                                                                 Map<String, ByteBuffer> allMemberMetadata);
+                                                                 List<JoinGroupResponseData.JoinGroupResponseMember> allMemberMetadata);
Contributor

Should this be a JoinGroupResponseDataSet?

Contributor Author

I don't see that struct defined anywhere, but I think a list should be fine here.

Contributor

Sorry, I meant JoinGroupResponseMemberSet. Anyway, a list is fine too; I don't feel that strongly about it.
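
For illustration, a minimal sketch of how a performAssignment implementation might rebuild the old memberId-to-metadata map from the generated list (accessor names follow the generated-code conventions and should be treated as assumptions):

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.message.JoinGroupResponseData;

static Map<String, ByteBuffer> toMetadataMap(
        List<JoinGroupResponseData.JoinGroupResponseMember> allMemberMetadata) {
    Map<String, ByteBuffer> memberMetadata = new HashMap<>();
    for (JoinGroupResponseData.JoinGroupResponseMember member : allMemberMetadata) {
        // Generated message classes expose bytes fields as byte[].
        memberMetadata.put(member.memberId(), ByteBuffer.wrap(member.metadata()));
    }
    return memberMetadata;
}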

@abbccdda
Contributor Author

Retest this please

@abbccdda
Contributor Author

@cmccabe @hachikuji mind taking another look?

@abbccdda
Contributor Author

@cmccabe Another look when you get time?

@cmccabe
Contributor

cmccabe commented Mar 18, 2019

LGTM. Thanks, @abbccdda.

@cmccabe cmccabe merged commit 8406f36 into apache:trunk Mar 18, 2019
@abbccdda
Contributor Author

Thanks! @cmccabe

ijuma pushed a commit that referenced this pull request Jul 11, 2019
…p v0 (#7072)

The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3,
we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal
to the session timeout. We lost this logic when we converted the API to use the
generated schema definition (#6419) which uses the default value of -1. The impact
of this is that the group rebalance timeout becomes 0, so rebalances finish immediately
after we enter the PrepareRebalance state and kick out all old members. This causes
consumer groups to enter an endless rebalance loop. This patch restores the old
behavior.

Reviewers: Ismael Juma <ismael@juma.me.uk>
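
A minimal sketch of the restored fallback, written as a hypothetical helper (the actual broker code differs in structure):

// JoinGroup v0 carried no rebalance timeout, so the generated schema leaves the
// field at its default of -1; substitute the session timeout to keep the
// pre-2.3 behavior.
static int effectiveRebalanceTimeoutMs(short version, int rebalanceTimeoutMs, int sessionTimeoutMs) {
    if (version == 0 || rebalanceTimeoutMs < 0)
        return sessionTimeoutMs;
    return rebalanceTimeoutMs;
}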
ijuma pushed a commit to confluentinc/kafka that referenced this pull request Jul 11, 2019
…p v0 (apache#7072)

xiowu0 pushed a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
…ession timeout for JoinGroup v0 (apache#7072)

TICKET = KAFKA-8653
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [b725b3c]
ORIGINAL_DESCRIPTION = (see the description of apache#7072 above)
(cherry picked from commit b725b3c)
singhnama pushed a commit to singhnama/kafka that referenced this pull request Jul 20, 2022
Add more validation during KRPC deserialization

When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some
other things), check that we have at least N bytes remaining before allocating an array of size N.

Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were
remaining. Instead, when reading an individual record in the Raft layer, simply create a
ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in.

Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in
RequestResponseTest.

Co-author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@gmail.com>
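
A hedged sketch of the described check (hypothetical method, not the exact Readable API): verify that the buffer really holds N more bytes before allocating byte[N], so a corrupt length prefix cannot force a huge allocation.

import java.nio.ByteBuffer;

static byte[] readByteArray(ByteBuffer buf, int size) {
    // Fail fast on a bogus length prefix instead of allocating a huge array.
    if (size > buf.remaining()) {
        throw new RuntimeException("Error reading byte array of " + size +
            " bytes; only " + buf.remaining() + " bytes remaining.");
    }
    byte[] data = new byte[size];
    buf.get(data);
    return data;
}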
omkreddy pushed a commit to omkreddy/kafka that referenced this pull request Jul 27, 2022
Add more validation during KRPC deserialization (same change as above)