Skip to content

KAFKA-20169: Support static membership for Kafka Streams with the streams rebalance protocol (1/N)#22245

Open
chickenchickenlove wants to merge 4 commits into
apache:trunkfrom
chickenchickenlove:KAFKA-20169-SERVER-CORE
Open

KAFKA-20169: Support static membership for Kafka Streams with the streams rebalance protocol (1/N)#22245
chickenchickenlove wants to merge 4 commits into
apache:trunkfrom
chickenchickenlove:KAFKA-20169-SERVER-CORE

Conversation

@chickenchickenlove
Copy link
Copy Markdown
Contributor

Introduction

This PR enables "static membership for Streams group heartbeat" and aligns group-epoch behavior with the KIP-1071 intent for Streams rebalancing.

Notification

  1. According to KIP-1071, the group epoch must be bumped only when one of the following member attributes changes: topology epoch, rack ID, client tags, or process ID. However, the previous implementation determined whether to bump the group epoch by comparing the entire member record. In this PR, the static membership path is updated so that the group epoch is bumped only when at least one of topology epoch, rack ID, client tags, or process ID differs.
  2. When a static member leaves with member epoch -2 and later rejoins with the same instance ID, we treat this as not constituting a member join/leave for the purpose of group-epoch bumping. Accordingly, the implementation does not bump the group epoch in this case, since it does not satisfy the “member join/leave” condition described in KIP-1071.
### KIP-1071 says
The group epoch is bumped:
- When a member joins or leaves the group.
- When a member is fenced or removed from the group by the group coordinator.
- When the partition metadata is updated. For instance when a new partition is added or a new topic matching the - subscribed topics is created.
- When a member with an assigned warm-up task reports a task changelog offset and task changelog end offset whose difference is less than acceptable.recovery.lag.
- When a member updates its topology metadata, rack ID, client tags or process ID. Note: Typically, these do not change within the lifetime of a Streams client, so this only happens when a member with static membership rejoins with an updated configuration.
- When an assignment configuration for the group is updated.

Changes

  • Removed service-layer rejection of static membership in Streams heartbeat.
  • Added static-member validation paths in Streams coordinator logic:
    • unreleased instance id
    • fenced instance id
    • unknown static member
  • Added static Streams member subscribe/replace flow:
    • supports join of new static member
    • supports rejoin replacement with new memberId for same instanceId
    • writes replacement records (tombstone old + create new)
  • Added static leave handling for Streams:
    • -2 (temporary leave): keep static identity and write current-assignment epoch -2
    • -1 (actual leave): fence/remove member and bump group epoch
  • Updated max-size handling:
    • If instanceId already maps to an existing static member, skip max-size rejection for replacement flow
  • Updated assignment continuity for static rejoin:
    • when needed, resolve target assignment from previous static member id
  • Updated epoch bump decision for static rejoin:
    • do not bump on member-id change alone
    • bump only on epoch-relevant metadata changes (topology epoch, rack id, client tags, process id)
  • Added helper builder support for cloning Streams member with a new member id.
  • Added utility method to check whether a static member mapping is currently valid.
  • Add test case.

Scope

  • Only server-side implementation is included.

@github-actions github-actions Bot added group-coordinator triage PRs from the community labels May 10, 2026
@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

This PR is part of #21565 and implements only server-side protocol and test cases.
If test cases are too big, I will separate this PR into code PR and test PR.
Please let me know 🙇‍♂️

@chickenchickenlove chickenchickenlove force-pushed the KAFKA-20169-SERVER-CORE branch from 5d0997c to 441be69 Compare May 11, 2026 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

group-coordinator triage PRs from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant