KAFKA-20357: Persist lastProducerEpoch fields for the transaction log. #22096

Open
chickenchickenlove wants to merge 1 commit into apache:trunk from chickenchickenlove:KAFKA-20357

@chickenchickenlove
Contributor

This change follows up on KAFKA-20310.

When producer epoch exhaustion triggers producerId rotation, the coordinator may fail after writing the updated transaction metadata but before returning the InitProducerId response to the client. After failover, a retry from the client can be incorrectly rejected with PRODUCER_FENCED.

The failure happens because the recovered transaction metadata does not restore the previous epoch information needed to recognize the retry correctly. In particular, lastProducerEpoch falls back to NO_PRODUCER_EPOCH after transaction log replay, so TransactionMetadata.prepareIncrementProducerEpoch(...) cannot identify the request as a valid retry of the earlier producerId rotation.

This PR fixes that by persisting lastProducerEpoch in TransactionLogValue for flexible transaction log records and restoring it during log replay.

With this change, retries of InitProducerId after failover are handled correctly when the original request triggered producerId rotation due to epoch exhaustion.
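
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the coordinator's actual code: the `Metadata` class, `rotate`, `replayWithoutLastEpoch`, and `classify` are hypothetical names, the retry check is a simplified stand-in for what `TransactionMetadata.prepareIncrementProducerEpoch(...)` and the coordinator do, and it assumes that the previous producerId survives log replay while lastProducerEpoch does not.

```java
// Simplified, hypothetical model of the retry check; not the real Kafka classes.
final class EpochRetrySketch {
    static final short NO_PRODUCER_EPOCH = -1;

    enum Outcome { BUMP, RETRY, FENCED }

    /** Minimal stand-in for the coordinator's per-transactional-id state. */
    static final class Metadata {
        final long producerId;
        final long prevProducerId;
        final short producerEpoch;
        final short lastProducerEpoch;

        Metadata(long producerId, long prevProducerId,
                 short producerEpoch, short lastProducerEpoch) {
            this.producerId = producerId;
            this.prevProducerId = prevProducerId;
            this.producerEpoch = producerEpoch;
            this.lastProducerEpoch = lastProducerEpoch;
        }
    }

    /** Rotate to a new producerId at epoch 0, remembering the exhausted id and epoch. */
    static Metadata rotate(Metadata m, long newProducerId) {
        return new Metadata(newProducerId, m.producerId, (short) 0, m.producerEpoch);
    }

    /** Model a replay from a log record that does not carry lastProducerEpoch. */
    static Metadata replayWithoutLastEpoch(Metadata m) {
        return new Metadata(m.producerId, m.prevProducerId,
                            m.producerEpoch, NO_PRODUCER_EPOCH);
    }

    /** Classify an InitProducerId request carrying the client's known id and epoch. */
    static Outcome classify(Metadata m, long reqProducerId, short reqEpoch) {
        if (reqProducerId == m.producerId && reqEpoch == m.producerEpoch) {
            return Outcome.BUMP;   // the client is up to date
        }
        if (reqProducerId == m.prevProducerId && reqEpoch == m.lastProducerEpoch) {
            return Outcome.RETRY;  // the client never saw the previous response
        }
        return Outcome.FENCED;
    }

    public static void main(String[] args) {
        short exhausted = (short) (Short.MAX_VALUE - 1);
        Metadata before = new Metadata(100L, -1L, exhausted, NO_PRODUCER_EPOCH);
        Metadata rotated = rotate(before, 101L);

        // State kept in memory, or restored with lastProducerEpoch persisted.
        System.out.println(classify(rotated, 100L, exhausted));
        // State restored after failover without lastProducerEpoch.
        System.out.println(classify(replayWithoutLastEpoch(rotated), 100L, exhausted));
    }
}
```

In this model the first call prints `RETRY` and the second prints `FENCED`, which mirrors the difference between restoring lastProducerEpoch and letting it fall back to NO_PRODUCER_EPOCH.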

Tests

  • Add transaction log coverage for serializing and restoring lastProducerEpoch.
  • Add a coordinator test covering InitProducerId retry after failover during producerId rotation caused by epoch exhaustion.
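
The property the serialization test is meant to check can be sketched as a round trip. The layout below is a deliberately simplified, hypothetical encoding and not the real TransactionLogValue schema: `FIRST_FLEXIBLE_VERSION`, `write`, and `read` are assumptions made for illustration, and the real record carries many more fields. It only shows the intended behavior that newer records carry lastProducerEpoch while older ones fall back to NO_PRODUCER_EPOCH on read.

```java
// Hypothetical, simplified record layout; not the real TransactionLogValue format.
final class LastEpochRoundTripSketch {
    static final short NO_PRODUCER_EPOCH = -1;
    // Assumed first version that carries lastProducerEpoch in this sketch.
    static final short FIRST_FLEXIBLE_VERSION = 1;

    /** Encode version, producerEpoch and, for newer versions only, lastProducerEpoch. */
    static java.nio.ByteBuffer write(short version, short producerEpoch,
                                     short lastProducerEpoch) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(6);
        buf.putShort(version);
        buf.putShort(producerEpoch);
        if (version >= FIRST_FLEXIBLE_VERSION) {
            buf.putShort(lastProducerEpoch);
        }
        buf.flip();
        return buf;
    }

    /** Decode to {producerEpoch, lastProducerEpoch}, defaulting the latter if absent. */
    static short[] read(java.nio.ByteBuffer buf) {
        short version = buf.getShort();
        short producerEpoch = buf.getShort();
        short lastProducerEpoch = version >= FIRST_FLEXIBLE_VERSION
                ? buf.getShort()
                : NO_PRODUCER_EPOCH;
        return new short[] {producerEpoch, lastProducerEpoch};
    }

    public static void main(String[] args) {
        short exhausted = (short) (Short.MAX_VALUE - 1);
        short[] restored = read(write((short) 1, (short) 0, exhausted));
        short[] legacy = read(write((short) 0, (short) 0, exhausted));
        System.out.println(restored[1] == exhausted);        // newer record keeps the value
        System.out.println(legacy[1] == NO_PRODUCER_EPOCH);  // older record falls back
    }
}
```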

Other considerations

    if (isAtLeastTransactionsV2 &&
            (txnState == TransactionState.COMPLETE_COMMIT || txnState == TransactionState.COMPLETE_ABORT) &&
            transitProducerEpoch == 0) {
        return transitLastProducerEpoch == lastProducerEpoch && transitMetadata.prevProducerId() == producerId;
    }

This code is not affected by this change.
For InitProducerId, TransactionCoordinator validates the retry using values from the client request.
In contrast, this block only validates a local state transition, so it is not directly affected by coordinator failover.

@github-actions bot added the core (Kafka Broker), transactions (Transactions and EOS), triage (PRs from the community), and small (Small PRs) labels on Apr 19, 2026
@github-actions

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@chickenchickenlove
Contributor Author

@jolshan Hi!
When you get a chance, could you take a look? 🙇‍♂️

@github-actions

github-actions Bot commented May 2, 2026

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.
