Skip to content

KAFKA-19883: [DRAFT] KIP-1289 — Transactional acknowledgments for share groups (state machine + client API + wire schema)#22357

Draft
Shekharrajak wants to merge 9 commits into
apache:trunkfrom
Shekharrajak:kip-1289-txn-ack-share-groups
Draft

KAFKA-19883: [DRAFT] KIP-1289 — Transactional acknowledgments for share groups (state machine + client API + wire schema)#22357
Shekharrajak wants to merge 9 commits into
apache:trunkfrom
Shekharrajak:kip-1289-txn-ack-share-groups

Conversation

@Shekharrajak
Copy link
Copy Markdown
Contributor

Share groups (KIP-932) have no equivalent of sendOffsetsToTransaction. There is no way to atomically bind a share-group acknowledgment to a producer transaction, blocking EOS for share-group-based read-process-write pipelines.

Description

This is the foundational layer for KIP-1289. It does not include the broker-side RPC handler (follow-up PR).

  • RecordState — new TX_PENDING state (server module)
  • InFlightState — transactional staging (server module)
  • ShareGroupMetadata (clients module) - ShareConsumer.shareGroupMetadata() — fires a background event to read live membership state from ShareMembershipManager
  • Wire schema (clients module)- TxnShareAcknowledgeRequest / TxnShareAcknowledgeResponse (apiKey 93, v0)
  • Producer EOS flow (clients module) - Producer.sendShareAcknowledgementsToTransaction(

tests

• RecordStateTest — full 6-state transition matrix including all TX_PENDING paths
• InFlightStateTxnTest

follow up

• Broker-side handleTransactionnShareAcknowledge handler in Kafka Apis.scala
• ShareCoordinatorShard.replayEndTransactionMarker() real body
• SharePartition TX_PENDING acquisition exclusion
• IT tests (require broker handler)

@github-actions github-actions Bot added triage PRs from the community core Kafka Broker producer consumer clients labels May 23, 2026
ARCHIVING((byte) 3), // Per KIP-1191
ARCHIVED((byte) 4);
ARCHIVED((byte) 4),
TX_PENDING((byte) 5); // Per KIP-1289: staged into an open producer transaction
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TX_PENDING is the bridge state that holds the record - between aquired and ack (or archived)


// Either the transition is from Available -> Acquired or from Acquired -> Available/
// Acknowledged/Archived.
if (newState == TX_PENDING && this != ACQUIRED) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validation for tx_pending state - it should be after AQUIRED , before acknowledged or archiving

if (state != RecordState.TX_PENDING) {
return null;
}
if (this.stagedProducerId != producerId || this.stagedProducerEpoch != producerEpoch) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need fencing to ensure only the right producer can confirm.
The confirmation can come from different RPC call from a different broker/server.

topic.partitions().add(partition);
}

TxnShareAcknowledgeRequestData data = new TxnShareAcknowledgeRequestData()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TxnShareAcknowledgeRequest will be sent from produer at client side with all producer deails

@github-actions github-actions Bot removed the triage PRs from the community label May 24, 2026
for (AcknowledgementBatch b : entry.getValue()) {
for (byte ackType : b.acknowledgeTypes()) {
batches.add(new TxnShareAcknowledgeBatch()
.setFirstOffset(b.firstOffset())
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(firstOffset, lastOffset, ackType) triples -> this will help in pointing to records which is already on the broker's log.

"(currentState= " + currentState + ")");
}

TxnRequestHandler handler;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same implementation as sendOffsetsToTransaction() transactionManager.sendOffsetsToTransaction(offsets, groupMetadata);

@Shekharrajak
Copy link
Copy Markdown
Contributor Author

Drafting txn Coord side of the changes.

@github-actions github-actions Bot added the KIP-932 Queues for Kafka label May 24, 2026
Map<TopicIdPartition, CompletableFuture<Throwable>> futures = new HashMap<>();
acknowledgeTopics.forEach((topicIdPartition, acknowledgePartitionBatches) -> {
SharePartitionKey sharePartitionKey = sharePartitionKey(groupId, topicIdPartition);
SharePartition sharePartition = partitionCache.get(sharePartitionKey);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per-partition dispatch

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sharePartition.stageTxnAcknowledge(memberId, producerId, producerEpoch, acknowledgePartitionBatches)


public void applyTxnMarker(long producerId, short producerEpoch, TransactionResult result) {
log.debug("Broadcasting txn marker producerId={} epoch={} result={}", producerId, producerEpoch, result);
partitionCache.values().forEach(sp -> sp.applyTxnMarker(producerId, producerEpoch, result));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterates every SharePartition on this broker, each iterates its InFlightStates, each checks (state == TX_PENDING && stagedProducerId == producerId && stagedProducerEpoch == epoch) and apply the marker for all the share partitions.

This can be improved but InFlightState.applyTxnMarker is O(1) operation

var skippedMarkers = 0
for (marker <- markers.asScala) {
val producerId = marker.producerId
sharePartitionManager.applyTxnMarker(producerId, marker.producerEpoch, marker.transactionResult)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After auth check for each marker calling share parititon manager - since share partition manager is responsible for finding share partition and apply

) {
for (Map.Entry<Long, InFlightBatch> entry : subMap.entrySet()) {
InFlightBatch inFlightBatch = entry.getValue();
if (inFlightBatch.lastOffset() < startOffset) continue;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below startOffset - already acknowledged

return future;
}

private Throwable stageBatchTxnRecords(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return throwable : signalling pattern

}

throwable = stageBatchTxnRecords(memberId, producerId, producerEpoch, batch, ackTypeMap, subMap);
if (throwable != null) break;
Copy link
Copy Markdown
Contributor Author

@Shekharrajak Shekharrajak May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break if there is one stage txn failed with any exception - if previous ones are success, then those updated with txn_pending state but current txn batch not.

transaction will abort and applyTxnMarker(ABORT) will revert them to AVAILABLE both batch ( batch 1, which was succeed and batch 2 which got failed in some step - and after the txn timeout it will also be aborted)

# user code : 
try {
    producer.sendShareAcknowledgementsToTransaction(acks, shareGroupMeta);  // ← throws
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();  // ← this is what reverts the records state back to 
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must have same semantics as KafkaProducer.sendOffsetsToTransaction

Retry contract (catch KafkaException → abortTransaction() → retry loop)
• Distinction between fatal vs abortable vs retriable errors
• Producer reuse semantics after abort

@AndrewJSchofield
Copy link
Copy Markdown
Member

Thanks for the PR.

Personally, I would prefer if there was no PR for such a complicated KIP until it has successfully passed a vote. Without serious review by committers knowledgeable in the areas of share groups and transactions, the KIP would not yet be able to pass a vote. It will need +3 binding votes.

There are many details which need to be agreed by the committers before we are there. For example, we will need to have a new transaction.version and new share.version to ensure that we correctly police these schema changes to the information on the internal metadata topics, and only enable the feature once all brokers in the cluster are ready. We would typically break down the code into many PRs so that the chunks are reviewable with reasonable effort. For example, I would do the feature changes first, then the JSON RPC schemas, and then gradually introduce the actual logic. I suggest KIP-1191 and https://issues.apache.org/jira/browse/KAFKA-19469 as a good example of breaking the code down and introducing it progressively (that's one I'm actively reviewing for AK 4.4).

Finally, we would not accept a KIP like this directly into Apache Kafka as a GA feature without some kind of Early Access or Preview release, so the feature bump also allows us to pace its enablement with the successful completion of an extended period of system testing.

My suggestion would be to close the PR, concentrate on fleshing out the KIP and building alignment with committers.

@Shekharrajak Shekharrajak changed the title KAFKA-19883: KIP-1289 — Transactional acknowledgments for share groups (state machine + client API + wire schema) KAFKA-19883: [DRAFT] KIP-1289 — Transactional acknowledgments for share groups (state machine + client API + wire schema) May 24, 2026
@Shekharrajak Shekharrajak marked this pull request as draft May 24, 2026 15:12
@Shekharrajak
Copy link
Copy Markdown
Contributor Author

Hi @AndrewJSchofield,

Thanks for checking. Yes I undertand these points and wanted to start a draft code so that I can share it in discussion thread for better understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants