Kafka Connect: Coordinator's check on commitState.isCommitReady() is inefficient #16361

### Apache Iceberg version

1.10.1 (latest release)

### Query engine

None

### Please describe the bug 🐞

There is a performance problem in the Coordinator's commit-readiness check that significantly degrades system performance when a backlog of unprocessed messages builds up on the control topic.

During each commit cycle, the Coordinator reads each DATA_COMPLETE message from the control topic and calls the [commitState.isCommitReady()](https://github.com/apache/iceberg/blob/main/kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/channel/Coordinator.java#L142) method to see whether those messages cover all topic partitions. However, the [check is done through a loop](https://github.com/apache/iceberg/blob/main/kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/channel/CommitState.java#L117) over all previously received messages. If there are n DATA_COMPLETE messages, this is an O(n^2) calculation.
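To make the cost concrete, here is a minimal sketch of the pattern described above. It is a simplified model, not the actual Iceberg code: `NaiveCommitState`, `addReady`, and the `entriesScanned` counter are hypothetical names, and each buffered integer only stands in for the number of partitions reported by one DATA_COMPLETE message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the pattern; not the real CommitState.
class NaiveCommitState {
  // One entry per DATA_COMPLETE message: how many partitions it reports.
  private final List<Integer> readyBuffer = new ArrayList<>();
  // Instrumentation only, to count how many buffer entries get visited.
  private long entriesScanned = 0;

  void addReady(int assignedPartitions) {
    readyBuffer.add(assignedPartitions);
  }

  // Re-scans the whole buffer on every call, so calling it once per
  // received message visits 1 + 2 + ... + n entries in total.
  boolean isCommitReady(int expectedPartitionCount) {
    int received = 0;
    for (int count : readyBuffer) {
      received += count;
      entriesScanned++;
    }
    return received >= expectedPartitionCount;
  }

  long entriesScanned() {
    return entriesScanned;
  }
}
```

In this model, feeding in n messages and checking readiness after each one visits n(n+1)/2 buffer entries, which is the quadratic growth described above.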

When everything runs smoothly, n is usually bounded by the number of workers in the Kafka Connect cluster and the number of source topic partitions each worker needs to process. But when a backlog builds up on the control topic, things spiral downward. The backlog often starts with a networking or Hive Metastore (HMS) availability issue that prevents the Coordinator from committing entries to HMS. The commit fails, and the retry on the next commit cycle needs to process 2n messages from the control topic, because the workers keep generating them. The inefficient processing in CommitState.isCommitReady(), coupled with the increased number of Kafka messages to be processed from the control topic, makes the next commit cycle more prone to failure.

The fix for this performance issue is simple: use a map in CommitState to cache the topic partition names seen so far. Once the size of the map reaches the expected count, the commit is ready. This should be an O(n) calculation.
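A rough sketch of that idea, under stated assumptions: `IncrementalCommitState`, `addReady`, and `startNewCommit` are hypothetical names, partitions are identified by plain strings, and a `HashSet` is used in place of a map because only the keys are needed. It is not a patch against the real CommitState.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed fix; not the real CommitState.
class IncrementalCommitState {
  // Topic partition names seen so far in the current commit cycle.
  private final Set<String> seenPartitions = new HashSet<>();

  // Called once per DATA_COMPLETE message with the partitions it reports.
  // Cost is proportional to the size of this message only.
  void addReady(List<String> assignedPartitions) {
    seenPartitions.addAll(assignedPartitions);
  }

  // Constant-time check: no re-scan of earlier messages.
  boolean isCommitReady(int expectedPartitionCount) {
    return seenPartitions.size() >= expectedPartitionCount;
  }

  // Reset when a new commit cycle begins.
  void startNewCommit() {
    seenPartitions.clear();
  }
}
```

One design consequence of tracking names rather than counts is that a partition reported twice is counted only once. Whether that matches the intended semantics of the existing check would need to be confirmed against the real code.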

### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
