
KAFKA-20505: Moving future completion out of locks for Share Partition #22126

Merged: apoorvmittal10 merged 1 commit into apache:trunk from apoorvmittal10:KAFKA-20505, Apr 23, 2026

Conversation

@apoorvmittal10 (Contributor) commented Apr 23, 2026

As explained in KAFKA-20505, a deadlock can occur when a future is
completed for a request whose follow-up actions try to acquire the lock
on purgatory (checkAndComplete / triggering waiting requests). Since the
Share Partition lock might not yet have been released at that point, a
deadlock can happen. This PR moves such future completions out of the lock.
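For illustration, the pattern being changed looks roughly like the sketch below (hypothetical names, not the actual SharePartition code): state is mutated under the write lock, but the futures to complete are collected and invoked only after the lock is released, so their dependent actions can safely acquire other locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CompleteOutsideLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Deadlock-prone: complete() runs dependent actions synchronously on
    // this thread, and those actions (e.g. a purgatory checkAndComplete)
    // may try to acquire another lock while we still hold this one.
    void completeInsideLock(CompletableFuture<Void> future) {
        lock.writeLock().lock();
        try {
            // ... state updates ...
            future.complete(null);  // dependents run while lock is held
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Safer: mutate state under the lock, collect the futures, and
    // complete them only after the lock is released.
    void completeOutsideLock(List<CompletableFuture<Void>> pending) {
        List<CompletableFuture<Void>> toComplete = new ArrayList<>();
        lock.writeLock().lock();
        try {
            // ... state updates ...
            toComplete.addAll(pending);
            pending.clear();
        } finally {
            lock.writeLock().unlock();
        }
        toComplete.forEach(f -> f.complete(null));  // no lock held here
    }
}
```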

I have also reviewed the other future completions and it doesn't seem we
need any further changes.

I have tested using the franz-go Kafka test and could not reproduce the
issue in 160 consecutive runs. Previously the issue was reproducible
within 20-50 consecutive runs.

```
=== Run 160 ===
=== RUN   TestShareGroupETL
=== PAUSE TestShareGroupETL
=== CONT  TestShareGroupETL
[09:59:38.788 1][INFO] producing to a new topic for the first time,
fetching metadata to learn its partitions; topic:
f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e
...
...
[09:59:43.94 3][INFO] immediate metadata update triggered; why: forced
load because we are producing to a topic for the first time
[09:59:43.947 3][INFO] done waiting for metadata for new topic; topic:
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
    share_test.go:507: level 1 phase 2: adding consumers after 122923
consumed
[09:59:44.225 3][INFO] flushing
[09:59:44.225 4][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
...
...
[09:59:44.226 5][INFO] beginning to manage the share group lifecycle;
group: 0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35
[09:59:44.226 10][INFO] beginning to manage the share group lifecycle;
group: 0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35
[09:59:44.227 3][INFO] leaving share group; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
member_id: LP4lpqzQjAm-QxQdCRkSXA==
[09:59:44.227 7][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
[09:59:44.227 4][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
...
...
[09:59:49.232 7][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments:
map[f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e:[1
2]]
[09:59:49.232 6][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments:
map[f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e:[1]]
...
...
[09:59:49.466 18][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.466 13][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.466 15][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.467 16][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.467 14][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.467 11][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
...
...
[09:59:49.467 15][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.468 13][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
[09:59:49.469 17][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments: map[]
...
...
[09:59:54.472 18][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments:
map[5da63960cb4fe97d887d575a73dfbddc51a2eb8071d119b3a5ba5a2b0d87bc7e:[1]]
[09:59:54.485 14][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments:
map[6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9:[0]]
...
...
[09:59:54.494 12][INFO] metadata update triggered; why: reload trigger
due to produce topic still not known
[09:59:54.495 12][INFO] producer id initialization success; id: 3524,
epoch: 0
[09:59:54.5 13][INFO] producing to a new topic for the first time,
fetching metadata to learn its partitions; topic:
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
[09:59:54.5 13][INFO] immediate metadata update triggered; why: forced
load because we are producing to a topic for the first time
...
...
[09:59:54.525 11][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: _4Jk9faoDdUlGMlSR9zKmg==
    share_test.go:605: level 2 rebalance 1: killing l2-c1 after 169339
consumed
[09:59:55.101 14][INFO] flushing
[09:59:55.101 19][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:55.102 19][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:55.103 14][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: wJibgxG934tiAuqCPloF_w==
[09:59:55.107 19][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments: map[]
    share_test.go:619: level 2 rebalance 2: killing l2-c3 after 375726
consumed
[09:59:55.401 18][INFO] flushing
...
...
[10:00:00.915 20][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: HjzG4-5QwncfYfr8pSmEUQ==
    share_test.go:377: level 1: 499900 unique keys, 500624 total
accepts, 500624 produced, 724 duplicates, 35614 redelivered, max dc 3,
consumed 532987
    share_test.go:377: level 2: 499900 unique keys, 501513 total
accepts, 501513 produced, 1613 duplicates, 20272 redelivered, max dc 2,
consumed 518049
    share_test.go:704: level 1: 100 purely rejected, 35614 redelivered
    share_test.go:60: deleting topic
f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e
    share_test.go:61: deleting topic
f7e388a2de7ef0814328f9186e8c4b73b1f2437490e1b98730af9fb17ee74175
    share_test.go:62: deleting topic
5da63960cb4fe97d887d575a73dfbddc51a2eb8071d119b3a5ba5a2b0d87bc7e
    share_test.go:63: deleting topic
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
    share_test.go:64: deleting topic
7e74eb054cbb02e0de5da8a8018115dc01094496222039f323841770b11b8a12
    share_test.go:65: deleting topic
4b8c44d4071cd22272ae9ac694342faa3404bd10b479fe88874bdef4a8a4276d
--- PASS: TestShareGroupETL (22.73s)
PASS
ok      github.com/twmb/franz-go/pkg/kgo        22.926s
```

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>, PoAn Yang <payang@apache.org>

@github-actions bot added the triage (PRs from the community) label Apr 23, 2026
@github-actions bot added the core (Kafka Broker), KIP-932 (Queues for Kafka), and small (Small PRs) labels Apr 23, 2026
@apoorvmittal10 added the ci-approved label and removed the small (Small PRs) and triage (PRs from the community) labels Apr 23, 2026
@AndrewJSchofield (Member) left a comment
Thanks for the PR. Looks good to me.

@apoorvmittal10 (Contributor, Author)

@FrankYang0529 As you are the release manager for 4.2.1, I need to open a cherry-pick for the 4.2.1 release as this issue is a blocker. Once this is merged in trunk, I will open a PR for 4.2 and also merge it into 4.3.

@FrankYang0529 (Member)

> @FrankYang0529 As you are the release manager for 4.2.1, I need to open a cherry-pick for the 4.2.1 release as this issue is a blocker. Once this is merged in trunk, I will open a PR for 4.2 and also merge it into 4.3.

Thank you. I will prepare another RC after this is backported to the 4.2 branch.

@chia7712 (Member) left a comment
LGTM. I've left a suggestion that could be handled in a follow-up.

```java
lock.writeLock().lock();
try {
    if (throwable != null) {
```
@chia7712 (Member) commented:

Personally, I prefer to return a chained CompletableFuture rather than manually completing an explicitly created one, as calling complete() under a lock is highly error-prone. For example, rollbackOrProcessStateUpdates could be significantly streamlined with this chaining approach.

```java
    CompletableFuture<Void> rollbackOrProcessStateUpdates(
        Throwable throwable,
        List<PersisterBatch> persisterBatches
    ) {
        if (throwable != null) {
            lock.writeLock().lock();
            try {
              ...
                });
            } finally {
                lock.writeLock().unlock();
            }
            return CompletableFuture.failedFuture(throwable);
        }

        if (persisterBatches.isEmpty()) {
            return CompletableFuture.completedFuture(null);
        }
        ...
        return writeShareGroupState(persisterBatches.stream().map(PersisterBatch::stateBatch).toList())
        .whenComplete((result, exception) -> {
            ...
```

Another benefit is that it naturally propagates any exceptions thrown inside the whenComplete block.
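A minimal self-contained sketch of that benefit (hypothetical method names, not the SharePartition code): an exception thrown inside the whenComplete action completes the chained stage exceptionally, so callers observe it, whereas a manually created future is simply never completed.

```java
import java.util.concurrent.CompletableFuture;

public class ChainingDemo {
    static void mightThrow() { throw new RuntimeException("boom"); }

    // Manual style: if mightThrow() throws, `result` is never completed
    // and callers wait forever.
    static CompletableFuture<Void> manual(CompletableFuture<Void> write) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        write.whenComplete((r, e) -> {
            mightThrow();
            result.complete(null);
        });
        return result;
    }

    // Chained style: the same exception completes the returned stage
    // exceptionally, so downstream stages see it.
    static CompletableFuture<Void> chained(CompletableFuture<Void> write) {
        return write.whenComplete((r, e) -> mightThrow());
    }

    public static void main(String[] args) {
        chained(CompletableFuture.completedFuture(null))
            .whenComplete((r, e) -> System.out.println("chained sees: " + e));
        manual(CompletableFuture.completedFuture(null))
            .whenComplete((r, e) -> System.out.println("never printed"));
    }
}
```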

@apoorvmittal10 (Contributor, Author)

@chia7712 Valid suggestion. I wanted to keep the changes minimal for the blocker. Shall I do the chaining only in trunk, so we keep the scope of testing limited for 4.2.1 and 4.3.0?

@apoorvmittal10 (Contributor, Author)

I have created the following ticket and assigned it to myself. It will be beneficial and can be well tested in trunk prior to the 4.4.0 release, as it is a code improvement: https://issues.apache.org/jira/browse/KAFKA-20521. Please let me know if you think it's fine to handle this separately.

@apoorvmittal10 (Contributor, Author)

Ahh, you already mentioned that this can be a follow-up, so I shall merge this commit and cherry-pick it, then open the follow-up.

@apoorvmittal10 merged commit e1cfea3 into apache:trunk Apr 23, 2026 (33 of 34 checks passed)
apoorvmittal10 added a commit to apoorvmittal10/kafka that referenced this pull request Apr 23, 2026
KAFKA-20505: Moving future completion out of locks for Share Partition (apache#22126)

Labels: ci-approved, core (Kafka Broker), KIP-932 (Queues for Kafka)