Skip to content

[improve][broker] Enhanced process logic of PersistentReplicator.cancellationPendingReadTasks#25501

Open
gosonzhang wants to merge 1 commit intoapache:masterfrom
gosonzhang:master-geo-stuck
Open

[improve][broker] Enhanced process logic of PersistentReplicator.cancellationPendingReadTasks#25501
gosonzhang wants to merge 1 commit intoapache:masterfrom
gosonzhang:master-geo-stuck

Conversation

@gosonzhang
Copy link
Copy Markdown

@gosonzhang gosonzhang commented Apr 9, 2026

Motivation

While normally inFlightTasks contains at most one pending read task (guaranteed by hasPendingRead() in acquirePermitsIfNotFetchingSchema()), in exceptional scenarios, multiple tasks with entries == null may occur.

The original implementation only set entries for the last pending task iterated over; the entries of the preceding pending tasks remained null. If canceledPendingRead = true (i.e., the cursor's read has been canceled, and the callback will not occur again), these unset tasks will remain in a pending state indefinitely, causing:
hasPendingRead() to always return true, acquirePermitsIfNotFetchingSchema() to always return null, and copying to be permanently stuck.

The improved implementation sets entries = emptyList() for all pending tasks when canceledPendingRead = true. This ensures that all canceled tasks complete correctly, and no task will remain in a pending state indefinitely.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

gosonzhang#4

@gosonzhang gosonzhang closed this Apr 9, 2026
@gosonzhang gosonzhang reopened this Apr 9, 2026
@gosonzhang
Copy link
Copy Markdown
Author

/pulsarbot rerun-failure-checks

@merlimat merlimat requested a review from poorbarcode April 9, 2026 16:59
@liangyepianzhou
Copy link
Copy Markdown
Contributor

While normally inFlightTasks contains at most one pending read task (guaranteed by hasPendingRead() in acquirePermitsIfNotFetchingSchema()), in exceptional scenarios, multiple tasks with entries == null may occur.

Can you provide a detailed explanation of this scenario and add a unit test to cover it?

@gosonzhang
Copy link
Copy Markdown
Author

gosonzhang commented Apr 10, 2026

While normally inFlightTasks contains at most one pending read task (guaranteed by hasPendingRead() in acquirePermitsIfNotFetchingSchema()), in exceptional scenarios, multiple tasks with entries == null may occur.

Can you provide a detailed explanation of this scenario and add a unit test to cover it?

@liangyepianzhou Improvement to the existing implementation: if multiple tasks with entries == null actually appear in inFlightTasks, the previous implementation would inevitably cause consumption stuck.

In reality, inFlightTasks should not and will not have multiple tasks with entries == null during runtime, so there's no way to construct this scenario.

@gosonzhang
Copy link
Copy Markdown
Author

While normally inFlightTasks contains at most one pending read task (guaranteed by hasPendingRead() in acquirePermitsIfNotFetchingSchema()), in exceptional scenarios, multiple tasks with entries == null may occur.

Can you provide a detailed explanation of this scenario and add a unit test to cover it?

@liangyepianzhou Improvement to the existing implementation: if multiple tasks with entries == null actually appear in inFlightTasks, the previous implementation would inevitably cause consumption stuck.

In reality, inFlightTasks should not and will not have multiple tasks with entries == null during runtime, so there's no way to construct this scenario.

@liangyepianzhou The relevant tests for pr(#24189) have covered the modified parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants