Skip to content

Ensure that all raced promises are resolved after the race #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 13, 2024

Conversation

milindl
Copy link
Contributor

@milindl milindl commented Nov 11, 2024

Each race spawns a new promise for every promise being raced, and long-lived promises can cause an issue with that.For each promise being raced, it's safe to resolve it immediately after the race if we can guarantee that no one else is waiting on it.There are 3 places of significance when it comes to racing:

  1. checkMaxPollIntervalNotExceeded which races #maxPollIntervalRestart with a timer. This method can only be called on the non-worker async context (ie. only one instance of this method is running). Further, when the promise is being raced here, the same promise is not being raced in any other method (This can be checked by checking the other sites which might await #maxPollIntervalRestart . Only other site is cacheExpirationLoop and these two methods are never called together, as one calls the other.
  2. cacheExpirationLoop : this is racing both #maxPollIntervalRestart and #workerTerminationScheduled. The first one can be resolved safely with the same logic as (1). For #workerTerminationScheduled we can actually remove it from the race. Why? Because #workerTerminationScheduled resolved means that very soon #maxPollIntervalRestart will be resolved too (in the main runInternal loop). Otherwise the timer will expire but then the while loop will check the expiry of #workerTerminationScheduled .
  3. The last one, .#queueNonEmpty is the trickiest because all workers can be awaiting it at the same time, so resolving it unconditionally causes a case where workers wake each other up without anything actually being in the queue. I thought maybe, rather than removing the promise, I will remove the race, and await only the promise. Unfortunately, this makes some workers halt: the fetcher returning messages as null, and the call to nextFetchRetry can be interrupted by queueNonEmptyCb in the middle of them. I thought of a possible way to solve the race condition, but I think my proposed solution is better: to maintain a linked list of waiters. Linked list size is bounded by worker count.

This addresses #150

@milindl milindl requested review from a team as code owners November 11, 2024 13:44
@confluent-cla-assistant
Copy link

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@milindl milindl requested a review from emasab November 11, 2024 13:46
Copy link
Contributor

@emasab emasab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix Milind! Just a few comments

Copy link
Contributor

@emasab emasab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@milindl milindl merged commit 74d84f9 into master Nov 13, 2024
2 checks passed
@milindl milindl deleted the dev_mem_leaks branch November 13, 2024 16:23
@AndreasKlein
Copy link

Hi @milindl is there a plan for when this will be released already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants