Fix leak in activeTasks map in TaskQueue #18031
Merged
kfaraz merged 3 commits into apache:master on May 24, 2025
Branch updated from 8aae114 to c36b986
Currently we check only whether the timestamp on the task comes strictly before the current clock value during syncFromStorage(). This can cause races between startPendingTasksOnRunner() and syncFromStorage(), where the latter spuriously updates the clock values for all tasks in activeTasks, preventing syncFromStorage() from making any deletion progress.
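To make the race concrete, here is a minimal Java sketch of a timestamp-based cleanup check over an activeTasks map. It is a simplified model under stated assumptions, not Druid's TaskQueue code: TaskEntry, lastUpdatedMillis, updateTaskEntry, removeTaskInternal and syncFromStorage below are hypothetical stand-ins, a logical clock replaces wall-clock time, and the concurrent caller is simulated by a callback that runs between the sync capturing its start time and checking the entries. It shows that if every entry's timestamp is refreshed inside that window, the strict "last updated before sync start" check never removes anything, while untouched stale entries are collected.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of activeTasks cleanup. All names are
 * illustrative stand-ins, not Druid's actual TaskQueue implementation.
 */
public class ActiveTasksSketch {
  static final class TaskEntry {
    final String taskId;
    volatile long lastUpdatedMillis;

    TaskEntry(String taskId, long now) {
      this.taskId = taskId;
      this.lastUpdatedMillis = now;
    }
  }

  final Map<String, TaskEntry> activeTasks = new ConcurrentHashMap<>();

  // Logical clock keeps the example deterministic (assumption: real code uses wall-clock time).
  private long clock = 0;

  long tick() {
    return ++clock;
  }

  void addTask(String id) {
    activeTasks.put(id, new TaskEntry(id, tick()));
  }

  /** Invariant: any update applied to an entry also bumps its lastUpdated timestamp. */
  void updateTaskEntry(String id, Runnable op) {
    activeTasks.computeIfPresent(id, (k, e) -> {
      op.run();
      e.lastUpdatedMillis = tick();
      return e;
    });
  }

  /** Removal does not go through updateTaskEntry, so it never refreshes a timestamp. */
  void removeTaskInternal(String id) {
    activeTasks.remove(id);
  }

  /**
   * Removes entries that are absent from storage and were last updated strictly before
   * the sync started. The callback stands in for work done concurrently by another thread.
   */
  int syncFromStorage(Set<String> idsInStorage, Runnable betweenStartAndCheck) {
    long syncStart = tick();
    betweenStartAndCheck.run();
    int removed = 0;
    // ConcurrentHashMap iterators are weakly consistent, so removing while iterating is safe.
    for (TaskEntry e : activeTasks.values()) {
      if (!idsInStorage.contains(e.taskId) && e.lastUpdatedMillis < syncStart) {
        removeTaskInternal(e.taskId);
        removed++;
      }
    }
    return removed;
  }

  /** Runs one sync for two tasks that are gone from storage; returns how many entries remain. */
  public static int leftAfterSync(boolean touchAllDuringSync) {
    ActiveTasksSketch q = new ActiveTasksSketch();
    q.addTask("a");
    q.addTask("b");
    q.syncFromStorage(Set.of(), () -> {
      if (touchAllDuringSync) {
        // Simulates another code path refreshing every entry's timestamp mid-sync.
        for (String id : q.activeTasks.keySet()) {
          q.updateTaskEntry(id, () -> {});
        }
      }
    });
    return q.activeTasks.size();
  }

  public static void main(String[] args) {
    System.out.println("touched mid-sync: " + leftAfterSync(true) + " left"); // 2 left
    System.out.println("untouched: " + leftAfterSync(false) + " left"); // 0 left
  }
}
```

If the refresh happens on every cycle, the touched case repeats indefinitely and the map only grows. Keeping removal on a separate path that does not bump timestamps (as the removeTaskInternal vs updateTaskEntry distinction suggests) is one way to keep the "update implies fresh timestamp" invariant from blocking cleanup.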
Branch updated from c36b986 to 6a93984
jtuglu1 commented on May 24, 2025
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java
kfaraz approved these changes on May 24, 2025
kfaraz (Contributor) left a comment
LGTM 🚀
Thanks for the fix and the tests, @jtuglu-netflix!
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java
Contributor
@jtuglu-netflix, please let me know if this is good to merge or if you intend to include more changes here.
Contributor, Author
Nothing more to this PR.
kfaraz reviewed on May 24, 2025
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java
Outdated
Contributor, Author
@kfaraz, all good here?
Contributor
Yes, merged.
Currently we check only whether the timestamp on the task comes strictly before the current clock value during syncFromStorage(). This can cause races between startPendingTasksOnRunner() and syncFromStorage(), where the latter spuriously updates the clock values for all tasks in activeTasks, preventing syncFromStorage() from making any deletion progress. As a result, the number of waiting tasks can grow continuously because they are never garbage-collected.
Description
- … removeTaskInternal instead of updateTaskEntry, because this is opaque to the caller. I think the invariant "if updateOperation is called on an entry, then its lastUpdated field is also updated" makes sense here.
- … getWaitingTaskCount to filter out completed tasks. This should ensure that the only tasks unknown to the taskRunner are those that have not yet been placed in its queue (in line with the definition of a waiting task).
- … activeTasks for better debugging.
Release note
Fix task leak in the activeTasks map in TaskQueue and fix the waiting task count metric
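The waiting-task count fix (the getWaitingTaskCount change in the Description list) can be illustrated with a short Java sketch. It assumes a "waiting" task is one that is still active, not complete, and not yet known to the task runner; ActiveTask, isComplete, runnerKnownIds and waitingTaskCount are hypothetical names for illustration, not Druid's TaskQueue API.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a waiting-task count. Names are illustrative
 * stand-ins, not Druid's actual TaskQueue API.
 */
public class WaitingTaskCountSketch {
  public record ActiveTask(String id, boolean isComplete) {}

  /** Counts active tasks that the runner does not know about, ignoring completed ones. */
  public static long waitingTaskCount(List<ActiveTask> activeTasks, Set<String> runnerKnownIds) {
    return activeTasks.stream()
        .filter(t -> !t.isComplete())                  // completed tasks are no longer waiting
        .filter(t -> !runnerKnownIds.contains(t.id())) // not yet handed to the runner
        .count();
  }

  public static void main(String[] args) {
    List<ActiveTask> tasks = List.of(
        new ActiveTask("running", false),   // known to the runner
        new ActiveTask("queued", false),    // not yet given to the runner
        new ActiveTask("finished", true));  // complete, no longer tracked by the runner
    Set<String> known = Set.of("running");
    // Without the completion filter, "finished" would also be counted (2 instead of 1).
    System.out.println(waitingTaskCount(tasks, known)); // prints 1
  }
}
```

Filtering on completion first means a finished task that lingers in the active map (for example, because of the leak described above) does not inflate the metric.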
Key changed/added classes in this PR
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java
This PR has: