New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-23443: LLAP speculative task pre-emption seems to be not working #1012
Conversation
if (isGuaranteed) { | ||
// removals are idempotent, so no other state checks are required | ||
removeFromPreemptionQueue(taskWrapper); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can be further simplified to removeFromPreemptionQueue and if (! isGuaranteed)
The comments should remain though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would also avoid the speculative wording as it could cause further confusion. I believe preemptable should be fine in this context
if (newFinishableState) { | ||
// removals are idempotent, so no other state checks are required | ||
removeFromPreemptionQueue(taskWrapper); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
assertNotNull(taskWrapper); | ||
assertTrue(taskWrapper.isInPreemptionQueue()); | ||
assertEquals(fragmentId2, taskWrapper.getRequestId()); | ||
assertEquals(1, taskExecutorService.preemptionQueue.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we assume that r1 would be running now? and not be either in the wait or preemptonQ
Can we check this?
} else { | ||
// if guaranteed task, if the finishable state changed to non-finishable and if the task doesn't exist | ||
// pre-emption queue, then add it so that it becomes candidate to kill | ||
taskWrapper.updateCanFinishForPriority(newFinishableState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a case where we have a Guaranteed task that changes from non-finishable to finishable and is only part of the preemptionQueue?
Under that scenario our code (and the old code) would remove it from preemptionQ and it would not be part of any other Q.
From the code below it seems that this can indeed happen:
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L776
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-finishable -> Finishable does not have to be pre-emption queue. This could be in wait queue (if not capacity) or taken by executor to run both of which are fine.
You brought up good point, we may be adding the same task fragment to pre-emption queue twice. I will add a "if not exists" check when adding to pre-emption queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prasanthj thanks for fixing this! Patch looks good and is now committed!
At some point I would also change the first comment of the method to clarify that a task that is both Guaranteed and Finishable should never be in the preemption queue.
https://github.com/apache/hive/pull/1012/files#diff-16658bf15468ecd089c4fd32e75fa8b2R876
I believe there is value in documenting and describing how the scheduler works (started keeping some noted but happy to add more work). I personally find the information in this area of the project very limited.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
I think after HIVE-23210 we are getting a stable sort order and it is causing pre-emption to not work in certain cases.
Scheduler only peek's at the pre-emption queue and looks at whether it is non-finishable.
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420
In the above case, all tasks are speculative but state change is not triggering pre-emption queue re-ordering so peek() always returns canFinish task even though non-finishable tasks are in the queue.