Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-23443: LLAP speculative task pre-emption seems to be not working #1012

Closed
wants to merge 3 commits into from

Conversation

prasanthj
Copy link
Contributor

I think after HIVE-23210 we are getting a stable sort order and it is causing pre-emption to not work in certain cases.

"attempt_1589167813851_0000_119_01_000008_0 (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 2020-05-11 05:59:22, in preemption queue, can finish)", 
"attempt_1589167813851_0008_84_01_000008_1 (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 2020-05-11 06:00:23, in preemption queue, can finish)" 

Scheduler only peek's at the pre-emption queue and looks at whether it is non-finishable. 

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420

In the above case, all tasks are speculative but state change is not triggering pre-emption queue re-ordering so peek() always returns canFinish task even though non-finishable tasks are in the queue. 

if (isGuaranteed) {
// removals are idempotent, so no other state checks are required
removeFromPreemptionQueue(taskWrapper);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can be further simplified to removeFromPreemptionQueue and if (! isGuaranteed)
The comments should remain though :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would also avoid the speculative wording as it could cause further confusion. I believe preemptable should be fine in this context

if (newFinishableState) {
// removals are idempotent, so no other state checks are required
removeFromPreemptionQueue(taskWrapper);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

assertNotNull(taskWrapper);
assertTrue(taskWrapper.isInPreemptionQueue());
assertEquals(fragmentId2, taskWrapper.getRequestId());
assertEquals(1, taskExecutorService.preemptionQueue.size());
Copy link
Contributor

@pgaref pgaref May 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assume that r1 would be running now? and not be either in the wait or preemptonQ
Can we check this?

} else {
// if guaranteed task, if the finishable state changed to non-finishable and if the task doesn't exist
// pre-emption queue, then add it so that it becomes candidate to kill
taskWrapper.updateCanFinishForPriority(newFinishableState);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a case where we have a Guaranteed task that changes from non-finishable to finishable and is only part of the preemptionQueue?
Under that scenario our code (and the old code) would remove it from preemptionQ and it would not be part of any other Q.

From the code below it seems that this can indeed happen:
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L776

Copy link
Contributor Author

@prasanthj prasanthj May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-finishable -> Finishable does not have to be pre-emption queue. This could be in wait queue (if not capacity) or taken by executor to run both of which are fine.
You brought up good point, we may be adding the same task fragment to pre-emption queue twice. I will add a "if not exists" check when adding to pre-emption queue.

Copy link
Contributor

@pgaref pgaref May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prasanthj thanks for fixing this! Patch looks good and is now committed!
At some point I would also change the first comment of the method to clarify that a task that is both Guaranteed and Finishable should never be in the preemption queue.
https://github.com/apache/hive/pull/1012/files#diff-16658bf15468ecd089c4fd32e75fa8b2R876

I believe there is value in documenting and describing how the scheduler works (started keeping some noted but happy to add more work). I personally find the information in this area of the project very limited.

@github-actions
Copy link

github-actions bot commented Sep 1, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Sep 1, 2020
@github-actions github-actions bot closed this Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants