
[YUNIKORN-460] Handle placeholder timeout #231

Merged: 9 commits merged on Mar 8, 2021.

kingamarton (Contributor)

No description provided.

kingamarton changed the title from [YUNIKORN-406] Handle placeholder timeout to [YUNIKORN-460] Handle placeholder timeout on Feb 15, 2021
codecov bot commented Feb 19, 2021

Codecov Report

Merging #231 (8e31d83) into master (c47ed51) will decrease coverage by 1.05%.
The diff coverage is 42.27%.


@@            Coverage Diff             @@
##           master     #231      +/-   ##
==========================================
- Coverage   59.75%   58.69%   -1.06%     
==========================================
  Files          35       35              
  Lines        3133     3196      +63     
==========================================
+ Hits         1872     1876       +4     
- Misses       1180     1237      +57     
- Partials       81       83       +2     
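The header (1.05%) and the table (-1.06%) report slightly different decreases. Assuming Codecov computes coverage as hits over all tracked lines (hits + misses + partials) and truncates the displayed percentages to two decimals, both figures follow from the same counts:

```latex
\begin{aligned}
\text{master:}\quad & \frac{1872}{1872 + 1180 + 81} = \frac{1872}{3133} \approx 59.751\% \\
\text{\#231:}\quad & \frac{1876}{1876 + 1237 + 83} = \frac{1876}{3196} \approx 58.698\% \\
\text{exact difference:}\quad & 59.751\% - 58.698\% \approx 1.05\% \\
\text{truncated difference:}\quad & 59.75\% - 58.69\% = 1.06\%
\end{aligned}
```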
Impacted Files Coverage Δ
pkg/appmgmt/appmgmt_recovery.go 67.50% <0.00%> (-8.18%) ⬇️
pkg/cache/amprotocol_mock.go 0.00% <0.00%> (ø)
pkg/cache/task.go 70.40% <ø> (-4.00%) ⬇️
pkg/common/resource.go 90.72% <0.00%> (-9.28%) ⬇️
pkg/common/utils/gang_utils.go 67.94% <0.00%> (-13.59%) ⬇️
pkg/controller/application/app_controller.go 71.05% <ø> (-0.26%) ⬇️
...missioncontrollers/webhook/admission_controller.go 33.74% <0.00%> (+1.00%) ⬆️
pkg/cache/application_events.go 43.33% <8.33%> (-9.73%) ⬇️
pkg/cache/application.go 72.57% <62.50%> (-4.17%) ⬇️
pkg/common/si_helper.go 63.15% <80.00%> (ø)
... and 5 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b48c35c...8e31d83.

kingamarton self-assigned this on Feb 19, 2021
kingamarton requested reviews from yangwwei and wilfred-s and removed the request for yangwwei on February 19, 2021 13:00
kingamarton (Contributor Author)

I still have to cover the changes with unit tests. Before I start writing them, @wilfred-s, @yangwwei, could you briefly check the patch and let me know whether you agree with this approach?

pkg/common/utils/gang_utils.go (outdated)
Comment on lines 153 to 157
if updated.State == events.States().Application.Killed {
//TODO: implement the killed event
ev := cache.NewFailApplicationEvent(updated.ApplicationID)
dispatcher.Dispatch(ev)
}
Contributor

@kingamarton how can we support a "Kill" application event?
I don't fully understand what will happen behind this.

Contributor Author

As we discussed, we will fail the application and remove its placeholders. Because the application is failed, it will be skipped in the following scheduling cycles.
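A minimal sketch of the flow described in this reply, assuming a simplified, synchronous model: the types `AppState` and `Application` and the helpers `handleStateUpdate` and `schedulable` are hypothetical and do not mirror the shim's real state machine or its asynchronous event dispatcher. It shows a Killed update turning into a failed application whose placeholders are dropped, and failed applications being filtered out of the next scheduling cycle.

```go
package main

import "fmt"

// AppState is a hypothetical, simplified application state for this sketch;
// it does not mirror the real yunikorn-k8shim state machine.
type AppState string

const (
	Running AppState = "Running"
	Killed  AppState = "Killed"
	Failed  AppState = "Failed"
)

// Application is a hypothetical model holding the app state and the names
// of the placeholder pods created for its task groups.
type Application struct {
	ID           string
	State        AppState
	Placeholders []string
}

// handleStateUpdate follows the reviewed logic: when the application is
// reported as Killed, it is failed and its placeholders are removed.
func handleStateUpdate(app *Application, newState AppState) {
	if newState == Killed {
		app.State = Failed
		app.Placeholders = nil // placeholders are cleaned up with the app
		return
	}
	app.State = newState
}

// schedulable returns the apps that take part in the next scheduling cycle;
// failed apps are skipped.
func schedulable(apps []*Application) []*Application {
	var out []*Application
	for _, a := range apps {
		if a.State != Failed {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	a := &Application{ID: "app-1", State: Running, Placeholders: []string{"ph-1", "ph-2"}}
	b := &Application{ID: "app-2", State: Running}
	handleStateUpdate(a, Killed)
	for _, app := range schedulable([]*Application{a, b}) {
		fmt.Println(app.ID) // prints app-2
	}
}
```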

pkg/common/constants/constants.go (outdated)
kingamarton marked this pull request as ready for review on March 8, 2021 17:49
Comment on lines +153 to +156
if updated.State == events.States().Application.Killed {
ev := cache.NewFailApplicationEvent(updated.ApplicationID)
dispatcher.Dispatch(ev)
}
Contributor

When we fail an application, we need to expose some pod-level events to indicate the issue.
We can do this in a follow-up JIRA.
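One way the suggested pod-level events could look is sketched below. Everything here is hypothetical: `podEventSink` is an illustrative interface (not a client-go type), `memorySink` only exists so the sketch runs standalone, and the reason string `ApplicationFailed` is an assumed name. The idea shown is simply one warning per pod of the failed application.

```go
package main

import "fmt"

// podEventSink is a hypothetical stand-in for whatever mechanism the
// follow-up JIRA uses to publish pod events; it is not a client-go type.
type podEventSink interface {
	Warning(pod, reason, message string)
}

// memorySink records events in memory so the sketch can run standalone.
type memorySink struct{ events []string }

func (s *memorySink) Warning(pod, reason, message string) {
	s.events = append(s.events, fmt.Sprintf("%s [%s] %s", pod, reason, message))
}

// notifyAppFailed emits one warning per pod that belongs to the failed
// application. The reason string "ApplicationFailed" is hypothetical.
func notifyAppFailed(sink podEventSink, appID string, pods []string) {
	for _, pod := range pods {
		sink.Warning(pod, "ApplicationFailed",
			fmt.Sprintf("application %s failed, placeholders were removed", appID))
	}
}

func main() {
	sink := &memorySink{}
	notifyAppFailed(sink, "app-1", []string{"driver-0", "executor-1"})
	for _, e := range sink.events {
		fmt.Println(e)
	}
}
```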

const AnnotationTaskGroupName = "yunikorn.apache.org/task-group-name"
const AnnotationTaskGroups = "yunikorn.apache.org/task-groups"
const AnnotationSchedulingPolicyParam = "yunikorn.apache.org/schedulingPolicyParameters"
const SchedulingPolicyTimeoutParam = "placeholderTimeout"
Contributor

It looks like this implies that the unit of placeholderTimeout is seconds. I think we should declare that explicitly in the parameter name; otherwise we will need additional docs to explain the format. I suggest renaming it to placeholderTimeoutInSeconds.

yangwwei (Contributor) commented Mar 8, 2021

Overall the changes look good, +1.
For the remaining review comments, I have created several follow-up JIRAs under https://issues.apache.org/jira/browse/YUNIKORN-553.

yangwwei merged commit 8f15278 into apache:master on Mar 8, 2021
yangwwei pushed a commit that referenced this pull request Mar 8, 2021
Add a configurable option, "placeholderTimeout", to the scheduling policy parameters to handle the
placeholder timeout. If it is not set, the scheduler waits 15 minutes by default before cleaning up
the placeholders it created.
craigcondit pushed a commit to craigcondit/yunikorn-k8shim that referenced this pull request May 10, 2022
Labels
None yet
Projects
None yet

2 participants