-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YUNIKORN-460] Handle placeholder timeout #231
Conversation
Codecov Report
@@ Coverage Diff @@
## master #231 +/- ##
==========================================
- Coverage 59.75% 58.69% -1.06%
==========================================
Files 35 35
Lines 3133 3196 +63
==========================================
+ Hits 1872 1876 +4
- Misses 1180 1237 +57
- Partials 81 83 +2
Continue to review full report at Codecov.
|
I still have to cover the changes with unit tests. Before I start to write them, @wilfred-s , @yangwwei can you please check briefly the patch if you agree with this approach? |
pkg/callback/scheduler_callback.go
Outdated
if updated.State == events.States().Application.Killed { | ||
//TODO: implement the killed event | ||
ev := cache.NewFailApplicationEvent(updated.ApplicationID) | ||
dispatcher.Dispatch(ev) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kingamarton how can we support a "Kill" application event?
I am not fully understanding what will happen behind this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, we will fail the application, also the placeholders will be removed. Since the app will be failed, it will be skipped from the next scheduling cycles.
if updated.State == events.States().Application.Killed { | ||
ev := cache.NewFailApplicationEvent(updated.ApplicationID) | ||
dispatcher.Dispatch(ev) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we fail an application, we need to expose some pod level events to indicate this issue.
we can do this in a follow up JIRA.
const AnnotationTaskGroupName = "yunikorn.apache.org/task-group-name" | ||
const AnnotationTaskGroups = "yunikorn.apache.org/task-groups" | ||
const AnnotationSchedulingPolicyParam = "yunikorn.apache.org/schedulingPolicyParameters" | ||
const SchedulingPolicyTimeoutParam = "placeholderTimeout" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it implies the unit for placeholderTimeout
is seconds, I think we should have this declared explicitly in the parameter, otherwise, we will need additional docs to explain the format. Suggest to rename this to placeholderTimeoutInSeconds
Overall the changes looked good, +1. |
Add a configurable option in the scheduling policy parameters "placeholderTimeout" to handle the placeholder timeout. The default value if not given is 15 minutes before cleaning up the placeholders created by the scheduler.
Add a configurable option in the scheduling policy parameters "placeholderTimeout" to handle the placeholder timeout. The default value if not given is 15 minutes before cleaning up the placeholders created by the scheduler.
No description provided.