[YUNIKORN-460] Handle placeholder timeout #231

kingamarton · 2021-02-15T18:20:23Z

No description provided.

codecov · 2021-02-19T11:41:58Z

Codecov Report

Merging #231 (8e31d83) into master (c47ed51) will decrease coverage by 1.05%.
The diff coverage is 42.27%.

@@            Coverage Diff             @@
##           master     #231      +/-   ##
==========================================
- Coverage   59.75%   58.69%   -1.06%     
==========================================
  Files          35       35              
  Lines        3133     3196      +63     
==========================================
+ Hits         1872     1876       +4     
- Misses       1180     1237      +57     
- Partials       81       83       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/appmgmt/appmgmt_recovery.go | `67.50% <0.00%> (-8.18%)` | ⬇️ |
| pkg/cache/amprotocol_mock.go | `0.00% <0.00%> (ø)` | |
| pkg/cache/task.go | `70.40% <ø> (-4.00%)` | ⬇️ |
| pkg/common/resource.go | `90.72% <0.00%> (-9.28%)` | ⬇️ |
| pkg/common/utils/gang_utils.go | `67.94% <0.00%> (-13.59%)` | ⬇️ |
| pkg/controller/application/app_controller.go | `71.05% <ø> (-0.26%)` | ⬇️ |
| ...missioncontrollers/webhook/admission_controller.go | `33.74% <0.00%> (+1.00%)` | ⬆️ |
| pkg/cache/application_events.go | `43.33% <8.33%> (-9.73%)` | ⬇️ |
| pkg/cache/application.go | `72.57% <62.50%> (-4.17%)` | ⬇️ |
| pkg/common/si_helper.go | `63.15% <80.00%> (ø)` | |
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b48c35c...8e31d83.

kingamarton · 2021-02-19T13:01:49Z

I still have to cover the changes with unit tests. Before I start writing them, @wilfred-s, @yangwwei, could you please take a brief look at the patch and let me know whether you agree with this approach?

yangwwei · 2021-03-01T06:01:51Z

pkg/callback/scheduler_callback.go

+			if updated.State == events.States().Application.Killed {
+				//TODO: implement the killed event
+				ev := cache.NewFailApplicationEvent(updated.ApplicationID)
+				dispatcher.Dispatch(ev)
+			}


@kingamarton, how can we support a "Kill" application event?
I don't fully understand what will happen behind this.

As we discussed, we will fail the application and remove its placeholders. Because the app is failed, it will be skipped in subsequent scheduling cycles.
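
The behaviour described in this reply can be sketched as follows. This is a minimal, self-contained illustration and not the shim's code: the types `appState` and `application` and the functions `onCoreUpdate` and `schedulable` are hypothetical stand-ins for the real `events.States()`, `cache.NewFailApplicationEvent`, and dispatcher shown in the diff.

```go
package main

import "fmt"

// appState is a hypothetical stand-in for the shim's application states.
type appState string

const (
	stateRunning appState = "Running"
	stateKilled  appState = "Killed"
	stateFailed  appState = "Failed"
)

// application is a hypothetical, simplified view of an app tracked by the shim.
type application struct {
	id           string
	state        appState
	placeholders []string
}

// onCoreUpdate mirrors the idea in the diff: when the core reports the
// application as Killed, the shim fails it and drops its placeholders.
// Any other reported state is ignored in this sketch.
func onCoreUpdate(app *application, updated appState) {
	if updated != stateKilled {
		return
	}
	app.state = stateFailed
	app.placeholders = nil
}

// schedulable reports whether the app is still considered in later cycles.
func schedulable(app *application) bool {
	return app.state != stateFailed
}

func main() {
	app := &application{id: "app-1", state: stateRunning, placeholders: []string{"ph-0", "ph-1"}}
	onCoreUpdate(app, stateKilled)
	fmt.Println(app.state, len(app.placeholders), schedulable(app))
}
```

In the actual patch the transition is driven by dispatching a fail-application event rather than by mutating the state directly; the sketch only shows the resulting effect.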

yangwwei · 2021-03-08T18:09:55Z

pkg/callback/scheduler_callback.go

+			if updated.State == events.States().Application.Killed {
+				ev := cache.NewFailApplicationEvent(updated.ApplicationID)
+				dispatcher.Dispatch(ev)
+			}


When we fail an application, we need to expose some pod-level events to indicate the failure.
We can do this in a follow-up JIRA.

yangwwei · 2021-03-08T18:31:09Z

pkg/common/constants/constants.go

+const AnnotationTaskGroupName = "yunikorn.apache.org/task-group-name"
+const AnnotationTaskGroups = "yunikorn.apache.org/task-groups"
+const AnnotationSchedulingPolicyParam = "yunikorn.apache.org/schedulingPolicyParameters"
+const SchedulingPolicyTimeoutParam = "placeholderTimeout"


It looks like the unit for placeholderTimeout is implicitly seconds. I think we should state the unit explicitly in the parameter name; otherwise we will need additional docs to explain the format. I suggest renaming this to placeholderTimeoutInSeconds.

yangwwei · 2021-03-08T22:48:58Z

Overall the changes look good, +1.
For the remaining review comments, I have created several follow-up JIRAs under https://issues.apache.org/jira/browse/YUNIKORN-553.

Add a configurable option, "placeholderTimeout", to the scheduling policy parameters to handle the placeholder timeout. If the option is not set, the default is to wait 15 minutes before cleaning up the placeholders created by the scheduler.
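
As an illustration of how such an option could be read, here is a minimal sketch. The annotation key and the parameter name come from the constants quoted in the review; everything else is an assumption: the helper `placeholderTimeout` is hypothetical, the value is assumed to be a space-separated list of key=value pairs, and the timeout is assumed to be a whole number of seconds.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const (
	// Names as quoted in pkg/common/constants/constants.go in this PR.
	annotationSchedulingPolicyParam = "yunikorn.apache.org/schedulingPolicyParameters"
	schedulingPolicyTimeoutParam    = "placeholderTimeout"
	// Default stated in the commit message: 15 minutes.
	defaultPlaceholderTimeout = 15 * time.Minute
)

// placeholderTimeout is a hypothetical helper, not the shim's real function.
// Assumptions: the annotation holds space-separated key=value pairs and the
// timeout value is a positive integer number of seconds.
func placeholderTimeout(annotations map[string]string) time.Duration {
	raw, ok := annotations[annotationSchedulingPolicyParam]
	if !ok {
		return defaultPlaceholderTimeout
	}
	for _, pair := range strings.Fields(raw) {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 || kv[0] != schedulingPolicyTimeoutParam {
			continue
		}
		seconds, err := strconv.ParseInt(kv[1], 10, 64)
		if err != nil || seconds <= 0 {
			// Fall back to the default on malformed or non-positive input.
			return defaultPlaceholderTimeout
		}
		return time.Duration(seconds) * time.Second
	}
	return defaultPlaceholderTimeout
}

func main() {
	pod := map[string]string{annotationSchedulingPolicyParam: "placeholderTimeout=30"}
	fmt.Println(placeholderTimeout(pod)) // explicit value
	fmt.Println(placeholderTimeout(nil)) // no annotation, default applies
}
```

Falling back to the default on malformed input is a design choice of this sketch; rejecting the pod or logging a warning would be equally reasonable.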

[YUNIKORN-406] Handle placeholder timeout

9b78f37

kingamarton changed the title ~~[YUNIKORN-406] Handle placeholder timeout~~ [YUNIKORN-460] Handle placeholder timeout Feb 15, 2021

[YUNIKORN-460] Small fixes

84bf01f

kingamarton self-assigned this Feb 19, 2021

kingamarton requested review from yangwwei and wilfred-s and removed request for yangwwei February 19, 2021 13:00

kingamarton added 4 commits February 26, 2021 12:42

Revert "[YUNIKORN-406] Handle placeholder timeout"

817049c

This reverts commit 9b78f37.

Merge branch 'master' of https://github.com/apache/incubator-yunikorn…

44ac1bd

…-k8shim into YUNIKORN-460

Handled completed apps

f8ca96f

Handle placeholder timeout

6f404df

yangwwei reviewed Mar 1, 2021

kingamarton added 3 commits March 8, 2021 12:26

Merge branch 'master' of https://github.com/apache/incubator-yunikorn…

a06c325

…-k8shim into YUNIKORN-460

Addressed review comments

f358a19

Core and interface dependency update

8e31d83

kingamarton marked this pull request as ready for review March 8, 2021 17:49

yangwwei reviewed Mar 8, 2021

yangwwei approved these changes Mar 8, 2021

yangwwei merged commit 8f15278 into apache:master Mar 8, 2021
