Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-1526] support K8s pod overhead #520

Closed
wants to merge 3 commits into from

Conversation

zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Feb 3, 2023

What is this PR for?

With K8s 1.24 a pod can now provide an overhead in the pod spec: pod.Spec.Overhead. The pod overhead allows specifying an overhead based on the runtime set on the pod. For certain runtimes this overhead can be large. See this KEP for details.

The scheduler should take into account this overhead if set on a pod. We currently calculate the size of the pod based on the containers but do not take the overhead into account. That needs to change.

We need to take the overhead into account as part of scheduling and quota calculations: include the pod.Spec.Overhead resources in the size of the pod before sending it to the core. Overhead only support cpu and memory and is added to the requests. The plugin framework (node related checks) calculates the pod size each time it is called (overhead!) and includes the overhead. The callback for the predicates must take that into account and not be broken by that implementation.

Adding clearly how we have calculated the overall pod size in the logging is a requirement.

Note from @wilfred-s :
The field is optional and will not be set in any pod, unless the cluster has been setup with it, even if you run 1.24 or later.
It is not a user managed field it is set by an admission controller. You cannot set the field on a pod when you create the pod. The field has been part of the offical pod spec since 1.16, and marked as a beta field since 1.18. It was only moved to GA in 1.24.
We need to handle it as an optional field. Even for a release that has the feature as GA the field can be a nil

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@codecov
Copy link

codecov bot commented Feb 3, 2023

Codecov Report

Merging #520 (234f047) into master (a9cdd44) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
+ Coverage   69.41%   69.59%   +0.17%     
==========================================
  Files          45       45              
  Lines        7714     7715       +1     
==========================================
+ Hits         5355     5369      +14     
+ Misses       2162     2151      -11     
+ Partials      197      195       -2     
Impacted Files Coverage Δ
pkg/common/resource.go 82.29% <100.00%> (+0.87%) ⬆️
pkg/dispatcher/dispatcher.go 74.82% <0.00%> (-1.40%) ⬇️
pkg/common/si_helper.go 97.60% <0.00%> (+0.04%) ⬆️
pkg/cache/context.go 43.03% <0.00%> (+0.61%) ⬆️
pkg/cache/task.go 68.81% <0.00%> (+1.00%) ⬆️
pkg/cache/nodes.go 81.81% <0.00%> (+2.08%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -191,6 +191,19 @@ func TestParsePodResource(t *testing.T) {
assert.Equal(t, res.Resources[siCommon.CPU].GetValue(), int64(3000))
assert.Equal(t, res.Resources["nvidia.com/gpu"].GetValue(), int64(5))

// Add pod OverHead, only support CPU and Memory
Copy link
Contributor

@0yukali0 0yukali0 Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zhuqi-lucas , the getResource function would parse the resource which includes cpu, memory, gpu and other else.
Should we add filtering function to specific resource type?
Or adding a UT case which overhead contains different type of resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in the KEP says that there is a limit on the type of resource that can be added. That means we need to follow the second option and extend the UT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wilfred-s and @0yukali0 for review, sure, addressed in latest PR.

podOverHeadResource := getResource(pod.Spec.Overhead)
podResource = Add(podResource, podOverHeadResource)
// Logging the overall pod size and pod overhead
log.Logger().Debug("We have calculated the overall pod size which includes the pod overhead",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text in the message with "we": who is "we"?
Make it simple and factual, include a pod reference in the message as this is not traceable to anything:

  log.Logger().Debug("Pod overhead specified, overall pod size adjusted",
  zap.String("taskID", pod.UID),
  zap.String("Pod overall size", podResource.String()),
  zap.String("Pod overhead size", podOverHeadResource.String()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion!
Addressed in latest PR.

@zhuqi-lucas zhuqi-lucas requested review from wilfred-s and 0yukali0 and removed request for wilfred-s and 0yukali0 February 6, 2023 08:27
Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit.

pkg/common/resource.go Show resolved Hide resolved
@zhuqi-lucas zhuqi-lucas requested review from craigcondit and removed request for wilfred-s February 13, 2023 03:49
Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants