Implement support for multiple sizes huge pages #84051
Conversation
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Just some initial thoughts, but otherwise this makes a lot of sense to me. Thanks for your work on this.
The questions about validation are open for discussion, and should probably not be a part of this PR.
Test failures look valid, so those should be addressed. We also have some e2e tests for hugepages, have you verified that they still pass?
Adding hold since this depends on node-level support: #82820
/hold
/sig node
I think that overall the changes in emptyDir.go follow the KEP update as well. Regarding the quoted part of the KEP: would it be okay to allow the above case? [Pod Spec]
This is an interesting corner case. Volumes are Pod-level, and right now the sizes of those volumes are calculated by summing all containers' requests/limits for hugepages. In theory, if a pod has more than one container and more than one hugetlbfs mount, and not all of those volumeMounts are used in all of the containers, we need to change the validation logic and the logic for calculating the size of each hugepage volume.
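The summing rule described above can be sketched as follows. This is an illustrative snippet, not kubelet code; the `container` type and `podHugePageLimits` helper are hypothetical stand-ins for the real pod spec types:

```go
package main

import (
	"fmt"
	"strings"
)

// container is a hypothetical stand-in for a pod's container spec,
// carrying only the resource limits we care about (values in bytes).
type container struct {
	name   string
	limits map[string]int64
}

// podHugePageLimits sums each hugepages-<size> resource across all
// containers, which is how pod-level hugepage volume sizes are derived
// today when every container uses every mount.
func podHugePageLimits(containers []container) map[string]int64 {
	total := map[string]int64{}
	for _, c := range containers {
		for res, v := range c.limits {
			if strings.HasPrefix(res, "hugepages-") {
				total[res] += v
			}
		}
	}
	return total
}

func main() {
	cs := []container{
		{name: "container1", limits: map[string]int64{"hugepages-1Gi": 2 << 30}},
		{name: "container2", limits: map[string]int64{"hugepages-2Mi": 4 << 20, "hugepages-1Gi": 1 << 30}},
	}
	fmt.Println(podHugePageLimits(cs)["hugepages-1Gi"]) // 3221225472 (3Gi)
}
```

The corner case in the comment is exactly that this pod-wide sum is wrong when a given volume is mounted by only some of the containers.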
Not sure if I understand what you mean. The "size" of a volume mount of type hugetlbfs is the page size used, not the amount of memory available. By default a program can use all the pre-allocated huge page memory from such a mount, but this is limited via the hugetlb cgroup. How to handle multiple containers with multiple hugepage sizes, together with different volumes, is maybe something we can incorporate into the KEP, but I think it should be OK as-is for now too. The reason we verify that the page sizes used in the mounts appear in the requests/limits: the problem arises when a pod uses a page size in a volume without having that size in the requests/limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: container
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages # alternatively HugePages-1Gi
```

This example can be valid on a given node if it supports 1GiB pages, but we cannot be sure, since the scheduler doesn't take huge page support into account when finding a node. This is a case we successfully validate during volume mounting today, but the pod spec is still "valid" in the sense that the apiserver will accept it. In the following pod we know that the node running it supports 1Gi pages, since container1 requests them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: container1
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      requests:
        hugepages-1Gi: 2Gi
      limits:
        hugepages-1Gi: 2Gi
  - name: container2
    volumeMounts:
    - mountPath: /hugepages-1Gi
      name: hugepage-1Gi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
  - name: hugepage-1Gi
    emptyDir:
      medium: HugePages-1Gi
```
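The consistency check being discussed — that every sized hugepage medium is backed by a matching resource request somewhere in the pod — can be sketched roughly like this. The helper names are hypothetical and plain strings stand in for the real v1 types:

```go
package main

import (
	"fmt"
	"strings"
)

// mediumPageSize extracts the page size suffix from an emptyDir medium
// like "HugePages-1Gi". The bare "HugePages" medium carries no explicit
// size. Hypothetical helper; the real logic lives elsewhere in k8s.
func mediumPageSize(medium string) (string, bool) {
	if strings.HasPrefix(medium, "HugePages-") {
		return strings.TrimPrefix(medium, "HugePages-"), true
	}
	return "", false
}

// volumeSizesCovered reports whether every sized hugepage medium in use
// is matched by a "hugepages-<size>" resource request in the pod.
func volumeSizesCovered(mediums []string, requests map[string]int64) bool {
	for _, m := range mediums {
		size, ok := mediumPageSize(m)
		if !ok {
			continue // bare "HugePages": size is resolved from the single request
		}
		if _, ok := requests["hugepages-"+size]; !ok {
			return false
		}
	}
	return true
}

func main() {
	reqs := map[string]int64{"hugepages-1Gi": 2 << 30}
	fmt.Println(volumeSizesCovered([]string{"HugePages-1Gi"}, reqs)) // true
	fmt.Println(volumeSizesCovered([]string{"HugePages-2Mi"}, reqs)) // false
}
```

Under this rule, the second example pod above passes (container1's hugepages-1Gi request covers the HugePages-1Gi medium even though container2 mounts it without requesting anything).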
/test pull-kubernetes-e2e-gce-storage-slow
@liggitt

> is it expected that pods today could be specifying these new hugepages mediums?

Yes, if they need to use multiple huge page sizes.
```diff
@@ -290,11 +290,11 @@ func (ed *emptyDir) setupHugepages(dir string) error {
 	}
 	// If the directory is a mountpoint with medium hugepages, there is no
 	// work to do since we are already in the desired state.
-	if isMnt && medium == v1.StorageMediumHugePages {
+	if isMnt && v1helper.IsHugePageMedium(medium) {
```
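The `v1helper.IsHugePageMedium` check introduced here presumably accepts both the bare medium and the size-suffixed form. A minimal self-contained sketch of that predicate, using plain strings instead of `v1.StorageMedium`:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	storageMediumHugePages       = "HugePages"  // bare medium, size inferred from requests
	storageMediumHugePagesPrefix = "HugePages-" // size-suffixed form, e.g. HugePages-2Mi
)

// isHugePageMedium sketches the shape of the helper used in the diff
// above: true for "HugePages" and for any "HugePages-<size>" medium.
func isHugePageMedium(medium string) bool {
	if medium == storageMediumHugePages {
		return true
	}
	return strings.HasPrefix(medium, storageMediumHugePagesPrefix)
}

func main() {
	fmt.Println(isHugePageMedium("HugePages"))     // true
	fmt.Println(isHugePageMedium("HugePages-2Mi")) // true
	fmt.Println(isHugePageMedium("Memory"))        // false
}
```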
previously, there could only be a single hugepages size per pod, so all hugepages mounts had to agree. with this PR, that is no longer the case, correct?
/retest
API validation changes lgtm; I'll defer the rest of the review to node/storage folks. Please assign me once those have lgtm and I'll add approval for the API bits.

/unassign
This implementation allows a Pod to request multiple hugepage resources of different sizes and to mount hugepage volumes using the storage medium HugePages-<size>, e.g.:

```yaml
spec:
  containers:
  - resources:
      requests:
        hugepages-2Mi: 2Mi
        hugepages-1Gi: 2Gi
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
    - mountPath: /hugepages-1Gi
      name: hugepage-1gi
  # ...
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
  - name: hugepage-1gi
    emptyDir:
      medium: HugePages-1Gi
```

NOTE: This is an alpha feature. The feature gate HugePageStorageMediumSize must be enabled for it to work.
Co-Authored-By: Odin Ugedal <odin@ugedal.com>
Extended the GetMountMedium function to check whether a hugetlbfs volume is mounted with a page size equal to the medium size. The page size is obtained from the 'pagesize' mount option of the mounted hugetlbfs volume.
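Extracting the page size from the mount options might look roughly like this; the function name is hypothetical and this is not the actual mount-utils code, just a sketch of the 'pagesize' option scan described above:

```go
package main

import (
	"fmt"
	"strings"
)

// pageSizeFromMountOptions scans hugetlbfs mount options (e.g.
// "rw,relatime,pagesize=2M") for the pagesize option, the way the
// extended GetMountMedium check identifies which hugetlbfs page size
// a mountpoint carries.
func pageSizeFromMountOptions(opts []string) (string, bool) {
	for _, o := range opts {
		if strings.HasPrefix(o, "pagesize=") {
			return strings.TrimPrefix(o, "pagesize="), true
		}
	}
	return "", false
}

func main() {
	size, ok := pageSizeFromMountOptions([]string{"rw", "relatime", "pagesize=2M"})
	fmt.Println(size, ok) // 2M true
}
```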
/retest
the kubelet changes still lgtm.

/approve

assign @liggitt
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bart0sh, derekwaynecarr, liggitt

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
```diff
-	if len(hugePageResources) > 1 {
-		allErrs = append(allErrs, field.Invalid(specPath, hugePageResources, "must use a single hugepage size in a pod spec"))
+	if !opts.AllowMultipleHugePageResources {
+		allErrs = append(allErrs, ValidatePodSingleHugePageResources(pod, specPath)...)
```
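A rough sketch of what the single-hugepage-size check enforces when the option is off, with plain strings standing in for `field.ErrorList` (illustrative only, not the real validation code):

```go
package main

import (
	"fmt"
	"strings"
)

// singleHugePageSizeErrors returns an error when more than one distinct
// hugepages-<size> resource name appears among a pod's resources,
// mimicking what ValidatePodSingleHugePageResources enforces when
// multiple hugepage resources are not allowed.
func singleHugePageSizeErrors(resourceNames []string) []string {
	sizes := map[string]bool{}
	for _, n := range resourceNames {
		if strings.HasPrefix(n, "hugepages-") {
			sizes[n] = true
		}
	}
	if len(sizes) > 1 {
		return []string{"must use a single hugepage size in a pod spec"}
	}
	return nil
}

func main() {
	fmt.Println(singleHugePageSizeErrors([]string{"hugepages-2Mi", "hugepages-1Gi"}))
	fmt.Println(singleHugePageSizeErrors([]string{"hugepages-2Mi", "cpu"})) // []
}
```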
if HugePageStorageMediumSize is enabled, should we validate the name and format etc. too? What if user gives an arbitrary string in this field?
if I understand correctly, the user could already give an arbitrary string in that field. if that's the case, we cannot easily tighten validation on an existing field
oh, this ValidatePodSingleHugePageResources is for validating whether multiple huge page sizes are specified in the resources. So it makes sense to only run it when the feature is disabled.

One scenario is rollback: if HugePageStorageMediumSize is enabled during pod creation, the pod can set multiple sizes. If the feature gate is later disabled, might updates to that pod fail validation?

I also checked a few cases; the following passes validation, which seems wrong:
```yaml
resources:
  limits:
    hugepages-xGi: 100Mi
volumes:
- name: hugepage
  emptyDir:
    medium: abc
```
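Tighter validation of the medium string, as asked about here, could take a shape like the following. This is only a sketch of the idea (the real code would parse the suffix with `resource.ParseQuantity`); the function name and the digits-plus-binary-unit check are my own simplification:

```go
package main

import (
	"fmt"
	"strings"
)

// validMediumSuffix roughly checks that a "HugePages-<size>" medium
// carries a plausible binary-suffix quantity. It accepts the bare
// "HugePages" medium and rejects arbitrary strings like "abc" or a
// malformed size like "xGi".
func validMediumSuffix(medium string) bool {
	if medium == "HugePages" {
		return true
	}
	if !strings.HasPrefix(medium, "HugePages-") {
		return false // e.g. medium "abc" would be rejected
	}
	size := strings.TrimPrefix(medium, "HugePages-")
	for _, unit := range []string{"Ki", "Mi", "Gi", "Ti"} {
		if strings.HasSuffix(size, unit) {
			digits := strings.TrimSuffix(size, unit)
			if digits == "" {
				return false
			}
			for _, r := range digits {
				if r < '0' || r > '9' {
					return false
				}
			}
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(validMediumSuffix("HugePages-2Mi")) // true
	fmt.Println(validMediumSuffix("abc"))           // false
	fmt.Println(validMediumSuffix("HugePages-xGi")) // false
}
```

As noted later in the thread, though, the field already accepted arbitrary strings, so tightening it retroactively is not straightforward.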
> If during pod rollback, MutipleHuagePageResource is disabled, it might fail to update pod?
that is addressed here:

kubernetes/pkg/registry/core/pod/strategy.go, lines 110 to 115 in 5698716:

```go
func (podStrategy) ValidateUpdate(ctx context.Context, obj, old runtime.Object) field.ErrorList {
	oldFailsSingleHugepagesValidation := len(validation.ValidatePodSingleHugePageResources(old.(*api.Pod), field.NewPath("spec"))) > 0
	opts := validation.PodValidationOptions{
		// Allow multiple huge pages on pod create if feature is enabled or if the old pod already has multiple hugepages specified
		AllowMultipleHugePageResources: oldFailsSingleHugepagesValidation || utilfeature.DefaultFeatureGate.Enabled(features.HugePageStorageMediumSize),
	}
```
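The rollback safety in that snippet boils down to a ratchet: multiple sizes are allowed on update if the gate is on or the old object already used them. A trivial sketch of the decision, with hypothetical names, just to make the truth table explicit:

```go
package main

import "fmt"

// allowMultiple captures the rollback-safe "ratchet" in ValidateUpdate:
// multiple hugepage sizes stay allowed if the feature gate is enabled OR
// the old pod already failed the single-size check, so existing pods
// remain updatable after the gate is turned off.
func allowMultiple(gateEnabled, oldHadMultiple bool) bool {
	return oldHadMultiple || gateEnabled
}

func main() {
	fmt.Println(allowMultiple(false, true))  // true: existing multi-size pod still updatable
	fmt.Println(allowMultiple(false, false)) // false: new multi-size use stays blocked
}
```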
the following can pass validation which seems not right
see discussion in #52936 (comment)
As part of the telco effort we should provide the possibility to use multiple huge page sizes on a node. Kubernetes supports this feature as alpha in 1.18; to enable it, enable the feature gate `HugePageStorageMediumSize`. See kubernetes/kubernetes#84051.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This is an implementation of the recently merged update of the hugepages KEP.
It was tested on a local cluster with allocated huge pages of two sizes:
With this pod configuration:
Both hugepage sizes were mounted correctly in the container:
Pod-level allocations for both huge page sizes look correct as well:
Does this PR introduce a user-facing change?: