Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resources outside the *kubernetes.io namespace are integers and cannot be over-committed. #48922

Merged

Conversation

ConnorDoyle
Copy link
Contributor

@ConnorDoyle ConnorDoyle commented Jul 14, 2017

What this PR does / why we need it:

Fixes #50473

Rationale: since the scheduler handles all resources except CPU as integers, that could just be the default behavior for namespaced resources.

cc @RenaudWasTaken @vishh

Release note:

Resources outside the `*kubernetes.io` namespace are integers and cannot be over-committed.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 14, 2017
@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 14, 2017
@ConnorDoyle
Copy link
Contributor Author

@vishh
Copy link
Contributor

vishh commented Jul 14, 2017

I'd split compute resources into three categories:

  1. First class resources - These resources are expected to be available across all kube deployments with predictable behavior. They typically include special handling across the system. Examples include cpu, memory and storage.
  2. Extended Resources - These resources aren't expected to be available across all kube deployments. These resources require special extensions at the node level, and possibly at the scheduler level too. Examples include acclerators, special NICs, etc.
  3. Opaque Resources - These resources aren't expected to be available across all kube deployments. These resources do not require any extensions across the system. Examples include Software licences.

I intend to introduce a more feature rich Compute Resource representation API in the future, which can allow for representing both integer and fractional resources across all the three categories mentioned above.

tl;dr; I don't think we should deprecate OIRs. We should rather have well defined user journeys for each of these categories and let users choose which one they want to use.
2 and 3 may be the same at the scheduler level, but I'm expecting the User Journeys to expose the differences.
cc @steveperry-53

@ConnorDoyle
Copy link
Contributor Author

@vishh that makes a lot of sense to me. So currently the first-class resources live in the default namespace, also available to without fully-qualifying the name. In your mind would it be another namespace that distinguishes extended resources from opaque resources? IIUC that was the function of the proposed extensions.kubernetes.io/ namespace we talked about.

@ncdc
Copy link
Member

ncdc commented Jul 17, 2017

/unassign
/assign @timothysc @derekwaynecarr @sjenning (this seems up your alley)

@timothysc
Copy link
Member

@ConnorDoyle What's the user story behind this? Resource accounting? I keep going back and forth, but it would help me a lot if you had a concrete user story.

@ConnorDoyle
Copy link
Contributor Author

@timothysc here are some (others please chime in if I missed the mark):

  • As a Kubelet contributor designing the device plugin subsystem, I want a way to reliably distinguish "opaque" resources from "extended" resources, where extended resources require the Kubelet to take isolation/monitoring/cleanup actions, and opaque resources do not.
  • As a device plugin author, I want to advertise new resources for each device managed by my plugin so that users can consume those devices in pod specs.
  • As a Kubernetes pod author, I want to ensure my pod is scheduled on a node that has sufficient devices that my workload needs, and for those devices to be appropriately exposed inside my pod's containers.

The device plugin proposal (linked in description) takes a different approach by extending the node spec to have DeviceCapacity instead of reusing the existing capacity and allocatable maps. So, this is a mini-counter proposal about how device resources are accounted.

func IsOpaqueIntResourceName(name api.ResourceName) bool {
return strings.HasPrefix(string(name), api.ResourceOpaqueIntPrefix)
return IsCustomNamespacedResourceName(name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alternatively do something like:

func IsIntResourceName(name api.ResourceName) bool {
    return IsOpaqueIntResourceName(name) || IsExtendedResourceName(name)
}

And then update current uses of IsOpaqueIntResourceName() in the scheduler to use IsIntResourceName() instead. This would preserve the existing OIR prefix behavior and let resources with the new "extensions.kubernetes.io/" prefix ride along with the existing scheduler logic for OIRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidopp I like this change in general. It allows for exposing arbitrary compute resources in all domains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishh do you mean you like the patch as written, or the change suggested in this comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point in adding more and more special cased prefixes in the API.
I like the current proposal. Any resources without a domain name or with kubernetes.io domain name will be treated as first class resources.
Everything else will be treated as integer resources until we have a better API representation for resources that allows to represent non-integer resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this also include "alpha.kubernetes.io/xxx" or ResourceStorageScratch or ResourceStorageOverlay? Generally I feel it is probably safer to start with:
func IsIntResourceName(name api.ResourceName) bool {
return IsOpaqueIntResourceName(name) || IsExtendedResourceName(name)
}
to make sure we don't break the current resource behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue of extensibility across quota, limitrange, UI, etc., is beyond just the APIs. I'd recommend deferring it since it requires changes to the rest of the system as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayingz pointed out that extensions.kubernetes.io is also part of the domain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see if I understand. With the suggested change, alpha.kubernetes.io/*, beta.kubernetes.io/*, kubernetes.io/*, and any resource not of the form xxx/* will be considered a first class resource. All resources of the form xxx/* that does not match alpha.kubernetes.io/*, beta.kubernetes.io/*, or kubernetes.io/* will be considered opaque integer resources (or extended resources, which I admit I haven't read up on, so I don't really know what the desired semantics are for that). Is that correct?

If so, it seems fine to me, assuming it doesn't break any existing resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidopp Any resource that's of the form [xxx.]*.kubernetes.io/ or without a domain name is considered to be first class. So alpha.kubernetes.io or foo.bar.kubernetes.io or storage.kubernetes.io are all first class resources.

Everything else with a domain name that's not kubernetes.io will be considered as extended resource which are opaque to the scheduler and possibly opaque to the node.
To begin with all these extended resources are considered to be integers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems fine. It's really hard to predict how/when a decision like this will bite us later on, but I can't see any reason right now why it might be a problem...

@soltysh soltysh removed their assignment Jul 31, 2017
@mwielgus mwielgus removed their assignment Aug 7, 2017
@vishh
Copy link
Contributor

vishh commented Aug 7, 2017

Other resource APIs like Quota, Limit Range, etc., aren't extensible yet. So this PR will lead to feature requests for Quota and Limit range API extensions @davidopp

@vishh
Copy link
Contributor

vishh commented Aug 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2017
// Note: partially-qualified (unprefixed) names are assumed by default to be
// in the kubernetes.io/ namespace.
func IsCustomNamespacedResourceName(name api.ResourceName) bool {
return strings.Contains(string(name), "/") &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since api.ResourceDefaultNamespacePrefix ends with a slash, why do you need this line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, never mind

@thockin
Copy link
Member

thockin commented Aug 10, 2017

This needs an associated issue and a closes #12345 in the FIRST COMMENT, or it can not be approved.

@thockin thockin self-assigned this Aug 10, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017
@ConnorDoyle
Copy link
Contributor Author

Updated to fix unit test failure.

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 15, 2017
@ConnorDoyle
Copy link
Contributor Author

ConnorDoyle commented Aug 15, 2017

@davidopp @vishh @thockin updated.

The previous OIR behavior is not opt-in yet. I would like to work on that in a follow-up in order to unblock device plugins. It's blocked on fixing #50658.

Rationale: if we make the OIR behavior opt-in, then the existing alternative behavior of ignoring the resource request would be a very bad surprise for users. We need to reject unknown first-class resources before making the change so that users get a clear fail-fast signal that they have done something wrong.

// Returns true if the resource name is not in the *kubernetes.io/ namespace.
// Note: partially-qualified (unprefixed) names are assumed by default to be
// in the kubernetes.io/ namespace.
func IsCustomNamespacedResourceName(name v1.ResourceName) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the inverse of the IsDefaultNamespaceResource() function in pkg/api/helper/helpers.go. Can you at least add a comment to both of them so that someone in the future who changes them will ensure they stay in sync?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't think we need both. Will remove one of them. Good catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@ConnorDoyle ConnorDoyle force-pushed the integer-resources-as-default branch 2 times, most recently from d5e77f1 to 6d4d735 Compare August 16, 2017 22:22
@ConnorDoyle ConnorDoyle changed the title Extend OIR behavior to all non-default-namespaced resources. Resources outside the *kubernetes.io namespace are integers and cannot be over-committed. Aug 16, 2017
@vishh
Copy link
Contributor

vishh commented Aug 16, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2017
@thockin
Copy link
Member

thockin commented Aug 17, 2017

Is this right? Given the design for device plugins, do we believe that all such plugin extensions represent integer-countable resources? I'll approve, because Vish is on it, but I'd like that spelled out somewhere.

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorDoyle, thockin, vishh

Associated issue: 50473

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2017
@ConnorDoyle
Copy link
Contributor Author

@thockin it should be included in the docs for the device plugin feature (at least). Some of the discussion happened on Vish's hardware accelerator proposal. To summarize, we didn't want to encode per-resource attributes like fractional-ness and overcommit-ibility in the resource name, and there's no API structure yet for them. Our thought is that this approach has a decent chance of being forward-compatible if one is added.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 46317, 48922, 50651, 50230, 47599)

@k8s-github-robot k8s-github-robot merged commit ce1485c into kubernetes:master Aug 17, 2017
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Aug 17, 2017

@ConnorDoyle: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-etcd3 630af54 link /test pull-kubernetes-e2e-gce-etcd3

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ConnorDoyle ConnorDoyle deleted the integer-resources-as-default branch August 17, 2017 03:00
@vishh
Copy link
Contributor

vishh commented Aug 17, 2017

@thockin

do we believe that all such plugin extensions represent integer-countable resources?

The plan is to introduce a better API for representing resources (as discussed offline) which would then allow for opening up sharing of extended resources. Until then, extended resources are expected to not allow sharing. If we go with the option of allowing sharing now, then taking that away when a new API pipeline gets introduced will become complicated.

@jiayingz
Copy link
Contributor

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet