Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - introduce ratcheting validation mechanism #64907

Closed
wants to merge 1 commit into from

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Jun 8, 2018

What this PR does / why we need it:

Introduces a ratcheting validation mechanism, as described in #64841 (comment)

Tightening validation on existing data cannot be done in a way that prevents existing stored objects from being updated.

This PR introduces the following mechanism:

  • identify granular additional validations (usually to fix fatal errors that always result in persisted, yet broken objects)
  • apply those validations on create to prevent new broken objects from being created
  • apply those validations on update only if the existing object also passes the validation
    • this prevents valid objects from becoming invalid
    • this avoids breaking the ability to update existing invalid objects

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #64841
Fixes #57510 - flex driver name
Fixes #58477 - duplicate envvar names
Fixes #54567, #64011 - invalid memory quantities (xref #63426)
xref #52936 - storage medium
xref #60934 - duplicate pvc claimName

Special notes for your reviewer:
/assign @lavalamp @deads2k

Release note:

Validation has tightened when creating new pods (or new objects containing pod templates):
* flex volume driver names must now be valid
* emptyDir medium must now be valid
* duplicate envvar names are not allowed
* duplicate volumes referencing the same PVC claimName are not allowed
* memory limits and requests must be an integer value

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 8, 2018
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 8, 2018
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2018
@liggitt
Copy link
Member Author

liggitt commented Jun 8, 2018

@kubernetes/sig-api-machinery-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 8, 2018
@liggitt
Copy link
Member Author

liggitt commented Jun 8, 2018

would like feedback on the mechanism first, then will tag various teams on the specific validation commits (those were mostly to illustrate how the mechanism could be used)

@liggitt liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 8, 2018
@liggitt
Copy link
Member Author

liggitt commented Jun 8, 2018

/retest

spec := data.(*core.PodSpec)
for i, c := range spec.InitContainers {
names := sets.NewString()
for j, e := range c.Env {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems like these loops could be a subroutine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up the loops

@lavalamp
Copy link
Member

lavalamp commented Jun 8, 2018

Looks good overall.

I do wish there were less setup code: folks have to remember to call this everywhere there's a pod template type. But it is fairly straightforward, maybe that is better than a complicated automated system. We can always automate when we find people making errors that the automation would catch.

@lavalamp
Copy link
Member

lavalamp commented Jun 8, 2018

We should think about how to do this problem for CRD validation, too, when we add it. Perhaps there all validations should be ratcheting.

@sttts
Copy link
Contributor

sttts commented Jun 11, 2018

We should think about how to do this problem for CRD validation, too, when we add it. Perhaps there all validations should be ratcheting.

This is getting interesting. For native types we only have positive validation predicates A(old,new) and B(old,new) and C(old,new). So we can easily declare some as ratcheting (call it A', B', C'), making the complete validation predicate A'(old,new) and B'(old,new) and C'(old,new) weaker than the original. With CRDs and their OpenAPI based validation, we don't have positivity, e.g. ((not A(old,new)) or B(old,new)) and C(old,new) is expressible. Making A ratcheting as A' gives a stronger predicate than before, not what we want. So ratcheting has to be restricted to positive predicates.

Another speciality for CRDs: for defaulting we have to apply the OpenAPI schema before we even have an old object to check against. We could get away by tagging certain OpenAPI properties as ratcheting, drop them on (coercing) schema validation for defaulting and only apply them on strategy validation.

But even this way, it is not obvious how a ratcheting validation with an old and a new object will look like. The validation algorithm and the whole OpenAPI schema semantics does not know the concept of validation of differences between two objects on update. If we come up with such semantics though, this would also be very interesting to formulate "read-only field", which cannot be espressed today in OpenAPI at all.

@lavalamp
Copy link
Member

lavalamp commented Jun 11, 2018 via email

@liggitt
Copy link
Member Author

liggitt commented Jun 11, 2018

@sttts I assumed we would be writing a validator that takes both old and new objects, and only enforces a validation check on the new object if the old one passes (or is missing the relevant field). I am talking about value validation and not schema validation, of course.

One of the reasons the approach in this PR works well is that the tightening validations are separable. Take the following scenario:

  • I have base validation V, and tightening validations A, B, C.
  • I have an old object that passes V, A, B, and fails C

Any update would also be required to pass V, A, B. If V, A, B, and C could not be checked individually, the old object failing C means we'd have to skip all validation on the updated object, which would open us up to objects with much bigger validation problems.

@sttts is the declarative validation able to be broken down to the level of individual checks?

@mbohlool
Copy link
Contributor

/sub

@sttts
Copy link
Contributor

sttts commented Jun 11, 2018

@sttts is the declarative validation able to be broken down to the level of individual checks?

No, in general it is not. But one knows which parts of a schema can be "broken apart" in the way you describe. So I think we can mark certain sub-schemata as ratcheting and the CRD type validation will check those markers. E.g. in ((not A(old,new)) or B(old,new)) and C(old,new) only B and C are positive and therefore can be ratcheting (or some sub-schema of them).

@sttts
Copy link
Contributor

sttts commented Jun 11, 2018

And yes, we will definitely be implementing read-only/immutable fields, for
built-ins and CRs. Take a look at some of the stuff @apelisse and @seans3
have been doing in kube-openapi if you haven't seen it already.

@lavalamp If you have a link, that would be awesome. This sounds interesting. I saw the extension PRs. Do you mean those to express new kube-specific properties?

We can certainly make this work using the restriction to positive sub-schemata (see post above). I am not completely convinced though yet that this always gives the semantics we want and that users expect, especially because schema and value validation are not separable: some value sub-schemata are required to select the right branches of AllOf/AnyOf/OneOf quantors. This needs some more thought.

@yliaog
Copy link
Contributor

yliaog commented Jun 11, 2018

/cc @yliaog

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2018
@bgrant0607
Copy link
Member

Sorry for not noticing this earlier.

I commented on #64841

@liggitt
Copy link
Member Author

liggitt commented Jul 17, 2018

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2018
@bgrant0607
Copy link
Member

@liggitt Is the idea that every release the ratcheting validations would be moved to standard ones, and potentially new ratcheting validations would be introduced?

@liggitt
Copy link
Member Author

liggitt commented Jul 17, 2018

Is the idea that every release the ratcheting validations would be moved to standard ones, and potentially new ratcheting validations would be introduced?

No, they couldn't be moved to be standard validations until we could guarantee the API servers would never encounter persisted etcd data that would fail those validations.

@tallclair
Copy link
Member

/lgtm cancel
See #64841 (comment)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

@liggitt: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 17, 2018
@erictune
Copy link
Member

Hi, sorry to be late to the party, but some here is some feedback on this approach:

  • Yes, we should have CRD support for ratcheting, and it should ideally be very similar to the built-in support.
  • Annotating or modifying the CRD schema to express RatchetingValidations seems to me like it scales poorly.
  • Instead of having special RatchetingValidation functions, instead, one could instead introduce a new api version with stronger validation. Then one can set the desired storage version (as CRDs do) to lazily move over objects or run a "storage version upgrade script" procedure to eagerly move over objects.
  • To address the complaint that we don't want to move pods to v2, we could introduce a "microversion" and the validation uses both apiversion and metadata.microversion to compose a tuple which is used to select what functions/schema is used for validation.

@liggitt
Copy link
Member Author

liggitt commented Jul 21, 2018

Instead of having special RatchetingValidation functions, instead, one could instead introduce a new api version with stronger validation. Then one can set the desired storage version (as CRDs do) to lazily move over objects or run a "storage version upgrade script" procedure to eagerly move over objects.

What would the storage migration do with existing objects that would not pass the new stricter validation?

As long as the previous API version is still usable (or data persisted via the previous API version is still able to be encountered), we have to handle round-tripping and updating that object via the new API version. Making an update of an unrelated field (like an annotation or a finalizer) fail for validation reasons is problematic.

@bgrant0607
Copy link
Member

Quick note: We don't have versioned validation yet: #8550

allErrs := validation.ValidateCronJob(cronJob)
allErrs = append(allErrs, apimachineryvalidation.ValidateRatchetingCreate(
&cronJob.Spec.JobTemplate.Spec.Template.Spec,
field.NewPath("spec", "template", "spec", "template", "spec"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: field.NewPath("spec", "jobTemplate", "spec", "template", "spec")

@mariantalla
Copy link
Contributor

@hoegaarden and I were also wondering what you thought about the following:

  1. The capability of also applying preprogrammed "migrations" to invalid objects? In other words, the next step after a failed validation could be to apply some logic to automatically reconcile it/ make it compatible with the new api version.

  2. The validators being extracted from the body of the api server. This is perhaps more of a design consideration, but could make it easier to

  • maintain validator collections in future versions of the API server
  • maybe even allow operators to define their own additional validators and configure the API server with those. Maybe admission webhooks already offer the tools for that use case though. 🤔

@dims
Copy link
Member

dims commented Aug 23, 2018

@liggitt should we shoot for v1.12?

@liggitt
Copy link
Member Author

liggitt commented Aug 24, 2018

should we shoot for v1.12?

This isn't a top priority for the next week, so no

@liggitt liggitt changed the title introduce ratcheting validation mechanism WIP - introduce ratcheting validation mechanism Oct 29, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 29, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 27, 2019
@liggitt liggitt closed this Feb 9, 2019
@smarterclayton
Copy link
Contributor

Fate of this?

@lavalamp
Copy link
Member

@smarterclayton It's solvable but with much more difficulty-- read from this comment to the end of the issue.

@liggitt
Copy link
Member Author

liggitt commented Feb 19, 2019

@smarterclayton It's solvable but with much more difficulty-- read from this comment to the end of the issue.

If we limit ourselves to ratcheting in cases where the created objects were fatally flawed, I don't think we need to continue to accept invalid objects. I mostly closed this because I wasn't actively working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet