Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043 #1044

Merged
merged 8 commits into from
Apr 2, 2021
Merged

fix: Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043 #1044

merged 8 commits into from
Apr 2, 2021

Conversation

anderson4u2
Copy link
Contributor

@anderson4u2 anderson4u2 commented Mar 23, 2021

Closes #1043

As Pedro Arvela explained in #1043, #480 removed the validation for 'resources' field due to the inability to support float types for the validation. However we believe this is causing k8s to prune the undefined fields in the schema.

In this MR, I am re-adding the validations removed by #480 (and a couple that were probably missed) as fields that won't be pruned, will not break with float values, and so "resources" won't be pruned and will work as expected.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

Signed-off-by: Anderson Silva <hello@anderson.codes>
@anderson4u2 anderson4u2 changed the title fix(crd): Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043. fix: Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043. Mar 23, 2021
@anderson4u2 anderson4u2 changed the title fix: Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043. fix: Fixes the regression of dropping resources from argo-rollouts crds. Mar 23, 2021
@codecov
Copy link

codecov bot commented Mar 23, 2021

Codecov Report

Merging #1044 (43c3589) into master (bff46e1) will increase coverage by 0.08%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1044      +/-   ##
==========================================
+ Coverage   80.91%   81.00%   +0.08%     
==========================================
  Files         102      103       +1     
  Lines        9097     9166      +69     
==========================================
+ Hits         7361     7425      +64     
- Misses       1242     1246       +4     
- Partials      494      495       +1     
Impacted Files Coverage Δ
utils/unstructured/unstructured.go 56.52% <0.00%> (-1.26%) ⬇️
pkg/kubectl-argo-rollouts/cmd/cmd.go 100.00% <0.00%> (ø)
pkg/kubectl-argo-rollouts/cmd/status/status.go 93.44% <0.00%> (ø)
metricproviders/newrelic/newrelic.go 92.00% <0.00%> (+0.69%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bff46e1...43c3589. Read the comment docs.

@anderson4u2 anderson4u2 changed the title fix: Fixes the regression of dropping resources from argo-rollouts crds. fix: Fixes the regression of dropping resources from argo-rollouts crds. Fixes #1043 Mar 23, 2021
Signed-off-by: Anderson Silva <hello@anderson.codes>
@jessesuen
Copy link
Member

jessesuen commented Mar 24, 2021

I'm unable to reproduce this issue using the sample gist in https://gist.github.com/PedroArvela/66dec9c51ce0872caf272f90603e28e8 and Kubernetes v1.20.2. Is this specific to kubernetes version?

@jessesuen
Copy link
Member

jessesuen commented Mar 24, 2021

Note that when we run controller-gen, we do not run it with preserveUnknownFields=false, meaning we are telling Kubernetes not to prune unknown fields:

	crdYamlBytes, err := exec.Command(
		"controller-gen",
		"paths=./pkg/apis/rollouts/...",
		"crd:trivialVersions=true",
		// cannot use preserveUnknownFields=false until controller-gen generates proper support for
		// resource.Quantity, which we remove validation for
		//"crd:preserveUnknownFields=false",
		"crd:crdVersions=v1beta1",
		"output:crd:stdout",
	).Output()

So I do not understand how this could happen.

@jessesuen
Copy link
Member

OK I understand what's going on. From kubernetes documentation:

https://v1-18.docs.kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#preserving-unknown-fields

For CustomResourceDefinitions created in apiextensions.k8s.io/v1, structural OpenAPI v3 validation schemas are required and pruning is enabled and cannot be disabled (note that CRDs converted from apiextensions.k8s.io/v1beta1 to apiextensions.k8s.io/v1 might lack structural schemas, and spec.preserveUnknownFields might be true).

In other words, kubernetes no longer supports the preservation of unknown fields as a top-level option, as of apiextensions.k8s.io/v1. But because Argo Rollouts still vends apiextensions.k8s.io/v1beta1 CRD definitions, we were able to get away with disabling this. So instead of relying on the deprecated spec.preserveUnknownFields ability, we must preserve unknown fields explicitly through the custom open API annotations which is being added in this PR.

I notice that when I get the v1 CRD object, i see the following condition:

status:
  conditions:
  - lastTransitionTime: "2021-03-19T11:33:45Z"
    message: 'spec.preserveUnknownFields: Invalid value: true: must be false'
    reason: Violations
    status: "True"
    type: NonStructuralSchema

So technically speaking, Argo Rollouts is vending a CRD that violates current CRD requirements and this PR is fixing that violation.

@jessesuen
Copy link
Member

@anderson4u2 - this PR is valid, but I'm still curious how you wound up in this situation, given that Argo Rollouts vends the v1beta1 CRDs.

@jsoref
Copy link
Member

jsoref commented Mar 24, 2021

The argo-helm repo is vending 1.0: https://github.com/argoproj/argo-helm/blob/master/charts/argo-rollouts/templates/crds/rollout-crd.yaml

And it's impossible to merge anything into that repository without vending 1.0.

pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
type: object
x-kubernetes-preserve-unknown-fields: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still understanding the implications of dropping the resource quantity validation here, and why this only applied to ephemeral containers. Once I do that, the PR looks good to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in theory one would want to provide a full spec. But if you can't, then providing a "we don't know what belongs here, we're labeling it dragons -- here be dragons". This allows the dragons to live here.

Ideally someone will later spend the time to properly spec it out and replace the dragon gate.

But, also, when they do that, (probably in a new version,) it'll force people to revalidate the data that's hiding there instead naively trusting it (which is the reason k8s to force default omit -- to prevent pre-poisoning)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I'm only setting the not-prune on values of resources/requests and resources/limits. So it limits the scope for failure to only those fields, not its subfields nor its parent fields, which limits the scope for failure significantly.

Signed-off-by: Anderson Silva <hello@anderson.codes>
@anderson4u2
Copy link
Contributor Author

Hello, I've pushed another commit to upgrade the API to v1 CRD. Let me know your thoughts. I've removed the function removeNestedItems as it seems it hasn't been run in the argo-helm templates, and we're running that without any problems. Any feedback greatly appreciated, thanks!

Signed-off-by: Anderson Silva <hello@anderson.codes>
@jessesuen
Copy link
Member

Great! will take another look

@jessesuen
Copy link
Member

jessesuen commented Mar 26, 2021

I've removed the function removeNestedItems as it seems it hasn't been run in the argo-helm templates, and we're running that without any problems.

In order for you to notice any problems, you would have to have had a volume like:

  volumes:
    # You set volumes at the Pod level, then mount them into containers inside that Pod
    - name: config
      configMap:
        # Provide the name of the ConfigMap you want to mount.
        name: game-demo
        # An array of keys from the ConfigMap to create as files
        items:
        - key: "game.properties"
          path: "game.properties"
        - key: "user-interface.properties"
          path: "user-interface.properties"

Before removing this, can you test that this works with the new CRD?

Signed-off-by: Anderson Silva <hello@anderson.codes>
Signed-off-by: Anderson Silva <hello@anderson.codes>
Signed-off-by: Anderson Silva <hello@anderson.codes>
@anderson4u2
Copy link
Contributor Author

I've removed the function removeNestedItems as it seems it hasn't been run in the argo-helm templates, and we're running that without any problems.

In order for you to notice any problems, you would have to have had a volume like:

  volumes:
    # You set volumes at the Pod level, then mount them into containers inside that Pod
    - name: config
      configMap:
        # Provide the name of the ConfigMap you want to mount.
        name: game-demo
        # An array of keys from the ConfigMap to create as files
        items:
        - key: "game.properties"
          path: "game.properties"
        - key: "user-interface.properties"
          path: "user-interface.properties"

Before removing this, can you test that this works with the new CRD?

Hi, I've checked that it works with k8s 1.17+. Tested it with k8s 1.16 and got this error, the merge that fixed it was this, which was released in 1.17. I've also added a test in the test-e2e that fails in k8s 1.16 but succeeds in 1.17+.

@jessesuen
Copy link
Member

Hi, I've checked that it works with k8s 1.17+. Tested it with k8s 1.16 and got this error, the merge that fixed it was this, which was released in 1.17. I've also added a test in the test-e2e that fails in k8s 1.16 but succeeds in 1.17+.

I just polled the community in slack and looks like we still have a significant portion of our users on v1.16 or below (4/10 responses at time of writing).

So I think it's better to preserve the existing behavior in order to continue to support our v1.16/v1.15 users. Could you add back removeNestedItems to support this?

Signed-off-by: Anderson Silva <hello@anderson.codes>
@sonarcloud
Copy link

sonarcloud bot commented Apr 2, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@anderson4u2
Copy link
Contributor Author

Hi, I've checked that it works with k8s 1.17+. Tested it with k8s 1.16 and got this error, the merge that fixed it was this, which was released in 1.17. I've also added a test in the test-e2e that fails in k8s 1.16 but succeeds in 1.17+.

I just polled the community in slack and looks like we still have a significant portion of our users on v1.16 or below (4/10 responses at time of writing).

So I think it's better to preserve the existing behavior in order to continue to support our v1.16/v1.15 users. Could you add back removeNestedItems to support this?

Hi, I've checked with the refactored function but I still got errors when trying to deploy a configmap volume in k8s 1.16. I've unvalidated the whole volumes block and it now works. I've added this unvalidation to all CRDs. Let me know your thoughts :)

Copy link
Member

@jessesuen jessesuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement!

Comment on lines +79 to +84
// The only possible value is 'false' since 'apiextensions.k8s.io/v1'
// https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#field-pruning
// It is possible though to opt-out of pruning for specifc sub-trees of fields by adding x-kubernetes-preserve-unknown-fields: true
// by using the 'setValidationOverride' function in this file.
"crd:preserveUnknownFields=false",
"crd:crdVersions=v1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, glad to help!

@jessesuen jessesuen merged commit 01adc2e into argoproj:master Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource requests/limits are ignored by Kubernetes
3 participants