operator: fix crash that occurs when cep CRD is disabled and CES enabled with kvstore #25798
hey @doniacld
thanks for the PR - looks good so far! only some proposals regarding the failing checks and missing labels from my side.
Checks
looks like the check BPF checks / checkpatch is failing due to the long commit message subject.
Error: ERROR:CUSTOM: Please avoid long commit subjects (max: 75, found: 85)
Not sure what's going on with the other failing Cilium Runtime checks that have been introduced lately. It might be worth rebasing onto main to fetch the latest version while rewording the commit message.
Labels
you should also add a relevant release note label to this PR (https://docs.cilium.io/en/latest/contributing/development/contributing_guide/#submitting-a-pull-request - 11.) - IMO release-note/misc should be appropriate in this case (which doesn't require an explicit release note).
it's also helpful to add some labels to bring the PR into context - e.g. area/operator, kind/bug
Thanks @mhofstetter for your feedback! I added the labels you mentioned and shortened my commit subject.
@doniacld thanks for adding the labels and fixing the commit subject.
The check in the Go code when starting the operator looks good to me.
But it might be worth adding this check to the Helm chart too, to prevent an installation with these incompatible values at an earlier stage. This would also cover an installation via the Cilium CLI (in Helm mode).
The two config values are set in the configmap (disableEndpointCRD & enableCiliumEndpointSlice).
I'd recommend placing a check based on the two Helm values .Values.enableCiliumEndpointSlice & .Values.disableEndpointCRD in the Helm validation file (https://github.com/cilium/cilium/blob/main/install/kubernetes/cilium/templates/validate.yaml)
Thanks for your comment regarding the helm check, we had this conversation with @gandro who gave me some history regarding the |
Thanks @doniacld patch LGTM!
@doniacld thanks for adding the helm validation!
looks like the docker build failed due to some infra issues (and all dependent checks failed with it).
i'd recommend rebasing onto main to re-trigger all basic checks. In addition, you can trigger the e2e tests by commenting /test (in a single comment)
@tommyp1ckles your review became mandatory (with sig-k8s). please chime in :)
@doniacld thanks for the heads-up. in general i prefer having explicit types. it's also nice that helm automatically converts the strings to booleans. it's just that setting
i don't know whether we want that and whether the note in the upgrade guide is enough 🤷‍♂️ maybe a question for more seasoned cilium team members? @qmonnet @kaworu
Right, I'm no Helm expert but this sounds like the ideal thing to introduce bugs, good catch. Is there a way to add some validation on the type (or value), so that users get an error if they pass |
@qmonnet Schema validation is possible with Helm - but we don't use it yet 🤔 https://helm.sh/docs/topics/charts/#schema-files - otherwise we have to enforce an explicit type-check in the if itself
@@ -131,9 +131,9 @@ data:
skip-cnp-status-startup-clean: "{{ .Values.operator.skipCNPStatusStartupClean }}"
{{- end }}
{{- if hasKey .Values "disableEndpointCRD" }}
{{- if .Values.disableEndpointCRD }}
to enforce an explicit type check, we have to change this to {{- if eq .Values.disableEndpointCRD true }}
this results in an error when a value of an incompatible type is passed:
error calling eq: incompatible types for comparison
Updated and tested.
Configuration with the wrong type
enableCiliumEndpointSlice: false
disableEndpointCRD: "true"
Error is as expected
Error: template: cilium/templates/cilium-configmap.yaml:134:7: executing "cilium/templates/cilium-configmap.yaml" at <eq .Values.disableEndpointCRD true>: error calling eq: incompatible types for comparison
Configuration with the wrong type BUT enableCiliumEndpointSlice: true
enableCiliumEndpointSlice: true
disableEndpointCRD: "true"
The type error is not raised here, presumably because validate.yaml is evaluated first.
Error: execution error at (cilium/templates/validate.yaml:76:7): if Cilium Endpoint Slice is enabled (.Values.enableCiliumEndpointSlice=true), it requires .Values.disableEndpointCRD=false
@doniacld great
we're missing one case:
enableCiliumEndpointSlice: true
disableEndpointCRD: "false"
the validation error gets thrown even though the endpointCRD isn't actually disabled
❯ helm template ./install/kubernetes/cilium -f ./contrib/testing/kind-values.yaml
Error: execution error at (cilium/templates/validate.yaml:76:7): if Cilium Endpoint Slice is enabled (.Values.enableCiliumEndpointSlice=true), it requires .Values.disableEndpointCRD=false
-> we should change the validation to enforce type checks too
{{- if and (eq .Values.enableCiliumEndpointSlice true ) (eq .Values.disableEndpointCRD true ) }}
{{ fail "if Cilium Endpoint Slice is enabled (.Values.enableCiliumEndpointSlice=true), it requires .Values.disableEndpointCRD=false" }}
{{- end }}
Little update:
Config enableCiliumEndpointSlice:false & disableEndpointCRD: "true"
enableCiliumEndpointSlice: false
disableEndpointCRD: "true"
Error
Error: template: cilium/templates/cilium-configmap.yaml:134:7: executing "cilium/templates/cilium-configmap.yaml" at <eq .Values.disableEndpointCRD true>: error calling eq: incompatible types for comparison
Config enableCiliumEndpointSlice:true & disableEndpointCRD: "true"
enableCiliumEndpointSlice: true
disableEndpointCRD: "true"
Error
Error: template: cilium/templates/validate.yaml:75:9: executing "cilium/templates/validate.yaml" at <eq .Values.disableEndpointCRD true>: error calling eq: incompatible types for comparison
Helm command enableCiliumEndpointSlice=true & disableEndpointCRD="true"
--set enableCiliumEndpointSlice=true --set-string disableEndpointCRD="true"
Error
Error: template: cilium/templates/validate.yaml:75:9: executing "cilium/templates/validate.yaml" at <eq .Values.disableEndpointCRD true>: error calling eq: incompatible types for comparison
Now we should be good!
thanks @doniacld
I was thinking more of something along what we have in |
Ah, just seeing your suggestion now on |
Nice work! The helm validation makes me wonder if we should one day unify the config incompatibility checks that make the operator exit with the helm-based validation (failing to install is a lot better than crashing).
/test
Commit 8764de570cb04cbf5709beb91feaef869bfa4170 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
kvstore Make the operator crash early in config.go if CiliumEndpointSlice is enabled and the CiliumEndpoint CRD is disabled. Note that the CiliumEndpointSlice feature needs CiliumEndpoint CRDs to run. In operator/cmd/root.go, remove the condition on the CiliumEndpoint CRD since, at this point of the lifecycle, the operator should already have crashed if we end up in this case. This fix is only relevant for kvstore mode, because in identity-allocation-mode CRD, we forcefully enable the CiliumEndpoint CRD. Fixes cilium#24396 Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com>
Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com>
…d too This commit adds a supplementary check during deployment to prevent installing Cilium with the incompatible flags enableCiliumEndpointSlice=true and disableEndpointCRD=true. Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com>
/test |
Description
Make the operator crash early if CiliumEndpointSlice (CES) is enabled and the CiliumEndpoint (CEP) CRD is disabled, since it does not make sense to run CES without CEP.
Add a check at Helm installation level.
Fixes #24396
Test
Test at runtime
Add the following configuration to kind-values.yaml and deploy a kind cluster
NB: Tested with and without
identityAllocationMode: "kvstore"
Check the logs from the operator
Test during deployment
The error is properly raised.