New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
helm chart: add helm option to customize nodeinit scripts #24375
Conversation
AFAIK this is already being addressed with #24288 - but would be nice as a feature not in particular related to the linked issue 馃憤 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me! Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there is an issue with the generated artifacts:
From CI:
+++ /home/runner/work/cilium/cilium/install/kubernetes/cilium/values.yaml 2023-03-15 12:38:31.571414468 +0000
@@ -2168,7 +2168,7 @@
# -- bootstrapFile is the location of the file where the bootstrap timestamp is
# written by the node-init DaemonSet
bootstrapFile: "/tmp/cilium-bootstrap.d/cilium-bootstrap-time"
-
+
# -- startup offers way to customize startup nodeinit script (pre and post position)
startup:
preScript: ""
error: cilium/values.yaml has been modified
Make sure to apply your changes to cilium/values.yaml.tmpl and run 'make -C install/kubernetes cilium/values.yaml' to regenerate.
make: *** [Makefile:49: check-values-yaml] Error 1
make: Leaving directory '/home/runner/work/cilium/cilium/install/kubernetes'
b6534d9
to
88a0934
Compare
@gandro done |
Thank you. There still seems to be an issue with some generated artifacts, should be fixed if you run Could you also please squash the commits into on? Thanks!
|
48cbc77
to
6bc09b2
Compare
Hopefully the last issue: Documentation lint is failing because of
|
6bc09b2
to
f3a5649
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, assuming CI also agrees
/test |
this ability to run custom script in node-init startup daemonset container is indeed very useful. Thanks @mblaschke On AWS EKS clusters, which come online with aws-vpc-cni addon installed automatically (EKS cluster create API still don't provide any way to disable the addon during cluster create, only allows disabling after the addon is installed : aws/containers-roadmap#923 open for a long time without resolution), running cilium in overlay routing mode (tunnel=enabled and not eni enabled) requires cleaning up some leftover residual iptables rules created by aws-vpc-cni (deleting and disabling the aws-vpc-cni addon doesn't cleanup some iptables rules) as This PR allows the cilium chart users to customize the startup script in node-init container by including the above cleanup bits as chart value override, when installing Cilium on EKS in overlay routing mode (tunnel=enabled) and solves a real problem that exists today. @gandro due to above reason, any chances of this PR getting cherry-picked after this merge to cilium-v1.12 and v1.13 release branches also ? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good but I would love to have a bit more context in the commit description, ideally with an example use case.
this is how we intend to use this for AWS EKS setups (with Cilium tunnel=enabled overlay routing mode): cilium helm chart's values_override.yaml snippet:
|
add systemd settings before cilium is starting to avoid that systemd is cleaning up stuff in managed k8s clusters there is no way to proper provision is right before cilium is starting, so that's the use case. also according to the comment here https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/templates/cilium-nodeinit/daemonset.yaml#L66 it gave me the feeling that i could customize the startup script but that's not possible as it's hardcoded in the chart./ |
I think there are ways around this, for example you can create the nodes with a custom node taint and have a custom DaemonSet configure the nodes for you and then remove the taint once the nodes are ready for Cilium. Cilium's getting started docs itself use a similar trick to avoid cloud CNIs for being installed on the nodes too. Or you can use a custom EKS bootstrap command (overrideBootstrapCommand in eksctl) to achieve similar things too. But I do think it makes sense to allow some customization in the Helm charts, so I'm fine with the change as is.
According to your back-porting criteria, we only backport bug fixes, for which I don't think this PR qualifies. Sorry. |
These iptables rules are added by the AWS VPC CNI, right? If I recall correctly, the AWS VPC DaemonSet does respect taints. For that use-case, you might be able to make use of the EKS node groups can be configured with that taint: https://docs.cilium.io/en/v1.13/installation/k8s-install-helm/#install-cilium (check the EKS tab) |
Thanks for the details! What I meant is that I would like to also have that info in the commit description, not just in the PR discussion, because that's the first place we're looking at when trying to understand the motivation for a code change through the Git history. |
No, that's not the case. aws-vpc-cni daemonset auto-installed by EKS on cluster create, tolerates everything. Also, the Cilium official docs mentions about these rules cleanup to be done manually for Cilium overlay routing mode on EKS: https://github.com/cilium/cilium/blob/3f9d44ecddb74e37641d779e3944f64abbb65cf4/Documentation/installation/k8s-install-helm.rst#install-cilium (search for AWS-SNAT-CHAIN-0 there). |
I see, thanks for checking. I guess that's why in our CI, we always manually patch the aws-vpc-cni DaemonSet to not be scheduled on any new nodes. |
In Azure you cannot touch anything which is maintained by Azure because they revert all changes. |
Sounds good, I think the approach here is reasonably flexible. @mblaschke could you add the motivation to the git commit messages as requested by @qmonnet here? |
67a0c23
to
197da79
Compare
@gandro is this enough? |
197da79
to
fc8e951
Compare
Looks good to me! Thanks. Let's wait for the review from sig-k8s and then this should be ready to merge. Previous CI was green. |
fc8e951
to
fdc5589
Compare
Allows customization of nodeinit scripts to adapt node configuration before cilium is starting to allow eg. systemd configuration changes and reload of networkd. This customization also allows a possibility to add hotfixes if node VMs needs fixed due to updates. Signed-off-by: Markus Blaschke <mblaschke82@gmail.com>
fdc5589
to
8f0a1d1
Compare
did a rebase for ci run |
/test |
/test-runtime |
description and a
Fixes: #XXX
line if the commit addresses a particularGitHub issue.
Fixes: <commit-id>
tag, thenplease add the commit author[s] as reviewer[s] to this issue.
implement a way to customize cilium nodeinit script to be able to adapt it to different environments before cilium is started. might be a possible way for workaround in #18706