v1.10 backports 2022-05-06 #19744
Conversation
[ upstream commit 21e6e6a ] Currently Hubble-Relay builds its client list by querying the Peer Service over the local Hubble Unix domain socket. This goes against security best practices (sharing files across pods) and is not allowed on platforms that strictly enforce SELinux policies (e.g. OpenShift). This PR enables, by default, the creation of a Kubernetes Service that proxies the Hubble Peer Service so that Hubble-Relay can use it to build its client list, eliminating the need for a shared Unix domain socket entirely. Helm values and configurations have been added to enable the service in a Cilium deployment. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
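For reference, a minimal values sketch that would drive the new templates. The hubble.peerService keys appear in the chart changes below; the address and port values here are only illustrative placeholders, not chart defaults.

# Hypothetical values.yaml excerpt (key names from this PR, values illustrative).
hubble:
  enabled: true
  listenAddress: ":4244"   # peer listener on the agent; assumed value
  peerService:
    enabled: true          # create the Kubernetes Service fronting the Hubble Peer Service
    servicePort: 443       # port exposed by the Service; assumed value
  relay:
    enabled: true          # Hubble-Relay now builds its peer list via the Service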
@aanm @nathanjsweet @sayboras this backport was complicated due to conflicts and the size of upstream PRs, please review and test carefully 🙏
@@ -1,4 +1,8 @@
{{- if .Values.hubble.relay.enabled }}
{{- $peerSvcPort := .Values.hubble.peerService.servicePort -}}
@nathanjsweet note that in master the condition above also guards on .Values.hubble.enabled. I don't think that's an issue, but worth noting.
@@ -0,0 +1,24 @@
{{- if and .Values.hubble.enabled .Values.hubble.listenAddress .Values.hubble.peerService.enabled }}
@nathanjsweet I've left this file in templates/hubble like in the upstream commit, although it's at odds with the v1.10 layout.
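For context, the template guarded above renders a Service along these lines. Everything in this sketch (name, selector, ports) is an illustrative assumption rather than the exact chart output.

apiVersion: v1
kind: Service
metadata:
  name: hubble-peer        # assumed name
  namespace: kube-system
spec:
  selector:
    k8s-app: cilium        # targets the agent pods serving the Peer Service; assumed selector
  ports:
    - name: peer-service
      port: 443            # .Values.hubble.peerService.servicePort
      targetPort: 4244     # the Hubble peer listener from .Values.hubble.listenAddress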
c.NodeGCInterval = viper.GetDuration(NodesGCInterval)
c.NodesGCInterval = viper.GetDuration(NodesGCInterval)
@sayboras I've moved this new line right next to the "old" c.NodesGCInterval (notice the plural Nodes in the name), which is deprecated in v1.11 and recycled in v1.12 (the upstream commit). Is it OK to have this behaviour introduced in v1.10 (basically piggybacking on the old CLI flag)?
Also note that c.NodesGCInterval is set twice in this func (the second time on new line 432), but I don't think it matters.
I think it should be fine for bug fixing 💯
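For anyone following along, a rough Go sketch of the flag wiring being discussed. The flag name nodes-gc-interval comes from the upstream commit message; the registration code itself is a simplified assumption, not the actual cilium-operator flags file.

package main

import (
	"time"

	"github.com/spf13/pflag"
	"github.com/spf13/viper"
)

// NodesGCInterval is the viper key / CLI flag name (from the upstream
// commit message); everything else here is illustrative.
const NodesGCInterval = "nodes-gc-interval"

type config struct {
	NodesGCInterval time.Duration
}

func main() {
	flags := pflag.NewFlagSet("operator", pflag.ExitOnError)
	flags.Duration(NodesGCInterval, 5*time.Minute, "GC interval for CiliumNode resources")
	viper.BindPFlag(NodesGCInterval, flags.Lookup(NodesGCInterval))

	var c config
	// The deprecated v1.10 flag and the recycled v1.12 field read the same
	// viper key, which is why assigning the same value twice is harmless.
	c.NodesGCInterval = viper.GetDuration(NodesGCInterval)
	_ = c
}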
CiliumStableHelmChartVersion = "1.9-dev" | ||
CiliumStableHelmChartVersion = "1.9" |
@aanm as suggested
/test-backport-1.10
Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed.
If it is a flake and a GitHub issue doesn't already exist to track it, comment
Force-pushed from 2ab7ad2 to d1dcf03.
/test-backport-1.10
Thanks a lot. Looks good for my commit; I also tested the changes here locally 💯
$ ksyslo deployment/cilium-operator --timestamps | grep "subsys=watchers"
2022-05-09T13:31:38.609543593Z level=info msg="Starting to garbage collect stale CiliumNode custom resources" subsys=watchers
2022-05-09T13:31:38.609590071Z level=debug msg="Cilium pod not running for node" nodeName=minikube subsys=watchers
2022-05-09T13:31:38.609597765Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
...
2022-05-09T13:36:38.610401836Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
2022-05-09T13:36:38.610488690Z level=info msg="Add CiliumNode to garbage collector candidates" nodeName=dummy-one subsys=watchers
2022-05-09T13:36:44.338787619Z level=debug msg="Node without taint and with CiliumIsUp condition" nodeName=minikube subsys=watchers
...
2022-05-09T13:41:38.611438448Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
2022-05-09T13:41:38.611484484Z level=info msg="Perform GC for invalid CiliumNode" nodeName=dummy-one subsys=watchers
2022-05-09T13:41:38.618173682Z level=info msg="CiliumNode is garbage collected successfully" nodeName=dummy-one subsys=watchers
2022-05-09T13:41:50.655388256Z level=debug msg="Node without taint and with CiliumIsUp condition" nodeName=minikube subsys=watchers
To not break CI, we need to partially backport the following PRs:
Force-pushed from d1dcf03 to 68690a0.
/test-only --focus="K8sUpdates.*Tests upgrade and downgrade from a Cilium stable image to master" --k8s_version=1.16 --kernel_version=net-next
[ upstream commit 88d31cd ] The upgrade tests are using the official helm charts with unreleased Cilium images. This might cause the upgrade tests to fail in case the changes done in the unreleased Cilium versions require a new helm chart release. To fix this problem, the upgrade tests will now use the unreleased helm charts as well. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 67f74ff ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit e80f60d ] This commit lifts and shifts the current initialization of the k8s node watcher into a separate function with a single scope of work, so that it can be reused later. There is no change in logic. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 64b37e1 ] This makes sure that the k8s node watcher is set up at most once. Also, a synced channel is added so that consumers of this store know when the syncing process is done. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
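A minimal Go sketch of the "set up at most once, signal when synced" pattern this commit describes. The names here are assumptions for illustration, not the actual Cilium watcher code.

package main

import (
	"fmt"
	"sync"
)

var (
	nodeWatcherOnce sync.Once
	nodesSynced     = make(chan struct{})
)

// initNodeWatcher is safe to call from multiple places: the watcher is
// set up at most once, and every caller gets the same synced channel,
// which is closed once the initial sync completes.
func initNodeWatcher(start func(onSynced func())) <-chan struct{} {
	nodeWatcherOnce.Do(func() {
		start(func() { close(nodesSynced) })
	})
	return nodesSynced
}

func main() {
	// Fake watcher that "syncs" immediately.
	synced := initNodeWatcher(func(onSynced func()) { onSynced() })
	<-synced // consumers block here until the store is synced
	fmt.Println("node store synced")
}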
[ upstream commit edc1a0a ] In the normal scenario, CiliumNode is created by the agent with owner references attached at all times since PR [0]. However, there can be cases where CiliumNode is created by the IPAM module [1], which doesn't set any ownerReferences at all. In that case, if the corresponding node got terminated and never came back under the same name, the CiliumNode resource is left dangling and needs to be garbage collected. This commit adds a garbage collector for CiliumNode with the following logic:
- The garbage collector runs at a predefined interval (e.g. specified by the flag --nodes-gc-interval).
- Each run checks whether the CiliumNode has a counterpart k8s Node resource, and removes the node from the GC candidates if required.
- If yes, the CiliumNode is considered valid, happy day.
- If no, check whether ownerReferences are set.
  - If yes, let k8s perform the garbage collection.
  - If no, mark the node as a GC candidate. If this node is still a GC candidate in the next run, remove it.
References: [0]: cilium#17329 [1]: https://github.com/cilium/cilium/blob/master/pkg/ipam/allocator/podcidr/podcidr.go#L258
Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
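A Go sketch of the two-phase candidate logic described in the commit message above. The ciliumNode type and the helper stubs are hypothetical; only the candidate bookkeeping mirrors the described algorithm.

package main

// Hypothetical stand-in for the CiliumNode resource.
type ciliumNode struct {
	Name            string
	OwnerReferences []string
}

// Stubs standing in for the Kubernetes lookups.
func listCiliumNodes() []ciliumNode    { return nil }
func k8sNodeExists(name string) bool   { return false }
func deleteCiliumNode(name string)     {}

// runGC performs one pass; candidates carries state between runs
// (one run per --nodes-gc-interval tick).
func runGC(candidates map[string]struct{}) {
	for _, cn := range listCiliumNodes() {
		switch {
		case k8sNodeExists(cn.Name):
			// Counterpart k8s Node exists: CiliumNode is valid,
			// drop it from the GC candidates if present.
			delete(candidates, cn.Name)
		case len(cn.OwnerReferences) > 0:
			// ownerReferences are set: let k8s garbage-collect it.
		default:
			if _, seen := candidates[cn.Name]; seen {
				// Still dangling since the previous run: remove it.
				deleteCiliumNode(cn.Name)
				delete(candidates, cn.Name)
			} else {
				// First sighting: mark and re-check next run.
				candidates[cn.Name] = struct{}{}
			}
		}
	}
}

func main() {
	candidates := map[string]struct{}{}
	runGC(candidates) // in the operator this would run on a time.Ticker
}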
[ upstream commit 167fa2b ] This commit maps the operator's node GC interval CLI flag to a Helm value and the ConfigMap. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
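A sketch of what that mapping could look like in the chart. The value path and ConfigMap key below are assumptions derived from the --nodes-gc-interval flag name, not the exact chart wording.

# values.yaml (assumed path):
operator:
  nodeGCInterval: 5m0s

# ConfigMap template entry (assumed key, mirroring the CLI flag):
nodes-gc-interval: {{ .Values.operator.nodeGCInterval | quote }}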
Force-pushed from 68690a0 to 4e7ca33.
/test-only --focus="K8sUpdates.*Tests upgrade and downgrade from a Cilium stable image to master" --k8s_version=1.16 --kernel_version=net-next
/test-backport-1.10
@nathanjsweet for both v1.11 and v1.10, correct?