v1.10 backports 2022-05-06 #19744

Merged 7 commits into cilium:v1.10 on May 9, 2022

Conversation


@kaworu kaworu commented May 6, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18620 19710 19711 19576; do contrib/backporting/set-labels.py $pr done 1.10; done

or with

$ make add-label BRANCH=v1.10 ISSUES=18620,19710,19711,19576

[ upstream commit 21e6e6a ]

Currently Hubble-Relay builds its client list by querying
the Peer Service over the local Hubble Unix domain socket.
This goes against best security practices (sharing files
across pods) and is not allowed on platforms that strictly
enforce SELinux policies (e.g. OpenShift).

This PR enables, by default, the creation of a Kubernetes Service
that proxies the Hubble Peer Service so that Hubble-Relay
can use it to build its client list, eliminating the need
for a shared Unix domain socket completely.

Helm values and configuration options have been added to enable
the service in a Cilium deployment.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
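
For illustration, here is a minimal sketch of the two ways a client such as Hubble-Relay could reach the Peer Service, before and after this change, using plain gRPC. The Unix socket path, the service DNS name, the namespace, and the port value are assumptions for illustration; only the hubble.peerService.servicePort Helm value comes from this PR, and this is not the actual Relay code.

// Sketch only: socket path, service name, namespace, and port are
// assumptions; this is not the actual Hubble-Relay code.
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Before: dial the Peer Service over the local Hubble Unix domain
	// socket shared between the agent and relay pods.
	unixConn, err := grpc.Dial("unix:///var/run/cilium/hubble.sock", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer unixConn.Close()

	// After: dial the new Kubernetes Service that proxies the Peer
	// Service; the port would come from hubble.peerService.servicePort.
	svcConn, err := grpc.Dial("hubble-peer.kube-system.svc.cluster.local:80", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer svcConn.Close()
}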
@kaworu kaworu requested a review from a team as a code owner May 6, 2022 20:59
@maintainer-s-little-helper bot added the labels backport/1.10 and kind/backports (This PR provides functionality previously merged into master.) on May 6, 2022
@kaworu (Member, Author) left a comment:

@aanm @nathanjsweet @sayboras this backport was complicated due to conflicts and the size of upstream PRs, please review and test carefully 🙏

@@ -1,4 +1,8 @@
{{- if .Values.hubble.relay.enabled }}
{{- $peerSvcPort := .Values.hubble.peerService.servicePort -}}
@kaworu (Member, Author) commented:
@nathanjsweet note that in master the condition above also guards on .Values.hubble.enabled. I don't think that's an issue, but worth noting.

@@ -0,0 +1,24 @@
{{- if and .Values.hubble.enabled .Values.hubble.listenAddress .Values.hubble.peerService.enabled }}
@kaworu (Member, Author) commented:
@nathanjsweet I've left this file in templates/hubble as in the upstream commit, although it's at odds with the v1.10 layout.

Comment on lines +419 to 420
c.NodeGCInterval = viper.GetDuration(NodesGCInterval)
c.NodesGCInterval = viper.GetDuration(NodesGCInterval)
@kaworu (Member, Author) commented:
@sayboras I've moved this new line right next to the "old" c.NodesGCInterval (note the plural Nodes in the name), which is deprecated in v1.11 and recycled in v1.12 (the upstream commit). Is it OK to have this behaviour introduced in v1.10 (basically piggybacking on the old CLI flag)?

Also note that c.NodesGCInterval is set twice in this func (the second time on new line 432), but I don't think it matters.

A member replied:
I think it should be fine for bug fixing 💯

CiliumStableHelmChartVersion = "1.9-dev"
CiliumStableHelmChartVersion = "1.9"
@kaworu (Member, Author) commented:
@aanm as suggested

kaworu commented May 9, 2022

/test-backport-1.10

Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed:


Test Name

K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.18-kernel-4.9 so I can create one.

(A review comment on test/k8sT/Updates.go was marked as outdated and resolved.)
@kaworu kaworu force-pushed the pr/v1.10-backport-2022-05-06 branch from 2ab7ad2 to d1dcf03 on May 9, 2022 12:12
kaworu commented May 9, 2022

/test-backport-1.10

@kaworu kaworu requested a review from aanm May 9, 2022 12:34
@sayboras (Member) left a comment:

Thanks a lot. Looks good for my commit, and I also tested the changes here locally 💯

$ ksyslo deployment/cilium-operator --timestamps | grep "subsys=watchers"                                              
2022-05-09T13:31:38.609543593Z level=info msg="Starting to garbage collect stale CiliumNode custom resources" subsys=watchers
2022-05-09T13:31:38.609590071Z level=debug msg="Cilium pod not running for node" nodeName=minikube subsys=watchers
2022-05-09T13:31:38.609597765Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
...
2022-05-09T13:36:38.610401836Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
2022-05-09T13:36:38.610488690Z level=info msg="Add CiliumNode to garbage collector candidates" nodeName=dummy-one subsys=watchers
2022-05-09T13:36:44.338787619Z level=debug msg="Node without taint and with CiliumIsUp condition" nodeName=minikube subsys=watchers
...
2022-05-09T13:41:38.611438448Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
2022-05-09T13:41:38.611484484Z level=info msg="Perform GC for invalid CiliumNode" nodeName=dummy-one subsys=watchers
2022-05-09T13:41:38.618173682Z level=info msg="CiliumNode is garbage collected successfully" nodeName=dummy-one subsys=watchers
2022-05-09T13:41:50.655388256Z level=debug msg="Node without taint and with CiliumIsUp condition" nodeName=minikube subsys=watchers

@nathanjsweet (Member) commented:
To not break CI, we need to kind of backport the following PRs:

@aanm aanm force-pushed the pr/v1.10-backport-2022-05-06 branch from d1dcf03 to 68690a0 on May 9, 2022 17:39
aanm commented May 9, 2022

/test-only --focus="K8sUpdates.*Tests upgrade and downgrade from a Cilium stable image to master" --k8s_version=1.16 --kernel_version=net-next

aanm and others added 6 commits May 9, 2022 21:09
[ upstream commit 88d31cd ]

The upgrade tests use the official Helm charts with unreleased
Cilium images. This might cause the upgrade tests to fail in case
the changes done in the unreleased Cilium versions require a new Helm
chart release. To fix this problem, the upgrade tests will now use the
unreleased Helm charts as well.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 67f74ff ]

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit e80f60d ]

This commit lifts and shifts the current initialization of the k8s
node watcher into a separate function with a single scope of work, so
that it can be re-used later. There is no change in logic.

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 64b37e1 ]

This makes sure that the k8s node watcher is set up at most once. A
synced channel is also added, so that consumers of this store know
when the initial syncing process is done (see the sketch after this
commit message).

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
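
A minimal sketch of the "set up at most once, signal when synced" pattern this commit message describes, using sync.Once and a channel closed on completion; the names nodeStore and ensureSetup are hypothetical, not the actual Cilium code.

package main

import (
	"fmt"
	"sync"
)

type nodeStore struct {
	once   sync.Once
	synced chan struct{}
	nodes  []string
}

func newNodeStore() *nodeStore {
	return &nodeStore{synced: make(chan struct{})}
}

// ensureSetup starts the watcher exactly once, no matter how many
// callers race to initialize the store.
func (s *nodeStore) ensureSetup() {
	s.once.Do(func() {
		go func() {
			// Stand-in for the real k8s node watcher populating the store.
			s.nodes = append(s.nodes, "minikube")
			close(s.synced) // signal consumers that the initial sync is done
		}()
	})
}

func main() {
	s := newNodeStore()
	s.ensureSetup()
	s.ensureSetup() // no-op: the watcher is set up at most once
	<-s.synced      // block until the initial node list has synced
	fmt.Println(s.nodes)
}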
[ upstream commit edc1a0a ]

In the normal scenario, a CiliumNode is created by the agent with
owner references attached, as done in the PR below [0]. However, a
CiliumNode can also be created by the IPAM module [1], in which case
it has no ownerReferences at all. In that case, if the corresponding
node got terminated and never came back with the same name, the
CiliumNode resource is left dangling and needs to be garbage
collected.

This commit adds a garbage collector for CiliumNode with the following
logic (see the sketch after this commit message):

- The garbage collector runs at a predefined interval (e.g. as
  specified by the --nodes-gc-interval flag).
- Each run checks whether each CiliumNode has a counterpart k8s Node
  resource, removing the node from the GC candidates if required.
- If it does, the CiliumNode is considered valid, happy day.
- If it does not, check whether ownerReferences are set:
  - If yes, let k8s perform the garbage collection.
  - If no, mark the node as a GC candidate. If the node is still a GC
    candidate on the next run, remove it.

References:

[0]: cilium#17329
[1]: https://github.com/cilium/cilium/blob/master/pkg/ipam/allocator/podcidr/podcidr.go#L258

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
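
A minimal Go sketch of the two-phase logic described in the commit message above; the names gcOnce and candidateStore are hypothetical, and this is not the actual operator code.

package main

import "fmt"

type ciliumNode struct {
	name      string
	hasOwners bool // true if ownerReferences are set
}

// candidateStore remembers CiliumNodes marked as GC candidates on the
// previous run.
type candidateStore map[string]struct{}

// gcOnce performs one pass: nodes with a k8s Node counterpart are valid,
// nodes with ownerReferences are left to Kubernetes, and nodes marked as
// candidates on two consecutive runs are deleted.
func gcOnce(nodes []ciliumNode, k8sNodes map[string]bool, candidates candidateStore) {
	for _, cn := range nodes {
		switch {
		case k8sNodes[cn.name]:
			// Valid node: clear any stale candidate mark.
			delete(candidates, cn.name)
		case cn.hasOwners:
			// k8s garbage collection will handle it via ownerReferences.
		default:
			if _, marked := candidates[cn.name]; marked {
				// Still no counterpart on the second run: collect it.
				fmt.Println("garbage collecting", cn.name)
				delete(candidates, cn.name)
			} else {
				candidates[cn.name] = struct{}{}
			}
		}
	}
}

func main() {
	candidates := candidateStore{}
	nodes := []ciliumNode{{name: "minikube"}, {name: "dummy-one"}}
	k8s := map[string]bool{"minikube": true}

	gcOnce(nodes, k8s, candidates) // dummy-one becomes a candidate
	// ... --nodes-gc-interval elapses ...
	gcOnce(nodes, k8s, candidates) // dummy-one is collected
}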
[ upstream commit 167fa2b ]

This commit maps the operator's node GC interval CLI flag to a Helm
value and config map entry (a standalone sketch follows this commit
message).

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
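
A standalone sketch of how such a flag could flow from the config map into the operator via viper, consistent with the c.NodesGCInterval = viper.GetDuration(NodesGCInterval) line discussed earlier. The literal key "nodes-gc-interval" mirrors the CLI flag name, and the five-minute default is inferred from the GC log timestamps in the review above; both are assumptions.

package main

import (
	"fmt"
	"time"

	"github.com/spf13/viper"
)

func main() {
	// "nodes-gc-interval" mirrors the CLI flag; the 5m default is an
	// assumption inferred from the GC logs earlier in this conversation.
	viper.SetDefault("nodes-gc-interval", 5*time.Minute)
	// In the real operator, the helm value / config map entry would
	// override this default through the bound flag set.
	interval := viper.GetDuration("nodes-gc-interval")
	fmt.Println("garbage collecting stale CiliumNodes every", interval)
}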
@aanm aanm force-pushed the pr/v1.10-backport-2022-05-06 branch from 68690a0 to 4e7ca33 on May 9, 2022 19:10
aanm commented May 9, 2022

/test-only --focus="K8sUpdates.*Tests upgrade and downgrade from a Cilium stable image to master" --k8s_version=1.16 --kernel_version=net-next

aanm commented May 9, 2022

/test-backport-1.10

@aanm aanm merged commit 3016cfb into cilium:v1.10 May 9, 2022
@kaworu kaworu deleted the pr/v1.10-backport-2022-05-06 branch May 10, 2022 07:11
kaworu commented May 10, 2022

> To not break CI, we need to kind of backport the following PRs:
>
> #19679
> #19665
> And wherever this line/commit came from

@nathanjsweet for both v1.11 and v1.10, correct?
