-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.15 Backports 2024-01-12 #30230
v1.15 Backports 2024-01-12 #30230
Conversation
[ upstream commit 4ad5718 ] Signed-off-by: renovate[bot] <bot@renovateapp.com>
[ upstream commit f8a7616 ] We want to mimic GetBPFCPU(), which selects mcpu=v3 for 5.10+ kernels. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 3faa67b ] The device reloader introduced a very similar variant of the race it was created to avoid. Specifically, if the node addressing table was updated after a device change, but before the device reloader re-queriyed the table, the reloader would miss the update. This results in node-port services potentially not being available on newly added devices, even though runtime device detection is enabled. Fixes: 8901eab (daemon: add a device reloader that isn't as racy) Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1f68e26 ] The loopback address mask value should be '0xff000000': https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/in.h?h=v6.7-rc6#n38 static inline bool ipv4_is_loopback(__be32 addr) { return (addr & htonl(0xff000000)) == htonl(0x7f000000); } Fixes: 50c947d ("bpf: refine wild card lookup for node port services from host") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 69823aa ] This enhances the BGP GetRoutes API to allow retrieving advertised BGP routes for all configured peers, without the need for specifying an exact peer address. The peer address is included in the BgpRoute response in case that the source of the route is adj-rib-in or adj-rib-out. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 0c051b5 ] This enhances the `cilium-dbg bgp routes` command to allow listing advertised routes without the need to specify an exact peer IP - in that case, advertised routes of all peers will be listed. Also adds defaulting of table type and AFI / SAFI, to allow simple use with even less typing. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 87f1106 ] Includes cilium-dbg commands for dumping IPv4 and IPv6 advertised BGP routes of each configured BGP peer in the bugtool. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 684c58e ] Use the predefined const. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1281f1c ] Allocate a label map of the correct known size. name old time/op new time/op delta NewFrom-8 638ns ± 8% 71ns ± 6% -88.87% (p=0.000 n=10+10) name old alloc/op new alloc/op delta NewFrom-8 624B ± 0% 0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta NewFrom-8 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 4a1bc1b ] Following a bug on the v1.13 branch in the way we update the list of ipset entries in the node manager, add a test to ensure that entries are added or removed as we expect on node updates. This is a forward port of the test added first to branch v1.13, commit 9ec163aff6f ("pkg/node/manager: Add test to validate ipset entries"), although the test has been reworked to use a mock ipset manager instead of checking the real ipset (which required privileges). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit cbe3037 ] With iptable-based masquerading, we use some iptables rules to exclude certain flows from masquerading. These rules rely on IP sets (managed with ipset). There are some cases where we create entries and immediately remove them afterwards. The issue only occurs for the extra Kubernetes IPs) not the CiliumInternalIP) on the node, and only with both node encryption disabled and --enable-remote-node-identity=false. Here is a trimmed version of the previous code to follow the logics. When we update an existing node, and that the configuration is such that nodeAddressSkipsIPCache() returns true, then we create an ipset entry and immediately delete it after. Follow the numeroted comments in the code: func (m *manager) legacyNodeIpBehavior() bool { if m.conf.RemoteNodeIdentitiesEnabled() { return false } if m.conf.NodeEncryptionEnabled() { return false } return true } func (m *manager) nodeAddressSkipsIPCache(address nodeTypes.Address) bool { return m.legacyNodeIpBehavior() && address.Type != addressing.NodeCiliumInternalIP } func (m *manager) NodeUpdated(n nodeTypes.Node) { for _, address := range n.IPAddresses { // 1) We add the IP to the ipset if m.conf.NodeIpsetNeeded() && address.Type == addressing.NodeInternalIP { m.ipsetMgr.AddToNodeIpset(address.IP) } // 2) Say we skip the ipcache update here based on config... if m.nodeAddressSkipsIPCache(address) { continue } ... // 3) ... Then we don't add the IP to nodeIPsAdded, even though we added it to the ipset nodeIPsAdded = append(nodeIPsAdded, prefix) } ... entry, oldNodeExists := m.nodes[nodeIdentifier] if oldNodeExists { ... // 4) We call m.removeNodeFromIPCache(), passing nodeIPsAdded m.removeNodeFromIPCache(oldNode, resource, nodeIPsAdded, healthIPsAdded, ingressIPsAdded) ... } ... } func (m *manager) removeNodeFromIPCache(oldNode nodeTypes.Node, resource ipcacheTypes.ResourceID, nodeIPsAdded, healthIPsAdded, ingressIPsAdded []netip.Prefix) { ... for _, address := range oldNode.IPAddresses { oldPrefix := ip.IPToNetPrefix(address.IP) // 5) oldPrefix is not in nodeIPsAdded, we don't loop yet, we carry on to 6) if slices.Contains(nodeIPsAdded, oldPrefix) { continue } // 6) We remove the ipset entry that we just added if m.conf.NodeIpsetNeeded() && address.Type == addressing.NodeInternalIP { m.ipsetMgr.RemoveFromNodeIpset(address.IP) } if m.nodeAddressSkipsIPCache(address) { continue } ... } ... } This looks incorrect. If we just created the entry, we should keep it. Worse, if the node exists, and we try to create the ipset entry, then the ipset invocation will be a no-op because ipset won't add a duplicate entry, but then removing the entry just after that will remove the initial entry from node creation and may impact masquerading on the node. How do we change this? There are a comple of things that we can consider: - We can move 3) towards 1), before 2). In other words: we can update nodeIPsAdded as soon as we insert the ipset entry, making this list consistent with the ipset entries we created. However, this is incorrect, because if we skip the ipcache entry insertion in 2), then the list in nodeIPsAdded is inconsistent regarding the ipcache. - Another option would be to move 1) towards 3), after 2). In other words, insert the new ipset entry only if we do not skip the ipcache update, thus tying the creation of the ipset and ipcache entries together. However, this would change the current behaviour, and is not the desirable outcome: we want to skip masquerading for the NodeInternalIP (by adding the corresponding entry to the ipset) regardless of whether remote node identities are set or not (thanks to Paul Chaignon for the clarification here). - Given that we want to preserve the current dissociation between the insertion of the ipset and ipcache entries, we need a separate list. Therefore, this commit addresses the issue by adding a separate slice to record the ipset entries we created during the node update, to avoid removing any of them even if we skip the ipcache update. Historically: - Commit 49cb220 ("iptables: Don't masquerade traffic to cluster nodes") introduced the ipset entry creation. It's fine because we wouldn't ever remove the ipset entries on node updates, so we could 1) add the ipset entry, 2) skipIPCache() and avoid updating ipsAdded, since we didn't use ipsAdded for ipset operations. - Commit d5e5bf3 ("node/manager: Remove ipset config from previous node state") introduced the issue, because it adds the ipset entry removal to NodeUpdated(), regardless of whether we just added the entry. - Commit e7e4abb ("node/manager: Only remove old IPs if they weren't already added") attempts to fix this, but is broken because 1) the slice lookup never succeed (fixed in f2a4312 ("nodemanager: Move to async IPCache API")) and 2) ipsAdded may or may not contain the relevant entries, depending on whether we skipped the ipcache operations. Found by code inspection, while investigating CI flakes on branch v1.13 due to the slice lookups on ipsAdded never succeeding and the IP not being excluded from masquerading as we expected. Fixes: d5e5bf3 ("node/manager: Remove ipset config from previous node state") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit bf42e1f ] Signed-off-by: Tamilmani <tamanoha@microsoft.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 88d4230 ] When the cilium-envoy ServiceMonitor was last updated, the required lookup for .annotation was left out. Signed-off-by: cornfeedhobo <cornfeedhobo@fuzzlabs.org> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 2f3e3ec ] This commit adds the app.kubernetes.io/name label to cilium-envoy ServiceMonitor, matching the convention in cilium-operator. Signed-off-by: cornfeedhobo <cornfeedhobo@fuzzlabs.org> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5931a80 ] The CEP and CES stores getter was originally meant for Hubble, but it is not used anymore, so it is possible to clean it up. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 56f4f94 ] Use a releasable Resource[CiliumSlimEndpoint] to refactor the agent CEP watcher. Like the previous implementation, the underlying informer is stopped as soon as the subscription to the events stream is canceled. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b0164d4 ] Use a releasable Resource[CiliumEndpointSlice] to refactor the agent CES watcher. Like the previous implementation, the underlying informer is stopped as soon as the subscription to the events stream is canceled. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 9c336fb ] CiliumEndpoint and CiliumEndpointSlice resources are backed by a releasable resource. To allow stopping the underlying informer when it is not needed anymore, the cleanup routine should release the reference to the store after collecting the stale CEPs. Also, this keeps the CiliumEndpoint cleanup routine fully working even when the agent is configured to use an external key-value store. In that scenario, the routine starts the k8s informer to list the CEPs or CESs and, once all the stale CEPs have been collected, releases the resource to stop the informer. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit ae64d83 ] Signed-off-by: Nico Vibert <nicolas.vibert@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 20975c9 ] This re-creates similar behavior as in v1.14 - FQDN requests are serialized on a per-name basis. This was removed as part of the larger fqdn refactor but, it turns out, that was premature. The difference between v1.14 and now is that this locks per-name, rather than per-IP. It preserves the prime-sized count of bucketed locks. This is needed because we release the lock on the NameManager before policy updates have been fully pushed out to all endpoints. If another endpoint requests the same name, it may have the DNS response returned to it before policy updates are complete. However, the NameManager will detect the noop case, take a shortcut, and release the DNS response before policy updates are potentially complete. For precise details, see the comment in fqdn.go. Fixes: #30014 Fixes: 507259f Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a661456 ] Currently, errors during writing the status in the defer function aren't returned to the caller (controller-runtime). Hence, a successful reconciliation (error = nil) but a failure during writing the status would result in a non-updated status (because it wouldn't result in an additional reconciliation). To avoid having to introduce named return params (to let the defer function modifying the returned error), this commit gets rid of the defer statement at all and replaced it with explicit calls to a newly introduced function `handleReconcileErrorWithStatus` that tries to update the resource status and reflects any errors during updating the status in the returned error. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit eed73af ] This small fix prevents allocating a local identity for IPs with names that are not selected by a toFQDN selector. Without this change, an identity is allocated for every IP included in an intercepted DNS response. For the (common) case where all DNS requests are proxied, this could potentially lead to a waste of resources and thus a performance regression. Previously (v1.14 and before), we did not allocate identities for un-selected IPs. That was inadvertently changed in #29036. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit e71407b ] This change adds a check in the auth manager not to do authentication for either IDs if they are reserved. This could have caused failed handshakes to happen should any of these entities be allowed by policy but with mutual auth enabled. These IDs do are not able to ever complete a handshake as they are not generated by design. This commit also replaces all IDs used in tests to be high enough that they do not conflict with reserve IDs. Signed-off-by: Maartje Eyskens <maartje@eyskens.me> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b193daa ] This sets the default toleration for SPIRE agent to be allowed on the control plane nodes. This allows Cilium Agent on these nodes to get attested by SPIRE for Mutual Authentication to work. Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit cddb591 ] Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5d0757d ] In case the requirements for BPF masquerade are not met, fail the daemon initialization instead of silently fall back to iptables based masquerading. This is needed because the iptables cell does not support runtime configuration changes. Fixes: a6a2b73 ("iptables: Add a cell for iptables config manager") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jibi !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commit looks good, thanks!
/test-backport-1.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good, thanks Jibi!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 my PR LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and looks good for my commits
cilium-dbg bgp routes advertised
command without specifying a peer #30033 (@rastislavs)Once this PR is merged, a GitHub action will update the labels of these PRs: