
[pull] main from cilium:main #117

Merged

pull[bot] merged 11 commits into dolfly:main from cilium:main

Aug 13, 2025
Conversation


@pull pull bot commented Aug 13, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

joamaki and others added 11 commits August 13, 2025 07:01
Wait for the prune to actually happen in the lb/prune command, making
tests that e.g. do BPF state restoration more reliable, since a prune
can no longer race in the background.

Update migrate-any-proto.txtar to call lb/prune before restoration
to avoid a race.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
While the StateDB reconciler never calls the Update/Delete/Prune
concurrently, we do want to be able to do BPFOps.ResetAndRestore
from a test script to clear out the state.

Since [sync.Mutex.Lock] is very cheap on an unlocked mutex, add
a mutex around the BPFOps state so that we can inspect and manipulate
it safely from tests and avoid very odd failures.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This had changed when client-go was updated, and was causing
false-positive goroutine leak failures.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The backends table wasn't checked after service and endpoint slice removal,
so the endpoints could sometimes be added back before the deletions were
processed, leading to re-use of old IDs.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This should never have moved into 'Cell', as the whole point was to keep
the legacy metrics and global variables out of 'Cell' so tests can use it.

Fixes: 0b3672f ("pkg/metrics: prepare *metrics.Registry for use by operator.")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
This commit changes the default GKE test zone from us-east1-c to us-west1-c,
due to the test failure rate being high. This change reduces the number of
failures.

Signed-off-by: brlbil <birol@cilium.io>
When an L7 policy is in effect, iptables masquerading is used, and IPsec
is enabled, TCP/IPv6 connections going through the L7 proxy can't be
established, because Envoy connects to the upstream server using the pod
source IP, and it's never SNATed to the node IP, as it happens for IPv4.

Turns out, we simply lack an iptables rule that was added for IPv4 only
in the commit mentioned below in the Fixes tag. As described in its
commit message, the Envoy-to-upstream packet goes through POSTROUTING
twice: when egressing from cilium_host, and when ingressing into
cilium_net, but the kernel performs NAT only once, and the actual rule
that replaces the source IP is intended to be triggered on the second
POSTROUTING pass, when the input interface is cilium_net, and the output
interface is the external-facing one:

ip6tables -t nat -A CILIUM_POST_nat -s fd00:10:244:1::/64 ! -d fd00:10:244::/56 ! -o cilium_+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE

Apply the same fix that we have for IPv4, and skip the first (useless)
NAT pass for IPv6 packets as well.

Fixes: 3384d73 ("iptables: Ensure iptables masquerading works for proxy traffic")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
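The "skip the first NAT pass" part can be sketched as an ACCEPT rule that fires on the first POSTROUTING traversal (output interface cilium_host), so that MASQUERADE only matches on the second pass toward the external interface. This is an assumed form mirroring the IPv4 counterpart; the exact match and comment in the actual rule may differ.

```shell
# Assumed sketch: terminate NAT evaluation on the first POSTROUTING pass
# (packet egressing via cilium_host on its way back into the stack), so
# the MASQUERADE rule only takes effect on the second pass.
ip6tables -t nat -A CILIUM_POST_nat -o cilium_host \
    -m comment --comment "exclude proxy traffic from first NAT pass" -j ACCEPT
```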
The comment says "No conntrack for proxy upstream traffic that is
heading to lxc+", but the actual rule applies to the matchProxyReply
mark, so the comment should actually read "proxy return traffic". Fix
the comment and swap the rules for lxc+ and cilium_host for consistency
with IPv4.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
We have these rules for IPv4, but not for IPv6. Add the missing rules.
Initially added for IPv4 in commit d1d8e7a ("datapath: Add support
for re-entering LXC egress path after L7 LB").

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
When Kubelet is started with --cloud-provider=external, new Nodes get the
uninitialized taint. The CCM picks up these new nodes and sets their
ProviderID. But before the CCM can start on the first control-plane node of
a new cluster, the CNI must be running. This means the Cilium Operator needs
a toleration for that taint.

Related: https://app.slack.com/client/T1MATJ4SZ/C53TG4J4R

In Cilium v1.17 the Cilium Operator had a toleration for all taints. That
was changed in PR #40475. This PR extends the list of tolerations.

Fixes: aa9a24c (Change the default taints that Cilium tolerates to avoid deploying to a drained node)

Signed-off-by: Thomas Guettler <thomas.guettler@syself.com>
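The toleration described above would look roughly like the following fragment (a hedged sketch for the operator Deployment; the taint key is the standard one kubelet applies under --cloud-provider=external, but the exact shape of the change in this PR may differ):

```yaml
# Sketch: let the operator schedule on nodes not yet initialized by the CCM.
tolerations:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Exists
    effect: NoSchedule
```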
@pull pull bot locked and limited conversation to collaborators Aug 13, 2025
@pull pull bot added the ⤵️ pull label Aug 13, 2025
@pull pull bot merged commit 765ee79 into dolfly:main Aug 13, 2025