Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf,init.sh: make netdev bpf filter cleanup less eager #24336

Merged
merged 2 commits into from Mar 14, 2023

Conversation

ti-mo
Copy link
Contributor

@ti-mo ti-mo commented Mar 13, 2023

Due to an oversight when updating init.sh to deal with the new tc filter names for bpf progs after the introduction of Go-based loader/netlink attach, all interfaces in the host namespace that didn't contain the word 'cilium' would have their egress and ingress filters stripped. This included lxc interfaces and many others. lxc programs in particular would only be reattached when the endpoint got regenerated, which can take a while on nodes with many Pods. This caused connectivity interruptions in the meantime.

This commit changes the tc filter naming convention to converge on the changes introduced in 2e40d67 ("bpf: Finish rename of BPF programs to cil_ prefix"), using the bpf program (function) name containing the 'cil_' prefix. The 'cilium_' prefix is no longer included explicitly, instead opting for the program name suffixed by the interface name, e.g. cil_from_netdev-eth0.

init.sh no longer uses the term 'cilium' to trigger a removal of the interface's tc filters.

Fixes commit 2a7cef4 ("init,cleanup: remove TC filters containing 'cilium' in their names").

Fix Pod connectivity interruption during agent restart

Fixes: #24191

@ti-mo ti-mo added kind/bug This is a bug in the Cilium logic. sig/loader Impacts the loading of BPF programs into the kernel. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Mar 13, 2023
@ti-mo ti-mo requested review from a team as code owners March 13, 2023 12:15
@ti-mo ti-mo requested review from rgo3 and rolinh March 13, 2023 12:15
@ti-mo ti-mo added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Mar 13, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.13.1 Mar 13, 2023
@ti-mo
Copy link
Contributor Author

ti-mo commented Mar 13, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.56.11:80 from testclient-4z9zf

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

@borkmann borkmann requested a review from ldelossa March 13, 2023 12:51
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here, let's extend the K8sUpdates test to check whether flows got interrupted. In particular, we can reuse https://github.com/cilium/cilium/blob/master/test/k8s/updates.go#L328, and call the function after restarting Cilium.

@pchaigno
Copy link
Member

While we are here, let's extend the K8sUpdates test to check whether flows got interrupted. In particular, we can reuse https://github.com/cilium/cilium/blob/master/test/k8s/updates.go#L328, and call the function after restarting Cilium.

Do we mean to supplant K8sAgentChaosTest Restart with long lived connections with K8sUpdates?

Due to an oversight when updating init.sh to deal with the new tc filter names
for bpf progs after the introduction of Go-based loader/netlink attach, all
interfaces in the host namespace that didn't contain the word 'cilium' would
have their egress and ingress filters stripped. This included lxc interfaces
and many others. lxc programs in particular would only be reattached when the
endpoint got regenerated, which can take a while on nodes with many Pods.
This caused connectivity interruptions in the meantime.

This commit changes the tc filter naming convention to converge on the changes
introduced in 2e40d67 ("bpf: Finish rename of BPF programs to cil_ prefix"),
using the bpf program (function) name containing the 'cil_' prefix. The
'cilium_' prefix is no longer included explicitly, instead opting for the
program name suffixed by the interface name, e.g. cil_from_netdev-eth0.

init.sh no longer uses the term 'cilium' to trigger a removal of the
interface's tc filters. Also switched over to a regex that acts on a word
boundary to reduce the chance of a false positive (e.g. a filter pencil_foo
installed by another tool should not trigger removal).

Fixes commit 2a7cef4 ("init,cleanup: remove TC filters containing 'cilium'
in their names").

Signed-off-by: Timo Beckers <timo@isovalent.com>
@ti-mo ti-mo requested review from a team as code owners March 14, 2023 09:59
@ti-mo ti-mo requested a review from christarazi March 14, 2023 09:59
We would only test service interruptions using up/downgrades, but agent
restarts should also be clean.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Co-authored-by: Martynas Pumputis <m@lambda.lt>
@ti-mo
Copy link
Contributor Author

ti-mo commented Mar 14, 2023

/test

@pchaigno pchaigno requested a review from brb March 14, 2023 14:21
@ldelossa
Copy link
Contributor

Can confirm, no RST or client/server disruption is present after this PR.

@joestringer joestringer merged commit bf1171c into cilium:master Mar 14, 2023
56 checks passed
@nebril nebril mentioned this pull request Mar 14, 2023
3 tasks
@nebril nebril added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 14, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.13 in 1.13.1 Mar 14, 2023
@ti-mo ti-mo deleted the tb/fix-bpf-host-detach branch March 15, 2023 08:18
@nebril nebril mentioned this pull request Mar 15, 2023
1 task
@nebril nebril added this to Backport pending to v1.13 in 1.13.2 Mar 15, 2023
@nebril nebril removed this from Backport pending to v1.13 in 1.13.1 Mar 15, 2023
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Apr 10, 2023
@gentoo-root gentoo-root moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.2 Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
No open projects
1.13.2
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

Cilium 1.13.0 restart requires an app reconnect
7 participants