cilium: Remove attached bpf_xdp upon "cilium cleanup" #19735

Merged
merged 1 commit into from
Jul 29, 2022

zhanghe9702
Contributor

@zhanghe9702 zhanghe9702 commented May 6, 2022

cilium cleanup doesn't remove XDP programs previously attached by Cilium;
this commit adds the missing cleanup step.

Fixes: #19586

Signed-off-by: zhang he <zhanghe9702@163.com>

@zhanghe9702 zhanghe9702 requested a review from a team as a code owner May 6, 2022 14:06
@zhanghe9702 zhanghe9702 requested review from a team and twpayne May 6, 2022 14:06
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 6, 2022
Member

@YutaroHayakawa YutaroHayakawa left a comment

Hi, sorry for the late response and thanks for your contribution!

Could you use vishvananda/netlink and cilium/ebpf instead of ip and bpftool? We are now basically removing those two from our code base.

#19159

@zhanghe9702
Contributor Author

That's no problem! I will update the commit.

@zhanghe9702
Contributor Author

@YutaroHayakawa, as far as I know, the cilium/ebpf equivalent of the BPF_PROG_GET_FD_BY_ID/BPF_OBJ_GET_INFO_BY_FD calls is

@YutaroHayakawa
Member

Ah, alright. In that case, please use bpftool for now. We would appreciate it if you added support for that :)

@zhanghe9702
Contributor Author

I could add a function equivalent to "bpftool prog show id" in link/program.go. Do you have a better recommendation?

@zhanghe9702 zhanghe9702 force-pushed the pr/cleanup_xdp branch 2 times, most recently from f656b21 to 4951887 Compare May 20, 2022 05:47
@brb
Member

brb commented May 23, 2022

@zhanghe9702 Thanks for the PR!

but it is a Go internal package function, so cilium/cilium can't use it directly! :-)

Maybe you could ping the https://github.com/cilium/ebpf maintainers about exporting the functions (e.g., @ti-mo)? It would be good not to add new code that depends on tools we are trying to migrate away from.

@ti-mo
Contributor

ti-mo commented May 23, 2022

sys.ProgGetFdById is only used by NewProgramFromID, which should be what you're looking for.
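For illustration, here is a minimal sketch of the lookup-then-detach flow this thread converges on. It is written against small hypothetical function types so it runs without root or a kernel. In real code the name lookup would presumably be backed by cilium/ebpf's NewProgramFromID followed by Info(), and the detach step by netlink.LinkSetXdpFd with an fd of -1. The names xdpLink, progLookup, detachFunc and cleanupXDP are assumptions made for this example, not the code in cilium/cmd/cleanup.go.

```go
package main

import (
	"fmt"
	"strings"
)

// xdpLink is a hypothetical stand-in for a network device and the ID of the
// XDP program attached to it. A ProgID of 0 means nothing is attached.
type xdpLink struct {
	Name   string
	ProgID uint32
}

// progLookup resolves a BPF program ID to the name reported by the kernel.
// Assumption: real code would use ebpf.NewProgramFromID and Program.Info.
type progLookup func(id uint32) (string, error)

// detachFunc removes the XDP program from the named device.
// Assumption: real code would call netlink.LinkSetXdpFd(link, -1).
type detachFunc func(linkName string) error

// ciliumXDPProgName is the program name matched in the diff under review.
const ciliumXDPProgName = "bpf_xdp_entry"

// cleanupXDP detaches only those XDP programs whose name identifies them as
// Cilium's, and returns the names of the devices it cleaned up.
func cleanupXDP(links []xdpLink, lookup progLookup, detach detachFunc) ([]string, error) {
	var removed []string
	for _, l := range links {
		if l.ProgID == 0 {
			continue // no XDP program attached
		}
		name, err := lookup(l.ProgID)
		if err != nil {
			return removed, fmt.Errorf("looking up program %d on %s: %w", l.ProgID, l.Name, err)
		}
		if !strings.Contains(name, ciliumXDPProgName) {
			continue // attached by someone else, leave it alone
		}
		if err := detach(l.Name); err != nil {
			return removed, fmt.Errorf("detaching XDP from %s: %w", l.Name, err)
		}
		removed = append(removed, l.Name)
	}
	return removed, nil
}

func main() {
	links := []xdpLink{{Name: "eth0", ProgID: 7}, {Name: "eth1", ProgID: 9}, {Name: "lo"}}
	names := map[uint32]string{7: "bpf_xdp_entry", 9: "other_prog"}

	lookup := func(id uint32) (string, error) { return names[id], nil }
	detach := func(string) error { return nil }

	removed, err := cleanupXDP(links, lookup, detach)
	fmt.Println(removed, err)
}
```

Keeping the kernel-facing calls behind function values is only a convenience for the sketch: it lets the matching logic be exercised in a unit test without attaching real programs.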

Contributor

@ti-mo ti-mo left a comment

Please s/Xdp/XDP/ everywhere. (except for netlink.LinkSetXdpFd, of course)

cilium/cmd/cleanup.go (outdated, resolved)
bpf/bpf_xdp.c (outdated, resolved)
cilium/cmd/cleanup.go (outdated, resolved)
@zhanghe9702 zhanghe9702 force-pushed the pr/cleanup_xdp branch 2 times, most recently from cca5837 to 673de27 Compare May 24, 2022 05:47
@zhanghe9702 zhanghe9702 requested a review from ti-mo May 24, 2022 06:14
Member

@YutaroHayakawa YutaroHayakawa left a comment

LGTM with Nit

@zhanghe9702 zhanghe9702 force-pushed the pr/cleanup_xdp branch 2 times, most recently from b7aecb7 to 4a4afa1 Compare May 31, 2022 08:44
cilium/cmd/cleanup.go (resolved)
cilium/cmd/cleanup.go (resolved)
@brb brb self-requested a review May 31, 2022 09:48
@twpayne twpayne added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label May 31, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 31, 2022
@zhanghe9702 zhanghe9702 force-pushed the pr/cleanup_xdp branch 2 times, most recently from 7b0e5ea to ab83db0 Compare May 31, 2022 11:19
cilium/cmd/cleanup.go (outdated, resolved)
@pchaigno pchaigno added the sig/loader Impacts the loading of BPF programs into the kernel. label Jun 6, 2022
Member

@pchaigno pchaigno left a comment

A couple minor comments below, but looks good to me otherwise. Thanks for tackling this! 🚀

cilium/cmd/cleanup.go (outdated, resolved)
cilium/cmd/cleanup.go (outdated, resolved)
return false, err
}

if strings.Contains(info.Name, "bpf_xdp_entry") {
Member

How can we ensure that we won't forget to update this if/when we update the function's name in bpf_xdp.c? AFAIK, we don't have any tests covering this. At the very least, we should add a comment in bpf_xdp.c.

Member

The comment is still needed IMO.
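One possible way to reduce the drift risk raised in this thread, sketched under assumptions: keep the expected name in a single Go constant and check it against the kernel's object-name limit (BPF_OBJ_NAME_LEN is 16 bytes including the terminating NUL, so a reported program name holds at most 15 characters, and loaders typically truncate longer function names). The helper names kernelVisibleName and isCiliumXDP are hypothetical, and this is not the safeguard the PR itself adopted.

```go
package main

import (
	"fmt"
	"strings"
)

// bpfObjNameLen mirrors BPF_OBJ_NAME_LEN from the kernel UAPI headers.
const bpfObjNameLen = 16

// ciliumXDPProgName must be kept in sync with the entry function in bpf_xdp.c.
const ciliumXDPProgName = "bpf_xdp_entry"

// kernelVisibleName returns the part of a function name that fits into the
// kernel's program-name field (15 bytes plus the terminating NUL).
func kernelVisibleName(funcName string) string {
	if len(funcName) > bpfObjNameLen-1 {
		return funcName[:bpfObjNameLen-1]
	}
	return funcName
}

// isCiliumXDP reports whether a kernel-reported program name matches the
// expected Cilium XDP entry point after accounting for truncation.
func isCiliumXDP(reported string) bool {
	return strings.Contains(reported, kernelVisibleName(ciliumXDPProgName))
}

func main() {
	for _, n := range []string{"bpf_xdp_entry", "renamed_entry", "some_other_prog"} {
		fmt.Printf("%-16s cilium=%v\n", n, isCiliumXDP(n))
	}
}
```

A unit test asserting that the constant survives kernelVisibleName unchanged would catch a rename to something longer than 15 characters, though it cannot by itself detect that the C function was renamed; that still needs the comment in bpf_xdp.c or a test that loads the object file.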

Member

@brb brb left a comment

Thanks, LGTM.

@brb brb requested a review from pchaigno June 16, 2022 09:39
@brb brb added dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs and removed dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs labels Jun 16, 2022
Member

@pchaigno pchaigno left a comment

Logging still needs to be fixed. See previous review.

@aanm
Member

aanm commented Jul 8, 2022

/test

cilium cleanup doesn't remove XDP programs previously attached by Cilium;
this commit adds the missing cleanup step.

Fixes: cilium#19586

Signed-off-by: zhang he <zhanghe9702@163.com>
@joestringer
Member

joestringer commented Jul 12, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-5.4' failed:

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 10.0.0.62:80 from testclient-host-txjjm

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-5.4 so I can create one.

@zhanghe9702
Contributor Author

zhanghe9702 commented Jul 13, 2022

I just ran it in my local CI and it passed :-)
K8S_VERSION=1.22 KERNEL=54 ginkgo --focus="k8s.*Host firewall With VXLAN$" --tags=integration_tests

00:05:49 STEP: Ensuring the namespace kube-system exists
00:05:54 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
00:05:56 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
00:05:56 STEP: Preparing cluster
00:05:56 STEP: Labelling nodes
00:05:56 STEP: Cleaning up Cilium components
00:05:57 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathConfig
00:05:58 STEP: Ensuring the namespace kube-system exists
00:05:58 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
00:05:58 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
00:05:59 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathConfig Host firewall
00:05:59 STEP: Installing Cilium
00:06:00 STEP: Waiting for Cilium to become ready
00:06:28 STEP: Restarting unmanaged pods coredns-69b675786c-58fz9 in namespace kube-system
00:06:39 STEP: Validating if Kubernetes DNS is deployed
00:06:39 STEP: Checking if deployment is ready
00:06:39 STEP: Checking if kube-dns service is plumbed correctly
00:06:39 STEP: Checking if DNS can resolve
00:06:39 STEP: Checking if pods have identity
00:06:40 STEP: Kubernetes DNS is not ready: %!s(<nil>)
00:06:40 STEP: Restarting Kubernetes DNS (-l k8s-app=kube-dns)
00:06:40 STEP: Waiting for Kubernetes DNS to become operational
00:06:40 STEP: Checking if deployment is ready
00:06:40 STEP: Kubernetes DNS is not ready yet: only 0 of 1 replicas are available
00:06:41 STEP: Checking if deployment is ready
00:06:41 STEP: Kubernetes DNS is not ready yet: only 0 of 1 replicas are available
00:06:42 STEP: Checking if deployment is ready
00:06:42 STEP: Checking if pods have identity
00:06:42 STEP: Checking if kube-dns service is plumbed correctly
00:06:42 STEP: Checking if DNS can resolve
00:06:43 STEP: Checking service kube-system/kube-dns plumbing in cilium pod cilium-svqmf: unable to find service backend 10.0.0.217:53 in cilium pod cilium-svqmf
00:06:43 STEP: Checking service kube-system/kube-dns plumbing in cilium pod cilium-j9m7s: unable to find service backend 10.0.0.217:53 in cilium pod cilium-j9m7s
00:06:48 STEP: Validating Cilium Installation
00:06:48 STEP: Performing Cilium controllers preflight check
00:06:48 STEP: Performing Cilium status preflight check
00:06:48 STEP: Performing Cilium health check
00:06:48 STEP: Checking whether host EP regenerated
00:06:49 STEP: Performing Cilium service preflight check
00:06:49 STEP: Performing K8s service preflight check
00:06:49 STEP: Cilium is not ready yet: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-j9m7s': Exitcode: 1 
Err: Process exited with status 1
Stdout:
         
Stderr:
         Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
         Error: Cannot get status/probe: Put "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe": dial unix /var/run/cilium/health.sock: connect: no such file or directory
         
         command terminated with exit code 1
         

00:06:49 STEP: Performing Cilium status preflight check
00:06:49 STEP: Performing Cilium controllers preflight check
00:06:49 STEP: Performing Cilium health check
00:06:49 STEP: Checking whether host EP regenerated
00:06:50 STEP: Performing Cilium service preflight check
00:06:50 STEP: Performing K8s service preflight check
00:06:52 STEP: Waiting for cilium-operator to be ready
00:06:52 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator")
00:06:52 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator") => <nil>
00:06:52 STEP: Making sure all endpoints are in ready state
00:06:53 STEP: Creating namespace 202207140006k8sdatapathconfighostfirewallwithvxlan
00:06:53 STEP: Deploying demo_hostfw.yaml in namespace 202207140006k8sdatapathconfighostfirewallwithvxlan
00:07:40 STEP: Waiting for 4m0s for 8 pods of deployment demo_hostfw.yaml to become ready
00:07:40 STEP: WaitforNPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="")
00:07:44 STEP: WaitforNPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="") => <nil>
00:07:44 STEP: Applying policies /home/vagrant/go/src/github.com/cilium/cilium/test/k8s/manifests/host-policies.yaml
00:07:48 STEP: Checking host policies on egress to remote node
00:07:48 STEP: Checking host policies on egress to local pod
00:07:48 STEP: Checking host policies on ingress from remote node
00:07:48 STEP: Checking host policies on ingress from local pod
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClient")
00:07:48 STEP: Checking host policies on ingress from remote pod
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClient")
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost")
00:07:48 STEP: Checking host policies on egress to remote pod
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost")
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost")
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost")
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClient") => <nil>
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost") => <nil>
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost") => <nil>
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClient") => <nil>
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost") => <nil>
00:07:48 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServer")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServer")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServer") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost")
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServer") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testServerHost") => <nil>
00:07:49 STEP: WaitforPods(namespace="202207140006k8sdatapathconfighostfirewallwithvxlan", filter="-l zgroup=testClientHost") => <nil>
=== Test Finished at 2022-07-14T00:07:55+08:00====
00:07:55 STEP: Running JustAfterEach block for EntireTestsuite K8sDatapathConfig
00:07:56 STEP: Running AfterEach for block EntireTestsuite K8sDatapathConfig Host firewall
00:07:56 STEP: Running AfterEach for block EntireTestsuite K8sDatapathConfig
00:07:56 STEP: Deleting deployment demo_hostfw.yaml
00:07:56 STEP: Deleting namespace 202207140006k8sdatapathconfighostfirewallwithvxlan
00:08:11 STEP: Running AfterEach for block EntireTestsuite
<Checks>
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 1
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 2
Number of "level=error" in logs: 0
⚠️  Number of "level=warning" in logs: 11
Number of "Cilium API handler panicked" in logs: 0
⚠️  Number of "Goroutine took lock for more than" in logs: 18
Top 4 errors/warnings:
Failed to remove old router IPs (restored IP: <nil>) from cilium_host. Manual intervention is required to remove all other old IPs.
Key allocation attempt failed
Session affinity for host reachable services needs kernel 5.7.0 or newer to work properly when accessed from inside cluster: the same service endpoint will be selected from all network namespaces on the host.
Waiting for k8s node information

</Checks>


• [SLOW TEST:9178.761 seconds]
K8sDatapathConfig
/home/zhanghe/go/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:473
  Host firewall
  /home/zhanghe/go/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:473
    With VXLAN
    /home/zhanghe/go/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:527
------------------------------
SSSSSSSSSSSSAfterSuite: not running on Jenkins; leaving VMs running for debugging

Ran 1 of 151 Specs in 9178.762 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 150 Skipped

@tklauser
Member

/test-1.22-5.4

@aanm aanm merged commit 6ed1fe5 into cilium:master Jul 29, 2022
dylandreimerink added a commit to dylandreimerink/cilium that referenced this pull request Jul 29, 2022
PR cilium#20615 added a new test which relied on the name of the XDP entry
program. PR cilium#19735 changed this name. They were merged within a short
time of each other; since the changes did not overlap, no merge conflict
resulted.

This commit fixes the test so it works with the new name.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
joestringer pushed a commit that referenced this pull request Jul 29, 2022
dezmodue pushed a commit to dezmodue/cilium that referenced this pull request Aug 10, 2022