CI: RuntimePolicies L3/L4 Checks: Found a JoinEP in Cilium logs: cilium_health is being torn down #10446

Closed
joestringer opened this issue Mar 4, 2020 · 1 comment
Labels
area/CI Continuous Integration testing issue or flake

@joestringer
Member

Found in #10443:

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/17678/testReport/junit/(root)/Suite-runtime/RuntimePolicies_L3_L4_Checks/

test_results_Cilium-PR-Ginkgo-Tests-Validated_17678_BDD-Test-PR-runtime.zip

Stacktrace

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:347
Found a "JoinEP: " in Cilium Logs
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:536

Standard Output

⚠️  Found a "JoinEP: " in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 5
⚠️  Number of "level=warning" in logs: 7
Number of "Cilium API handler panicked" in logs: 0
⚠️  Number of "Goroutine took lock for more than" in logs: 16
Top 5 errors/warnings:
Error while rewriting endpoint BPF program
endpoint regeneration failed
Command execution failed
Regeneration of endpoint failed
JoinEP: Failed to load program

The interesting part of the Cilium logs is here:

Mar 04 02:29:51 runtime cilium-agent[30623]: level=info msg="Exiting due to signal" signal=terminated subsys=daemon
Mar 04 02:29:51 runtime systemd[1]: Stopping cilium...
Mar 04 02:29:51 runtime cilium-agent[30623]: level=debug msg="Killing old health endpoint process" pidfile=/var/run/cilium/state/health-endpoint.pid subsys=cilium-health-launcher
Mar 04 02:29:51 runtime cilium-agent[30623]: level=debug msg="Didn't find existing device" error="Link not found" subsys=cilium-health-launcher veth=cilium_health
Mar 04 02:29:51 runtime cilium-agent[30623]: level=info msg="Shutting down... " subsys=daemon
Mar 04 02:29:51 runtime cilium-agent[30623]: level=info msg="Stopped serving cilium at unix:///var/run/cilium/cilium.sock" subsys=daemon
Mar 04 02:29:51 runtime cilium-agent[30623]: level=error msg="Command execution failed" cmd="[tc filter replace dev lxc_health ingress prio 1 handle 1 bpf da obj 3071_next/bpf_lxc.o sec from-container]" error="signal: terminated" subsys=datapath-loader
Mar 04 02:29:51 runtime cilium-agent[30623]: level=warning msg="JoinEP: Failed to load program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3071 error="Failed to load tc filter: signal: terminated" file-path=3071_next/bpf_lxc.o identity=4 ipv4=10.15.66.241 ipv6="f00d::a0f:0:0:3d85" k8sPodName=/ subsys=datapath-loader veth=lxc_health
Mar 04 02:29:51 runtime cilium-agent[30623]: level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3071 error="Failed to load tc filter: signal: terminated" identity=4 ipv4=10.15.66.241 ipv6="f00d::a0f:0:0:3d85" k8sPodName=/ subsys=endpoint

So the "JoinEP" log shows up, but the error is "Failed to load tc filter: signal: terminated".

This is a flake caused by shutting down Cilium while the health endpoint was still regenerating.

If we see a flake like this on master, we should in theory be able to check whether the context was cancelled and skip the complaint in that case. Either way, this should not have any real adverse effect at runtime.

@joestringer joestringer added the area/CI Continuous Integration testing issue or flake label Mar 4, 2020
@joestringer
Member Author

I'm going to close this assuming we will not fix it.

Projects
No open projects
1.6 CI
Awaiting triage