Complexity issue with socket-level LB disabled on Linux 5.10 and Cilium 1.8.7 #15249

Closed
dimitri-fert opened this issue Mar 8, 2021 · 6 comments
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. kind/complexity-issue Relates to BPF complexity or program size issues pinned These issues are not marked stale by our issue bot. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@dimitri-fert

Bug report

Our Cilium pods continuously report tc filter command execution failures and failed endpoint regenerations.

2021-03-08T15:07:06.340417395Z level=warning msg="Error fetching program/map!" subsys=datapath-loader
2021-03-08T15:07:06.340420108Z level=warning msg="Unable to load program" subsys=datapath-loader
2021-03-08T15:07:06.341410331Z level=warning msg="JoinEP: Failed to load program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1189 error="Failed to load tc filter: exit status 1" file-path=1189_next/bpf_lxc.o identity=18253 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=lxce93e412f00d4
2021-03-08T15:07:06.341422098Z level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1189 error="Failed to load tc filter: exit status 1" identity=18253 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-03-08T15:07:06.341470624Z level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1189 file-path=1189_next_fail identity=18253 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-03-08T15:07:06.341660534Z level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=18.259241069s bpfWaitForELF="3.782µs" bpfWriteELF="143.926µs" buildDuration=18.261654996s containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1189 error="Failed to load tc filter: exit status 1" identity=18253 ipv4= ipv6= k8sPodName=/ mapSync="3.032µs" policyCalculation="4.532µs" prepareBuild="315.831µs" proxyConfiguration="8.471µs" proxyPolicyCalculation="15.876µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint waitingForCTClean=1.344295ms waitingForLock=884ns
2021-03-08T15:07:06.341771668Z level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1189 error="Failed to load tc filter: exit status 1" identity=18253 ipv4= ipv6= k8sPodName=/ subsys=endpoint
2021-03-08T15:07:10.388245933Z level=error msg="Command execution failed" cmd="[tc filter replace dev lxc_health ingress prio 1 handle 1 bpf da obj 738_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
2021-03-08T15:07:10.388278136Z level=warning subsys=datapath-loader
2021-03-08T15:07:10.388283365Z level=warning msg="Prog section 'from-container' rejected: Argument list too long (7)!" subsys=datapath-loader
2021-03-08T15:07:10.388287863Z level=warning msg=" - Type:         3" subsys=datapath-loader
2021-03-08T15:07:10.388291809Z level=warning msg=" - Attach Type:  0" subsys=datapath-loader
2021-03-08T15:07:10.388295914Z level=warning msg=" - Instructions: 3016 (0 over limit)" subsys=datapath-loader
2021-03-08T15:07:10.388299994Z level=warning msg=" - License:      GPL" subsys=datapath-loader
2021-03-08T15:07:10.388304162Z level=warning subsys=datapath-loader
2021-03-08T15:07:10.388308095Z level=warning msg="Verifier analysis:" subsys=datapath-loader
2021-03-08T15:07:10.388312125Z level=warning subsys=datapath-loader
2021-03-08T15:07:10.388315926Z level=warning msg="Skipped 200975 bytes, use 'verb' option for the full verbose log." subsys=datapath-loader
2021-03-08T15:07:10.388320560Z level=warning msg="[...]" subsys=datapath-loader
2021-03-08T15:07:10.388324773Z level=warning msg="=0,ks=4,vs=1,imm=0) R6_r=ctx(id=0,off=0,imm=0) R7=inv2 R8_r=invP1 R9_r=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=????mmmm fp-32_r=????mmmm fp-40_r=mmmm0000 fp-80_r=mmmmmmmm fp-88_r=mmmmmmmm fp-96_r=mmmmmmmm fp-104=????mmmm fp-112_r=??mmmmmm fp-120_r=mmmmmmmm fp-128_r=??mmmmmm fp-136=????00mm fp-144=00000000 fp-152_r=0000mmmm fp-176_r=????00mm fp-184=inv fp-192=inv fp-200=inv16 fp-208=00000000 fp-216_r=invP1 fp-224_rw=inv fp-232_r=inv fp-240_r=inv fp-248_r=invP fp-256_r=invP19594921 fp-264_r=inv fp-272_r=invP9001 fp-280=00000000" subsys=datapath-loader
2021-03-08T15:07:10.388343336Z level=warning msg="parent already had regs=0 stack=4000000 marks" subsys=datapath-loader
2021-03-08T15:07:10.388349005Z level=warning msg="1191: (15) if r2 == 0x1 goto pc+51" subsys=datapath-loader
2021-03-08T15:07:10.388353429Z level=warning msg="1243: (79) r1 = *(u64 *)(r10 -232)" subsys=datapath-loader
2021-03-08T15:07:10.388357404Z level=warning msg="1244: (57) r1 &= 65535" subsys=datapath-loader
2021-03-08T15:07:10.388361851Z level=warning msg="1245: (69) r2 = *(u16 *)(r10 -152)" subsys=datapath-loader
2021-03-08T15:07:10.388372204Z level=warning msg="1246: (1d) if r2 == r1 goto pc+992" subsys=datapath-loader
2021-03-08T15:07:10.388377062Z level=warning msg=" R0=map_value(id=0,off=0,ks=8,vs=24,imm=0) R1_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R2_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R6=ctx(id=0,off=0,imm=0) R7=inv(id=57680,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=inv4294967133 R9=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-8=????mmmm fp-32=????mmmm fp-40=mmmmmmmm fp-80=mmmmmmmm fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=????mmmm fp-112=??mmmmmm fp-120=mmmmmmmm fp-128=??mmmmmm fp-136=????00mm fp-144=00000000 fp-152=0000mmmm fp-176=????00mm fp-184=inv fp-192=inv fp-200=inv16 fp-208=00000000 fp-216=inv1 fp-224=inv fp-232=inv fp-240=inv fp-248=inv fp-256=inv19594921 fp-264=inv fp-272=inv9001 fp-280=00000000" subsys=datapath-loader
2021-03-08T15:07:10.388383599Z level=warning msg="1247: (05) goto pc+508" subsys=datapath-loader
2021-03-08T15:07:10.388388142Z level=warning msg="1756: (79) r4 = *(u64 *)(r10 -248)" subsys=datapath-loader
2021-03-08T15:07:10.388392237Z level=warning msg="1757: (bf) r2 = r4" subsys=datapath-loader
2021-03-08T15:07:10.388396441Z level=warning msg="1758: (67) r2 <<= 3" subsys=datapath-loader
2021-03-08T15:07:10.388401229Z level=warning msg="1759: (57) r2 &= 8" subsys=datapath-loader
2021-03-08T15:07:10.388405309Z level=warning msg="1760: (67) r4 <<= 4" subsys=datapath-loader
2021-03-08T15:07:10.388409132Z level=warning msg="1761: (bf) r1 = r4" subsys=datapath-loader
2021-03-08T15:07:10.388413102Z level=warning msg="1762: (57) r1 &= 32" subsys=datapath-loader
2021-03-08T15:07:10.388417279Z level=warning msg="1763: (4f) r1 |= r2" subsys=datapath-loader
2021-03-08T15:07:10.388422228Z level=warning msg="1764: (57) r4 &= 128" subsys=datapath-loader
2021-03-08T15:07:10.388426346Z level=warning msg="1765: (71) r2 = *(u8 *)(r10 -108)" subsys=datapath-loader
2021-03-08T15:07:10.388430386Z level=warning msg="1766: (b7) r3 = 0" subsys=datapath-loader
2021-03-08T15:07:10.388434438Z level=warning msg="1767: (7b) *(u64 *)(r10 -56) = r3" subsys=datapath-loader
2021-03-08T15:07:10.388438503Z level=warning msg="last_idx 1767 first_idx 1190" subsys=datapath-loader
2021-03-08T15:07:10.388442326Z level=warning msg="regs=8 stack=0 before 1766: (b7) r3 = 0" subsys=datapath-loader
2021-03-08T15:07:10.388446519Z level=warning msg="1768: (79) r5 = *(u64 *)(r10 -240)" subsys=datapath-loader
2021-03-08T15:07:10.388455816Z level=warning msg="1769: (6b) *(u16 *)(r10 -56) = r5" subsys=datapath-loader
2021-03-08T15:07:10.388461274Z level=warning msg="1770: (7b) *(u64 *)(r10 -64) = r3" subsys=datapath-loader
2021-03-08T15:07:10.388465403Z level=warning msg="1771: (79) r5 = *(u64 *)(r10 -232)" subsys=datapath-loader
2021-03-08T15:07:10.388469655Z level=warning msg="1772: (6b) *(u16 *)(r10 -58) = r5" subsys=datapath-loader
2021-03-08T15:07:10.388473583Z level=warning msg="1773: (18) r5 = 0xffff929ab1214400" subsys=datapath-loader
2021-03-08T15:07:10.388477463Z level=warning msg="1775: (7b) *(u64 *)(r10 -184) = r5" subsys=datapath-loader
2021-03-08T15:07:10.388481296Z level=warning msg="1776: (15) if r2 == 0x6 goto pc+3" subsys=datapath-loader
2021-03-08T15:07:10.388485418Z level=warning msg=" R0=map_value(id=0,off=0,ks=8,vs=24,imm=0) R1=inv(id=0,umin_value=8,umax_value=40,var_off=(0x8; 0x20)) R2=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R3=invP0 R4=inv(id=0,umax_value=128,var_off=(0x0; 0x80)) R5=map_ptr(id=0,off=0,ks=14,vs=56,imm=0) R6=ctx(id=0,off=0,imm=0) R7=inv(id=57680,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=inv4294967133 R9=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0 fp-8=????mmmm fp-32=????mmmm fp-40=mmmmmmmm fp-56=000000mm fp-64=mm000000 fp-80=mmmmmmmm fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=????mmmm fp-112=??mmmmmm fp-120=mmmmmmmm fp-128=??mmmmmm fp-136=????00mm fp-144=00000000 fp-152=0000mmmm fp-176=????00mm fp-184=map_ptr fp-192=inv fp-200=inv16 fp-208=00000000 fp-216=inv1 fp-224=inv fp-232=inv fp-240=inv fp-248=inv fp-256=inv19594921 fp-264=inv fp-272=inv9001 fp-280=00000000" subsys=datapath-loader
2021-03-08T15:07:10.388492661Z level=warning msg="1777: (18) r5 = 0xffff929ab1210800" subsys=datapath-loader
2021-03-08T15:07:10.388496788Z level=warning msg="1779: (7b) *(u64 *)(r10 -184) = r5" subsys=datapath-loader
2021-03-08T15:07:10.388500653Z level=warning msg="1780: (4f) r1 |= r4" subsys=datapath-loader
2021-03-08T15:07:10.388504640Z level=warning msg="1781: (7b) *(u64 *)(r10 -232) = r9" subsys=datapath-loader
2021-03-08T15:07:10.388508590Z level=warning msg="1782: (bf) r5 = r9" subsys=datapath-loader
2021-03-08T15:07:10.388512560Z level=warning msg="1783: (67) r5 <<= 32" subsys=datapath-loader
2021-03-08T15:07:10.388516570Z level=warning msg="1784: (c7) r5 s>>= 32" subsys=datapath-loader
2021-03-08T15:07:10.388520870Z level=warning msg="1785: (b7) r9 = 1" subsys=datapath-loader
2021-03-08T15:07:10.388524679Z level=warning msg="1786: (b7) r4 = 1" subsys=datapath-loader
2021-03-08T15:07:10.388528460Z level=warning msg="1787: (65) if r5 s> 0x0 goto pc+1" subsys=datapath-loader
2021-03-08T15:07:10.388533449Z level=warning msg=" R0=map_value(id=0,off=0,ks=8,vs=24,imm=0) R1_w=inv(id=0) R2=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R3=invP0 R4_w=inv1 R5_w=inv0 R6=ctx(id=0,off=0,imm=0) R7=inv(id=57680,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=inv4294967133 R9_w=inv1 R10=fp0 fp-8=????mmmm fp-32=????mmmm fp-40=mmmmmmmm fp-56=000000mm fp-64=mm000000 fp-80=mmmmmmmm fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=????mmmm fp-112=??mmmmmm fp-120=mmmmmmmm fp-128=??mmmmmm fp-136=????00mm fp-144=00000000 fp-152=0000mmmm fp-176=????00mm fp-184_w=map_ptr fp-192=inv fp-200=inv16 fp-208=00000000 fp-216=inv1 fp-224=inv fp-232_w=inv fp-240=inv fp-248=inv fp-256=inv19594921 fp-264=inv fp-272=inv9001 fp-280=00000000" subsys=datapath-loader
2021-03-08T15:07:10.388544248Z level=warning msg="1788: (b7) r4 = 0" subsys=datapath-loader
2021-03-08T15:07:10.388548928Z level=warning msg="1789: (67) r4 <<= 6" subsys=datapath-loader
2021-03-08T15:07:10.388553093Z level=warning msg="BPF program is too large. Processed 1000001 insn" subsys=datapath-loader
2021-03-08T15:07:10.388557280Z level=warning msg="processed 1000001 insns (limit 1000000) max_states_per_insn 34 total_states 71285 peak_states 1988 mark_read 86" subsys=datapath-loader
2021-03-08T15:07:10.388561428Z level=warning subsys=datapath-loader
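
The decisive line in the log above is the verifier summary ("processed 1000001 insns (limit 1000000) ..."). As a minimal sketch for triaging many such logs, the Python below extracts those counters with a regular expression. The function names are hypothetical helpers written for illustration and are not part of Cilium or the kernel tooling; the pattern assumes the summary line has exactly the shape shown in this report.

```python
import re

# Pattern for the verifier summary line, assuming the exact field order
# seen in the log above (other kernels may print different fields).
SUMMARY_RE = re.compile(
    r"processed (?P<processed>\d+) insns \(limit (?P<limit>\d+)\)"
    r" max_states_per_insn (?P<max_states_per_insn>\d+)"
    r" total_states (?P<total_states>\d+)"
    r" peak_states (?P<peak_states>\d+)"
    r" mark_read (?P<mark_read>\d+)"
)


def parse_verifier_summary(log_text):
    """Return the verifier counters as a dict of ints, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def over_complexity_limit(stats):
    """True when the verifier processed more instructions than its limit."""
    return stats["processed"] > stats["limit"]


# Summary line copied from the log above.
sample = (
    'level=warning msg="processed 1000001 insns (limit 1000000) '
    'max_states_per_insn 34 total_states 71285 peak_states 1988 '
    'mark_read 86" subsys=datapath-loader'
)

stats = parse_verifier_summary(sample)
print(stats)
print("over limit:", over_complexity_limit(stats))
```

Note that the loader also prints "Instructions: 3016 (0 over limit)": the program itself is small, and it is the number of instructions the verifier had to walk across all explored paths that exceeds the limit.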

General Information

  • Cilium version : 1.8.7
  • Kernel version : 5.10.0-0.bpo.3-amd64 | (2021-02-11) x86_64 GNU/Linux
  • Orchestration system version in use : Kubernetes v1.18.16 / kOps v1.18.3
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.7", GitCommit:"bfb38f707bc4a8edfcd73472ec3d96b500b8b781", GitTreeState:"clean", BuildDate:"2020-08-12T20:27:48Z", GoVersion:"go1.13.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.16", GitCommit:"7a98bb2b7c9112935387825f2fce1b7d40b76236", GitTreeState:"clean", BuildDate:"2021-02-17T11:52:32Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

How to reproduce the issue

We started encountering this error after using kops rolling-update to upgrade our AMI and upgrade our cluster as follows:

  • Kubernetes v1.18.16 --> v1.19.1
  • kOps v1.18.3 --> v1.19.1
  • Cilium v1.7.12 --> v1.8.7

We have a second cluster on the same kernel version (5.10.0) but running Kubernetes v1.18.16 / kOps v1.18.3 and Cilium 1.7.12, and we have not encountered this problem there so far.

@dimitri-fert dimitri-fert added the kind/bug This is a bug in the Cilium logic. label Mar 8, 2021
@pchaigno pchaigno added kind/community-report This was reported by a user in the Cilium community, eg via Slack. kind/complexity-issue Relates to BPF complexity or program size issues need-more-info More information is required to further debug or fix the issue. needs/triage This issue requires triaging to establish severity and next steps. labels Mar 9, 2021
@pchaigno
Member

pchaigno commented Mar 9, 2021

Thanks for reporting this! It looks like a BPF complexity issue caused by the newer 5.10 kernel, similar to #14964 (comment).

Could you share a bugtool of one of the failing nodes so that we can triage this based on datapath configuration?

@dimitri-fert
Author

Hi @pchaigno, thank you for your time. It seems like a BPF complexity issue indeed. Sure, I'll provide a bugtool, but I'll need some time to sanitize our sensitive data. I'll keep you posted.

@dimitri-fert
Author

Here it is, sorry for the delay : bugtool-cilium.tar.gz

@pchaigno
Member

From the config map:

  enable-ipv4: "true"
  enable-ipv6: "false"
  enable-metrics: "true"
  enable-node-port: "false"
  enable-remote-node-identity: "true"
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  ipam: kubernetes
  kube-proxy-replacement: partial

It's a variation of #14726 but on v1.8.
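
To compare a failing node's settings with those of a working cluster, the excerpt above can be read into a dictionary. The following is only a sketch: `parse_flat_config` and the list of keys to surface are hypothetical choices made for illustration, and this is not how Cilium itself reads its ConfigMap. It handles only flat `key: value` lines like the ones quoted here, not general YAML.

```python
def parse_flat_config(text):
    """Parse flat 'key: value' lines into a dict, stripping double quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Excerpt copied from the config map quoted above.
EXCERPT = """
  enable-ipv4: "true"
  enable-ipv6: "false"
  enable-metrics: "true"
  enable-node-port: "false"
  enable-remote-node-identity: "true"
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  ipam: kubernetes
  kube-proxy-replacement: partial
"""

# Assumption: these are the keys worth comparing between clusters; the
# selection is illustrative and not taken from the Cilium documentation.
KEYS_OF_INTEREST = ("kube-proxy-replacement", "enable-node-port", "enable-ipv6")

settings = parse_flat_config(EXCERPT)
for key in KEYS_OF_INTEREST:
    print(f"{key} = {settings.get(key)}")
```

Running the same helper against the config map of the unaffected cluster and diffing the two dictionaries would show which datapath options differ.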

@pchaigno pchaigno removed the need-more-info More information is required to further debug or fix the issue. label Mar 27, 2021
@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jun 3, 2021
@pchaigno pchaigno added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Jun 3, 2021
@pchaigno pchaigno changed the title Encountering many "tc filter replace" failure and "endpoint regeneration failed" errors on Cilium 1.8.7 Complexity issue with socket-level LB disabled on Linux 5.10 and Cilium 1.8.7 Jun 3, 2021
@pchaigno pchaigno removed the needs/triage This issue requires triaging to establish severity and next steps. label Jun 3, 2021
@aanm aanm added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Jan 6, 2022
@joestringer
Member

Given @pchaigno's previous post, it looks like this should have been addressed in v1.9 and later versions. It's likely that fixing this on v1.8 would prove too invasive. At this point v1.8.x is no longer supported by the community, so I'll close this out. If you observe a bug like this on a newer version of Cilium, please file a new issue.
