tc nat make connection with two pods under the same service using the same port, causing the connection reset by peer #17457

Closed
ChangyuWang opened this issue Sep 23, 2021 · 8 comments
Labels
kind/bug This is a bug in the Cilium logic.
kind/community-report This was reported by a user in the Cilium community, eg via Slack.
needs/triage This issue requires triaging to establish severity and next steps.
sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments


ChangyuWang commented Sep 23, 2021

Bug report

General Information

  • Cilium version (run cilium version)
    Client: 1.9.0 go version go1.15.4 linux/amd64
    Daemon: 1.9.0 go version go1.15.4 linux/amd64
  • Kernel version (run uname -a)
    Linux 4.14.105-19-0019 SMP Fri Jan 15 11:39:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, ...)
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.18", GitCommit:"6b913dbde30aa95b247be30a5318fb912f8fe29e", GitTreeState:"clean", BuildDate:"2021-08-11T10:20:21Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.18-57+776098ae2e7bf3-dirty", GitCommit:"776098ae2e7bf358cce0af0b0faf139fe66c6c48", GitTreeState:"dirty", BuildDate:"2021-09-01T07:38:52Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Generate and upload a system zip:
curl -sLO https://git.io/cilium-sysdump-latest.zip && python cilium-sysdump-latest.zip

How to reproduce the issue

A Service with ClusterIP 192.168.11.21 has two backend pods, 172.16.2.48 and 172.16.0.54, running on different nodes. In the source pod (172.16.1.83), executing "curl http://192.168.11.21" gets the response: connection reset by peer.
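
For reference, a minimal reproduction sketch, assuming a hypothetical two-replica Deployment/Service named "echo"; all names, images, and placeholders are illustrative and not taken from the report:

# Hypothetical setup: two backends behind one ClusterIP, spread across nodes.
kubectl create deployment echo --image=nginx
kubectl scale deployment echo --replicas=2
kubectl expose deployment echo --port=80 --target-port=80
kubectl get endpoints echo          # should show two backend pod IPs on different nodes

# From a third ("source") pod, hit the ClusterIP repeatedly; the failure is intermittent:
kubectl exec -it <source-pod> -- sh -c \
  'for i in $(seq 1 100); do curl -s -o /dev/null -w "%{http_code}\n" http://<cluster-ip>/; done'
# Occasionally curl fails with "connection reset by peer" instead of printing 200.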

Logs below:
# cilium monitor --related-to 2732 -vv    (2732 is the source pod's Cilium endpoint ID)

------------------------------------------------------------------------------
CPU 02: MARK 0xcf11d808 FROM 2732 to-stack: 74 bytes (74 captured), **state new**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..62..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..40..] Version=4 IHL=5 TOS=0 Length=60 Id=62251 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=60652 SrcIP=172.16.1.83 DstIP=**172.16.2.48** Options=[] Padding=[]}
TCP	{Contents=[..40..] Payload=[] SrcPort=**34768** DstPort=8080(http-alt) Seq=2665712526 Ack=0 DataOffset=10 FIN=false SYN=true RST=false PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=28800 Checksum=23506 Urgent=0 Options=[..5..] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0x1bfedde FROM 2732 to-endpoint: 74 bytes (74 captured), **state reply**, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.2.48, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..62..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..40..] Version=4 IHL=5 TOS=0 Length=60 Id=0 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=50587 SrcIP=**192.168.11.21** DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..40..] Payload=[] SrcPort=80(http) DstPort=**34768** Seq=633059423 Ack=2665712527 DataOffset=10 FIN=false SYN=true RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=28560 Checksum=11711 Urgent=0 Options=[..5..] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0xcf11d808 FROM 2732 to-stack: 66 bytes (66 captured), **state new**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=62252 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=61165 SrcIP=172.16.1.83 DstIP=**172.16.0.54** Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=**34768** DstPort=8080(http-alt) Seq=2665712527 Ack=633059424 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=22992 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872462143/3723349126 0xab364b3fddedcc86)] Padding=[]}
------------------------------------------------------------------------------
CPU 07: MARK 0xcf11d808 FROM 2732 to-stack: 697 bytes (128 captured), **state established**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..118..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..98..] Version=4 IHL=5 TOS=0 Length=683 Id=62253 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=60533 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[..66..] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=633059424 DataOffset=8 FIN=false SYN=false RST=false PSH=true ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=23623 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872462143/3723349126 0xab364b3fddedcc86)] Padding=[]}
  Packet has been truncated
------------------------------------------------------------------------------
CPU 09: MARK 0x9ba566e0 FROM 2732 to-endpoint: 54 bytes (54 captured), state reply, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.0.54, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=50607 SrcIP=192.168.11.21 DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=80(http) DstPort=34768 Seq=633059424 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=53379 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0x0 FROM 2732 to-stack: 54 bytes (54 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57894 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=33891 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x7986a3aa FROM 2732 to-endpoint: 66 bytes (66 captured), state reply, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.0.54, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=43055 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=7540 SrcIP=192.168.11.21 DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=80(http) DstPort=56746 Seq=1224512877 Ack=325310456 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=59 Checksum=55518 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:1087010861/2872449172 0x40ca782dab361894)] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x0 FROM 2732 to-stack: 66 bytes (66 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57882 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=56746 DstPort=8080(http-alt) Seq=325310456 Ack=1224512878 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=5208 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872464276/1086980588 0xab36539440ca01ec)] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x0 FROM 2732 to-stack: 54 bytes (54 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57894 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=33891 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------

After the first two steps of the handshake with backend 172.16.2.48, the client's final ACK is translated to a different pod, 172.16.0.54, which causes the connection reset by peer. So, with an existing conntrack entry for 172.16.2.48, why does the client create a new conntrack entry with backend 172.16.0.54?

In brief (cilium monitor --related-to xx -v):

-> stack flow 0x5cc2852d identity 23527->18201 state new ifindex 0 orig-ip 0.0.0.0: 172.16.2.23:47176 -> 172.16.3.9:80 tcp SYN
-> endpoint 1724 flow 0xd6848b99 identity 18201->23527 state reply ifindex lxcfff879ba3305 orig-ip 172.16.3.9: 192.168.4.198:80 -> 172.16.2.23:47176 tcp SYN, ACK
-> stack flow 0x5cc2852d identity 23527->2117 state new ifindex 0 orig-ip 0.0.0.0: 172.16.2.23:47176 -> 172.16.4.5:80 tcp ACK
-> endpoint 1724 flow 0xe1c4b74a identity 2117->23527 state reply ifindex lxcfff879ba3305 orig-ip 172.16.4.5: 192.168.4.198:80 -> 172.16.2.23:47176 tcp RST
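
One way to check what the datapath has recorded for such a flow is to dump the BPF conntrack and service maps from the cilium-agent pod on the source node; a sketch (the IPs come from the logs above, the commands are the standard agent CLI):

# Run inside the cilium-agent pod on the node hosting the source pod:
cilium bpf ct list global | grep 172.16.1.83   # CT entries involving the source pod
cilium service list                            # ClusterIP -> backend mappings as the agent sees them
cilium bpf lb list                             # the same mapping at the BPF load-balancer map level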
@ChangyuWang added the kind/bug label Sep 23, 2021
@ChangyuWang changed the title from "Two pods under the same service use the same port, causing the connection reset by peer" to "tc nat make connection with two pods under the same service using the same port, causing the connection reset by peer" Sep 23, 2021
@pchaigno added the needs/triage and kind/community-report labels Sep 23, 2021
brb (Member) commented Sep 24, 2021

@ChangyuWang Can you please upload the sysdump?

brb (Member) commented Sep 27, 2021

@ChangyuWang Also, please do the sysdump immediately after hitting the issue.

@trigger-happy

Hi, I believe I'm encountering the same issue in my own cluster. I've attached a link to a sysdump I did in my cluster right after repeatedly hitting the issue. In this particular case, I was trying to get my gitea instance to request the .well-known/openid-configuration endpoint of my keycloak instance. Hope this helps in figuring out what's wrong

https://drive.google.com/file/d/1CRlea3KfkOovBc8jRPA8knDwDSc8ln5F/view?usp=sharing

@aanm added the sig/datapath label Jan 6, 2022
@ChangyuWang (Author)

The reason is: when the service ClusterIP is accessed from a source pod managed by the Cilium network, connection tracking entries are created in the cilium_ct4_global map, where a service CT entry is kept for 6h until GC. Between GC runs, if the map is full, the LRU map can evict the CT entry of a live connection while inserting new entries. When that happens, the next packet of the same flow creates a new connection tracking entry that points to a different backend pod, which causes the connection reset by peer.
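
If LRU eviction of live entries is indeed the trigger, the knobs to look at are the global CT map size and the conntrack GC interval. A sketch of how one might check and tune them, assuming the agent flags / Helm values available around Cilium 1.9 (verify the exact names against the deployed version):

# Inside the cilium-agent pod: rough estimate of how full the global CT map is.
cilium bpf ct list global | wc -l

# Possible mitigations (assumed flag/Helm value names, to be verified):
#   agent flags:  --bpf-ct-global-tcp-max, --bpf-ct-global-any-max  (raise map capacity)
#                 --conntrack-gc-interval                           (run GC more often)
#   Helm values:  bpf.ctTcpMax, bpf.ctAnyMax
helm upgrade cilium cilium/cilium -n kube-system \
  --set bpf.ctTcpMax=1048576 --set bpf.ctAnyMax=524288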


github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Jul 9, 2022
@brb removed the stale label Jul 11, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Sep 10, 2022
@brb removed the stale label Sep 13, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Nov 13, 2022
@github-actions

This issue has not seen any activity since it was marked stale.
Closing.
