tc nat make connection with two pods under the same service using the same port, causing the connection reset by peer #17457

Closed
ChangyuWang opened this issue Sep 23, 2021 · 8 comments
Labels
kind/bug This is a bug in the Cilium logic.
kind/community-report This was reported by a user in the Cilium community, eg via Slack.
needs/triage This issue requires triaging to establish severity and next steps.
sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments


ChangyuWang commented Sep 23, 2021

Bug report

General Information

  • Cilium version (run cilium version)
    Client: 1.9.0 go version go1.15.4 linux/amd64
    Daemon: 1.9.0 go version go1.15.4 linux/amd64
  • Kernel version (run uname -a)
    Linux 4.14.105-19-0019 SMP Fri Jan 15 11:39:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, ...)
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.18", GitCommit:"6b913dbde30aa95b247be30a5318fb912f8fe29e", GitTreeState:"clean", BuildDate:"2021-08-11T10:20:21Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.18-57+776098ae2e7bf3-dirty", GitCommit:"776098ae2e7bf358cce0af0b0faf139fe66c6c48", GitTreeState:"dirty", BuildDate:"2021-09-01T07:38:52Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Generate and upload a system zip:
curl -sLO https://git.io/cilium-sysdump-latest.zip && python cilium-sysdump-latest.zip

How to reproduce the issue

A Service with ClusterIP 192.168.11.21 has two backend pods, 172.16.2.48 and 172.16.0.54, running on different nodes. In the source pod (172.16.1.83), executing "curl http://192.168.11.21" gets the response: connection reset by peer.
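
For reference, a minimal reproduction sketch, assuming a hypothetical two-replica Deployment/Service named "echo"; all names, images, and placeholders are illustrative and not taken from the report:

# Hypothetical setup: two backends behind one ClusterIP, spread across nodes.
kubectl create deployment echo --image=nginx
kubectl scale deployment echo --replicas=2
kubectl expose deployment echo --port=80 --target-port=80
kubectl get endpoints echo          # should show two backend pod IPs on different nodes

# From a third ("source") pod, hit the ClusterIP repeatedly; the failure is intermittent:
kubectl exec -it <source-pod> -- sh -c \
  'for i in $(seq 1 100); do curl -s -o /dev/null -w "%{http_code}\n" http://<cluster-ip>/; done'
# Occasionally curl fails with "connection reset by peer" instead of printing 200.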

Logs below:
# cilium monitor --related-to 2732 -vv    (2732 is the source pod's Cilium endpoint ID)

------------------------------------------------------------------------------
CPU 02: MARK 0xcf11d808 FROM 2732 to-stack: 74 bytes (74 captured), **state new**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..62..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..40..] Version=4 IHL=5 TOS=0 Length=60 Id=62251 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=60652 SrcIP=172.16.1.83 DstIP=**172.16.2.48** Options=[] Padding=[]}
TCP	{Contents=[..40..] Payload=[] SrcPort=**34768** DstPort=8080(http-alt) Seq=2665712526 Ack=0 DataOffset=10 FIN=false SYN=true RST=false PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=28800 Checksum=23506 Urgent=0 Options=[..5..] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0x1bfedde FROM 2732 to-endpoint: 74 bytes (74 captured), **state reply**, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.2.48, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..62..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..40..] Version=4 IHL=5 TOS=0 Length=60 Id=0 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=50587 SrcIP=**192.168.11.21** DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..40..] Payload=[] SrcPort=80(http) DstPort=**34768** Seq=633059423 Ack=2665712527 DataOffset=10 FIN=false SYN=true RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=28560 Checksum=11711 Urgent=0 Options=[..5..] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0xcf11d808 FROM 2732 to-stack: 66 bytes (66 captured), **state new**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=62252 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=61165 SrcIP=172.16.1.83 DstIP=**172.16.0.54** Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=**34768** DstPort=8080(http-alt) Seq=2665712527 Ack=633059424 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=22992 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872462143/3723349126 0xab364b3fddedcc86)] Padding=[]}
------------------------------------------------------------------------------
CPU 07: MARK 0xcf11d808 FROM 2732 to-stack: 697 bytes (128 captured), **state established**, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..118..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..98..] Version=4 IHL=5 TOS=0 Length=683 Id=62253 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=60533 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[..66..] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=633059424 DataOffset=8 FIN=false SYN=false RST=false PSH=true ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=23623 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872462143/3723349126 0xab364b3fddedcc86)] Padding=[]}
  Packet has been truncated
------------------------------------------------------------------------------
CPU 09: MARK 0x9ba566e0 FROM 2732 to-endpoint: 54 bytes (54 captured), state reply, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.0.54, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=50607 SrcIP=192.168.11.21 DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=80(http) DstPort=34768 Seq=633059424 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=53379 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------
CPU 13: MARK 0x0 FROM 2732 to-stack: 54 bytes (54 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57894 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=33891 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x7986a3aa FROM 2732 to-endpoint: 66 bytes (66 captured), state reply, interface lxcfcbb0616d54c, identity 27454->15441, orig-ip 172.16.0.54, to endpoint 2732
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=ae:11:6e:7b:dc:0b DstMAC=1e:92:25:6f:82:a8 EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=43055 Flags=DF FragOffset=0 TTL=60 Protocol=TCP Checksum=7540 SrcIP=192.168.11.21 DstIP=172.16.1.83 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=80(http) DstPort=56746 Seq=1224512877 Ack=325310456 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=59 Checksum=55518 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:1087010861/2872449172 0x40ca782dab361894)] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x0 FROM 2732 to-stack: 66 bytes (66 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..54..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..32..] Version=4 IHL=5 TOS=0 Length=52 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57882 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..32..] Payload=[] SrcPort=56746 DstPort=8080(http-alt) Seq=325310456 Ack=1224512878 DataOffset=8 FIN=false SYN=false RST=false PSH=false ACK=true URG=false ECE=false CWR=false NS=false Window=57 Checksum=5208 Urgent=0 Options=[TCPOption(NOP:), TCPOption(NOP:), TCPOption(Timestamps:2872464276/1086980588 0xab36539440ca01ec)] Padding=[]}
------------------------------------------------------------------------------
CPU 01: MARK 0x0 FROM 2732 to-stack: 54 bytes (54 captured), state established, identity 15441->27454, orig-ip 0.0.0.0
Ethernet	{Contents=[..14..] Payload=[..46..] SrcMAC=1e:92:25:6f:82:a8 DstMAC=ae:11:6e:7b:dc:0b EthernetType=IPv4 Length=0}
IPv4	{Contents=[..20..] Payload=[..20..] Version=4 IHL=5 TOS=0 Length=40 Id=0 Flags=DF FragOffset=0 TTL=63 Protocol=TCP Checksum=57894 SrcIP=172.16.1.83 DstIP=172.16.0.54 Options=[] Padding=[]}
TCP	{Contents=[..20..] Payload=[] SrcPort=34768 DstPort=8080(http-alt) Seq=2665712527 Ack=0 DataOffset=5 FIN=false SYN=false RST=true PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=0 Checksum=33891 Urgent=0 Options=[] Padding=[]}
------------------------------------------------------------------------------

After the first two steps of the handshake with backend 172.16.2.48, the client's final ACK is translated to a different pod, 172.16.0.54, which causes the connection reset by peer. So, with an existing conntrack entry for 172.16.2.48, why does the client create a new conntrack entry with backend 172.16.0.54?

In brief (cilium monitor --related-to xx -v):

-> stack flow 0x5cc2852d identity 23527->18201 state new ifindex 0 orig-ip 0.0.0.0: 172.16.2.23:47176 -> 172.16.3.9:80 tcp SYN
-> endpoint 1724 flow 0xd6848b99 identity 18201->23527 state reply ifindex lxcfff879ba3305 orig-ip 172.16.3.9: 192.168.4.198:80 -> 172.16.2.23:47176 tcp SYN, ACK
-> stack flow 0x5cc2852d identity 23527->2117 state new ifindex 0 orig-ip 0.0.0.0: 172.16.2.23:47176 -> 172.16.4.5:80 tcp ACK
-> endpoint 1724 flow 0xe1c4b74a identity 2117->23527 state reply ifindex lxcfff879ba3305 orig-ip 172.16.4.5: 192.168.4.198:80 -> 172.16.2.23:47176 tcp RST
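
One way to check what the datapath has recorded for such a flow is to dump the BPF conntrack and service maps from the cilium-agent pod on the source node; a sketch (the IPs come from the logs above, the commands are the standard agent CLI):

# Run inside the cilium-agent pod on the node hosting the source pod:
cilium bpf ct list global | grep 172.16.1.83   # CT entries involving the source pod
cilium service list                            # ClusterIP -> backend mappings as the agent sees them
cilium bpf lb list                             # the same mapping at the BPF load-balancer map level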
@ChangyuWang added the kind/bug label Sep 23, 2021
@ChangyuWang changed the title from "Two pods under the same service use the same port, causing the connection reset by peer" to "tc nat make connection with two pods under the same service using the same port, causing the connection reset by peer" Sep 23, 2021
@pchaigno added the needs/triage and kind/community-report labels Sep 23, 2021
brb (Member) commented Sep 24, 2021

@ChangyuWang Can you please upload the sysdump?

brb (Member) commented Sep 27, 2021

@ChangyuWang Also, please do the sysdump immediately after hitting the issue.

@trigger-happy

Hi, I believe I'm encountering the same issue in my own cluster. I've attached a link to a sysdump I did in my cluster right after repeatedly hitting the issue. In this particular case, I was trying to get my gitea instance to request the .well-known/openid-configuration endpoint of my keycloak instance. Hope this helps in figuring out what's wrong

https://drive.google.com/file/d/1CRlea3KfkOovBc8jRPA8knDwDSc8ln5F/view?usp=sharing

@aanm added the sig/datapath label Jan 6, 2022
@ChangyuWang (Author)

The reason is: when the service ClusterIP is accessed from a source pod managed by the Cilium network, connection tracking entries are created in the cilium_ct4_global map, where a service CT entry is kept for 6h until GC. Between GC runs, if the map is full, the LRU map can evict the CT entry of a live connection while inserting new entries. When that happens, the next packet of the same flow creates a new connection tracking entry that points to a different backend pod, which causes the connection reset by peer.
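
If LRU eviction of live entries is indeed the trigger, the knobs to look at are the global CT map size and the conntrack GC interval. A sketch of how one might check and tune them, assuming the agent flags / Helm values available around Cilium 1.9 (verify the exact names against the deployed version):

# Inside the cilium-agent pod: rough estimate of how full the global CT map is.
cilium bpf ct list global | wc -l

# Possible mitigations (assumed flag/Helm value names, to be verified):
#   agent flags:  --bpf-ct-global-tcp-max, --bpf-ct-global-any-max  (raise map capacity)
#                 --conntrack-gc-interval                           (run GC more often)
#   Helm values:  bpf.ctTcpMax, bpf.ctAnyMax
helm upgrade cilium cilium/cilium -n kube-system \
  --set bpf.ctTcpMax=1048576 --set bpf.ctAnyMax=524288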


github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Jul 9, 2022
@brb removed the stale label Jul 11, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Sep 10, 2022
@brb removed the stale label Sep 13, 2022
@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Nov 13, 2022
@github-actions

This issue has not seen any activity since it was marked stale.
Closing.
