tc nat makes connections to two pods under the same service using the same port, causing "connection reset by peer" #17457
Comments
@ChangyuWang Can you please upload the sysdump?
@ChangyuWang Also, please do the sysdump immediately after hitting the issue.
Hi, I believe I'm encountering the same issue in my own cluster. I've attached a link to a sysdump I took right after repeatedly hitting the issue: https://drive.google.com/file/d/1CRlea3KfkOovBc8jRPA8knDwDSc8ln5F/view?usp=sharing. In this particular case, I was trying to get my gitea instance to make an outbound request.
The reason: when a source pod managed by the Cilium network accesses a service ClusterIP, connection-tracking entries are created in the cilium_ct4_global map, where service CT entries are kept for 6h until garbage collection. Between GC intervals, if the map is full, the LRU map may evict the CT entry of a still-alive connection while inserting new entries. When further packets of that flow arrive (in either direction), a new connection-tracking entry is created and may select a different backend pod, which causes "connection reset by peer".
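The failure mode described above can be sketched in plain Go. This is a hypothetical in-memory model, not Cilium's real BPF map API: a fixed-capacity LRU map keyed by the connection 5-tuple, plus a stand-in load balancer. Once map pressure evicts the live entry, the next packet of that flow misses the CT map and the service may pick the other backend.

```go
package main

import "fmt"

// lruCT is a toy LRU conntrack map (hypothetical; the real
// cilium_ct4_global map is a BPF_MAP_TYPE_LRU_HASH in the kernel).
type lruCT struct {
	capacity int
	order    []string          // flows, oldest first
	m        map[string]string // flow -> selected backend
}

func newLRUCT(capacity int) *lruCT {
	return &lruCT{capacity: capacity, m: make(map[string]string)}
}

// put inserts a CT entry; when the map is full it evicts the least
// recently used entry, even if that connection is still alive.
func (c *lruCT) put(flow, backend string) {
	if _, ok := c.m[flow]; !ok {
		if len(c.order) == c.capacity {
			evicted := c.order[0]
			c.order = c.order[1:]
			delete(c.m, evicted)
		}
		c.order = append(c.order, flow)
	}
	c.m[flow] = backend
}

func (c *lruCT) get(flow string) (string, bool) {
	b, ok := c.m[flow]
	return b, ok
}

// pickBackend stands in for service backend selection: with no CT
// entry, the same flow may be mapped to a different backend.
func pickBackend(backends []string, n int) string {
	return backends[n%len(backends)]
}

func main() {
	backends := []string{"172.16.2.48", "172.16.0.54"}
	ct := newLRUCT(2) // tiny capacity to force eviction

	// Established connection pinned to the first backend.
	liveFlow := "172.16.1.83:40000->192.168.11.21:80"
	ct.put(liveFlow, backends[0])

	// Map pressure: new flows evict the live connection's entry.
	ct.put("flowA", backends[0])
	ct.put("flowB", backends[1])

	// The next packet of the live connection misses the CT map; the
	// backend is reselected and may land on the other pod -> RST.
	if _, ok := ct.get(liveFlow); !ok {
		fmt.Println("CT entry evicted; reselected backend:",
			pickBackend(backends, 1))
	}
}
```

Running this prints that the live flow's entry was evicted and the flow is reassigned to 172.16.0.54, which is exactly the mid-connection backend switch seen in the monitor trace below.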
This issue has been automatically marked as stale because it has not had recent activity.
This issue has not seen any activity since it was marked stale. |
Bug report
General Information
cilium version
Client: 1.9.0 go version go1.15.4 linux/amd64
Daemon: 1.9.0 go version go1.15.4 linux/amd64
uname -a
Linux 4.14.105-19-0019 SMP Fri Jan 15 11:39:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.18", GitCommit:"6b913dbde30aa95b247be30a5318fb912f8fe29e", GitTreeState:"clean", BuildDate:"2021-08-11T10:20:21Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.18-57+776098ae2e7bf3-dirty", GitCommit:"776098ae2e7bf358cce0af0b0faf139fe66c6c48", GitTreeState:"dirty", BuildDate:"2021-09-01T07:38:52Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
How to reproduce the issue
A service with ClusterIP 192.168.11.21 has two backend pods, 172.16.2.48 and 172.16.0.54, running on different nodes. In the source pod (172.16.1.83), running "curl http://192.168.11.21" gets the response: connection reset by peer.
logs below:
# cilium monitor --related-to 2732(src pod ciliumendpoint) -vv
After exchanging handshake packets with backend 172.16.2.48, the client's connection is wrongly switched to the other pod, 172.16.0.54, which causes "connection reset by peer". So, with an existing conntrack entry for 172.16.2.48, why does the client create a new conntrack entry with backend 172.16.0.54?
In brief: cilium monitor --related-to xx -v