When using wsAtomicTransaction in a Kubernetes environment without peer recovery, a latency problem can occur when all of the following hold: externalURLPrefix is set to use ${env.POD_IP}; the client pod communicates through a Kubernetes service; and the service traffic is NATted to the pod IP (e.g. OpenShift SDN). Under heavy load, if a socket opened to the service IP of the target pod is closed and enters TIME_WAIT, and WS-AT then opens a new socket directly to the pod IP that happens to use the same ephemeral local port, SYN packets are dropped and retransmitted for the duration of the TIME_WAIT period, causing undesired latency. If this duration exceeds 30 seconds, the WS-AT call fails with a java.net.SocketTimeoutException, visible with the diagnostic trace *=info:org.apache.cxf.*=all:
java.net.SocketTimeoutException: connect timed out
[...]
at com.ibm.ws.wsat.service.impl.WebClientImpl$3.call(WebClientImpl.java:120)
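As a minimal sketch, the diagnostic trace above can be enabled in Liberty's server.xml with the logging element (the traceSpecification value is the one quoted above; placement and other logging attributes are left at defaults here):

```xml
<!-- Sketch: enable CXF trace to capture the WS-AT SocketTimeoutException path -->
<logging traceSpecification="*=info:org.apache.cxf.*=all" />
```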
This manifests in a client-side network trace as retransmitted SYN packets; for example:
In other words, because of the NATting, although the client sends to the service IP, the pod receives the traffic on the pod IP. So while the client's 4-tuple is (client IP, client ephemeral port, service IP, service port), the pod's 4-tuple is (client IP, client ephemeral port, pod IP, pod port). Example tcpdumps demonstrating this differing destination IP (the 4-tuple is columns 3-6):
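The 4-tuple collision described above can be illustrated with a short sketch (all IPs and ports below are hypothetical values, not taken from the original tcpdumps):

```python
# Illustrative only: model the client-side and pod-side (conntrack) 4-tuples under DNAT.
client_ip, eph_port = "10.128.2.15", 41234
service = ("172.30.0.10", 443)   # hypothetical service IP:port
pod = ("10.129.0.7", 9443)       # hypothetical pod IP:port (DNAT target)

via_service_client_view = (client_ip, eph_port, *service)
via_service_pod_view = (client_ip, eph_port, *pod)   # what the pod sees after DNAT
direct_to_pod_view = (client_ip, eph_port, *pod)     # WS-AT call directly to the pod IP

# The client-side tuples differ, so the client kernel happily reuses the ephemeral port...
assert via_service_client_view != direct_to_pod_view
# ...but the pod-side tuples collide with the TIME_WAIT conntrack entry.
assert via_service_pod_view == direct_to_pod_view
```

This is why the problem only appears when the new ephemeral port happens to match the old one: only then do the pod-side tuples coincide.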
So when the socket to the service is closed and enters TIME_WAIT, if WS-AT then opens a new socket directly to the pod IP, the pod's 4-tuple (client IP, client ephemeral port, pod IP, pod port) is the same as before and cannot be re-used. This is believed to be due to the default conntrack TIME_WAIT timeout of 120 seconds. Setting net.ipv4.tcp_tw_reuse=1 does not help. It is possible that net.netfilter.nf_conntrack_tcp_timeout_time_wait=60 may resolve the issue.
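If experimenting with that hypothesized tuning, the conntrack timeout can be inspected and changed on the node with sysctl (the value 60 is the untested guess from above, not a confirmed fix):

```shell
# Inspect the current conntrack TIME_WAIT timeout (typically 120 seconds)
sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait

# Hypothesized tuning from above; requires root on the node and is not a confirmed fix
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=60
```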
One workaround is to use a different port for externalURLPrefix than the one used for the service, which avoids the 4-tuple TCP conflict. This requires adding an additional httpEndpoint and adding the port's hostAlias to the virtualHost. Since a virtualHost may not have been explicitly defined previously (with the default used instead), take care to include all incoming virtual hosts. In a Kubernetes environment, although traffic might come into a particular port (e.g. 9443), the request Host header might carry a service port (e.g. 443), and a missing hostAlias will result in 404s. For example:
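A sketch of such a server.xml configuration follows (the original example was not captured here; all ids, ports, and aliases below are assumptions to illustrate the shape of the workaround, not the author's exact configuration):

```xml
<!-- Sketch only: additional endpoint on a port (9444) not used by the service -->
<httpEndpoint id="wsatEndpoint" host="*" httpsPort="9444" />

<!-- Include ALL incoming host aliases, including the service port (e.g. 443) -->
<virtualHost id="default_host">
    <hostAlias>*:9443</hostAlias>
    <hostAlias>*:443</hostAlias>
    <hostAlias>*:9444</hostAlias>
</virtualHost>

<!-- Point WS-AT at the pod IP on the dedicated port -->
<wsAtomicTransaction externalURLPrefix="https://${env.POD_IP}:9444" />
```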
In addition, ensure that the new port is accessible to the client. A ports.containerPort entry on the pod specification is not needed, as "Any port which is listening on the default '0.0.0.0' address inside a container will be accessible from the network"; however, there may be a NetworkPolicy that restricts available ports, in which case an additional NetworkPolicy may be required.
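Such a NetworkPolicy might look like the following sketch (the policy name, pod label, and port are assumptions matching the hypothetical 9444 port above):

```yaml
# Sketch only: allow ingress to the additional WS-AT port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-wsat-port
spec:
  podSelector:
    matchLabels:
      app: my-liberty-app   # hypothetical pod label
  ingress:
  - ports:
    - protocol: TCP
      port: 9444
```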
While this issue affects OpenShift SDN networking, it does not appear to affect OVN-Kubernetes networking. OVN-Kubernetes has replaced OpenShift SDN as the default networking plugin as of OpenShift 4.12, and it is believed that switching to OVN-Kubernetes may also resolve this issue.
This has since been observed with a different symptom of SYNs being dropped while another conversation is still active, thus this issue is not limited to TIME_WAIT and the hypothesis about net.netfilter.nf_conntrack_tcp_timeout_time_wait=60 is incorrect or incomplete. Investigating further.
kgibm added a commit to kgibm/docs that referenced this issue on Apr 29, 2024.
Diagnostic notes:

Gather tcpdump on both client and server pods using nsenter: https://access.redhat.com/solutions/4569211 and https://access.redhat.com/solutions/1611883

Quickly finding retransmitting SYNs on the socket connect:

TZ=UTC tshark -t ud -T fields -e frame.number -e _ws.col.Time -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e tcp.stream -e frame.len -e _ws.col.Protocol -e _ws.col.Info -r *pcap* -Y "tcp.flags.syn == 1 && tcp.analysis.retransmission"

Then, search for that same 4-tuple before the first SYN and check if any packets (usually FIN or ACK, but could also be RST) are within 60 seconds of the first SYN.
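That follow-up search can be done with a display filter on the same 4-tuple, along these lines (the addresses and ports below are placeholders; substitute the values from the retransmitted SYN found above):

```shell
# Sketch: list all packets on a given 4-tuple to find FIN/ACK/RST traffic
# within 60 seconds before the first retransmitted SYN
TZ=UTC tshark -t ud -r trace.pcap \
  -Y "ip.addr == 10.128.2.15 && tcp.port == 41234 && ip.addr == 10.129.0.7 && tcp.port == 9443"
```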