Document wsAtomicTransaction externalURLPrefix limitations in Kubernetes #7223

Open
kgibm opened this issue Feb 15, 2024 · 1 comment · May be fixed by #7307

kgibm commented Feb 15, 2024

When using wsAtomicTransaction in a Kubernetes environment without peer recovery, the following combination of conditions can cause undesired latency: externalURLPrefix is set to use ${env.POD_IP}; the client pod communicates through a Kubernetes service; the service communication is NATted to the pod IP (e.g. OpenShift SDN); under heavy load, a socket opened to the service IP of the target pod is closed and enters TIME_WAIT; and WS-AT then opens a new socket to the pod IP that happens to use the same ephemeral local port. In that case, SYN packets are dropped and retransmitted for the duration of the TIME_WAIT period, causing undesired latency. If this duration exceeds 30 seconds, the WS-AT call fails with a java.net.SocketTimeoutException, visible with diagnostic trace *=info:org.apache.cxf.*=all:

java.net.SocketTimeoutException: connect timed out
        [...]
        at com.ibm.ws.wsat.service.impl.WebClientImpl$3.call(WebClientImpl.java:120)

This manifests as retransmitted SYN packets in a network trace on the client; for example:

$ TZ=UTC tshark -t ud -T fields -e frame.number -e _ws.col.Time -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e tcp.stream -e frame.len -e _ws.col.Protocol -e _ws.col.Info -r *pcap* -Y "tcp.flags.syn == 1 && tcp.analysis.retransmission"
409708	2024-02-05 16:59:08.192371	10.1.2.3	5502	10.1.2.4	9443	3946	76	TCP	[TCP Retransmission] 5502 → 9443 [SYN] Seq=0 Win=26733 Len=0 MSS=8911 SACK_PERM=1 TSval=123 TSecr=0 WS=128
411507	2024-02-05 16:59:10.241377	10.1.2.3	5502	10.1.2.4	9443	3946	76	TCP	[TCP Retransmission] 5502 → 9443 [SYN] Seq=0 Win=26733 Len=0 MSS=8911 SACK_PERM=1 TSval=123 TSecr=0 WS=128
412178	2024-02-05 16:59:14.272379	10.1.2.3	5502	10.1.2.4	9443	3946	76	TCP	[TCP Retransmission] 5502 → 9443 [SYN] Seq=0 Win=26733 Len=0 MSS=8911 SACK_PERM=1 TSval=123 TSecr=0 WS=128
419229	2024-02-05 16:59:22.784377	10.1.2.3	5502	10.1.2.4	9443	3946	76	TCP	[TCP Retransmission] 5502 → 9443 [SYN] Seq=0 Win=26733 Len=0 MSS=8911 SACK_PERM=1 TSval=123 TSecr=0 WS=128

In other words, because of the NATting, the client sends to the service IP but the pod receives the traffic on the pod IP: the client's 4-tuple is (client IP, client ephemeral port, service IP, service port), while the pod's 4-tuple is (client IP, client ephemeral port, pod IP, pod port). Example tcpdumps demonstrating the differing destination IP (the 4-tuple is columns 3-6):

Client pod:

389156	2024-02-05 16:58:30.554889	10.1.2.3	5502	172.1.2.3	9443	3709	68	TCP	5502 → 9443 [ACK] Seq=4697 Ack=635 Win=123 Len=0 TSval=123 TSecr=123

Target pod:

554314	2024-02-05 16:58:30.554905	10.1.2.3	5502	10.1.2.4	9443	11791	68	TCP	5502 → 9443 [ACK] Seq=4697 Ack=635 Win=123 Len=0 TSval=123 TSecr=123

So when the socket to the service is closed and enters TIME_WAIT, if WS-AT then opens a new socket directly to the pod IP with the 4-tuple (client IP, client ephemeral port, pod IP, pod port), the pod's 4-tuple is the same as before and cannot be re-used. It is believed this is due to the default conntrack TIME_WAIT timeout of 120 seconds. Using net.ipv4.tcp_tw_reuse=1 does not help. It is possible that net.netfilter.nf_conntrack_tcp_timeout_time_wait=60 may resolve the issue.
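
For reference, the hypothesized conntrack tuning above would be a node-level sysctl change like the following sketch (standard Linux netfilter sysctl name; requires the nf_conntrack module, and on OpenShift would typically be applied through node tuning rather than edited directly):

```
# Hypothesis only: shorten the conntrack TIME_WAIT timeout from the
# default 120 seconds so NATted 4-tuples free up sooner.
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
```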

One workaround is to use a different port for externalURLPrefix than the port used for the service, which avoids the 4-tuple TCP conflict. This requires adding an additional httpEndpoint and adding the new port's hostAlias to the virtualHost. If a virtualHost was not explicitly defined previously and the default was used, take care to include all incoming virtual hosts: in a Kubernetes environment, although traffic might come into a particular port (e.g. 9443), the request Host header might specify a service port (e.g. 443), and a missing hostAlias will result in 404s. For example:

<?xml version="1.0" encoding="UTF-8"?>
<server>
  <httpEndpoint id="wsatHttpEndpoint" host="*" httpsPort="9444" />
  <wsAtomicTransaction SSLEnabled="true" SSLRef="cssSSLSettings" externalURLPrefix="https://${env.POD_IP}:9444" />
  <virtualHost id="default_host">
    <hostAlias>*:443</hostAlias>
    <hostAlias>*:9443</hostAlias>
    <hostAlias>*:9444</hostAlias>
  </virtualHost>
</server>

In addition, ensure that the new port is accessible to the client. For example, although a ports.containerPort entry on the pod specification is not needed because "Any port which is listening on the default '0.0.0.0' address inside a container will be accessible from the network", there may be a NetworkPolicy that restricts available ports, in which case an additional NetworkPolicy may be required.
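
A minimal sketch of such a NetworkPolicy (the policy name, pod labels, and namespace selection here are hypothetical; adapt to the existing policies in the namespace):

```yaml
# Illustrative only: allow ingress to the additional WS-AT port (9444)
# on pods labeled app: liberty-app. Combine with existing policies as needed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-wsat-port
spec:
  podSelector:
    matchLabels:
      app: liberty-app
  ingress:
    - ports:
        - protocol: TCP
          port: 9444
```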

While this issue affects OpenShift SDN networking, it does not appear to affect OVN-Kubernetes networking. OVN-Kubernetes replaced OpenShift SDN as the default networking plugin as of OpenShift 4.12, so it is believed that switching to OVN-Kubernetes may also resolve this issue.

Diagnostic notes:

  1. Gather tcpdump on both client and server pods using nsenter: https://access.redhat.com/solutions/4569211 and https://access.redhat.com/solutions/1611883

  2. Quickly find retransmitted SYNs on the socket connect:

    TZ=UTC tshark -t ud -T fields -e frame.number -e _ws.col.Time -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e tcp.stream -e frame.len -e _ws.col.Protocol -e _ws.col.Info -r *pcap* -Y "tcp.flags.syn == 1 && tcp.analysis.retransmission"

    1. Then, search for that same 4-tuple before the first SYN and check whether any packets (usually FIN or ACK, but possibly RST) fall within 60 seconds of the first SYN.
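
The 4-tuple search in the last step can be done on the tshark field output with a simple awk filter. A minimal sketch: the two sample rows below are made up to mimic the tab-separated tshark output above (in practice, redirect the real tshark output to a file), and the filter keeps only packets whose 4-tuple (columns 3-6) matches the retransmitted SYN.

```shell
# Illustrative sample of tshark field output (tab-separated); packets.tsv
# is a hypothetical file name. The first row is a prior packet on the same
# 4-tuple within 60 seconds of the SYN; the second row is a different port.
cat > packets.tsv <<'EOF'
389156	2024-02-05 16:58:30.554889	10.1.2.3	5502	10.1.2.4	9443	FIN
409708	2024-02-05 16:59:08.192371	10.1.2.3	5503	10.1.2.4	9443	SYN
EOF
# Keep only packets whose 4-tuple (columns 3-6) matches the retransmitted SYN.
awk -F'\t' '$3=="10.1.2.3" && $4=="5502" && $5=="10.1.2.4" && $6=="9443"' packets.tsv
```

Here only the first sample row survives the filter, showing a FIN on the same 4-tuple about 38 seconds before the retransmitted SYN.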
@kgibm kgibm self-assigned this Feb 21, 2024

kgibm commented Apr 17, 2024

This has since been observed with a different symptom: SYNs being dropped while another conversation is still active. Thus this issue is not limited to TIME_WAIT, and the hypothesis about net.netfilter.nf_conntrack_tcp_timeout_time_wait=60 is incorrect or incomplete. Investigating further.

kgibm added a commit to kgibm/docs that referenced this issue Apr 29, 2024
Fixes OpenLiberty#7223

Signed-off-by: Kevin Grigorenko <kevin.grigorenko@us.ibm.com>
@kgibm kgibm linked a pull request Apr 29, 2024 that will close this issue