Intermittent connection reset and delay running time #245

Open
younsl opened this issue Apr 11, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@younsl

younsl commented Apr 11, 2024

What happened:

Background

After migrating the network policy provider from Calico v3.25.1 (with the Tigera Operator) to VPC CNI v1.18.0-eksbuild.1, the following two network policy issues occurred on an EKS v1.26 cluster:

Note:
The two network issues described here only occur in pods that have a NetworkPolicy applied via VPC CNI.

Cluster environment

  • EKS v1.26
  • Affected Worker Node
  • EKS Addons
    • VPC CNI v1.18.0-eksbuild.1 : network policy feature enabled by applying the {"enableNetworkPolicy":"true"} setting in the addon's advanced configuration (an example CLI command follows the image listing below)
    • kube-proxy v1.26.11-minimal-eksbuild.1 : running in iptables mode
$ kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2-3
amazon-k8s-cni-init:v1.18.0-eksbuild.1
amazon-k8s-cni:v1.18.0-eksbuild.1
amazon/aws-network-policy-agent:v1.1.0-eksbuild.1
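
For reference, a minimal sketch of how such an advanced-configuration change can be applied via the AWS CLI, assuming the managed addon is updated directly and using a placeholder cluster name:

# Hedged example: apply the advanced configuration to the managed VPC CNI addon
# (cluster name is a placeholder; version matches the one reported above).
$ aws eks update-addon \
    --cluster-name <my-cluster> \
    --addon-name vpc-cni \
    --addon-version v1.18.0-eksbuild.1 \
    --configuration-values '{"enableNetworkPolicy":"true"}'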

Network policy issues

1. Intermittent connection reset by peer

Occurs during Pod-to-Pod or Pod-to-EC2 communication.

Similar issues: #204, #210, #236

2. Delayed Running time

There is a delay in the time it takes for the pod to become ready.


For pods with a NetworkPolicy applied, the time it takes for the readiness probe to pass is up to 3 times longer.

Category                  | 1st time | 2nd time
egress netpol applied     | 86s      | 91s
egress netpol not applied | 30s      | 32s

Similar issues: #189, #186
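
The readiness timings above were collected by hand; a minimal sketch of one way to measure time-to-readiness, assuming a placeholder pod name:

# Compare the pod creation time with the Ready condition's last transition time
# (pod name is a placeholder).
$ kubectl get pod <workload-pod> -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
$ kubectl get pod <workload-pod> -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'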

Attach logs

1. Intermittent connection reset by peer

[tcpdump] From workload pod to EC2 instance

$ tcpdump -i any '(dst host xx.xxx.xxx.247 and dst port 8080) or (tcp[tcpflags] & tcp-rst != 0)'
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

07:40:05.607233 eth0  Out IP <REDACTED>-5bc847767c-56l5j.47664 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 4216205005, win 62727, options [mss 8961,sackOK,TS val 896125324 ecr 0,nop,wscale 7], length 0
07:40:05.607607 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.47664: Flags [R.], seq 0, ack 4216205006, win 0, length 0

... repeated ...

07:42:25.807039 eth0  Out IP <REDACTED>-5bc847767c-56l5j.39080 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 1628133200, win 62727, options [mss 8961,sackOK,TS val 896265524 ecr 0,nop,wscale 7], length 0
07:42:25.807463 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.39080: Flags [R.], seq 0, ack 1628133201, win 0, length 0

Intermittently, the workload pod receives a reset (RST) packet in response from the EC2 instance.

[kubectl sniff] From workload pod to EC2 instance

# Source IP started with 221 is workload pod
# Destination IP ended with 247 is EC2 instance.
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - port 8080]' on container: '<REDACTED>', pod: '<REDACTED>-5bc847767c-56l5j', namespace: 'b<REDACTED>'
    1   0.000000 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 53456 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897677821 TSecr=0 WS=128
    2   0.000433 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 53456 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3  70.178815 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 33610 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897748000 TSecr=0 WS=128
    4  70.179236 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 33610 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5 140.194787 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 51794 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897818016 TSecr=0 WS=128
    6 140.195225 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 51794 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    7 210.383682 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 32906 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897888204 TSecr=0 WS=128
    8 210.384104 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 32906 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    9 280.399946 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 55182 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897958221 TSecr=0 WS=128
   10 280.400365 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 55182 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   11 351.055956 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 42620 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898028877 TSecr=0 WS=128
   12 351.056403 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 42620 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   13 421.076549 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 49318 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898098897 TSecr=0 WS=128
   14 421.076943 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 49318 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   15 495.679313 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 45716 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898173500 TSecr=0 WS=128
   16 495.679744 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 45716 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   17 565.699845 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 46792 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898243521 TSecr=0 WS=128
   18 565.700281 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 46792 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

The same intermittent RST responses from the EC2 instance are visible in this capture as well.
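
For reference, a capture like the one above can be reproduced with the ksniff kubectl plugin; a minimal sketch, assuming placeholder pod and namespace names:

# Capture traffic on port 8080 from the workload pod's network namespace
# and write it to a local pcap file (pod/namespace names are placeholders).
$ kubectl sniff <workload-pod> -n <namespace> -f "port 8080" -o capture.pcap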

When the issue occurs in the workload pod, the resulting connection reset error triggers a Slack alert.


2. Delayed Running time

ebpf-sdk log captured on the worker node immediately after the pod restart:

$ tail -f /var/log/aws-routed-eni/ebpf-sdk.log | grep 'xx.xxx.xx.117'
{"level":"info","ts":"2024-04-11T02:25:29.435Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.437Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.584Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.586Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.750Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.752Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.888Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.890Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.012Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.014Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.073Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.074Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.596Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.598Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.665Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.667Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.831Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.832Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.929Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.931Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.009Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.010Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.182Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.185Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}

DENY logs appear for return traffic from the destination Service (cluster IP) 172.20.67.165.

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xxx.198'
{"level":"info","ts":"2024-04-11T02:18:47.938Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:49.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":45868,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:18:51.394Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:58.054Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:59.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":41568,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49904,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49908,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:10.642Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.758Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.926Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.011Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.250Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.358Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.362Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:19:19.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":56462,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:29.430Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":42600,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.242Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.247Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":42681,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":47922,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":57360,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37192,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37196,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The same network issues occurred with VPC CNI v1.16.0, v1.16.1, and v1.18.0.
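
A hedged way to confirm which addon version a given cluster is running, assuming the managed addon and a placeholder cluster name:

# Print the installed VPC CNI addon version (cluster name is a placeholder).
$ aws eks describe-addon \
    --cluster-name <my-cluster> \
    --addon-name vpc-cni \
    --query 'addon.addonVersion' --output text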

Environment:

  • Kubernetes version (use kubectl version): v1.26.12-eks-5e0fdde
  • CNI Version: VPC CNI v1.18.0-eksbuild.1
  • OS (e.g: cat /etc/os-release): Amazon Linux release 2 (Karoo)
  • Kernel (e.g. uname -a): 5.10.205-195.804.amzn2.x86_64
@younsl younsl added the bug Something isn't working label Apr 11, 2024
@jayanthvn jayanthvn transferred this issue from aws/amazon-vpc-cni-k8s Apr 11, 2024
@jayanthvn
Contributor

We will look into this and get back to you. By the way, is this easily reproducible?

@younsl
Author

younsl commented Apr 11, 2024

@jayanthvn Yes, it is reproducible, but since it is intermittent it may take some time to recur.

@jayanthvn
Contributor

Thanks. We will review the logs and get back to you.

@younsl
Author

younsl commented Apr 12, 2024

Our team found workarounds for both network policy issues through internal testing today.

These may not be fundamental solutions.

Note:
Sensitive information such as IP addresses and application names has been redacted.

TLDR

Symptom 1. Intermittent connection reset by peer is resolved.

  • Workaround: Connect to the worker node and manually mount bpffs

Symptom 2. Delayed readiness time is resolved.

  • Workaround: explicitly allow ingress from 172.20.0.0/16, the Kubernetes service IP range.

Workarounds

1. Intermittent connection reset by peer

1-1. Workaround

As mentioned in the Amazon EKS official documentation, the intermittent connection reset does not occur when bpffs is mounted on the EC2 worker node.

1-2. Record for issue resolution

Check the Kernel version and AMI version of the worker node.

The workload pod (source) that was experiencing the intermittent connection reset symptom was scheduled on this worker node.

$ kubectl get node -o wide ip-xx-xxx-xx-98.ap-northeast-2.compute.internal
NAME                                              STATUS   ROLES    AGE   VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-xx-xxx-xx-98.ap-northeast-2.compute.internal   Ready    <none>   69d   v1.26.12-eks-5e0fdde   xx.xxx.xx.98   <none>        Amazon Linux 2   5.10.205-195.804.amzn2.x86_64   containerd://1.7.2

 

Connect to the worker node and manually mount bpffs:

# Mount the bpf filesystem on the worker node
$ sudo mount -t bpf bpffs /sys/fs/bpf
$ mount -l | grep bpf
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
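
A minimal sketch for making this mount survive reboots, assuming /etc/fstab is used for mount management on the node (not part of the original workaround):

# Persist the bpffs mount across node reboots.
$ echo 'bpffs /sys/fs/bpf bpf defaults 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a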

Question:
Even though the prerequisites are met, why do I have to mount the BPF filesystem (bpffs) on the worker node to resolve the symptom?

Prerequisite for the VPC CNI network policy feature (quoted from the EKS User Guide):
For all other cluster versions, if you upgrade the Amazon EKS optimized Amazon Linux to version v20230703 or later or you upgrade the Bottlerocket AMI to version v1.0.2 or later, you can skip this step.

Reference: EKS User guide

 

Since mounting bpffs on April 11th, the read ECONNRESET error has not occurred.


 

2. Delayed readiness time

2-1. Workaround

Add an Ingress netpol to the workload pod (source).
This Ingress netpol explicitly allows ingress from 172.20.0.0/16, which is the Kubernetes service IP range.


2-2. Record for issue resolution

The network policy enforcing mode is set to standard, which is the default.

$ kubectl get ds -n kube-system aws-node -o yaml
      containers:
      - env:
        - name: NETWORK_POLICY_ENFORCING_MODE
          value: standard
        name: aws-node

Create a new ingress netpol that explicitly allows ingress from the Kubernetes Service IP range:

$ kubectl get netpol -n <REDACTED> ingress-service -o yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  ...
spec:
  ingress:
  - from:
    - ipBlock:
        cidr: 172.20.0.0/16
    ports:
    - endPort: 65535
      port: 1
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/networkpolicy-ingress-service: apply
  policyTypes:
  - Ingress
status: {}
$ kubectl get pod -n <REDACTED> t<REDACTED> -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/restartedAt: "2024-04-12T12:30:49+09:00"
  creationTimestamp: "2024-04-12T06:14:02Z"
  generateName: t<REDACTED>-cc878cb69-
  labels:
    ...
    app.kubernetes.io/networkpolicy-ingress-service: apply
    app.kubernetes.io/networkpolicy-ingress-t<REDACTED>: apply
    pod-template-hash: cc878cb69

To explicitly allow the Kubernetes service IP range, the app.kubernetes.io/networkpolicy-ingress-service: apply label was added to the pod.
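
A sketch of one way to add this label, assuming the pod is managed by a Deployment (names are placeholders); patching the pod template triggers a rollout so the new pods pick up the label:

# Add the netpol selector label to the Deployment's pod template
# (namespace and deployment names are placeholders).
$ kubectl patch deployment <workload-deployment> -n <namespace> --type merge \
    -p '{"spec":{"template":{"metadata":{"labels":{"app.kubernetes.io/networkpolicy-ingress-service":"apply"}}}}}'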

 

The delayed readiness time issue was resolved after explicitly attaching the ingress netpol to the workload pod (source), as shown in this comment.

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xx.159'
{"level":"info","ts":"2024-04-12T06:09:36.079Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":43798,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.080Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":53898,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.081Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":55208,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":56070,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":46937,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}

... No deny logs from kubernetes service IP starting with 172.20.67.165 ...

{"level":"info","ts":"2024-04-12T06:10:03.016Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.294Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.480Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.528Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.678Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.716Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:11.661Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.34.98","Src Port":38850,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.554Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.10.163","Src Port":19310,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.606Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.11.99","Src Port":22244,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:14.004Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.29.161","Src Port":52804,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

 

conntrack list captured between the workload pod (source, ending with 35.242) and the cluster IP of the destination pod (172.20.67.165):

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 118 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 431998 ESTABLISHED src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.

With the conntrack -L command, I observed that connections between the workload pod (source) and the destination pod (via the service IP) were established quickly.

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 98 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 102 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.
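
Instead of polling with conntrack -L, the flow can also be watched in real time; a minimal sketch using conntrack's event mode with the same (redacted) addresses:

# Stream conntrack events for this flow as they happen.
$ conntrack -E --src ss.sss.35.242 --dst 172.20.67.165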

 

The pod's time to readiness was dramatically reduced from 92 seconds to 32 seconds.

$ kubectl get pod -n <REDACTED> -l app.kubernetes.io/name=t<REDACTED>
NAME                          READY   STATUS    RESTARTS   AGE
t<REDACTED>-cc878cb69-8tmt9   1/1     Running   0          33s

At this point, the readiness time for the workload (source) pod was back to normal.

The readiness time comparison is:

Source Pod   | Applied netpol                             | Time to readiness | Status
Workload Pod | egress netpol, without ingress netpol      | 92s               | Delayed
Workload Pod | egress netpol, with the new ingress netpol | 32s               | Normal

@younsl
Author

younsl commented Apr 12, 2024

@jayanthvn @achevuru I submitted a node-level support bundle to k8s-awscni-triage@amazon.com.

node-level support-bundle collection
# Collect node level tech-support bundle in the affected worker node
$ /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56  100    56    0     0  34825      0 --:--:-- --:--:-- --:--:-- 56000

	This is version 0.7.6. New versions can be found at https://github.com/awslabs/amazon-eks-ami/blob/master/log-collector-script/

Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect modinfo...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect iptables-legacy information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information...
Trying to Collect Containerd running information...
Trying to Collect Docker daemon information...

	Warning: The Docker daemon is not running.

Trying to collect kubelet information...
Trying to collect L-IPAMD introspection information...
Trying to collect L-IPAMD prometheus metrics...
Trying to collect L-IPAMD checkpoint...
Trying to collect Multus logs if they exist...
Trying to collect sysctls information...
Trying to collect networking infomation... conntrack v1.4.4 (conntrack-tools): 5437 flow entries have been shown.

Trying to collect CNI configuration information...
Trying to collect CNI Configuration Variables from Docker...

	Warning: The Docker daemon is not running.
Trying to collect CNI Configuration Variables from Containerd...
Trying to collect network policy ebpf loaded data...
Trying to collect Docker daemon logs...
Trying to Collect sandbox-image daemon information...
Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to archive gathered information...

	Done... your bundled logs are located in /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
$ ls -lh /var/log/eks*
-rw-r--r-- 1 root root 54M Apr 12 09:44 /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
# Download tech-support bundle from node filesystem to local
$ kubectl cp default/nsenter-xxxxxx:/var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz $HOME/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
tar: Removing leading `/' from member names
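
The bundle above was collected from an nsenter pod; a hedged alternative sketch for getting a shell on the affected node with kubectl debug (node name taken from earlier in this issue), where the host filesystem is mounted under /host:

# Start a debug pod on the affected node and run the log collector via chroot.
$ kubectl debug node/ip-xx-xxx-xx-98.ap-northeast-2.compute.internal -it --image=ubuntu
# inside the debug pod:
$ chroot /host /opt/cni/bin/aws-cni-support.sh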

Reference: troubleshooting

@achevuru
Contributor

achevuru commented Apr 12, 2024

@younsl Thanks for sharing your findings with us.

  • For the delayed readiness time issue: this is expected, as we discussed on the call, i.e., if the pod attempts to start a connection before the eBPF probes are configured against the pod interface. Response packets can potentially be dropped if the probes are set up before the response packet reaches the source pod. Refer to this comment for a detailed explanation.

    Our recommended solution for this is Strict mode, which gates pod launch until policies are configured against the newly launched pod (a sketch of switching modes follows this list). If you don't want to migrate to Strict mode due to other limitations, then you can consider the workarounds included in the above comment. With your workaround, you're explicitly allowing the Service CIDR, which explicitly allows the return packet coming from any cluster-local service, thereby bypassing the conntrack requirement. If you're OK with this ingress rule, then this is a viable workaround for in-cluster traffic, assuming your pods only talk to other pods in the cluster via Service VIPs.

  • Intermittent connection reset by peer: as you can see, there were already multiple BPF FS mounts in your command output (mount -l | grep bpf), so there is no need to mount the BPF FS again. If the BPF FS isn't mounted, nothing will work and the Network Policy agent will crash. So this shouldn't fix anything, and considering your issue is intermittent, it is probably pure chance that you haven't run into it in the last day. We believe you're running into the same issue called out in Possible race condition in conntrack map cleanup #246. Basically, if a new connection is in the process of being established right when the conntrack cleanup routine runs, the NP agent can incorrectly expire the entry from its own conntrack table, leading to intermittent connection resets; the subsequent retry addresses this. We will address this ASAP.
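
A minimal sketch of switching to Strict mode, assuming the NETWORK_POLICY_ENFORCING_MODE environment variable (shown earlier in this issue) is set directly on the aws-node daemonset; note that the managed addon may overwrite manual changes, in which case the addon configuration should be updated instead:

# Switch the network policy enforcing mode to strict; this rolls the aws-node pods.
$ kubectl set env daemonset/aws-node -n kube-system NETWORK_POLICY_ENFORCING_MODE=strict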

@younsl
Author

younsl commented Apr 19, 2024

@achevuru: "So, this shouldn't fix anything and considering your issue is intermittent it is probably pure chance that you haven't run into it in the last day."

[updated] Yes, you're right.

Even though the bpffs filesystem was mounted on all worker nodes, the intermittent connection reset by peer symptom recurred 3 days after the workaround was applied.

@younsl
Author

younsl commented Apr 23, 2024

Is anything in progress for the next release? I found updated progress in #204 (comment) and #256.
