Intermittent connection reset and delay running time #245

Open
younsl opened this issue Apr 11, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@younsl

younsl commented Apr 11, 2024

What happened:

Background

After migrating the network policy provider from Calico v3.25.1 (with the Tigera Operator) to VPC CNI v1.18.0-eksbuild.1, the following two network policy issues occurred on an EKS v1.26 cluster:

Note:
The two network issues described here only occur in pods that have a NetworkPolicy applied via VPC CNI.

Cluster environment

  • EKS v1.26
  • Affected Worker Node
  • EKS Addons
    • VPC CNI v1.18.0-eksbuild.1 : network policy feature enabled by applying the {"enableNetworkPolicy":"true"} setting in the addon's advanced configuration (an example CLI command follows the image listing below)
    • kube-proxy v1.26.11-minimal-eksbuild.1 : running in iptables mode
$ kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2-3
amazon-k8s-cni-init:v1.18.0-eksbuild.1
amazon-k8s-cni:v1.18.0-eksbuild.1
amazon/aws-network-policy-agent:v1.1.0-eksbuild.1
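
For reference, a minimal sketch of how such an advanced-configuration change can be applied via the AWS CLI, assuming the managed addon is updated directly and using a placeholder cluster name:

# Hedged example: apply the advanced configuration to the managed VPC CNI addon
# (cluster name is a placeholder; version matches the one reported above).
$ aws eks update-addon \
    --cluster-name <my-cluster> \
    --addon-name vpc-cni \
    --addon-version v1.18.0-eksbuild.1 \
    --configuration-values '{"enableNetworkPolicy":"true"}'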

Network policy issues

1. Intermittent connection reset by peer

Occurs during Pod-to-Pod or Pod-to-EC2 communication.

Similar issues: #204, #210, #236

2. Delayed Running time

There is a delay in the time it takes for the pod to become ready.


For pods with a NetworkPolicy applied, the time it takes for the readiness probe to pass is up to 3 times longer.

Category                  | 1st time | 2nd time
egress netpol applied     | 86s      | 91s
egress netpol not applied | 30s      | 32s

Similar issues: #189, #186
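
The readiness timings above were collected by hand; a minimal sketch of one way to measure time-to-readiness, assuming a placeholder pod name:

# Compare the pod creation time with the Ready condition's last transition time
# (pod name is a placeholder).
$ kubectl get pod <workload-pod> -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
$ kubectl get pod <workload-pod> -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'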

Attach logs

1. Intermittent connection reset by peer

[tcpdump] From workload pod to EC2 instance

$ tcpdump -i any '(dst host xx.xxx.xxx.247 and dst port 8080) or (tcp[tcpflags] & tcp-rst != 0)'
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

07:40:05.607233 eth0  Out IP <REDACTED>-5bc847767c-56l5j.47664 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 4216205005, win 62727, options [mss 8961,sackOK,TS val 896125324 ecr 0,nop,wscale 7], length 0
07:40:05.607607 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.47664: Flags [R.], seq 0, ack 4216205006, win 0, length 0

... repeated ...

07:42:25.807039 eth0  Out IP <REDACTED>-5bc847767c-56l5j.39080 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 1628133200, win 62727, options [mss 8961,sackOK,TS val 896265524 ecr 0,nop,wscale 7], length 0
07:42:25.807463 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.39080: Flags [R.], seq 0, ack 1628133201, win 0, length 0

Intermittently, the workload pod receives a reset (RST) packet in response from the EC2 instance.

[kubectl sniff] From workload pod to EC2 instance

# Source IP started with 221 is workload pod
# Destination IP ended with 247 is EC2 instance.
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - port 8080]' on container: '<REDACTED>', pod: '<REDACTED>-5bc847767c-56l5j', namespace: 'b<REDACTED>'
    1   0.000000 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 53456 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897677821 TSecr=0 WS=128
    2   0.000433 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 53456 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3  70.178815 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 33610 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897748000 TSecr=0 WS=128
    4  70.179236 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 33610 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5 140.194787 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 51794 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897818016 TSecr=0 WS=128
    6 140.195225 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 51794 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    7 210.383682 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 32906 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897888204 TSecr=0 WS=128
    8 210.384104 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 32906 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    9 280.399946 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 55182 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897958221 TSecr=0 WS=128
   10 280.400365 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 55182 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   11 351.055956 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 42620 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898028877 TSecr=0 WS=128
   12 351.056403 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 42620 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   13 421.076549 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 49318 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898098897 TSecr=0 WS=128
   14 421.076943 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 49318 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   15 495.679313 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 45716 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898173500 TSecr=0 WS=128
   16 495.679744 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 45716 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   17 565.699845 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 46792 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898243521 TSecr=0 WS=128
   18 565.700281 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 46792 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

The same intermittent RST responses from the EC2 instance are visible in this capture as well.
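
For reference, a capture like the one above can be reproduced with the ksniff kubectl plugin; a minimal sketch, assuming placeholder pod and namespace names:

# Capture traffic on port 8080 from the workload pod's network namespace
# and write it to a local pcap file (pod/namespace names are placeholders).
$ kubectl sniff <workload-pod> -n <namespace> -f "port 8080" -o capture.pcap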

When the issue occurs in the workload pod, the resulting connection reset error triggers a Slack alert.


2. Delayed Running time

ebpf-sdk log captured on the worker node immediately after the pod restart:

$ tail -f /var/log/aws-routed-eni/ebpf-sdk.log | grep 'xx.xxx.xx.117'
{"level":"info","ts":"2024-04-11T02:25:29.435Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.437Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.584Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.586Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.750Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.752Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.888Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.890Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.012Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.014Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.073Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.074Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.596Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.598Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.665Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.667Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.831Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.832Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.929Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.931Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.009Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.010Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.182Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.185Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}

DENY logs appear for return traffic from the destination Service (cluster IP) 172.20.67.165.

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xxx.198'
{"level":"info","ts":"2024-04-11T02:18:47.938Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:49.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":45868,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:18:51.394Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:58.054Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:59.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":41568,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49904,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49908,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:10.642Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.758Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.926Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.011Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.250Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.358Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.362Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:19:19.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":56462,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:29.430Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":42600,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.242Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.247Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":42681,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":47922,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":57360,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37192,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37196,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The same network issues occurred with VPC CNI v1.16.0, v1.16.1, and v1.18.0.
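
A hedged way to confirm which addon version a given cluster is running, assuming the managed addon and a placeholder cluster name:

# Print the installed VPC CNI addon version (cluster name is a placeholder).
$ aws eks describe-addon \
    --cluster-name <my-cluster> \
    --addon-name vpc-cni \
    --query 'addon.addonVersion' --output text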

Environment:

  • Kubernetes version (use kubectl version): v1.26.12-eks-5e0fdde
  • CNI Version: VPC CNI v1.18.0-eksbuild.1
  • OS (e.g: cat /etc/os-release): Amazon Linux release 2 (Karoo)
  • Kernel (e.g. uname -a): 5.10.205-195.804.amzn2.x86_64
@younsl younsl added the bug Something isn't working label Apr 11, 2024
@jayanthvn jayanthvn transferred this issue from aws/amazon-vpc-cni-k8s Apr 11, 2024
@jayanthvn
Contributor

We will look into this and get back to you. By the way, is this easily reproducible?

@younsl
Author

younsl commented Apr 11, 2024

@jayanthvn Yes, it is reproducible, but since it is intermittent it may take some time to recur.

@jayanthvn
Contributor

Thanks. We will review the logs and get back to you.

@younsl
Author

younsl commented Apr 12, 2024

Our team found workarounds for both network policy issues through internal testing today.

These may not be fundamental solutions.

Note:
Sensitive information such as IP addresses and application names has been redacted.

TLDR

Symptom 1. Intermittent connection reset by peer is resolved.

  • Workaround: Connect to the worker node and manually mount bpffs

Symptom 2. Delayed readiness time is resolved.

  • Workaround: explicitly allow ingress from 172.20.0.0/16, the Kubernetes service IP range.

Workarounds

1. Intermittent connection reset by peer

1-1. Workaround

As mentioned in the Amazon EKS official documentation, the intermittent connection reset does not occur when bpffs is mounted on the EC2 worker node.

1-2. Record for issue resolution

Check the Kernel version and AMI version of the worker node.

The workload pod (source) that was experiencing the intermittent connection reset symptom was scheduled on this worker node.

$ kubectl get node -o wide ip-xx-xxx-xx-98.ap-northeast-2.compute.internal
NAME                                              STATUS   ROLES    AGE   VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-xx-xxx-xx-98.ap-northeast-2.compute.internal   Ready    <none>   69d   v1.26.12-eks-5e0fdde   xx.xxx.xx.98   <none>        Amazon Linux 2   5.10.205-195.804.amzn2.x86_64   containerd://1.7.2

 

Connect to the worker node and manually mount bpffs:

# Mount the bpf filesystem on the worker node
$ sudo mount -t bpf bpffs /sys/fs/bpf
$ mount -l | grep bpf
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
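
A minimal sketch for making this mount survive reboots, assuming /etc/fstab is used for mount management on the node (not part of the original workaround):

# Persist the bpffs mount across node reboots.
$ echo 'bpffs /sys/fs/bpf bpf defaults 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a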

Question:
Even though the prerequisites are met, why do I have to mount the BPF filesystem (bpffs) on the worker node to resolve the symptom?

Prerequisite for the VPC CNI network policy feature (quoted from the EKS User Guide):
For all other cluster versions, if you upgrade the Amazon EKS optimized Amazon Linux to version v20230703 or later or you upgrade the Bottlerocket AMI to version v1.0.2 or later, you can skip this step.

Reference: EKS User guide

 

Since mounting bpffs on April 11th, the read ECONNRESET error has not occurred.


 

2. Delayed readiness time

2-1. Workaround

Add an Ingress netpol to the workload pod (source).
This Ingress netpol explicitly allows ingress from 172.20.0.0/16, which is the Kubernetes service IP range.


2-2. Record for issue resolution

The network policy enforcing mode is set to standard, which is the default.

$ kubectl get ds -n kube-system aws-node -o yaml
      containers:
      - env:
        - name: NETWORK_POLICY_ENFORCING_MODE
          value: standard
        name: aws-node

Create a new ingress netpol that explicitly allows ingress from the Kubernetes Service IP range:

$ kubectl get netpol -n <REDACTED> ingress-service -o yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  ...
spec:
  ingress:
  - from:
    - ipBlock:
        cidr: 172.20.0.0/16
    ports:
    - endPort: 65535
      port: 1
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/networkpolicy-ingress-service: apply
  policyTypes:
  - Ingress
status: {}
$ kubectl get pod -n <REDACTED> t<REDACTED> -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/restartedAt: "2024-04-12T12:30:49+09:00"
  creationTimestamp: "2024-04-12T06:14:02Z"
  generateName: t<REDACTED>-cc878cb69-
  labels:
    ...
    app.kubernetes.io/networkpolicy-ingress-service: apply
    app.kubernetes.io/networkpolicy-ingress-t<REDACTED>: apply
    pod-template-hash: cc878cb69

To explicitly allow the Kubernetes service IP range, the app.kubernetes.io/networkpolicy-ingress-service: apply label was added to the pod.
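
A sketch of one way to add this label, assuming the pod is managed by a Deployment (names are placeholders); patching the pod template triggers a rollout so the new pods pick up the label:

# Add the netpol selector label to the Deployment's pod template
# (namespace and deployment names are placeholders).
$ kubectl patch deployment <workload-deployment> -n <namespace> --type merge \
    -p '{"spec":{"template":{"metadata":{"labels":{"app.kubernetes.io/networkpolicy-ingress-service":"apply"}}}}}'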

 

The delayed readiness time issue was resolved after explicitly attaching the ingress netpol to the workload pod (source), as shown in this comment.

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xx.159'
{"level":"info","ts":"2024-04-12T06:09:36.079Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":43798,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.080Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":53898,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.081Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":55208,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":56070,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":46937,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}

... No deny logs from kubernetes service IP starting with 172.20.67.165 ...

{"level":"info","ts":"2024-04-12T06:10:03.016Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.294Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.480Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.528Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.678Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.716Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:11.661Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.34.98","Src Port":38850,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.554Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.10.163","Src Port":19310,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.606Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.11.99","Src Port":22244,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:14.004Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.29.161","Src Port":52804,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

 

conntrack list captured between the workload pod (source, ending with 35.242) and the cluster IP of the destination pod (172.20.67.165):

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 118 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 431998 ESTABLISHED src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.

With the conntrack -L command, I observed that connections between the workload pod (source) and the destination pod (via the service IP) were established quickly.

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 98 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 102 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.
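
Instead of polling with conntrack -L, the flow can also be watched in real time; a minimal sketch using conntrack's event mode with the same (redacted) addresses:

# Stream conntrack events for this flow as they happen.
$ conntrack -E --src ss.sss.35.242 --dst 172.20.67.165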

 

The pod's time to readiness was dramatically reduced from 92 seconds to 32 seconds.

$ kubectl get pod -n <REDACTED> -l app.kubernetes.io/name=t<REDACTED>
NAME                          READY   STATUS    RESTARTS   AGE
t<REDACTED>-cc878cb69-8tmt9   1/1     Running   0          33s

At this point, the readiness time for the workload (source) pod was back to normal.

The readiness time comparison is:

Source Pod   | Applied netpol                             | Time to readiness | Status
Workload Pod | egress netpol, without ingress netpol      | 92s               | Delayed
Workload Pod | egress netpol, with the new ingress netpol | 32s               | Normal

@younsl
Author

younsl commented Apr 12, 2024

@jayanthvn @achevuru I submitted a node-level support bundle to k8s-awscni-triage@amazon.com.

node-level support-bundle collection
# Collect node level tech-support bundle in the affected worker node
$ /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56  100    56    0     0  34825      0 --:--:-- --:--:-- --:--:-- 56000

	This is version 0.7.6. New versions can be found at https://github.com/awslabs/amazon-eks-ami/blob/master/log-collector-script/

Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect modinfo...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect iptables-legacy information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information...
Trying to Collect Containerd running information...
Trying to Collect Docker daemon information...

	Warning: The Docker daemon is not running.

Trying to collect kubelet information...
Trying to collect L-IPAMD introspection information...
Trying to collect L-IPAMD prometheus metrics...
Trying to collect L-IPAMD checkpoint...
Trying to collect Multus logs if they exist...
Trying to collect sysctls information...
Trying to collect networking infomation... conntrack v1.4.4 (conntrack-tools): 5437 flow entries have been shown.

Trying to collect CNI configuration information...
Trying to collect CNI Configuration Variables from Docker...

	Warning: The Docker daemon is not running.
Trying to collect CNI Configuration Variables from Containerd...
Trying to collect network policy ebpf loaded data...
Trying to collect Docker daemon logs...
Trying to Collect sandbox-image daemon information...
Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to archive gathered information...

	Done... your bundled logs are located in /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
$ ls -lh /var/log/eks*
-rw-r--r-- 1 root root 54M Apr 12 09:44 /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
# Download tech-support bundle from node filesystem to local
$ kubectl cp default/nsenter-xxxxxx:/var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz $HOME/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
tar: Removing leading `/' from member names
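
The bundle above was collected from an nsenter pod; a hedged alternative sketch for getting a shell on the affected node with kubectl debug (node name taken from earlier in this issue), where the host filesystem is mounted under /host:

# Start a debug pod on the affected node and run the log collector via chroot.
$ kubectl debug node/ip-xx-xxx-xx-98.ap-northeast-2.compute.internal -it --image=ubuntu
# inside the debug pod:
$ chroot /host /opt/cni/bin/aws-cni-support.sh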

Reference: troubleshooting

@achevuru
Contributor

achevuru commented Apr 12, 2024

@younsl Thanks for sharing your findings with us.

  • For the delayed readiness time issue: this is expected, as we discussed on the call, i.e., if the pod attempts to start a connection before the eBPF probes are configured against the pod interface. Response packets can potentially be dropped if the probes are set up before the response packet reaches the source pod. Refer to this comment for a detailed explanation.

    Our recommended solution for this is Strict mode, which gates pod launch until policies are configured against the newly launched pod (a sketch of switching modes follows this list). If you don't want to migrate to Strict mode due to other limitations, then you can consider the workarounds included in the above comment. With your workaround, you're explicitly allowing the Service CIDR, which explicitly allows the return packet coming from any cluster-local service, thereby bypassing the conntrack requirement. If you're OK with this ingress rule, then this is a viable workaround for in-cluster traffic, assuming your pods only talk to other pods in the cluster via Service VIPs.

  • Intermittent connection reset by peer: as you can see, there were already multiple BPF FS mounts in your command output (mount -l | grep bpf), so there is no need to mount the BPF FS again. If the BPF FS isn't mounted, nothing will work and the Network Policy agent will crash. So this shouldn't fix anything, and considering your issue is intermittent, it is probably pure chance that you haven't run into it in the last day. We believe you're running into the same issue called out in Possible race condition in conntrack map cleanup #246. Basically, if a new connection is in the process of being established right when the conntrack cleanup routine runs, the NP agent can incorrectly expire the entry from its own conntrack table, leading to intermittent connection resets; the subsequent retry addresses this. We will address this ASAP.
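
A minimal sketch of switching to Strict mode, assuming the NETWORK_POLICY_ENFORCING_MODE environment variable (shown earlier in this issue) is set directly on the aws-node daemonset; note that the managed addon may overwrite manual changes, in which case the addon configuration should be updated instead:

# Switch the network policy enforcing mode to strict; this rolls the aws-node pods.
$ kubectl set env daemonset/aws-node -n kube-system NETWORK_POLICY_ENFORCING_MODE=strict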

@younsl
Author

younsl commented Apr 19, 2024

@achevuru: "So, this shouldn't fix anything and considering your issue is intermittent it is probably pure chance that you haven't run into it in the last day."

[updated] Yes, you're right.

Even though the bpffs filesystem was mounted on all worker nodes, the intermittent connection reset by peer symptom recurred 3 days after the workaround was applied.

@younsl
Author

younsl commented Apr 23, 2024

Is anything in progress for the next release? I found updated progress in #204 (comment) and #256.
