Cilium 1.13.0 restart requires an app reconnect #24191

Closed · aanm opened this issue Mar 6, 2023 · 12 comments · Fixed by #24336

aanm (Member) commented Mar 6, 2023

Start minikube

minikube start --driver=docker --network-plugin=cni

Install Cilium

$ cilium version
cilium-cli: v0.13.1 compiled with go1.20 on linux/amd64
cilium image (default): v1.13.0
cilium image (stable): v1.13.0
$
$ cilium install --version v1.13.0

Wait until Cilium pods are ready

sleep 2s
kubectl wait --for=condition=Ready=true pods -l k8s-app=cilium -n kube-system --timeout=60s

Deploy some apps

kubectl apply --force=false -f ./test/k8s/manifests/migrate-svc-client.yaml
kubectl apply --force=false -f ./test/k8s/manifests/migrate-svc-server.yaml

Wait until the apps are ready:

kubectl wait --for=condition=Ready=true pods -l zgroup=migrate-svc
$ kubectl get pods -A -o wide -w
NAMESPACE     NAME                               READY   STATUS    RESTARTS        AGE     IP             NODE       NOMINATED NODE   READINESS GATES
default       migrate-svc-client-2jsl7           1/1     Running   0               20s     10.0.0.210     minikube   <none>           <none>
default       migrate-svc-client-4dpt8           1/1     Running   0               20s     10.0.0.45      minikube   <none>           <none>
default       migrate-svc-client-884zx           1/1     Running   0               20s     10.0.0.106     minikube   <none>           <none>
default       migrate-svc-client-bf7qd           1/1     Running   0               20s     10.0.0.158     minikube   <none>           <none>
default       migrate-svc-client-qhxds           1/1     Running   0               20s     10.0.0.93      minikube   <none>           <none>
default       migrate-svc-server-dch4f           1/1     Running   0               20s     10.0.0.241     minikube   <none>           <none>
default       migrate-svc-server-fqfjp           1/1     Running   0               20s     10.0.0.154     minikube   <none>           <none>
default       migrate-svc-server-fxxth           1/1     Running   0               20s     10.0.0.244     minikube   <none>           <none>
kube-system   cilium-m8w6j                       1/1     Running   0               63s     192.168.49.2   minikube   <none>           <none>
kube-system   cilium-operator-864b7b486c-sbz5h   1/1     Running   0               63s     192.168.49.2   minikube   <none>           <none>
kube-system   coredns-787d4945fb-prgjg           1/1     Running   0               30s     10.0.0.157     minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0               4m44s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0               4m42s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0               4m44s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-proxy-l595b                   1/1     Running   0               4m28s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0               4m42s   192.168.49.2   minikube   <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (3m58s ago)   4m41s   192.168.49.2   minikube   <none>           <none>

Restart the Cilium agent:

kubectl delete pod  -n kube-system -l k8s-app=cilium

Check that apps got restarted:

kubectl get pods -A -o wide 
NAMESPACE     NAME                               READY   STATUS    RESTARTS        AGE     IP             NODE       NOMINATED NODE   READINESS GATES
default       migrate-svc-client-2jsl7           1/1     Running   2 (96s ago)     2m15s   10.0.0.210     minikube   <none>           <none>
default       migrate-svc-client-4dpt8           1/1     Running   2 (96s ago)     2m15s   10.0.0.45      minikube   <none>           <none>
default       migrate-svc-client-884zx           1/1     Running   2 (96s ago)     2m15s   10.0.0.106     minikube   <none>           <none>
default       migrate-svc-client-bf7qd           1/1     Running   2 (96s ago)     2m15s   10.0.0.158     minikube   <none>           <none>
default       migrate-svc-client-qhxds           1/1     Running   2 (96s ago)     2m15s   10.0.0.93      minikube   <none>           <none>
default       migrate-svc-server-dch4f           1/1     Running   1 (84s ago)     2m15s   10.0.0.241     minikube   <none>           <none>
default       migrate-svc-server-fqfjp           1/1     Running   1 (84s ago)     2m15s   10.0.0.154     minikube   <none>           <none>
default       migrate-svc-server-fxxth           1/1     Running   1 (84s ago)     2m15s   10.0.0.244     minikube   <none>           <none>
kube-system   cilium-l627f                       1/1     Running   0               106s    192.168.49.2   minikube   <none>           <none>
kube-system   cilium-operator-864b7b486c-sbz5h   1/1     Running   0               2m58s   192.168.49.2   minikube   <none>           <none>
kube-system   coredns-787d4945fb-prgjg           1/1     Running   0               2m25s   10.0.0.157     minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0               6m39s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0               6m37s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0               6m39s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-proxy-l595b                   1/1     Running   0               6m23s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0               6m37s   192.168.49.2   minikube   <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (5m53s ago)   6m36s   192.168.49.2   minikube   <none>           <none>

The above bug does NOT happen with Cilium 1.12.5.

aanm added the kind/bug, sig/datapath, upgrade-impact, kind/regression, release-blocker/1.13, and release-blocker/1.14 labels Mar 6, 2023
brb (Member) commented Mar 6, 2023

@aanm Thanks for the issue! Do you have a sysdump before and after?

aanm (Member, Author) commented Mar 6, 2023

> @aanm Thanks for the issue! Do you have a sysdump before and after?

@brb
before: cilium-sysdump-20230306-144708.zip
after: cilium-sysdump-20230306-144842.zip

$ kubectl get pods -A -o wide 
NAMESPACE     NAME                               READY   STATUS    RESTARTS       AGE     IP             NODE       NOMINATED NODE   READINESS GATES
default       migrate-svc-client-2jjbp           1/1     Running   2 (36s ago)    2m15s   10.0.0.120     minikube   <none>           <none>
default       migrate-svc-client-45s7g           1/1     Running   2 (36s ago)    2m14s   10.0.0.146     minikube   <none>           <none>
default       migrate-svc-client-dz58v           1/1     Running   2 (36s ago)    2m14s   10.0.0.211     minikube   <none>           <none>
default       migrate-svc-client-z9946           1/1     Running   2 (36s ago)    2m14s   10.0.0.74      minikube   <none>           <none>
default       migrate-svc-client-ztq2j           1/1     Running   2 (36s ago)    2m14s   10.0.0.56      minikube   <none>           <none>
default       migrate-svc-server-bkhz5           1/1     Running   0              2m14s   10.0.0.15      minikube   <none>           <none>
default       migrate-svc-server-hbb8j           1/1     Running   1 (24s ago)    2m14s   10.0.0.72      minikube   <none>           <none>
default       migrate-svc-server-rnkhf           1/1     Running   1 (24s ago)    2m14s   10.0.0.249     minikube   <none>           <none>
kube-system   cilium-operator-864b7b486c-k79k7   1/1     Running   0              2m53s   192.168.49.2   minikube   <none>           <none>
kube-system   cilium-r4dgv                       1/1     Running   0              46s     192.168.49.2   minikube   <none>           <none>
kube-system   coredns-787d4945fb-q6jsr           1/1     Running   0              2m21s   10.0.0.226     minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0              3m44s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0              3m47s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0              3m44s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-proxy-jtprb                   1/1     Running   0              3m31s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0              3m44s   192.168.49.2   minikube   <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (3m1s ago)   3m44s   192.168.49.2   minikube   <none>           <none>

aanm assigned brb Mar 6, 2023
brb (Member) commented Mar 6, 2023

Thanks. The migrate-svc-* pod logs are missing. Could you paste them collected after the upgrade?

aanm (Member, Author) commented Mar 6, 2023

> Thanks. The migrate-svc-* pod logs are missing. Could you paste them collected after the upgrade?

@brb

$ kubectl get pods -A  -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS        AGE     IP             NODE       NOMINATED NODE   READINESS GATES
default       migrate-svc-client-7d5vr           1/1     Running   2 (2m42s ago)   4m38s   10.0.0.119     minikube   <none>           <none>
default       migrate-svc-client-7d745           1/1     Running   2 (2m42s ago)   4m38s   10.0.0.25      minikube   <none>           <none>
default       migrate-svc-client-bss2b           1/1     Running   2 (2m42s ago)   4m38s   10.0.0.40      minikube   <none>           <none>
default       migrate-svc-client-dpplz           1/1     Running   2 (2m42s ago)   4m38s   10.0.0.228     minikube   <none>           <none>
default       migrate-svc-client-mgb6b           1/1     Running   3 (2m29s ago)   4m38s   10.0.0.145     minikube   <none>           <none>
default       migrate-svc-server-692zr           1/1     Running   1 (2m30s ago)   4m38s   10.0.0.106     minikube   <none>           <none>
default       migrate-svc-server-gjgn5           1/1     Running   1 (2m30s ago)   4m38s   10.0.0.221     minikube   <none>           <none>
default       migrate-svc-server-nktkt           1/1     Running   0               4m38s   10.0.0.38      minikube   <none>           <none>
kube-system   cilium-8c6jl                       1/1     Running   0               2m51s   192.168.49.2   minikube   <none>           <none>
kube-system   cilium-operator-864b7b486c-9mdsx   1/1     Running   0               5m22s   192.168.49.2   minikube   <none>           <none>
kube-system   coredns-787d4945fb-b6btd           1/1     Running   0               4m41s   10.0.0.210     minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0               5m35s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0               5m35s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0               5m34s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-proxy-m6qfk                   1/1     Running   0               5m23s   192.168.49.2   minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0               5m35s   192.168.49.2   minikube   <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (4m51s ago)   5m34s   192.168.49.2   minikube   <none>           <none>

logs.zip

brb (Member) commented Mar 6, 2023

Hmm, nothing interesting in the logs. Also, no changes in the drop counters. The affected packet path is the bpf_lxc's per-packet LB to ClusterIP.
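For anyone reproducing this, a hedged way to double-check that live (the cilium-agent container name is assumed; adjust to your install):

# Stream datapath drop events from the agent while restarting it:
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium monitor --type drop
# Or snapshot the drop metrics before/after the restart and diff:
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium metrics list | grep -i drop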

pchaigno (Member) commented Mar 6, 2023

No drop counter will increase: the apps restarted because of TCP RSTs.

pchaigno (Member) commented Mar 6, 2023

One example from the last after sysdump:

$ cat hubble-flows-cilium-r4dgv-20230306-144842.json | ~/hubble/hubble observe --port 48158 --numeric
Mar  6 13:48:01.067: 10.106.98.104:8000 (world) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Mar  6 13:48:01.067: 10.106.98.104:8000 (world) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Mar  6 13:48:01.067: 10.106.98.104:8000 (world) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: ACK)
Mar  6 13:48:02.568: 10.0.0.211:48158 (ID:55227) -> 10.0.0.249:8000 (ID:22641) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Mar  6 13:48:02.568: 10.0.0.211:48158 (ID:55227) <- 10.0.0.249:8000 (ID:22641) to-endpoint FORWARDED (TCP Flags: RST)
Mar  6 13:48:17.120: 10.106.98.104:8000 (ID:22641) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: ACK)
Mar  6 13:48:17.120: 10.0.0.211:48158 (ID:55227) -> 10.0.0.249:8000 (ID:22641) to-endpoint FORWARDED (TCP Flags: RST)
Mar  6 13:48:32.351: 10.106.98.104:8000 (ID:22641) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: ACK)
Mar  6 13:48:32.351: 10.0.0.211:48158 (ID:55227) -> 10.0.0.249:8000 (ID:22641) to-endpoint FORWARDED (TCP Flags: RST)
Mar  6 13:48:47.711: 10.106.98.104:8000 (ID:22641) -> 10.0.0.211:48158 (ID:55227) to-endpoint FORWARDED (TCP Flags: ACK)
Mar  6 13:48:47.711: 10.0.0.211:48158 (ID:55227) -> 10.0.0.249:8000 (ID:22641) to-endpoint FORWARDED (TCP Flags: RST)

No surprises here, we have a NAT issue. The server IP goes from 10.106.98.104 (ClusterIP, VIP) to 10.0.0.249 (endpoint IP, backend). That trace point is right before we deliver into the destination/client pod, so we shouldn't see a backend IP there.
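For live debugging, a sketch of how to watch for this with the hubble CLI (flags per recent hubble versions): replies delivered to the client should carry the VIP as source; seeing the backend IP there means rev-DNAT was skipped.

hubble observe --ip 10.106.98.104 --numeric --follow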

pchaigno (Member) commented Mar 6, 2023

Corresponding CT entries:

$ git grep :48158 cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-ct-list-global.md
cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-ct-list-global.md:TCP IN 10.0.0.211:48158 -> 10.0.0.249:8000 expires=126997 RxPackets=3 RxBytes=430 RxFlagsSeen=0x06 LastRxReport=126987 TxPackets=1 TxBytes=54 TxFlagsSeen=0x04 LastTxReport=126958 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=55227 IfIndex=0 
cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-ct-list-global.md:TCP OUT 10.0.0.211:48158 -> 10.0.0.249:8000 expires=126997 RxPackets=1 RxBytes=54 RxFlagsSeen=0x04 LastRxReport=126958 TxPackets=3 TxBytes=430 TxFlagsSeen=0x06 LastTxReport=126987 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=4 SourceSecurityID=55227 IfIndex=0 
cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-ct-list-global.md:TCP IN 10.106.98.104:8000 -> 10.0.0.211:48158 expires=148587 RxPackets=7 RxBytes=1238 RxFlagsSeen=0x1a LastRxReport=126987 TxPackets=0 TxBytes=0 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0010 [ SeenNonSyn ] RevNAT=0 SourceSecurityID=2 IfIndex=0 
cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-ct-list-global.md:TCP OUT 10.106.98.104:8000 -> 10.0.0.211:48158 service expires=148587 RxPackets=0 RxBytes=11 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x06 LastTxReport=126987 Flags=0x0012 [ TxClosing SeenNonSyn ] RevNAT=4 SourceSecurityID=0 IfIndex=0

Reverse NAT entries:

$ cat cilium-bugtool-cilium-r4dgv-20230306-144842/cmd/cilium-bpf-lb-list---revnat.md 
ID   BACKEND ADDRESS (REVNAT_ID) (SLOT)
2    10.96.0.10:9153      
4    10.106.98.104:8000   
1    10.96.0.1:443        
3    10.96.0.10:53
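To cross-check, RevNAT ID 4 can be matched against the service table from inside the agent pod (a sketch using the standard agent CLI):

# ID 4 should map the VIP 10.106.98.104:8000 to backend 10.0.0.249:8000:
cilium service list
# Raw BPF service map view of the same information:
cilium bpf lb list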

brb removed their assignment Mar 7, 2023
brb (Member) commented Mar 7, 2023

I couldn't reproduce it on Kind with multiple K8s nodes. But once I switch to a single-node cluster (i.e., only kind-control-plane), it's fairly easy to reproduce. From tcpdump / pwru output I see the same issue as described by @pchaigno: the rev DNAT xlation for the ClusterIP didn't happen, hence the TCP RST.
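For reference, a minimal tcpdump on the node to catch the offending RSTs (port 8000 taken from the repro above):

# Any RST showing up here around the agent restart points at the failed rev-DNAT:
tcpdump -ni any 'tcp port 8000 and (tcp[tcpflags] & tcp-rst) != 0'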

brb (Member) commented Mar 7, 2023

Once we fix the issue, we should extend the K8s upgrade test to restart the Cilium agents and check that the migrate-svc restart counters don't increase, along the lines of the sketch below.
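A sketch of what that check could look like (labels taken from the manifests in this issue):

# After `kubectl delete pod -n kube-system -l k8s-app=cilium`, every
# restartCount printed below should still read 0:
kubectl get pods -l zgroup=migrate-svc \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'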

ldelossa (Contributor) commented Mar 13, 2023

Issue debugged on a fairly stock config:

debug:
  enabled: true
  verbose: datapath
bpf:
  monitorAggregation: none
image:
  repository: quay.io/cilium/cilium-dev
  tag: latest
  pullPolicy: IfNotPresent
  useDigest: false
operator:
  enabled: true
  replicas: 1
  image:
    override: ~
    repository: quay.io/cilium/operator
    tag: latest
    useDigest: false
    pullPolicy: IfNotPresent
    suffix: ""
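(For reference, these values can be applied with plain Helm; the values.yaml filename is an assumption:)

helm repo add cilium https://helm.cilium.io/
helm upgrade --install cilium cilium/cilium -n kube-system -f values.yaml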

And a single node kind cluster:

kind: Cluster
name: cilium-testing
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  ipFamily: dual
  podSubnet: "10.1.0.0/16,c::/63"
  serviceSubnet: "10.2.0.0/16,d::/108"
  apiServerAddress: "0.0.0.0"
  apiServerPort: 6443
nodes:
- role: control-plane
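(Assuming the config above is saved as cilium-testing.yaml, the cluster comes up with:)

kind create cluster --config cilium-testing.yaml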

Also updated the migrate-svc manifests to spawn just one client and one server for simplicity.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: migrate-svc-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: migrate-svc-client
  template:
    metadata:
      labels:
        app: migrate-svc-client
        zgroup: migrate-svc
    spec:
      containers:
      - name: server
        image: docker.io/cilium/migrate-svc-test:v0.0.2
        imagePullPolicy: IfNotPresent
        command: [ "/client", "migrate-svc.default.svc.cluster.local.:8000" ]
---
apiVersion: v1
kind: Service
metadata:
  name: migrate-svc
spec:
  ports:
  - port: 8000
  selector:
    app: migrate-svc-server
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: migrate-svc-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: migrate-svc-server
  template:
    metadata:
      labels:
        app: migrate-svc-server
        zgroup: migrate-svc
    spec:
      containers:
      - name: server
        image: docker.io/cilium/migrate-svc-test:v0.0.2
        imagePullPolicy: IfNotPresent
        command: [ "/server", "8000" ]

The issue is highly reproducible; however, it's a bit of a pain to debug since it requires restarting the agent and catching monitor logs.
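One way to catch those logs is to keep a monitor stream open and re-attach the moment the new agent pod is ready (container name cilium-agent assumed):

# The exec dies with the old pod; rerun it immediately after the restart:
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium monitor -v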

Here is what I'm seeing.

flowchart TD

subgraph "Client Pod"
	c("userpace-client")
	c_veth1("veth1@1")	

	c -->| Src: PodIP, Dst: ServiceIP | c_veth1
end

subgraph "Host NetNS"
	direction LR
	c_veth1 --> c_veth2("veth1@2\n service xlate: ServiceIP => BackendIP")	
	c_veth2 -->| BPF Redirect\n Src: PodIP, Dst: BE-IP | be_veth2("veth2@2")	
	be_veth2("veth2@2")

	c_veth2 -..->| service xlate failure| hns("host network stack")
	hns -..-> | Src:  ServiceIP, Dst: ClientIP, RST | c_veth2
end

subgraph "Backend Pod"
	direction LR
	be_veth2 --> be_veth1("veth2@1")	
	be_veth1 --> be("userspace-backend")
end

Dotted lines show the error case, solid lines show the normal case.

On agent restart, existing flows from Client Pod -> service translation -> Backend Pod fall through to the host network stack.

When they hit the stack, no service translation has occurred; the host stack doesn't know why it's receiving an ACK or ACK/PSH for a connection it has no record of, and responds with an RST.

The RST goes back to the Client Pod and tears the client down. The server then reads from the client's socket and is forced to close, most likely with an EOF.

This situation can be confirmed with iptables.

On v1.12.5, where the issue is not present, create a rule like so (10.2.248.173 is the ServiceIP):

iptables -t raw -A PREROUTING -p tcp -d 10.2.248.173 --dport 8000 -j TRACE

Restart the Cilium agent and check for any hits:
iptables -t raw -L PREROUTING -v

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  644  541K CILIUM_PRE_raw  all  --  any    any     anywhere             anywhere             /* cilium-feeder: CILIUM_PRE_raw */
    0     0 TRACE      tcp  --  any    any     anywhere             10.2.248.173         tcp dpt:8000

There aren't any, since no packets spilled to the host stack during the restart.

Now, check out HEAD, where the issue is present, and do the same.

(Sanity check, no hits before agent restart)

root@cilium-testing-control-plane:/home/cilium# iptables -t raw -L PREROUTING -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
12795 9527K CILIUM_PRE_raw  all  --  any    any     anywhere             anywhere             /* cilium-feeder: CILIUM_PRE_raw */
    0     0 TRACE      tcp  --  any    any     anywhere             10.2.219.0           tcp dpt:8000

(Hits after agent restart)

root@cilium-testing-control-plane:/home/cilium# iptables -t raw -L PREROUTING -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2715 1019K CILIUM_PRE_raw  all  --  any    any     anywhere             anywhere             /* cilium-feeder: CILIUM_PRE_raw */
   11  1860 TRACE      tcp  --  any    any     anywhere             10.2.219.0           tcp dpt:8000
root@cilium-testing-control-plane:/home/cilium# 

(One more sanity check: the rule is not hit again once normal operation resumes)

🖳  kubectl get pods
NAME                       READY   STATUS    RESTARTS        AGE
migrate-svc-client-jcfqs   1/1     Running   2 (3m55s ago)   6m26s
migrate-svc-server-xcvdd   1/1     Running   1 (3m43s ago)   6m26s

--- 
root@cilium-testing-control-plane:/home/cilium# iptables -t raw -L PREROUTING -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
28644 5337K CILIUM_PRE_raw  all  --  any    any     anywhere             anywhere             /* cilium-feeder: CILIUM_PRE_raw */
   11  1860 TRACE      tcp  --  any    any     anywhere             10.2.219.0           tcp dpt:8000

Wireshark also supports this theory: we see an RST on the client side, but no corresponding RST on the backend's veth interface. This indicates that the RST the client receives is not from the backend itself (it's from the host netns).

It appears that at some point we introduced a bug that punts LB flows to the host stack during agent restart.

A git bisect leads to this commit:

git bisect bad
2a7cef4bb31af6f3a355187e30ee85fa4841093d is the first bad commit
commit 2a7cef4bb31af6f3a355187e30ee85fa4841093d
Author: Timo Beckers <timo@isovalent.com>
Date:   Thu May 12 10:24:11 2022 +0200

    init,cleanup: remove TC filters containing 'cilium' in their names
    
    With the addition of Go code that loads and attaches BPF programs,
    we're no longer using the BPF file/section as the tc filter name.
    
    Assume the filter names can also contain 'cilium'.
    
    Signed-off-by: Timo Beckers <timo@isovalent.com>

 bpf/init.sh           | 10 ++++------
 cilium/cmd/cleanup.go |  5 ++++-
 2 files changed, 8 insertions(+), 7 deletions(-)

And indeed, checking out the commit just prior to this one stops the RSTs on agent restart in my test cluster.
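For completeness, the bisect session looked roughly like this (the good/bad refs are illustrative):

git bisect start
git bisect bad HEAD          # RSTs observed on agent restart
git bisect good v1.12.5      # known-good release per this issue
# Rebuild the agent image and rerun the repro at each step, marking
# `git bisect good` / `git bisect bad` until it converges on 2a7cef4bb31a.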

ti-mo (Contributor) commented Mar 13, 2023

Thanks @ldelossa, fix proposed in #24336.
