Cilium 1.13.0 restart requires an app reconnect #24191
@aanm Thanks for the issue! Do you have a sysdump before and after?
@brb
Thanks. The migrate-svc-* pod logs are missing. Could you paste them, collected after the upgrade?
Hmm, nothing interesting in the logs. Also, no changes in the drop counters. The affected packet path is bpf_lxc's per-packet LB to ClusterIP.
No drop counters will increase: the apps restarted because of TCP RSTs.
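The client-side failure mode can be reproduced in isolation, independent of Cilium: a peer that aborts a connection with a RST makes the client's next socket operation fail with ECONNRESET instead of a clean EOF. A minimal local sketch:

```python
import socket
import struct
import time

# Minimal local demo (not Cilium-specific): closing a socket with
# SO_LINGER set to a zero timeout sends a RST instead of a FIN, and the
# peer's next socket operation fails with ECONNRESET.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

# SO_LINGER with a zero timeout makes close() emit a RST rather than a FIN.
conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
conn.close()
time.sleep(0.1)  # let the RST arrive over loopback

try:
    cli.send(b"x")
    cli.recv(1)
    result = "clean close"
except ConnectionError:
    result = "connection reset"

cli.close()
srv.close()
print(result)
```

This is the same failure the apps see: the reset tears down the client's connection rather than letting it drain cleanly.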
One example from the last
No surprises here, we have a NAT issue. The server IP goes from 10.106.98.104 (ClusterIP, VIP) to 10.0.0.249 (endpoint IP, backend). That trace point is right before delivery into the destination (client) pod, so we shouldn't see a backend IP there.
Corresponding CT entries:
Reverse NAT entries:
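To illustrate what those entries do, here is a simplified sketch of reverse NAT (illustrative only, not Cilium's actual BPF map layout; the second backend IP below is made up): per-packet LB records a reverse-NAT entry so that replies from the chosen backend are rewritten back to the service VIP before they reach the client pod. When the entry is missing, the raw backend IP leaks through, which is the symptom in the trace above.

```python
# Simplified reverse-NAT sketch (illustrative only, not Cilium's map layout).
# IPs from the trace: 10.106.98.104 is the ClusterIP (VIP), 10.0.0.249 the backend.
revnat = {
    ("10.0.0.249", 8000): "10.106.98.104",
}

def rev_translate(src_ip: str, src_port: int) -> str:
    """Rewrite a reply's source address back to the service VIP.

    A missing entry leaves the backend IP visible to the client,
    which is exactly the symptom seen in the trace."""
    return revnat.get((src_ip, src_port), src_ip)

print(rev_translate("10.0.0.249", 8000))  # VIP restored
print(rev_translate("10.0.0.250", 8000))  # hypothetical missing entry: backend IP leaks
```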
I couldn't reproduce it on Kind with multiple K8s Nodes. But once I switch to a single-Node cluster (i.e., only with
Once we fix the issue, we should extend the K8s upgrade test to restart the Cilium agents and check that the migrate-svc restart counters did not increase.
Issue debugged on a fairly stock config:

```yaml
debug:
  enabled: true
  verbose: datapath
bpf:
  monitorAggregation: none
image:
  repository: quay.io/cilium/cilium-dev
  tag: latest
  pullPolicy: IfNotPresent
  useDigest: false
operator:
  enabled: true
  replicas: 1
  image:
    override: ~
    repository: quay.io/cilium/operator
    tag: latest
    useDigest: false
    pullPolicy: IfNotPresent
    suffix: ""
```

And a single-node kind cluster:

```yaml
kind: Cluster
name: cilium-testing
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  ipFamily: dual
  podSubnet: "10.1.0.0/16,c::/63"
  serviceSubnet: "10.2.0.0/16,d::/108"
  apiServerAddress: "0.0.0.0"
  apiServerPort: 6443
nodes:
  - role: control-plane
```

Also updated the migration services to just spawn 1 of each client and server for simplicity:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: migrate-svc-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: migrate-svc-client
  template:
    metadata:
      labels:
        app: migrate-svc-client
        zgroup: migrate-svc
    spec:
      containers:
      - name: server
        image: docker.io/cilium/migrate-svc-test:v0.0.2
        imagePullPolicy: IfNotPresent
        command: [ "/client", "migrate-svc.default.svc.cluster.local.:8000" ]
---
apiVersion: v1
kind: Service
metadata:
  name: migrate-svc
spec:
  ports:
  - port: 8000
  selector:
    app: migrate-svc-server
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: migrate-svc-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: migrate-svc-server
  template:
    metadata:
      labels:
        app: migrate-svc-server
        zgroup: migrate-svc
    spec:
      containers:
      - name: server
        image: docker.io/cilium/migrate-svc-test:v0.0.2
        imagePullPolicy: IfNotPresent
        command: [ "/server", "8000" ]
```

The issue is highly reproducible; however, it's a bit of a pain to debug since it requires restarting the agent and catching monitor logs. Here is what I'm seeing:

```mermaid
flowchart TD
    subgraph "Client Pod"
        c("userspace-client")
        c_veth1("veth1@1")
        c -->| Src: PodIP, Dst: ServiceIP | c_veth1
    end
    subgraph "Host NetNS"
        direction LR
        c_veth1 --> c_veth2("veth1@2\n service xlate: ServiceIP => BackendIP")
        c_veth2 -->| BPF Redirect\n Src: PodIP, Dst: BE-IP | be_veth2("veth2@2")
        c_veth2 -..->| service xlate failure | hns("host network stack")
        hns -..-> | Src: ServiceIP, Dst: ClientIP, RST | c_veth2
    end
    subgraph "Backend Pod"
        direction LR
        be_veth2 --> be_veth1("veth2@1")
        be_veth1 --> be("userspace-backend")
    end
```
Dotted lines show the error case; solid lines show the normal case. On agent restart, existing flows from Client Pod -> service translation -> Backend Pod are dropping to the stack. When they hit the stack, no service translation has occurred, so the host network stack doesn't know why it is being sent an ACK or ACK/PUSH, and it responds with a RST. The RST goes back to the Client Pod and tears the client down. The server then reads from the client's socket and is forced to close, probably with an EOF.

This can be confirmed with iptables. On v1.12.5, where the issue is not present, create a rule like so (10.2.248.173 == ServiceIP):
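The exact iptables invocation isn't preserved in the thread; judging from the counter listings below (raw table, PREROUTING chain, TRACE target, tcp dpt:8000), a rule along these lines would match, so treat this as a reconstruction rather than the original command:

```sh
# Reconstructed from the counter output below; 10.2.248.173 is the ServiceIP.
iptables -t raw -A PREROUTING -p tcp -d 10.2.248.173 --dport 8000 -j TRACE
```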
Restart the Cilium agent and check for any hits:

```
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target          prot opt in  out source   destination
  644  541K CILIUM_PRE_raw  all  --  any any anywhere anywhere      /* cilium-feeder: CILIUM_PRE_raw */
    0     0 TRACE           tcp  --  any any anywhere 10.2.248.173  tcp dpt:8000
```

There aren't any, since no packets spilled to the host stack during the restart. Now, check out HEAD, where the issue is present, and do the same.

Sanity check, no hits before the agent restart:

```
root@cilium-testing-control-plane:/home/cilium# iptables -t raw -L PREROUTING -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target          prot opt in  out source   destination
12795 9527K CILIUM_PRE_raw  all  --  any any anywhere anywhere    /* cilium-feeder: CILIUM_PRE_raw */
    0     0 TRACE           tcp  --  any any anywhere 10.2.219.0  tcp dpt:8000
```

Hits after the agent restart:

```
root@cilium-testing-control-plane:/home/cilium# iptables -t raw -L PREROUTING -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target          prot opt in  out source   destination
 2715 1019K CILIUM_PRE_raw  all  --  any any anywhere anywhere    /* cilium-feeder: CILIUM_PRE_raw */
   11  1860 TRACE           tcp  --  any any anywhere 10.2.219.0  tcp dpt:8000
```

(One more sanity check: the rule is not hit again once normal operation is back.)
Wireshark also supports this theory: we see a RST on the client side, but no corresponding RST on the backend's veth interface. It appears that at some point we introduced a bug which drops LB flows to the stack during agent restart. A git bisect leads to this commit:

And indeed, checking out the commit just prior to it stops the RSTs on agent restart in my test cluster.
Steps to reproduce:

1. Start minikube
2. Install Cilium
3. Wait until the Cilium pods are ready
4. Deploy some apps
5. Wait until the apps are ready
6. Restart the Cilium agent
7. Check that the apps got restarted

The above bug is NOT happening with Cilium 1.12.5.