LRP stops performing NAT for existing sockets after temporary k8s API-server unavailability #31988
Comments
What's your kube-proxy replacement setting on the cluster?
Strict, according to status:
You are running into the socket-lb limitation with respect to stale backends. This issue is a dup of #31012. Edit: Sorry, I might not have understood the report entirely, so let's check if it's the same issue.
Can you reword this? I'm a bit lost as to what problem you are seeing.
The k8s API server can become temporarily unavailable due to network issues or problems on the k8s master itself. After availability is restored, Cilium notices it (we see "Network status error received, restarting client connections" in the logs), and that somehow breaks NAT for the LRP: the client still receives replies, but from the node-local-dns IP address instead of the kube-dns one. If the client verifies that the UDP reply came from the IP address the socket was originally connected to, that verification fails and the DNS reply is dropped. So it does not look like the same issue as the one you mentioned (the node-local-dns pod is not restarted in our case).
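For illustration, here is a minimal sketch of the kind of source-address check that starts failing (our own hypothetical example, not actual c-ares code; the nameserver IP is the kube-dns service IP read from /etc/resolv.conf):

#!/usr/bin/env python3
# Hypothetical illustration (not c-ares code): drop a DNS reply whose
# source address differs from the address the socket was connected to.
import socket
from binascii import unhexlify

# Raw DNS query for "example.com A".
DNS_QUERY = unhexlify("AAAA01000001000000000000076578616d706c6503636f6d0000010001")
NS_IP = "10.11.0.2"  # kube-dns service IP as seen in /etc/resolv.conf

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect((NS_IP, 53))  # the socket is created towards kube-dns
sock.send(DNS_QUERY)

reply, (src_ip, src_port) = sock.recvfrom(4096)
if src_ip != NS_IP:
    # After the API-server blip this branch fires: src_ip is the
    # node-local-dns pod IP, so the reply is treated as spoofed and dropped.
    print(f"dropped reply from unexpected address {src_ip}")
else:
    print(f"accepted reply from {src_ip}")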
So you are seeing issues on the reverse NAT path? Can you capture two sysdumps: one before the problem occurs and another while you are seeing the connectivity issue?
It seems so.
Yes, but it will take some time (I'll try to reproduce the problem in a clean debugging environment).
I've uploaded the requested sysdumps. The problem was reproduced on a freshly deployed k8s cluster. The steps:
#!/usr/bin/env python3
# Probe script: send a DNS query every second over a single long-lived
# UDP socket and print the source address of each reply.
import socket
from binascii import unhexlify
from datetime import datetime
from time import sleep

# Raw DNS query for "example.com A".
dns_req_msg = "AAAA01000001000000000000076578616d706c6503636f6d0000010001"

# Pick the first nameserver from /etc/resolv.conf (the kube-dns service IP).
with open("/etc/resolv.conf") as fd:
    while True:
        l = fd.readline()
        if l.startswith("nameserver"):
            ns_ip = l.split()[1]
            break

# One socket for the whole lifetime of the script, connected to kube-dns.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect((ns_ip, 53))

while True:
    sock.send(unhexlify(dns_req_msg))
    reply, reply_addr = sock.recvfrom(4096)
    print(f"{datetime.now()} - {reply_addr[0]}")
    sleep(1)
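While the LRP NAT is intact, this script keeps printing the kube-dns service IP; once the problem triggers, it starts printing the node-local-dns pod IP instead (that is the source-IP change visible in the output further below).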
Access to the k8s API server was then blocked from the node with iptables for 5 minutes and restored afterwards:

date;
iptables -I OUTPUT 1 -d 10.0.0.6 -j REJECT;
iptables -I OUTPUT 1 -d 158.160.162.56 -j REJECT;
sleep 300;
date;
iptables -D OUTPUT -d 10.0.0.6 -j REJECT;
iptables -D OUTPUT -d 158.160.162.56 -j REJECT;

The output of the blocking script (time of blocking and time of unblocking):
The output from the python script around the time of the source IP change:
The logs from cilium-agent at the same time:
I can also confirm this in my cluster with kube-proxy replacement set to strict. Happy to assist with logs as well. Restarting the node-local-cache pods appears to fix this, but it's not ideal.
Is there an existing issue for this?
What happened?
We are using NodeLocal DNSCache with a Local Redirect Policy:
Some of our services open a UDP socket to the DNS service at startup and never reopen it. For example:
10.10.23.193 here is the address of the local instance of node-local-dns, but the service connects to 10.11.0.2 - the address of the kube-dns service (as read from /etc/resolv.conf). When tracing the client service we can see that replies come from 10.11.0.2 (as if from kube-dns), so the LRP performs NAT and everything works well:

Until the k8s node temporarily loses network access to the k8s master. After that, the service receives DNS replies not from 10.11.0.2 but from 10.10.23.193 (on the same socket, created to 10.11.0.2). That triggers DNS anti-spoofing in the client library (c-ares), which simply drops replies from the "wrong" address.

On k8s versions prior to 1.27 it was easy to reproduce by temporarily disabling access to the k8s master using iptables. After upgrading to 1.27, a 5-minute outage is no longer enough to reproduce the problem, while a longer one leads to pod relocation. In any case, after the k8s API server becomes unavailable on its own and then comes back, services start to see DNS replies from an unexpected address (the following example is from another pod):
On the Cilium side we can see the following logs:
Cilium Version
Client: 1.12.9 e0bb30a 2023-04-17T23:54:19+02:00 go version go1.18.10 linux/amd64
Daemon: 1.12.9 e0bb30a 2023-04-17T23:54:19+02:00 go version go1.18.10 linux/amd64
Kernel Version
5.4.0-171-generic #189-Ubuntu SMP Fri Jan 5 14:23:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Regression
No response
Sysdump
Error: unknown command "sysdump" for "cilium"
Relevant log output
No response
Anything else?
No response
Cilium Users Document
Code of Conduct