
DNS errors cause global-rate-limit-memcached-connect-timeout to not be respected. #11337

Open
alphabet5 opened this issue May 2, 2024 · 11 comments
Labels: kind/support (Categorizes issue or PR as a support question.) · needs-priority · needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) · triage/needs-information (Indicates an issue needs more information in order to work on it.)

Comments

@alphabet5

alphabet5 commented May 2, 2024

What happened:

We had a unique failure where coredns pods were removed/replaced during an upgrade. ingress-nginx was sending traffic to the old pod IPs, which was getting dropped by cilium (because they were stale IPs not assigned to anything).

The default DNS timeout is 5 retries x 2000ms, so a 10s delay happens even when global-rate-limit-memcached-connect-timeout is set to 50ms.

This is the resulting log message:

2024/04/26 21:45:14 [error] 1043#1043: *24280775 [lua] global_throttle.lua:105: throttle(): error while processing key: failed to connect: memcached.ingress-nginx.svc.cluster.local could not be resolved (110: Operation timed out), ...

What you expected to happen:

1- dns should hit the service and not the pod ips so when they're replaced dns doesn't just timeout. (I haven't figured this one out yet.)

2- the 50ms timeout should be followed even if dns is down.
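
For context, here is a minimal sketch of why expectation #2 fails, assuming the global-rate-limit path connects to memcached with an OpenResty cosocket (an illustration, not the controller's exact code). A cosocket timeout only bounds the TCP connect/send/receive phases; resolving a hostname goes through the nginx core resolver, whose budget is governed by the resolver / resolver_timeout directives and is measured in seconds by default, not milliseconds.

-- Sketch only: how a cosocket timeout interacts with name resolution in
-- OpenResty. The 50ms value and the memcached host are from this issue,
-- not from the controller's source.
local sock = ngx.socket.tcp()
sock:settimeout(50)  -- 50ms: bounds the TCP connect/send/receive phases only

-- Because a hostname is passed, nginx's core resolver is consulted first.
-- That lookup is governed by the resolver / resolver_timeout directives,
-- so a dead or slow DNS server can stall here for seconds before the
-- 50ms connect timeout even starts counting.
local ok, err = sock:connect("memcached.ingress-nginx.svc.cluster.local", 11211)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
end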

For #2, it's easily reproducible:

Ingress values.yaml

controller:
  config:
    global-rate-limit-memcached-host: memcached.ingress-nginx.svc.cluster.local.

All the manifests

---
apiVersion: v1
kind: Service
metadata:
  name: memcached
  namespace: ingress-nginx
spec:
  selector:
    app: memcached
  ports:
    - protocol: TCP
      port: 11211
      targetPort: 11211
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memcached
  template:
    metadata:
      labels:
        app: memcached
    spec:
      containers:
      - name: memcached
        image: docker.io/memcached
        command:
        - memcached
        args:
        - '-m'
        - '64'
        ports:
        - containerPort: 11211
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 11211
          periodSeconds: 10
          failureThreshold: 2
        livenessProbe:
          tcpSocket:
            port: 11211
          periodSeconds: 10
          failureThreshold: 30
        startupProbe:
          tcpSocket:
            port: 11211
          periodSeconds: 1
          failureThreshold: 30
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dummy
  annotations:
    cert-manager.io/issuer: "letsencrypt"
    nginx.ingress.kubernetes.io/global-rate-limit: "1000000"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"
spec:
  ingressClassName: nginx
  rules:
  - host: "dummy.alphabet5.dev"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dummy
            port:
              number: 80
  tls:
  - hosts:
    - dummy.alphabet5.dev
    secretName: dummy-alphabet5-dev-tls
---
apiVersion: v1
kind: Service
metadata:
  name: dummy
spec:
  selector:
    app: dummy
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

block dns for testing

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-dns
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 6379

This results in the following set of log messages

2024/05/02 01:56:41 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:56:41 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:56:46 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:56:51 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:56:56 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:57:01 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:57:06 [error] 138#138: send() failed (111: Connection refused) while resolving, resolver: 10.43.0.10:53
2024/05/02 01:57:11 [error] 138#138: *7868 [lua] global_throttle.lua:105: throttle(): error while processing key: failed to connect: memcached.ingress-nginx.svc.cluster.local could not be resolved (110: Operation timed out), client: 98.97.11.220, server: dummy.alphabet5.dev, request: "GET / HTTP/2.0", host: "dummy.alphabet5.dev"

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
Release: v1.10.1
Build: 4fb5aac
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.3

Kubernetes version (use kubectl version):

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.7+k3s1

(Also on v1.27.8+rke2r1 and v1.28.8+rke2r1)

Environment:

  • Cloud provider or hardware configuration: bare metal, and vms
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 12 (bookworm), Ubuntu 22.04, Ubuntu 20.04
  • Kernel (e.g. uname -a): v5.15? (20.04) and 6.1.0-18-amd64
  • Install tools: rancher for rke2, and reproduced on as close to stock k3s as possible.
@alphabet5 added the kind/bug label on May 2, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage and needs-priority labels on May 2, 2024
@longwuyuan
Contributor

/remove-kind bug
/kind support
/triage needs-information

1- dns should hit the service and not the pod ips so when they're replaced dns doesn't just timeout. (I haven't figured this one out yet.)

  • What was the ipaddress of the coredns pods before and after they crashed / were replaced?
  • What do you mean by dns here? Do you mean dns resolution queries for resolving the backend-app pod ipaddress?
  • What do you mean by service? Is it the backend-app service configured in the ingress resource?
  • What do you mean by pod ips here? Do you mean the pod ips of the backend app?

@k8s-ci-robot added the kind/support and triage/needs-information labels and removed the kind/bug label on May 2, 2024
@alphabet5
Author

The pod doesn't have to crash, any dropped dns queries to resolve the memcached host will cause the configured timeout to not be followed. (Assuming the memcached timeout is < dns retry interval.)

This is an example:

coredns pod ip

~ % kubectl get pod -n kube-system -o wide | grep coredns
rke2-coredns-rke2-coredns-76d7556c69-lhr4h             1/1     Running   0                10m     10.64.2.210

Delete the pod, or rollout restart the deployment, and the new pod will get a new IP.

~ % kubectl delete pod -n kube-system rke2-coredns-rke2-coredns-76d7556c69-lhr4h
pod "rke2-coredns-rke2-coredns-76d7556c69-lhr4h" deleted
~ % kubectl get pod -n kube-system -o wide | grep coredns
rke2-coredns-rke2-coredns-76d7556c69-q7prt             1/1     Running   0                53s     10.64.2.51

By DNS - I mean the DNS query that ingress-nginx uses to resolve the global-rate-limit-memcached-host to an IP address.

By service I mean the coredns service, in this case 10.65.0.10. (10.65/16 is this example cluster's service range; 10.64/16 is the pod CIDR range, with /24s given to each node to be assigned to pods.)

~ % kubectl get svc -n kube-system rke2-coredns-rke2-coredns
NAME                        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
rke2-coredns-rke2-coredns   ClusterIP   10.65.0.10   <none>        53/UDP,53/TCP   61d

Now connections will be kept alive to memcached, but if the memcached deployment also gets an update or pods get rescheduled, ingress-nginx will try to reconnect and query coredns for the IP.

When this happens (only with queries to ingress objects using global-rate-limit), this shows up in the hubble/cilium logs.

ingress-nginx/ingress-nginx-controller-74rb6:53239 (ID:28807) <> 10.64.2.210:53 (world) Stale or unroutable IP DROPPED (UDP)

(Notice how this is to a coredns pod IP and not to the coredns service address.)

Then after the dns query to resolve the global-rate-limit-memcached-host times out, the request to the backend pod happens as normal - so every request is delayed by an extra 10s.
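
For reference, here is a rough sketch of the connect/keepalive pattern a cosocket-based memcached client (e.g. lua-resty-memcached) uses; I'm assuming the global-rate-limit store behaves roughly like this, the controller's actual code may differ. It shows why a rescheduled memcached pod forces a fresh DNS lookup: the pooled connection dies, so the next request falls back to connect(), which resolves the hostname again.

-- Rough sketch of a cosocket-based memcached client with keepalive
-- (lua-resty-memcached API; not necessarily what the controller does).
local memcached = require("resty.memcached")

local memc, err = memcached:new()
if not memc then
    ngx.log(ngx.ERR, "failed to instantiate memcached client: ", err)
    return
end

memc:set_timeout(50)  -- ms; covers connect/send/receive, not DNS resolution

-- If an idle pooled connection exists, this reuses it and no DNS lookup
-- happens. Once the memcached pod is rescheduled that pooled connection is
-- dead, so this resolves the hostname again via the nginx core resolver.
local ok, err = memc:connect("memcached.ingress-nginx.svc.cluster.local", 11211)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    return
end

-- ... get/incr the rate-limit counters here ...

-- Return the connection to the keepalive pool (10s max idle, pool size 100).
local ok, err = memc:set_keepalive(10000, 100)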

I haven't figured out why this happens, and querying for the memcached hostname from within the nginx pod resolves correctly.

~ % kubectl exec --stdin --tty -n ingress-nginx ingress-nginx-controller-74rb6 -- bash
ingress-nginx-controller-74rb6:/etc/nginx$ nslookup memcached.ingress-nginx.svc.cluster.local.
Server:		10.65.0.10
Address:	10.65.0.10:53


Name:	memcached.ingress-nginx.svc.cluster.local
Address: 10.65.7.46

This suggests to me that it's something within ingress-nginx/openresty/lua that is doing something weird with dns, but I haven't figured that part out yet. (It could also be some weird issue with cilium/ebpf.)

None of these specifics really matter, because regardless of the underlying issue, it shouldn't cause a 10s delay to requests when the timeout is configured to be 50ms.
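
One way to narrow down which phase is slow (a hypothetical debugging sketch, not something ingress-nginx ships) would be to time connect-by-hostname versus connect-by-IP from lua inside the controller, e.g. in a temporary location's content_by_lua_block, using the service IP 10.65.7.46 from the nslookup above:

-- Hypothetical debugging snippet: compare connect by hostname (nginx core
-- resolver involved) with connect by IP (no DNS at all).
local function timed_connect(target)
    ngx.update_time()
    local t0 = ngx.now()
    local sock = ngx.socket.tcp()
    sock:settimeout(50)          -- the same 50ms budget as the memcached setting
    local ok, err = sock:connect(target, 11211)
    ngx.update_time()
    return ok and "ok" or err, ngx.now() - t0
end

local res, secs = timed_connect("memcached.ingress-nginx.svc.cluster.local")
ngx.say("by name: ", res, " (", secs, "s)")   -- slow if DNS is the problem

res, secs = timed_connect("10.65.7.46")
ngx.say("by ip:   ", res, " (", secs, "s)")   -- should honor the 50ms timeout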

@longwuyuan
Contributor

Thanks for explaining.

But none of your data is actual live status from the output of commands, so I personally don't see much relevance pointing to a problem in the controller, like the suggestion that you configured a 50ms timeout for something somewhere in the controller and the controller did not honor it.

  • There is dns cache involved
  • There is arp involved
  • I don't know where you configured a 50ms timeout for the ingress-nginx-controller pod to resolve a memcached pod's hostname to an ipaddress
  • And I suspect the controller you have running is not even an image built/released by this project

But these are my opinions. Wait for comments from experts who agree that a user breaking coredns showcases a bug in the ingress-controller image you are running, which may not be an image released by this project.

@longwuyuan
Contributor

longwuyuan commented May 3, 2024

Also, if the coredns pods crashed and were recreated, is there a hint here that the ipaddress of the memcached pod changed?

I hope you get to the bottom of this issue and solve future occurrences of this break in rate-limiting.

@alphabet5
Author

This is using the ingress-nginx helm chart. You can see the output of nginx-ingress-controller --version in the description.

NGINX Ingress controller
Release: v1.10.1
Build: https://github.com/kubernetes/ingress-nginx/commit/4fb5aac1dd3669daa3a14d9de3e3cdb371b4c518
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.3

You can see the helm chart values I used to reproduce this issue in the description. The only thing configured is the memcached host.

controller:
  config:
    global-rate-limit-memcached-host: memcached.ingress-nginx.svc.cluster.local.

The 50ms timeout is the default, as shown here:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#global-rate-limit

global-rate-limit-memcached-connect-timeout: configure timeout for connect, send and receive operations. Unit is millisecond. Defaults to 50ms.

DNS cache behavior is implementation specific; if you have more information on the lua resolver DNS cache implementation, that would be helpful. As indicated by my nslookup from within the nginx controller pod, the pod can resolve the hostname correctly and uses the coredns service address to do so, but based on the traffic captured, the lua code for the global-rate-limit does not.
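
Purely as an illustration of what a lua-side DNS cache could look like (a hypothetical sketch using lua-resty-lrucache and lua-resty-dns, with the coredns service IP from the earlier nslookup output; this is not ingress-nginx's implementation):

-- Hypothetical illustration only: a TTL-honoring DNS cache in lua.
local lrucache = require("resty.lrucache")
local resolver = require("resty.dns.resolver")

local cache = assert(lrucache.new(100))  -- cache up to 100 hostnames

local function cached_resolve(hostname)
    local ip = cache:get(hostname)
    if ip then
        return ip  -- cache hit: no DNS traffic at all
    end

    local r, err = resolver:new{
        nameservers = { "10.65.0.10" },  -- coredns service IP in this cluster
        timeout = 50,                    -- ms per attempt
        retrans = 1,
    }
    if not r then
        return nil, err
    end

    local answers, qerr = r:query(hostname, { qtype = r.TYPE_A })
    if not answers or answers.errcode or not answers[1] then
        return nil, qerr or "no A record"
    end

    -- honor the record's TTL so rescheduled pods are picked up
    cache:set(hostname, answers[1].address, answers[1].ttl)
    return answers[1].address
end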

This does not require blocking traffic to coredns or coredns crashing. Coredns was updated, and updates work by creating a new pod, waiting for it to become ready, updating the service/endpoint slices, and then removing the old pods.

@longwuyuan
Contributor

longwuyuan commented May 3, 2024

Thanks for your comments.

If I were to summarize the problem, it would look like this.

You are reporting that after a crash/recreate of the coredns pods in a cluster, it takes 10s for the ingress-nginx-controller to resolve the memcached workload service to a memcached pod ipaddress, and this 10s is a blind spot for the rate-limit annotation because it can NOT get the counters in those 10 seconds.

You are expecting that it should take only 50ms, after the crash/recreate of the coredns pods in a cluster, for ingress-nginx to resolve & query the memcached workload and get the rate-limit counters for the rate-limit annotation.

Is this correct?

@longwuyuan
Contributor

I deleted the coredns pod in one terminal and ran an nslookup for a service resource in a bash while loop with a 1 second sleep. I don't see even 1 nslookup failure.

@alphabet5
Author

2 issues - issue 1 causes issue 2.

DNS resolution from within the global-rate-limit code-path does not use the service ip, and does not reach coredns if the pod IPs change. This does not apply to nslookup from within the ingress-nginx pod, or to resolving the backend pods for an ingress. (I believe this uses the k8s api instead of dns, but regardless.)

Any dns lookup failure (i.e. timeout, not an immediate NXDOMAIN) will cause traffic through ingress-nginx to be delayed by 10s, even when the timeout configured for the global-rate-limit is set to 50ms.
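
One possible direction for issue 2 (a sketch under assumptions, not a patch: it uses lua-resty-dns with the cluster DNS service IP 10.43.0.10 seen in the resolver logs in the description) is to do the lookup explicitly with a bounded budget and hand the cosocket an IP, so the worst case becomes roughly retrans x timeout for the DNS phase plus the 50ms connect:

-- Sketch: bound the DNS phase explicitly instead of relying on the nginx
-- core resolver's defaults, then connect by IP so the 50ms socket timeout
-- is the only remaining budget. Not the controller's current behavior.
local resolver = require("resty.dns.resolver")

local function connect_memcached(host, port)
    local r, err = resolver:new{
        nameservers = { "10.43.0.10" },  -- cluster DNS service IP from the logs above
        timeout = 50,                    -- ms per attempt
        retrans = 2,                     -- worst case for the DNS phase: 2 x 50ms
    }
    if not r then
        return nil, "resolver init failed: " .. (err or "?")
    end

    local answers, qerr = r:query(host, { qtype = r.TYPE_A })
    if not answers or answers.errcode or not answers[1] then
        return nil, "dns failed: " .. (qerr or (answers and tostring(answers.errcode)) or "no A record")
    end

    local sock = ngx.socket.tcp()
    sock:settimeout(50)                  -- now only covers the TCP connect
    local ok, cerr = sock:connect(answers[1].address, port)
    if not ok then
        return nil, "connect failed: " .. cerr
    end
    return sock
end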

Here's an easier way you can reproduce issue 2. This is showing both the input (commands I run) and the output. I'll put just the commands down below.

user@computer ~ % kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/
user@computer ~ % helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.config.global-rate-limit-memcached-host=asdf.blackhole.alphabet5.dev
Release "ingress-nginx" does not exist. Installing it now.
NAME: ingress-nginx
LAST DEPLOYED: Fri May  3 12:21:37 2024
NAMESPACE: ingress-nginx
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the load balancer IP to be available.
You can watch the status by running 'kubectl get service --namespace ingress-nginx ingress-nginx-controller --output wide --watch'

An example Ingress that makes use of the controller:
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: example
    namespace: foo
  spec:
    ingressClassName: nginx
    rules:
      - host: www.example.com
        http:
          paths:
            - pathType: Prefix
              backend:
                service:
                  name: exampleService
                  port:
                    number: 80
              path: /
    # This section is only required if TLS is to be enabled for the Ingress
    tls:
      - hosts:
        - www.example.com
        secretName: example-tls

If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:

  apiVersion: v1
  kind: Secret
  metadata:
    name: example-tls
    namespace: foo
  data:
    tls.crt: <base64 encoded cert>
    tls.key: <base64 encoded key>
  type: kubernetes.io/tls
user@computer ~ % cat <<EOF > manifests.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
spec:
  selector:
    matchLabels:
      app: dummy
  replicas: 1
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
      - name: podinfo
        image: stefanprodan/podinfo
        ports:
        - containerPort: 9898
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dummy
  annotations:
    nginx.ingress.kubernetes.io/global-rate-limit: "1000000"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"
spec:
  ingressClassName: nginx
  rules:
  - host: "localhost"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dummy
            port:
              number: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: dummy
spec:
  selector:
    app: dummy
  ports:
    - protocol: TCP
      port: 9898
      targetPort: 9898
EOF
user@computer ~ % kubectl apply -f manifests.yaml
deployment.apps/dummy created
ingress.networking.k8s.io/dummy created
service/dummy created
user@computer ~ % sudo kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 80:80
user@computer ~ % Forwarding from 127.0.0.1:80 -> 80
Forwarding from [::1]:80 -> 80

user@computer ~ % time curl http://localhost
Handling connection for 80
user@computer ~ % time curl http://localhost
Handling connection for 80
{
  "hostname": "dummy-749c8b54cd-js75j",
  "version": "6.6.2",
  "revision": "8d010c498e79f499d1b37480507ca1ffb81a3bf7",
  "color": "#34577c",
  "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
  "message": "greetings from podinfo v6.6.2",
  "goos": "linux",
  "goarch": "arm64",
  "runtime": "go1.22.2",
  "num_goroutine": "6",
  "num_cpu": "4"
}curl http://localhost  0.00s user 0.00s system 0% cpu 6.028 total
Just the commands:

kind create cluster
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.config.global-rate-limit-memcached-host=asdf.blackhole.alphabet5.dev
cat <<EOF > manifests.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
spec:
  selector:
    matchLabels:
      app: dummy
  replicas: 1
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
      - name: podinfo
        image: stefanprodan/podinfo
        ports:
        - containerPort: 9898
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dummy
  annotations:
    nginx.ingress.kubernetes.io/global-rate-limit: "1000000"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"
spec:
  ingressClassName: nginx
  rules:
  - host: "localhost"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dummy
            port:
              number: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: dummy
spec:
  selector:
    app: dummy
  ports:
    - protocol: TCP
      port: 9898
      targetPort: 9898
EOF


sudo kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 80:80
time curl http://localhost

If you modify the memcached host to something that doesn't exist, the timeout works as expected (i.e. when the failure is not DNS related and is the socket connect).

@alphabet5
Author

For this last reproduction, the time is not exactly 10s because the SERVFAIL time will vary based on your upstream dns servers. If lua can't connect to your dns server, it will always be 10s. (Because it's configured for 5x retries, 2s timeout.)

time nslookup asdf.blackhole.alphabet5.dev 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

** server can't find asdf.blackhole.alphabet5.dev: SERVFAIL


real	0m4.079s
user	0m0.005s
sys	0m0.005s
time nslookup asdf.blackhole.alphabet5.dev 8.8.8.8
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; no servers could be reached



real	0m15.131s
user	0m0.004s
sys	0m0.008s

@longwuyuan
Contributor

Hi @alphabet5, thank you very much for the updated info you provided. Appreciate the time taken to give details.

These 2 messages from you, highlighted below, seem to be the key to the issue you are reporting.

DNS resolution from within the global-rate-limit code-path does not use the service ip, and does not reach coredns if the pod IPs change.

and this message

This does not apply to nslookup from within the ingress-nginx pod, or to resolving the backend pods for an ingress. (I believe this uses the k8s api instead of dns, but regardless.)

I think it will help if you copy/paste the links to the code-snippets you are referring to.

Meanwhile I will try to look it up and see if I can understand the details of the 2 messages highlighted above.
