
AKS clusters are impacted by kube dns issue #56903 #632

Closed
invidious9000 opened this issue Aug 31, 2018 · 28 comments

Comments

@invidious9000

commented Aug 31, 2018

What happened:
Out-of-the-box AKS clusters that are fully standalone on a subscription have intermittent DNS faults as described in kubernetes #56903. This breaks tooling that has short timeouts and significantly impacts application performance.

What you expected to happen:
DNS lookups and egress traffic to complete in a timely manner with default settings

How to reproduce it (as minimally and precisely as possible):
Create a fresh AKS cluster using default settings in any region, then run a busybox or ubuntu shell inside the cluster using something like this:
kubectl run my-shell --rm -i --tty --image ubuntu -- bash

Once that shell launches, run something like this until a 5s delay is observed:
apt update; apt install -y curl
for i in `seq 1 50`; do time curl -s google.com > /dev/null; done

Anything else we need to know?:
acs-engine clusters are also impacted; the issue seems to span multiple versions of Kubernetes.
GKE does not seem to be impacted, but we're still digging into how they've worked around it to see whether we can mimic their solution with ARM templates.

These are high quality write-ups on the issue:
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02

Environment:

  • Kubernetes version (use kubectl version):
    1.11.2 but others are also impacted, we have not assessed full scope of impact
  • Size of cluster (how many worker nodes are in the cluster?)
    3
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
    None, empty cluster except for busybox/ubuntu shell to sample DNS issue
@segfault


commented Sep 4, 2018

👍 I've been running into this a lot on my acs-engine-generated clusters. I've also run Rancher-generated clusters and hit the same issue. I do not see the same behavior outside of Azure.

@zyrill


commented Sep 13, 2018

Just an FYI if anybody is looking for how to work around this issue: we've decided to port our containers to Debian, which fixes the issue. Another workaround we tested was installing unbound, which also works, but obviously caching is not what you really want in a dynamic cluster environment. You also don't want multiple processes running in your containers, but... meh.

Don't even try to fix AKS or acs-engine and save yourself the pain; let's just wait for the kernel patches to land. If you're running acs you can install Calico to fix this issue, as Calico doesn't require iptables for DNS resolution.

@posix4e


commented Sep 25, 2018

We fixed this by switching DNS to TCP.
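For anyone wanting to try the same approach, a minimal sketch (not the exact config from this comment) is setting the glibc resolver option use-vc through the pod's dnsConfig; note that musl-based images such as Alpine ignore this option:

dnsConfig:
  options:
    - name: use-vc   # glibc resolver option: send DNS queries over TCP instead of UDP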

@bhicks329


commented Sep 26, 2018

Just be careful switching to TCP for your DNS. We have built 4 new AKS clusters in the last 24 hours, and none of them seem to be able to connect to the internal Azure DNS forwarders via TCP. We're having to update our VNet DNS to a custom resolver (Cloudflare / Google).

@zyrill


commented Sep 26, 2018

Also, does this mean that, counter-intuitively, the kind of image you're running inside a container matters? I.e. does Alpine still not work with this "fix"?

@bhicks329


commented Sep 26, 2018

It does. The issue with using Alpine is that the musl library doesn't support the single-request or TCP options for DNS requests.

@moomzni


commented Sep 26, 2018

We also encountered this issue and have identified a fix by adding the single-request-reopen config value to our DNS policy.

We had been using a curl command such as the one below within a loop to test for the presence of the issue, and observed DNS lookup delays (consistently every 5-10 calls):

curl -k -o /dev/null -s -w "DNS-Lookup [%{time_namelookup}] Time-Connect [%{time_connect}] Time-PreTransfer [%{time_pretransfer}] Time-StartTransfer [%{time_starttransfer}] Total-Time [%{time_total}] Response-Code [%{http_code}]\n" https://www.microsoft.com

Our solution to this was adding the single-request-reopen flag to /etc/resolv.conf, which seemed to solve the issue; we experimented with TCP, but that seemed to incur a consistent DNS latency of around 1s.

We have added pod DNS policies such as the one below to resolve this in our setup (which in turn modifies the /etc/resolv.conf within our containers):

dnsConfig:
  options:
    - name: single-request-reopen
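
For context, a minimal complete pod spec showing where the dnsConfig block sits (the image and names here are illustrative, not the exact workload from this comment):

apiVersion: v1
kind: Pod
metadata:
  name: dns-test   # illustrative name
spec:
  containers:
  - name: shell
    image: debian:stretch-slim   # any glibc-based image
    command: ["sleep", "3600"]
  dnsConfig:
    options:
    - name: single-request-reopen   # glibc option: reopen the socket when replies to parallel A/AAAA queries get lost
  dnsPolicy: ClusterFirst
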
@moomzni


commented Sep 26, 2018

I should mention this is likely to be the underlying cause...
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

@jskulavik


commented Sep 26, 2018

We can confirm this is indeed an issue and that the above resolution is a confirmed workaround.

@bhicks329


commented Sep 26, 2018

It is the issue, and the above is a workaround, not the final fix.

To say this is the fix is to say AKS can’t support Alpine.

@jskulavik


commented Sep 26, 2018

Thank you for the correction, @bhicks329. You are correct, it is not a final fix but a workaround. We're seeing this on multiple distros, not just Alpine.

@Nilubkal


commented Oct 15, 2018

Hi guys, I've been struggling with the same issue. Some of our images are built upon Alpine, and adding single-request-reopen doesn't solve the delays. Were you able to make it work on an Alpine image?

@bhicks329


commented Oct 15, 2018

@Nilubkal - Alpine doesn't support those options. You will have to switch to a non-musl image like Debian for now.

@Nilubkal


commented Oct 15, 2018

@bhicks329 Thanks!
I'll go for some of the "slim" containers (jessie-slim).

@jleechp-occm


commented Nov 9, 2018

Any timeframe for this being fixed?

Ran into this today on a cluster running 1.11.3. The dnsConfig workaround didn't change the results, and debian:7.11-slim shows the error as well, so switching to Debian doesn't seem to be a fix currently.

We don't have the issue on a 1.9.9 cluster (but it does have routing issues due to using the kubenet plugin rather than Azure CNI, so we can't use it for any scaled-out loads).

@Urik


commented Nov 9, 2018

@jleechp-occm are you sure you applied the dnsConfig fix and migrated the Docker image to slim on the Pod making the requests, rather than the one receiving them?
I applied the workaround yesterday to a Pod of mine running on AKS West US 2, K8s version 1.11.3 as well, and it fixed the issue.
Make sure your Pod is also pulling the new debian:7.11-slim image; review its imagePullPolicy.

@Nilubkal


commented Nov 9, 2018

@jleechp-occm Yep, @Urik is right; I can confirm that's the way we solved it with our images, which were Alpine-based.

@jleechp-occm


commented Nov 9, 2018

@Nilubkal, @Urik, still getting those results; I destroyed the replica set to make sure it was pulling the correct config (just in case I had somehow made changes and they didn't take effect).

Edit: We're on East US and using Advanced Networking (set up using Terraform with the Azure network plugin). The DNS issues also impact querying DNS for records on our peered VNets.


Deployment Config (kubectl run debian and then edited in the dnsConfig):

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
  creationTimestamp: 2018-11-09T15:15:15Z
  generation: 4
  labels:
    run: debian
  name: debian
  namespace: default
  resourceVersion: "547077"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/debian
  uid: 42fa737c-e432-11e8-8369-d2e233eae55b
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: debian
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: debian
    spec:
      containers:
      - args:
        - bash
        image: debian:7.11-slim
        imagePullPolicy: Always
        name: debian
        resources: {}
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
      dnsConfig:
        options:
        - name: single-request-reopen
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-11-09T15:15:15Z
    lastUpdateTime: 2018-11-09T15:15:15Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-11-09T15:15:15Z
    lastUpdateTime: 2018-11-09T15:23:49Z
    message: ReplicaSet "debian-7f47487679" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 4
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

resolv.conf

root@debian-7f47487679-n4lpn:/# cat /etc/resolv.conf
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local 5x3tq5lz12au5atnsatgdwdc3h.bx.internal.cloudapp.net
options ndots:5 single-request-reopen

Curl results:

root@debian-7f47487679-n4lpn:/# for i in {1..10}; do curl -k -o /dev/null -s -w "DNS-Lookup [%{time_namelookup}] Time-Connect [%{time_connect}] Time-PreTransfer [%{time_pretransfer}] Time-StartTransfer [%{time_starttransfer}] Total-Time [%{time_total}] Response-Code [%{http_code}]\n" https://www.microsoft.com; done
DNS-Lookup [10.051] Time-Connect [10.054] Time-PreTransfer [10.062] Time-StartTransfer [10.073] Total-Time [10.073] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.013] Time-PreTransfer [0.021] Time-StartTransfer [0.033] Total-Time [0.033] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.012] Time-PreTransfer [0.021] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.011] Time-Connect [0.014] Time-PreTransfer [0.024] Time-StartTransfer [0.036] Total-Time [0.036] Response-Code [200]
DNS-Lookup [0.011] Time-Connect [0.014] Time-PreTransfer [0.022] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.006] Time-Connect [0.009] Time-PreTransfer [0.018] Time-StartTransfer [0.031] Total-Time [0.031] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.012] Time-PreTransfer [0.022] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.013] Time-PreTransfer [0.030] Time-StartTransfer [0.043] Total-Time [0.043] Response-Code [200]
DNS-Lookup [0.058] Time-Connect [0.061] Time-PreTransfer [0.070] Time-StartTransfer [0.082] Total-Time [0.082] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.012] Time-PreTransfer [0.022] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
root@debian-7f47487679-n4lpn:/# for i in {1..10}; do curl -k -o /dev/null -s -w "DNS-Lookup [%{time_namelookup}] Time-Connect [%{time_connect}] Time-PreTransfer [%{time_pretransfer}] Time-StartTransfer [%{time_starttransfer}] Total-Time [%{time_total}] Response-Code [%{http_code}]\n" https://www.microsoft.com; done
DNS-Lookup [8.043] Time-Connect [8.045] Time-PreTransfer [8.054] Time-StartTransfer [8.067] Total-Time [8.067] Response-Code [200]
DNS-Lookup [0.008] Time-Connect [0.010] Time-PreTransfer [0.022] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.013] Time-PreTransfer [0.023] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.008] Time-Connect [0.011] Time-PreTransfer [0.021] Time-StartTransfer [0.034] Total-Time [0.035] Response-Code [200]
DNS-Lookup [0.007] Time-Connect [0.009] Time-PreTransfer [0.018] Time-StartTransfer [0.032] Total-Time [0.033] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.008] Time-PreTransfer [0.019] Time-StartTransfer [0.033] Total-Time [0.033] Response-Code [200]
DNS-Lookup [0.007] Time-Connect [0.010] Time-PreTransfer [0.020] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.008] Time-PreTransfer [0.017] Time-StartTransfer [0.029] Total-Time [0.029] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.007] Time-PreTransfer [0.016] Time-StartTransfer [0.028] Total-Time [0.028] Response-Code [200]
DNS-Lookup [0.006] Time-Connect [0.010] Time-PreTransfer [0.020] Time-StartTransfer [0.034] Total-Time [0.034] Response-Code [200]
@Urik


commented Nov 9, 2018

Hmmm, I just tried running your deployment, opening a bash shell in the container, and running the curl command, and had no abnormal results at all.

DNS-Lookup [0.016] Time-Connect [0.021] Time-PreTransfer [0.034] Time-StartTransfer [0.046] Total-Time [0.046] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.015] Time-PreTransfer [0.028] Time-StartTransfer [0.042] Total-Time [0.042] Response-Code [200]
DNS-Lookup [0.014] Time-Connect [0.019] Time-PreTransfer [0.033] Time-StartTransfer [0.048] Total-Time [0.048] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.027] Time-StartTransfer [0.040] Total-Time [0.040] Response-Code [200]
DNS-Lookup [0.011] Time-Connect [0.016] Time-PreTransfer [0.029] Time-StartTransfer [0.046] Total-Time [0.046] Response-Code [200]
DNS-Lookup [0.011] Time-Connect [0.016] Time-PreTransfer [0.029] Time-StartTransfer [0.045] Total-Time [0.045] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.016] Time-PreTransfer [0.030] Time-StartTransfer [0.044] Total-Time [0.044] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.015] Time-PreTransfer [0.027] Time-StartTransfer [0.040] Total-Time [0.040] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.015] Time-PreTransfer [0.029] Time-StartTransfer [0.042] Total-Time [0.042] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.015] Time-PreTransfer [0.027] Time-StartTransfer [0.038] Total-Time [0.039] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.031] Time-StartTransfer [0.044] Total-Time [0.044] Response-Code [200]
DNS-Lookup [0.012] Time-Connect [0.017] Time-PreTransfer [0.030] Time-StartTransfer [0.044] Total-Time [0.044] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.015] Time-PreTransfer [0.031] Time-StartTransfer [0.044] Total-Time [0.044] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.015] Time-PreTransfer [0.027] Time-StartTransfer [0.039] Total-Time [0.039] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.015] Time-PreTransfer [0.029] Time-StartTransfer [0.042] Total-Time [0.042] Response-Code [200]
DNS-Lookup [0.010] Time-Connect [0.015] Time-PreTransfer [0.028] Time-StartTransfer [0.042] Total-Time [0.042] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.027] Time-StartTransfer [0.041] Total-Time [0.041] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.027] Time-StartTransfer [0.040] Total-Time [0.040] Response-Code [200]
DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.027] Time-StartTransfer [0.042] Total-Time [0.042] Response-Code [200]
root@debian-7f47487679-c92vk:/#

Then I commented out the dnsConfig object, deleted and relaunched the deployment, and these are my results:

DNS-Lookup [0.009] Time-Connect [0.014] Time-PreTransfer [0.036] Time-StartTransfer [0.055] Total-Time [0.055] Response-Code [200]
DNS-Lookup [5.017] Time-Connect [5.022] Time-PreTransfer [5.037] Time-StartTransfer [5.050] Total-Time [5.050] Response-Code [200]
DNS-Lookup [0.007] Time-Connect [0.012] Time-PreTransfer [0.026] Time-StartTransfer [0.039] Total-Time [0.039] Response-Code [200]
DNS-Lookup [0.008] Time-Connect [0.013] Time-PreTransfer [0.027] Time-StartTransfer [0.039] Total-Time [0.039] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.029] Time-StartTransfer [0.041] Total-Time [0.041] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.023] Time-StartTransfer [0.037] Total-Time [0.037] Response-Code [200]
DNS-Lookup [5.015] Time-Connect [5.020] Time-PreTransfer [5.037] Time-StartTransfer [5.049] Total-Time [5.049] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.023] Time-StartTransfer [0.035] Total-Time [0.035] Response-Code [200]
DNS-Lookup [5.015] Time-Connect [5.020] Time-PreTransfer [5.033] Time-StartTransfer [5.048] Total-Time [5.048] Response-Code [200]
DNS-Lookup [5.016] Time-Connect [5.021] Time-PreTransfer [5.034] Time-StartTransfer [5.048] Total-Time [5.048] Response-Code [200]
DNS-Lookup [0.006] Time-Connect [0.011] Time-PreTransfer [0.028] Time-StartTransfer [0.043] Total-Time [0.043] Response-Code [200]
DNS-Lookup [5.018] Time-Connect [5.023] Time-PreTransfer [5.039] Time-StartTransfer [5.051] Total-Time [5.051] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.030] Time-StartTransfer [0.043] Total-Time [0.043] Response-Code [200]
DNS-Lookup [5.012] Time-Connect [5.017] Time-PreTransfer [5.030] Time-StartTransfer [5.045] Total-Time [5.045] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.023] Time-StartTransfer [0.037] Total-Time [0.037] Response-Code [200]
DNS-Lookup [0.005] Time-Connect [0.010] Time-PreTransfer [0.023] Time-StartTransfer [0.037] Total-Time [0.037] Response-Code [200]
DNS-Lookup [0.013] Time-Connect [0.018] Time-PreTransfer [0.032] Time-StartTransfer [0.046] Total-Time [0.046] Response-Code [200]
DNS-Lookup [0.006] Time-Connect [0.011] Time-PreTransfer [0.024] Time-StartTransfer [0.039] Total-Time [0.039] Response-Code [200]
DNS-Lookup [5.010] Time-Connect [5.016] Time-PreTransfer [5.030] Time-StartTransfer [5.043] Total-Time [5.043] Response-Code [200]

What region is your cluster on? I'll try executing it there.

@Urik


commented Nov 9, 2018

Tried it in East US, same results: consistent 5-second lookup times without dnsConfig, no delays with dnsConfig set to single-request-reopen.

@MaurGi

Member

commented Nov 14, 2018

We just encountered this on debian:jessie; our symptom was a 10x performance degradation on our Apache Storm clusters.
We worked around this with single-request-reopen in resolv.conf, as described above.

@jackfrancis

Member

commented Nov 20, 2018

As all AKS regions now deliver clusters backed by VMs running the 4.15.0-1030-azure Ubuntu kernel (which, in tl;dr terms, "fixes the 5 second lookups for alpine-based images"), could we get any reports from folks running 4.15.0-1030-azure? Do we still have issues on that kernel?

For folks running pre-existing clusters (the kernel patch was released just last week), an upgrade or scale operation will be required to get that kernel version onto your nodes, FYI.
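
If it helps, one quick way to check which kernel your nodes are actually running (the KERNEL-VERSION column should show 4.15.0-1030-azure or newer once the upgrade/scale has gone through):

kubectl get nodes -o wide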

@evillgenius75


commented Nov 20, 2018

Can we remove the workaround that was provided earlier in this thread?

Our solution to this was adding the single-request-reopen flag to /etc/resolv.conf

@joekohlsdorf


commented Nov 21, 2018

@jackfrancis Can you explain what was done in this image?

Only 1 of the 3 possible races has been fixed in the kernel.

@jackfrancis

Member

commented Nov 21, 2018

@joekohlsdorf According to the author of the kernel fix, he has submitted two fixes, one of which has been accepted and is present in that kernel version; the other is still in review. See the "Kernel Fix" section here:

https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

@ysijason


commented Nov 24, 2018

Upgraded AKS cluster from 1.11.3 to 1.11.4 and it seems to have solved the DNS problem.

@seanmck seanmck closed this Dec 6, 2018

@shaikatz


commented Feb 13, 2019

@jackfrancis, the second fix was accepted; any way to get it into the next VHD?

@jackfrancis

Member

commented Feb 13, 2019

Hi @shaikatz, which component received the fix above?
