
kube-prometheus: CoreDNSDown #2278

Open · bitva77 opened this issue Jan 9, 2019 · 14 comments

What did you do?

Compiled/Installed kube-prometheus

What did you expect to see?

No alerts

What did you see instead? Under which circumstances?

alert: CoreDNSDown
expr: absent(up{job="kube-dns"} == 1)

Environment

Kubernetes 1.13.0 - installed via kubeadm.

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:57:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

kubeadm

  • Manifests:
cat prometheus-serviceMonitorCoreDNS.json.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: coredns
  name: coredns
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-dns

And the alert rule in question:

- alert: CoreDNSDown
  annotations:
    message: CoreDNS has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-corednsdown
  expr: |
    absent(up{job="kube-dns"} == 1)
  for: 15m
  labels:
    severity: critical

The labels are correct:

kubectl get pods -l k8s-app=kube-dns -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-86c58d9df4-bhdln   1/1     Running   0          19d
coredns-86c58d9df4-pwrts   1/1     Running   0          19d

Not sure what the deal is. Any thoughts?
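
A quick way to confirm that the target is missing entirely (which is what absent() fires on), rather than merely down, is to query Prometheus directly. This is a sketch that assumes the default kube-prometheus setup, where the Prometheus API is reachable through the prometheus-k8s service in the monitoring namespace:

# Forward the Prometheus API locally (prometheus-k8s is the default kube-prometheus service name)
kubectl -n monitoring port-forward svc/prometheus-k8s 9090 &

# An empty result set means no kube-dns target was discovered at all,
# which is exactly the condition absent(up{job="kube-dns"} == 1) alerts on.
curl -sG 'http://localhost:9090/api/v1/query' --data-urlencode 'query=up{job="kube-dns"}'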

@bitva77 (Author) commented Jan 9, 2019

From my investigation, the prometheus-serviceMonitorCoreDNS.json.yaml ServiceMonitor sets spec.endpoints.port to metrics, but I don't see the CoreDNS pods using that name for port 9153/TCP.

And I can't seem to change it manually to 9153.

Does that sound about right?
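
Worth noting: a ServiceMonitor selects Services, and the named port it scrapes has to appear on the kube-dns Service (and therefore its Endpoints), not only on the CoreDNS pods. A quick way to list the named ports on the service:

# List the named ports on the kube-dns service; on an affected cluster this
# prints only "dns dns-tcp" - the "metrics" port is missing.
kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.ports[*].name}'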

@brancz (Member) commented Jan 10, 2019

Yes, that sounds like the crux of it. Maybe make sure that kubeadm is creating CoreDNS as recommended by their deploy repo: https://github.com/coredns/deployment/blob/c5670bf0d2c5c7964a68f1ef7dc20376602bfa2a/kubernetes/coredns.yaml.sed#L173-L175

(we made sure to have that port there 😉 coredns/deployment#112)

@bitva77 (Author) commented Jan 10, 2019

looks like kubeadm configured/applied CoreDNS correctly:

kubectl get deployment coredns -o yaml -n kube-system

        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP

I'll keep poking around...

(By the way, it was great seeing/hearing the presentations at KubeCon this year - thanks for that! :)

@pmcgrath commented Jan 14, 2019

Is it that your ServiceMonitor is configured to scrape a port named metrics, but the kube-dns service does NOT have a port named metrics?
Note: the CoreDNS pods do have the metrics port, but as far as I can tell the service needs to include the named port as well.

The manifest @brancz linked above does include the service metrics port.

kube-dns service ports:
kubectl get svc -n kube-system kube-dns -o json | jq .spec.ports

[
  {
    "name": "dns",
    "port": 53,
    "protocol": "UDP",
    "targetPort": 53
  },
  {
    "name": "dns-tcp",
    "port": 53,
    "protocol": "TCP",
    "targetPort": 53
  }
]

CoreDNS pod ports - let's just use the first pod to illustrate:
kubectl get pods -n kube-system --selector k8s-app=kube-dns -o json | jq .items[0].spec.containers[].ports

[
  {
    "containerPort": 53,
    "name": "dns",
    "protocol": "UDP"
  },
  {
    "containerPort": 53,
    "name": "dns-tcp",
    "protocol": "TCP"
  },
  {
    "containerPort": 9153,
    "name": "metrics",
    "protocol": "TCP"
  }
]

So if kubeadm were to use the service manifest that @brancz linked to, the CoreDNS scrape would succeed and the alert would not fire. I looked at what I think kubeadm uses to add the DNS addon, but it does not seem to be the same manifest.

When I added the metrics port to the kube-dns service, the scrape succeeded. Note that this is a ClusterIP service, so it is not exposed outside the cluster.
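
For anyone hitting the same thing, a one-off way to add the missing port without editing the full manifest is a JSON patch against the existing service (a sketch; port 9153 matches the CoreDNS metrics containerPort shown above):

# Append a named "metrics" port to the existing kube-dns service
kubectl -n kube-system patch service kube-dns --type='json' -p='[
  {"op": "add", "path": "/spec/ports/-",
   "value": {"name": "metrics", "port": 9153, "protocol": "TCP", "targetPort": 9153}}
]'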

@bitva77 (Author) commented Jan 24, 2019

@pmcgrath I OWE YOU BEER. That was it!

I applied this and we're good now:

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  creationTimestamp: null
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: KubeDNS
  name: kube-dns
  selfLink: /api/v1/namespaces/kube-system/services/kube-dns
  namespace: kube-system
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP

Basically just added this:

  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
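
To verify the change is picked up by service discovery, it should be enough to check that the named port now appears in the kube-dns Endpoints object, which is what Prometheus scrapes via the ServiceMonitor:

# Endpoint port names come from the service definition; "metrics" should now be listed,
# and the kube-dns target should show up in Prometheus shortly afterwards.
kubectl -n kube-system get endpoints kube-dns -o jsonpath='{.subsets[*].ports[*].name}'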

@pmcgrath commented Jan 24, 2019

@bitva77 Yes, it works, but it means we have to mutate a service that kubeadm has already configured for us in order to use kube-prometheus, which is not ideal.

We will also have to remember to do the same after any in-place kubeadm upgrade of a cluster - I have not done one yet, but will need to keep track of this and verify when we do.

@brancz (Member) commented Jan 24, 2019

To me the solution sounds like adapting the service kubeadm provisions. That doesn’t sound like a big deal, given the upstream coredns deployments already do this. I’d imagine kubeadm wants to be consistent.

@pmcgrath commented Jan 24, 2019

@brancz Thanks, will do as you say

@pmcgrath commented Jan 24, 2019

@bitva77 +1 on your kubeadm issue

@chiefy commented Feb 11, 2019

FWIW experiencing this using AWS EKS 1.11 - trying to figure out a workaround.

@brancz (Member) commented Feb 18, 2019

This issue is about kubeadm, not EKS. I'm not sure we have an issue for EKS, but your options are either to filter out the alert entirely or, if EKS uses kube-dns, to use the kube-dns alerts instead.
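
If filtering the alert out is the preferred route, kube-prometheus users can drop it in jsonnet before generating manifests. A minimal sketch, assuming the rule lives in the kubernetes-absent group (the group name may differ between mixin versions):

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  // Drop the CoreDNSDown rule from the generated alerting rules.
  prometheusAlerts+:: {
    groups: std.map(
      function(group)
        if group.name == 'kubernetes-absent' then
          group { rules: std.filter(function(rule) rule.alert != 'CoreDNSDown', group.rules) }
        else
          group,
      super.groups
    ),
  },
};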

@VinceMD commented Feb 19, 2019

Just FYI, the metrics port in kube-aws is 10055. So here is my kube-prometheus-kube-aws-coredns.libsonnet:

local k = import 'ksonnet/ksonnet.beta.3/k.libsonnet';
local service = k.core.v1.service;
local servicePort = k.core.v1.service.mixin.spec.portsType;

{
  prometheus+:: {
      kubeDnsPrometheusDiscoveryService:
      service.new('kube-dns-prometheus-discovery', { 'k8s-app': 'kube-dns' }, [servicePort.newNamed('metrics', 10055, 10055)]) +
      service.mixin.metadata.withNamespace('kube-system') +
      service.mixin.metadata.withLabels({ 'k8s-app': 'kube-dns' }) +
      service.mixin.spec.withClusterIp('None'),
  },
}
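
For reference, a snippet like this is typically mixed into the top-level build file before generating manifests; a sketch assuming the standard kube-prometheus example.jsonnet layout:

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus-kube-aws-coredns.libsonnet') +
  { _config+:: { namespace: 'monitoring' } };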

@chiefy commented Feb 21, 2019

@brancz I am aware this issue is about kubeadm, but I am pretty sure that EKS uses kubeadm to stand up the cluster. Anyhow, I patched the kube-dns service to expose the port and now everything works fine.

@karancode commented Jul 29, 2019

I am also facing this issue, running on EKS 1.13.
I also patched the kube-dns service with

  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153

and the alert seems to have stopped triggering.
Just want to confirm: is this okay to run in production environments?
