
Kubernetes: bring kube-dns implementation up-to-date #3373

Merged: 13 commits into master, Jul 28, 2018

Conversation

@jackfrancis (Member) commented Jun 26, 2018

What this PR does / why we need it: Bases the kube-dns implementation on the Kubernetes-recommended base configs.

The primary change is to replace exechealthz with the sidecar container. Additionally, the --no-negcache flag is added to the dnsmasq configuration.
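For illustration, here is roughly what the dnsmasq-nanny container args look like with the new flag; this is only a sketch following the upstream base config layout (the image tag and the nanny flags shown are assumptions, not the exact acs-engine manifest):

- name: dnsmasq
  image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10   # tag assumed from the upstream base config
  args:
  - -v=2
  - -logtostderr
  - -configDir=/etc/k8s/dns/dnsmasq-nanny
  - -restartDnsmasq=true
  - --
  - -k
  - --cache-size=1000
  - --no-negcache        # new flag: stop caching negative (NXDOMAIN) responses
  - --log-facility=-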

As a reference, here are the base configs.

For v1.11:

https://github.com/kubernetes/kubernetes/blob/v1.11.0/cluster/addons/dns/kube-dns/kube-dns.yaml.base

For v1.10:

https://github.com/kubernetes/kubernetes/blob/v1.10.0/cluster/addons/dns/kube-dns.yaml.base

For v1.9:

https://github.com/kubernetes/kubernetes/blob/v1.9.0/cluster/addons/dns/kube-dns.yaml.base

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #3534 fixes #2999

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (i.e. deploy with previous version, upgrade with this branch)

Release note:

use exechealthz v1.3.0 in k8s 1.11

@acs-bot commented Jun 26, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ghost assigned jackfrancis Jun 26, 2018
@ghost added the in progress label Jun 26, 2018
@jackfrancis (Member Author)

@feiskyer @andyzhangx FYI, I'm testing the new exechealthz on v1.11 clusters. Are there any reasons you're aware of to backport this update to earlier cluster versions? (Or any reasons not to move forward with the new exechealthz version?)

@codecov bot commented Jun 26, 2018

Codecov Report

Merging #3373 into master will decrease coverage by 0.04%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #3373      +/-   ##
==========================================
- Coverage   55.49%   55.45%   -0.05%     
==========================================
  Files         105      105              
  Lines       16038    16041       +3     
==========================================
- Hits         8900     8895       -5     
- Misses       6386     6394       +8     
  Partials      752      752

@feiskyer (Member)

@jackfrancis kube-dns has moved to a sidecar container for health checking since kubernetes/kubernetes#38992 (v1.6.0). I suggest we also use it (e.g. k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10) for our cluster.
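For reference, a minimal sketch of that sidecar container as defined in the upstream base config (the probe and liveness settings below are assumptions copied from that base layout, with cluster.local assumed as the cluster domain):

- name: sidecar
  image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
  args:
  - --v=2
  - --logtostderr
  - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
  - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
  ports:
  - containerPort: 10054   # serves /metrics and the probe health endpoints
    name: metrics
    protocol: TCP
  livenessProbe:
    httpGet:
      path: /metrics
      port: 10054
      scheme: HTTP
    initialDelaySeconds: 60
    timeoutSeconds: 5

In the upstream layout, the kubedns and dnsmasq containers then point their liveness probes at this container (paths like /healthcheck/kubedns and /healthcheck/dnsmasq on port 10054) instead of at exechealthz.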

@jackfrancis (Member Author)

@feiskyer thanks for the nudge here! See my changes in the file parts/k8s/addons/kubernetesmasteraddons-kube-dns-deployment.yaml. They aren't working. Does anything obvious jump out at you?

@@ -146,6 +146,8 @@ spec:
- mountPath: /kube-dns-config
name: kube-dns-config
- args:
Member
The healthz container should be removed, as the sidecar container has been added below.

Member
I also deployed https://github.com/kubernetes/kubernetes/blob/v1.11.0/cluster/addons/dns/kube-dns/kube-dns.yaml.base, and it works well with an acs-engine-deployed cluster (v1.11.0).

@jackfrancis changed the title from "[WIP] use new exechealthz w/ k8s 1.11" to "kube-dns updates for 1.11" on Jul 25, 2018
@jackfrancis (Member Author)

@feiskyer Thanks for the continued guidance! I've converted the 1.11 kube-dns implementation here to follow the base kubernetes example. Initial tests suggest that HPA is failing, but everything else in our test surface area checks out.

Does anything look suspicious here that would break HPA?

https://github.com/jackfrancis/acs-engine/blob/1bcf1af22bc7cd446c95766fec93b6f05930d921/parts/k8s/addons/kubernetesmasteraddons-kube-dns-deployment.yaml

@feiskyer (Member)

@jackfrancis Did you mean HPA for other pods, or dns-horizontal-autoscaler for kube-dns? The change doesn't seem related to HPA for other pods.

@jackfrancis (Member Author)

@feiskyer just regular HPA. We test deploying nginx, attaching an HPA config to it, and then hitting it with load.

@feiskyer (Member) commented Jul 26, 2018

Have you checked the HPA events? They should include some hints, e.g.

kubectl describe hpa <hpa-name>

@jackfrancis (Member Author)

@feiskyer so, a follow-up run had different side effects.

I observed the kube-dns pod go from CrashLoopBackOff to Error.

See:

$ kubectl logs kube-dns-7f9df74d5b-6g7g2 -n kube-system -c dnsmasq
I0726 19:42:19.211202       1 main.go:74] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0726 19:42:19.211373       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0726 19:42:19.547769       1 nanny.go:119] 
W0726 19:42:19.547801       1 nanny.go:120] Got EOF from stdout
I0726 19:42:19.547815       1 nanny.go:116] dnsmasq[9]: started, version 2.78 cachesize 1000
I0726 19:42:19.547826       1 nanny.go:116] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0726 19:42:19.547831       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0726 19:42:19.547835       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0726 19:42:19.547945       1 nanny.go:116] dnsmasq[9]: reading /etc/resolv.conf
I0726 19:42:19.547954       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0726 19:42:19.547958       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0726 19:42:19.547961       1 nanny.go:116] dnsmasq[9]: using nameserver 168.63.129.16#53
I0726 19:42:19.548104       1 nanny.go:116] dnsmasq[9]: read /etc/hosts - 7 addresses

and:

$ kubectl logs kube-dns-7f9df74d5b-6g7g2 -n kube-system -c sidecar
I0726 19:40:02.174431       1 main.go:51] Version v1.14.8.3
I0726 19:40:02.174499       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
I0726 19:40:02.174527       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
I0726 19:40:02.174577       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
W0726 19:40:02.174835       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:56325->127.0.0.1:53: read: connection refused

- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
Member
Missing --server=/cluster.local/127.0.0.1#10053 here?

@feiskyer (Member)

It works fine on my cluster after adding --server=/cluster.local/127.0.0.1#10053.
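To make the fix concrete, the dnsmasq args from the log above become roughly the following (a sketch matching the upstream base config):

- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053   # the missing entry: forward cluster.local queries to kubedns on 10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053

Without the cluster.local entry, queries for kubernetes.default.svc.cluster.local are presumably never forwarded to kubedns, so the sidecar's dnsmasq probe fails and the container keeps getting restarted.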

@jackfrancis (Member Author)

That worked @feiskyer, thank you! The changes here are only for 1.11. Do you recommend doing a similar kube-dns conversion for any other k8s versions?

@acs-bot added size/XXL and removed size/XL labels Jul 27, 2018
@jackfrancis (Member Author)

@feiskyer for your review. I audited the k8s codebase and determined that the sidecar implementation has been in place since at least 1.7 (I didn't check earlier versions).

To be conservative, and on the assumption that there are more clusters on 1.8 and below in the wild (1.8 is the default when no version is provided), I have converted 1.9 and above to an implementation that follows the upstream base example.

Thoughts? Any reservations about these changes? Thanks again for your eyes!

@jackfrancis changed the title from "kube-dns updates for 1.11" to "Kubernetes: bring kube-dns implementation up-to-date" on Jul 27, 2018
@feiskyer (Member)

I have converted 1.9 and above to adhere to an implementation that looks like the upstream base example.

That's what I would also suggest, so that we stay consistent with upstream.

Successfully merging this pull request may close these issues.

kube-dns-v20 deployment Pods are unable to resolve DNS for services both internally and externally.