Description
So here's the issue...
In some clusters, DNS names do not resolve correctly because Alpine mishandles DNS resolution. Alpine is used as the base image for cert-manager.
This is a critical problem, as I'm unable to get cert-manager working with Let's Encrypt in my large Kubernetes cluster.
There is a HUGE chain of issues that describes what's happening. Essentially, Alpine does not resolve DNS queries correctly and returns incorrect answers; the exact behavior depends on whether the provider uses Cloudflare.
I'm unfamiliar with Bazel, but it'd be good to change it from Alpine to Debian here: https://github.com/jetstack/cert-manager/blob/5b8fd9415e324a989e63915f050b2484fb5839ba/WORKSPACE#L72
How to replicate the bug and what happens:
Deploy cert-manager:
helm install \
--name cert-manager \
--namespace kube-system \
--version v0.5.2 \
stable/cert-manager
Now try to do an nslookup within the cert-manager container:
▶ kubectl exec -it cert-manager-5d5bc6cd7f-fw7dx -n kube-system -- /bin/sh
/ $ nslookup letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve
Name: letsencrypt.org
Address 1: 23.23.86.44 ec2-23-23-86-44.compute-1.amazonaws.com
This returns an INCORRECT DNS entry. The reasoning behind this can be found in multiple issues: kubernetes/kubernetes#30215 gliderlabs/docker-alpine#8 https://github.com/JiscRDSS/rdss-arkivum-nextcloud/pull/24 kubernetes/dns#119
Larger projects have also switched from Alpine to Debian because of the sheer number of DNS issues: apache/openwhisk#4052
This is due to Alpine not handling the /etc/resolv.conf file correctly:
/ $ cat /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local net
options ndots:5
/ $
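To make the role of the search line and ndots:5 in the resolv.conf above more concrete, here is a rough sketch of the order in which a resolver would try names. It follows the general behavior described in resolv.conf(5); resolver implementations (glibc, musl) differ in the details, and candidate_names is a hypothetical helper written for illustration, not part of any tool.

```shell
# Hypothetical helper: print the names a resolver would try, in order,
# for a given name, ndots value, and space-separated search list.
# This is a sketch of resolv.conf(5)-style behavior, not an exact model
# of any particular libc.
candidate_names() {
  name="$1"
  ndots="$2"
  search="$3"

  # A trailing dot marks the name as fully qualified: skip the search list.
  case "$name" in
    *.) printf '%s\n' "$name"; return ;;
  esac

  # Count the dots in the name.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}

  # Names with at least ndots dots are tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  fi

  # Each search domain is appended in turn.
  for d in $search; do
    printf '%s.%s\n' "$name" "$d"
  done

  # Names with fewer than ndots dots are tried as-is last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name"
  fi
}

candidate_names letsencrypt.org 5 \
  "kube-system.svc.cluster.local svc.cluster.local cluster.local net"
```

Under these assumptions, letsencrypt.org has only one dot, so letsencrypt.org.net is tried before the bare letsencrypt.org. If that .net name happens to resolve, its answer would be the one returned, which would be consistent with the wrong address seen earlier. Passing the name with a trailing dot (letsencrypt.org.) skips the search list in this sketch.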
After removing "net" (provided by Kubernetes) from /etc/resolv.conf, DNS now resolves correctly:
/ $ nslookup letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve
Name: letsencrypt.org
Address 1: 23.195.219.207 a23-195-219-207.deploy.static.akamaitechnologies.com
Address 2: 2600:140a:0:384::ce0 g2600-140a-0000-0384-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com
Address 3: 2600:140a:0:3b0::ce0 g2600-140a-0000-03b0-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com
/ $ ^C
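For anyone wanting to repeat the edit, removing "net" from the search line could be done roughly like this. drop_search_domain is a hypothetical helper name; it only prints the edited content and does not touch the file, and any change made to /etc/resolv.conf inside a running container should be treated as a temporary diagnostic, not a fix.

```shell
# Hypothetical helper: print a resolv.conf-style file with one domain
# removed from its "search" line. All other lines are passed through
# unchanged, and the input file itself is not modified.
drop_search_domain() {
  domain="$1"
  file="$2"
  awk -v d="$domain" '
    $1 == "search" {
      out = "search"
      # Keep every search domain except the one being dropped.
      for (i = 2; i <= NF; i++) if ($i != d) out = out " " $i
      print out
      next
    }
    { print }
  ' "$file"
}
```

Inside the container, something like drop_search_domain net /etc/resolv.conf would show the result before it is written anywhere.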
I highly suggest changing the base image from Alpine (the current one) to Debian to resolve these DNS issues. At the moment, cert-manager is incompatible with Let's Encrypt in my cluster because DNS names do not resolve correctly with the current Alpine image.
I'd honestly open a PR, but it looks like the Alpine image is being built somewhere else and is pushed to gcr.io.