
Nslookup does not work in latest busybox image #48

Closed
krishshenoy opened this issue Jul 17, 2018 · 22 comments

@krishshenoy

commented Jul 17, 2018

I deployed a pod to Kubernetes using the latest version of the busybox image.
After the pod was successfully deployed, I tried to run:

kubectl exec busybox nslookup kubernetes.default

The nslookup command no longer works.

shenoyk-m01:image-pipeline shenoyk$ kubectl exec busybox nslookup kubernetes.default
Server: 10.0.0.10
Address: 10.0.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

The same command works when specifying busybox:1.28 for the image; nslookup only started failing with the latest version.

busybox.yaml is below.

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: Always
    name: busybox
  restartPolicy: Always
@wglambert

commented Jul 17, 2018

This seems to be a Kubernetes configuration issue; I'm not able to reproduce it with standalone Docker:

$ docker run --rm -dit --name busybox busybox:latest
$ docker exec -it busybox sh

# ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=53 time=14.993 ms
64 bytes from 172.217.11.174: seq=1 ttl=53 time=14.598 ms
64 bytes from 172.217.11.174: seq=2 ttl=53 time=14.039 ms
^C
# nslookup github.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      github.com
Address 1: 192.30.255.112 lb-192-30-255-112-sea.github.com
Address 2: 192.30.255.113 lb-192-30-255-113-sea.github.com
# nslookup google.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      google.com
Address 1: 2607:f8b0:4007:804::200e lax28s15-in-x0e.1e100.net
Address 2: 216.58.219.14 lax17s03-in-f14.1e100.net

Kubernetes with hostNetwork: true

$ kubectl exec busybox-7cc555b5d6-2mmcr ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=54 time=13.444 ms
64 bytes from 172.217.11.174: seq=1 ttl=54 time=14.249 ms
64 bytes from 172.217.11.174: seq=2 ttl=54 time=20.149 ms
^C

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.11.174

*** Can't find google.com: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default
Server:         127.0.0.53
Address:        127.0.0.53:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

This seems to be the most relevant issue I found kubernetes/kubernetes#33798

@wglambert added the question label Jul 17, 2018

@tianon

commented Jul 17, 2018

This reminds me of the fun we had back in #9, but that doesn't seem related. 😞

@krishshenoy

commented Jul 17, 2018

I have a Kubernetes cluster monitoring test that continually deploys a busybox pod in a cluster and verifies DNS resolution within the pod by running nslookup via kubectl exec. It started failing right when I pulled the latest busybox image. With the previous 1.28 version of the image, nslookup works. All signs point to a change in this latest version causing the failure.

@tianon

commented Jul 17, 2018

Unfortunately, that only narrows it down to somewhere in the sea of 438 files changed, 9453 insertions(+), 4480 deletions(-) (from 1_28_4 to 1_29_1 in the Git tags of the two releases).

@tianon

commented Jul 17, 2018

Something in here seems most likely:

$ git log --oneline 1_28_4...1_29_1 -- networking/nslookup.c
2f7738e47 nslookup: placate "warning: unused variable i"
c72499584 nslookup: simplify make_ptr
71e4b3f48 nslookup: get rid of query::rlen field
58e43a4c4 nslookup: move array of queries to "globals"
4b6091f92 nslookup: accept lowercase -type=soa, document query types
6cdc3195a nslookup: change -stats to -debug (it's a bug in bind that it accepts -s)
d4461ef9f nslookup: rework option parsing
a980109c6 nslookup: smaller qtypes[] array
2cf75b3c8 nslookup: process replies immediately, do not store them
4e73c0f65 nslookup: fix output corruption for "nslookup 1.2.3.4"
cf950cd3e nslookup: more closely resemble output format of bind-utils-9.11.3
71e016d80 nslookup: shrink send_queries()
db93b21ec nslookup: use xmalloc_sockaddr2dotted() instead of homegrown function
55bc8e882 nslookup: usee bbox network functions instead of opne-coded mess
0dd3be8c0 nslookup: add openwrt / lede version
@djsly

commented Jul 18, 2018

/label bug
We are having the same issue: 1.27/1.28 are working, 1.29/1.29.1 are not.

kubectl run --attach busybox --rm --image=busybox:1.27 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local
kubectl run --attach busybox --rm --image=busybox:1.28 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local
 kubectl run --attach busybox --rm --image=busybox:1.29 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer
 kubectl run --attach busybox --rm --image=busybox:1.29.1 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.


Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer
@tokiwinter

commented Jul 18, 2018

Same issue here. Reverting to 1.28 fixed the issue for me.

@tianon

commented Jul 20, 2018

How does this relate to #27? Are they the same issue?

@tianon

commented Jul 20, 2018

From what I can tell, the new resolver in BusyBox's nslookup doesn't support DNS search domains at all, which seems like a pretty hefty regression.
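
To illustrate what "search domain" support means here, the expansion a search-aware resolver performs can be sketched in shell. This is a simplified sketch of glibc-style behavior, not BusyBox's actual code; `expand_name` is a hypothetical helper:

```shell
# Simplified sketch (assumption: glibc-style search-list behavior):
# a short name (fewer dots than ndots) is tried against each search
# domain before being tried as given.
expand_name() {
  name="$1"; shift
  ndots=5
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -lt "$ndots" ]; then
    # Below the ndots threshold: try each search domain first...
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
  fi
  # ...then the name as given.
  printf '%s\n' "$name"
}

expand_name kubernetes.default default.svc.cluster.local svc.cluster.local cluster.local
```

If the 1.29 resolver skips this expansion entirely, only the literal `kubernetes.default` is ever queried, which would explain the NXDOMAIN above.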

@krishshenoy

commented Jul 21, 2018

Thanks @tianon. How will this be addressed?


@hickeng

commented Jul 26, 2018

@djsly Try using "sleep 4 && nslookup -type=a kubernetes.default"

I've added my findings here: https://bugs.busybox.net/show_bug.cgi?id=11161#c4

@piersharding

commented Aug 8, 2018

As a suggestion, would it be possible to revert the :latest tag to point to 1.28.x until the upstream issue is resolved?

@krzysztofp

commented Aug 9, 2018

See this issue:
kubernetes/kubernetes#66924

@tianon

commented Aug 9, 2018

Given that the upstream change was intentional and is a reflection of upstream, I'm not comfortable changing latest back to 1.28 (especially given that 1.29 is considered "stable" by upstream) -- I'd recommend instead pinning usage to busybox:1.28 (or more specifically, busybox:1.28-variant) for now until the updated functionality which resolves this issue is implemented upstream. (Pinning to a particular release or release series of dependencies is generally good advice anyhow, and it looks like Busybox upstream might intend to get more aggressive about changes in the future, so it seems more prudent than ever.)
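
Applied to the pod spec from the original report, the pinning advice would look something like this (a sketch, not a complete manifest):

```yaml
# Sketch: pin the busybox tag instead of relying on :latest.
spec:
  containers:
  - name: busybox
    image: busybox:1.28   # pinned release series; avoids picking up the 1.29 nslookup change
    command: ["sleep", "3600"]
```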

@krzysztofp

commented Aug 9, 2018

For some people it’s still difficult to admit a mistake. Being aggressive and brave with new changes is one thing; breaking stuff that worked before is another, especially these days when a lot of people use “:latest” by default. Introducing a backwards-compatibility break and calling it intentional is just far from wise.

Please read more about semantic versioning as well.

@piersharding

commented Aug 9, 2018

Hi @tianon - I can understand that you don't want to have a regression on :latest, but there is a surprising amount of fallout from this simple issue because so many people and documentation out there use busybox:latest as the "Hello, World" example. Temporarily changing the tag would help mitigate that pain and these unintended consequences.

Cheers,
Piers.

jonashackt added a commit to jonashackt/kubernetes-the-hard-way that referenced this issue Sep 4, 2018

Fixing DNS resolution with latest busybox
Version `1.28.4` of busybox does the `nslookup` correctly as described in the tutorial; the `latest` tag does not, so the version needs to be set explicitly. Fixes kelseyhightower#356. Also see docker-library/busybox#48.
@tianon

commented Sep 4, 2018

Given that this issue is an upstream issue (not something we've introduced), that it is appropriately filed at https://bugs.busybox.net/show_bug.cgi?id=11161, and apparently will be fixed in the next release (https://git.busybox.net/busybox/commit/?id=9408978a438ac6c3becb2216d663216d27b59eab), I'm going to close.

It would appear that Kubernetes has adjusted to use busybox:1.28 explicitly in the meantime (kubernetes/website#9901), which is the simplest workaround for folks affected by this upstream change.

@voelzmo

commented Mar 11, 2019

According to the 1.30 release notes, the patch for https://bugs.busybox.net/show_bug.cgi?id=11161 is in there – however, I still had to pin my image to 1.28 in order to execute a simple lookup:

$ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm nslookup web-0.nginx
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web-0.nginx
Address 1: 172.17.0.2 web-0.nginx.default.svc.cluster.local
pod "dns-test" deleted

Whereas :latest aka :1.30.1 gave me this:

$ kubectl run -i --tty --image busybox:1.30.1 dns-test --restart=Never --rm nslookup web-0.nginx
If you don't see a command prompt, try pressing enter.

*** Can't find web-0.nginx: No answer

pod "dns-test" deleted

This is just using minikube and an nginx statefulset from https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset

I'm not sure if I'm missing something here, but is this issue really solved?

agolomoodysaada added a commit to agolomoodysaada/blackbox_exporter that referenced this issue May 22, 2019

medyagh added a commit to kubernetes/minikube that referenced this issue Jun 14, 2019

@Simon3

commented Jun 17, 2019

Hi, after months of using busybox in Kubernetes with no problem, today I've just got something that seems to be the same NXDOMAIN bug as reported in this thread:

/ # nslookup kubernetes.default
Server:		10.0.0.10
Address:	10.0.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

/ # echo $?
1

But this works:

/ # nslookup kubernetes.default.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

Non-authoritative answer:
Name:	kubernetes.default.svc.cluster.local
Address: 10.0.0.1

*** Can't find kubernetes.default.svc.cluster.local: No answer

/ # echo $?
0
/ # cat /etc/resolv.conf 
nameserver 10.0.0.10
search flowr-besix-stay.svc.cluster.local svc.cluster.local cluster.local c.taktik-dev.internal google.internal
options ndots:5

In my chart I have always simply been using 'busybox'; I'm not sure which tag I'm currently on, as all I could find is the hash of the image:

    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:bf510723d2cd2d4e3f5ce7e93bf1e52c8fd76831995ac3bd3f90ecc866643aff

Meanwhile, the workaround is just to use nslookup cassandra.cassandra.svc.cluster.local instead of nslookup cassandra.cassandra.
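
Worth noting from the resolv.conf shown earlier in this comment: with `options ndots:5`, even the fully qualified `kubernetes.default.svc.cluster.local` has fewer dots than the threshold, so a resolver implementing the option would still consult the search list first and only then try the name as given. A quick dot-count sketch (`dots_in` is a hypothetical helper, not part of busybox):

```shell
# Count the dots in a name to compare against the ndots:5 threshold.
dots_in() { printf '%s' "$1" | tr -cd '.' | wc -c; }

dots_in kubernetes.default                    # 1 dot: well below ndots:5
dots_in kubernetes.default.svc.cluster.local  # 4 dots: still below ndots:5
```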
