Forever "Too many queries. dropping incoming query" #227

Closed
R4scal opened this issue Apr 22, 2020 · 11 comments

@R4scal

R4scal commented Apr 22, 2020

Hi
We use unbound as a caching recursive DNS resolver on our own site. Everything was fine until one day our site lost its internet connection for 2 minutes and resolution stopped working. The logs show:

[1587561379] unbound[25416:0] debug: udp request from ip4 x.x.x.x port 52739 (len 16)
[1587561379] unbound[25416:0] debug: Too many queries. dropping incoming query.

The problem persisted after the internet connection recovered, whether 5 or 15 minutes later. Only restarting the unbound process solved it.

Our config:

server:
    verbosity: 5
    interface: 0.0.0.0
    chroot: ""
    do-ip4: yes
    do-ip6: no
    num-threads: 1
    outgoing-range: 8192
    num-queries-per-thread: 4096
    logfile: /var/log/unbound.log
    pidfile: /var/run/unbound.pid
    hide-version: yes
    do-daemonize: yes
    access-control: 0.0.0.0/0 allow
    cache-min-ttl: 30
    serve-expired: yes
    serve-expired-ttl: 0
    serve-expired-ttl-reset: no
    client-timeout: 150
    prefetch: yes
    do-not-query-localhost: no
forward-zone:
  name: "."
  forward-addr: 8.8.8.8
  forward-addr: 8.8.4.4
forward-zone:
  name: "local.example.com"
  forward-addr: 127.0.0.1@5353
forward-zone:
  name: "consul.example.com"
  forward-addr: 127.0.0.1@8600

Versions:

unbound -V
Version 1.10.0

Configure line: --enable-fully-static LDFLAGS=-static --disable-flto --with-libevent
Linked libs: libevent 2.1.8-stable (it uses epoll), OpenSSL 1.1.1d  10 Sep 2019
Linked modules: dns64 respip validator iterator

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
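For context, the unbound.conf documentation describes num-queries-per-thread as the number of queries every thread services simultaneously. When that limit is reached and no queries can be jostled out (see jostle-timeout), new queries are dropped. The toy model below is a hypothetical sketch, not unbound's code. It assumes that requests accepted during the outage are never released. Under that assumption alone, a capped request list (4096 slots with num-threads: 1, as in the config above) would drop every query after connectivity returns, until a restart empties it. The class and function names are made up for illustration.

```python
"""Toy model of a capped per-thread request list (hypothetical; not unbound's mesh code).

Assumption for illustration only: requests accepted while the upstream is
unreachable are never released.
"""


class ToyRequestList:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # stands in for num-queries-per-thread
        self.pending: set[int] = set()
        self.dropped = 0

    def accept(self, query_id: int) -> bool:
        """Admit a query, or drop it if every slot is taken."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # "Too many queries. dropping incoming query."
            return False
        self.pending.add(query_id)
        return True

    def complete(self, query_id: int) -> None:
        """Release the slot held by an answered query."""
        self.pending.discard(query_id)


def simulate(capacity: int, outage_queries: int, later_queries: int) -> tuple[int, int]:
    """Return (queries answered after recovery, total queries dropped)."""
    rl = ToyRequestList(capacity)
    qid = 0
    # Outage: queries are admitted but, by assumption, never released (leaked slots).
    for _ in range(outage_queries):
        rl.accept(qid)
        qid += 1
    # Recovery: the upstream answers again, so an admitted query completes at once.
    answered = 0
    for _ in range(later_queries):
        if rl.accept(qid):
            rl.complete(qid)
            answered += 1
        qid += 1
    return answered, rl.dropped


if __name__ == "__main__":
    print(simulate(4096, 5000, 1000))  # (0, 1904): list is full, every later query dropped
    print(simulate(4096, 100, 1000))   # (1000, 0): slots leaked, but space remains
```

In this model only the number of leaked slots matters. Once they fill the list, recovery of the upstream changes nothing, which matches the "only a restart helps" symptom.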
@iz8mbw

iz8mbw commented Apr 22, 2020

I can confirm similar behavior when, for example, I restart the router and lose the Internet connection for a few minutes (unbound stays up on the server).
In my case unbound starts working again only some minutes (far too many) after the router, and therefore the Internet connection, comes back up. In some cases I have to restart the server or service where unbound runs before it works again.

EDIT: I don't know if it helps, but when I restart my router the public IP changes.

@R4scal
Author

R4scal commented Apr 23, 2020

Hi
@wcawijngaards @gthess sorry to ping you, but this looks like a critical issue.

@R4scal
Author

R4scal commented May 18, 2020

Almost a month has passed since this issue was opened.

@ralphdolmans
Contributor

We believe this issue is fixed in 00323b7, which will be included in a future release.

You could temporarily work around this by setting serve-expired-client-timeout to 0.

@iz8mbw

iz8mbw commented May 25, 2020

@ralphdolmans Hi Ralph. I built unbound from the GitHub master of May 21, 2020, so commit 00323b7 is included.
Today my router reset, so I lost the Internet connection for a few minutes. After the connection came back up, unbound was not able to resolve anything: every query returned SERVFAIL.
I rebooted the Linux machine where unbound runs, and DNS resolution started working again.

@aletheia7

Maybe related to #248.

@petri-ojala

We tested 1.10.0/1.10.1 on one of our DNS instances and seem to run into a similar situation: it stops handling queries, thread by thread.

A colleague noticed in the statistics that the requestlist user counter seems to wrap around:
total.requestlist.current.user=18446744073709551580
and when that happens, we can see the exceeded counter increase for the affected threads:

thread0.requestlist.exceeded=16
thread1.requestlist.exceeded=12
thread2.requestlist.exceeded=17
thread3.requestlist.exceeded=19
thread4.requestlist.exceeded=0
thread5.requestlist.exceeded=18
thread6.requestlist.exceeded=15
thread7.requestlist.exceeded=15
...

(thread4 is probably still working.)

I'll try to find more insight into the issue later.
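The wrapped value is consistent with an unsigned 64-bit counter that was decremented below zero: 18446744073709551580 is 2^64 - 36, i.e. -36 when read as signed. The following is a small sketch, not part of unbound, that flags such wrapped counters and lists the threads whose requestlist.exceeded (queries dropped because the request list was full) is non-zero. It assumes the name=value text printed by `unbound-control stats_noreset` has been captured as a string. The helper names are hypothetical.

```python
"""Flag suspicious unbound statistics (hypothetical helpers, not part of unbound).

Assumes input in the name=value format printed by `unbound-control stats_noreset`.
"""


def parse_stats(text: str) -> dict[str, int]:
    """Parse integer 'name=value' lines; skip anything else (e.g. float averages)."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # non-integer values such as timing averages
    return stats


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit counter as signed (values >= 2**63 become negative)."""
    return value - 2**64 if value >= 2**63 else value


def wrapped_counters(stats: dict[str, int]) -> dict[str, int]:
    """Counters that only make sense as negative numbers, i.e. that underflowed."""
    return {name: as_signed64(v) for name, v in stats.items() if v >= 2**63}


def threads_dropping(stats: dict[str, int]) -> list[str]:
    """Threads that have dropped queries because their request list was full."""
    return sorted(
        name.split(".", 1)[0]
        for name, v in stats.items()
        if name.startswith("thread") and name.endswith(".requestlist.exceeded") and v > 0
    )


if __name__ == "__main__":
    sample = """total.requestlist.current.user=18446744073709551580
thread0.requestlist.exceeded=16
thread4.requestlist.exceeded=0
thread5.requestlist.exceeded=18
total.recursion.time.avg=0.012000"""
    stats = parse_stats(sample)
    print(wrapped_counters(stats))  # {'total.requestlist.current.user': -36}
    print(threads_dropping(stats))  # ['thread0', 'thread5']
```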

@gthess
Member

gthess commented Jun 18, 2020

Do you use serve-expired? If so, have you tried the workaround mentioned above (serve-expired-client-timeout: 0)?

@petri-ojala

petri-ojala commented Jun 18, 2020

Do you use serve-expired? If so, have you tried the workaround mentioned above (serve-expired-client-timeout: 0)?

I haven't tried setting it to 0 yet. The primary reason for trying out 1.10 was in fact related to this. With the older 1.8 we noticed that some programmatic DNS responses (from a miekg/dns-based piece of software, TTL set to 0) were cached, and the cache was "corrupted" with records that unbound had fetched without the EDNS subnet data. It looked like a serve-expired-related issue; setting prefetch to yes or no didn't change the behaviour. I also haven't tried 1.9.x yet.
The programmatic backend was working nicely with 1.10.1 and serve-expired-client-timeout set to the recommended 1800 ms. The backend is defined as a stub zone with no-cache set to yes.

I just installed 1.9.6 there and am running it. It looks like it fixes the caching issue we had with 1.8.0.

@gthess
Member

gthess commented Sep 9, 2020

Closing, as this particular issue was fixed in 00323b7.
The fix is part of unbound 1.11.0.

@gthess gthess closed this as completed Sep 9, 2020
@petri-ojala

Thank you. I can also confirm that we've been running 1.11.0 now for a month and have not experienced the issue.


6 participants