dnsdist TCP stack needs improving #4814

Closed
RobinGeuze opened this issue Dec 23, 2016 · 3 comments

3 participants
@RobinGeuze
Contributor

commented Dec 23, 2016

  • Program: dnsdist
  • Issue type: Bug report

Description

The current dnsdist TCP stack has some weaknesses, particularly when it is used in front of recursors. Because of the way dnsdist distributes queries over its TCP threads, a "jam" of (very) slow queries can build up on a single thread. When a "normal" query is then assigned to that thread, it may time out. In another case, the queue for one thread can fill up the entire global TCP queue while the other threads still have plenty of capacity to process queries, so queries get dropped because the queue is "full".
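The jam can be illustrated with a minimal, hypothetical simulation. It assumes that each TCP worker thread handles one connection at a time until it is done, and that per-thread dispatch is plain round-robin; the actual dnsdist dispatch code is not reproduced here, and the function names are illustrative only. It compares per-thread queues against a single shared queue from which any idle thread takes the next query.

```python
from typing import List, Tuple

# A job is (arrival_time, service_time) in seconds; jobs are sorted by arrival.
Job = Tuple[float, float]


def waits_round_robin(jobs: List[Job], num_threads: int) -> List[float]:
    """Per-thread queues: job i is pinned to thread i % num_threads,
    whether or not that thread is still busy with a slow query."""
    free_at = [0.0] * num_threads  # time at which each thread becomes idle
    waits = []
    for i, (arrival, service) in enumerate(jobs):
        t = i % num_threads
        start = max(arrival, free_at[t])
        free_at[t] = start + service
        waits.append(start - arrival)
    return waits


def waits_shared_queue(jobs: List[Job], num_threads: int) -> List[float]:
    """Single shared FIFO queue: each job, in arrival order, is taken by
    whichever thread becomes idle first."""
    free_at = [0.0] * num_threads
    waits = []
    for arrival, service in jobs:
        t = min(range(num_threads), key=lambda k: free_at[k])
        start = max(arrival, free_at[t])
        free_at[t] = start + service
        waits.append(start - arrival)
    return waits


def timeouts(waits: List[float], client_timeout: float) -> int:
    """Number of queries that waited longer than the client is willing to."""
    return sum(1 for w in waits if w > client_timeout)


# One 10 s query followed by three fast (0.1 s) queries, two threads.
jobs = [(0.0, 10.0), (1.0, 0.1), (2.0, 0.1), (3.0, 0.1)]
print(waits_round_robin(jobs, 2))   # [0.0, 0.0, 8.0, 0.0]
print(waits_shared_queue(jobs, 2))  # [0.0, 0.0, 0.0, 0.0]
```

With round-robin, the third query lands on the thread stuck behind the slow one and waits 8 s even though the other thread is idle. With three threads the fourth query is the unlucky one and waits 7 s, which is why adding threads only moves the jam around instead of removing it.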

These effects can be mitigated by spawning a lot of TCP threads or by setting very low TCP receive timeouts on the server, but neither solution is very desirable.

Environment

  • Operating system: Any
  • Software version: 1.0.0, 1.1.0-beta2, git (probably)

Steps to reproduce

The easiest way to reproduce this is to fire a bunch of known slow queries at dnsdist. dnsdist will then stop responding properly to normal queries as well, or at least start failing at random (in my case I even managed to temporarily break it completely with about 1 query every 2 seconds).
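Reproducing this requires keeping several slow queries in flight over TCP at the same time. The following is a minimal sketch using only the Python standard library; it assumes you supply the host and port dnsdist listens on and a list of names known to resolve slowly (neither is given in this report). Each query is sent over its own TCP connection with the 2-byte length prefix that DNS over TCP requires (RFC 1035, section 4.2.2), and the sockets are left open so the queries stay pending. The helper names are hypothetical.

```python
import socket
import struct
from typing import Iterable, List


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the root name "." has no labels
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_tcp_query(name: str, qtype: int = 1, qid: int = 0x1234) -> bytes:
    """Build a recursive query (RD flag set) framed for DNS over TCP:
    a 2-byte big-endian length prefix followed by the DNS message."""
    # ID, flags (RD), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = encode_qname(name) + struct.pack("!HH", qtype, 1)  # class IN
    msg = header + question
    return struct.pack("!H", len(msg)) + msg


def open_slow_queries(host: str, port: int, names: Iterable[str],
                      timeout: float = 5.0) -> List[socket.socket]:
    """Send one query per TCP connection and return the still-open sockets,
    so the queries remain in flight; the caller closes them afterwards."""
    socks = []
    for name in names:
        s = socket.create_connection((host, port), timeout=timeout)
        s.sendall(build_tcp_query(name))
        socks.append(s)
    return socks


# Hypothetical usage; replace the placeholder with names you know are slow:
# socks = open_slow_queries("127.0.0.1", 53, ["slow-name.example."] * 20)
# (send normal queries now and watch for timeouts)
# for s in socks:
#     s.close()
```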

@rgacogne

Member

commented Dec 23, 2016

#4817 might help.

@Habbie

Member

commented Nov 9, 2017

Did it help? :)

@RobinGeuze

Contributor Author

commented Mar 18, 2018

We ran into some trouble again today, so I activated the possible fix to check its effectiveness.

These are the TCP stats on the machine without the singlepipe:

Clients    MaxClients Queued     MaxQueued
134        500        0          1000

And these are the TCP stats on the machine with the singlepipe:

Clients    MaxClients Queued     MaxQueued
19         500        0          1000

On the machine without the singlepipe, the Clients number keeps climbing until it reaches 500, and then it starts queueing. I restarted both dnsdist instances at roughly the same time. So far the singlepipe version seems to stay stable at 19 clients, so this seems to work like a charm!

@rgacogne referenced this issue on Mar 11, 2019 in the merged pull request "dnsdist: Refactoring of the TCP stack" #7559 (3 of 7 tasks complete).