dnsdist TCP stack needs improving #4814
The current dnsdist TCP stack has some weaknesses, particularly when it is used in front of recursors. Because of the way dnsdist distributes queries over its TCP threads, (very) slow queries can "jam" a particular thread. When dnsdist then assigns a "normal" query to that thread, the query might time out. Another case is that the queue for one thread can fill up the entire global TCP queue while the other threads still have plenty of time for processing queries, causing queries to get dropped because the queue is "full".
These effects can be mitigated by spawning a lot of TCP threads or by setting the TCP receive timeouts on the server very low, but neither solution is very desirable.
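To illustrate the "jam", here is a toy model in Python. It is not dnsdist code: it assumes connections are handed to threads round-robin, that each thread works through its own FIFO queue, and that all queries arrive at once. The function names and the durations (in milliseconds) are made up for the example. It compares that setup with a single queue shared by all threads.

```python
import heapq


def per_thread_queues(durations, workers):
    """Assumed model: query i goes to worker i % workers, FIFO per worker.

    Returns how long each query waits before a worker starts on it.
    """
    free_at = [0] * workers          # time at which each worker becomes idle
    waits = []
    for i, duration in enumerate(durations):
        w = i % workers              # round-robin dispatch (an assumption)
        waits.append(free_at[w])     # all queries arrive at t=0
        free_at[w] += duration
    return waits


def shared_queue(durations, workers):
    """Single queue: each query is taken by whichever worker is idle first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    waits = []
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest idle worker
        waits.append(start)
        heapq.heappush(free_at, start + duration)
    return waits


# Two slow queries (10 s each) mixed with fast ones (100 ms each), 3 workers.
durations = [10000, 100, 100, 10000, 100, 100, 100]

print(per_thread_queues(durations, 3))  # [0, 0, 0, 10000, 100, 100, 20000]
print(shared_queue(durations, 3))       # [0, 0, 0, 100, 100, 200, 300]
```

In this model the last fast query waits 20 seconds behind two slow ones on the same thread, while the other two threads sit idle. With a shared queue the same query waits 300 ms.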
Steps to reproduce
The easiest way to reproduce is to fire a bunch of known slow queries at dnsdist. dnsdist will then stop responding properly to normal queries as well, or at least encounter failures at random (although I managed to temporarily break it completely with about 1 query every 2 seconds in my case).
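A rough sketch of how such queries could be sent over TCP from Python, using only the standard library. The helper names are hypothetical, and the host, port, and query name in the usage note are placeholders: you need to substitute a name that is actually slow to resolve in your setup. DNS over TCP prefixes each message with a two-byte length, which is what `frame_for_tcp` adds.

```python
import socket
import struct


def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query (one question, class IN, RD bit set)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message):
    """Prepend the two-byte length prefix used for DNS over TCP."""
    return struct.pack("!H", len(message)) + message


def send_tcp_query(host, port, name, timeout=5.0):
    """Send one query over TCP; return True if a reply starts arriving in time."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        try:
            return len(sock.recv(2)) > 0   # empty read means the peer closed
        except socket.timeout:
            return False
```

For example, calling `send_tcp_query("127.0.0.1", 5300, "slow.example.test")` from a handful of threads about once every two seconds, while sending an ordinary query alongside, should show whether the ordinary query starts timing out.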
We ran into some trouble again today, so I activated the possible fix to check its effectiveness.
These are the TCP stats on the machine without the singlepipe:
And these are the TCP stats on the machine with the singlepipe:
On the machine without the singlepipe, the number of clients keeps climbing until it reaches 500, and then it starts queueing. I restarted both dnsdist instances at roughly the same time. So far the singlepipe version seems to stay stable at 19 clients, so it seems this works like a charm!