Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlock when listening tcp #356

Closed
kuznet9 opened this issue Nov 26, 2020 · 2 comments
Closed

deadlock when listening tcp #356

kuznet9 opened this issue Nov 26, 2020 · 2 comments

Comments

@kuznet9
Copy link

kuznet9 commented Nov 26, 2020

I was hit by massive deadlocks in unbound listening for tcp connections. Sometimes it arrives to state when number of idle tcp connections is maxed at incoming-num-tcp. New connections are not accepted, listening socket is excluded in select().
But idle unused connections never die. So, listening stops forever.

Not sure, but it looks like I found the reason. When incoming-num-tcp is about to be full the last added sockets are added
with c->tcp_timeout_msec==0. The first time in setup_tcp_handler timeout is forced to be TCP_QUERY_TIMEOUT_MINIMUM.
But if the socket did some io, the further comm_point_start_listening() are done with real 0(!) and timer is not started anymore. After several expirations of old sockets with non-zero timeout all the sockets remain with c->tcp_timeout_msec==0 and unbound stucks forever.

It is just a theory. I added capping c->tcp_timeout_msec to TCP_QUERY_TIMEOUT_MINIMUM in setup_tcp_handler().
Waiting for results. But I think more experienced developer could figure out the problem w/o waiting.

@kuznet9
Copy link
Author

kuznet9 commented Nov 26, 2020

Actually, I think the approach with reducing idle timeout when we are congested is deemed to fail. I have some experience with this in linux kernel. I used similar solution there around 2000th, it was cheap and looked nice but soon it became obvious it will never sustain even minor overloads. In single thread unbound the simplest and rock solid solution would be lru list per listening socket, when we remove socket from list when start to do something and put it to the tail when socket enters idle state. Then, when we !c->tcp_free we just close head of lru and proceed. It is not even so chumbersome, is going to be nice and clear. And we do not need redundant tcp-idle-timeout parameter anymore, established connections are closed when it is necessary.

@wcawijngaards
Copy link
Member

Hi Thanks for the detailed reasoning about the deadlock and TCP streams. I fixed it by applying the minimum so the no timeout zero value is not used, but the minimum value. This should no longer deadlock the TCP streams.

jedisct1 added a commit to jedisct1/unbound that referenced this issue Dec 11, 2020
* nlnet/master:
  - Fix missing prototypes in the code.
  Changelog note for NLnetLabs#373 - Merge PR NLnetLabs#373 from fobser: Warning: arithmetic on a pointer to void   is a GNU extension.
  Changelog note for NLnetLabs#335 - Merge PR NLnetLabs#335 from fobser: Sprinkle in some static to prevent   missing prototype warnings.
  Warning: arithmetic on a pointer to void is a GNU extension.
  - Fix to squelch permission denied and other errors from remote host,   they are logged at higher verbosity but not on low verbosity.
  - Fix NLnetLabs#371: unbound-control timeout when Unbound is not running.
  - iana portlist updated.
  - make depend.
  Code repo continues for 1.13.1 in development.
  - Fix update, with write event check with streamreuse and fastopen.
  - Fix for NLnetLabs#283: fix stream reuse and tcp fast open.
  - Fix on windows to ignore connection failure on UDP, unless verbose.
  - Fix unbound-dnstap-socket to not use log routine from interrupt   handler and not print so frequently when invoked in sequence.
  - Fix NLnetLabs#356: deadlock when listening tcp.
  - Fix NLnetLabs#360: for the additionally reported TCP Fast Open makes TCP   connections fail, in that case we print a hint that this is   happening with the error in the logs.
  Sprinkle in some static to prevent missing prototype warnings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants