dnsdist downstream sockets can go stale but it never recovers #4155

Closed
yunzheng opened this issue Jul 11, 2016 · 3 comments
@yunzheng yunzheng commented Jul 11, 2016

I'm using dnsdist (1.0) with some downstream servers that are reached over a VPN, and sometimes the connected UDP socket to a downstream goes stale. dnsdist does not recover from this state, so I have to restart it to fix the problem.

The downstream server can still be marked UP or DOWN by healthChecksThread(), though, because the health check does not use the connected socket.

The UDP send() in udpClientSendRequestToBackend actually returns -1, but the return value is only checked in order to increase the error counters.

What I expected was that it would reconnect the socket to recover from this error.
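To make the expected behaviour concrete, here is a minimal, self-contained sketch of "send, and on failure count the error and replace the socket". It is not dnsdist code: `Backend`, `connectBackend`, `sendToBackend` and `makeLoopbackBackend` are hypothetical names invented for this illustration, and the loopback address and port are arbitrary.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a downstream backend (not dnsdist's real class).
struct Backend {
  sockaddr_in remote{};
  int fd{-1};
  uint64_t sendErrors{0};
  uint64_t reconnects{0};
};

// Build a backend that points at 127.0.0.1:<port>; no socket is opened yet.
Backend makeLoopbackBackend(uint16_t port)
{
  Backend b;
  b.remote.sin_family = AF_INET;
  b.remote.sin_port = htons(port);
  b.remote.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  return b;
}

// Open a fresh UDP socket and connect() it to the backend address.
bool connectBackend(Backend& b)
{
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return false;
  }
  if (connect(fd, reinterpret_cast<const sockaddr*>(&b.remote), sizeof(b.remote)) < 0) {
    close(fd);
    return false;
  }
  b.fd = fd;
  return true;
}

// Send one datagram. On failure, bump the error counter and swap in a new
// connected socket so that later sends have a chance to succeed.
// The failed datagram itself is not retried.
bool sendToBackend(Backend& b, const void* data, size_t len)
{
  if (send(b.fd, data, len, 0) >= 0) {
    return true;
  }
  ++b.sendErrors;
  if (b.fd >= 0) {
    close(b.fd);
    b.fd = -1;
  }
  if (connectBackend(b)) {
    ++b.reconnects;
  }
  return false;
}
```

The sketch drops the failed datagram instead of resending it, on the assumption that the DNS client will retry on its own. It also ignores any thread that may be blocked in recv() on the old descriptor, which a real implementation would have to handle.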

I implemented the following reconnect() function in the Downstream class which I call after increasing the error counters:

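void reconnect()
{
  // shutdown() wakes up the responderThread blocked in recv() on this fd
  shutdown(fd, SHUT_RDWR);
  close(fd);
  if (!IsAnyAddress(remote)) {
    // open a new UDP socket of the same address family as the backend
    fd = SSocket(remote.sin4.sin_family, SOCK_DGRAM, 0);
    if (!IsAnyAddress(sourceAddr)) {
      // re-bind to the configured source address, if there is one
      SSetsockopt(fd, SOL_SOCKET, SO_REUSEADDR, 1);
      SBind(fd, sourceAddr);
    }
    SConnect(fd, remote);
  }
}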

The shutdown() call is needed to wake up the blocking recv() call in the responderThread; close() on its own does not do that.
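The wake-up effect can be observed in isolation with a small sketch like the one below. It assumes Linux behaviour for connected UDP sockets; other platforms may behave differently. `recvResultAfterShutdown` is a hypothetical helper, the loopback peer and port are arbitrary, and the 5 second receive timeout is only a safety net so the demo cannot hang.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <atomic>
#include <chrono>
#include <thread>

// Returns what recv() returned in a reader thread after another thread
// called shutdown() on the same connected UDP socket.
// Returns -2 if the socket could not be set up.
long recvResultAfterShutdown()
{
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return -2;
  }

  sockaddr_in peer{};
  peer.sin_family = AF_INET;
  peer.sin_port = htons(5353);
  peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  if (connect(fd, reinterpret_cast<const sockaddr*>(&peer), sizeof(peer)) < 0) {
    close(fd);
    return -2;
  }

  // Safety net: recv() gives up after 5 seconds even if nothing wakes it.
  timeval tv{};
  tv.tv_sec = 5;
  setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

  std::atomic<long> result{-3};
  std::thread reader([&result, fd] {
    char buf[512];
    // Nothing is ever sent to this socket, so this blocks until woken.
    result = static_cast<long>(recv(fd, buf, sizeof(buf), 0));
  });

  // Give the reader time to block in recv(), then wake it via shutdown().
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
  shutdown(fd, SHUT_RDWR);
  reader.join();

  // Only close the descriptor once no thread is using it any more.
  close(fd);
  return result.load();
}
```

On Linux the reader is expected to come back with a return value of 0 shortly after shutdown() is called, rather than waiting for the timeout. Joining the reader before close() also avoids the descriptor number being reused while another thread still holds it.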

I'm not sure if this is the correct way, but it works well for me.

@zeha zeha added the dnsdist label Jul 13, 2016
@rgacogne rgacogne (Member) commented Aug 3, 2016

Would you happen to know the value of errno after send() returns -1? I'm guessing ENETDOWN or ENETUNREACH.
If I understand correctly, closing the socket is not enough to wake the responderThread from recv(), but shutdown() is?

@rgacogne rgacogne (Member) commented Aug 3, 2016

I've done some tests, and it looks like send() might fail with EINVAL when the interface the socket was bound to doesn't exist anymore.
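One way to act on these errno values is to separate errors that suggest the socket itself is unusable from errors that are likely transient. The sketch below is only an illustration of that idea: `shouldReconnect` is a hypothetical helper, and the choice of which codes trigger a reconnect is an assumption based on the values mentioned in this thread (ENETDOWN, ENETUNREACH, EINVAL), plus EBADF for a closed descriptor.

```cpp
#include <cerrno>

// Hypothetical policy: decide from errno whether a failed send() on a
// connected UDP socket should lead to the socket being recreated.
bool shouldReconnect(int err)
{
  switch (err) {
  case ENETDOWN:    // network interface is down
  case ENETUNREACH: // no route to the backend
  case EINVAL:      // seen here when the bound interface disappeared
  case EBADF:       // descriptor is no longer valid
    return true;
  default:
    // e.g. EAGAIN, EINTR or ECONNREFUSED: treated as transient here
    return false;
  }
}
```

A caller would pass the value of errno captured immediately after send() returns -1, before any other call can overwrite it.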

@yunzheng yunzheng (Author) commented Aug 3, 2016

I used the following debug print:

warnlog("send() returned: %d, errno: %s, from: %s, to: %s", ret, strerror(errno), remote.toStringWithPort(), ss->getName());

From my logs it would print:

send() returned: -1, errno: Invalid argument, from: x.x.x.x:31451

So yes, errno is EINVAL. Closing the socket was not enough for recv() to return; shutdown() was needed as well.

Nice that you are able to reproduce it as well.
