Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use c-ares callback for socket close #3238

Closed
wants to merge 3 commits into from

Conversation

r0ro
Copy link
Contributor

@r0ro r0ro commented Nov 5, 2018

When using c-ares for asyn dns, the dns socket fd was silently closed by c-ares without curl being aware. curl would then 'realize' the fd has been removed at next call of Curl_resolver_getsock, and only then notify the CURLMOPT_SOCKETFUNCTION to remove fd from its poll set with CURL_POLL_REMOVE. At this point the fd is already closed.

Using the ephiperfifo.c example with the updated patch displaying epoll_ctl errors allow reproducing the issue:

* STATE: INIT => CONNECT handle 0x2345bf8; line 1428 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x2345bf8; line 1464 (connection #0)
Progress: https://r0ro.fr/ (0/0)
socket callback: s=7 e=0x2345bf8 what=IN Adding data: IN
socket callback: s=7 e=0x2345bf8 what=REMOVE 
EPOLL_CTL_DEL failed for fd: 7 : Bad file descriptor   <======== fd already closed here
*   Trying 2001:bc8:33d1::1...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x2345bf8; line 1545 (connection #0)

Note that it only affect builds with ares enabled:

./configure --enable-ares --enable-debug

@@ -135,7 +135,7 @@ void Curl_resolver_cleanup(void *resolver)
int Curl_resolver_duphandle(void **to, void *from)
{
(void)from;
return Curl_resolver_init(to);
return Curl_resolver_init(NULL, to);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. Shouldn't this then pass in the new easy handle here? Otherwise this stores a NULL pointer to get used in the callback later on, and the callback doesn't deal with a NULL pointer...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of asyn-thread.c Curl_resolver_duphandle is calling Curl_resolver_init that is not using the added easy parameter.
(By the way this is causing a new warning breaking travis check, so I'll submit a new patch to correct this.)
Reviewed this patch, I believe that I should add the new created easy as parameter to Curl_resolver_duphandle

Copy link
Contributor Author

@r0ro r0ro Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there is a problem for c-ares too.
In case Curl_resolver_duphandle is called, ares_dup is then called, but the sock_state_cb_data of the new ares handle should be changed to point to the newly allocated easy.
Right now c-ares only supports setting the sock_state_cb_data in init_by_options so the only way that I see how to fix it now, is to create a new ares handle instead of using ares_dup
What do you think ?

@bagder bagder changed the title Fix ares silent fd close use c-ares callback for socket close Nov 6, 2018
When using c-ares for asyn dns, the dns socket fd was silently closed
by c-ares without curl being aware. curl would then 'realize' the fd
has been removed at next call of Curl_resolver_getsock, and only then
notify the CURLMOPT_SOCKETFUNCTION to remove fd from its poll set with
CURL_POLL_REMOVE. At this point the fd is already closed.

By using ares socket state callback (ARES_OPT_SOCK_STATE_CB), this
patch allows curl to be notified that the fd is not longer needed
for neither for write nor read. At this point by calling
Curl_multi_closed we are able to notify multi with CURL_POLL_REMOVE
before the fd is actually closed by ares.

In asyn-ares.c Curl_resolver_duphandle we can't use ares_dup anymore
since it does not allow passing a different sock_state_cb_data
Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems fine, just a little unfortunate that you bundle it with the c-ares fix since it gets less visibility then...

lib/asyn-ares.c Outdated
Curl_multi_closed(easy->easy_conn, socket_fd);
}
else {
DEBUGASSERT(easy->easy_conn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest you either do the if() or the assert, not both. You either allow and expect NULLs to enter there, or not. Unless you have a reason to allow NULLs, I would prefer unconditional DEBUGASSERT there. I would also suggest a DEBUGASSERT(easy) to catch any such mistakes better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely expect easy to be != NULL, so I could add a DEBUGASSERT(easy).
For easy->easy_conn, looking at the code I don't expect it to be NULL either, but I'm not familiar enough with the code to be 100% sure that it can't happen, do you prefer I keep the if or the assert ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking closer, it seems that easy_conn might be NULL in some cases, if the socket for example gets closed before there's a connection setup.

But also, if that happens this code won't call Curl_multi_closed() and thus the application won't be told about the closure, which seems wrong. I'm thinking Curl_multi_closed() perhaps should also be modified ever so slightly to take a struct Curl_easy *data as input argument instead of a connection. The actual function doesn't care about the connection and just wants the easy handle...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed the change you suggested.

@bagder bagder self-assigned this Nov 6, 2018
@r0ro
Copy link
Contributor Author

r0ro commented Nov 9, 2018

It seems that travis is still marked as "pending" when it's actually passed: https://travis-ci.org/curl/curl/builds/451829268

@bagder
Copy link
Member

bagder commented Nov 20, 2018

Thanks!

@bagder bagder closed this in 6765e6d Nov 20, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Feb 18, 2019
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants