Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On MacOS, if a remote socket is closed while curl is connecting to it, the program will spin until the connection timeout elapses #7595

Closed
marc-groundctl opened this issue Aug 19, 2021 · 11 comments

Comments

@marc-groundctl
Copy link
Contributor

@marc-groundctl marc-groundctl commented Aug 19, 2021

I did this

Compiled and ran the following program:

#include <curl/curl.h>
#include <boost/asio.hpp>

using namespace boost::asio;

int progress_callback(void *clientp, curl_off_t dltotal, curl_off_t dlnow, curl_off_t ultotal, curl_off_t ulnow)
{
	static int progress_callback_times = 0;

	++progress_callback_times;
	// Close the socket when mstate == MSTATE_CONNECTING
	if (progress_callback_times == 2)
	{
		auto &a = *(ip::tcp::acceptor *)clientp;
		auto sock = a.accept();
		sock.close();
	}
	return 0;
}

int main()
{
	io_context ctx(1);
	ip::tcp::acceptor a(ctx.get_executor());

	a.open(ip::tcp::v4());
	socket_base::reuse_address ra(true);
	a.set_option(ra);
	ip::tcp::endpoint e{ip::address_v4::loopback(), 8080};
	a.bind(e);
	a.listen();

	auto easy = curl_easy_init();
	curl_easy_setopt(easy, CURLOPT_URL, "http://127.0.0.1:8080");
	curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L);
	curl_easy_setopt(easy, CURLOPT_NOPROGRESS, 0L);
	curl_easy_setopt(easy, CURLOPT_XFERINFOFUNCTION, progress_callback);
	curl_easy_setopt(easy, CURLOPT_XFERINFODATA, &a);

	curl_easy_perform(easy);
	return 0;
}

I expected the following

The program should exit almost immediately, since it is attempting to communicate with a closed socket

What actually happened

The program consumed an entire CPU core for 5 minutes before the connection timed out.

This only happens on Mac; Linux works fine.

The issue appears to be that when waiting for the socket to be ready, multi_wait calls poll with events set to POLLOUT, and the OS sets revents to POLLOUT and returns immediately. However, when multi_runsingle calls poll it sets events to POLLOUT|POLLPRI and the OS sets revents to POLLHUP|POLLPRI. Curl interprets the POLLHUP as an error, but then when verifyconnect checks SO_ERROR it gets back 0 so it assumes that there wasn't actually an error. Because revents didn't contain POLLOUT curl thinks that it needs to wait for the socket to be writable, so it calls multi_wait and the cycle repeats until the connect timeout elapses.

curl/libcurl version

curl 7.78.0-DEV (Darwin) libcurl/7.78.0-DEV SecureTransport zlib/1.2.11
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS Debug HSTS IPv6 Largefile libz NTLM SSL TrackMemory UnixSockets

I built curl off of commit bfbde88

operating system

Darwin NYC-L5169-MAC.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64 i386 MacBookPro16,1 Darwin

@bagder
Copy link
Member

@bagder bagder commented Aug 20, 2021

What does your sample code need to build? I tried brew install boost on my mac but then your example causes a number of build errors and warnings?

@marc-groundctl
Copy link
Contributor Author

@marc-groundctl marc-groundctl commented Aug 20, 2021

I forgot to mention that you need to compile it with -std=c++17 because clang still defaults to C++03.

@bagder
Copy link
Member

@bagder bagder commented Aug 20, 2021

I can't reproduce, but I have a few other versions in my env:

  1. I used curl from current git master (since I don't think any of this logic has changed since)
  2. I have macOS 11.5.2 here, Darwin Kernel Version 20.6.0

Presumably, my kernel version doesn't behave the same as yours...

@bagder
Copy link
Member

@bagder bagder commented Aug 20, 2021

I reran the test with a fresh 7.78.0 build (bfbde88) but again it doesn't reproduce.

@jay
Copy link
Member

@jay jay commented Aug 21, 2021

This doesn't reproduce in Windows but that's probably because Curl_poll emulates poll with select. It looks like this:

ufds[i].events: 1040 POLLOUT (POLLWRNORM) | POLLPRI
ufds[i].revents: 0
ufds[i].events: 1792 POLLIN (POLLRDNORM | POLLRDBAND) | POLLPRI
ufds[i].revents: 0
ufds[i].events: 1792 POLLIN (POLLRDNORM | POLLRDBAND) | POLLPRI
ufds[i].revents: 0

@marc-groundctl
Copy link
Contributor Author

@marc-groundctl marc-groundctl commented Aug 23, 2021

That's odd, because it happens to me on kernel 20.6.0. Maybe it's affected by the build method? I'm building with cmake, with the variables CMAKE_BUILD_TYPE=Debug, ENABLE_DEBUG=ON, and CMAKE_USE_SECTRANSP=ON. It's possible that for you mstate isn't MSTATE_CONNECTING during the second progress function call, and it'll reproduce if you use another constant in the progress_callback_times == 2 condition.

I've also bisected it, and it seems to have been introduced in commit e21bd22, which makes sense since that's when POLLPRI was added to the set of polled events.

@bagder
Copy link
Member

@bagder bagder commented Aug 23, 2021

I built with configure, but I can't see how that would change these things. Does reverting that commit fix things for you?

@mback2k, do you recall if you brought POLLPRI there for a particular reason other than "completeness"?

@marc-groundctl
Copy link
Contributor Author

@marc-groundctl marc-groundctl commented Aug 23, 2021

Yes, reverting that commit fixes the issue.

bagder added a commit that referenced this issue Aug 23, 2021
This reverts commit e21bd22.

Reported-by: marc-groundctl
Fixes #7595
@bagder
Copy link
Member

@bagder bagder commented Aug 23, 2021

The original commit says it was needed for "poll()" to detect connection problems, but we've used poll for many years before that without problems to detect connect problems...

@bagder
Copy link
Member

@bagder bagder commented Aug 23, 2021

Uh-oh. I think we can take this in another direction. poll() should not be used on macOS, because Apple keeps messing it up and we avoid it on configure builds. That's why I didn't spot any problem with this! This is yet another bug in the cmake build...

bagder added a commit that referenced this issue Aug 23, 2021
... like we've do in configure builds. Since poll() on macOS is not
reliable enough.

Reported-by: marc-groundctl
Fixes #7595
@bagder bagder added the cmake label Aug 23, 2021
bagder added a commit that referenced this issue Aug 23, 2021
... like we do in configure builds. Since poll() on macOS is not
reliable enough.

Reported-by: marc-groundctl
Fixes #7595
@Daoudia

This comment has been minimized.

@curl curl temporarily blocked Daoudia Aug 23, 2021
@bagder bagder self-assigned this Aug 24, 2021
@bagder bagder closed this in 825911b Aug 24, 2021
mitchellwong added a commit to GroundControl-Solutions/curl that referenced this issue Aug 26, 2021
... like we do in configure builds. Since poll() on macOS is not
reliable enough.

Reported-by: marc-groundctl
Fixes curl#7595
Closes curl#7619

(cherry picked from commit 825911b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants