New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poll() on macOS 10.12 returns prematurely #1057

Closed
ngg opened this Issue Oct 7, 2016 · 27 comments

Comments

Projects
None yet
@ngg
Contributor

ngg commented Oct 7, 2016

Tested with both curl 7.50.3 (which I compiled) and with curl 7.49.1 (bundled with OSX).
It only affects 10.12 OSX version (I tried it on 10.11 and 10.7 and both of those work well).

It can be reproduced with this command:
curl --limit-rate 10 http://google.com

Programatically I could reproduce this with the easy interface by setting either CURLOPT_MAX_RECV_SPEED_LARGE or CURLOPT_MAX_SEND_SPEED_LARGE.

@peterbudai

This comment has been minimized.

Show comment
Hide comment
@peterbudai

peterbudai Oct 7, 2016

I found an issue in Python's bug tracker, that is possibly related. There may seem a bug or change in the poll() syscall on Mac OS 12 that causes it to return prematurely.

As far as I know, libcurl preferably uses poll() system call internally on platforms where it is supported.
Though not defining HAVE_POLL_FINE for OSX builds would solve the issue I think (haven't tested), this seems rather drastic.

peterbudai commented Oct 7, 2016

I found an issue in Python's bug tracker, that is possibly related. There may seem a bug or change in the poll() syscall on Mac OS 12 that causes it to return prematurely.

As far as I know, libcurl preferably uses poll() system call internally on platforms where it is supported.
Though not defining HAVE_POLL_FINE for OSX builds would solve the issue I think (haven't tested), this seems rather drastic.

@ngg

This comment has been minimized.

Show comment
Hide comment
@ngg

ngg Oct 7, 2016

Contributor

I can confirm that using select() instead of poll() solves the issue.

Contributor

ngg commented Oct 7, 2016

I can confirm that using select() instead of poll() solves the issue.

@ngg

This comment has been minimized.

Show comment
Hide comment
@ngg

ngg Oct 7, 2016

Contributor

On Linux the manual page of poll says:
The timeout argument specifies the minimum number of milliseconds that poll() will block.

On OSX the manual page says:
If timeout is greater than zero, it specified a maximum interval (in milliseconds) to wait for any file descriptor to become ready.

This means that it's not guaranteed that calling poll without any fds will sleep at all.

Unfortunately the OSX select man page also says:
If timeout is a non-nil pointer, it specifies a maximum interval to wait for the selection to complete.

Which means that select can return immediately also, but that's not the case according to my tests.

Contributor

ngg commented Oct 7, 2016

On Linux the manual page of poll says:
The timeout argument specifies the minimum number of milliseconds that poll() will block.

On OSX the manual page says:
If timeout is greater than zero, it specified a maximum interval (in milliseconds) to wait for any file descriptor to become ready.

This means that it's not guaranteed that calling poll without any fds will sleep at all.

Unfortunately the OSX select man page also says:
If timeout is a non-nil pointer, it specifies a maximum interval to wait for the selection to complete.

Which means that select can return immediately also, but that's not the case according to my tests.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 7, 2016

Member

Thanks for the excellent report and researching the specifics here!

poll() is defined by POSIX and TSUS, Nobody can just arbitrarily change how it works without causing havoc in applications using it. Like in this case. Note that macOS has a history of broken poll implementations in the past as well and that's why we have the HAVE_POLL_FINE define in the first place.

The initial fix for this problem is probably to make the CURL_CHECK_FUNC_POLL configure macro return false for the check so that libcurl will switch to using select() instead.

I've reached out to a friend at Apple about this issue since it also affects Apple's own curl, maybe that'll give something.

Member

bagder commented Oct 7, 2016

Thanks for the excellent report and researching the specifics here!

poll() is defined by POSIX and TSUS, Nobody can just arbitrarily change how it works without causing havoc in applications using it. Like in this case. Note that macOS has a history of broken poll implementations in the past as well and that's why we have the HAVE_POLL_FINE define in the first place.

The initial fix for this problem is probably to make the CURL_CHECK_FUNC_POLL configure macro return false for the check so that libcurl will switch to using select() instead.

I've reached out to a friend at Apple about this issue since it also affects Apple's own curl, maybe that'll give something.

@koczkatamas

This comment has been minimized.

Show comment
Hide comment
@koczkatamas

koczkatamas Oct 7, 2016

I compared the poll implementations between 10.11 El Capitan and 10.12 Sierra and it looks like they changed this line https://github.com/opensource-apple/xnu/blob/10.11/bsd/kern/sys_generic.c#L1784 :

From: if (rfds > 0) to if (rfds == nfds).

If I understand correctly the change, poll previously exited immediately if it found error with ANY fd while the new implementation returns only if founds error with ALL fds.

In our case nfds is 0, so rfds == nfds will always be true thus it never calls kqueue_scan and poll never waits for the timeout.

This was probably an unintended side-effect of the change.

koczkatamas commented Oct 7, 2016

I compared the poll implementations between 10.11 El Capitan and 10.12 Sierra and it looks like they changed this line https://github.com/opensource-apple/xnu/blob/10.11/bsd/kern/sys_generic.c#L1784 :

From: if (rfds > 0) to if (rfds == nfds).

If I understand correctly the change, poll previously exited immediately if it found error with ANY fd while the new implementation returns only if founds error with ALL fds.

In our case nfds is 0, so rfds == nfds will always be true thus it never calls kqueue_scan and poll never waits for the timeout.

This was probably an unintended side-effect of the change.

@tobypeterson

This comment has been minimized.

Show comment
Hide comment
@tobypeterson

tobypeterson Oct 7, 2016

Contributor

For anyone inquiring with Apple, the Radar number is 28372390.

Contributor

tobypeterson commented Oct 7, 2016

For anyone inquiring with Apple, the Radar number is 28372390.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 8, 2016

Member

Did anyone try to make the configure check detect this problem yet? It'd be cool to have the work-around in place when we ship the next release! I don't have any 10.12 mac myself to try this out on right now.

Member

bagder commented Oct 8, 2016

Did anyone try to make the configure check detect this problem yet? It'd be cool to have the work-around in place when we ship the next release! I don't have any 10.12 mac myself to try this out on right now.

@bagder bagder changed the title from Setting bandwidth limit causes 100% CPU usage on Mac OSX 10.12 to poll() on macOS 10.12 returns prematurely Oct 8, 2016

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 11, 2016

Member

How to detect this? The following code seems to be good enough.

#include <poll.h>
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
  struct timeval before, after;
  int rc;
  size_t us;

  gettimeofday(&before, NULL);
  rc = poll(NULL, 0, 500);
  gettimeofday(&after, NULL);

  us = (after.tv_sec - before.tv_sec) * 1000000 +
    (after.tv_usec - before.tv_usec);

  if(us < 400000) {
    puts("poll() is broken");
    return 1;
  }
  else {
    puts("poll() works");
  }
  return 0;
}
Member

bagder commented Oct 11, 2016

How to detect this? The following code seems to be good enough.

#include <poll.h>
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
  struct timeval before, after;
  int rc;
  size_t us;

  gettimeofday(&before, NULL);
  rc = poll(NULL, 0, 500);
  gettimeofday(&after, NULL);

  us = (after.tv_sec - before.tv_sec) * 1000000 +
    (after.tv_usec - before.tv_usec);

  if(us < 400000) {
    puts("poll() is broken");
    return 1;
  }
  else {
    puts("poll() works");
  }
  return 0;
}

@bagder bagder closed this in 9297ca4 Oct 11, 2016

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 11, 2016

Member

I added a check based on the snippet above in commit 9297ca4. It detects the problem for me on my 10.12 machine and it verifies a working poll() on my Linux machine.

Please try it out and report if you experience any problems with it!

Member

bagder commented Oct 11, 2016

I added a check based on the snippet above in commit 9297ca4. It detects the problem for me on my 10.12 machine and it verifies a working poll() on my Linux machine.

Please try it out and report if you experience any problems with it!

@peterbudai

This comment has been minimized.

Show comment
Hide comment
@peterbudai

peterbudai Oct 11, 2016

I tried out the detection code from the comment above and it seems to work.
It prints poll() works on OSX 10.11, but prints poll() is broken on OSX 10.12.

peterbudai commented Oct 11, 2016

I tried out the detection code from the comment above and it seems to work.
It prints poll() works on OSX 10.11, but prints poll() is broken on OSX 10.12.

@jay

This comment has been minimized.

Show comment
Hide comment
@jay

jay Oct 12, 2016

Member

A few questions about this from someone with no familiarity to mac. What happens when curl is built for mac using something other than autotools, if that's possible, and also what if libcurl built for mac xx is used on mac yy where poll doesn't work? I'm thinking about this from a Windows standpoint where we have several build systems and compile-time decisions in a case like this I think would be more correct to do at runtime.

Member

jay commented Oct 12, 2016

A few questions about this from someone with no familiarity to mac. What happens when curl is built for mac using something other than autotools, if that's possible, and also what if libcurl built for mac xx is used on mac yy where poll doesn't work? I'm thinking about this from a Windows standpoint where we have several build systems and compile-time decisions in a case like this I think would be more correct to do at runtime.

@nickzman

This comment has been minimized.

Show comment
Hide comment
@nickzman

nickzman Oct 12, 2016

Collaborator

curl would have to be built using either autotools or CMake. poll() was introduced 12 years ago in macOS 10.4, and very, very few people use a macOS version that old or older. Apple's XP is 10.6.

The one advantage one would get from a runtime check is, if it was built under 10.11, then it could use select() in 10.12 and poll() in 10.11 even though it wasn't built for 10.12. But then the test would have to be run every time the tool is started.

Collaborator

nickzman commented Oct 12, 2016

curl would have to be built using either autotools or CMake. poll() was introduced 12 years ago in macOS 10.4, and very, very few people use a macOS version that old or older. Apple's XP is 10.6.

The one advantage one would get from a runtime check is, if it was built under 10.11, then it could use select() in 10.12 and poll() in 10.11 even though it wasn't built for 10.12. But then the test would have to be run every time the tool is started.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 12, 2016

Member

If we would do the check in run-time, we would also have to actually build both versions of Curl_poll() which would break new ground.

I think one of the problems with a run-time check is that it makes it more important that it is reliable, as a build-time check can always be fixed manually by for example editing the configure generated header file. Also, the current detection logic for this poll() problems requires a waiting period, which, if we would do the test in run-time would artificiality make the curl_global_init call much slower - and since that is often not invoked directly it will delay the first curl transfer for many programs. And for the curl tool, all invokes. Not an ideal situation.

Member

bagder commented Oct 12, 2016

If we would do the check in run-time, we would also have to actually build both versions of Curl_poll() which would break new ground.

I think one of the problems with a run-time check is that it makes it more important that it is reliable, as a build-time check can always be fixed manually by for example editing the configure generated header file. Also, the current detection logic for this poll() problems requires a waiting period, which, if we would do the test in run-time would artificiality make the curl_global_init call much slower - and since that is often not invoked directly it will delay the first curl transfer for many programs. And for the curl tool, all invokes. Not an ideal situation.

@jay

This comment has been minimized.

Show comment
Hide comment
@jay

jay Oct 13, 2016

Member

but why run the test at runtime if apple will tell you what versions are affected? can't you do a getversion at runtime and if it's osx 10.12 or whatever then use select?

Member

jay commented Oct 13, 2016

but why run the test at runtime if apple will tell you what versions are affected? can't you do a getversion at runtime and if it's osx 10.12 or whatever then use select?

@epipping

This comment has been minimized.

Show comment
Hide comment
@epipping

epipping Oct 13, 2016

Is unconditionally switching to select on macOS not an option or highly undesirable?

epipping commented Oct 13, 2016

Is unconditionally switching to select on macOS not an option or highly undesirable?

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 13, 2016

Member

can't you do a getversion at runtime and if it's osx 10.12 or whatever then use select?

Ah yes, that's right. I was thinking for the case where we also have a fixed 10.12 version out, but we don't need to that until we know it is needed or wanted.

Is unconditionally switching to select on macOS not an option or highly undesirable?

That's also an option, and a pretty attractive one. Switch to select() for all macOS builds no matter which version as then you can probably run the same library on different mac os versions without problems.

And really, using select() instead of poll() in there is a very minor difference and doesn't really have a measurable impact to anything.

Member

bagder commented Oct 13, 2016

can't you do a getversion at runtime and if it's osx 10.12 or whatever then use select?

Ah yes, that's right. I was thinking for the case where we also have a fixed 10.12 version out, but we don't need to that until we know it is needed or wanted.

Is unconditionally switching to select on macOS not an option or highly undesirable?

That's also an option, and a pretty attractive one. Switch to select() for all macOS builds no matter which version as then you can probably run the same library on different mac os versions without problems.

And really, using select() instead of poll() in there is a very minor difference and doesn't really have a measurable impact to anything.

@craig65535

This comment has been minimized.

Show comment
Hide comment
@craig65535

craig65535 Oct 18, 2016

Contributor

@bagder I just wanted to voice my support for a runtime check. We have a project that for various reasons is bundled with its own statically linked libcurl. It is distributed as a binary and may run on any version of the OS from 10.8 to 10.12. While the fix in 9297ca4 appears to work for 10.12 when the project is built on macOS 10.12, we build on OSX 10.10 so I had to disable poll entirely (curl_disallow_poll=yes). I would not have known to do that if I had not been following libcurl closely.

Contributor

craig65535 commented Oct 18, 2016

@bagder I just wanted to voice my support for a runtime check. We have a project that for various reasons is bundled with its own statically linked libcurl. It is distributed as a binary and may run on any version of the OS from 10.8 to 10.12. While the fix in 9297ca4 appears to work for 10.12 when the project is built on macOS 10.12, we build on OSX 10.10 so I had to disable poll entirely (curl_disallow_poll=yes). I would not have known to do that if I had not been following libcurl closely.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 18, 2016

Member

In my view that's just another voice for switching to select() for macOS unconditionally, as I don't think there's enough value in poll() over select() to justify a run-time check, but there is value for people to build one version of libcurl that runs on multiple macOS versions.

I propose this additional patch:

diff --git a/m4/curl-functions.m4 b/m4/curl-functions.m4
index e1a1e32..61ed1ac 100644
--- a/m4/curl-functions.m4
+++ b/m4/curl-functions.m4
@@ -4737,16 +4737,17 @@ AC_DEFUN([CURL_CHECK_FUNC_POLL], [
   tst_compi_poll="unknown"
   tst_works_poll="unknown"
   tst_allow_poll="unknown"
   #
   case $host_os in
-    darwin[[123456789]].*|darwin10.*|darwin11.*|darwin12.*|interix*)
+    darwin[[123456789]]*.*|interix*)
       dnl poll() does not work on these platforms
       dnl Interix: "does provide poll(), but the implementing developer must
       dnl have been in a bad mood, because poll() only works on the /proc
       dnl filesystem here"
       curl_disallow_poll="yes"
+      tst_compi_poll="no"
       ;;
   esac
   #
   AC_MSG_CHECKING([if poll can be linked])
   AC_LINK_IFELSE([
Member

bagder commented Oct 18, 2016

In my view that's just another voice for switching to select() for macOS unconditionally, as I don't think there's enough value in poll() over select() to justify a run-time check, but there is value for people to build one version of libcurl that runs on multiple macOS versions.

I propose this additional patch:

diff --git a/m4/curl-functions.m4 b/m4/curl-functions.m4
index e1a1e32..61ed1ac 100644
--- a/m4/curl-functions.m4
+++ b/m4/curl-functions.m4
@@ -4737,16 +4737,17 @@ AC_DEFUN([CURL_CHECK_FUNC_POLL], [
   tst_compi_poll="unknown"
   tst_works_poll="unknown"
   tst_allow_poll="unknown"
   #
   case $host_os in
-    darwin[[123456789]].*|darwin10.*|darwin11.*|darwin12.*|interix*)
+    darwin[[123456789]]*.*|interix*)
       dnl poll() does not work on these platforms
       dnl Interix: "does provide poll(), but the implementing developer must
       dnl have been in a bad mood, because poll() only works on the /proc
       dnl filesystem here"
       curl_disallow_poll="yes"
+      tst_compi_poll="no"
       ;;
   esac
   #
   AC_MSG_CHECKING([if poll can be linked])
   AC_LINK_IFELSE([
@ryandesign

This comment has been minimized.

Show comment
Hide comment
@ryandesign

ryandesign Oct 18, 2016

@badger Doesn't your proposal neglect to handle "darwin10" and any future Darwin versions that contain "0"?

ryandesign commented Oct 18, 2016

@badger Doesn't your proposal neglect to handle "darwin10" and any future Darwin versions that contain "0"?

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 18, 2016

Member

Ehm, yes it would appear so, thanks @ryandesign . It should probably just use:

darwin*|interix*)

and thus match all darwin and interix versions.

Member

bagder commented Oct 18, 2016

Ehm, yes it would appear so, thanks @ryandesign . It should probably just use:

darwin*|interix*)

and thus match all darwin and interix versions.

bagder added a commit that referenced this issue Oct 18, 2016

configure/CURL_CHECK_FUNC_POLL: disable poll completely on mac
... so that the same libcurl build easier can run on any version.

Follow-up to issue #1057
@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 18, 2016

Member

I pushed that version now to let people try it out for real.

Member

bagder commented Oct 18, 2016

I pushed that version now to let people try it out for real.

@jay

This comment has been minimized.

Show comment
Hide comment
@jay

jay Oct 28, 2016

Member

so is this still an issue for cmake, for example should it be

if(NOT APPLE)
check_symbol_exists(poll          "${CURL_INCLUDES}" HAVE_POLL)
endif()

also should there be some sort of failsafe in setup like

#if defined(__APPLE__) && (defined(HAVE_POLL) || defined(HAVE_POLL_FINE))
#error "failsafe: we don't allow poll on apple, build config should have detected this"
#endif

this way it doesn't get away from us as more build systems are added or say for example someone adds HAVE_POLL_FINE to cmake

Member

jay commented Oct 28, 2016

so is this still an issue for cmake, for example should it be

if(NOT APPLE)
check_symbol_exists(poll          "${CURL_INCLUDES}" HAVE_POLL)
endif()

also should there be some sort of failsafe in setup like

#if defined(__APPLE__) && (defined(HAVE_POLL) || defined(HAVE_POLL_FINE))
#error "failsafe: we don't allow poll on apple, build config should have detected this"
#endif

this way it doesn't get away from us as more build systems are added or say for example someone adds HAVE_POLL_FINE to cmake

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Oct 28, 2016

Member

The configure script still has numerous tiny tweaks that aren't present in cmake and its hard to come up with a firm stance on how to deal with them. For this particular case, I think we should file a separate issue so that we can tag it cmake and ping the most active cmake contributors rather than to keep this thread going in an already-closed issue.

Member

bagder commented Oct 28, 2016

The configure script still has numerous tiny tweaks that aren't present in cmake and its hard to come up with a firm stance on how to deal with them. For this particular case, I think we should file a separate issue so that we can tag it cmake and ping the most active cmake contributors rather than to keep this thread going in an already-closed issue.

@jimjag

This comment has been minimized.

Show comment
Hide comment
@jimjag

jimjag Jan 11, 2017

On 10.12.2, at least, poll works:

% uname -a
Darwin jimsys.local 16.3.0 Darwin Kernel Version 16.3.0: Thu Nov 17 20:23:58 PST 2016; root:xnu-3789.31.2~1/RELEASE_X86_64 x86_64
% ./a.out
poll() works

jimjag commented Jan 11, 2017

On 10.12.2, at least, poll works:

% uname -a
Darwin jimsys.local 16.3.0 Darwin Kernel Version 16.3.0: Thu Nov 17 20:23:58 PST 2016; root:xnu-3789.31.2~1/RELEASE_X86_64 x86_64
% ./a.out
poll() works

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Jan 11, 2017

Member

Thank for testing and confirming!

In curl we will stick with select() on macos for the foreseeable future so that the same binaries have a higher chance of working across versions.

Member

bagder commented Jan 11, 2017

Thank for testing and confirming!

In curl we will stick with select() on macos for the foreseeable future so that the same binaries have a higher chance of working across versions.

@gpshead

This comment has been minimized.

Show comment
Hide comment
@gpshead

gpshead May 7, 2017

For reference, this drop over the wall from Apple restored poll's behavior:
opensource-apple/xnu@0cccba1#diff-e61c2932bb9d5cea2dd0732acd8ec626R1783

No idea how to map that to exact MacOS 10.12 patch level.

gpshead commented May 7, 2017

For reference, this drop over the wall from Apple restored poll's behavior:
opensource-apple/xnu@0cccba1#diff-e61c2932bb9d5cea2dd0732acd8ec626R1783

No idea how to map that to exact MacOS 10.12 patch level.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 7, 2017

Member

Thanks! I updated my blog post about this bug with that info before. poll() is fixed again in 10.12.2. But since we want curl to work for all users on macos, we'll stick to select() for the time being...

Member

bagder commented May 7, 2017

Thanks! I updated my blog post about this bug with that info before. poll() is fixed again in 10.12.2. But since we want curl to work for all users on macos, we'll stick to select() for the time being...

@lock lock bot locked as resolved and limited conversation to collaborators May 6, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.