
pthread_cond_timedwait error with EINVAL on macos #4

Open
skywhale opened this issue Dec 10, 2020 · 10 comments

@skywhale

This happens when I run tests::bench_ipmpsc.

$ cd ipc-benchmarks
$ RUST_BACKTRACE=1 cargo bench tests::bench_ipmpsc -- --nocapture
...
running 1 test
thread 'main' panicked at 'error receiving: Runtime("timeout_ok(libc::pthread_cond_timedwait(self.0.condition.get(),\n                                        self.0.mutex.get(), &then)) failed: 22")', src/lib.rs:107:27
stack backtrace:
   0: rust_begin_unwind
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panicking.rs:493:5
   1: std::panicking::begin_panic_fmt
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panicking.rs:435:5
   2: test::bench::ns_iter_inner
   3: test::bench::Bencher::iter
   4: ipc_benchmarks::tests::bench_ipmpsc
   5: core::ops::function::FnOnce::call_once
   6: core::ops::function::FnMut::call_mut
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/core/src/ops/function.rs:150:5
   7: test::bench::Bencher::bench
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/bench.rs:45:9
   8: test::bench::benchmark::{{closure}}
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/bench.rs:192:51
   9: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panic.rs:322:9
  10: std::panicking::try::do_call
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panicking.rs:379:40
  11: std::panicking::try
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panicking.rs:343:19
  12: std::panic::catch_unwind
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/std/src/panic.rs:396:14
  13: test::bench::benchmark
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/bench.rs:192:18
  14: test::run_test
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/lib.rs:489:13
  15: test::run_tests
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/lib.rs:339:13
  16: test::console::run_tests_console
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/console.rs:289:5
  17: test::test_main
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/lib.rs:120:15
  18: test::test_main_static
             at /rustc/1700ca07c6dd7becff85678409a5df6ad4cf4f47/library/test/src/lib.rs:139:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test tests::bench_ipmpsc            ... FAILED

The error originates at ipmpsc/src/lib.rs, lines 160 to 164 in 2b4357e:

nonzero!(timeout_ok(libc::pthread_cond_timedwait(
    self.0.condition.get(),
    self.0.mutex.get(),
    &then
)))

A quick search suggests that EINVAL is returned when tv_nsec is invalid, but that doesn't seem to be the case here. I added a println! right before the call, and I got:

tv_sec=1922927639
tv_nsec=612420000
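
For reference, the kind of sanity check meant here looks something like the sketch below (the helper name is made up and not part of ipmpsc; EINVAL is documented for an out-of-range tv_nsec, among other causes):

// Hypothetical helper: confirm the timespec passed to pthread_cond_timedwait
// is at least structurally valid, and print it, right before the call.
fn check_timespec(then: &libc::timespec) {
    assert!(then.tv_sec >= 0, "tv_sec is negative: {}", then.tv_sec);
    assert!(
        then.tv_nsec >= 0 && then.tv_nsec < 1_000_000_000,
        "tv_nsec out of range: {}",
        then.tv_nsec
    );
    println!("tv_sec={} tv_nsec={}", then.tv_sec, then.tv_nsec);
}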

Note that the error happens non-deterministically; some iterations succeed. The error does not happen on an Ubuntu machine with the same version of rustc.

Environment:
OS: macOS Catalina 10.15.7 (19H15)
rustc: nightly-2020-12-09-x86_64-apple-darwin

@skywhale
Author

skywhale commented Dec 10, 2020

@dicej I tried to investigate further, but I don't have a clue. I changed the code to give a constant abstime and it still happens. I'm more than happy to help investigate the issue if you could help shed a bit more light.

@dicej
Owner

dicej commented Dec 15, 2020

@skywhale Thanks so much for reporting this, and sorry for the delayed response. I'll take a look at this when I get a chance, probably this weekend.

@dicej
Owner

dicej commented Dec 19, 2020

I've refactored the code a bit so it's using pthread_cond_wait instead of pthread_cond_timedwait, so that at least eliminates the timespec as a factor. However, it's still failing occasionally:

thread 'main' panicked at 'error receiving: Runtime("libc::pthread_cond_wait(self.0.header().condition.get(),\n                        self.0.header().mutex.get()) failed: 22")', src/lib.rs:106:27

Seems like it doesn't like either the condition or the mutex (or both). And it's a race condition, because sometimes it runs to completion without any error. Will continue studying it.

@dicej
Owner

dicej commented Dec 19, 2020

I've also noticed that tests::aribitrary_case in the main test suite frequently deadlocks, with all threads waiting forever and never being notified. I'm guessing that's related, even if the symptom is different.

Here's my current theory: unlike on Linux, Android, and Windows, using the same interprocess mutex and/or condition variable from different memory mappings within the same process is not safe on MacOS (and maybe not on the BSDs in general). That's what the tests and benchmarks do: spawn separate threads for each receiver and sender as if they were other processes. I'm guessing there's something in the MacOS/BSD pthread implementation that assumes PTHREAD_PROCESS_SHARED mutexes and condition variables are not aliased in different memory locations within the same process, and although it usually works anyway, it doesn't always.
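
To make that scenario concrete, it has roughly the following shape (a minimal sketch under the assumptions above, not the actual test code; the path, size, and function names are illustrative, and error handling is omitted):

use std::{fs::OpenOptions, os::unix::io::AsRawFd};

// Map the shared file, treating the start of the mapping as a mutex.
unsafe fn map_mutex(fd: libc::c_int) -> *mut libc::pthread_mutex_t {
    libc::mmap(
        std::ptr::null_mut(),
        4096,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED,
        fd,
        0,
    ) as *mut libc::pthread_mutex_t
}

unsafe fn demo_aliased_mappings() {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open("/tmp/ipmpsc-demo") // placeholder path
        .unwrap();
    file.set_len(4096).unwrap();

    // Two mappings of the same file: one logical mutex, two addresses.
    let a = map_mutex(file.as_raw_fd());
    let b = map_mutex(file.as_raw_fd());

    // Initialize once, through the first mapping, as process-shared.
    let mut attr = std::mem::zeroed::<libc::pthread_mutexattr_t>();
    libc::pthread_mutexattr_init(&mut attr);
    libc::pthread_mutexattr_setpshared(&mut attr, libc::PTHREAD_PROCESS_SHARED);
    libc::pthread_mutex_init(a, &attr);

    // The theory: using the mutex through `b`, an alias of the same memory at
    // a different address, is what occasionally upsets MacOS, even though
    // POSIX allows it for PTHREAD_PROCESS_SHARED objects.
    libc::pthread_mutex_lock(b);
    libc::pthread_mutex_unlock(b);
}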

When I change the code to make the senders and receiver all use the same memory mapping, everything seems to work reliably. That defeats the purpose of the tests, though, since there's no way that can happen when using separate processes.

Next, I'm going to change tests to actually spawn separate processes instead of separate threads and see what happens.

@dicej
Owner

dicej commented Dec 23, 2020

Update: I went ahead and modified the tests and benchmarks to fork processes instead of spawning threads. Unfortunately, that didn't address the issue -- pthread_cond_wait still returns EINVAL unpredictably.

I don't see any way forward from here, unfortunately. I'll update the README to indicate that MacOS is not currently supported until someone figures out how to fix this.

@dicej
Owner

dicej commented Dec 23, 2020

This comment indicates that PTHREAD_PROCESS_SHARED has been broken in MacOS since Lion: bitcoin/bitcoin#19411 (comment).

@skywhale
Author

skywhale commented Dec 28, 2020

Thank you so much for looking into this, @dicej. It's unfortunate that macOS's POSIX compliance is only partial.

I did a bit more research, and it seems the process-shared pthread read/write lock has never been supported on macOS. Their pthread implementation seems to be based on BSD 7.4, which lacks support for the PTHREAD_PROCESS_SHARED attribute, as stated in the BUGS section of its man pages (due to the use of pointers in its structs). This was resolved in BSD 11.0, but it hasn't been picked up by the macOS SDK yet.

@remifontan

remifontan commented Jun 30, 2022

Hi,
out of curiosity I tried the repro steps on my Mac, and it is still failing with the same error on macOS 12.4.

I went down the internet rabbit hole, searching for similar pthread errors and possible alternatives. ipc-channel seems to rely on Mach ports to pass data from one process to another. Would it be possible to use Mach ports to implement an interprocess mutex? I am not familiar with this, so it may be a silly question.

Interestingly, Boost has an interprocess_mutex class, which seems to work on macOS.

However, digging a bit, it seems that on Mac the implementation falls back to using a spin mutex.

so... perhaps the same thing should be done in ipmpsc?
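
(For the sake of discussion, a process-shared spin mutex can be as simple as the sketch below. The names are made up, and a real implementation would want backoff and fairness handling; the appeal is that it holds no pointers and avoids pthreads entirely, so it can live directly in a shared memory mapping.)

use std::sync::atomic::{AtomicU32, Ordering};

// Minimal sketch of a spin mutex suitable for shared memory: a single atomic
// word, no pointers. Illustrative only, not ipmpsc or Boost code.
#[repr(C)]
pub struct SpinMutex {
    locked: AtomicU32, // 0 = unlocked, 1 = locked
}

impl SpinMutex {
    pub const fn new() -> Self {
        Self { locked: AtomicU32::new(0) }
    }

    pub fn lock(&self) {
        // Spin until we win the compare-and-swap; yield so we don't burn a
        // full core while another process holds the lock.
        while self
            .locked
            .compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
            std::thread::yield_now();
        }
    }

    pub fn unlock(&self) {
        self.locked.store(0, Ordering::Release);
    }
}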

@dicej
Owner

dicej commented Jun 30, 2022

Yes, I was wondering about some Mach-specific way to do locking, or else using named pipes just for locking (but still using the ring buffer for actually moving data around).

The spin lock approach could also work. We'd either need to port the Boost code to Rust or create a Rust wrapper for a C++ library that exports the Boost functionality using a C API Rust can talk to. The latter would be easiest, I imagine. Ideally this would be its own crate so others could reuse it.

Note that ipmpsc needs both a mutex and a condition variable implementation; Boost appears to support both of those in its interprocess spin lock suite.
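
(For what it's worth, the condition-variable half can also be approximated on top of atomics in shared memory, at the cost of polling. A rough sketch, with made-up names, ignoring counter overflow and wakeup-ordering subtleties:)

use std::sync::atomic::{AtomicU32, Ordering};

// Sketch of a polling "condition variable": a generation counter that waiters
// watch and notifiers bump. No pointers, so it can live in a shared mapping.
// Illustrative only, not ipmpsc or Boost code.
#[repr(C)]
pub struct SpinCondvar {
    generation: AtomicU32,
}

impl SpinCondvar {
    pub const fn new() -> Self {
        Self { generation: AtomicU32::new(0) }
    }

    pub fn notify_all(&self) {
        self.generation.fetch_add(1, Ordering::Release);
    }

    // `unlock` and `lock` stand in for whatever mutex the caller holds (for
    // example the spin mutex sketched earlier); they are assumptions here.
    pub fn wait(&self, unlock: impl Fn(), lock: impl Fn()) {
        let seen = self.generation.load(Ordering::Acquire);
        unlock();
        while self.generation.load(Ordering::Acquire) == seen {
            std::thread::yield_now();
        }
        lock();
    }
}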

I don't expect I'll have time to work on this myself any time soon, but I'd be happy to review a PR.

@remifontan

I'll see what I can do... no promises :-)
