This is simple test application to reproduce the deadlock in pthread_cond_singal sporadically observed in the project I'm working on. It was the boost::asio::detail::conditionally_enabled_event where the problem originally manifested itself in the project, so the test tries to mimic the boost::asio in working with the condvar.
In the presence of a high contention on a condition variable all signalling threads (producers) eventually get blocked on a futex call inside pthread_cond_signal. An attempt to take the process dump with a debugger or Google Breakpad revives the paralyzed process and the producers threads resume spontaneously without any other internal/external interference in the system. Individual producer threads can be also woken up by sending asynchronous signal (like SIGSTOP/SIGCONT), however the process returns to the paralyzed state very quickly. The problem is reproduced regularly but it may take from several minutes to several hours to observe it.
- backtraces in odroid-n2.stacks
- compact view of the call graph in odroid-n2.pdf
The test allocates threads of 2 types: producer and consumer. Producer changes the shared state and signals the condvar. Consumer waits for the shared state change on the condvar. The number of consumer threads is fixed and corresponds to the number of available CPU cores. The number of the producer threads is specified via the first argument as a number threads per single CPU core. The second and the third arguments specify the producer and consumer rate.
The most tricky part is to pick up the "right" values for the arguments. The higher is the contention the more the chances are to reproduce the issue. For example,
it took several hours on ODROID N2 to reproduce the issue with odroid-n2.sh.
On a one many-core server with the command line
./test_pthread_cond_singal_deadlock.aarch64-linux-gnu 10 500 1000
I was able to reproduce the issue in a less than 10 minutes.
- reduce the contention
- ptrace the process when the deadlock situation is observed
- call pthread_cond_signal from under the lock
Seen on ODROID N2 and other aarch64-based devices/servers.
Ubuntu 18.04
- 4.9.187
- 4.15.0
- 4.19.114
2.27