While experimenting with the osu_latency test using a sliding semaphore, I found a problem that looks like a race in sliding_semaphore. The attached code snippet should reproduce a deadlock when run on >1 localities with >1 threads.
It looks as though the thread that calls wait on the semaphore takes the lock and goes into a wait, but the signal meant to wake it arrives before the waiting thread has actually begun waiting properly, so the notify_one call does not wake it and the program deadlocks. (Only one thread is processing actions; with window_size>1 the problem goes away, as another thread can signal a new lower_value and things wake up properly.)
This is only my guesswork from testing today ...
#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/lcos/local/detail/sliding_semaphore.hpp>
#include <iostream>
#include <vector>
// -----------------------------------------------------------------------------------
double message_double(double d)
{
    return d;
}
HPX_PLAIN_ACTION(message_double);
// -----------------------------------------------------------------------------------
int main()
{
    // use the first remote locality to bounce messages, if possible
    hpx::id_type here = hpx::find_here();
    hpx::id_type there = here;
    std::vector<hpx::id_type> localities = hpx::find_remote_localities();
    if (!localities.empty())
        there = localities[0];

    std::size_t parcel_count = 0;
    std::size_t loop = 10000;
    std::size_t window_size = 1;
    std::size_t skip = 50;
    hpx::lcos::local::sliding_semaphore sem(window_size, 0);
    message_double_action msg;

    for (std::size_t i = 0; i < (loop * window_size) + skip; ++i) {
        // launch a message to the remote node
        hpx::async(msg, there, 3.5).then(
            hpx::launch::sync,
            // when the message completes, increment our semaphore count
            // so that N are always in flight
            [&, parcel_count](auto&& f) -> void {
                sem.signal(parcel_count);
                std::cout << "Signalled with value " << parcel_count << std::endl;
            }
        );

        parcel_count++;

        std::cout << "Waiting with value " << parcel_count << std::endl;
        sem.wait(parcel_count);
    }

    // wait on the last message, otherwise the semaphore throws an exception
    // because it is signalled, but nobody is waiting on it
    std::cout << "Waiting for final signal before exit, pc is " << parcel_count
              << " wait is " << parcel_count + window_size - 1 << "\n";
    sem.wait(parcel_count + window_size - 1);
    std::cout << "Finished waiting for final signal before exit\n";
    return 0;
}