Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race in sliding semaphore #2338

Closed
biddisco opened this issue Sep 21, 2016 · 2 comments
Closed

Possible race in sliding semaphore #2338

biddisco opened this issue Sep 21, 2016 · 2 comments

Comments

@biddisco
Copy link
Contributor

Whilst experimenting with the osu_latency test using a sliding semaphore I found a problem that looks like a race in sliding_semaphore. The attached code snippet should reproduce a deadlock when run on >1 localities and >1 threads.

It looks as though the tread that calls wait on the semaphore, takes the lock and goes into a wait, but the signal to awake the thread comes in before the wait thread has actually properly begun waiting - so the notify_one call does not wake it and then it locks (because only one thread is processing actions, with a window_size>1 the problem goes away as another thread can signal a new lower_value and things awake properly).

This is only my guesswork from testing today ...

#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/lcos/local/detail/sliding_semaphore.hpp>

#include <iostream>
// -----------------------------------------------------------------------------------
double message_double(double d)
{
    return d;
}
HPX_PLAIN_ACTION(message_double);

// -----------------------------------------------------------------------------------
int main()
{
    // use the first remote locality to bounce messages, if possible
    hpx::id_type here = hpx::find_here();

    hpx::id_type there = here;
    std::vector<hpx::id_type> localities = hpx::find_remote_localities();
    if (!localities.empty())
        there = localities[0];

    std::size_t parcel_count = 0;
    std::size_t loop         = 10000;
    std::size_t window_size  = 1;
    std::size_t skip         = 50;

    hpx::lcos::local::sliding_semaphore sem(window_size,0);
    message_double_action msg;
    //
    //
    for (std::size_t i = 0; i < (loop*window_size) + skip; ++i) {
        // launch a message to the remote node
        hpx::async(msg, there, 3.5).then(
            hpx::launch::sync,
            // when the message completes, increment our semaphore count
            // so that N are always in flight
            [&,parcel_count](auto &&f) -> void {
                sem.signal(parcel_count);
                std::cout << "Signalled with value " << parcel_count << std::endl;;
            }
        );

        //
        parcel_count++;

        //
        std::cout << "Waiting with value " << parcel_count << std::endl;
        sem.wait(parcel_count);
    }

    // wait on the last message, otherwise semaphore throws an exception
    // because it is signalled, but nobody is waiting on it
    std::cout << "Waiting for final signal before exit pc is " << parcel_count <<
        " wait is " << parcel_count+window_size-1 << "\n";
    sem.wait(parcel_count + window_size - 1);
    std::cout << "Finished Waiting for final signal before exit \n";

    return 0;
}
@hkaiser
Copy link
Member

hkaiser commented Sep 22, 2016

@biddisco Could you try the fixing_2338 branch if it fixes your issues?

@hkaiser
Copy link
Member

hkaiser commented Sep 22, 2016

This has been fixed by merging #2339

@hkaiser hkaiser closed this as completed Sep 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants