Race condition suspected in runtime #2155

Closed
biddisco opened this issue May 12, 2016 · 12 comments

Comments

@biddisco
Contributor

biddisco commented May 12, 2016

//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
//
// launch with --hpx:threads=8 or more
//
// OSX : set seed to 3227422469
// Linux : set seed to 2835337046
//
// If it does not crash, set the seed to (unsigned int)std::time(0) + std::rand()
// and be patient; it throws an exception eventually. Then retry using the printed seed.

#include <hpx/hpx_init.hpp>
#include <hpx/hpx.hpp>
#include <hpx/include/parallel_copy.hpp>
#include <hpx/util/lightweight_test.hpp>

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

#include <tests/unit/parallel/algorithms/test_utils.hpp>


#define SEED 3227422469
//#define SEED 2835337046
//#define SEED (unsigned int)std::time(0) + std::rand()

//#define PREFIX_SCAN 1

#ifdef PREFIX_SCAN
# include <hpx/parallel/algorithms/prefix_copy_if.hpp>
# define COPY_X prefix_copy_if
#else
# define COPY_X copy_if
#endif


////////////////////////////////////////////////////////////////////////////
template <typename ExPolicy>
void test_copy_if(ExPolicy policy)
{
    static_assert(
        hpx::parallel::is_execution_policy<ExPolicy>::value,
        "hpx::parallel::is_execution_policy<ExPolicy>::value");

    typedef std::vector<int>::iterator base_iterator;
    typedef test::test_iterator<base_iterator, std::forward_iterator_tag> iterator;

    std::vector<int> c(10007);
    std::vector<int> d(c.size());
    auto middle = boost::begin(c) + c.size()/2;
    std::iota(boost::begin(c), middle, std::rand());
    std::fill(middle, boost::end(c), -1);

    hpx::parallel::COPY_X(policy,
        iterator(boost::begin(c)), iterator(boost::end(c)),
        boost::begin(d), [](int i){ return !(i < 0); });

    std::size_t count = 0;
    HPX_TEST(std::equal(boost::begin(c), middle, boost::begin(d),
        [&count](int v1, int v2) -> bool {
            HPX_TEST_EQ(v1, v2);
            ++count;
            if (v1!=v2)
              throw std::string("help");
            return v1 == v2;
        }));

    HPX_TEST(std::equal(middle, boost::end(c),
        boost::begin(d) + d.size()/2,
        [&count](int v1, int v2) -> bool {
            HPX_TEST_NEQ(v1, v2);
            ++count;
            if (v1==v2)
              throw std::string("help");
            return v1!=v2;
        }));

    HPX_TEST_EQ(count, d.size());
}

int hpx_main(boost::program_options::variables_map& vm)
{
    hpx::util::high_resolution_timer t;
    do {
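      unsigned int seed = SEED;
      // honour --seed / -s if it was given on the command line
      if (vm.count("seed"))
          seed = vm["seed"].as<unsigned int>();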
      std::cout << "using seed: " << seed << std::endl;
      std::srand(seed);
      test_copy_if(hpx::parallel::par);
    } while (t.elapsed()<300);

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // add command line option which controls the random number generator seed
    using namespace boost::program_options;
    options_description desc_commandline(
        "Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()
        ("seed,s", value<unsigned int>(),
        "the random number generator seed to use for this run")
        ;

    // By default this test should run on all available cores
    std::vector<std::string> cfg;
    cfg.push_back("hpx.os_threads=" +
        std::to_string(hpx::threads::hardware_concurrency()));

    // Initialize and run HPX
    HPX_TEST_EQ_MSG(hpx::init(desc_commandline, argc, argv, cfg), 0,
        "HPX main exited with non-zero status");

    return hpx::util::report_errors();
}
@biddisco
Contributor Author

The code snippet above (assuming it contains no errors of its own) reliably reproduces, for me, on two machines (OS X and Linux), an error in the copy_if result, which I believe is caused by a race condition somewhere in HPX.

@biddisco
Contributor Author

biddisco commented May 12, 2016

Note: this is a very high priority for me. When I run the benchmark suite for vtkm, it runs thousands of iterations of the algorithms, and during these runs I get random segfaults/errors that I cannot explain by faulty algorithms. The HPX unit tests usually run only a small number of iterations and show no sign of this error.

Edit: git commit 483b8a9 is the base for this test.

Edit 2: the fact that a particular seed triggers the error might mean that the algorithm is broken somewhere, rather than that there is a race condition?

@biddisco
Contributor Author

This is nonsense. It gives an error even with the seq policy and --hpx:threads=1, so I must have made a mistake in the checks.

@biddisco
Contributor Author

The error is:
/Users/biddisco/src/hvtkm/testing/copyif.cpp(62): test 'v1 == v2' failed in function 'auto test_copy_if(hpx::parallel::v1::sequential_execution_policy)::(anonymous class)::operator()(int, int) const': '-2147483648' != '0'

which, sign-extended to 64 bits, is actually 0xFFFFFFFF80000000: some sort of overflow is taking place.

@AntonBikineev
Contributor

AntonBikineev commented May 12, 2016

@biddisco the error happens when you fill the source vector with iota: the values overflow a signed int, which is actually undefined behaviour.

@biddisco
Contributor Author

So a static_cast<> should fix it...

@AntonBikineev
Contributor

AntonBikineev commented May 12, 2016

@biddisco: the fix might be to do something like

    std::iota(boost::begin(c), middle, std::abs(std::rand() - static_cast<int>(c.size())));

:)

@zao
Contributor

zao commented May 12, 2016

As a side note, MSVC's RAND_MAX (32767) is criminally low and, I reckon, masks the problem in this snippet.

@biddisco
Contributor Author

biddisco commented May 12, 2016

@AntonBikineev I put in

    std::iota(boost::begin(c), middle, std::rand() % 65535);

and the error goes away. This is deeply troubling, as it means the error was a trivial buffer-fill bug, and it does not help me solve the other (presumably unrelated) segfaults I have been getting.
Thanks for spotting the problem. I don't see why rand() isn't converted correctly to an int by std::iota, but at least I can stop wasting time on this.

@AntonBikineev
Contributor

AntonBikineev commented May 12, 2016

@biddisco are you still getting segfaults? I've tried the code on OS X and Linux with --hpx:threads=8 and couldn't reproduce.

@biddisco
Contributor Author

I used the iota fix above and the errors are gone. The segfaults are in my main benchmarking code, and I tried to reproduce them by making a cut-down test based on the copy_if unit test. When I got errors with that test I thought 'aha', this might be a symptom of the same underlying problem (my code calls copy_if etc.), but it seems I was chasing a red herring.
I've restarted my benchmarks and will see whether they complete without error (they won't).

@biddisco
Contributor Author

Closing this issue: although I still suspect races in the runtime, the test above was not relevant.

4 participants