Fix race conditions in regression tests #899

Closed
jlippuner opened this issue Sep 28, 2013 · 6 comments

@jlippuner

I have now seen several times (using different versions of the code) that the test results are not consistent.

For example, I just compiled 93c88b4b77581452530c03fd2bcfd775cb0d5ea0 with GCC 4.8.1 and the MPI parcelport.

So I ran `make`, then `make -j 8 tests`, and got:

The following tests FAILED:
          4 - tests.regressions.actions.plain_action_dataflow_move_semantics (Failed)

Then I ran `make tests` and got:

The following tests FAILED:
          4 - tests.regressions.actions.plain_action_dataflow_move_semantics (Failed)
         30 - tests.unit.agas.local_address_rebind (Failed)

Then I ran `make -j 8 tests` and all tests passed.

Then I ran `make -j 3 tests` and all tests passed again.

No other instances of HPX were running while I ran these tests, and I didn't do anything else between runs (except compile HPX in a different ssh session in a different build directory).

Has anybody else observed behavior like this? This was a release build, and the failed tests did not output any useful information about why they failed.

@ghost ghost assigned hkaiser Sep 28, 2013
@brycelelbach
Member

This happens due to parallelism-related bugs. Many of the tests are designed to uncover pathological race conditions. Seeing some tests fail/succeed intermittently is, in a way, good, because it indicates that the tests are revealing the sort of race conditions that they are designed to reveal.

`make -j N tests` does not affect the number of cores the tests are run on: the number of cores and localities to use is specified for each test (ideally each test would specify a fraction of the total available cores instead).

@hkaiser
Member

hkaiser commented Oct 5, 2013

Well, I think we should leave that open to remind us to fix those race conditions in the first place.

@hkaiser hkaiser reopened this Oct 5, 2013
@brycelelbach
Member

I'd rather open up specific tickets for specific failures...

@jlippuner
Author

I have noticed inconsistent behavior in the following tests (not a complete list) in 0.9.7:
tests.regressions.lcos.future_hang_on_get_629
tests.unit.threads.thread
tests.regressions.lcos.after_588

@hkaiser
Member

hkaiser commented Nov 15, 2013

Yes, the first two seem to be genuine race conditions, most likely in the tests themselves.

The last one is known to fail and is just one way a particular problem manifests itself (which is well understood by now, but for which we have not found a solution yet) - see #987. The issues #993 and #1007 are probably related, and we know that #1010 and #1014 have to be fixed for this to be solved eventually.

Thanks for sharing this information, though!

@hkaiser hkaiser closed this as completed Mar 25, 2014
@hkaiser hkaiser reopened this Mar 25, 2014
@hkaiser hkaiser modified the milestones: 0.9.9, 0.9.8 Mar 25, 2014
@hkaiser
Member

hkaiser commented Oct 30, 2014

We have not seen these effects for a long time. I'll go ahead and close this ticket. Please re-open if the problem persists.

@hkaiser hkaiser closed this as completed Oct 30, 2014