Fix race conditions in regression tests #899

Closed
jlippuner opened this issue Sep 28, 2013 · 6 comments

@jlippuner

I have now seen several times (using different versions of the code) that the test results are not consistent.

For example, I just compiled 93c88b4b77581452530c03fd2bcfd775cb0d5ea0 with GCC 4.8.1 and the MPI parcelport.

So I ran `make`, then `make -j 8 tests`, and got:

The following tests FAILED:
          4 - tests.regressions.actions.plain_action_dataflow_move_semantics (Failed)

Then I ran `make tests` and got:

The following tests FAILED:
          4 - tests.regressions.actions.plain_action_dataflow_move_semantics (Failed)
         30 - tests.unit.agas.local_address_rebind (Failed)

Then I ran `make -j 8 tests` and all tests passed.

Then I ran `make -j 3 tests` and all tests passed again.

No other instances of HPX were running while I ran these tests, and I didn't do anything else between runs (except compile HPX in a different ssh session in a different build directory).

Has anybody else observed behavior like this? This was a release build, and the failed tests did not output any useful information about why they failed.

@ghost ghost assigned hkaiser Sep 28, 2013
@brycelelbach
Member

This happens due to parallelism-related bugs. Many of the tests are designed to uncover pathological race conditions. Seeing some tests fail/succeed intermittently is, in a way, good, because it indicates that the tests are revealing the sort of race conditions that they are designed to reveal.

`make -j N tests` does not affect the number of cores the tests are run on: the number of cores and localities to use is specified for each test (ideally each test would specify a fraction of the total available cores instead).

@hkaiser
Member

hkaiser commented Oct 5, 2013

Well, I think we should leave that open to remind us to fix those race conditions in the first place.

@hkaiser hkaiser reopened this Oct 5, 2013
@brycelelbach
Member

I'd rather open up specific tickets for specific failures...

@jlippuner
Author

I have noticed inconsistent behavior in the following tests (not a complete list) in 0.9.7:
tests.regressions.lcos.future_hang_on_get_629
tests.unit.threads.thread
tests.regressions.lcos.after_588

@hkaiser
Member

hkaiser commented Nov 15, 2013

Yes, the first two seem to be genuine race conditions, most likely in the tests themselves.

The last one is known to fail and is just one way a particular problem manifests itself (which is well understood by now, but for which we have not found a solution yet) - see #987. The issues #993 and #1007 are probably related, and we know that #1010 and #1014 have to be fixed for this to be solved eventually.

Thanks for sharing this information, though!

@hkaiser hkaiser closed this as completed Mar 25, 2014
@hkaiser hkaiser reopened this Mar 25, 2014
@hkaiser hkaiser modified the milestones: 0.9.9, 0.9.8 Mar 25, 2014
@hkaiser
Member

hkaiser commented Oct 30, 2014

We have not seen these effects for a long time. I'll go ahead and close this ticket. Please re-open if the problem persists.

@hkaiser hkaiser closed this as completed Oct 30, 2014