New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multithread CScheduler and reenable test #8016

Merged
merged 2 commits into from May 10, 2016

Conversation

Projects
None yet
5 participants
@paveljanik
Contributor

paveljanik commented May 6, 2016

This fixes the deadlock in the CScheduler when there are other serviceQueues waiting for the new task to be added to the taskQueue, but schedule notifies only one of them who processes it and ends. And the other ones get stucked waiting for the new task which doesn't come in at all.

Hard to reproduce, see #6540 for the discussion.

Fixes #6540.

@MarcoFalke MarcoFalke added the Tests label May 6, 2016

@fanquake

This comment has been minimized.

Show comment
Hide comment
@fanquake

fanquake May 6, 2016

Member

Will also close #8005

Member

fanquake commented May 6, 2016

Will also close #8005

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke May 6, 2016

Member

Tested ACK

28655b8:

$ for i in {1..10000}; do timeout 1 src/test/test_bitcoin --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/res ;done;cat /tmp/res|sort|uniq -c
   9630 0
    370 124

70af981:

$ rm /tmp/res; for i in {1..10000}; do timeout 1 src/test/test_bitcoin --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/res ;done;cat /tmp/res|sort|uniq -c
  10000 0

Travis issue unrelated.

Member

MarcoFalke commented May 6, 2016

Tested ACK

28655b8:

$ for i in {1..10000}; do timeout 1 src/test/test_bitcoin --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/res ;done;cat /tmp/res|sort|uniq -c
   9630 0
    370 124

70af981:

$ rm /tmp/res; for i in {1..10000}; do timeout 1 src/test/test_bitcoin --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/res ;done;cat /tmp/res|sort|uniq -c
  10000 0

Travis issue unrelated.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 6, 2016

Member

Thanks for fixing this!

Member

laanwj commented May 6, 2016

Thanks for fixing this!

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 6, 2016

Member

For some reason Travis is failing on this due to python zmq problems. I don't understand why, as there is no Python (or zmq) related change here. Will try to clear Travis' caches.

Member

laanwj commented May 6, 2016

For some reason Travis is failing on this due to python zmq problems. I don't understand why, as there is no Python (or zmq) related change here. Will try to clear Travis' caches.

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 6, 2016

Contributor

This is because my branch is based on the pre-python3 merge. Master is python3, thus there is no zmq module for python2...

Contributor

paveljanik commented May 6, 2016

This is because my branch is based on the pre-python3 merge. Master is python3, thus there is no zmq module for python2...

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 6, 2016

Contributor

Rebased should solve this, I think. Maybe travis kick can help now.

Contributor

paveljanik commented May 6, 2016

Rebased should solve this, I think. Maybe travis kick can help now.

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 6, 2016

Contributor

This helped. Hmm, shouldn't we temporary add python-zmq back to travis to workaround this?

Or should we modify the test script that requires python's zmq module to dot no anything when there is no such module installed (this could also simplify the tests setup for everyone!)? @MarcoFalke what do you think?

Contributor

paveljanik commented May 6, 2016

This helped. Hmm, shouldn't we temporary add python-zmq back to travis to workaround this?

Or should we modify the test script that requires python's zmq module to dot no anything when there is no such module installed (this could also simplify the tests setup for everyone!)? @MarcoFalke what do you think?

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke May 7, 2016

Member

@paveljanik python-zmq is already present in .travis.yml, you can't add it more than once ;)

I changed it (#7851) to fail instead, so errors are detected (and not silently ignored). The travis failure was an error due to broken cache. I think @laanwj cleared the cache and the travis issue is now fixed.

Member

MarcoFalke commented May 7, 2016

@paveljanik python-zmq is already present in .travis.yml, you can't add it more than once ;)

I changed it (#7851) to fail instead, so errors are detected (and not silently ignored). The travis failure was an error due to broken cache. I think @laanwj cleared the cache and the travis issue is now fixed.

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke May 7, 2016

Member

ut re-ACK 166e4b0

Member

MarcoFalke commented May 7, 2016

ut re-ACK 166e4b0

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 7, 2016

Contributor

@MarcoFalke there is python3-zmq, not python-zmq...

Contributor

paveljanik commented May 7, 2016

@MarcoFalke there is python3-zmq, not python-zmq...

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni May 7, 2016

Member

Does it need to signal in the catch/rethrow case as well?

Member

theuni commented May 7, 2016

Does it need to signal in the catch/rethrow case as well?

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 8, 2016

Contributor

@theuni I think rethrow case is there only for f throwing some exception. It is not a case for our multithread test. But in general, yes, I think there could be the same problem for exception throwing f.

Contributor

paveljanik commented May 8, 2016

@theuni I think rethrow case is there only for f throwing some exception. It is not a case for our multithread test. But in general, yes, I think there could be the same problem for exception throwing f.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 10, 2016

Member

@paveljanik Are you going to fix that here?

Member

laanwj commented May 10, 2016

@paveljanik Are you going to fix that here?

@paveljanik

This comment has been minimized.

Show comment
Hide comment
@paveljanik

paveljanik May 10, 2016

Contributor

No. I'm think about writing another test that will use the exception calls, but not here and not now, sorry.

Contributor

paveljanik commented May 10, 2016

No. I'm think about writing another test that will use the exception calls, but not here and not now, sorry.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 10, 2016

Member

No need to be sorry, thanks for your fix! just needed clarity about which state this pull is in.

utACK 166e4b0

Member

laanwj commented May 10, 2016

No need to be sorry, thanks for your fix! just needed clarity about which state this pull is in.

utACK 166e4b0

@laanwj laanwj merged commit 166e4b0 into bitcoin:master May 10, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request May 10, 2016

Merge #8016: Fix multithread CScheduler and reenable test
166e4b0 Notify other serviceQueue thread we are finished to prevent deadlocks. (Pavel Janík)
db18ab2 Reenable multithread scheduler test. (Pavel Janík)

MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Jun 9, 2016

nomnombtc added a commit to nomnombtc/bitcoin that referenced this pull request Nov 12, 2016

nomnombtc added a commit to nomnombtc/bitcoin that referenced this pull request Nov 12, 2016

nomnombtc added a commit to nomnombtc/bitcoin that referenced this pull request Nov 13, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment