RFC: Assert on probable deadlocks if the second lock isnt try_lock #5515

Merged
merged 1 commit into from Jul 23, 2015

Conversation

Projects
None yet
4 participants
@TheBlueMatt
Contributor

TheBlueMatt commented Dec 20, 2014

Just for travis

@TheBlueMatt TheBlueMatt changed the title from TRAVIS_TEST: Assert on probable deadlocks if the second lock isnt try_lock to RFC: Assert on probable deadlocks if the second lock isnt try_lock Dec 20, 2014

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Dec 20, 2014

Contributor

Since travis passes, I figured we might actually consider this...essentially, it asserts that we dont have deadlocks assuming: (a) no locking-conditions are based on data other than the current set of locked locks and (b) if we do a TRY_LOCK, and it fails, we will immediately bail out and just skip that section of code.
Since DEBUG_LOCKORDER is pretty much travis-only, if it might crash in cases that regular bitcoind might not, I dont really mind.

Contributor

TheBlueMatt commented Dec 20, 2014

Since travis passes, I figured we might actually consider this...essentially, it asserts that we dont have deadlocks assuming: (a) no locking-conditions are based on data other than the current set of locked locks and (b) if we do a TRY_LOCK, and it fails, we will immediately bail out and just skip that section of code.
Since DEBUG_LOCKORDER is pretty much travis-only, if it might crash in cases that regular bitcoind might not, I dont really mind.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Dec 20, 2014

Contributor

Mostly, the DEBUG_LOCKORDER warnings arent even shown by travis anymore (they're just hidden behind the debug.log, so the only thing that flag does is check the lock assertions), so I figure we should have some semi-automated screeching if common-path code has lock inversions.

Contributor

TheBlueMatt commented Dec 20, 2014

Mostly, the DEBUG_LOCKORDER warnings arent even shown by travis anymore (they're just hidden behind the debug.log, so the only thing that flag does is check the lock assertions), so I figure we should have some semi-automated screeching if common-path code has lock inversions.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Apr 1, 2015

Member

Concept ACK, but this makes the deadlock detection code kind of complex. I can't immediately see whether it's correct or wrong.

Member

laanwj commented Apr 1, 2015

Concept ACK, but this makes the deadlock detection code kind of complex. I can't immediately see whether it's correct or wrong.

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 1, 2015

Contributor

Hmmm....

Would this work instead? Modify sync.h CMutexLock::TryEnter to only call EnterCritical() if the lock is successfully acquired?

    bool TryEnter(const char* pszName, const char* pszFile, int nLine)
    {
        lock.try_lock();
        if (lock.owns_lock()
              EnterCritical(pszName, pszFile, nLine, (void*)(lock.mutex()), true);
        return lock.owns_lock();
    }

(or, in other words: just ignore TRY_LOCKS that don't acquire the lock)

Contributor

gavinandresen commented Apr 1, 2015

Hmmm....

Would this work instead? Modify sync.h CMutexLock::TryEnter to only call EnterCritical() if the lock is successfully acquired?

    bool TryEnter(const char* pszName, const char* pszFile, int nLine)
    {
        lock.try_lock();
        if (lock.owns_lock()
              EnterCritical(pszName, pszFile, nLine, (void*)(lock.mutex()), true);
        return lock.owns_lock();
    }

(or, in other words: just ignore TRY_LOCKS that don't acquire the lock)

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Apr 3, 2015

Contributor

@gavinandresen No, because often you see warnings because you have a lock(B) followed by a try_lock(A) and a lock(A) followed by a lock(B) at different times, you'll get a warning, but your suggestion would ignore it because you didnt actually see a deadlock.

Also, you cant just ignore try_locks entirely because it is very easy to create a deadlock with a try_lock.

Contributor

TheBlueMatt commented Apr 3, 2015

@gavinandresen No, because often you see warnings because you have a lock(B) followed by a try_lock(A) and a lock(A) followed by a lock(B) at different times, you'll get a warning, but your suggestion would ignore it because you didnt actually see a deadlock.

Also, you cant just ignore try_locks entirely because it is very easy to create a deadlock with a try_lock.

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 3, 2015

Contributor

@TheBlueMatt : maybe I'm being dense... but:

If we get: lock(B) then a successful (lock-acquired) try_lock(A), then the lockorders map will contain "B then A".
If at any point in the future lock(A) is followed by lock(B), a warning will be triggered -- my suggestion doesn't require that there be an actual deadlock.

I'm not suggesting that try_lock be ignored entirely....

Contributor

gavinandresen commented Apr 3, 2015

@TheBlueMatt : maybe I'm being dense... but:

If we get: lock(B) then a successful (lock-acquired) try_lock(A), then the lockorders map will contain "B then A".
If at any point in the future lock(A) is followed by lock(B), a warning will be triggered -- my suggestion doesn't require that there be an actual deadlock.

I'm not suggesting that try_lock be ignored entirely....

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Apr 4, 2015

Contributor

@gavinandresen Yes, and if a warning were triggered in that (beningn) case, then you'd see things crash on the new assert?

Contributor

TheBlueMatt commented Apr 4, 2015

@gavinandresen Yes, and if a warning were triggered in that (beningn) case, then you'd see things crash on the new assert?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 8, 2015

Member

Haven't spent too much time thinking about this, but having (A, B) in one place and (B, try A) in another place can never cause a deadlock. I believe that the correct checking behaviour is to add any acquired lock to the table, but never do a deadlock check when entering a try_lock.

Member

sipa commented Apr 8, 2015

Haven't spent too much time thinking about this, but having (A, B) in one place and (B, try A) in another place can never cause a deadlock. I believe that the correct checking behaviour is to add any acquired lock to the table, but never do a deadlock check when entering a try_lock.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Apr 8, 2015

Contributor

@sipa: Consider the case A, tryB, C and C, tryB, A. This is actually not a deadlock, but if you simply skip the deadlock check for B, you'll get an error.

Contributor

TheBlueMatt commented Apr 8, 2015

@sipa: Consider the case A, tryB, C and C, tryB, A. This is actually not a deadlock, but if you simply skip the deadlock check for B, you'll get an error.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Apr 8, 2015

Contributor

IIRC, there was a similar case that I saw that prompted this pull.

Contributor

TheBlueMatt commented Apr 8, 2015

IIRC, there was a similar case that I saw that prompted this pull.

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Apr 8, 2015

Contributor

@TheBlueMatt : sorry, I WAS just being dense.

I'm with @laanwj -- ACK on the concept, but I'm having trouble convincing myself the code is correct (I may just need more caffeine before thinking about it some more). It feels like there should be a simpler solution.

Contributor

gavinandresen commented Apr 8, 2015

@TheBlueMatt : sorry, I WAS just being dense.

I'm with @laanwj -- ACK on the concept, but I'm having trouble convincing myself the code is correct (I may just need more caffeine before thinking about it some more). It feels like there should be a simpler solution.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Apr 8, 2015

Member

Interesting. I agree (a,try b,c),(c,try b,a)) is not a deadlock, but it is
not something any code should be doing - it looks extremely fragile
(remove a lock, and turn it into a deadlock). So I'd rather trigger
deadlock warnings for it too.

Member

sipa commented Apr 8, 2015

Interesting. I agree (a,try b,c),(c,try b,a)) is not a deadlock, but it is
not something any code should be doing - it looks extremely fragile
(remove a lock, and turn it into a deadlock). So I'd rather trigger
deadlock warnings for it too.

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Apr 8, 2015

Contributor

IIRC we (used to) do something like that. We may not anymore, but some of the network processing stuff is a mess.

On April 8, 2015 3:46:03 PM GMT+01:00, Pieter Wuille notifications@github.com wrote:

Interesting. I agree (a,try b,c),(c,try b,a)) is not a deadlock, but it
is
not something any code should be doing - it looks extremely fragile
(remove a lock, and turn it into a deadlock). So I'd rather trigger
deadlock warnings for it too.


Reply to this email directly or view it on GitHub:
#5515 (comment)

Contributor

TheBlueMatt commented Apr 8, 2015

IIRC we (used to) do something like that. We may not anymore, but some of the network processing stuff is a mess.

On April 8, 2015 3:46:03 PM GMT+01:00, Pieter Wuille notifications@github.com wrote:

Interesting. I agree (a,try b,c),(c,try b,a)) is not a deadlock, but it
is
not something any code should be doing - it looks extremely fragile
(remove a lock, and turn it into a deadlock). So I'd rather trigger
deadlock warnings for it too.


Reply to this email directly or view it on GitHub:
#5515 (comment)

@TheBlueMatt

This comment has been minimized.

Show comment
Hide comment
@TheBlueMatt

TheBlueMatt Jul 22, 2015

Contributor

@sipa checked, turns out the deadlock detection code never called potential_deadlock_detected for TRY_LOCKs so we definitely still do require the code in this pull before we can assert on deadlocks. Also rebased and added a comment so that this is more readable.

Contributor

TheBlueMatt commented Jul 22, 2015

@sipa checked, turns out the deadlock detection code never called potential_deadlock_detected for TRY_LOCKs so we definitely still do require the code in this pull before we can assert on deadlocks. Also rebased and added a comment so that this is more readable.

@laanwj laanwj merged commit 0fcc4e1 into bitcoin:master Jul 23, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jul 23, 2015

Merge pull request #5515
0fcc4e1 Assert on probable deadlocks if the second lock isnt try_lock (Matt Corallo)

@TheBlueMatt TheBlueMatt deleted the TheBlueMatt:probable_assert branch Jul 24, 2015

laanwj added a commit that referenced this pull request Aug 6, 2015

Revert "Assert on probable deadlocks if the second lock isnt try_lock"
Disabling this for now - too many intermittent Travis issues.

This reverts commit 0fcc4e1
(pull #5515).

@str4d str4d referenced this pull request in zcash/zcash Feb 15, 2017

Merged

Bitcoin 0.12 test PRs 1 #2101

zkbot added a commit to zcash/zcash that referenced this pull request Feb 15, 2017

zkbot added a commit to zcash/zcash that referenced this pull request Feb 20, 2017

zkbot added a commit to zcash/zcash that referenced this pull request Mar 3, 2017

Auto merge of #2101 - str4d:2074-tests, r=arcalinea
Bitcoin 0.12 test PRs 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6337
- bitcoin/bitcoin#6390
- bitcoin/bitcoin#5515
- bitcoin/bitcoin#6287 (partial, remainder included in bitcoin/bitcoin#6703)
- bitcoin/bitcoin#6465

Part of #2074.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment