New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang lock debug #6287

Merged
merged 3 commits into from Jul 23, 2015

Conversation

Projects
None yet
6 participants
@theuni
Member

theuni commented Jun 16, 2015

This is part of a much larger networking refactor.

After spending a while looking at different approaches for cleaning up the networking code, it's become apparent that some functions/classes are going to have to drop cs_main in favor of using their own fine-grained locks.

Breaking up cs_main would be a monstrous task, and would likely be even harder to review.

As a first step, I've fixed up our locking so that LOCK() and friends play nicely with clang's -Wthread-safety static-analysis option. This will allow us to document locking assumptions while verifying them at the same time. I added a794284 as an example of how this is useful.

My goal is to add the EXCLUSIVE_LOCKS_REQUIRED(cs_main)/GUARDED_BY(cs_main) as needed for functions that need to be broken out of the cs_main lock, that way locking changes can be verified much more easily.

To test:

./configure CXXFLAGS="-O2 -g -Wthread-safety" CXX=clang++ CC=clang

To see an example of a failure in action, comment out LOCK(cs_main); in FinalizeNode(). The result should be something like:

main.cpp:293:5: warning: calling function 'EraseOrphansFor' requires exclusive lock on 'cs_main' [-Wthread-safety-analysis]
    EraseOrphansFor(nodeid);

theuni added some commits Jun 16, 2015

locking: teach Clang's -Wthread-safety to cope with our scoped lock m…
…acros

This allows us to use function/variable/class attributes to specify locking
requisites, allowing problems to be detected during static analysis.

This works perfectly with newer Clang versions (tested with 3.3-3.7). For older
versions (tested 3.2), it compiles fine but spews lots of false-positives.
locking: fix a few small issues uncovered by -Wthread-safety
- rpcwallet: No need to lock twice here
- openssl: Clang doesn't understand selective lock/unlock here. Ignore it.
- CNode: Fix a legitimate (though very unlikely) locking bug.
locking: add a quick example of GUARDED_BY
This was chosen not because it's necessarily helpful, but because its locking
assumptions were already correct.
@Diapolo

This comment has been minimized.

Show comment
Hide comment
@Diapolo

Diapolo Jun 16, 2015

This looks very nice and it's great to be able to use such static analyzers with our codebase. Looking forward to see what issues can be uncovered with this.

Diapolo commented Jun 16, 2015

This looks very nice and it's great to be able to use such static analyzers with our codebase. Looking forward to see what issues can be uncovered with this.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Jun 16, 2015

Member

utACK

Member

sipa commented Jun 16, 2015

utACK

@fanquake

This comment has been minimized.

Show comment
Hide comment
@fanquake

fanquake Jun 16, 2015

Member

Tested building with -Wthread-safety. Warning is generated as expected when commenting out the LOCK() in FinalizeNode()

main.cpp:293:5: warning: calling function 'EraseOrphansFor' requires holding mutex 'cs_main' exclusively [-Wthread-safety-analysis]
    EraseOrphansFor(nodeid);
    ^
1 warning generated.
Member

fanquake commented Jun 16, 2015

Tested building with -Wthread-safety. Warning is generated as expected when commenting out the LOCK() in FinalizeNode()

main.cpp:293:5: warning: calling function 'EraseOrphansFor' requires holding mutex 'cs_main' exclusively [-Wthread-safety-analysis]
    EraseOrphansFor(nodeid);
    ^
1 warning generated.
@fanquake

This comment has been minimized.

Show comment
Hide comment
@fanquake

fanquake Jun 16, 2015

Member

The other warnings fixed by 2b890dd

  CXX      libbitcoin_server_a-net.o
net.cpp:2059:1: warning: mutex 'cs_vSend' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
net.cpp:2020:26: note: mutex acquired here
void CNode::EndMessage() UNLOCK_FUNCTION(cs_vSend)
                         ^
./threadsafety.h:28:45: note: expanded from macro 'UNLOCK_FUNCTION'
#define UNLOCK_FUNCTION(...) __attribute__((unlock_function(__VA_ARGS__)))
                                            ^
1 warning generated.
  CXX      libbitcoin_server_a-rpcmining.o
rpcmining.cpp:463:9: warning: releasing mutex 'cs_main' that was not held
      [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(cs_main);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
rpcmining.cpp:487:5: warning: mutex 'cs_main' is not held on every path through
      here [-Wthread-safety-analysis]
    static CBlockIndex* pindexPrev;
    ^
rpcmining.cpp:479:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(cs_main);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
  CXX      libbitcoin_util_a-util.o
util.cpp:121:9: warning: releasing mutex 'ppmutexOpenSSL[i]' that was not held [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
util.cpp:123:1: warning: mutex 'ppmutexOpenSSL[i]' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
util.cpp:119:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
Member

fanquake commented Jun 16, 2015

The other warnings fixed by 2b890dd

  CXX      libbitcoin_server_a-net.o
net.cpp:2059:1: warning: mutex 'cs_vSend' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
net.cpp:2020:26: note: mutex acquired here
void CNode::EndMessage() UNLOCK_FUNCTION(cs_vSend)
                         ^
./threadsafety.h:28:45: note: expanded from macro 'UNLOCK_FUNCTION'
#define UNLOCK_FUNCTION(...) __attribute__((unlock_function(__VA_ARGS__)))
                                            ^
1 warning generated.
  CXX      libbitcoin_server_a-rpcmining.o
rpcmining.cpp:463:9: warning: releasing mutex 'cs_main' that was not held
      [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(cs_main);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
rpcmining.cpp:487:5: warning: mutex 'cs_main' is not held on every path through
      here [-Wthread-safety-analysis]
    static CBlockIndex* pindexPrev;
    ^
rpcmining.cpp:479:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(cs_main);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
  CXX      libbitcoin_util_a-util.o
util.cpp:121:9: warning: releasing mutex 'ppmutexOpenSSL[i]' that was not held [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
util.cpp:123:1: warning: mutex 'ppmutexOpenSSL[i]' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
util.cpp:119:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 17, 2015

Member

Concept ACK, still need to review in detail.
Very nice to see the static analysis annotations finally used.

Member

laanwj commented Jun 17, 2015

Concept ACK, still need to review in detail.
Very nice to see the static analysis annotations finally used.

@laanwj laanwj added the Refactoring label Jun 17, 2015

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 17, 2015

Member

@laanwj There are a few details that aren't quite right, but I didn't want to change too much in this PR until there was some interest. I can take a stab at fixing them up before/after pull, whichever you'd prefer:

  • during analysis, TRY_LOCKs are interpreted as always gaining the lock. I don't think that's much of an issue in practice, since from a static point-of-view there's not much difference. Just need to remember that it won't point out cases where you TRY_LOCK and forget to test the result.
  • There's no distinction between a mutex/recursive mutex. I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

For examples of real usage, see my play tree here: https://github.com/theuni/bitcoin/commits/clang-lock-debug-more. It fleshed out several missing locks already.

Member

theuni commented Jun 17, 2015

@laanwj There are a few details that aren't quite right, but I didn't want to change too much in this PR until there was some interest. I can take a stab at fixing them up before/after pull, whichever you'd prefer:

  • during analysis, TRY_LOCKs are interpreted as always gaining the lock. I don't think that's much of an issue in practice, since from a static point-of-view there's not much difference. Just need to remember that it won't point out cases where you TRY_LOCK and forget to test the result.
  • There's no distinction between a mutex/recursive mutex. I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

For examples of real usage, see my play tree here: https://github.com/theuni/bitcoin/commits/clang-lock-debug-more. It fleshed out several missing locks already.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 19, 2015

Member

I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

Could this be an issue if a lock is held multiple times along one code path, but only once along another code path? If you then remove the locking (according to clang's warning) in one place it may result in the lock not held anymore at all in one of the cases.

Member

laanwj commented Jun 19, 2015

I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

Could this be an issue if a lock is held multiple times along one code path, but only once along another code path? If you then remove the locking (according to clang's warning) in one place it may result in the lock not held anymore at all in one of the cases.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 23, 2015

Member

@laanwj I've looked deeper into this, here's a good sample of some snags we hit:

static RecursiveMutex cs_main;
static int myint = 3;
static std::vector<int*> GUARDED_BY(cs_main) vec(1,&myint);
int* func2()
{
  LOCK(cs_main);
  return vec[0];
}

void func1() EXCLUSIVE_LOCKS_REQUIRED(cs_main)
{
  func2();
  vec.clear();
}

int main()
{
    int* unguarded = func2();
    *unguarded = 0;
    LOCK(cs_main);
    func1();
    unguarded = func2();
    return unguarded;
}

Things clang doesn't complain about:

  • *unguarded = 0 works without locking cs_main. Unfortunately, we return pointers like that pretty often.
  • cs_main is locked twice in the second call to func2().

Its analysis seems to be entirely within the scope of each function. If you don't lock in func2, it complains about accessing vec, because it doesn't know that it should already be locked. You can tell it that it should assume cs_main is locked by specifying EXCLUSIVE_LOCKS_REQUIRED(cs_main), but then it complains about locking twice.

After going through and marking a few functions/variables, it becomes very clear how it all fits together. Since our mutexes are recursive anyway, the ignored double-locks aren't a problem anyway. If you really want to prevent double-locks (treat as a non-recursive mutex) you can notate the function with EXCLUSIVE_LOCKS_REQUIRED(!cs_main).

However, it does point out functions where we needlessly lock twice. rpcwallet in a794284 is a good example of that. If you have 2 scoped locks in the same function, or if you specify that a lock is required then lock again, it'll warn.

So to answer your question: if you remove a lock because it's locked twice in one path, and that leaves another path exposed, it'll warn about the new exposed path.

The basic work-flow is:

  • add GUARDED_BY(cs_foo) to a variable.
  • build. You'll get a slew of new warnings because functions aren't marked.
  • go through each warning and add EXCLUSIVE_LOCKS_REQUIRED(cs_foo) to each function where it needs to be locked before entering if that function does not lock first.
  • build again. if you have new warnings, it means that we do unlocked calls into the functions that you just notated. Potential bug squashed! Now you need to either lock inside of the called function and remove the _REQUIRED, or leave it and lock first.

So the big question is: For this to be useful, tons of functions need to be notated. Would you be ok with that? I'm happy to do the work (i've already done much of it locally), but I'm afraid that it will get pretty noisy. For what it's worth, once the current issues are cleaned up, travis could be set to fail when new problems are detected.

Member

theuni commented Jun 23, 2015

@laanwj I've looked deeper into this, here's a good sample of some snags we hit:

static RecursiveMutex cs_main;
static int myint = 3;
static std::vector<int*> GUARDED_BY(cs_main) vec(1,&myint);
int* func2()
{
  LOCK(cs_main);
  return vec[0];
}

void func1() EXCLUSIVE_LOCKS_REQUIRED(cs_main)
{
  func2();
  vec.clear();
}

int main()
{
    int* unguarded = func2();
    *unguarded = 0;
    LOCK(cs_main);
    func1();
    unguarded = func2();
    return unguarded;
}

Things clang doesn't complain about:

  • *unguarded = 0 works without locking cs_main. Unfortunately, we return pointers like that pretty often.
  • cs_main is locked twice in the second call to func2().

Its analysis seems to be entirely within the scope of each function. If you don't lock in func2, it complains about accessing vec, because it doesn't know that it should already be locked. You can tell it that it should assume cs_main is locked by specifying EXCLUSIVE_LOCKS_REQUIRED(cs_main), but then it complains about locking twice.

After going through and marking a few functions/variables, it becomes very clear how it all fits together. Since our mutexes are recursive anyway, the ignored double-locks aren't a problem anyway. If you really want to prevent double-locks (treat as a non-recursive mutex) you can notate the function with EXCLUSIVE_LOCKS_REQUIRED(!cs_main).

However, it does point out functions where we needlessly lock twice. rpcwallet in a794284 is a good example of that. If you have 2 scoped locks in the same function, or if you specify that a lock is required then lock again, it'll warn.

So to answer your question: if you remove a lock because it's locked twice in one path, and that leaves another path exposed, it'll warn about the new exposed path.

The basic work-flow is:

  • add GUARDED_BY(cs_foo) to a variable.
  • build. You'll get a slew of new warnings because functions aren't marked.
  • go through each warning and add EXCLUSIVE_LOCKS_REQUIRED(cs_foo) to each function where it needs to be locked before entering if that function does not lock first.
  • build again. if you have new warnings, it means that we do unlocked calls into the functions that you just notated. Potential bug squashed! Now you need to either lock inside of the called function and remove the _REQUIRED, or leave it and lock first.

So the big question is: For this to be useful, tons of functions need to be notated. Would you be ok with that? I'm happy to do the work (i've already done much of it locally), but I'm afraid that it will get pretty noisy. For what it's worth, once the current issues are cleaned up, travis could be set to fail when new problems are detected.

@gavinandresen

This comment has been minimized.

Show comment
Hide comment
@gavinandresen

gavinandresen Jun 23, 2015

Contributor

I vote for noisy annotations-- locking bugs are NASTY.

Contributor

gavinandresen commented Jun 23, 2015

I vote for noisy annotations-- locking bugs are NASTY.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jun 23, 2015

Member

For anyone wanting to try out the semantics, here's an easy test that simulates our environment and conventions: https://gist.github.com/theuni/f5d6d3db9434ca546422.
Compile with: clang++ -std=c++11 -fsyntax-only -Wthread-safety locks.cpp

Member

theuni commented Jun 23, 2015

For anyone wanting to try out the semantics, here's an easy test that simulates our environment and conventions: https://gist.github.com/theuni/f5d6d3db9434ca546422.
Compile with: clang++ -std=c++11 -fsyntax-only -Wthread-safety locks.cpp

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jul 20, 2015

Member

Is this ready for merging? If not, can we get a useful subset in?

Member

laanwj commented Jul 20, 2015

Is this ready for merging? If not, can we get a useful subset in?

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Jul 22, 2015

Member

Yes, this is useful as a starting point.

Member

theuni commented Jul 22, 2015

Yes, this is useful as a starting point.

@laanwj laanwj merged commit a794284 into bitcoin:master Jul 23, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jul 23, 2015

Merge pull request #6287
a794284 locking: add a quick example of GUARDED_BY (Cory Fields)
2b890dd locking: fix a few small issues uncovered by -Wthread-safety (Cory Fields)
cd27bba locking: teach Clang's -Wthread-safety to cope with our scoped lock macros (Cory Fields)

@str4d str4d referenced this pull request Feb 15, 2017

Merged

Bitcoin 0.12 test PRs 1 #2101

zkbot added a commit to zcash/zcash that referenced this pull request Feb 15, 2017

zkbot added a commit to zcash/zcash that referenced this pull request Feb 20, 2017

zkbot added a commit to zcash/zcash that referenced this pull request Mar 3, 2017

Auto merge of #2101 - str4d:2074-tests, r=arcalinea
Bitcoin 0.12 test PRs 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6337
- bitcoin/bitcoin#6390
- bitcoin/bitcoin#5515
- bitcoin/bitcoin#6287 (partial, remainder included in bitcoin/bitcoin#6703)
- bitcoin/bitcoin#6465

Part of #2074.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment