Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang lock debug #6287

Merged
merged 3 commits into from Jul 23, 2015
Merged

Clang lock debug #6287

merged 3 commits into from Jul 23, 2015

Conversation

@theuni
Copy link
Member

theuni commented Jun 16, 2015

This is part of a much larger networking refactor.

After spending a while looking at different approaches for cleaning up the networking code, it's become apparent that some functions/classes are going to have to drop cs_main in favor of using their own fine-grained locks.

Breaking up cs_main would be a monstrous task, and would likely be even harder to review.

As a first step, I've fixed up our locking so that LOCK() and friends play nicely with clang's -Wthread-safety static-analysis option. This will allow us to document locking assumptions while verifying them at the same time. I added a794284 as an example of how this is useful.

My goal is to add the EXCLUSIVE_LOCKS_REQUIRED(cs_main)/GUARDED_BY(cs_main) as needed for functions that need to be broken out of the cs_main lock, that way locking changes can be verified much more easily.

To test:

./configure CXXFLAGS="-O2 -g -Wthread-safety" CXX=clang++ CC=clang

To see an example of a failure in action, comment out LOCK(cs_main); in FinalizeNode(). The result should be something like:

main.cpp:293:5: warning: calling function 'EraseOrphansFor' requires exclusive lock on 'cs_main' [-Wthread-safety-analysis]
    EraseOrphansFor(nodeid);
theuni added 3 commits Jun 16, 2015
…acros

This allows us to use function/variable/class attributes to specify locking
requisites, allowing problems to be detected during static analysis.

This works perfectly with newer Clang versions (tested with 3.3-3.7). For older
versions (tested 3.2), it compiles fine but spews lots of false-positives.
- rpcwallet: No need to lock twice here
- openssl: Clang doesn't understand selective lock/unlock here. Ignore it.
- CNode: Fix a legitimate (though very unlikely) locking bug.
This was chosen not because it's necessarily helpful, but because its locking
assumptions were already correct.
@Diapolo
Copy link

Diapolo commented Jun 16, 2015

This looks very nice and it's great to be able to use such static analyzers with our codebase. Looking forward to see what issues can be uncovered with this.

@sipa
Copy link
Member

sipa commented Jun 16, 2015

utACK

@fanquake
Copy link
Member

fanquake commented Jun 16, 2015

Tested building with -Wthread-safety. Warning is generated as expected when commenting out the LOCK() in FinalizeNode()

main.cpp:293:5: warning: calling function 'EraseOrphansFor' requires holding mutex 'cs_main' exclusively [-Wthread-safety-analysis]
    EraseOrphansFor(nodeid);
    ^
1 warning generated.
@fanquake
Copy link
Member

fanquake commented Jun 16, 2015

The other warnings fixed by 2b890dd

  CXX      libbitcoin_server_a-net.o
net.cpp:2059:1: warning: mutex 'cs_vSend' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
net.cpp:2020:26: note: mutex acquired here
void CNode::EndMessage() UNLOCK_FUNCTION(cs_vSend)
                         ^
./threadsafety.h:28:45: note: expanded from macro 'UNLOCK_FUNCTION'
#define UNLOCK_FUNCTION(...) __attribute__((unlock_function(__VA_ARGS__)))
                                            ^
1 warning generated.
  CXX      libbitcoin_server_a-rpcmining.o
rpcmining.cpp:463:9: warning: releasing mutex 'cs_main' that was not held
      [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(cs_main);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
rpcmining.cpp:487:5: warning: mutex 'cs_main' is not held on every path through
      here [-Wthread-safety-analysis]
    static CBlockIndex* pindexPrev;
    ^
rpcmining.cpp:479:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(cs_main);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
  CXX      libbitcoin_util_a-util.o
util.cpp:121:9: warning: releasing mutex 'ppmutexOpenSSL[i]' that was not held [-Wthread-safety-analysis]
        LEAVE_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:177:9: note: expanded from macro 'LEAVE_CRITICAL_SECTION'
        (cs).unlock();             \
        ^
util.cpp:123:1: warning: mutex 'ppmutexOpenSSL[i]' is not held on every path through here
      [-Wthread-safety-analysis]
}
^
util.cpp:119:9: note: mutex acquired here
        ENTER_CRITICAL_SECTION(*ppmutexOpenSSL[i]);
        ^
./sync.h:172:9: note: expanded from macro 'ENTER_CRITICAL_SECTION'
        (cs).lock();                                          \
        ^
2 warnings generated.
@laanwj
Copy link
Member

laanwj commented Jun 17, 2015

Concept ACK, still need to review in detail.
Very nice to see the static analysis annotations finally used.

@laanwj laanwj added the Refactoring label Jun 17, 2015
@theuni
Copy link
Member Author

theuni commented Jun 17, 2015

@laanwj There are a few details that aren't quite right, but I didn't want to change too much in this PR until there was some interest. I can take a stab at fixing them up before/after pull, whichever you'd prefer:

  • during analysis, TRY_LOCKs are interpreted as always gaining the lock. I don't think that's much of an issue in practice, since from a static point-of-view there's not much difference. Just need to remember that it won't point out cases where you TRY_LOCK and forget to test the result.
  • There's no distinction between a mutex/recursive mutex. I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

For examples of real usage, see my play tree here: https://github.com/theuni/bitcoin/commits/clang-lock-debug-more. It fleshed out several missing locks already.

@laanwj
Copy link
Member

laanwj commented Jun 19, 2015

I think that's not really an issue either, since clang points out when we're locking a mutex that's already locked (see rpcwallet in 2b890dd)

Could this be an issue if a lock is held multiple times along one code path, but only once along another code path? If you then remove the locking (according to clang's warning) in one place it may result in the lock not held anymore at all in one of the cases.

@theuni
Copy link
Member Author

theuni commented Jun 23, 2015

@laanwj I've looked deeper into this, here's a good sample of some snags we hit:

static RecursiveMutex cs_main;
static int myint = 3;
static std::vector<int*> GUARDED_BY(cs_main) vec(1,&myint);
int* func2()
{
  LOCK(cs_main);
  return vec[0];
}

void func1() EXCLUSIVE_LOCKS_REQUIRED(cs_main)
{
  func2();
  vec.clear();
}

int main()
{
    int* unguarded = func2();
    *unguarded = 0;
    LOCK(cs_main);
    func1();
    unguarded = func2();
    return unguarded;
}

Things clang doesn't complain about:

  • *unguarded = 0 works without locking cs_main. Unfortunately, we return pointers like that pretty often.
  • cs_main is locked twice in the second call to func2().

Its analysis seems to be entirely within the scope of each function. If you don't lock in func2, it complains about accessing vec, because it doesn't know that it should already be locked. You can tell it that it should assume cs_main is locked by specifying EXCLUSIVE_LOCKS_REQUIRED(cs_main), but then it complains about locking twice.

After going through and marking a few functions/variables, it becomes very clear how it all fits together. Since our mutexes are recursive anyway, the ignored double-locks aren't a problem anyway. If you really want to prevent double-locks (treat as a non-recursive mutex) you can notate the function with EXCLUSIVE_LOCKS_REQUIRED(!cs_main).

However, it does point out functions where we needlessly lock twice. rpcwallet in a794284 is a good example of that. If you have 2 scoped locks in the same function, or if you specify that a lock is required then lock again, it'll warn.

So to answer your question: if you remove a lock because it's locked twice in one path, and that leaves another path exposed, it'll warn about the new exposed path.

The basic work-flow is:

  • add GUARDED_BY(cs_foo) to a variable.
  • build. You'll get a slew of new warnings because functions aren't marked.
  • go through each warning and add EXCLUSIVE_LOCKS_REQUIRED(cs_foo) to each function where it needs to be locked before entering if that function does not lock first.
  • build again. if you have new warnings, it means that we do unlocked calls into the functions that you just notated. Potential bug squashed! Now you need to either lock inside of the called function and remove the _REQUIRED, or leave it and lock first.

So the big question is: For this to be useful, tons of functions need to be notated. Would you be ok with that? I'm happy to do the work (i've already done much of it locally), but I'm afraid that it will get pretty noisy. For what it's worth, once the current issues are cleaned up, travis could be set to fail when new problems are detected.

@gavinandresen
Copy link
Contributor

gavinandresen commented Jun 23, 2015

I vote for noisy annotations-- locking bugs are NASTY.

@theuni
Copy link
Member Author

theuni commented Jun 23, 2015

For anyone wanting to try out the semantics, here's an easy test that simulates our environment and conventions: https://gist.github.com/theuni/f5d6d3db9434ca546422.
Compile with: clang++ -std=c++11 -fsyntax-only -Wthread-safety locks.cpp

@laanwj
Copy link
Member

laanwj commented Jul 20, 2015

Is this ready for merging? If not, can we get a useful subset in?

@theuni
Copy link
Member Author

theuni commented Jul 22, 2015

Yes, this is useful as a starting point.

@laanwj laanwj merged commit a794284 into bitcoin:master Jul 23, 2015
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
laanwj added a commit that referenced this pull request Jul 23, 2015
a794284 locking: add a quick example of GUARDED_BY (Cory Fields)
2b890dd locking: fix a few small issues uncovered by -Wthread-safety (Cory Fields)
cd27bba locking: teach Clang's -Wthread-safety to cope with our scoped lock macros (Cory Fields)
zkbot added a commit to zcash/zcash that referenced this pull request Feb 15, 2017
zkbot added a commit to zcash/zcash that referenced this pull request Feb 20, 2017
zkbot added a commit to zcash/zcash that referenced this pull request Mar 3, 2017
Bitcoin 0.12 test PRs 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6337
- bitcoin/bitcoin#6390
- bitcoin/bitcoin#5515
- bitcoin/bitcoin#6287 (partial, remainder included in bitcoin/bitcoin#6703)
- bitcoin/bitcoin#6465

Part of #2074.
HLXEasy added a commit to spectrecoin/spectre that referenced this pull request Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.