msg/async/AsyncConnection: replace Mutex with std::mutex for performance #10340
Hmm, are you sure things are constructed so that this lockdep warning is incorrect? In general if a Connection call chain leads to a lock on another entity's Connection, it can lead back to itself. No particular objections (I didn't look at the details either); I just get nervous when I see people disabling lockdep. ;)
@gregsfortytwo yes, previously I considered forcing lockdep off for this lock. Since a lot of recent changes switch to std::mutex, I think we could use std::mutex here too.
Why is it a false alarm?
@tchaikov lockdep uses a string to record the lock trace, so if the locks of two async connections are acquired in the same thread, it is reported as a lock dependency. Actually it's not one.
@yuyuyu101, it's not just two Connections being acquired in the same thread; it's if a Connection lock is held while acquiring a lock foo, and then later it sees lock foo is held while acquiring a Connection lock. This can occur if you have two Connections, but it's usually not safe. In the SimpleMessenger the only case where this is safe is when replacing an existing Pipe with a new one on reconnect, and we do a very careful dance to make sure it's okay. If it's happening during some other case, it's probably not okay. Can you share the details of this one?
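To make the ordering rule above concrete, here is a minimal sketch of a lock-order checker. It is not Ceph's lockdep: the `OrderChecker` class and its method names are hypothetical, it tracks a single thread, and it only catches direct A-then-B versus B-then-A inversions rather than longer cycles.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified lock-order checker (single thread, direct
// inversions only). It is an illustration, not Ceph's lockdep.
class OrderChecker {
 public:
  // Call before acquiring `name`. Returns false if `name` was previously
  // held while acquiring one of the locks that are held right now.
  bool will_lock(const std::string& name) {
    bool ok = true;
    for (const auto& h : held_) {
      // taken_under_[name] lists locks acquired while `name` was held.
      // If `h` is in it, the earlier order was name -> h; now it is h -> name.
      if (taken_under_[name].count(h)) ok = false;
      taken_under_[h].insert(name);  // remember the order h -> name
    }
    held_.push_back(name);
    return ok;
  }

  // Call after releasing `name`.
  void unlocked(const std::string& name) {
    auto it = std::find(held_.begin(), held_.end(), name);
    if (it != held_.end()) held_.erase(it);
  }

 private:
  std::vector<std::string> held_;  // locks currently held
  std::map<std::string, std::set<std::string>> taken_under_;
};
```

With this sketch, taking "connection" and then "foo" records the order connection -> foo; a later attempt to take "connection" while "foo" is held makes `will_lock` return false.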
@gregsfortytwo yes, this lockdep report is the same case as SimpleMessenger's replace path. 1: (Mutex::_will_lock()+0x3b) [0x201adc7] Like this line (https://github.com/ceph/ceph/blob/master/src/msg/async/AsyncConnection.cc#L1808), in order to avoid a lockdep false alarm, we pass "false" to disable lockdep. The false alarm was introduced by this commit (e66a48f) because we now call _stop() within existing->write_lock.Lock(). So the other fix is to add more lock.Lock(false) calls in other places where this may be expected too.
Another reason why I prefer to drop Mutex instead of fixing the lockdep issue is that AsyncMessenger uses a simple and clear locking rule; it doesn't really rely on Mutex's detection...
@yuriw hope this one can also be added to testing
I don't follow your explanation, as existing->write_lock should be different from this->write_lock at the moment of the assert failure.
lgtm.
@yuyuyu101 please rebase and add the needs-qa tag back (it's conflicting with newly merged PRs)
@yuriw rebased
@yuyuyu101 thx, retrying
I pushed the build, but maybe there is something else?
Issue 16714 is triggered during the connection replacement process: the accepting connection tries to acquire another connection's lock, but all connection locks have the same name, so the lockdep map reaches the wrong conclusion.
Fixes: http://tracker.ceph.com/issues/16714
Signed-off-by: Haomai Wang <haomai@xsky.com>
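The name collision described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not Ceph code: `NamedLock` and `HeldSet` are hypothetical, and the lock name string and ids are made up for the example. The point is only that a checker keyed by lock name cannot tell two connections apart, while one keyed by a per-instance id can.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a connection's lock: every connection shares
// the same lock name but has its own id.
struct NamedLock {
  std::string name;
  int id;
};

// Tracks which keys are currently held by one thread (illustrative only).
template <typename Key>
class HeldSet {
 public:
  // Returns false if a lock with the same key is already held,
  // which a checker would report as a self-dependency.
  bool will_lock(const Key& k) {
    bool fresh = held_.count(k) == 0;
    held_.insert(k);
    return fresh;
  }

  // Drops one held entry for this key, if any.
  void unlocked(const Key& k) {
    auto it = held_.find(k);
    if (it != held_.end()) held_.erase(it);
  }

 private:
  std::multiset<Key> held_;
};
```

If an accepting connection and the existing connection it replaces both use the same name, a `HeldSet<std::string>` keyed by name flags the second acquisition, whereas a `HeldSet<int>` keyed by id accepts both. Whether the nested acquisition is actually safe still depends on the ordering concerns raised earlier in this thread.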
@yuriw done, thanks!
@yuyuyu101 looks better, thx, will be running tests now
Issue 16715 is a lockdep false alarm, so we now remove this check.
Fixes: http://tracker.ceph.com/issues/16715
Signed-off-by: Haomai Wang <haomai@xsky.com>