Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async/AsyncConnection: replace Mutex with std::mutex for peformance #10340

Merged
merged 1 commit into from Aug 1, 2016

Conversation

yuyuyu101
Copy link
Member

Issue 16715 is a false alarm for lock deps, now we remove this check.

Fixes: http://tracker.ceph.com/issues/16715
Signed-off-by: Haomai Wang haomai@xsky.com

@gregsfortytwo
Copy link
Member

Hmm, are you sure things are constructed so that this lockdep warning is incorrect? In general if a Connection call chain leads to a lock on another entity's Connection, it can lead back to itself.
If it's a specific scenario being hit that you can prove is safe, you can also set the (default-false) "nolockdep" parameter to true; we do that in SimpleMessenger when one Pipe replaces another.

No particular objections (I didn't look at the details either); I just get nervous when I see people disabling lockdep. ;)

@yuyuyu101
Copy link
Member Author

@gregsfortytwo yes, previously I consider to force no lockdep for lock. Since we have a lot of changes switches to std::mutex, so I think we could use std::mutex here too

@tchaikov
Copy link
Contributor

Issue 16715 is a false alarm for lock deps, now we remove this check.

why it is a false alarm?

@yuyuyu101
Copy link
Member Author

@tchaikov lockdep uses string to record lock trace, if two async connection in the same thead and accquired, it will be reported as lock dep. Actually it's not.

@gregsfortytwo
Copy link
Member

@yuyuyu101, it's not just two Connections being acquired in the same thread; it's if Connection lock is held while acquiring a lock foo, and then later it sees lock foos is held while acquiring a Connection lock. This can occur if you have two Connections, but it's usually not safe. In the SimpleMessenger the only case where this is safe is when replacing an existing Pipe with a new one on reconnect, and we do a very careful dance to make sure it's okay. If it's happening during some other case, it's probably not okay. Can you share the details of this one?

@yuyuyu101
Copy link
Member Author

@gregsfortytwo yes, this lock dep is the same with SimpleMessenger's replace tag.

1: (Mutex::_will_lock()+0x3b) [0x201adc7]
2: (Mutex::Lock(bool)+0x41) [0x201ab5d]
3: (Mutex::Locker::Locker(Mutex&)+0x2f) [0x1856bf7]
4: (AsyncConnection::_stop()+0x1c3) [0x22e5d1d]
5: (AsyncConnection::handle_connect_msg(ceph_msg_connect&, ceph::buffer::list&, ceph::buffer::list&)+0x2d7e) [0x22e0fc0]
6: (AsyncConnection::_process_connection()+0x54f5) [0x22dba31]
7: (AsyncConnection::process()+0x52af) [0x22d6093]

Like this line(https://github.com/ceph/ceph/blob/master/src/msg/async/AsyncConnection.cc#L1808), in order to avoid lockdep false alarm, we pass "false" to disable lockdep.

The false alarm introduced by this commit(e66a48f) because we make _stop() call within existing->write_lock.Lock(). So the other fix is we add more lock.Lock(false) to other codes which may expected too.

@yuyuyu101
Copy link
Member Author

Another reason why I prefer to discard Mutex instead of fix lock dep issue is that AsyncMessenger uses simple and clear lock rule, it doesn't really rely on Mutex detection...

@yuyuyu101
Copy link
Member Author

@yuriw hope this one also can be added to testing

@tchaikov
Copy link
Contributor

The false alarm introduced by this commit(e66a48f) because we make _stop() call within existing->write_lock.Lock(). So the other fix is we add more lock.Lock(false) to other codes which may expected too.

i don't follow your explanation, as existing->write_lock should be different from the this->write_lock at the moment of assert failure.

@tchaikov
Copy link
Contributor

lgtm.

@yuriw
Copy link
Contributor

yuriw commented Jul 26, 2016

@yuyuyu101 pls rebase and add needs-qa tag back (its conflicting with newly merged PRs)

@yuyuyu101
Copy link
Member Author

@yuriw rebased

@yuriw
Copy link
Contributor

yuriw commented Jul 27, 2016

@yuyuyu101 thx, retrying

@yuriw
Copy link
Contributor

yuriw commented Jul 27, 2016

I pushed the build, but make check still fails http://paste2.org/tpdA3jyv, so not sure if it good for testing
@yuyuyu101 @jdurgin ^ ?

Maybe there is something else ?

The 16714 issue is caused by when replacing process. Accept connection will
try to acquire another connection's lock. But all connection's lock name
are the same. So it will result in lockdep map make wrong judgement.

Fixes: http://tracker.ceph.com/issues/16714
Signed-off-by: Haomai Wang <haomai@xsky.com>
@yuyuyu101
Copy link
Member Author

@yuriw done, thanks!

@yuriw
Copy link
Contributor

yuriw commented Jul 27, 2016

@yuyuyu101 looks better, thx, will be running tests now

@yuriw yuriw merged commit 8548a1e into ceph:master Aug 1, 2016
@yuyuyu101 yuyuyu101 deleted the wip-16715 branch August 2, 2016 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants