Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] jewel: client: dual client segfault with racing ceph_shutdown #21153

Closed
wants to merge 5 commits into from

Conversation

smithfarm
Copy link
Contributor

We can do it under the same mutex, which should be more efficient.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 01863bb)
If the cct is unregistered while other threads are flogging mutexes,
then we can hit all sorts of bugs. Ensure that we handle that
situation sanely, by checking that g_lockdep is still set after
we take the lockdep_mutex.

Also, remove an assertion from lockdep_unregister, and just turn it into
an immediate return. It's possible to have a call to
lockdep_unregister_ceph_context, and then a call to
lockdep_register_ceph_context while a mutex is being held by another
task.

In that case, it's possible the lock does not exist in the map
when we go to unregister it. That's not a bug though, just a natural
consequence of that series of actions.

Tracker: http://tracker.ceph.com/issues/20988
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 75f41a9)
The build says:

src/client/Client.cc: In member function ‘void Client::trim_caps(MetaSession*, int)’:
src/client/Client.cc:4121:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (s->caps.size() > max)
       ~~~~~~~~~~~~~~~^~~~~

Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit e057b67)
Spawn threads that bring up a bunch of ceph_mounts with individual
CephContext objects, and then tear them down in parallel.

Tracker: http://tracker.ceph.com/issues/20988
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 8252f31)
...and make g_lockdep a bool.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 0cd0bd7)

Conflicts:
	src/test/common/test_mutex.cc (file does not exist in jewel)
@smithfarm smithfarm self-assigned this Mar 31, 2018
@smithfarm smithfarm added this to the jewel milestone Mar 31, 2018
@smithfarm smithfarm added cephfs Ceph File System and removed core labels Mar 31, 2018
@smithfarm
Copy link
Contributor Author

Does this need rados suite as well as fs?

@jcsp
Copy link
Contributor

jcsp commented Apr 2, 2018

@smithfarm probably just need the rados/verify bit rather than the whole rados suite

@smithfarm smithfarm changed the title jewel: client: dual client segfault with racing ceph_shutdown [DNM] jewel: client: dual client segfault with racing ceph_shutdown Apr 4, 2018
@smithfarm
Copy link
Contributor Author

This PR adds ShutdownRace test case to ceph_test_libcephfs, but it appears to segfault every time: http://tracker.ceph.com/issues/23556

Removing this PR from the integration branch and retesting.

@smithfarm
Copy link
Contributor Author

This PR passed a rados suite at http://tracker.ceph.com/issues/21742#note-21

@batrick
Copy link
Member

batrick commented Apr 4, 2018

@batrick batrick closed this Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix cephfs Ceph File System DNM
Projects
None yet
4 participants