jewel: fs: client: add missing client_lock for get_root #10921

Merged
merged 1 commit into ceph:jewel on Oct 11, 2016


@jcsp
Contributor
jcsp commented Aug 30, 2016 edited

Backport tracker issue: http://tracker.ceph.com/issues/17206

A segmentation fault was observed in a recent test failure
(test_client_cache_size) [1]. An analysis of the core dump revealed that the
client->root inode was NULL. The client log [2] showed that the root inode was
deleted prior to the segfault:

2016-06-27 14:24:13.358500 7fe75cff9700 20 client.4161 _ll_put 0x7fe76000ba40 100000003e9 1 -> 0
2016-06-27 14:24:13.358502 7fe75cff9700 10 client.4161 put_inode on 100000003e9.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 mtime=2016-06-27 14:24:10.700562 caps=pAsLsXsFs(0=pAsLsXsFs) 0x7fe76000ba40)
2016-06-27 14:24:13.358513 7fe75cff9700 15 inode.put on 0x7fe76000ba40 100000003e9.head now 0
2016-06-27 14:24:13.358515 7fe75cff9700 10 client.4161 remove_cap mds.0 on 100000003e9.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 mtime=2016-06-27 14:24:10.700562 caps=pAsLsXsFs(0=pAsLsXsFs) 0x7fe76000ba40)
2016-06-27 14:24:13.358538 7fe75cff9700 15 client.4161 remove_cap last one, closing snaprealm 0x7fe76000c1b0
2016-06-27 14:24:13.358544 7fe75cff9700 20 client.4161 put_snap_realm 1 0x7fe76000c1b0 1002 -> 1001
2016-06-27 14:24:13.358546 7fe75cff9700 10 client.4161 put_inode deleting 100000003e9.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 mtime=2016-06-27 14:24:10.700562 caps=- 0x7fe76000ba40)

After looking through the reference-count increments and decrements, I noticed this mismatched pair:

2016-06-27 14:24:13.352622 7fe75dffb700  3 client.4161 ll_forget 100000003e9 1
2016-06-27 14:24:13.352601 7fe7767fc700  1 -- 172.21.15.58:0/3762258427 >> 172.21.15.57:6804/11121 conn(0x7fe789bf3b60 sd=20 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=0). == rx == mds.0 seq 1473 0x7fe764037a80 client_reply(???:470 = 0 (0) Success safe) v1
2016-06-27 14:24:13.352665 7fe75dffb700 20 client.4161 _ll_put 0x7fe76000ba40 100000003e9 1 -> 1
2016-06-27 14:24:13.352687 7fe75d7fa700  3 client.4161 ll_lookup 0x7fe76000ba40 testdir
2016-06-27 14:24:13.352699 7fe75d7fa700 20 client.4161 _lookup have dn testdir mds.-1 ttl 0.000000 seq 0
2016-06-27 14:24:13.352705 7fe75d7fa700 15 inode.get on 0x7fe760011da0 100000003ea.head now 537
2016-06-27 14:24:13.352693 7fe76effd700  1 -- 172.21.15.58:0/3762258427 <== mds.0 172.21.15.57:6804/11121 1473 ==== client_reply(???:470 = 0 (0) Success safe) v1 ==== 27+0+0 (3458149698 0 0) 0x7fe764037a80 con 0x7fe789bf3b60
2016-06-27 14:24:13.352708 7fe75d7fa700 10 client.4161 _lookup 100000003e9.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={} open={} mode=41777 size=0/0 mtime=2016-06-27 14:24:10.700562 caps=pAsLsXsFs(0=pAsLsXsFs) 0x7fe76000ba40) testdir = 100000003ea.head(faked_ino=0 ref=537 ll_ref=999 cap_refs={} open={} mode=40755 size=0/0 mtime=2016-06-27 14:24:13.325297 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7fe750001f50 0x7fe760011da0)

This sequence of ll_forget/ll_lookup is repeated in this test several hundred times. This is the prior sequence (in one thread):

2016-06-27 14:24:13.324896 7fe75dffb700  3 client.4161 ll_forget 100000003e9 1
2016-06-27 14:24:13.324904 7fe75dffb700 20 client.4161 _ll_put 0x7fe76000ba40 100000003e9 1 -> 1
2016-06-27 14:24:13.324915 7fe75dffb700  3 client.4161 ll_lookup 0x7fe76000ba40 testdir
2016-06-27 14:24:13.324921 7fe75dffb700 20 client.4161 _lookup have dn testdir mds.-1 ttl 0.000000 seq 0
2016-06-27 14:24:13.324931 7fe75dffb700 15 inode.get on 0x7fe760011da0 100000003ea.head now 621
2016-06-27 14:24:13.324904 7fe7767fc700  1 -- 172.21.15.58:0/3762258427 >> 172.21.15.57:6804/11121 conn(0x7fe789bf3b60 sd=20 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=0). == rx == mds.0 seq 1387 0x7fe764026720 client_reply(???:385 = 0 (0) Success safe) v1
2016-06-27 14:24:13.324934 7fe75dffb700 10 client.4161 _lookup 100000003e9.head(faked_ino=0 ref=3 ll_ref=2 cap_refs={} open={} mode=41777 size=0/0 mtime=2016-06-27 14:24:10.700562 caps=pAsLsXsFs(0=pAsLsXsFs) 0x7fe76000ba40) testdir = 100000003ea.head(faked_ino=0 ref=621 ll_ref=998 cap_refs={} open={} mode=40755 size=0/0 mtime=2016-06-27 14:24:13.321021 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7fe750001f50 0x7fe760011da0)

The key line is `_lookup 100000003e9.head(...)`. ll_ref=2 is the expected count
after this sequence but, as we see at 14:24:13.352708, ll_ref=1.

This racing behavior is not serialized by client_lock because
Client::get_root() does not take the client_lock. Additionally, the race is
hard to spot in the logs because Client::get_root() does not use
Client::_ll_get() to increment the reference; instead it increments the
reference directly via Inode::ll_get(), so the increments never appear in the
logs.

This commit takes the client_lock in Client::get_root() and
Client::get_root_ino() (the latter may not strictly be necessary, but it
can't hurt).
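
For reference, a rough sketch of the shape of the fix, assuming it simply wraps the existing bodies in the lock (see commit 208e3b6 for the actual diff; jewel-era Ceph uses its own Mutex/Mutex::Locker types rather than std::mutex, and the stand-in class below is not the real Client interface):

```cpp
#include <mutex>

struct Inode {
  long ll_ref = 0;
  unsigned long ino = 1;        // stand-in for the root inode number
  void ll_get() { ++ll_ref; }   // stand-in for Inode::ll_get()
};

class Client {
  std::mutex client_lock;       // stand-in for Client::client_lock
  Inode *root = nullptr;
public:
  // After the fix: the increment happens under client_lock, so it can no
  // longer interleave with _ll_put()/put_inode() in another thread.
  Inode *get_root() {
    std::lock_guard<std::mutex> lock(client_lock);
    root->ll_get();
    return root;
  }

  // Also locked; per the commit message this may not strictly be
  // necessary, but it can't hurt.
  unsigned long get_root_ino() {
    std::lock_guard<std::mutex> lock(client_lock);
    return root->ino;
  }
};
```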

[1] http://pulpito.ceph.com/pdonnell-2016-06-27_06:54:23-fs-wip-pdonnell-i16164---basic-smithi/280361/
[2] http://qa-proxy.ceph.com/teuthology/pdonnell-2016-06-27_06:54:23-fs-wip-pdonnell-i16164---basic-smithi/280361/remote/smithi058/log/ceph-client.0.23448.log.gz

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 208e3b6)

@batrick @jcsp batrick client: add missing client_lock for get_root
1bc047b
@jcsp jcsp added the cephfs label Aug 30, 2016
@wido
Member
wido commented Aug 30, 2016

Since this is a fairly small fix, don't we want this one to go into 10.2.3 as well, after seeing all the FUSE-related issues on the ML?

@gregsfortytwo gregsfortytwo added this to the jewel milestone Aug 31, 2016
@gregsfortytwo
Member
gregsfortytwo commented Aug 31, 2016

I think we missed the window, right @dachary, @yuriw?

But yes, I think it should go in if possible.

@yuriw
Contributor
yuriw commented Aug 31, 2016

I am ~50% through QE validation now, but if it has to be added ... up to @dachary

@dachary dachary changed the title from Backport: client: add missing client_lock for get_root to jewel client: add missing client_lock for get_root Aug 31, 2016
@dachary dachary self-assigned this Aug 31, 2016
@dachary dachary added the bug fix label Aug 31, 2016
@dachary
Member
dachary commented Aug 31, 2016

I think that should be discussed between the leads. Is it important enough to restart the backport validation and delay the release?

@jcsp
Contributor
jcsp commented Sep 1, 2016

@dachary I possibly don't know enough about the process; is it possible to just re-run the fs tests on this branch?

@dachary
Member
dachary commented Sep 1, 2016

@jcsp yes, it is possible. The process is mostly common sense, nothing carved in stone ;-)

@jcsp
Contributor
jcsp commented Sep 1, 2016

It would make me happy if we could include this in 10.2.3. Since it's userspace and doesn't touch anything outside cephfs, it should only be the fs suite that needs re-running. It's been on master without issues for almost 2 months, so the risk is low. It's my fault that this backport was opened so late, though, so apologies for that!

@dachary
Member
dachary commented Sep 1, 2016

@jcsp is there a tracker issue for that?

@jcsp
Contributor
jcsp commented Sep 2, 2016

@dachary there wasn't at the time (hence us missing this) but I've created one for reference here http://tracker.ceph.com/issues/17197

@wido
Member
wido commented Sep 2, 2016

Not to push, but just as information: I had two customers approach me because they were experiencing crashes with libcephfs (FUSE and Ganesha).

I recompiled packages for them with this fix included, and the problems were gone.

For them, this patch fixed the problems they were seeing with 10.2.2.

@dachary dachary merged commit db2e822 into ceph:jewel Oct 11, 2016

2 checks passed

Signed-off-by: all commits in this PR are signed
default: Build finished.
@theanalyst theanalyst changed the title from jewel client: add missing client_lock for get_root to "jewel client: add missing client_lock for get_root" Nov 17, 2016
@theanalyst theanalyst changed the title from "jewel client: add missing client_lock for get_root" to jewel client: add missing client_lock for get_root Nov 17, 2016
@theanalyst theanalyst changed the title from jewel client: add missing client_lock for get_root to jewel: fs: client: add missing client_lock for get_root Nov 17, 2016