Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: trim null dentries proactively #10606

Merged
merged 1 commit into from Sep 9, 2016
Merged

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Aug 7, 2016

Instead of leaving null dentries (e.g. left
behind from unlinks) in the cache until they
fall out of the LRU, actively push them
to the bottom of the LRU and then consume
all nulls at the bottom in trim() even if
the cache is not oversized yet.

This fixes the case where standby replay daemons
would otherwise accumulate a cache full of
null dentries resulting from unlinks, and it
makes the behaviour of active daemons more
deterministic.

Fixes: http://tracker.ceph.com/issues/16919
Signed-off-by: John Spray john.spray@redhat.com

@jcsp jcsp added the cephfs Ceph File System label Aug 7, 2016
@gregsfortytwo
Copy link
Member

Okay, so this is really only pushing them to the bottom on replay, right? The commit message concerned me (as obviously we sometimes want to keep them around for fast responses on incomplete dirs).

@jcsp
Copy link
Contributor Author

jcsp commented Aug 9, 2016

Tested by ceph/ceph-qa-suite#1111

@jcsp
Copy link
Contributor Author

jcsp commented Aug 9, 2016

Right, the extra touch_dentry_bottom calls are only in replay. There are other places that we do that already, like unlinks and renames, so those will also get those stray dentries thrown away immediately instead of waiting for the cache fill up. The (important) case of null dentries that cache lookup misses should be unaffected though.

@gregsfortytwo
Copy link
Member

LGTM

gregsfortytwo added a commit that referenced this pull request Aug 24, 2016
…o greg-fs-testing

#10606

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo
Copy link
Member

@gregsfortytwo
Copy link
Member

http://pulpito.ceph.com/gregf-2016-08-29_04:22:27-fs-greg-fs-testing-828---basic-mira/ has the same branch minus this PR running that test job.

@gregsfortytwo
Copy link
Member

Yeah, looks like it's busted for some reason. I unzipped the MDS logs and looked a little bit and I'm not quite sure why they're stuck, but it does indeed involve stray inodes and trimming on the export.

You may just have lucked into revealing a bug that was being disguised by timing and isn't any more; not sure.

Instead of leaving null dentries (e.g. left
behind from unlinks) in the cache until they
fall out of the LRU, actively push them
to the bottom of the LRU and then consume
all nulls at the bottom in trim() even if
the cache is not oversized yet.

This fixes the case where standby replay daemons
would otherwise accumulate a cache full of
null dentries resulting from unlinks, and it
makes the behaviour of active daemons more
deterministic.

Fixes: http://tracker.ceph.com/issues/16919
Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp
Copy link
Contributor Author

jcsp commented Sep 8, 2016

The offending test (TestStrays.test_migration_on_shutdown) is passing with latest update to this patch:
http://pulpito.ceph.com/jspray-2016-09-07_23:07:40-fs-wip-jcsp-testing-20160907-testing-basic-mira/405046/

I suspect there was an underlying bug here that was triggered by the extra trimming, so I'm going to create a branch that sets cache size to zero to see if it triggers it.

@jcsp
Copy link
Contributor Author

jcsp commented Sep 8, 2016

@gregsfortytwo quick re-review?

@jcsp
Copy link
Contributor Author

jcsp commented Sep 8, 2016

(the bit that changed was the extra if (lru.lru_get_size() + unexpirable <= (unsigned)max) { check inside the loop. Previously it was always going through the loop at least once so each tick was trimming one item (which should have still been legal but wasn't the intent of this patch)

@gregsfortytwo
Copy link
Member

Reviewed-by:

@gregsfortytwo gregsfortytwo removed their assignment Sep 8, 2016
@jcsp jcsp merged commit 9faf778 into ceph:master Sep 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cephfs Ceph File System needs-qa
Projects
None yet
2 participants