mds: ensure fragmentation happens promptly #10607

Closed

jcsp
Contributor

@jcsp jcsp commented Aug 7, 2016

Hit directories during file creation so that we
don't create oversized fragments (previously we
would wait for other metadata ops to break them up).

Call MDBalancer::do_fragmenting at the end of
MDSRank dispatch, to avoid waiting for the next
tick to fragment dirs. Previously, we would end up
processing up to 5 seconds of extra creates before
actually doing the split, leading to oversized fragments.

Signed-off-by: John Spray <john.spray@redhat.com>

@jcsp jcsp added the cephfs Ceph File System label Aug 7, 2016
@ukernel
Contributor

ukernel commented Aug 8, 2016

I think doing fragmentation every 5 seconds is good enough. There is no need to call MDBalancer::do_fragmenting in MDSRank dispatch.

@jcsp
Contributor Author

jcsp commented Aug 8, 2016

I don't think we should wait 5 seconds just because we can: at the point we've decided to split a directory we should go ahead and do it immediately. It's much easier to test it this way, because the size of dirfrags is strictly limited and we can assert that the limit is not exceeded.

Calling it on every dispatch is a bit gratuitous (although it is quite a cheap check), so I should probably change this to do an MDSRank::queue_waiter call when we have something waiting to split.
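
To illustrate the "only queue work when something is waiting to split" idea, here is a minimal standalone sketch. It is not Ceph code: `ToyRank`, `on_create`, and `finish_dispatch` are hypothetical names, and the real MDSRank waiter mechanism may look quite different. It shows only that the end-of-dispatch step does nothing unless a directory has actually reached the split size.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an MDS rank; NOT the real Ceph MDSRank API.
class ToyRank {
public:
  explicit ToyRank(std::size_t split_size) : split_size_(split_size) {
    assert(split_size_ > 0);
  }

  // Called when a create lands in `dir`, which now holds `entries` entries.
  // A split is queued only once the directory reaches the split size,
  // and at most once per directory while the split is still pending.
  void on_create(const std::string& dir, std::size_t entries) {
    if (entries >= split_size_ && pending_.insert(dir).second) {
      waiters_.push_back([this, dir] {
        split_log_.push_back(dir);  // stands in for the actual split
        pending_.erase(dir);
      });
    }
  }

  // Called at the end of dispatch. The queue is empty unless some
  // directory asked for a split, so the common case does no work.
  void finish_dispatch() {
    std::vector<std::function<void()>> run;
    run.swap(waiters_);
    for (auto& fn : run) fn();
  }

  std::size_t queued() const { return waiters_.size(); }
  const std::vector<std::string>& splits() const { return split_log_; }

private:
  std::size_t split_size_;
  std::set<std::string> pending_;
  std::vector<std::function<void()>> waiters_;
  std::vector<std::string> split_log_;
};
```

With this shape, a test can assert that a split is requested at the create that crosses the threshold, rather than at some later tick.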

@gregsfortytwo
Member

Well, don't forget that fragmenting a directory requires us to freeze it. I'm not sure we want to enable that on every op since it's not uncommon to see a quick burst of ops on a single inode which we would probably prefer not to interrupt. (The counter-argument is that we're probably better off not freezing all dirs on a predictable cycle, but I doubt that's as big a problem. Need data!)

@jcsp
Contributor Author

jcsp commented Aug 11, 2016

We can have the best of both worlds (not doing it immediately but also being a bit more deterministic) by having a limit on how far fragments are allowed to exceed the split size, so that we don't necessarily split immediately, but we also don't have an unbounded 5-seconds-worth of growth past the limit.
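
A rough sketch of that bounded-overshoot policy, assuming a hypothetical `max_overshoot` factor on top of the split size. The names below are invented for illustration and are not taken from the Ceph source. Below the split size nothing happens; between the split size and the hard limit the split can wait for the next tick; at or beyond the hard limit it is done right away.

```cpp
#include <cstddef>

enum class SplitAction { None, Deferred, Immediate };

// Hypothetical policy object; field names are assumptions, not Ceph options.
struct SplitPolicy {
  std::size_t split_size;  // size at which a fragment becomes a split candidate
  double max_overshoot;    // e.g. 0.5 allows growth up to 1.5x split_size

  // Largest size a fragment may reach before a split is forced.
  std::size_t hard_limit() const {
    return split_size + static_cast<std::size_t>(split_size * max_overshoot);
  }

  SplitAction decide(std::size_t frag_size) const {
    if (frag_size < split_size) return SplitAction::None;
    if (frag_size < hard_limit()) return SplitAction::Deferred;  // wait for tick
    return SplitAction::Immediate;                               // split now
  }
};
```

Under this scheme a test can still assert a strict upper bound on fragment size (the hard limit), while short bursts of ops just past the split size are not interrupted by a freeze.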

@jcsp
Contributor Author

jcsp commented Aug 11, 2016

Also worth noting that the current situation is that the user has a random chance of having the split done immediately anyway if they happen to come in right before a tick. If we really want the behaviour of "wait a bit before splitting" then tick() isn't accomplishing that reliably either.
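
The arithmetic behind this can be sketched as follows, assuming ticks fire at exact multiples of the interval and a steady create rate (both simplifications; the function names and the rate are hypothetical). The delay before the split depends only on where in the tick cycle the threshold happens to be crossed, so it ranges from nearly zero to the full interval.

```cpp
#include <cassert>
#include <cmath>

// Seconds until the next periodic tick, for a split threshold crossed at
// time t. A crossing exactly on a tick is treated as having just missed it.
double wait_until_tick(double t, double interval) {
  assert(interval > 0.0);
  double phase = std::fmod(t, interval);  // position within the current cycle
  return interval - phase;
}

// Creates absorbed past the split size before the tick-driven split runs,
// at an assumed steady rate of creates_per_sec.
double extra_creates(double t, double interval, double creates_per_sec) {
  return wait_until_tick(t, interval) * creates_per_sec;
}
```

For example, with a 5 second interval, crossing the threshold at t = 4.5 s waits 0.5 s, while crossing at t = 5.5 s waits 4.5 s, so the overshoot varies by roughly a factor of nine between two otherwise identical workloads.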

@jcsp
Contributor Author

jcsp commented Nov 16, 2016

Closing in favour of #12022

@jcsp jcsp closed this Nov 16, 2016