mds: ensure fragmentation happens promptly #10607

jcsp · 2016-08-07T19:30:53Z

Hit directories during file creation so that we
don't create oversized fragments (previously we
would wait for other metadata ops before breaking them up).

Call MDBalancer::do_fragmenting at the end of
MDSRank dispatch, to avoid waiting for the next
tick to fragment dirs. Previously, we would end up
processing up to 5 seconds of extra creates before
actually doing the split, leading to oversized fragments.

Signed-off-by: John Spray <john.spray@redhat.com>


ukernel · 2016-08-08T02:27:59Z

I think doing fragmentation every 5 seconds is good enough. There is no need to call MDBalancer::do_fragmenting in MDSRank dispatch.

jcsp · 2016-08-08T13:27:15Z

I don't think we should wait 5 seconds just because we can: at the point we've decided to split a directory we should go ahead and do it immediately. It's much easier to test it this way, because the size of dirfrags is strictly limited and we can assert that the limit is not exceeded.

Calling it on every dispatch is a bit gratuitous (although it is quite a cheap check), so I should probably change this to do a MDSRank::queue_waiter when we have something waiting to split.
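A minimal sketch of the "only do work when something is waiting to split" idea, in case it helps the discussion. Everything here is hypothetical: `FragId`, `PendingSplits`, `mark` and `drain` are illustrative names, not Ceph types, and this does not use the real `MDSRank::queue_waiter` signature. The hit path records an oversized dirfrag once, and the end-of-dispatch hook only runs when the set is non-empty.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for a dirfrag identifier; not Ceph's dirfrag_t.
using FragId = std::uint64_t;

// Tracks dirfrags that have crossed the split threshold, so the
// end-of-dispatch hook has work to do only when something is pending.
class PendingSplits {
public:
  // Called from the "hit" path when a dirfrag is found to be oversized.
  // Returns true only for a new entry, i.e. when a waiter should be queued.
  bool mark(FragId id) { return pending_.insert(id).second; }

  bool empty() const { return pending_.empty(); }

  // Called at the end of dispatch: hand each pending dirfrag to `split`
  // and clear the set. Returns the number of dirfrags processed.
  std::size_t drain(const std::function<void(FragId)>& split) {
    std::set<FragId> work;
    work.swap(pending_);  // lets split() re-mark a dirfrag safely
    for (FragId id : work) {
      split(id);
    }
    return work.size();
  }

private:
  std::set<FragId> pending_;
};
```

With this shape, a dispatch that touched no oversized dirfrag costs a single `empty()` check, and repeated hits on the same dirfrag within one dispatch queue only one split.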

gregsfortytwo · 2016-08-09T14:06:20Z

Well, don't forget that fragmenting a directory requires us to freeze it. I'm not sure we want to enable that on every op since it's not uncommon to see a quick burst of ops on a single inode which we would probably prefer not to interrupt. (The counter-argument is that we're probably better off not freezing all dirs on a predictable cycle, but I doubt that's as big a problem. Need data!)

jcsp · 2016-08-11T10:31:49Z

We can have the best of both worlds (not doing it immediately but also being a bit more deterministic) by having a limit on how far fragments are allowed to exceed the split size, so that we don't necessarily split immediately, but we also don't have an unbounded 5-seconds-worth of growth past the limit.
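Roughly, the decision could look like the sketch below. This is an assumption-laden illustration rather than a proposed patch: `SplitPolicy`, `SplitAction`, `split_size` and `hard_factor` are hypothetical names, and the values 10000 and 1.5 in the usage note are only examples.

```cpp
#include <cstddef>

// What to do with a dirfrag after an operation has grown it.
enum class SplitAction {
  None,       // at or below the split size: nothing to do
  Deferred,   // past the split size: leave it for the periodic tick
  Immediate   // reached the hard limit: split now, do not wait
};

// Hypothetical policy object; field names are illustrative only.
struct SplitPolicy {
  std::size_t split_size;  // soft limit that makes a dirfrag a split candidate
  double hard_factor;      // hard limit is split_size * hard_factor

  std::size_t hard_limit() const {
    return static_cast<std::size_t>(
        static_cast<double>(split_size) * hard_factor);
  }

  SplitAction decide(std::size_t entries) const {
    if (entries <= split_size) {
      return SplitAction::None;
    }
    if (entries >= hard_limit()) {
      return SplitAction::Immediate;
    }
    return SplitAction::Deferred;
  }
};
```

For example, with `SplitPolicy{10000, 1.5}` a dirfrag of 12000 entries would be left for the tick, while one reaching 15000 entries would be split straight away. That bounds the overshoot while still letting a short burst of ops finish uninterrupted.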

jcsp · 2016-08-11T10:33:30Z

Also worth noting that the current situation is that the user has a random chance of having the split done immediately anyway if they happen to come in right before a tick. If we really want the behaviour of "wait a bit before splitting" then tick() isn't accomplishing that reliably either.
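To put numbers on that, here is a small sketch of how the delay depends on where an op lands relative to the tick. It assumes an idealised tick firing at exact multiples of the interval; the function names and the create rate in the usage note are made up for illustration.

```cpp
#include <cstdint>

// Milliseconds from `arrival_ms` until the next periodic tick, assuming
// ticks fire at exact multiples of `interval_ms` (an idealisation).
// An op that lands exactly on a tick waits 0 ms.
std::uint64_t ms_until_next_tick(std::uint64_t arrival_ms,
                                 std::uint64_t interval_ms) {
  std::uint64_t phase = arrival_ms % interval_ms;
  return phase == 0 ? 0 : interval_ms - phase;
}

// Number of additional creates that land in the dirfrag while waiting,
// for a steady (hypothetical) create rate.
std::uint64_t extra_creates(std::uint64_t delay_ms,
                            std::uint64_t creates_per_sec) {
  return delay_ms * creates_per_sec / 1000;
}
```

With a 5000 ms tick and an assumed 2000 creates per second, crossing the split size 4900 ms into the interval gives a 100 ms wait and about 200 extra entries, while crossing it 100 ms into the interval gives a 4900 ms wait and about 9800 extra entries.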

jcsp · 2016-11-16T14:20:16Z

Closing in favour of #12022

jcsp added the cephfs (Ceph File System) label on Aug 7, 2016

jcsp closed this Nov 16, 2016
