New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/filestore/HashIndex: be loud about splits #12421

Merged
merged 1 commit into from Mar 3, 2017

Conversation

Projects
None yet
6 participants
@cernceph

cernceph commented Dec 9, 2016

This ML thread "[ceph-users] filestore_split_multiple hardcoded maximum" has coincided with ~1 week of slow requests on one of our clusters. After much debugging we finally tracked it down to filestore splitting.

It would have been much easier to understand the root cause if filestore splitting was verbose. Splitting is a rare but rather important event -- the OSD should be super verbose when it happens. (Otherwise, it simply looks like a filestore write is taking many 10s of seconds, which seems absurd).

So I'm sending this patch for comments. I'm not sure if the dout is needed in the _create function, and I didn't make the merges verbose yet (which I guess is also needed?)

If this makes it to master, I'd appreciate a backport to jewel. (filestore will lose relevance anyway in L and beyond).

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Dec 9, 2016

@cernceph Could you open a tracker issue for this? You can flag the issue for jewel backport while doing that, just fill in Backport: jewel.

If you open the tracker, please also add a line Fixes: http://tracker.ceph.com/issues/... right above the Signed-off-to line.

@liewegas liewegas added the core label Dec 12, 2016

@liewegas

This comment has been minimized.

Member

liewegas commented Dec 12, 2016

I would use dout(1) instead of dout(0) (which will also go to the log by default). Otherwise, LGTM!

@liewegas

pls change dout(0) to dout(1)

os/filestore/HashIndex: be loud about splits
Filestore splits are a rare yet important enough event that an
OSD should visibly report when they happen.

Without this reporting an operator could spend hours trying to
understand the cause of any split-induced slow requests.

Fixes: http://tracker.ceph.com/issues/18235
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 28, 2017

@liewegas @cernceph has addressed your comment, so it's ready for needs-qa, i guess?

@liewegas liewegas changed the title from RFC: os/filestore/HashIndex: be loud about splits to os/filestore/HashIndex: be loud about splits Feb 28, 2017

@liewegas liewegas added the needs-qa label Feb 28, 2017

@liewegas liewegas merged commit 39e6ad1 into ceph:master Mar 3, 2017

2 of 3 checks passed

default Build triggered. sha1 is merged.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment