
mon: fix slow op warning on mon, improve slow op warnings #21684

Merged
merged 6 commits into from May 3, 2018


liewegas
Member

@liewegas liewegas commented Apr 26, 2018

@batrick
Member

batrick commented Apr 26, 2018

@jdurgin issues like 23769 make me nervous about globally whitelisting SLOW_OPS. Perhaps we should distinguish slow ops and "stuck" ops somehow.

@jdurgin
Member

jdurgin commented Apr 27, 2018

@batrick that's a good idea. a higher threshold for detecting bugs like that makes sense

Otherwise it is very hard to identify which OSD ops are slow when we've
seen a SLOW_OPS health warning in a qa run.

Notably, without this, bugs like http://tracker.ceph.com/issues/23769
are very challenging to track down.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
If we don't note that we don't reply then we don't close out the routed
mon request and the op will appear as slow on the forwarding mon.

Fixes: http://tracker.ceph.com/issues/23769
Signed-off-by: Sage Weil <sage@redhat.com>
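
A minimal sketch of the bookkeeping described in the commit message above, written as a toy Python model and not taken from Ceph's monitor code. The class name, method names, and the 30-second threshold are all hypothetical; the only point is that a forwarded request stays on the forwarding mon's books until it is told either that a reply arrived or that no reply is coming.

```python
class ForwardingMon:
    """Toy model of a mon that forwards requests and tracks them until closed.

    All names here are illustrative assumptions, not Ceph APIs.
    """

    def __init__(self, slow_after=30.0):
        # Age in seconds after which an open routed request counts as slow
        # (hypothetical value).
        self.slow_after = slow_after
        # op id -> time at which the request was forwarded
        self.routed = {}

    def forward(self, op_id, now):
        # Record the routed request when it is sent on.
        self.routed[op_id] = now

    def on_reply(self, op_id):
        # A reply came back, so the routed request is closed out.
        self.routed.pop(op_id, None)

    def on_no_reply(self, op_id):
        # No reply will ever be sent, so the request is closed out as well.
        self.routed.pop(op_id, None)

    def slow_ops(self, now):
        # Every routed request still open after slow_after seconds.
        return sorted(
            op for op, started in self.routed.items()
            if now - started >= self.slow_after
        )


mon = ForwardingMon(slow_after=30.0)
mon.forward(1, now=0.0)
mon.forward(2, now=0.0)
mon.on_reply(1)                    # op 1 gets a normal reply
before = mon.slow_ops(now=60.0)    # op 2 was handled without a reply or notice
mon.on_no_reply(2)                 # the "no reply" notice closes it out
after = mon.slow_ops(now=60.0)
```

In this model, `before` still contains op 2 even though the work finished, which mirrors the spurious slow op on the forwarding mon; once the "no reply" notice is delivered, `after` is empty.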
@liewegas liewegas changed the title from "osd: log 'slow op' debug messages for individual slow ops" to "mon: fix slow op warning on mon, improve slow op warnings" May 2, 2018
@batrick
Member

batrick commented May 2, 2018

Great catch! Thanks Sage.

@liewegas
Member Author

liewegas commented May 2, 2018

@batrick I think with this we should revert the blanket SLOW_OPS whitelist in teuthology. IMO we should whitelist it explicitly on runs doing thrashing or stressy/heavy workloads or whatever.

@batrick
Member

batrick commented May 2, 2018

Agreed

@liewegas
Member Author

liewegas commented May 3, 2018

@liewegas liewegas merged commit db5ec08 into ceph:master May 3, 2018
@liewegas liewegas deleted the wip-23769 branch May 3, 2018 13:40
batrick added a commit to batrick/teuthology that referenced this pull request May 3, 2018
Should no longer be necessary after [1].

[1] ceph/ceph#21684

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Member

batrick commented May 3, 2018

3 participants