mon: fix slow op warning on mon, improve slow op warnings #21684
@jdurgin Issues like 23769 make me nervous about globally whitelisting SLOW_OPS. Perhaps we should distinguish slow ops from "stuck" ops somehow.
@batrick That's a good idea. A higher threshold for detecting bugs like that makes sense.
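A minimal sketch of the slow-versus-stuck idea discussed above, assuming two age thresholds. The names (`Op`, `classify`, `summarize`) and the threshold values are hypothetical and chosen for illustration only; they are not Ceph options or APIs.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in seconds. A "stuck" op is one that has been
# outstanding far longer than an ordinary slow op and likely indicates a bug.
SLOW_THRESHOLD_S = 30.0
STUCK_THRESHOLD_S = 600.0


@dataclass
class Op:
    description: str
    age_s: float  # how long the op has been in flight


def classify(op, slow_after=SLOW_THRESHOLD_S, stuck_after=STUCK_THRESHOLD_S):
    """Return "ok", "slow", or "stuck" based on the op's age."""
    if op.age_s >= stuck_after:
        return "stuck"
    if op.age_s >= slow_after:
        return "slow"
    return "ok"


def summarize(ops):
    """Count ops per category so the two cases can be reported separately."""
    counts = {"ok": 0, "slow": 0, "stuck": 0}
    for op in ops:
        counts[classify(op)] += 1
    return counts
```

With a split like this, a test run under heavy load could tolerate the "slow" category while still failing on anything "stuck".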
Otherwise it is very hard to identify which OSD ops are slow when we've seen a SLOW_OPS health warning in a qa run. Notably, without this, bugs like http://tracker.ceph.com/issues/23769 are very challenging to track down.

Signed-off-by: Sage Weil <sage@redhat.com>
If we don't note that we won't reply, we never close out the routed mon request, and the op appears as slow on the forwarding mon.

Fixes: http://tracker.ceph.com/issues/23769
Signed-off-by: Sage Weil <sage@redhat.com>
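The failure mode described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ForwardingMon` and its methods are hypothetical names invented here, not the monitor's actual classes or functions.

```python
class ForwardingMon:
    """Toy model of a mon that forwards requests and tracks them until closed.

    Hypothetical illustration only; this does not mirror real Ceph code.
    """

    def __init__(self):
        self.routed = {}  # request id -> time the request was forwarded

    def forward(self, tid, now):
        # Start tracking a request routed to another mon.
        self.routed[tid] = now

    def on_reply(self, tid):
        # A reply arrived, so the routed request is closed out.
        self.routed.pop(tid, None)

    def on_no_reply(self, tid):
        # The handling mon noted that it will never reply. Without this
        # notification the entry would stay in self.routed indefinitely.
        self.routed.pop(tid, None)

    def slow_ops(self, now, threshold):
        # Requests still tracked for at least `threshold` seconds.
        return sorted(t for t, start in self.routed.items()
                      if now - start >= threshold)


mon = ForwardingMon()
mon.forward(1, now=0.0)   # will receive a normal reply
mon.forward(2, now=0.0)   # handled elsewhere without a reply
mon.on_reply(1)

# If the "no reply" case is never signalled, request 2 looks slow forever.
leaked = mon.slow_ops(now=60.0, threshold=30.0)

# Signalling it closes the routed request and clears the false warning.
mon.on_no_reply(2)
cleared = mon.slow_ops(now=60.0, threshold=30.0)
```

In this model `leaked` contains request 2 and `cleared` is empty, which matches the behaviour the commit message describes: the op was never slow, only never closed.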
Great catch! Thanks Sage.
@batrick I think with this we should revert the blanket SLOW_OPS whitelist in teuthology. IMO we should whitelist it explicitly only on runs doing thrashing or stressy/heavy workloads.
Agreed
Should no longer be necessary after [1].

[1] ceph/ceph#21684

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Fixes http://tracker.ceph.com/issues/23769