Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nautilus: osd/OSD: Log slow ops/types to cluster logs #33503

Merged
merged 5 commits into from Mar 17, 2020

Conversation

In addition to logging slow ops in mon and osd specific log files,
re-introduce logging the same information along with slow op type
details to cluster logs as well. The objective is to make debugging
slow ops easier.

Modify the log whitelisting string to "slow request" within qa suites in
order to make the search for the new warning log message within the
cluster log successful. This should not cause any issue as it's a
substring of the earlier string.

Fixes: https://tracker.ceph.com/issues/43975
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit d20f570)
@sseshasa sseshasa added this to the nautilus milestone Feb 24, 2020
@sseshasa sseshasa added the core label Feb 24, 2020
@sseshasa
Copy link
Contributor Author

sseshasa commented Feb 24, 2020

NOTE: #33497 would need to be backported as well which I guess should be merged separately with its own backport tracker.

@smithfarm
Copy link
Contributor

NOTE: #33497 would need to be backported as well which I guess should be merged separately with its own backport tracker.

@sseshasa Actually, I much prefer to cherry-pick any follow-on fixes directly into the main backport PR as we have done so to great advantage in the past. So please feel free to cherry-pick it right into this PR.

@smithfarm
Copy link
Contributor

smithfarm commented Feb 25, 2020

(Following the thought a little further) If the follow-on fix gets its own tracker, there is a danger that:

  • the original fix might get backported, merged and released without the follow-on fix
  • even worse, the follow-on fix might get backported and merged first, and possibly also released, without the patch it was intended to fix

@sseshasa
Copy link
Contributor Author

NOTE: #33497 would need to be backported as well which I guess should be merged separately with its own backport tracker.

@sseshasa Actually, I much prefer to cherry-pick any follow-on fixes directly into the main backport PR as we have done so to great advantage in the past. So please feel free to cherry-pick it right into this PR.

@smithfarm #33497 is already merged and I don't see any tracker for it. I will therefore cherry-pick the commits from that PR.

@smithfarm
Copy link
Contributor

make check failure is

The following tests FAILED:
	  7 - run-tox-mgr-ansible (Failed)
	  8 - run-tox-mgr-orchestrator_cli (Failed)

It is most likely unrelated to this PR, since we see it happening in other nautilus PRs as well.

Copy link
Contributor

@smithfarm smithfarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed to cherry-pick commits from #33497

liewegas and others added 3 commits February 26, 2020 15:16
The mons may have slow ops.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 07badf0)
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f4156ae)
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit e527067)

 Conflicts:
	qa/suites/rados/singleton-nomsgr/all/osd_stale_reads.yaml
            This file doesn't exist in nautilus.
@sseshasa
Copy link
Contributor Author

agreed to cherry-pick commits from #33497

All related commits are now cherry-picked from master.

@smithfarm
Copy link
Contributor

@yuriw @neha-ojha Maybe including this in 14.2.8 QE will make the testing smoother?

@neha-ojha
Copy link
Member

@yuriw @neha-ojha Maybe including this in 14.2.8 QE will make the testing smoother?

@smithfarm I don't think this PR has any dependency on 14.2.8, and can wait for the next release.

@yuriw yuriw added nautilus-batch-1 nautilus point releases wip-yuri-testing labels Mar 9, 2020
@yuriw
Copy link
Contributor

yuriw commented Mar 9, 2020

@vumrao
Copy link
Contributor

vumrao commented Mar 9, 2020

Jenkins retest this please.

@smithfarm
Copy link
Contributor

I updated the PR body/description to show where the commits came from.

@smithfarm
Copy link
Contributor

@sseshasa What was the reason why you didn't cherry pick the other commit from #33497 ?

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a4a3a3c)
@sseshasa
Copy link
Contributor Author

@sseshasa What was the reason why you didn't cherry pick the other commit from #33497 ?

@smithfarm Good catch! I am not sure how this got missed. But I have now cherry picked that missed commit. Thanks!

@sseshasa
Copy link
Contributor Author

I updated the PR body/description to show where the commits came from.

@smithfarm I took the liberty to modify the PR description after I included the missed commit. I hope that's okay?

@smithfarm
Copy link
Contributor

Any change that makes the PR description more accurate is okay with me!

@yuriw yuriw merged commit 3004038 into ceph:nautilus Mar 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants