pacific: os/bluestore: update perf counter priorities #47095
@ljflores we need to backport this PR to expose However, I counted that this backport (and the original PR) exposed an extra 34 perf counters/OSD to Prometheus, which for the 8k OSD stretch goal means +272,000 new samples/scrape. Given Pacific is a stable release, I wouldn't want to add that much extra load, so what about adapting this backport and only exposing the 2 required perf counters?
@epuertat That approach sounds okay to me. @aaSharma14 we should be sure to write an explanation for this in the commit message.
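The load estimate above is a plain multiplication. A minimal sketch follows, assuming each newly exposed counter yields one sample per OSD per scrape; the function name is hypothetical and is not part of Ceph or Prometheus:

```python
def extra_samples_per_scrape(new_counters_per_osd: int, osd_count: int) -> int:
    """Estimate additional Prometheus samples per scrape.

    Assumption: one sample per exposed perf counter per OSD.
    """
    return new_counters_per_osd * osd_count


# Full backport: 34 extra counters on each of 8,000 OSDs.
full_backport = extra_samples_per_scrape(34, 8000)

# Adapted backport: only the 2 required counters.
partial_backport = extra_samples_per_scrape(2, 8000)

print(full_backport)     # 272000
print(partial_backport)  # 16000
```

Under the same assumption, the adapted backport adds 16,000 samples per scrape at the 8k OSD scale instead of 272,000.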
8ae578e
to
a48a4c2
Compare
Thank you @ljflores, I have added a note to the commit message for the change.
jenkins test api
LGTM. Thanks @aaSharma14!
jenkins test api
API tests are failing because of this pesky issue with the @yuriw I don't think we have any QA test that covers this code, so it won't make any difference.
jenkins test api
This is a partial backport of 8790f04, but nothing in the commit message explains that. Can we please add an explanation in the commit message to explain why we only care about these two perf counters in pacific?
Thanks @neha-ojha, I have put a note in the commit message for the change.
@neha-ojha @ljflores can we have this merged plz? It has no impact (just 2 new bluestore perfcounters exported/OSD) and it's required for unblocking a downstream release. Thanks!
@aaSharma14 I meant adding it to the git commit message, something you can see using git log
@neha-ojha it's in the commit (probably a blank line in between would have helped too):
yeah, something to differentiate between the original commit message and this addition will be helpful, maybe add it after the "cherry picked from commit" line like we call out conflicts? @ljflores how is testing looking on this PR?
The Trello card for it is here: https://trello.com/c/62n7lFew/1583-wip-yuri2-testing-2022-07-15-0755-pacific. Yuri has not pasted a link to the test runs yet, but I have taken it on to review the card as soon as it's ready. @neha-ojha I don't think this would break anything given that it is only changing perf counter priorities, but would you still recommend waiting for the tests to run?
@epuertat Yuri has already triggered rados suite tests, so they should be done running in several hours. I will prioritize reviewing it so the downstream release isn't blocked for too much longer. I'm pretty sure this wouldn't have an impact on anything as you said, but I'd feel better confirming it so we're not introducing any breakages in pacific.
@ljflores ok, let's wait till then. Thanks a lot!
These perf counters do not show up in telemetry unless they are set to a "useful" priority or higher. Fetching these counters in telemetry may help to diagnose problems with RocksDB / BlueFS prefetching / insufficient cache sizes.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 8790f04)

Note: This backport (and the original PR) exposed an extra 34 perf counters/OSD to Prometheus. Given Pacific is a stable release and not to add that much extra load we are adapting this backport and only exposing the 2 required perf-coun>
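The commit message describes a threshold rule: a counter is reported only if its priority is "useful" or higher. A minimal sketch of that rule follows. The numeric priority values are an assumption based on the `PRIO_*` constants in Ceph's perf counter header and should be checked against the source tree; the counter names and the `exported` helper are hypothetical and are not the counters this PR changes:

```python
from enum import IntEnum
from typing import Dict, List


class Prio(IntEnum):
    # Assumed to mirror Ceph's PRIO_* constants; verify before relying on them.
    DEBUGONLY = 0
    UNINTERESTING = 2
    USEFUL = 5
    INTERESTING = 8
    CRITICAL = 10


def exported(counters: Dict[str, int], threshold: int = Prio.USEFUL) -> List[str]:
    """Return the names of counters whose priority meets the threshold."""
    return sorted(name for name, prio in counters.items() if prio >= threshold)


# Hypothetical counters, for illustration only.
counters = {
    "hypothetical_read_lat": Prio.USEFUL,
    "hypothetical_cache_miss": Prio.UNINTERESTING,
    "hypothetical_commit_lat": Prio.INTERESTING,
    "hypothetical_debug_stat": Prio.DEBUGONLY,
}

print(exported(counters))
```

Under this rule, raising a counter from a lower priority to `USEFUL` is what makes it visible, which is why a change that only touches priorities still affects how many samples are exported.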
a48a4c2
to
d4b2b1a
Compare
jenkins test dashboard cephadm
@epuertat we are good to go! Rados suite results: https://pulpito.ceph.com/?branch=wip-yuri2-testing-2022-07-15-0755-pacific Failures, unrelated: Details:
Thanks @ljflores. I'll proceed to merge this.
partial backport tracker: https://tracker.ceph.com/issues/56559
partial backport of #43405
parent tracker: https://tracker.ceph.com/issues/56558
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh