Skip to content

os/bluestore: update perf counter priorities#43405

Merged
tchaikov merged 1 commit intoceph:masterfrom
ljflores:wip-perfcounter-priorities
Oct 14, 2021
Merged

os/bluestore: update perf counter priorities#43405
tchaikov merged 1 commit intoceph:masterfrom
ljflores:wip-perfcounter-priorities

Conversation

@ljflores
Copy link
Copy Markdown
Member

@ljflores ljflores commented Oct 4, 2021

Perf counters can only be collected in Telemetry if their priorities are set to USEFUL or higher. Several BlueStore perf counters that were set to lower priorities may help to diagnose performance problems; as such, they have now been set to USEFUL.

Here is a comprehensive list of updated perf counters and reasoning behind including them in Telemetry:

  • *_lat: Degrading latencies with the same amount of ops processed provide a strong indicator of OSD performance problems.
  • bluestore_onode_hits/misses: These values might indicate that we are thrashing the onode cache.
  • omap_*_lat: These values might show problems with RocksDB, BlueFS prefetching, or insufficient cache sizes.
  • bluestore_pinned_onodes: This counter sometimes indicates that many nodes are locking during scrubbing.
  • bluestore_reads_with_retries: This value is incremented whenever we read BlueStore block data and the read does not match the checksum. Knowing how often this happens can give us probabilities of transient media failure.

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@ljflores ljflores requested a review from aclamk October 4, 2021 06:05
@ifed01 ifed01 self-requested a review October 5, 2021 15:27
@ljflores
Copy link
Copy Markdown
Member Author

ljflores commented Oct 6, 2021

@ifed01 I see you added yourself as a reviewer-- let me know any suggestions you might have! I know Adam has a few that he hasn't put on the PR yet. We did discuss some better "nicks" for the counters, which I have updated.

@ljflores
Copy link
Copy Markdown
Member Author

ljflores commented Oct 8, 2021

jenkins test signed

@ljflores ljflores requested a review from neha-ojha October 8, 2021 16:12
@ljflores
Copy link
Copy Markdown
Member Author

ljflores commented Oct 8, 2021

Hey @neha-ojha, adding you to this PR in case you have a chance to review. No worries if you already have a lot on your plate though. We are simply looking to increase the priority of important BlueStore perf counters (and assign new nicks along the way) so that they can be fetched in Telemetry.

Copy link
Copy Markdown
Member

@neha-ojha neha-ojha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks fine to me, left some nits on the nick naming and we can probably squash the last commit into previous ones after addressing review comments

@neha-ojha
Copy link
Copy Markdown
Member

adding @epuertat, in case this affects the perf counters displayed by the dashboard in some way

@ifed01
Copy link
Copy Markdown
Contributor

ifed01 commented Oct 12, 2021

@ljflores FYI: there is my pending PR which introduces some more counters for BlueFS and arranges ones at BlueStore, e..g groups them in the source code, improves description etc. Just to make you aware about my suggestions how to [additionally] improve these things...

@ifed01
Copy link
Copy Markdown
Contributor

ifed01 commented Oct 12, 2021

jenkins test make check

@ljflores
Copy link
Copy Markdown
Member Author

@ljflores FYI: there is my pending PR which introduces some more counters for BlueFS and arranges ones at BlueStore, e..g groups them in the source code, improves description etc. Just to make you aware about my suggestions how to [additionally] improve these things...

Ah this is good to know. Is it this one @ifed01? #41557

@ljflores ljflores requested review from ifed01 and neha-ojha October 12, 2021 20:53
@ifed01
Copy link
Copy Markdown
Contributor

ifed01 commented Oct 13, 2021

@ljflores FYI: there is my pending PR which introduces some more counters for BlueFS and arranges ones at BlueStore, e..g groups them in the source code, improves description etc. Just to make you aware about my suggestions how to [additionally] improve these things...

Ah this is good to know. Is it this one @ifed01? #41557

Yep! Sorry, missed to insert the link...

Copy link
Copy Markdown
Contributor

@ifed01 ifed01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! You might want to squash all the commits prior to merging though...

These perf counters do not show up in telemetry unless they are set to a "useful" priority or higher. Fetching these counters in telemetry may help to diagnose problems with RocksDB / BlueFS prefetching / insufficient cache sizes.

Signed-off-by: Laura Flores <lflores@redhat.com>
@ljflores ljflores force-pushed the wip-perfcounter-priorities branch from c9811fe to 8790f04 Compare October 13, 2021 14:22
Copy link
Copy Markdown
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the perf-counters' labels are not modified, I don't see any issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants