Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf counters cache + Labeled RGW Op counters #53003

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

alimaredia
Copy link
Contributor

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

This commit contains the following features:
- a perf counters cache per CephContext that acts
as a wrapper around perf counters for storing and
modifying labeled perf counters per CephContext
- instrumentation of the rgw with labeled perf
counters for the major rgw ops

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
src/common/perf_counters.h Outdated Show resolved Hide resolved
src/common/perf_counters_cache.h Outdated Show resolved Hide resolved
src/common/perf_counters_cache.h Outdated Show resolved Hide resolved
src/common/perf_counters_cache.h Outdated Show resolved Hide resolved
src/rgw/rgw_op.cc Outdated Show resolved Hide resolved
src/common/options/rgw.yaml.in Outdated Show resolved Hide resolved
src/rgw/rgw_op.cc Outdated Show resolved Hide resolved
src/rgw/rgw_perf_counters.h Outdated Show resolved Hide resolved
- combined frontend counters together
- use const string& to cache calls
- remove eviction
- SetupCounters key is now string_view

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
},
"bar": {
"type": 1,
"metric_type": "gauge",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we're getting rid of this counter from schema as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in detail why we need the schemas for counters that aren't being dumped? Could you build and run this branch and show me what happens?

Are the schemas being stored in Prometheus too? Making the change to add the send the schemas but not the counters would be a pretty easy change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov Do you have any thoughts on filtering out values from a labeled counter that haven't been initialized yet from the counter dump command? I'm doing this ensure prometheus isn't filled with unitialized data in labeled counter.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
…rusive_lru

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
add_double_or_int_metric(builder, sum_value, name + "_sum", description + " total",
metric_type, labels);
json_value avg_value = perf_values.as_object()["avgtime"];
add_double_or_int_metric(builder, avg_value, name + "_avg", description + " average",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley @idryomov This change reflects that were weren't sending the avgtime metric to prometheus from our time based counters that were being dumped via counter dump.

@avanthakkar's rationale for not including these was that the average could be computed in grafana/prometheus from the total time, and the total number of entries also dump in a time based counters.

As tech leads do you have any strong opinion on including or excluding these from what the exporter is sending to Prometheus? I'm inclined to remove it just to send out less data, but I don't feel strongly either way.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants