Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Labeled Perf Counters #48657

Merged
merged 2 commits into from
Feb 27, 2023

Conversation

alimaredia
Copy link
Contributor

@alimaredia alimaredia commented Oct 28, 2022

Labeled Perf Counters initial work.

This PR adds labels to perf counters allowing for multi-dimensional perf counters to be stored and dumped in the form

"rgw": {
    "labels": {
        "Bucket: "bkt1",
        "User: "user1",
    },
    "put_b": 1048576,
    "put_initial_lat": {
        "avgcount": 1,
        "sum": 0.013000200,
        "avgtime": 0.013000200
    }
},

Signed-off-by: Ali Maredia amaredia@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@alimaredia alimaredia changed the title Wip rgw labeled perf counters cache [DRAFT] Labeled Perf Counters Oct 28, 2022
src/vstart.sh Outdated Show resolved Hide resolved
@alimaredia alimaredia changed the title [DRAFT] Labeled Perf Counters Labeled Perf Counters Dec 2, 2022
@alimaredia alimaredia requested a review from a team as a code owner December 4, 2022 20:50
@alimaredia alimaredia force-pushed the wip-rgw-labeled-perf-counters-cache branch 2 times, most recently from cc8a42a to 9f1a01c Compare December 6, 2022 21:45
@jmolmo jmolmo requested review from cbodley and removed request for cbodley December 7, 2022 14:54
@cbodley
Copy link
Contributor

cbodley commented Dec 8, 2022

@alimaredia setting aside the cache part for now, can you explain the strategy for converting existing counters to use labels?

for example, rgw's multisite sync counters put the source-zone in the counter name at https://github.com/ceph/ceph/blob/bc01d2ed/src/rgw/rgw_rados.cc#L478:

      counters(sync_counters::build(store->ctx(), std::string("data-sync-from-") + source_zone->name)),

how would we change that to use a "source={zone name}" label?

src/common/perf_counters.cc Outdated Show resolved Hide resolved
src/common/perf_counters.h Outdated Show resolved Hide resolved
src/common/perf_counters_cache_key.cc Outdated Show resolved Hide resolved
src/common/perf_counters_cache_key.cc Outdated Show resolved Hide resolved
src/common/ceph_context.cc Outdated Show resolved Hide resolved
@alimaredia
Copy link
Contributor Author

@cbodley you would just need to change the string argument std::string("data-sync-from-") + source_zone->name to one in the labeled format like what you get in perf_counters_cache_key(). The PerfCounter created here goes into the same PerfCountersCollection the rest of labeled perf counters do.

You would also need to need to pass true into the last argument of the PerfCountersBuilder() constructor that is called in build() to mark the perf counter as labeled. That being said, we talked earlier about having all PerfCounters being labeled, so if that differentiation goes away then that bool would go away.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@alimaredia
Copy link
Contributor Author

jenkins test make check

@alimaredia
Copy link
Contributor Author

jenkins test make check

@alimaredia alimaredia force-pushed the wip-rgw-labeled-perf-counters-cache branch from 72b4e76 to c8920c1 Compare January 6, 2023 04:35
@alimaredia
Copy link
Contributor Author

@ljflores @yuriw: Did this PR get run as part of a job yet?

@ljflores
Copy link
Contributor

@alimaredia not yet, but I expect to get it in soon. All of Yuri's labels are currently taken up (he prefers to have up to 12 batches going at a time). But we just approved a batch so will have space to start a new batch soon

@ljflores
Copy link
Contributor

You'll see a "wip-yuri*-testing" label added to the PR when it's been batched.

@ljflores
Copy link
Contributor

@alimaredia it's going into testing now

@alimaredia
Copy link
Contributor Author

@ljflores awesome let me know how it goes!

@ljflores
Copy link
Contributor

@alimaredia I'll have results for you next week

@ljflores
Copy link
Contributor

ljflores commented Feb 22, 2023

Update on testing: A different PR in the testing batch caused a regression. To verify, I am dropping it from the batch and rerunning a few tests. This PR did not seem at all related, so I am working to unblock it.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

cbodley and others added 2 commits February 23, 2023 12:04
a flat representation of a set of prometheus labels, returned as a
std::string. this string can either be used for sorting an ordered
container of perf counters, or for hashing an unordered container

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Add the ability to dump labeled perf counters
for a daemon. Labeled perf counters are stored
in a CephContext's PerfCountersCollection.

Labeled and unlabeled perf counters are dumped
to the admin socket via `counters dump` command.

The schema for labeled and unlabeled perf
counters are dumped to the admin socket via
`counters schema` command.

This commit includes docs and additional unit tests

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A rebase was needed due to Filestore removal. src/os/filestore/FileStore.h hunk was dropped as the file is gone entirely:

diff --git a/src/os/filestore/FileStore.h b/src/os/filestore/FileStore.h
index 67caeaec0a01e..e8a1714217be5 100644
--- a/src/os/filestore/FileStore.h
+++ b/src/os/filestore/FileStore.h
@@ -509,7 +509,7 @@ class FileStore : public JournalingObjectStore,
 
   void dump_perf_counters(ceph::Formatter *f) override {
     f->open_object_section("perf_counters");
-    logger->dump_formatted(f, false);
+    logger->dump_formatted(f, false, false);
     f->close_section();
   }

@alimaredia
Copy link
Contributor Author

@idryomov thank you for the clarification around my recent rebase

@ljflores
Copy link
Contributor

Rados suite review: https://pulpito.ceph.com/yuriw-2023-02-16_22:44:43-rados-wip-yuri-testing-2023-02-16-0839-distro-default-smithi
https://pulpito.ceph.com/lflores-2023-02-20_21:22:20-rados-wip-yuri-testing-2023-02-16-0839-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-02-23_16:42:54-rados-wip-yuri-testing-2023-02-22-2037-distro-default-smithi
https://pulpito.ceph.com/lflores-2023-02-23_17:54:36-rados-wip-yuri-testing-2023-02-22-2037-distro-default-smithi

Failures, unrelated:
1. https://tracker.ceph.com/issues/58585
2. https://tracker.ceph.com/issues/58560
3. https://tracker.ceph.com/issues/58496
4. https://tracker.ceph.com/issues/49961
5. https://tracker.ceph.com/issues/58861
6. https://tracker.ceph.com/issues/58797
7. https://tracker.ceph.com/issues/49428

Details:
1. rook: failed to pull kubelet image - Ceph - Orchestrator
2. test_envlibrados_for_rocksdb.sh failed to subscribe to repo - Infrastructure
3. osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty()) - Ceph - RADOS
4. scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed - Ceph - RADOS
5. OSError: cephadm config file not found - Ceph - Orchestrator
6. scrub/osd-scrub-dump.sh: TEST_recover_unexpected fails from "ERROR: Unexpectedly low amount of scrub reservations seen during test" - Ceph - RADOS
7. ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create failed with error -22" - Ceph - RADOS

@ljflores
Copy link
Contributor

From RADOS point of view, it looks good.

@ljflores
Copy link
Contributor

@idryomov @cbodley good to merge?

@idryomov
Copy link
Contributor

@idryomov @cbodley good to merge?

I'd say so, thanks for handling the RADOS suite!

@idryomov idryomov merged commit b487144 into ceph:main Feb 27, 2023
@ljflores
Copy link
Contributor

Thanks for waiting while we got lab issues sorted!

@jmolmo jmolmo added needs-reef-backport needs-quincy-backport backport required for quincy labels Mar 27, 2023
@idryomov idryomov removed needs-quincy-backport backport required for quincy needs-reef-backport labels Mar 31, 2023
@idryomov
Copy link
Contributor

@jmolmo This is already in reef and not getting backported to quincy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants