Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporter: promethize counter names #51069

Merged
merged 1 commit into from Apr 28, 2023
Merged

Conversation

pereman2
Copy link
Contributor

@pereman2 pereman2 commented Apr 13, 2023

This fix assures that metrics coming from daemons perf counters have the same name if they are exported by the Prometheus manager module or by the Ceph exporter.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2186557

src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved

// just in case because there are some bluestore metrics which include a single ":"
if (name.find(":") != std::string::npos) {
boost::replace_all(name, ":", "_");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in promethize in src/pybind/mgr/prometheus/module.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the comment says, I saw a single colon around some bluestore metrics so I added that in just in case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then promethize in src/pybind/mgr/prometheus/module.py needs to updated to the same. There can't be "just in case" here, both methods should do exactly the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's important to keep consistency with the existing metric names (even if the current approach was not the best possible). Otherwise, this will generate new time-series, and the existing Grafana Dashboards might have missing data.

Maybe those bluestore metrics are not exported due to their priority level?

@jmolmo
Copy link
Member

jmolmo commented Apr 13, 2023

Output with the new code. the "_", "+" has been properly replaced.

(standard input)752:ceph_mds_mem_cap_minus{ceph_daemon="mds.a"} 0
(standard input)753:# HELP ceph_mds_mem_cap_plus Capabilities added
(standard input)754:# TYPE ceph_mds_mem_cap_plus counter
(standard input)755:ceph_mds_mem_cap_plus{ceph_daemon="mds.a"} 0
(standard input)756:# HELP ceph_mds_mem_dir Directories
(standard input)757:# TYPE ceph_mds_mem_dir gauge
(standard input)758:ceph_mds_mem_dir{ceph_daemon="mds.a"} 12
(standard input)759:# HELP ceph_mds_mem_dir_minus Directories closed

@idryomov
Copy link
Contributor

idryomov commented Apr 13, 2023

Output with the new code. the "_", "+" has been properly replaced.

Only because + happens to be the last character in this example.

@pereman2
Copy link
Contributor Author

I made the code replace all + with _plus just in case. Expecting a + as an infix will be a weird case so I don't think is something we should care about for now.

Another thing we could add is replace all invalid characters from the prometheus name regex they provide officially to underscores or something simliar but I don't feel like we need to reach that far for the current quick fix.

src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you, @pereman2 !

BTW, I see no "Fixes:...". Shouldn't this be linked to a bug report?

src/test/exporter/promethize_data.h Outdated Show resolved Hide resolved
src/exporter/DaemonMetricCollector.cc Show resolved Hide resolved
@jmolmo
Copy link
Member

jmolmo commented Apr 17, 2023

LGTM! Thank you, @pereman2 !

BTW, I see no "Fixes:...". Shouldn't this be linked to a bug report?

@pereman2 was so fast fixing this , that the bug creation took more time than post the pr ... :-D ,..
https://bugzilla.redhat.com/show_bug.cgi?id=2186557

@pereman2, please, be sure to update the commit

@pereman2
Copy link
Contributor Author

@jmolmo @epuertat Added the fixes section.

@jmolmo
Copy link
Member

jmolmo commented Apr 18, 2023

The windows tests fail with an error in one of the exporter files, the weird thing is that the offending line has not been modified.... and the line was introduced 9 monthsd ago ...
/home/ubuntu/ceph/src/exporter/DaemonMetricCollector.cc:245:27: error: ‘_SC_CLK_TCK’ was not declared in this scope; did you mean ‘CLK_TCK’?

@pereman2 what do you think?

@idryomov
Copy link
Contributor

The windows tests fail with an error in one of the exporter files, the weird thing is that the offending line has not been modified.... and the line was introduced 9 monthsd ago ...
/home/ubuntu/ceph/src/exporter/DaemonMetricCollector.cc:245:27: error: ‘_SC_CLK_TCK’ was not declared in this scope; did you mean ‘CLK_TCK’?

It's failing to build the test, not the daemon. I think the failure appeared because ceph-exporter isn't built for Windows but with this PR you are trying to build its test binary. You would need to disable that in src/test/CMakeLists.txt by moving add_subdirectory(exporter) line into the "Not available on Windows for the time being" section.

Cc @petrutlucian94 to keep me honest.

src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
src/test/exporter/promethize_data.h Outdated Show resolved Hide resolved
src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
@petrutlucian94
Copy link
Contributor

The windows tests fail with an error in one of the exporter files, the weird thing is that the offending line has not been modified.... and the line was introduced 9 monthsd ago ...
/home/ubuntu/ceph/src/exporter/DaemonMetricCollector.cc:245:27: error: ‘_SC_CLK_TCK’ was not declared in this scope; did you mean ‘CLK_TCK’?

It's failing to build the test, not the daemon. I think the failure appeared because ceph-exporter isn't built for Windows but with this PR you are trying to build its test binary. You would need to disable that in src/test/CMakeLists.txt by moving add_subdirectory(exporter) line into the "Not available on Windows for the time being" section.

Cc @petrutlucian94 to keep me honest.

That's correct, please skip this test on Windows. Thanks!

src/exporter/DaemonMetricCollector.cc Outdated Show resolved Hide resolved
src/exporter/util.h Outdated Show resolved Hide resolved
src/test/exporter/CMakeLists.txt Outdated Show resolved Hide resolved
@jmolmo
Copy link
Member

jmolmo commented Apr 21, 2023

jenkins test make check

@jmolmo
Copy link
Member

jmolmo commented Apr 21, 2023

jenkins test make check arm64

1 similar comment
@jmolmo
Copy link
Member

jmolmo commented Apr 21, 2023

jenkins test make check arm64

@avanthakkar
Copy link
Contributor

jenkins test dashboard cephadm

1 similar comment
@avanthakkar
Copy link
Contributor

jenkins test dashboard cephadm

@avanthakkar avanthakkar requested a review from a team as a code owner April 24, 2023 12:47
@avanthakkar avanthakkar requested review from nizamial09 and removed request for a team April 24, 2023 12:47
@avanthakkar
Copy link
Contributor

jenkins test dashboard cephadm

@jmolmo
Copy link
Member

jmolmo commented Apr 24, 2023

jenkins test dashboard cephadm

@avanthakkar avanthakkar changed the title exporter: promethize counter names exporter: promethize counter names Apr 24, 2023
@avanthakkar
Copy link
Contributor

jenkins test dashboard cephadm

@avanthakkar
Copy link
Contributor

jenkins test make check

Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com>
Fixes: https://tracker.ceph.com/issues/59475
@pereman2
Copy link
Contributor Author

@idryomov I've fixed the last comments from you.

@avanthakkar can you give a brief on why this commit was added ? 4998f85 I've removed it completely as you didn't give any explanations on the commit plus it didn't follow naming conventions of commit messages.

@avanthakkar
Copy link
Contributor

@idryomov I've fixed the last comments from you.

@avanthakkar can you give a brief on why this commit was added ? 4998f85 I've removed it completely as you didn't give any explanations on the commit plus it didn't follow naming conventions of commit messages.

I temporarily added it to see the cephadm grafana e2e tests are passing or not without enabling the feature for fetching perf counters from prometheus module just to verify if nothing is breaking in any grafana dashboards.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on

For sure a more complete test coverage will be useful, and probably we need to set as priority. However, I suggest to be focused in this pr just in the problem to resolve.

As @jmolmo said, we should stick with the current purpose of this PR and @avanthakkar can work on on extending the testing suite.

@avanthakkar Can you link the respective tracker ticket here?

@idryomov
Copy link
Contributor

jenkins test windows

@avanthakkar
Copy link
Contributor

jenkins test dashboard

@jmolmo
Copy link
Member

jmolmo commented Apr 26, 2023

Ceph dashboard test failures not related with this change

@pereman2
Copy link
Contributor Author

@idryomov As per your request in the CLT, I've created this tracker https://tracker.ceph.com/issues/59561 (we can split it as necessary) to track the integration tests on the exporter.

cc: @jmolmo @avanthakkar

@jmolmo jmolmo merged commit 11b3cc2 into ceph:main Apr 28, 2023
11 of 13 checks passed
@jmolmo jmolmo deleted the exporter-prom branch April 28, 2023 06:21
yuvalif pushed a commit to yuvalif/ceph that referenced this pull request Apr 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
6 participants