mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format #51090

banuchka · 2023-04-15T21:19:58Z

I created a bug report on Ceph Tracker - BUG #59505

This PR is a fix for:

pereman2 · 2023-04-21T10:06:56Z

Looks great. Thanks @banuchka for taking care of this.

The commit message is missing both the Signed-off-by and the Fixes: lines. If you can add them and paste the current output of /metrics, that would be awesome.

pereman2 · 2023-04-21T10:07:37Z

Btw, just curious, are repeated headers an issue or just cumbersome?

nizamial09 · 2023-04-21T10:13:39Z

btw @banuchka, I see you targeted this PR at quincy only. Our normal workflow is to first create the PR against main and merge it there, then cherry-pick those commits to stable branches like quincy or reef. Could you please target this PR at main first?

banuchka · 2023-04-21T10:40:47Z

Btw, just curious, are repeated headers an issue or just cumbersome?

@pereman2 It is not a valid output format for the Prometheus scraper. For example:
error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired"
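
To make the failure concrete, here is a minimal sketch of a check that flags repeated HELP or TYPE headers in /metrics text. The function name `find_duplicate_headers` and the sample payload are hypothetical, and this only approximates one rule a scraper enforces; it is not the Prometheus parser itself.

```python
from collections import Counter


def find_duplicate_headers(exposition_text):
    """Return (keyword, metric_name) pairs whose header appears more than once."""
    seen = Counter()
    for line in exposition_text.splitlines():
        parts = line.split(None, 3)
        # Header lines look like "# HELP <name> <text>" or "# TYPE <name> <type>".
        if len(parts) >= 3 and parts[0] == "#" and parts[1] in ("HELP", "TYPE"):
            seen[(parts[1], parts[2])] += 1
    return sorted(key for key, count in seen.items() if count > 1)


# Hypothetical payload (an assumption, not captured output): the header is
# repeated before each sample of the same metric.
broken = "\n".join([
    "# HELP ceph_pool_objects_repaired Number of objects repaired in a pool",
    "# TYPE ceph_pool_objects_repaired counter",
    'ceph_pool_objects_repaired{poolid="1"} 0.0',
    "# HELP ceph_pool_objects_repaired Number of objects repaired in a pool",
    "# TYPE ceph_pool_objects_repaired counter",
    'ceph_pool_objects_repaired{poolid="2"} 0.0',
])

print(find_duplicate_headers(broken))
```

A payload that lists each header once, followed by all of its samples, should make this check return an empty list.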

banuchka · 2023-04-21T10:46:07Z

btw @banuchka, I see you targeted this PR at quincy only. Our normal workflow is to first create the PR against main and merge it there, then cherry-pick those commits to stable branches like quincy or reef. Could you please target this PR at main first?

Do I need to create a new PR, or is it possible to change the target on the current one? (I can't find how to do that, my bad.)

github-actions · 2023-04-21T12:07:28Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

trociny · 2023-04-26T16:44:20Z

Do I need to create a new PR, or is it possible to change the target on the current one? (I can't find how to do that, my bad.)

You should set the backport field in the tracker ticket (I did it), and then when this PR is merged to the main branch and the ticket status is changed to "Pending backport" the backport tickets will be created automatically.

src/pybind/mgr/prometheus/module.py

idryomov · 2023-04-26T21:53:01Z

@pereman2 Looking at this PR, I see additional issues with the original #47494 and #48843. Please see my comments there.

idryomov · 2023-04-27T10:33:34Z

@banuchka The update looks good, please squash it into the original commit and:

amend the commit title to match the (new) title of the PR
add some details (what is being fixed, perhaps quote error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired" error, etc) to the commit message

banuchka · 2023-04-27T11:35:39Z

@idryomov done
What do you think about changing poolid to pool_id here as well? Or is it a separate issue (or not an issue at all)?

idryomov · 2023-04-27T12:18:47Z

What do you think about changing poolid to pool_id here as well? Or is it a separate issue (or not an issue at all)?

I would say that it's a separate issue (and we still need to hear from @pereman2 on why this metric deviated from the rest in this regard).

pereman2 · 2023-04-27T12:22:14Z

@idryomov I honestly didn't give it much thought. I remember pushing those PRs quickly, so poolid was a clear mistake. I think it would be great if this could be fixed in this PR.

banuchka · 2023-04-27T12:53:47Z

@pereman2 @idryomov just let me know your decision; I'm happy either way:

change poolid to pool_id and force push here
create a new PR with the changes above

idryomov · 2023-04-27T13:00:05Z

@pereman2 @idryomov just let me know your decision

@pereman2 Could you please respond on #48843 (comment)? Then perhaps @banuchka could address all these issues in this PR and we could backport in one go.

pereman2 · 2023-04-27T13:06:06Z

@idryomov done. Summary: gauge should be ideal there.

@banuchka I think you can add those changes to this PR, as Ilya pointed out; it will make the backport easier.

banuchka · 2023-04-27T14:59:03Z

@pereman2 @idryomov done as we've discussed.

src/pybind/mgr/prometheus/module.py

mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format

- fix "error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired" error
- rename label name "poolid" to "pool_id" like all other metrics
- change type for the "daemon_health_metrics" to gauge

Fixes: https://tracker.ceph.com/issues/59505
Signed-off-by: banuchka <tyrchenok@gmail.com>
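
The shape of the fix described in this commit message can be sketched roughly as follows: collect all samples for a metric first, then write the HELP and TYPE header once. `MetricFamily`, `add`, and `render` are hypothetical names used for illustration only; this is not the code in src/pybind/mgr/prometheus/module.py.

```python
from dataclasses import dataclass, field


@dataclass
class MetricFamily:
    """Hypothetical container: one header, many labeled samples."""
    name: str
    mtype: str
    help_text: str
    samples: list = field(default_factory=list)  # (labels dict, value) pairs

    def add(self, labels, value):
        self.samples.append((labels, float(value)))

    def render(self):
        # The header is written once, before all samples of this metric.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} {self.mtype}",
        ]
        for labels, value in self.samples:
            # Label values are not escaped here; a real exporter would need to.
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{self.name}{{{label_str}}} {value}")
        return "\n".join(lines)


repaired = MetricFamily(
    "ceph_pool_objects_repaired", "counter",
    "Number of objects repaired in a pool",
)
for pool_id in (1, 2):
    # "pool_id" rather than "poolid", matching the label rename in this PR.
    repaired.add({"pool_id": pool_id}, 0)

print(repaired.render())
```

Keeping the header and the samples in one object makes it hard to emit the header per sample by accident, which appears to be the kind of mistake the scraper error above points to.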

banuchka · 2023-04-27T20:21:23Z

@idryomov Now it should be better, I hope.

@pereman2

Dismissing since the new version needs to be retested. @pereman2 please test as you did before and ideally post the output here.

pereman2

# HELP ceph_pool_objects_repaired Number of objects repaired in a pool
# TYPE ceph_pool_objects_repaired counter
ceph_pool_objects_repaired{pool_id="9"} 0.0
ceph_pool_objects_repaired{pool_id="8"} 0.0
ceph_pool_objects_repaired{pool_id="7"} 0.0
ceph_pool_objects_repaired{pool_id="2"} 0.0
ceph_pool_objects_repaired{pool_id="1"} 0.0
ceph_pool_objects_repaired{pool_id="3"} 0.0
ceph_pool_objects_repaired{pool_id="4"} 0.0
ceph_pool_objects_repaired{pool_id="5"} 0.0
ceph_pool_objects_repaired{pool_id="6"} 0.0
# HELP ceph_daemon_health_metrics Health metrics for Ceph daemons
# TYPE ceph_daemon_health_metrics gauge
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.a"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.b"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.c"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.0"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.0"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.1"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.1"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.2"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.2"} 0.0

looks good.

idryomov · 2023-05-03T09:30:28Z

jenkins test make check

idryomov · 2023-05-03T09:31:04Z

jenkins test dashboard

rzarzynski · 2023-05-04T17:54:39Z

jenkins test dashboard

github-actions bot added monitoring pybind labels Apr 15, 2023

github-actions bot added this to the quincy milestone Apr 15, 2023

banuchka changed the title ~~mgr/prometheus plugin [bug]: fix 2 bugs implemented by PR#48204, PR#49519~~ mgr/prometheus plugin [bug #59505]: fix 2 bugs implemented by PR#48204, PR#49519 Apr 21, 2023

pereman2 self-requested a review April 21, 2023 10:06

banuchka changed the base branch from quincy to main April 21, 2023 10:45

banuchka requested review from a team as code owners April 21, 2023 10:45

banuchka requested review from nizamial09 and removed request for a team April 21, 2023 10:45

banuchka changed the base branch from main to quincy April 21, 2023 10:48

banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch 2 times, most recently from 3031f19 to 9c6258d Compare April 21, 2023 12:07

github-actions bot added the needs-rebase label Apr 21, 2023

banuchka changed the base branch from quincy to main April 21, 2023 12:07

github-actions bot added build/ops ceph-volume labels Apr 21, 2023

idryomov reviewed Apr 26, 2023

idryomov changed the title ~~mgr/prometheus plugin [bug #59505]: fix 2 bugs implemented by PR#48204, PR#49519~~ mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format Apr 27, 2023

banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch 2 times, most recently from 123f59c to 806e600 Compare April 27, 2023 11:34

idryomov approved these changes Apr 27, 2023

banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch from 806e600 to ad365fb Compare April 27, 2023 14:57

idryomov reviewed Apr 27, 2023

banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch from ad365fb to 95d5303 Compare April 27, 2023 20:08

idryomov approved these changes Apr 28, 2023

pereman2 approved these changes May 3, 2023

nizamial09 merged commit 614d7fa into ceph:main May 22, 2023
11 of 13 checks passed
