Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format #51090

Conversation

banuchka
Copy link
Contributor

@banuchka banuchka commented Apr 15, 2023

@github-actions github-actions bot added this to the quincy milestone Apr 15, 2023
@banuchka banuchka changed the title mgr/prometheus plugin [bug]: fix 2 bugs implemented by PR#48204, PR#49519 mgr/prometheus plugin [bug #59505]: fix 2 bugs implemented by PR#48204, PR#49519 Apr 21, 2023
@pereman2
Copy link
Contributor

Looks great. Thanks @banuchka for taking care of this.

The Signed-off-by is missing from the commit message and the Fixes: too. If you can change that and past the current output of /metrics it would be awesome.

@pereman2 pereman2 self-requested a review April 21, 2023 10:06
@pereman2
Copy link
Contributor

Btw, just curious, is repeated headers an issue or just cumbersome?

@nizamial09
Copy link
Member

btw @banuchka i see you targetted this PR for quincy only. Our normal way of workflow is to first create the PR in main and merge it there. Then cherry-pick those commits to stable branches like quincy or reef. Could you please target this PR to main first?

@banuchka
Copy link
Contributor Author

Btw, just curious, is repeated headers an issue or just cumbersome?

@pereman2 It is a not valid output format for the Prometheus scrapper, as an example below:
error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired"

@banuchka banuchka changed the base branch from quincy to main April 21, 2023 10:45
@banuchka banuchka requested review from a team as code owners April 21, 2023 10:45
@banuchka banuchka requested review from nizamial09 and removed request for a team April 21, 2023 10:45
@banuchka
Copy link
Contributor Author

banuchka commented Apr 21, 2023

btw @banuchka i see you targetted this PR for quincy only. Our normal way of workflow is to first create the PR in main and merge it there. Then cherry-pick those commits to stable branches like quincy or reef. Could you please target this PR to main first?

do I need to create a new PR or is it possible to change the target on the current one? (I cant find how to do that, my bad)

@banuchka banuchka changed the base branch from main to quincy April 21, 2023 10:48
@banuchka banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch 2 times, most recently from 3031f19 to 9c6258d Compare April 21, 2023 12:07
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@banuchka banuchka changed the base branch from quincy to main April 21, 2023 12:07
@trociny
Copy link
Contributor

trociny commented Apr 26, 2023

do I need to create a new PR or is it possible to change the target on the current one? (I cant find how to do that, my bad)

You should set the backport field in the tracker ticket (I did it), and then when this PR is merged to the main branch and the ticket status is changed to "Pending backport" the backport tickets will be created automatically.

@idryomov
Copy link
Contributor

@pereman2 Looking at this PR, I see additional issues with the original #47494 and #48843. Please see my comments there.

@idryomov idryomov changed the title mgr/prometheus plugin [bug #59505]: fix 2 bugs implemented by PR#48204, PR#49519 mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format Apr 27, 2023
@idryomov
Copy link
Contributor

@banuchka The update looks good, please squash it into the original commit and:

  • amend the commit title to match the (new) title of the PR
  • add some details (what is being fixed, perhaps quote error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired" error, etc) to the commit message

@banuchka banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch 2 times, most recently from 123f59c to 806e600 Compare April 27, 2023 11:34
@banuchka
Copy link
Contributor Author

banuchka commented Apr 27, 2023

@idryomov done
What do you about changing poolid to pool_id here as well? Or is it a separate issue(not an issue at all)?

@idryomov
Copy link
Contributor

idryomov commented Apr 27, 2023

What do you about changing poolid to pool_id here as well? Or is it a separate issue(not an issue at all)?

I would say that it's a separate issue (and we still need to hear from @pereman2 on why this metric deviated from the rest in this regard).

@pereman2
Copy link
Contributor

@idryomov I honestly didn't give much thought on it. I remember pushing those PR quickly so poolid was a clear mistake. I think it is great news if this can be fixed in this PR.

@banuchka
Copy link
Contributor Author

@pereman2 @idryomov just let me know your decision and I'm happy with both ways:

  1. change poolid to pool_id and force push here
  2. create a new PR with the changes above

@idryomov
Copy link
Contributor

@pereman2 @idryomov just let me know your decision

@pereman2 Could you please respond on #48843 (comment)? Then perhaps @banuchka could address all these issues in this PR and we could backport in one go.

@pereman2
Copy link
Contributor

@idryomov done. Summary: gauge should be ideal there.

@banuchka I think you can add those changes in this PR as ilya pointed out, it will be easier for the backport.

@banuchka banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch from 806e600 to ad365fb Compare April 27, 2023 14:57
@banuchka
Copy link
Contributor Author

@pereman2 @idryomov done as we've discussed.

src/pybind/mgr/prometheus/module.py Outdated Show resolved Hide resolved
src/pybind/mgr/prometheus/module.py Outdated Show resolved Hide resolved
src/pybind/mgr/prometheus/module.py Show resolved Hide resolved
…ormat

mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format

- fix "error reading metrics for http://****:***/metrics: reading text format failed: text format parsing error in line 2010: second HELP line for metric name "ceph_pool_objects_repaired" error

- rename label name "poolid" to "pool_id" like all other metrics
- change type for the "daemon_health_metrics" to gauge

Fixes: https://tracker.ceph.com/issues/59505
Signed-off-by: banuchka <tyrchenok@gmail.com>
@banuchka banuchka force-pushed the mgr/prometheus-plugin-fix-2-bugs-implemented-by-PR#48204,-PR#49519 branch from ad365fb to 95d5303 Compare April 27, 2023 20:08
@banuchka
Copy link
Contributor Author

@idryomov Now it should be better as I hope.

@idryomov idryomov dismissed pereman2’s stale review April 28, 2023 17:39

Dismissing since the new version needs to be retested. @pereman2 please test as you did before and ideally post the output here.

Copy link
Contributor

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# HELP ceph_pool_objects_repaired Number of objects repaired in a pool
# TYPE ceph_pool_objects_repaired counter
ceph_pool_objects_repaired{pool_id="9"} 0.0
ceph_pool_objects_repaired{pool_id="8"} 0.0
ceph_pool_objects_repaired{pool_id="7"} 0.0
ceph_pool_objects_repaired{pool_id="2"} 0.0
ceph_pool_objects_repaired{pool_id="1"} 0.0
ceph_pool_objects_repaired{pool_id="3"} 0.0
ceph_pool_objects_repaired{pool_id="4"} 0.0
ceph_pool_objects_repaired{pool_id="5"} 0.0
ceph_pool_objects_repaired{pool_id="6"} 0.0
# HELP ceph_daemon_health_metrics Health metrics for Ceph daemons
# TYPE ceph_daemon_health_metrics gauge
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.a"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.b"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.c"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.0"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.0"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.1"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.1"} 0.0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.2"} 0.0
ceph_daemon_health_metrics{type="PENDING_CREATING_PGS",ceph_daemon="osd.2"} 0.0

looks good.

@idryomov
Copy link
Contributor

idryomov commented May 3, 2023

jenkins test make check

@idryomov
Copy link
Contributor

idryomov commented May 3, 2023

jenkins test dashboard

1 similar comment
@rzarzynski
Copy link
Contributor

jenkins test dashboard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants