
mgr/dashboard: fix Grafana OSD/host panels #43685

Merged: 4 commits into ceph:master from the fix-grafana-graphs-ceph_daemon branch on Jan 14, 2022

Conversation

@p-se (Contributor) commented Oct 27, 2021

Fix issues with PromQL expressions and vector matching with the
ceph_disk_occupation metric.

As it turns out, ceph_disk_occupation cannot simply be used as
expected: there are edge cases for users that have several OSDs on a
single disk. This leads to issues which cannot be solved by PromQL
alone (many-to-many PromQL errors). The data we expected simply looks
different in some rare cases.

I have not found a PromQL-only solution to this issue. What we basically
need is the following.

  1. Match on the host and instance labels to get one or more OSD names
    from a metadata metric (ceph_disk_occupation), to let a user know
    which OSDs belong to which disk.

  2. Match on the ceph_daemon label of the ceph_disk_occupation metric,
    in which case the value of ceph_daemon must not refer to more than
    a single OSD. The exact opposite of requirement 1.

As both operations are currently performed on a single metric, and there
is no way to satisfy both requirements with a single metric, this commit
extends the metric by providing a similar metric that satisfies one of
the requirements. This enables the queries to differentiate between a
vector matching operation used to show a string to the user (where
ceph_daemon may be osd.1 or osd.1+osd.2) and one that matches a vector
on a single ceph_daemon in the matching condition.

Although the ceph_daemon label is used on a variety of daemons, only
OSDs seem to be affected by this issue (and only if more than one OSD
runs on a single disk). This means that only the ceph_disk_occupation
metadata metric needs to be extended and provided as two metrics.

ceph_disk_occupation is supposed to be used for matching the
ceph_daemon label value.

foo * on(ceph_daemon) group_left ceph_disk_occupation

ceph_disk_occupation_human is supposed to be used for anything where
the resulting data is displayed for human consumption (graphs, alert
messages, etc.).

foo * on(device,instance)
group_left(ceph_daemon) ceph_disk_occupation_human
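
To make the relationship between the two metrics concrete, here is a minimal, self-contained sketch (not the actual mgr/prometheus implementation; the function name, sample layout and the "+" separator are illustrative) of how the human-readable metric can be derived from the per-OSD one by grouping on (device, instance):

from collections import defaultdict
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]  # (label name, value) pairs in a fixed order

def derive_human_metric(occupation: Dict[Labels, float], sep: str = "+") -> Dict[Labels, float]:
    # Collapse ceph_disk_occupation samples that share (device, instance)
    # into a single sample whose ceph_daemon lists all co-located OSDs.
    grouped: Dict[Tuple[str, str], set] = defaultdict(set)
    for labels, _value in occupation.items():
        d = dict(labels)
        grouped[(d["device"], d["instance"])].add(d["ceph_daemon"])
    human: Dict[Labels, float] = {}
    for (device, instance), daemons in grouped.items():
        key: Labels = (("ceph_daemon", sep.join(sorted(daemons))),
                       ("device", device),
                       ("instance", instance))
        human[key] = 1.0  # metadata metric: the value itself carries no meaning
    return human

samples = {
    (("ceph_daemon", "osd.1"), ("device", "sdb"), ("instance", "host1")): 1.0,
    (("ceph_daemon", "osd.2"), ("device", "sdb"), ("instance", "host1")): 1.0,
}
print(derive_human_metric(samples))
# {(('ceph_daemon', 'osd.1+osd.2'), ('device', 'sdb'), ('instance', 'host1')): 1.0}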

Fixes: https://tracker.ceph.com/issues/52974

Signed-off-by: Patrick Seidensal pseidensal@suse.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@p-se p-se changed the title Fix grafana graphs ceph daemon mgr/dashboard: fix Grafana OSD/host panels Oct 27, 2021
@p-se p-se requested review from pcuzner, tchaikov, epuertat, aaSharma14 and a team and removed request for tchaikov October 27, 2021 09:05
@p-se p-se added the bug-fix label Oct 27, 2021
@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from 90e03a8 to 1caecae on October 27, 2021 09:44
@github-actions github-actions bot added this to In progress in Dashboard Oct 27, 2021
@p-se p-se requested a review from alfonsomthd October 28, 2021 08:45
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from 1caecae to 2d93396 on November 8, 2021 09:58
@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from 2d93396 to 5aac4ed on November 8, 2021 11:28
@p-se p-se marked this pull request as ready for review November 8, 2021 15:27
@alfonsomthd (Contributor) left a comment

UPDATE: see the follow-up comment below.

@p-se I tested it and I got the "many-to-many matching not allowed: matching labels must be unique on one side" error. Can you take a look at it?
Screenshot from 2021-11-18 09-52-56

"found duplicate series for the match group {ceph_daemon="osd.1", device="dm-0", instance="ceph-node-01"} on the right hand-side of the operation: [{name="ceph_disk_occupation", ceph_daemon="osd.1", device="dm-0", device_ids="N/A", devices="vdb", instance="ceph-node-01", job="ceph"}, {name="ceph_disk_occupation", ceph_daemon="osd.1", device="dm-0", instance="ceph-node-01", job="ceph"}];many-to-many matching not allowed: matching labels must be unique on one side"

Dashboard automation moved this from In progress to Review in progress Nov 18, 2021
@alfonsomthd (Contributor) commented Nov 18, 2021

@p-se I tested it and I got the "many-to-many matching not allowed: matching labels must be unique on one side" error. Can

Sorry @p-se I was testing with an outdated grafana panel so ignore my previous comment.

The issue I'm facing now is that ceph_disk_occupation_human metric is being generated
Screenshot from 2021-11-18 11-51-42

Please look at this.

Dashboard automation moved this from Review in progress to Reviewer approved Dec 13, 2021
@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from 6eca893 to b10fbc1 on December 13, 2021 14:11
@avanthakkar (Contributor)

I tried testing some IOPS panels, but I don't see any data.
Steps:

  1. Used this benchmark for IOPS.
  2. Checked Grafana panels for OSD R/W IOPS (IOPS are shown fine)
    Screenshot from 2021-12-13 22-28-46
  3. Checked Host Details -> OSD Disk Performance Statistics -> No data
    Screenshot from 2021-12-13 22-29-00
    ^^ The ceph_disk_occupation_human Prometheus query shows no data; I think that's why these panels show no data.
  4. Same for Avg. Disk Utilization
    Screenshot from 2021-12-13 22-29-08

I think it's because the per-host (host details) OSD statistics showed no data, so the average will be the same.
The only reason I could find for the "No data" panels is that ceph_disk_occupation_human shows no data,
because other queries grouped with this metric work fine; e.g. node_disk_io_time_seconds_total shows correct data.

@p-se (Contributor, Author) commented Dec 13, 2021

I tried testing some IOPS panels, but I don't see any data. Steps:

Thank you @avanthakkar. I'll check tomorrow and get back to you!

@p-se (Contributor, Author) commented Dec 14, 2021

@avanthakkar I've checked and it makes sense to me. I also do not have a ceph_disk_occupation metric on my vstart cluster, and I didn't have it before I created this PR either. The reason is that, without virtualization, a Ceph cluster doesn't seem to be able to request some required information from my disk. This leads to the mgr/prometheus module not being able to provide data for the ceph_disk_occupation metric, and hence the ceph_disk_occupation_human metric cannot be derived from it. I'm assuming you have the same issue.

So, you'll likely either need to use virtualization (not sure if kcli can already be used for it) or tamper with the mgr/prometheus/module.py file. Unfortunately, I no longer have the patch at hand; I'd need to rebuild Ceph and am having an issue doing so at the moment.

@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from b10fbc1 to b95b6a0 on December 14, 2021 14:53
@avanthakkar (Contributor)

@avanthakkar I've checked and it makes sense to me. [...]

So, you'll likely either need to use virtualization (not sure if kcli can already be used for it) or tamper with the mgr/prometheus/module.py file. [...]

Yes, it worked with kcli env. Thanks @p-se

@avanthakkar (Contributor)

jenkins test dashboard cephadm

@avanthakkar (Contributor)

jenkins test dashboard

@alfonsomthd alfonsomthd moved this from Reviewer approved to Ready-to-merge in Dashboard Dec 15, 2021
@epuertat (Member) left a comment

Really nice work here @p-se (and über-sorry for the late reply)! I left some comments over there. None of them is a hard blocker, since this fixes an actual issue, but given that it introduces a new metric (and deprecates an existing one), I wondered if we should take the opportunity to clean up the metric in order to simplify the resulting queries.

@@ -236,11 +236,11 @@ drive statistics, special series are output like this:

::

ceph_disk_occupation{ceph_daemon="osd.0",device="sdd", exported_instance="myhost"}
ceph_disk_occupation_human{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}
Member

Here it mentions that this is a metadata label (a label only used for join-like queries), so, as we're using the _metadata suffix for other metrics like this, why not use it here too?:

Suggested change
ceph_disk_occupation_human{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}
ceph_disk_metadata{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}

Or, if we want to keep the _occupation_ part, why not add a _v2? The _human suffix sounds weird to me, and I cannot relate it to the formerly existing ceph_disk_occupation metric:

Suggested change
ceph_disk_occupation_human{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}
ceph_disk_occupation_v2{ceph_daemon="osd.0", device="sdd", exported_instance="myhost"}

Comment on lines +19 to +22
| node_disk_writes_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
| node_disk_writes_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
| ceph_disk_occupation_human{ceph_daemon="osd.0 osd.1 osd.2",device="/dev/sda",instance="localhost:9283"} | 1.0 |
| ceph_disk_occupation_human{ceph_daemon="osd.3 osd.4 osd.5",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
Member

I'm wondering: given we're creating a new metadata metric from the old ceph_disk_occupation, why not take the chance to fix other issues? All the metrics involving ceph_disk_occupation require a doubly nested label_replace():

label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and
  on (instance, device)
  label_replace(
    label_replace(ceph_disk_occupation_human{ceph_daemon=~"$osd"}, "device", "$1", "device", "/dev/(.*)"),
  "instance", "$1", "instance", "([^:.]*).*")

By:

  • Aligning device names (in node_disk_* metrics devices are sd*, while in ceph_disk_* ones they are /dev/sd*). Why not align the new ceph_disk_occupation to use sd* too? That way we could remove several label_replace() calls all over the code,
  • Aligning instance names,

we should expect the same PromQL query to work as:

irate(node_disk_read_bytes_total[1m]) and on (instance, device) ceph_disk_occupation_human{ceph_daemon=~"$osd"}
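
A rough sketch of what that alignment could look like on the exporter side (hypothetical helper, not part of this PR): normalize the label values before exposing them so they match the node_exporter conventions and the doubly nested label_replace() becomes unnecessary.

import re

def normalize_labels(device: str, instance: str) -> tuple:
    # Strip the /dev/ prefix from the device and the domain/port from the
    # instance so that ceph_disk_* labels join directly with node_disk_*.
    device = re.sub(r"^/dev/", "", device)       # "/dev/sda"          -> "sda"
    instance = re.sub(r"[:.].*$", "", instance)  # "ceph-node-01:9283" -> "ceph-node-01"
    return device, instance

print(normalize_labels("/dev/sda", "ceph-node-01.example.com:9283"))
# ('sda', 'ceph-node-01')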

from typing import DefaultDict, Optional, Dict, Any, Set, cast, Tuple, Union, List, Callable

LabelValues = Tuple[str, ...]
Number = Union[int, float]
Member

from numbers import Number:

Suggested change
Number = Union[int, float]

That said, type hints already assume that float includes int, so the alias may not be needed at all.
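
A small standalone illustration of that point (not part of the module): type checkers already accept an int wherever float is annotated, so a separate Number alias is mostly redundant for annotations.

def scale(value: float, factor: float = 2.0) -> float:
    # "float" in annotations also accepts int arguments, so no Union or
    # numbers.Number alias is required for a function like this.
    return value * factor

print(scale(3))    # int passed where float is annotated: prints 6.0
print(scale(1.5))  # prints 3.0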

Comment on lines +375 to +380
def group_by(
self,
keys: List[str],
joins: Dict[str, Callable[[List[str]], str]],
name: Optional[str] = None,
) -> "Metric":
Member

I find this method tries to do too many things (multiple keys to group by, a callable for generating the new label). Can't we make some assumptions and simplify it a bit?:

  • Single label to group by (you can achieve the same behaviour by nesting group_by),
  • Fixed format for the new label.
Suggested change
def group_by(
self,
keys: List[str],
joins: Dict[str, Callable[[List[str]], str]],
name: Optional[str] = None,
) -> "Metric":
def group_by(
self,
key: str,
) -> "Metric":

Comment on lines +384 to +385
Label names not passed are being removed from the resulting metric but
by providing a join function, labels of metrics can be grouped.
Member

What for? What's the use case here?

... ('foo', 'y'): 10,
... }
>>> m.group_by(['label1'], {'id': lambda ids: ','.join(ids)}).value
{('foo', 'x,y'): 555}
Member

So we're actually taking an arbitrary value here... Why not assume this only makes sense for metadata metrics and ignore the value (set it to the default 1.0)?

Comment on lines +423 to +424
for key in keys:
assert key in self.labelnames, "unknown key: {}".format(key)
Member

Why force this check? If we allow all label names to pass...

for key in keys:
assert key in self.labelnames, "unknown key: {}".format(key)
assert joins, "joins must not be empty"
assert all(callable(c) for c in joins.values()), "joins must be callable"
Member

Can we provide a default join behaviour (e.g.: comma-separated), rather than requiring the user to provide their own join method?

Comment on lines +429 to +435
grouped: Dict[LabelValues, List[Tuple[Dict[str, str], Number]]] = defaultdict(list)
for label_values, metric_value in self.value.items():
labels = dict(zip(self.labelnames, label_values))
if not all(k in labels for k in keys):
continue
group_key = tuple(labels[k] for k in keys)
grouped[group_key].append((labels, metric_value))
Member

This is almost the same as invoking itertools.groupby (I think you would need to sort the collection beforehand).

After that, and given the assumptions mentioned above, the only thing left would be collapsing the label values (joining them with commas when they differ):

[','.join(sorted(set(i))) for i in zip(*labels)]
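
A rough, runnable sketch of that idea (sample layout and separator are illustrative, not the module's API):

from itertools import groupby

# Each sample is a tuple of label values in a fixed label order:
# (ceph_daemon, device, instance).
samples = [
    ("osd.1", "sdb", "host1"),
    ("osd.2", "sdb", "host1"),
    ("osd.3", "sdc", "host1"),
]

key = lambda labels: (labels[1], labels[2])  # group by (device, instance)
collapsed = []
for _, group in groupby(sorted(samples, key=key), key=key):
    labels = list(group)
    # Join differing values with commas; identical values stay unchanged.
    collapsed.append(tuple(",".join(sorted(set(values))) for values in zip(*labels)))

print(collapsed)
# [('osd.1,osd.2', 'sdb', 'host1'), ('osd.3', 'sdc', 'host1')]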

Comment on lines +440 to +445
labelnames = tuple(
label for label in self.labelnames if label in keys or label in joins
)
superfluous_labelnames = [
label for label in self.labelnames if label not in labelnames
]
Member

For this you could benefit from using set, which supports intersection and other set operations.
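
For illustration, a minimal sketch of that suggestion (label and variable names are made up):

labelnames = ("ceph_daemon", "device", "device_ids", "devices", "instance")
keys = {"device", "instance"}
joins = {"ceph_daemon"}

kept = set(labelnames) & (keys | joins)   # labels to keep
superfluous = set(labelnames) - kept      # labels to drop
# Preserve the original label order for the kept names.
ordered_kept = tuple(label for label in labelnames if label in kept)

print(ordered_kept)         # ('ceph_daemon', 'device', 'instance')
print(sorted(superfluous))  # ['device_ids', 'devices']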

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from b95b6a0 to 6da154e on January 12, 2022 17:02
@p-se p-se force-pushed the fix-grafana-graphs-ceph_daemon branch from 6da154e to 7d74880 on January 13, 2022 12:28
@p-se p-se requested a review from a team as a code owner January 13, 2022 12:28
@p-se (Contributor, Author) commented Jan 14, 2022

jenkins test dashboard cephadm

1 similar comment
@callithea (Member)

jenkins test dashboard cephadm

@epuertat epuertat merged commit f7cd4b2 into ceph:master Jan 14, 2022
Dashboard automation moved this from Ready-to-merge to Done Jan 14, 2022
Projects: Dashboard (Done)
7 participants