Implement DRBD metrics for disk-state #26

Merged: 13 commits, Oct 14, 2019

@MalloZup (Contributor) commented Oct 11, 2019

Description

This PR implements the following metric:


# HELP ha_cluster_drbd_resource_disk_state show per resource name, its role, the volume and disk_state (Diskless,Attaching, Failed, Negotiating, Inconsistent, Outdated, DUnknown, Consistent, UpToDate)
# TYPE ha_cluster_drbd_resource_disk_state gauge
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-0",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg2",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg3",role="Secondary",volume="0"} 1

Usage:

  • Check whether the DRBD disks are UpToDate, out of sync, etc.
  • Monitor or alert on DRBD resource disks

What is still missing:

  • Parse the raw JSON and populate the types
  • Tests
  • Set the Prometheus exporter metric according to the parsed data
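
The parsing step in the list above could be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper names parseDrbdStatus and metricLines are hypothetical, and the JSON field names ("name", "role", "devices", "volume", "disk-state") are an assumption about what `drbdsetup status --json` returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// drbdResource holds only the fields needed for the disk-state metric.
// The JSON field names are an assumption about the DRBD status output.
type drbdResource struct {
	Name    string `json:"name"`
	Role    string `json:"role"`
	Devices []struct {
		Volume    int    `json:"volume"`
		DiskState string `json:"disk-state"`
	} `json:"devices"`
}

// parseDrbdStatus (hypothetical helper) decodes raw JSON into typed resources.
func parseDrbdStatus(raw []byte) ([]drbdResource, error) {
	var resources []drbdResource
	if err := json.Unmarshal(raw, &resources); err != nil {
		return nil, fmt.Errorf("could not parse DRBD status: %w", err)
	}
	return resources, nil
}

// metricLines renders one sample per resource volume, with the lower-cased
// disk state as a label and a constant value of 1.
func metricLines(resources []drbdResource) []string {
	var lines []string
	for _, r := range resources {
		for _, d := range r.Devices {
			lines = append(lines, fmt.Sprintf(
				`ha_cluster_drbd_resource{disk_state=%q,resource_name=%q,role=%q,volume="%d"} 1`,
				strings.ToLower(d.DiskState), r.Name, r.Role, d.Volume))
		}
	}
	return lines
}

func main() {
	raw := []byte(`[{"name":"vg1","role":"Secondary","devices":[{"volume":0,"disk-state":"UpToDate"}]}]`)
	resources, err := parseDrbdStatus(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range metricLines(resources) {
		fmt.Println(line)
	}
}
```

In a real exporter the samples would be set on a Prometheus gauge rather than printed; the string rendering here only mirrors the example output in the description.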

Add function to get raw JSON
doesn't have any SBD_DEVICE set. In this case, just catch the error and
continue.
The rationale is that on some systems the user could forget to set
this in the config file, so we don't want the exporter to panic because of an
index error.
We need to sleep for the same timeout if a metric encounters an error, so
that the metrics are always collected at the same interval.

Example: if the sbd metric fails and we continue without the timeout, the
execution will be 10x or more faster than a normal metric with the timeout.
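
A minimal sketch of the idea described above, assuming a simple polling loop: the sleep sits outside the error branch, so a failing metric keeps the same cadence as a healthy one. The names collectLoop and errNoDevice are hypothetical and are not taken from the exporter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoDevice is a hypothetical error standing in for a missing SBD_DEVICE.
var errNoDevice = errors.New("no SBD_DEVICE set")

// collectLoop (hypothetical) calls collect `rounds` times and sleeps for the
// same timeout after every round, whether or not collect failed.
// It returns the number of failed rounds.
func collectLoop(name string, timeout time.Duration, rounds int, collect func() error) int {
	failures := 0
	for i := 0; i < rounds; i++ {
		if err := collect(); err != nil {
			failures++
			fmt.Printf("%s: %v, skipping this round\n", name, err)
		}
		// Sleeping here, not only on success, keeps a failing metric
		// from spinning much faster than the others.
		time.Sleep(timeout)
	}
	return failures
}

func main() {
	start := time.Now()
	failed := collectLoop("sbd", 10*time.Millisecond, 3, func() error { return errNoDevice })
	fmt.Println("failed rounds:", failed)
	fmt.Println("took at least 30ms:", time.Since(start) >= 30*time.Millisecond)
}
```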
- add a reset for the drbd metric; this is needed in case we lose a disk:
since a disk is a label, if we didn't destroy/reset the metric each
time, it could retain a zombie disk metric

- implement map from value to number
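
The two points above can be illustrated with a stdlib-only stand-in for a labelled gauge. Everything here is an assumption for illustration: gaugeVec, labels, and collect are hypothetical names, and the numbers in diskStateCodes are arbitrary, since the PR text does not show its actual mapping. With the Prometheus Go client, a GaugeVec's Reset method would play the role of the manual delete loop.

```go
package main

import "fmt"

// diskStateCodes is a hypothetical value-to-number map; the numbers are
// arbitrary and not taken from the PR.
var diskStateCodes = map[string]float64{
	"diskless": 0, "attaching": 1, "failed": 2,
	"negotiating": 3, "inconsistent": 4, "outdated": 5,
	"dunknown": 6, "consistent": 7, "uptodate": 8,
}

// labels identifies one sample; the disk state is part of the label set.
type labels struct {
	resource  string
	diskState string
}

// gaugeVec is a stand-in for a labelled gauge: label set -> sample value.
type gaugeVec map[labels]float64

// collect rebuilds the gauge from the disks that exist right now.
// Without the reset, label sets of vanished disks stay behind as zombies.
func collect(g gaugeVec, disks map[string]string, reset bool) {
	if reset {
		for k := range g {
			delete(g, k)
		}
	}
	for resource, state := range disks {
		g[labels{resource, state}] = diskStateCodes[state]
	}
}

func main() {
	g := gaugeVec{}
	collect(g, map[string]string{"vg1": "uptodate", "vg2": "uptodate"}, true)
	// vg2 disappears before the next collection.
	collect(g, map[string]string{"vg1": "uptodate"}, true)
	fmt.Println(len(g), g[labels{"vg1", "uptodate"}])
}
```

Passing reset=false in the second call would leave the vg2 sample in place, which is the zombie case the commit message describes.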
@MalloZup changed the title from "WIP: Implement DRBD metrics" to "Implement DRBD metrics for disk-state" on Oct 14, 2019
@stefanotorresi (Member) commented:

Just a nitpicky remark about naming (yeah, I might be kind of a naming freak):
The _state part in ha_cluster_drbd_resource_disk_state feels somewhat redundant. I mean, do we track anything other than the disks' state?
Also, are there any disks other than "resource disks"? If not, could this be just ha_cluster_drbd_disk?
In general, what other ha_cluster_drbd_* metrics do we have planned, so that we can name them accordingly?

@MalloZup (Contributor, Author) commented:

Good point. I think we can remove the last 2 words

Review thread on drbd_metrics.go (outdated, resolved)
Review thread on ha_cluster_exporter.go (outdated, resolved)
@MalloZup (Contributor, Author) commented:

@storresi can we merge?

@stefanotorresi (Member) left a comment:

sorry @MalloZup I actually forgot to hit the green button 😝

@MalloZup merged commit d97a702 into master on Oct 14, 2019

@MalloZup (Contributor, Author) commented:

OK, thanks! I have added it to the release draft. We will wait a bit this time before releasing a new RPM, since we need to do some refactoring etc.

@MalloZup deleted the drbd-metric branch on October 16, 2019 09:45