Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added md disks in down state #3007

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

Finomosec
Copy link

@Finomosec Finomosec commented Apr 30, 2024

Added missing mdadm stats:

  • node_md_disks # added {state="down"}
  • node_md_sync_time_remaining (seconds)
  • node_md_blocks_synced_speed
  • node_md_blocks_synced_pct

Notes:

  • One drive was not being shown, as it was in state="down" (recovering), which was not reported in the output.
  • Using node_md_blocks_synced / node_md_blocks as progress percentage created wrong results on my system, as the total-blocks differed from the total-to-be-synced-blocks. This may be due to the raid-level being used (raid5).
md0 : active raid5 sdf1[4] sde1[1] sdc1[2] sdb1[0]
      14650718208 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [===================>.]  recovery = 99.9% (4882207424/4883572736) finish=7.8min speed=2908K/sec
      bitmap: 2/37 pages [8KB], 65536KB chunk

Signed-off-by: Finomosec <github@meinebasis.de>
Finomosec and others added 7 commits May 1, 2024 11:51
… node_md_blocks_synced_pct

Signed-off-by: Finomosec <github@meinebasis.de>
Signed-off-by: Finomosec <github@meinebasis.de>
Signed-off-by: Finomosec <github@meinebasis.de>
changed sync_minutes_remaining to sync_time_remaining (in seconds)

Signed-off-by: Frederic <fredleitenb@yahoo.de>
fixed code formatting

Signed-off-by: Frederic <fredleitenb@yahoo.de>
Signed-off-by: Finomosec <github@meinebasis.de>
Signed-off-by: Finomosec <github@meinebasis.de>
@Finomosec
Copy link
Author

@SuperQ
I'm done for now.
Feel free to merge it at any time.

collector/mdadm_linux.go Outdated Show resolved Hide resolved
collector/mdadm_linux.go Outdated Show resolved Hide resolved
Comment on lines +114 to +119
blockSyncedSpeedDesc = prometheus.NewDesc(
prometheus.BuildFQName(namespace, "md", "blocks_synced_speed"),
"current sync speed (in Kilobytes/sec)",
[]string{"device"},
nil,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary, we should be able to compute this from something like rate(node_md_blocks_synced[1m]) * <blocksize>.

Suggested change
blockSyncedSpeedDesc = prometheus.NewDesc(
prometheus.BuildFQName(namespace, "md", "blocks_synced_speed"),
"current sync speed (in Kilobytes/sec)",
[]string{"device"},
nil,
)

Copy link
Author

@Finomosec Finomosec May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is usefull. It is the CURRENT speed, as it is shown in /stat/proc/mdstat
I have it showing in my Grafana board.
Plus i guess <blocksize> is not included in the data, so it would require additional configuration for each md-device.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the groundwork has already been laid for #1085, and we probably should not add any new parsing functionality relating to /proc/mdstat.

@SuperQ
Copy link
Member

SuperQ commented May 7, 2024

Maybe instead of exposing the sync percent, we should expose the "TODO" blocks value. This way the completion ratio can be correctly calculated as node_md_blocks_synced / node_md_blocks_synced_todo.

Finomosec and others added 2 commits May 9, 2024 11:23
renamed pct to percent

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Frederic <fredleitenb@yahoo.de>
added unit "seconds"

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Frederic <fredleitenb@yahoo.de>
@Finomosec
Copy link
Author

Finomosec commented May 9, 2024

Maybe instead of exposing the sync percent, we should expose the "TODO" blocks value. This way the completion ratio can be correctly calculated as node_md_blocks_synced / node_md_blocks_synced_todo.

That was my first idea, too, but the data-source (https://github.com/prometheus/procfs/blob/master/mdstat.go) does not (yet) capture/expose this value.

Also node_md_blocks_synced_todo is not a good name. todo sound like remaining, which is not correct.
Maybe to_be_synced would suffice.

But hey! We could calculate it using blocks_synced and the percentage.
What do you think, should we do this?

... but it might be imprecise, especially for low percentage values plus it might yield slightly different results over time, which would be kind of akward.

So maybe better not after all.

I added a request to add it:
prometheus/procfs#636

Signed-off-by: Finomosec <github@meinebasis.de>
@Finomosec Finomosec requested a review from SuperQ May 9, 2024 09:39
@discordianfish
Copy link
Member

Yeah I agree, let's add the TODO blocks to procfs

@SuperQ
Copy link
Member

SuperQ commented May 14, 2024

Released updated procfs: https://github.com/prometheus/procfs/releases/tag/v0.15.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants