mds: add new performance and subvolume utilization metrics#66551

Merged
vshankar merged 6 commits intoceph:mainfrom
salieri11:wip-igolikov-new-metrics-74135-73700
Mar 3, 2026

@salieri11 salieri11 commented Dec 8, 2025

Summary

This PR adds new observability metrics for CephFS MDS:

  1. MDS Rank Performance Metrics - Per-rank CPU utilization and open requests count
  2. Subvolume Utilization Metrics - Quota limit and current space usage for subvolumes

Fixes: https://tracker.ceph.com/issues/73700
Fixes: https://tracker.ceph.com/issues/74135

Changes

MDS Rank Performance Metrics

New per-rank counters exposed via mds_rank_perf labeled perf counters:

| Metric | Description |
|---|---|
| cpu_usage | Sum of per-core CPU utilization (100 = one fully saturated core; can exceed 100 on multi-core) |
| open_requests | Number of metadata requests currently in flight |

These metrics are sampled periodically and aggregated by rank 0 for cluster-wide visibility.
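
To make the cpu_usage scale concrete, here is a minimal Python sketch of how a value on that scale could be derived from two samples of a process's cumulative CPU time. The function name, its parameters, and the sampling approach are illustrative assumptions; they are not taken from the MDS code in this PR.

```python
def cpu_usage_percent(cpu_seconds_before: float,
                      cpu_seconds_after: float,
                      wall_seconds_elapsed: float) -> float:
    """Return CPU usage where 100.0 means one fully saturated core.

    Hypothetical helper: the inputs are cumulative CPU time (user + system,
    summed over all threads) at two sample points, plus the wall-clock time
    between the samples.
    """
    if wall_seconds_elapsed <= 0:
        raise ValueError("wall_seconds_elapsed must be positive")
    # Clamp to zero so a counter reset never yields a negative reading.
    cpu_delta = max(0.0, cpu_seconds_after - cpu_seconds_before)
    return 100.0 * cpu_delta / wall_seconds_elapsed


# 2.5 s of CPU time consumed during a 1 s window means 2.5 cores were busy.
print(cpu_usage_percent(10.0, 12.5, 1.0))  # 250.0
```

A reading above 100 therefore indicates that more than one core was busy during the sampling window.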

Subvolume Utilization Metrics

New fields added to subvolume metrics:

| Metric | Description |
|---|---|
| quota_bytes | Configured quota limit in bytes (0 if no quota/unlimited) |
| used_bytes | Current space usage based on recursive inode statistics (rstat.rbytes) |
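
As an illustration of how a consumer might combine the two fields, the sketch below computes a utilization fraction and treats quota_bytes == 0 as "no quota", as described above. The function name and the choice to return None for unlimited subvolumes are assumptions made for this example only.

```python
from typing import Optional


def subvolume_utilization(quota_bytes: int, used_bytes: int) -> Optional[float]:
    """Return the fraction of the quota in use, or None when no quota is set.

    Hypothetical helper: quota_bytes == 0 means no quota/unlimited, so a
    ratio is undefined and None is returned instead of dividing by zero.
    """
    if quota_bytes == 0:
        return None
    return used_bytes / quota_bytes


print(subvolume_utilization(1000, 250))  # 0.25
print(subvolume_utilization(0, 250))     # None
```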

Key implementation details:

  • Quota info is cached in MetricsHandler::subvolume_quota map, updated via MDCache::broadcast_quota_to_client
  • Cache entries are evicted after 2 × subv_metrics_window_interval of inactivity to prevent unbounded memory growth
  • used_bytes is sourced from cached broadcast values with fallback to dynamic rstat fetch
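
The caching behavior in the bullets above can be sketched roughly as follows. This is a simplified Python model, not the C++ code in MetricsHandler; the class, method, and parameter names (SubvolumeQuotaCache, on_quota_broadcast, fetch_rstat, and so on) are hypothetical and exist only to show the eviction rule and the fallback path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QuotaEntry:
    quota_bytes: int
    used_bytes: Optional[int]  # None if no broadcast value is cached
    last_activity: float       # timestamp of the most recent update


class SubvolumeQuotaCache:
    """Hypothetical model of a per-subvolume quota cache."""

    def __init__(self, window_interval: float):
        # Entries idle for longer than 2 x the window interval are evicted.
        self.evict_after = 2 * window_interval
        self.entries: Dict[str, QuotaEntry] = {}

    def on_quota_broadcast(self, subvol: str, quota_bytes: int,
                           used_bytes: Optional[int], now: float) -> None:
        # A quota broadcast refreshes the cached values and the activity time.
        self.entries[subvol] = QuotaEntry(quota_bytes, used_bytes, now)

    def used_bytes(self, subvol: str, fetch_rstat: Callable[[str], int]) -> int:
        # Prefer the cached broadcast value; otherwise fetch dynamically.
        entry = self.entries.get(subvol)
        if entry is not None and entry.used_bytes is not None:
            return entry.used_bytes
        return fetch_rstat(subvol)

    def evict_inactive(self, now: float) -> List[str]:
        # Drop entries that have been idle for too long; return their keys.
        stale = [key for key, entry in self.entries.items()
                 if now - entry.last_activity > self.evict_after]
        for key in stale:
            del self.entries[key]
        return stale
```

In this model, a cache built with window_interval=30.0 keeps an entry last updated at time 0.0 through time 60.0 and evicts it on the first sweep after that; later lookups for that subvolume fall back to the fetch_rstat callback.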

Important behavioral note:

  • Only data I/O (reads/writes to file contents) triggers metric updates
  • Metadata operations (mkdir, rmdir, unlink, rename, chmod, etc.) do NOT generate I/O metrics
  • After file deletions, used_bytes updates on next data I/O or quota broadcast

@salieri11 salieri11 marked this pull request as ready for review December 8, 2025 13:00
@salieri11 salieri11 requested review from a team as code owners December 8, 2025 13:00
@salieri11
Contributor Author

jenkins test make check

@salieri11
Contributor Author

jenkins test make check arm64

@salieri11 salieri11 force-pushed the wip-igolikov-new-metrics-74135-73700 branch from a298bcc to 0e296e4 Compare December 9, 2025 09:34
@sumabai sumabai added the wip-suma-testing upstream new feature testing label Dec 16, 2025
@salieri11 salieri11 force-pushed the wip-igolikov-new-metrics-74135-73700 branch from 0e296e4 to a589dcf Compare December 17, 2025 13:33
@salieri11
Contributor Author

jenkins test windows

@salieri11
Contributor Author

jenkins test windows

@salieri11
Contributor Author

jenkins test api

@salieri11 salieri11 force-pushed the wip-igolikov-new-metrics-74135-73700 branch from a589dcf to 9e64628 Compare December 30, 2025 17:46
@salieri11 salieri11 force-pushed the wip-igolikov-new-metrics-74135-73700 branch 2 times, most recently from 1589259 to 6ec5eb9 Compare January 14, 2026 10:57
@salieri11
Contributor Author

jenkins test api

@cloudbehl cloudbehl mentioned this pull request Feb 9, 2026
14 tasks
Contributor

@vshankar vshankar left a comment

Dropping some more comments. I'm still reviewing this closely and should be done by tomorrow.

Contributor

@vshankar vshankar left a comment

Minor nits. Otherwise LGTM and would be ready to run through QA tests once the nits are addressed. Nice work @salieri11

@vshankar
Contributor

jenkins retest this please

@vshankar
Contributor

jenkins test make check

@anthonyeleven
Contributor

Nice. Can we expect a backport to Tentacle or Squid?

@vshankar
Contributor

> Nice. Can we expect a backport to Tentacle or Squid?

Sorry! This is based on a biggish feature that will be available for Umbrella.

@vshankar
Contributor

This PR is under test in https://tracker.ceph.com/issues/75073.

Igor Golikov and others added 6 commits February 24, 2026 20:28
Perf metrics: CPU% and number of open requests
Subvolume utilization metrics: quota info and current size

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
Fixes: https://tracker.ceph.com/issues/73700
Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
test for CPU utilization and number of open requests

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/73700
Add comprehensive tests to validate correct quota and current size
metrics for subvolumes

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
docs for subvolume utilization and MDS perf metrics

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
Fixes: https://tracker.ceph.com/issues/73700
…ifecycle()

Signed-off-by: Venky Shankar <vshankar@redhat.com>
vshankar added a commit to vshankar/ceph that referenced this pull request Feb 25, 2026
@vshankar vshankar force-pushed the wip-igolikov-new-metrics-74135-73700 branch from a09dbb5 to 8b9eddf Compare February 25, 2026 07:06
@vshankar
Contributor

Added a fix to use safe_while to avoid a racy check, and squashed the commits.

@vshankar
Contributor

jenkins test make check arm64

@vshankar
Contributor

jenkins test make check

@vshankar vshankar force-pushed the wip-igolikov-new-metrics-74135-73700 branch from 8b9eddf to 2850b3f Compare February 25, 2026 12:28
@vshankar
Contributor

force pushed -- no changes

Contributor

@vshankar vshankar left a comment

@vshankar vshankar merged commit d962d5e into ceph:main Mar 3, 2026
13 checks passed
@github-actions

github-actions bot commented Mar 3, 2026

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/22619802892
