HDDS-14960. OM Web UI dashboard for Ozone Snapshot #10027

Open
jojochuang wants to merge 11 commits into apache:master from
jojochuang:snapshotui

@jojochuang jojochuang commented Apr 2, 2026

What changes were proposed in this pull request?

HDDS-14960. OM Web UI dashboard for Ozone Snapshot

Please describe your PR in detail:

  • Create an Ozone Snapshot dashboard tab in the OM Web UI that displays:
  1. Usage statistics: number of active and deleted snapshots, and size of the current snapshot cache.
  2. List of snapshot diff jobs: status text, progress, creation time; search and sort capabilities.
  3. Snapshot internal metrics from OmSnapshotInternalMetrics.

The implementation includes:

  1. Frontend code to display usage statistics, snapshot diff jobs, and internal metrics, plus code to support the search and sort capabilities.
  2. Backend code: SnapshotDiffManager implements SnapshotDiffManagerMXBean and registers it with the runtime.
  3. Test code to ensure the MXBean is registered correctly and that its values are as expected.

SnapshotDiffJob getCodec() is renamed to codec() to avoid clashing with JMX MXBean conventions.
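For readers unfamiliar with MXBeans, the backend step and the rename above can be illustrated with a minimal, self-contained sketch. Every name in it (`SnapshotDiffMXBeanSketch`, `DiffJob`, `DiffJobsMXBean`, the `sketch.example:type=DiffJobs` object name) is hypothetical and is not taken from the Ozone code. It assumes the standard JMX behavior that getter-style methods on a class returned by an MXBean become items of the resulting `CompositeData`, which is one plausible reason a method named `getCodec()` would clash; the PR itself does not spell out the exact failure.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical names throughout; this is not the Ozone implementation.
class SnapshotDiffMXBeanSketch {

  /** Value type returned by the MXBean; its getX() methods become CompositeData items. */
  public static class DiffJob {
    private final String jobId;
    private final String status;

    public DiffJob(String jobId, String status) {
      this.jobId = jobId;
      this.status = status;
    }

    public String getJobId() { return jobId; }
    public String getStatus() { return status; }

    // Not getter-shaped, so the MXBean mapping does not expose it as an item.
    public String codec() { return "not-exposed"; }
  }

  /** The interface must be public and its name must end in "MXBean". */
  public interface DiffJobsMXBean {
    int getJobCount();
    List<DiffJob> getJobs();
  }

  public static class DiffJobs implements DiffJobsMXBean {
    private final List<DiffJob> jobs = Arrays.asList(
        new DiffJob("job-1", "DONE"), new DiffJob("job-2", "IN_PROGRESS"));

    @Override public int getJobCount() { return jobs.size(); }
    @Override public List<DiffJob> getJobs() { return jobs; }
  }

  /** Registers the bean, reads the first job back through JMX, then unregisters. */
  static CompositeData registerAndReadFirstJob() {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      ObjectName name = new ObjectName("sketch.example:type=DiffJobs"); // hypothetical
      server.registerMBean(new DiffJobs(), name);
      try {
        // A List<DiffJob> attribute is surfaced to JMX clients as CompositeData[].
        CompositeData[] jobs = (CompositeData[]) server.getAttribute(name, "Jobs");
        return jobs[0];
      } finally {
        server.unregisterMBean(name);
      }
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch the returned `CompositeData` carries the keys `jobId` and `status` but nothing for `codec()`, because only getter-shaped methods are mapped.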

Gemini CLI model gemini-3-flash-preview

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14960

How was this patch tested?

Screenshot 2026-04-01 at 5 53 56 PM

Unit tests, plus manual inspection of the frontend web UI after executing these commands (screenshot attached):

```
ozone sh volume create vol1
ozone sh bucket create vol1/bucket1
ozone sh snapshot create vol1/bucket1 snap1
ozone sh key put vol1/bucket1/key1 /etc/krb5.conf
ozone sh snapshot create vol1/bucket1 snap2
ozone sh snapshot diff vol1/bucket1 snap1 snap2
ozone sh snapshot lsDiff vol1/bucket1 -a
ozone sh key delete vol1/bucket1/key1
ozone sh snapshot create vol1/bucket1 snap3
ozone sh snapshot diff vol1/bucket1 snap2 snap3
ozone sh snapshot diff vol1/bucket1 snap1 snap3
ozone sh snapshot lsDiff vol1/bucket1 -a
```

@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 2, 2026
@jojochuang jojochuang requested review from sadanand48 and smengcl April 2, 2026 01:24
@jojochuang jojochuang marked this pull request as ready for review April 2, 2026 22:32

@sadanand48 sadanand48 left a comment

Thanks @jojochuang for the patch. One more useful metric is the time taken to compute the diff. This is not currently part of the SnapDiffJob class; we can add it in a separate Jira as well.

<tbody>
<tr>
<td>Number of Active Snapshots</td>
<td>{{$ctrl.snapshotUsageMetrics.NumSnapshotActive}}</td>

These metrics capture the number of active/deleted snapshots created since the last OM restart. I think we should indicate that accordingly.

errose28 commented Apr 6, 2026

Hi @jojochuang, I don't think we should duplicate metrics on the web UIs. The page will become overloaded with information that Grafana can display better. I suggest making a Grafana dashboard for snapdiff (which AI can also do easily) that contains usage statistics, snapshot internal metrics, and any other relevant metrics.

The list of snapdiff jobs is categorical information, so it makes sense to put that in the Web UI for now, since Recon is not pulling this information.

@jojochuang

> don't think we should duplicate metrics on the web UIs. The page will become overloaded with too much information which Grafana can display better. I suggest making a Grafana dashboard for snapdiff (which AI can also do easily) which contains Usage statistics, Snapshot internal metrics, and any other relevant metrics.
>
> The list of snapdiff jobs is categorical information so that makes sense to put in the Web UI for now, since Recon is not pulling this information.

I'll keep the list of snapshot diff jobs. We don't yet have Grafana dashboards tracking the Snapshot internal metrics, which is why I put them here initially. That being said, these internal metrics aren't that self-explanatory. I would like to make a page that integrates these metrics and presents them in a way that helps administrators understand them, because the internals of Snapshots are asynchronous.

This also reminded me that there are a few snapshot-related metrics within DeletingServiceMetrics that should be included.

Change-Id: Iea21514269d37a233c816a2cc55be44fe788382a
Change-Id: I93ba5bf171938338f00af6e1f07f27b9cee2bfe2
Change-Id: I5b5a2dd86eb12555e74146bf63f461642ceba058
Change-Id: Ibe1e0c84ec68936f7aa8837d9aea969b2bcc5701
Change-Id: I89b3585bbd5e34364a8a4567f81de485b737d69e
Change-Id: I96660319acfc3b135bd58e592e9d3ff8ebd4a05c
Change-Id: I7e742e806d493ecca8b543f6ad19266b10247cd7
Change-Id: I9f5b5040fe8835c7cc7aae98732368b12a053961
Change-Id: I4a2c082ddacd1868934179b4ec47beb64e046417
Change-Id: If87353d4cdc540b271c183f339dd704e42cb8b46
….html

Change-Id: Ib3521a90ebcdfd3e162fe99c8e2f48f1a4a09e80
@jojochuang

Updated. This is the current screenshot.
Screenshot 2026-04-06 at 1 08 00 PM

@jojochuang

https://issues.apache.org/jira/browse/HDDS-13181 added the OM Snapshot internal metrics to Grafana.

It doesn't include numSnapshotSetProperties or numSnapshotSetPropertyFails, nor any defragmentation metrics.

smengcl commented Apr 9, 2026

Is this a subset of #10055? Or should this be merged first, before that one? It looks to be the latter.

smengcl commented Apr 9, 2026

On a side note, the search box does not quite work as one would expect: it filters only the currently displayed entries, not the entire set. At least this is the case in the SCM Web UI Node Status section, as I tried earlier. Filed HDDS-15007.
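The difference described above comes down to the order of filtering and paging. The sketch below is a hypothetical illustration in plain Java, not code from the Ozone or SCM web UI; the class and method names are made up for this example. It contrasts a search applied to the rows of the current page with a search applied to the whole list before paging.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not code from the Ozone web UI.
class SearchPaginationSketch {

  /** Page first, then filter: the search sees only the rows of the current page. */
  static List<String> filterDisplayedPage(List<String> rows, String query,
                                          int page, int pageSize) {
    return rows.stream()
        .skip((long) page * pageSize)
        .limit(pageSize)
        .filter(row -> row.contains(query))
        .collect(Collectors.toList());
  }

  /** Filter first, then page: the search covers the entire set. */
  static List<String> filterEntireSet(List<String> rows, String query,
                                      int page, int pageSize) {
    return rows.stream()
        .filter(row -> row.contains(query))
        .skip((long) page * pageSize)
        .limit(pageSize)
        .collect(Collectors.toList());
  }
}
```

With rows `snap1-snap2`, `snap2-snap3`, `snap1-snap3`, a page size of 1, and the query `snap3`, the first method returns an empty first page, while the second returns `snap2-snap3`.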

@jojochuang

> Is this a subset of #10055 ? Or should this be merged first before that one. Looks to be the latter

Yes, I want to implement these in steps.

Labels

AI-gen snapshot https://issues.apache.org/jira/browse/HDDS-6517
