HDDS-7506. [Snapshot] Expose more snapshot metrics under OMMetrics #4164

Merged
smengcl merged 5 commits into apache:HDDS-6517-Snapshot from xBis7:HDDS-7506 on Jan 20, 2023

@xBis7 xBis7 commented Jan 9, 2023

What changes were proposed in this pull request?

This PR adds more snapshot metrics to OMMetrics to track the number of snapshots in each status.
Snapshot delete and reclaim are not implemented yet, so the corresponding metrics are not updated anywhere in the code.

We iterate over the snapshot table on OM start(), restart(), and reloadOMState() and count the snapshots in each status. We also increment the number of active snapshots on every create request. In the future, we might want to decrement the active count when incrementing the deleted count and, similarly, decrement the deleted count when incrementing the reclaimed count.
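
The bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the code in this patch: the class `SnapshotMetricsSketch`, the `Status` enum and its constant names, and all method names are hypothetical stand-ins, and the snapshot table is modeled as a plain list of statuses rather than the real OM table.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; not the actual OMMetrics class.
final class SnapshotMetricsSketch {

  /** Snapshot statuses mentioned in the description (constant names are assumed). */
  enum Status { ACTIVE, DELETED, RECLAIMED }

  // One gauge per status, plus a counter for create requests.
  private final Map<Status, Long> gauges = new EnumMap<>(Status.class);
  private long numSnapshotCreates;

  SnapshotMetricsSketch() {
    for (Status s : Status.values()) {
      gauges.put(s, 0L);
    }
  }

  /** On start, restart, or state reload: recompute every gauge from the table. */
  void reloadFromTable(List<Status> snapshotTable) {
    Map<Status, Long> counts = new EnumMap<>(Status.class);
    for (Status s : Status.values()) {
      counts.put(s, 0L);
    }
    for (Status s : snapshotTable) {
      counts.merge(s, 1L, Long::sum);
    }
    gauges.putAll(counts);
  }

  /** On a successful create request: bump the counter and the active gauge. */
  void onSnapshotCreated() {
    numSnapshotCreates++;
    gauges.merge(Status.ACTIVE, 1L, Long::sum);
  }

  long get(Status s) {
    return gauges.get(s);
  }

  long getNumSnapshotCreates() {
    return numSnapshotCreates;
  }

  public static void main(String[] args) {
    // A restart is modeled as a fresh instance that reloads from the table.
    SnapshotMetricsSketch metrics = new SnapshotMetricsSketch();
    metrics.reloadFromTable(Arrays.asList(Status.ACTIVE, Status.ACTIVE));
    System.out.println("active=" + metrics.get(Status.ACTIVE)
        + " creates=" + metrics.getNumSnapshotCreates());
  }
}
```

Modeling a restart as a fresh instance followed by `reloadFromTable` matches what the manual test in this description shows: the create counter starts again from 0, while the active gauge is restored from the table.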

I checked the snapshot metrics in HDFS and couldn't find any applicable ones that are missing here.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7506

How was this patch tested?

A new test was added to TestOmMetrics in the integration-test package. We might want to expand this test method in the future as more snapshot features become available.

This patch was also tested manually in a Docker cluster, as follows:

create a snapshot

❯ docker-compose up --scale datanode=3 -d
Creating network "ozone_default" with the default driver
Creating ozone_recon_1    ... done
Creating ozone_om_1       ... done
Creating ozone_s3g_1      ... done
Creating ozone_scm_1      ... done
Creating ozone_datanode_1 ... done
Creating ozone_datanode_2 ... done
Creating ozone_datanode_3 ... done

❯ docker exec -it ozone_om_1 bash
bash-4.2$ ozone sh volume create /vol1
bash-4.2$ ozone sh bucket create /vol1/bucket1 
bash-4.2$ ozone sh key put /vol1/bucket1/key1 README.md
bash-4.2$ ozone sh snapshot create /vol1/bucket1 snap1

on 0.0.0.0:9874/jmx

    "NumSnapshotActive" : 1,
    "NumSnapshotCreateFails" : 0,
    "NumSnapshotCreates" : 1,

create another snapshot

bash-4.2$ ozone sh key put /vol1/bucket1/key2 /etc/hosts
bash-4.2$ ozone sh snapshot create /vol1/bucket1 snap2
bash-4.2$ exit
exit

check 0.0.0.0:9874/jmx again

    "NumSnapshotActive" : 2,
    "NumSnapshotCreateFails" : 0,
    "NumSnapshotCreates" : 2,

restart the OM

❯ docker restart ozone_om_1 
ozone_om_1

check 0.0.0.0:9874/jmx right after restarting om

    "NumSnapshotActive" : 0,
    "NumSnapshotCreateFails" : 0,
    "NumSnapshotCreates" : 0,

after a minute, on 0.0.0.0:9874/jmx

    "NumSnapshotActive" : 2,
    "NumSnapshotCreateFails" : 0,
    "NumSnapshotCreates" : 0,

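For repeated checks like the ones above, the snapshot entries can be filtered out of the /jmx output with a small helper. The sketch below is only an illustration: `SnapshotJmxFilter` is a hypothetical name, and it assumes the entries appear as `"Name" : number` pairs exactly as in the excerpts above. It uses a regular expression as a shortcut rather than a real JSON parser, and it takes the response text as a string instead of fetching it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Ozone.
final class SnapshotJmxFilter {

  // Matches entries such as:  "NumSnapshotActive" : 2,
  private static final Pattern METRIC =
      Pattern.compile("\"(NumSnapshot\\w+)\"\\s*:\\s*(\\d+)");

  /** Returns the NumSnapshot* entries found in the given text, in order of appearance. */
  static Map<String, Long> extract(String jmxText) {
    Map<String, Long> out = new LinkedHashMap<>();
    Matcher m = METRIC.matcher(jmxText);
    while (m.find()) {
      out.put(m.group(1), Long.parseLong(m.group(2)));
    }
    return out;
  }

  public static void main(String[] args) {
    // Sample text copied from the output shown after the OM restart.
    String sample = "    \"NumSnapshotActive\" : 2,\n"
        + "    \"NumSnapshotCreateFails\" : 0,\n"
        + "    \"NumSnapshotCreates\" : 0,\n";
    System.out.println(extract(sample));
  }
}
```
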
xBis7 commented Jan 9, 2023

@GeorgeJahad @smengcl Can you take a look at this PR? I'm not that familiar with snapshot code, so I don't know if there is something I missed or another metric you would like me to add. If there is nothing else to be added here, I can convert this to an actual PR. BTW, I have a green workflow build on my fork.

@xBis7 xBis7 changed the title from "HDDS-7506. Expose more snapshot metrics under OMMetrics" to "HDDS-7506. [Snapshot] Expose more snapshot metrics under OMMetrics" on Jan 9, 2023
@smengcl smengcl marked this pull request as ready for review January 12, 2023 17:39

smengcl commented Jan 12, 2023

@xBis7 Would you rebase this patch as well? Somehow the CI build is failing, even after I retriggered the whole CI job run:

https://github.com/apache/ozone/actions/runs/3904654043/jobs/6672248017

Error:  COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
Error:  /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOmMetrics.java:[394,33] method createBucketInfo in class org.apache.hadoop.ozone.om.TestOmMetrics cannot be applied to given types;

Just run git remote update apache && git merge apache/HDDS-6517-Snapshot on your PR branch, then push.

xBis7 commented Jan 12, 2023

@smengcl There was a conflict in TestOmMetrics. I rebased and resolved it.

@smengcl smengcl added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jan 18, 2023

xBis7 commented Jan 19, 2023

@GeorgeJahad Thanks for looking into this. I've addressed your comments.

@prashantpogde prashantpogde self-assigned this Jan 19, 2023
@smengcl smengcl merged commit 01762a3 into apache:HDDS-6517-Snapshot Jan 20, 2023

smengcl commented Jan 20, 2023

Thanks @xBis7 for the metrics addition. Thanks @GeorgeJahad for the review.

4 participants