Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-3336. Metrics for Recon OzoneManager DB sync. #761

Closed
wants to merge 10 commits into from

Conversation

avijayanhwx
Copy link
Contributor

What changes were proposed in this pull request?

Added some useful metrics for tracking how Recon's OM DB requests are going on.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3336

How was this patch tested?

Manually tested on top of #715.
Added unit tests.

@avijayanhwx avijayanhwx changed the title Hdds 3336 master HDDS-3336. Metrics for Recon OzoneManager DB sync. Apr 3, 2020
@avijayanhwx
Copy link
Contributor Author

cc @vivekratnavel / @swagle

Copy link
Contributor

@swagle swagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, overall LGTM.

@avijayanhwx
Copy link
Contributor Author

Thanks for the review @swagle.

Copy link
Contributor

@vivekratnavel vivekratnavel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

@elek
Copy link
Member

elek commented Apr 7, 2020

Thanks the patch @avijayanhwx

FYI: HDDS-3239 introduced message level metrics for all the RPC endpoints. I think some of the metrics can be removed from this patch (eg. generic counter / latency)

@avijayanhwx
Copy link
Contributor Author

Thanks the patch @avijayanhwx

FYI: HDDS-3239 introduced message level metrics for all the RPC endpoints. I think some of the metrics can be removed from this patch (eg. generic counter / latency)

Thank you for the information Marton! I am wondering if I should still keep the counter and time to keep it specific per recon client instance. In the future, more clients may use the same request with different input arguments, and the metrics will be averaged out across all clients.

@elek
Copy link
Member

elek commented Apr 8, 2020

Thank you for the information Marton! I am wondering if I should still keep the counter and time to keep it specific per recon client instance. In the future, more clients may use the same request with different input arguments, and the metrics will be averaged out across all clients.

I am not sure if it will ever happen related to the OM db download RPC, but how knows....

But if you need a client side metrics: what about adding a ProtocolMessageMetrics to the OzoneManagerProtocolClientSideTranslatorPB (with some client specific tag, if needed). In that case you can have well-defined, unified metrics for all the OM calls on the recon side (without increasing the complexity anywhere else in the Recon code).

@avijayanhwx
Copy link
Contributor Author

Thank you for the information Marton! I am wondering if I should still keep the counter and time to keep it specific per recon client instance. In the future, more clients may use the same request with different input arguments, and the metrics will be averaged out across all clients.

I am not sure if it will ever happen related to the OM db download RPC, but how knows....

But if you need a client side metrics: what about adding a ProtocolMessageMetrics to the OzoneManagerProtocolClientSideTranslatorPB (with some client specific tag, if needed). In that case you can have well-defined, unified metrics for all the OM calls on the recon side (without increasing the complexity anywhere else in the Recon code).

@elek I think that is a good idea. I have removed the generic counter and timer metric for the RPC call, and I will do the client specific protocol message metrics in a future JIRA (since there is no other client who is going to use the getDBUpdates API in the near future).

@vivekratnavel
Copy link
Contributor

@elek Please review

Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elek I think that is a good idea. I have removed the generic counter...

Thanks the answer. My feeling is that we can remove even more counters from these in the future, when we will have some generic metrics on the client side (especially with flexible tag support).

But I am fine even with the original patch. I am more interested about having a consensus about the middle-term plans and have a plan how it can be achieved.

I assume you need the numbers right now, and the proposed generic solution is not yet there...

I am merging it now, thanks the contribution.

@elek elek closed this in f58cae9 Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants