
healthchecks: Retry monitoringPing on InvalidArgument error #1415

Merged
merged 2 commits into from
Sep 15, 2023


@franciscovalentecastro (Contributor) commented Sep 13, 2023

Description

The API Check in the healthchecks package returns an InvalidArgument error when the Ops Agent is restarted in quick succession:

[API Check] Result: ERROR, Detail: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.

This happens because data can be written to a single time series at a maximum rate of "one point each 5 seconds". This PR calls monitoringPing up to 2 times (one retry) with a 6-second backoff between attempts, so the retried write does not exceed that rate.

Related issue

b/291631906

How has this been tested?

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

@franciscovalentecastro requested review from igorpeshansky (and removed the request for a team) on September 15, 2023 15:50
func monitoringPing(ctx context.Context, client monitoring.MetricClient, gceMetadata resourcedetector.GCEResource) error {
	// ... (body not shown in this review excerpt)
}

func isInvalidArgumentErr(err error) bool {
	apiErr, ok := err.(*apierror.APIError)
	return ok && apiErr.GRPCStatus().Code() == codes.InvalidArgument
}
Review comment (Contributor):

Do we want to consider matching the error description as well as the status code?

Reply (Contributor, Author):

InvalidArgument can also be returned for a malformed metric write request, but the request that monitoringPing sends is constructed to be well-formed. Still, there may be value in checking for the specific "One or more points were written more frequently ..." message.

APIError has a Reason() method that could be used to match the text, but I'm unsure that would remain maintainable if the message changes in the future. We also retry only once, so in an edge case this won't extend the check for long.

@franciscovalentecastro merged commit 71fc731 into master on Sep 15, 2023
51 of 56 checks passed
@franciscovalentecastro deleted the fcovalente-retry-monitoring-ping branch on September 15, 2023 19:00

2 participants