
healthchecks: Reduce Network Check request max time #1452

Merged


franciscovalentecastro (Contributor) commented Oct 2, 2023

Description

Recent executions of TestNetworkHealthCheck show that the Startup Checks take 7 to 8 minutes to complete when a firewall blocks egress connections. This is because each of the 5 requests in the Network Check can take up to 1 minute (with the exponential backoff added in #1376), and the API Check now (since #1433) retries once, 6 seconds after a failure.

This PR reduces the maximum time per request in the Network Check to 30 seconds.

Sample log showing the roughly 7 minute gap (16:03:06 to 16:10:26) before the first Startup Check result appears:

Oct 02 16:03:06 github-test-20231002-cb925-c9115a6e-d2a9-4b5f-9628-d11bab79b502 google_cloud_ops_agent_engine[1929]:         processors: [metrics_filter]
Oct 02 16:10:26 github-test-20231002-cb925-c9115a6e-d2a9-4b5f-9628-d11bab79b502 google_cloud_ops_agent_engine[1929]: 2023/10/02 16:10:26 [Ports Check] Result: PASS
Oct 02 16:10:26 github-test-20231002-cb925-c9115a6e-d2a9-4b5f-9628-d11bab79b502 google_cloud_ops_agent_engine[1929]: 2023/10/02 16:10:26 [Network Check] Result: FAIL, Error code: LogApiConnErr, Failure: Request to Logging API failed., Solution: Check your internet connection and firewall rules., Resource: https://cloud.google.com/logging/docs/agent/ops-agent/troubleshooting

Related issue

How has this been tested?

By measuring the resulting run time of TestNetworkHealthCheck.

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

franciscovalentecastro (Contributor Author) commented Oct 2, 2023

With this change, TestNetworkHealthCheck takes about 5 minutes:

Oct 02 17:47:07 github-test-20231002-672a8-b98af40d-444f-4851-885b-d82d6c6ffa38 google_cloud_ops_agent_engine[1943]:         processors: [metrics_filter]
Oct 02 17:52:27 github-test-20231002-672a8-b98af40d-444f-4851-885b-d82d6c6ffa38 google_cloud_ops_agent_engine[1943]: 2023/10/02 17:52:27 [Ports Check] Result: PASS
Oct 02 17:52:27 github-test-20231002-672a8-b98af40d-444f-4851-885b-d82d6c6ffa38 google_cloud_ops_agent_engine[1943]: 2023/10/02 17:52:27 [Network Check] Result: FAIL, Error code: LogApiConnErr, Failure: Request to Logging API failed., Solution: Check your internet connection and firewall rules., Resource: https://cloud.google.com/logging/docs/agent/ops-agent/troubleshooting

@franciscovalentecastro franciscovalentecastro merged commit 36d8890 into master Oct 2, 2023
60 of 62 checks passed
@franciscovalentecastro franciscovalentecastro deleted the fcovalente-reduce-exp-backoff-max-time branch October 2, 2023 20:06