[RFE] azure-init should add retries around IMDS and Wireserver operations #63

Closed
anhvoms opened this issue Mar 19, 2024 · 3 comments · Fixed by #107 or #119
Labels
feature New feature or request

Comments

@anhvoms
Contributor

anhvoms commented Mar 19, 2024

Current situation

There is no retry when REST API calls to IMDS or Wireserver (goal_state, report_health) fail.

Impact

Without retries, a transient issue on the Azure platform causes provisioning to fail.

Additional information

When to retry, how many times, and for how long is a complex topic, especially since IMDS and Wireserver do not provide any guidance. Below is the current behavior of cloud-init (ref, ref), which we can use as a reference. Alternatively, we could expose this as a setting that can be configured within the image, e.g., /etc/azure-init/azure-init.conf.

Total retry time should be no more than 5 minutes for IMDS and no more than 20 minutes for Wireserver.
Retry on connection timeout/read timeout: the timeout for each REST call should be set to 30s.
Retry on non-200 HTTP status codes (410, 404, 503, 400, 500, 429): the timeout should be set to 2s, with a backoff of 1s.
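
To make the proposal above concrete, here is a minimal sketch of a deadline-bounded retry loop, using only the Rust standard library. The names `RetryPolicy`, `Attempt`, `RetryError`, `is_retryable_status`, and `with_retries` are hypothetical and are not taken from azure-init or cloud-init. The sketch assumes the caller's closure performs the HTTP request and applies the per-request timeout itself. The 2s timeout listed for HTTP error retries is left out, since the list does not say what it bounds.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical retry budget for one endpoint.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    /// Upper bound on the total time spent retrying.
    pub total_budget: Duration,
    /// Connection/read timeout handed to each request.
    pub request_timeout: Duration,
    /// Pause between two attempts.
    pub backoff: Duration,
}

impl RetryPolicy {
    /// IMDS: no more than 5 minutes in total.
    pub const fn imds() -> Self {
        Self {
            total_budget: Duration::from_secs(5 * 60),
            request_timeout: Duration::from_secs(30),
            backoff: Duration::from_secs(1),
        }
    }

    /// Wireserver: no more than 20 minutes in total.
    pub const fn wireserver() -> Self {
        Self {
            total_budget: Duration::from_secs(20 * 60),
            request_timeout: Duration::from_secs(30),
            backoff: Duration::from_secs(1),
        }
    }
}

/// Outcome of a single request, as reported by the caller's closure.
#[derive(Debug, PartialEq)]
pub enum Attempt<T> {
    Ok(T),
    TimedOut,
    Status(u16),
}

#[derive(Debug, PartialEq)]
pub enum RetryError {
    /// A status code outside the retryable list was returned.
    FatalStatus(u16),
    /// The total budget ran out before a request succeeded.
    BudgetExhausted,
}

/// Status codes listed in this issue as worth retrying.
pub fn is_retryable_status(code: u16) -> bool {
    matches!(code, 400 | 404 | 410 | 429 | 500 | 503)
}

/// Calls `request` until it succeeds, fails with a non-retryable status,
/// or the next backoff would pass the deadline.
pub fn with_retries<T>(
    policy: RetryPolicy,
    mut request: impl FnMut(Duration) -> Attempt<T>,
) -> Result<T, RetryError> {
    let deadline = Instant::now() + policy.total_budget;
    loop {
        match request(policy.request_timeout) {
            Attempt::Ok(value) => return Ok(value),
            Attempt::Status(code) if !is_retryable_status(code) => {
                return Err(RetryError::FatalStatus(code));
            }
            // Timeouts and retryable statuses fall through to the backoff.
            Attempt::TimedOut | Attempt::Status(_) => {}
        }
        if Instant::now() + policy.backoff >= deadline {
            return Err(RetryError::BudgetExhausted);
        }
        sleep(policy.backoff);
    }
}
```

A caller would wrap each IMDS request in `with_retries(RetryPolicy::imds(), ...)` and each goal_state or report_health call in `with_retries(RetryPolicy::wireserver(), ...)`. Keeping the values in one struct would also make it straightforward to load them from a config file later, if that route is chosen.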

@anhvoms anhvoms added the feature New feature or request label Mar 19, 2024
@jeremycline
Member

One thing we might want to consider is replacing the hand-written IMDS client with https://crates.io/crates/azure_svc_imds from https://github.com/Azure/azure-sdk-for-rust/ which, based on the documentation, already implements retries.

@anhvoms
Contributor Author

anhvoms commented Mar 20, 2024

> One thing we might want to consider is replacing the hand-written IMDS client with https://crates.io/crates/azure_svc_imds from https://github.com/Azure/azure-sdk-for-rust/ which, based on the documentation, already implements retries.

Oh, this is nice. I wasn't aware of this crate. We should check it out and see whether the retry policy is easy to customize to our needs.

@anhvoms
Contributor Author

anhvoms commented Aug 21, 2024

This one should not be closed yet, as we have only added retries to IMDS and not to Wireserver. Communication with Wireserver similarly needs to be retried upon failure (goalstate.rs).
