Current situation
There is no retry when REST API calls to IMDS or wireserver (goal_state, report_health) fail.
Impact
Without retries, a transient issue on the Azure platform causes provisioning to fail.
Additional information
When to retry and how many times/how long to retry is a complex topic, especially when IMDS/Wireserver does not provide any guidance. This is the current behavior from cloud-init (ref, ref), which we can use as a reference (or perhaps we can provide this as a config that can be configured within the image? e.g., /etc/azure-init/azure-init.conf)
Total retry time should be no more than 5 minutes for IMDS and no more than 20 minutes for Wireserver.
Retry on connection timeout/read timeout: the timeout for each REST call should be set to 30s.
Retry on non-200 HTTP status codes (410, 404, 503, 400, 500, 429): the timeout should be set to 2s, with a backoff of 1s.
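To make the proposal concrete, here is a minimal sketch of such a retry loop using only the Rust standard library. It is not the azure-init implementation: `Attempt`, `RetryPolicy`, and `retry_with_budget` are hypothetical names invented for illustration, the request and the sleep are passed in as closures so the loop can be exercised without a network, and the 30s/2s per-request timeouts are assumed to be configured on whatever HTTP client the `send` closure wraps.

```rust
use std::time::{Duration, Instant};

/// Outcome of one request attempt (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Attempt {
    /// The server answered with this HTTP status code.
    Status(u16),
    /// The connection or the read timed out.
    Timeout,
}

/// Non-200 status codes the issue lists as worth retrying.
const RETRYABLE: [u16; 6] = [410, 404, 503, 400, 500, 429];

/// Retry limits for one endpoint (hypothetical type for this sketch).
pub struct RetryPolicy {
    /// Upper bound on the total time spent retrying.
    pub total_budget: Duration,
    /// Pause between two consecutive attempts.
    pub backoff: Duration,
}

impl RetryPolicy {
    /// 5 minute budget proposed for IMDS, 1s backoff.
    pub fn imds() -> Self {
        RetryPolicy { total_budget: Duration::from_secs(5 * 60), backoff: Duration::from_secs(1) }
    }

    /// 20 minute budget proposed for wireserver, 1s backoff.
    pub fn wireserver() -> Self {
        RetryPolicy { total_budget: Duration::from_secs(20 * 60), backoff: Duration::from_secs(1) }
    }
}

/// Calls `send` until it returns 200, returns a status that is not in
/// `RETRYABLE`, or the total budget would be exceeded by another backoff.
/// On failure, the last observed outcome is returned as the error.
pub fn retry_with_budget<F, S>(policy: &RetryPolicy, mut send: F, mut sleep: S) -> Result<u16, Attempt>
where
    F: FnMut() -> Attempt,
    S: FnMut(Duration),
{
    let start = Instant::now();
    loop {
        let outcome = send();
        match outcome {
            Attempt::Status(200) => return Ok(200),
            // Statuses outside the retryable list fail immediately.
            Attempt::Status(code) if !RETRYABLE.contains(&code) => return Err(outcome),
            // Timeouts and retryable statuses fall through to the backoff.
            _ => {}
        }
        // Give up if waiting once more would exceed the total budget.
        if start.elapsed() + policy.backoff >= policy.total_budget {
            return Err(outcome);
        }
        sleep(policy.backoff);
    }
}
```

In real use, `sleep` would be `std::thread::sleep` (or an async equivalent) and `send` would issue the IMDS or wireserver request and map the result to `Attempt`. Keeping the limits in a `RetryPolicy` value also leaves room for loading them from a config file, as suggested above.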
This one should not be closed yet, as we have only added retries for IMDS and not for wireserver. Communication with wireserver similarly needs to be retried upon failure (goalstate.rs).