New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataSourceOpenStack: retry also when first waiting #1529
Conversation
There's a first step to wait and filter out non responsive metadata services, and the `retries` option is not taken into account there. This patch adds the retrying logic to that first step too. LP: #1979049 Signed-off-by: David Caro <me@dcaro.es> Signed-off-by: David Caro <dcaro@wikimedia.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! This would help with overloaded metadata servers and could also help if we're rotating through a set of servers some of which are unresponsive.
Depends on #1527 |
@david-caro , We should already be retrying until we hit the On some level, I think that if the datasource isn't ready within 10 seconds, that's really an issue that should be fixed on the datasource side, but I understand why longer timeouts can help. Historically, we've been hesitant to increase the default timeouts in openstack because on certain architectures that could lead to longer timeouts for non-openstack datasources. See #501 for context. That said, those concerns aren't as relevant anymore as most other datasources do get reported correctly during early boot. Can the issues you're seeing be fixed by updating the configuration for your particular openstack installation? Configuration can be set in /etc/cloud/cloud.cfg, /etc/cloud/cloud.cfg.d, or as vendordata to all of your instances. |
Hmm, I think I understand the logic now, tell me if I'm right: So when doing the If so, I can try tweaking the configuration yes,
I agree with that, but we are trying to minimize the impact until we fix it properly.
|
Yes, but note that timeout is only relevant if the server is blocking on returning a response. If the request returns with error immediately then the |
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
Oh yes, changing the config seemed to work, thanks! I'll close the pr/issue |
Proposed Commit Message
There's a first step to wait and filter out non responsive metadata
services, and the
retries
option is not taken into account there.This patch adds the retrying logic to that first step too.
Test Steps
Checklist:
This depends on #1527 (CLA signing)