Change default machineCreateAttempts to 3 #487
@drigodwin interesting. I'm not sure who we should optimise for though, with the defaults.

The problem is if someone has misconfigured their location. For example, It's a particular concern because of problems like https://issues.apache.org/jira/browse/JCLOUDS-1165, which can hit someone using the defaults in a common cloud (e.g. aws-ec2).

Maybe we compromise with

Much longer term, we could try to make the activities view clearer, to show that it's moved onto attempt number 2 (cc @m4rkmckenna @tbouron). That probably would require some significant restructuring of the low-level tasks executed for provisioning though.
@aledsage From a UI perspective, I think the best way would be to have one subtask per attempt. It would then be obvious that attempt 1 failed, and for which reason. If we cannot do that, we could imagine introducing a new status

@m4rkmckenna thoughts?
@tbouron A task per try would make it clear what happened.

An hour to fail provisioning (due to misconfiguration) seems like terrible UX, so 2 attempts seems like a good compromise.
What happens to the machines if provisioning fails? I've seen cases where it would actually create the machine but then fail for some other reason, and the machines would be left running, never managed by Brooklyn.
I presume you mean problems such as BROOKLYN-264 @neykov? I agree that hanging instances are a problem but I'm not sure it's a good reason to not make this change here. I agree with @aledsage that this could cause a long wait when a location is incorrectly specified so I've reduced it to
Yes it's a similar problem. Agree.
In which cases did you have to increase this parameter? If there is a cloud API which fails to create a VM then this should be fixed at the jclouds level.
@bostko clouds will sometimes return a machine that is dead-on-arrival (DOA). It is impossible to ssh to that machine, no matter how long you wait. Sometimes (less often) the VM being provisioned in that cloud might fail, and will report a VM status of "starting" and then "error". It's not as simple as the cloud API having failed to create the VM - a VM exists, but it's unusable.

In those cases, jclouds will not retry. That's fine - we don't want jclouds to do the retry logic, and instead to tell us about the failure. We'll get back the VM id that was provisioned, and can then call jclouds to destroy the VM. Within Brooklyn, we can decide whether to retry.

It varies how often these DOA VMs happen. From my experience, it used to be about 1 in 50 in AWS, but is even less common now. In some other clouds, it's more common. This param is particularly important if provisioning bigger clusters, or many apps, as obviously it's more likely to happen if provisioning many things.
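To make the retry behaviour described above concrete, here is a minimal Java sketch. It is not Brooklyn or jclouds code: `Cloud`, `provision` and `clusterFailureProbability` are hypothetical names invented for illustration, and the probability helper assumes DOA failures are independent, which may not hold in practice.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these names are real Brooklyn or jclouds APIs. */
final class ProvisionRetrySketch {

    /** Hypothetical stand-in for the cloud layer. */
    interface Cloud {
        String createVm();              // returns a VM id even if the VM turns out to be unusable
        boolean isUsable(String vmId);  // e.g. "can we ssh to it?"
        void destroyVm(String vmId);
    }

    /**
     * Tries up to maxAttempts times. Each unusable (DOA) VM is destroyed
     * before the next attempt so that it is not left running unmanaged.
     */
    static Optional<String> provision(Cloud cloud, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String vmId = cloud.createVm();
            if (cloud.isUsable(vmId)) {
                return Optional.of(vmId);
            }
            cloud.destroyVm(vmId);
        }
        return Optional.empty();
    }

    /**
     * Probability that at least one of `machines` machines uses up all of its
     * attempts, assuming each attempt fails independently with rate doaRate.
     */
    static double clusterFailureProbability(double doaRate, int attempts, int machines) {
        double perMachineFailure = Math.pow(doaRate, attempts);
        return 1.0 - Math.pow(1.0 - perMachineFailure, machines);
    }
}
```

Under that independence assumption and a 1-in-50 DOA rate, `clusterFailureProbability(0.02, 1, 50)` comes out at roughly 0.64, while three attempts per machine bring it down to roughly 0.0004. That is the sense in which the setting matters more for bigger clusters.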
I haven't encountered DOA machines.
@neykov @drigodwin for VMs being "orphaned", not managed by Brooklyn, I think it's good enough for now. The only times I know of it happening are:
For (2), there's not much we can do - but we could improve things a bit:
Of course if someone also sets
In testing I've found that increasing the machineCreateAttempts to 3 makes the provisioning of machines more robust. I think it should be made default.