Consider availability zone when picking an instance type #25

Closed
nmeierpolys opened this issue Oct 29, 2016 · 6 comments

Comments

@nmeierpolys
Contributor

nmeierpolys commented Oct 29, 2016

Our instances all run in the us-east-1d availability zone. When the autospotting process runs, it finds cg1.4xlarge to be the cheapest instance type and tries to request spot instances of that type. Unfortunately, cg1.4xlarge isn't available in us-east-1d, only in us-east-1c. We get this error on the spot request: "capacity-not-available: There is no Spot capacity available that matches your request."

For use cases like ours that are limited to a specific AZ, it would be really helpful to consider the AZ when retrieving spot pricing info, and only pick an instance type if it's available in the AZ.

When this happens, autospotting requests a new spot instance on every run, which fairly quickly uses up the maximum number of open spot instance requests that AWS allows and blocks any other spot instance requests.
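
The request above can be illustrated with a minimal sketch, not taken from the autospotting code base: restrict the price records to the target AZ before picking the cheapest one, so an instance type with no price record in that AZ can never win. The `SpotPrice` record, the `cheapest_in_zone` helper, and the prices are all hypothetical. In a real setup the records would come from the EC2 spot price history, which can be queried per availability zone.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpotPrice:
    """One spot price observation (hypothetical record shape)."""
    instance_type: str
    availability_zone: str
    price: float  # USD per hour


def cheapest_in_zone(prices: Iterable[SpotPrice], zone: str) -> Optional[SpotPrice]:
    """Return the cheapest offer that has a price record in `zone`.

    An instance type with no price record in the zone is treated as
    unavailable there, so it can never be selected. Returns None when
    nothing is offered in the zone at all.
    """
    candidates = [p for p in prices if p.availability_zone == zone]
    return min(candidates, key=lambda p: p.price, default=None)


# Illustrative numbers only, not real AWS prices.
prices = [
    SpotPrice("cg1.4xlarge", "us-east-1c", 0.10),
    SpotPrice("m3.large", "us-east-1d", 0.15),
    SpotPrice("c3.large", "us-east-1d", 0.12),
]

choice = cheapest_in_zone(prices, "us-east-1d")
print(choice.instance_type)  # c3.large
```

Even though cg1.4xlarge has the lowest price overall, it is ignored for us-east-1d because it is only priced in us-east-1c.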

@nmeierpolys
Contributor Author

I was able to get past this for our instances by changing the code to skip that instance type and hosting our own binaries. I'll see if I can figure out why it's not handling the case better in general and hopefully put in a PR when I get a chance.

@cristim
Member

cristim commented Oct 30, 2016

Good catch @nmeierpolys, thanks for reporting this!

Off-topic: I'd like to learn more about your use case for running in a single availability zone. For reliability reasons that's not something I would normally do, especially with spot instances, and I'm always curious to learn about such interesting use cases happening 'in the wild'.

Regarding the issue you noticed, that sounds like a real problem. As far as I can see in the spot instance pricing history in my AWS console, that instance type is indeed only available in a single AZ in us-east-1 (in my account it is labeled us-east-1a), so I guess the algorithm is not checking for this edge case.

I will try to reproduce this in my own AWS account and have a look at the source code to see why that unavailable instance type appears to be the cheapest.

Somewhat related to this issue, I think I could implement a way to track instance launch failures for a given instance type/AZ combination, and temporarily blacklist that combination for a few hours if the instance fails to launch over multiple autospotting runs in a row. I will create a new issue for implementing this kind of feature.
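
A rough sketch of what such tracking could look like, purely as an illustration and not the actual autospotting implementation: count consecutive launch failures per (instance type, AZ) pair and exclude the pair for a cooldown period once a threshold is reached. The class name, the threshold of 3 failures, and the 4-hour cooldown are assumptions, and the in-memory state would need to be persisted somewhere to survive between runs.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (instance_type, availability_zone)


class LaunchFailureTracker:
    """Temporarily excludes instance type/AZ pairs that keep failing to launch."""

    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 4 * 3600):
        self.max_failures = max_failures          # assumed threshold
        self.cooldown_seconds = cooldown_seconds  # assumed "few hours"
        self._consecutive_failures: Dict[Key, int] = {}
        self._blocked_until: Dict[Key, float] = {}

    def record_failure(self, instance_type: str, zone: str,
                       now: Optional[float] = None) -> None:
        """Count a failed launch; block the pair once the threshold is hit."""
        now = time.time() if now is None else now
        key = (instance_type, zone)
        count = self._consecutive_failures.get(key, 0) + 1
        self._consecutive_failures[key] = count
        if count >= self.max_failures:
            self._blocked_until[key] = now + self.cooldown_seconds
            self._consecutive_failures[key] = 0  # start fresh after cooldown

    def record_success(self, instance_type: str, zone: str) -> None:
        """A successful launch clears the failure streak and any block."""
        key = (instance_type, zone)
        self._consecutive_failures.pop(key, None)
        self._blocked_until.pop(key, None)

    def is_blocked(self, instance_type: str, zone: str,
                   now: Optional[float] = None) -> bool:
        """True while the pair is still inside its cooldown window."""
        now = time.time() if now is None else now
        until = self._blocked_until.get((instance_type, zone))
        return until is not None and now < until


tracker = LaunchFailureTracker(max_failures=3, cooldown_seconds=4 * 3600)
for run in range(3):  # three failed runs, five minutes apart
    tracker.record_failure("cg1.4xlarge", "us-east-1d", now=run * 300)

print(tracker.is_blocked("cg1.4xlarge", "us-east-1d", now=1000))           # True
print(tracker.is_blocked("cg1.4xlarge", "us-east-1d", now=600 + 4 * 3600))  # False
```

The `now` parameter exists only to make the behavior easy to demonstrate without waiting; callers would normally omit it.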

Since you are the first user to report self-hosting binaries, please consider submitting a PR documenting how you did it. I'd really like to have some documentation for this.

@nmeierpolys
Contributor Author

Thanks @cristim. Right now, we're running in a single AZ because our system and deploys expect everything to be in a single subnet. There's nothing technically preventing us from changing this; it just hasn't been a big enough priority to commit time to it yet.

I'd be happy to add some documentation for self-hosting the binaries. Hopefully I can get a PR for that your way this week sometime.

@cristim
Member

cristim commented Aug 5, 2017

@nmeierpolys, is this still happening in your environment with the latest version?

@nmeierpolys
Contributor Author

Sorry, but I'm no longer working on the project that used autospotting, so I'm not able to test it out with the latest version.

@cristim
Member

cristim commented Aug 7, 2017

Thanks. In that case I'm closing this issue.

cristim closed this as completed Aug 7, 2017