Use max of instance type per ASG #333

Open
thorro opened this issue Mar 25, 2019 · 11 comments

Comments

@thorro

thorro commented Mar 25, 2019

Issue type

  • Feature Idea

At the moment, the cheapest spot instance type is used, which is fine. The issue arises when AWS reclaims one instance type: all instances of that type can then go away at the same time.

We are willing to sacrifice some cost savings by diversifying instances more.

A new setting could be defined as:

  • max_instances_of_same_type = number or percentage

AutoSpotting would then launch instances of a given type only up to that number. After that, it would look for the second cheapest option, and so on.

@cristim
Member

cristim commented Mar 25, 2019

@thorro Thanks for reporting this issue.

We used to have such a feature hardcoded into the logic: it automatically switched the instance type if more than 20% of the instances were in the same AZ/type combination, since those instances would all be outbid and then terminated if the spot price increased.

About a year ago we removed that because AWS is no longer terminating all the instances at once, but randomly claiming instances from a given instance type over time, regardless of the spot price and bid, which hopefully allows us to launch another instance with a different instance type.
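For readers unfamiliar with the old behaviour, the 20% rule described above could be sketched roughly as follows. This is an illustration written for this thread, not the removed code; the `placement` type and `overConcentrated` function are hypothetical names:

```go
package main

import "fmt"

// placement identifies one AZ/instance-type combination.
type placement struct {
	AZ   string
	Type string
}

// overConcentrated reports whether more than maxPercent of the group's
// instances already run in the given AZ/type combination.
func overConcentrated(group []placement, p placement, maxPercent int) bool {
	same := 0
	for _, inst := range group {
		if inst == p {
			same++
		}
	}
	// Integer arithmetic avoids floating-point rounding at the boundary.
	return same*100 > maxPercent*len(group)
}

func main() {
	hot := placement{AZ: "eu-west-1a", Type: "t3.2xlarge"}
	other := placement{AZ: "eu-west-1b", Type: "m5.2xlarge"}
	group := []placement{hot, hot, hot, other, other, other, other, other, other, other}

	fmt.Println(overConcentrated(group, hot, 20))     // true: 3 of 10 is 30%
	fmt.Println(overConcentrated(group[2:], hot, 20)) // false: 1 of 8 is 12.5%
}
```

When the check returns true, the caller would move on to a different instance type for the next launch.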

Ever since this was changed I haven't seen anyone complain that all their instances are gone.

Have you actually seen this happen in practice?

I would be open to re-adding this and making it configurable as you suggested, as long as:

  1. there are enough people who report this issue
  2. someone would contribute a PR for implementing it

Alternatively I can also implement it if at least a couple of Patrons are asking for it.

@thorro
Author

thorro commented Mar 25, 2019

Hi @cristim

I don't have that much experience with this, as we don't run that many spot instances yet.

About five days ago AWS terminated all 3 instances of the same type in one ASG. I think they were all t3.2xlarge.

But just on Friday I noticed that AWS terminated one spot instance of the i3.4xlarge type while the other four kept running. So this looks more like the case you describe.

I think it may depend on how many instances they need. If they need a lot of them, all or almost all could go down. If not, they take one here and there, so as not to upset a single customer too much.

Could you post the removed hardcoded code or a link to a commit? It would be a good starting point for our own mods. Thanks.

@cristim
Member

cristim commented Mar 25, 2019

Considering how the spot market works, we can't exclude such scenarios, especially for popular instance types where there may be a lot of churn. Did your group lose all the capacity before any new instances were started?

It's definitely better to be prepared for this if possible, and as I said we can have this brought back if enough people complain about it.

As for the code, have a look around here:

https://github.com/AutoSpotting/AutoSpotting/blob/20fced19162c4ee1de87852fc7297e1bcf6c8353/core/instance.go#L147-L160

@thorro
Author

thorro commented Mar 25, 2019

That ASG lost all capacity. Luckily for us, it was not a production workload.

Hope some more people chime in. Thanks for the code pointer, will brush up on my Go skills. :)

@cristim
Member

cristim commented Apr 12, 2019

@ChienHuey just volunteered on GitHub to implement this as part of a hackathon.
A few things I mentioned that may make it a bit more challenging:

  • I'd love for it to be configurable, similar to how AWS does it for the mixed spot ASGs. That configurability work may need more than a full day of work.
  • It should be possible to toggle it on/off using stack parameters and override it using tags, like we have for other config options.
  • It should also control the level of instance type spread per type/AZ combination, maybe defaulting to 2 when enabled.
  • The value 2 should be configurable to a higher number if wanted, also via stack parameters and overridable by tags.
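The configuration layering described above could look something like this sketch. It assumes a stack-wide default that per-ASG tags can override; the tag names, the `spreadConfig` type and `resolveSpread` are hypothetical and do not correspond to existing AutoSpotting options:

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical tag names, chosen only for this example.
const (
	tagSpreadEnabled = "autospotting_spread_enabled"
	tagSpreadMax     = "autospotting_spread_max_per_type_az"
	defaultSpreadMax = 2 // default suggested in the comment above
)

// spreadConfig holds the effective settings for one ASG.
type spreadConfig struct {
	Enabled      bool
	MaxPerTypeAZ int
}

// resolveSpread starts from the stack-level values and lets ASG tags
// override them. Invalid tag values are ignored.
func resolveSpread(stack spreadConfig, tags map[string]string) spreadConfig {
	cfg := stack
	if v, ok := tags[tagSpreadEnabled]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v, ok := tags[tagSpreadMax]; ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			cfg.MaxPerTypeAZ = n
		}
	}
	if cfg.Enabled && cfg.MaxPerTypeAZ == 0 {
		cfg.MaxPerTypeAZ = defaultSpreadMax
	}
	return cfg
}

func main() {
	stack := spreadConfig{Enabled: true} // enabled stack-wide, default limit
	tags := map[string]string{tagSpreadMax: "4"}

	fmt.Printf("%+v\n", resolveSpread(stack, nil))  // {Enabled:true MaxPerTypeAZ:2}
	fmt.Printf("%+v\n", resolveSpread(stack, tags)) // {Enabled:true MaxPerTypeAZ:4}
}
```

Ignoring malformed tag values keeps a typo on one ASG from changing the stack-wide behaviour; a real implementation might prefer to log them.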

@cristim
Member

cristim commented Sep 18, 2019

@ChienHuey do you have any progress on this work?

@cristim
Member

cristim commented Mar 6, 2023

@thorro is this issue still of interest to you?

@ChienHuey let me know if you're still interested in working on this.

@thorro
Author

thorro commented Mar 6, 2023

@cristim no, we don't use AutoSpotting at the moment.

@cristim
Member

cristim commented Mar 6, 2023

Thanks @thorro!

I'd love to learn your reasons why and what you're using instead, as well as any other feedback about AutoSpotting you may have.

BTW, last week I released this open source Spot savings estimator tool: https://github.com/LeanerCloud/savings-estimator/

I hope you find it useful, and I'd also love to hear some honest feedback about it.

@thorro
Author

thorro commented Mar 7, 2023

We've moved to EKS Managed Node Groups. I don't know if AutoSpotting would work with those at all?

Will check out the tool, thanks!

@cristim
Member

cristim commented Mar 7, 2023

Yes, it should work, but not if you configured them to use Spot. I'm intentionally skipping those in order not to interfere or cause race conditions.
