[Feature Request] Fallback queues #493
Comments
Thanks @H4dr1en, this sounds like a great idea.
Yes, this is exactly what I have in mind 👍
Hi @bmartinn, do you have any update on the GCP autoscaler?
Thanks for the ping @tienduccao!
Great news, thanks Martin
On Sat, 8 Jan 2022, 02:31 Martin.B wrote:
Thanks for the ping @tienduccao!
Things were delayed a bit, but I can update that the GCP is ready and will
be released to the community (SaaS) version and then sync back to
repository. I'm hoping it will not take more than a couple of weeks :)
Hi,
Often I queue an experiment in a queue that uses on-demand GPU instances in AWS, and the ClearML AWS autoscaler keeps failing with the following error:
I wonder if there is an easy way of extending the AWS autoscaler to detect InsufficientInstanceCapacity errors and use a different availability zone. Given that this would mean that some other AWS properties (e.g. subnet, security groups, etc.) should be different, we could think of having a "fallback to queue" mechanism in the AWS autoscaler. This mechanism would work as follows:
In practice, this would allow having one queue configuration per availability zone. The autoscaler could then spin up an instance faster.
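To make the idea concrete, here is a minimal sketch of how such a fallback could look. It is not the ClearML autoscaler's actual code: `LaunchError`, `QueueConfig`, `launch_with_fallback` and the `launch` callable are all hypothetical names introduced for illustration, and the only detail taken from the request is the InsufficientInstanceCapacity error code. It assumes each queue configuration carries its own availability zone and subnet, and that the launcher reports the AWS error code when it fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class LaunchError(Exception):
    """Hypothetical wrapper for a failed instance launch, carrying the AWS error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


@dataclass(frozen=True)
class QueueConfig:
    """Hypothetical per-availability-zone settings for one queue."""
    queue: str
    availability_zone: str
    subnet_id: str


def launch_with_fallback(
    configs: Sequence[QueueConfig],
    launch: Callable[[QueueConfig], str],
) -> Optional[str]:
    """Try each config in order, falling back only on insufficient capacity.

    Returns the instance id from the first successful launch, or None if
    every config ran out of capacity. Any other error is re-raised.
    """
    for config in configs:
        try:
            return launch(config)
        except LaunchError as err:
            if err.code != "InsufficientInstanceCapacity":
                raise  # unrelated failure: do not hide it behind a fallback
    return None
```

With something like this, the order of `configs` would define the fallback chain, so the preferred availability zone goes first and the others are only tried when it has no capacity.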