GPU node-pool with nvidia.com/gpu taint: missing ExtendedResourceToleration admission controller? #1449
Comments
Any progress on this issue?
This issue has been automatically marked as stale because it has not had activity in 90 days. It will be closed in 30 days if no further activity occurs. Thank you!
Thanks for the feedback; it's a great ask. Added to the backlog.
@palma21 Is there any update on this issue?
I have been scratching my head for the last 2 days over why this isn't working, whereas it used to work on GCP without adding a toleration block to each pod explicitly. Any update on when we can expect this feature in AKS? The ticket was opened 6 months ago and is still in the backlog :( This is a fairly old feature; we have been using it on GCP for more than a year. We are now migrating some of our workloads from GCP to Azure, and we require this feature to make sure non-GPU workloads don't land on GPU nodes, of which we have quite a lot. This would be a blocker for our migration.
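For anyone hitting this in the meantime, the explicit per-pod workaround referred to above looks roughly like this (a minimal sketch; the pod name, container name, and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload        # placeholder name
spec:
  containers:
  - name: cuda-app          # placeholder container and image
    image: nvidia/cuda:10.2-base
    resources:
      limits:
        nvidia.com/gpu: 1   # request the extended resource
  # Without ExtendedResourceToleration, this block must be added by hand
  # to every pod so it can land on the tainted GPU node pool:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```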
The issue is fairly old, but it was only added to the backlog 27 days ago (as of this comment) 😄. This is something we are actively working on and plan to have next month. Thanks for calling out the difference from GCP; it might be something other folks encounter as they move, too. We'll prioritize accordingly.
Thank you for the feature request. I'm closing this issue, as this feature has shipped and the issue hasn't had activity for 7 days.
What happened:
The ExtendedResourceToleration admission controller seems to be missing: pods requesting the `nvidia.com/gpu` extended resource don't have the `nvidia.com/gpu` toleration automatically added.

What you expected to happen:
I would expect that requesting a `nvidia.com/gpu` resource would add a `nvidia.com/gpu` toleration (via the ExtendedResourceToleration admission controller), so I can properly and easily use multiple node pools.
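For context, this is the toleration the upstream ExtendedResourceToleration admission controller injects into any pod that requests an extended resource such as `nvidia.com/gpu` (per the Kubernetes admission controller documentation; shown here as a sketch):

```yaml
# Toleration automatically injected by ExtendedResourceToleration for a pod
# requesting nvidia.com/gpu (key = extended resource name, all effects tolerated):
tolerations:
- key: nvidia.com/gpu
  operator: Exists
```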
How to reproduce it (as minimally and precisely as possible):

- Create a node pool with the `nvidia.com/gpu=present:NoSchedule` taint
- Schedule a pod requesting `nvidia.com/gpu`
- Resources are available on the node
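A minimal way to reproduce this with kubectl (a sketch; the node and pod names are placeholders, and the scheduler's exact event wording may vary by version):

```sh
# Taint the GPU node the same way the tainted AKS node pool does:
kubectl taint nodes <gpu-node-name> nvidia.com/gpu=present:NoSchedule

# Create a pod requesting nvidia.com/gpu (see the manifest sketch above),
# then inspect why it is not scheduled:
kubectl describe pod <gpu-pod-name>
# Without ExtendedResourceToleration, the pod stays Pending with an event
# like "node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate".
```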
Anything else we need to know?:

The `ExtendedResourceToleration` admission controller appears not to be enabled on AKS.
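For reference, on a self-managed control plane this admission controller is enabled through a kube-apiserver flag; on AKS the control plane is managed, so cluster users cannot set this themselves (a sketch of the upstream flag):

```sh
# Upstream Kubernetes: ExtendedResourceToleration is an admission plugin
# enabled on the API server (not user-configurable on managed AKS):
kube-apiserver --enable-admission-plugins=...,ExtendedResourceToleration
```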
Environment:

- Kubernetes version (use `kubectl version`): AKS v1.15.7
- Node VM size: Standard_NC6_Promo
- GPU extended resource: `nvidia.com/gpu`