Builds the controller/manager container for both amd64 and arm64, in a single multi-platform image. This allows control plane nodes running arm64/aarch64 CPUs to manage clusters with AMD GPUs.

Not changed:
- Build host still has to be amd64/x86_64
- GPU node still has to be amd64/x86_64
- e2e still only tests amd64/x86_64

Benefits:
- Heterogeneous/multi-arch clusters do not need dedicated amd64/x86_64 control plane nodes just to control AMD GPU scheduling.

Resolves ROCm#331
Hi @cyanidium, thanks for the PR, we really appreciate the contribution. This commit needs to go through internal review and CI testing on our side. We will get back to you once we decide to move forward with this change.
Motivation
This allows control plane nodes running arm64/aarch64 CPUs to manage clusters with AMD GPUs. Heterogeneous/multi-arch clusters no longer need dedicated amd64/x86_64 control plane nodes just to control AMD GPU scheduling.
Technical Details
Builds the controller/manager container for both amd64 and arm64, in a single multi-platform image.
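To illustrate what "a single multi-platform image" means for scheduling, here is a minimal sketch of how a node's architecture can be matched against the platforms the image is built for. It is not code from this PR or from the operator. The helper names (`normalize_arch`, `can_run_controller`) and the `CONTROLLER_PLATFORMS` constant are hypothetical. The only things taken from outside are the well-known `kubernetes.io/arch` and `kubernetes.io/os` node labels and the usual alias pairs amd64/x86_64 and arm64/aarch64.

```python
# Hypothetical sketch: match a node against the platforms of a
# multi-platform image. Names below are illustrative assumptions.

# Kernel-style names (x86_64, aarch64) mapped to the OCI/Go-style names
# used in image platforms and in the kubernetes.io/arch node label.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}

# Platforms the controller/manager image is built for, per this PR.
CONTROLLER_PLATFORMS = frozenset({"linux/amd64", "linux/arm64"})


def normalize_arch(name: str) -> str:
    """Return the OCI-style architecture name for a known alias."""
    key = name.strip().lower()
    if key not in ARCH_ALIASES:
        raise ValueError(f"unsupported architecture: {name!r}")
    return ARCH_ALIASES[key]


def can_run_controller(node_labels: dict) -> bool:
    """Return True if the node's os/arch labels match an image platform."""
    arch = node_labels.get("kubernetes.io/arch")
    if arch is None:
        return False
    # Assumption: treat a missing OS label as linux for this sketch.
    os_name = node_labels.get("kubernetes.io/os", "linux")
    try:
        platform = f"{os_name}/{normalize_arch(arch)}"
    except ValueError:
        return False
    return platform in CONTROLLER_PLATFORMS


if __name__ == "__main__":
    # An arm64 control plane node and an architecture the image lacks.
    print(can_run_controller({"kubernetes.io/arch": "arm64"}))
    print(can_run_controller({"kubernetes.io/arch": "s390x"}))
```

In a real cluster the container runtime selects the matching entry from the image index on its own; the sketch only makes the matching rule explicit.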
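Not changed:
- Build host still has to be amd64/x86_64
- GPU node still has to be amd64/x86_64
- e2e still only tests amd64/x86_64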
Resolves #331
Test Plan
Tested working with a cluster like this:
Helm values:
Test Result
- gpu-operator pod schedules correctly and runs on the arm64/aarch64 control plane nodes.
- workflow pod schedules and runs correctly on an amd64/x86_64 worker node.
- GPU resources are available to pods that request them.
Submission Checklist