arm64 controller #549

Open
cyanidium wants to merge 1 commit into ROCm:main from cyanidium:feat/multi-arch

@cyanidium

Motivation

This allows control plane nodes running arm64/aarch64 CPUs to manage clusters with AMD GPUs. Heterogeneous/multi-arch clusters no longer need dedicated amd64/x86_64 control plane nodes just to control AMD GPU scheduling.

Technical Details

Builds the controller/manager container for both amd64 and arm64 as a single multi-platform image.
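To illustrate what "a single multi-platform image" means at pull time, here is a minimal sketch. It models an OCI image index as a list of per-platform manifest entries and picks the entry that matches a node's platform, which is roughly how a container runtime chooses between `linux/amd64` and `linux/arm64` under one tag. The `ManifestEntry` type, the `select_manifest` helper and the placeholder digests are hypothetical. Real runtimes also match platform variants (for example `v8` for arm64) and OS versions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    """One entry in a simplified OCI image index (hypothetical model)."""
    os: str
    architecture: str
    digest: str  # placeholder value, for illustration only


# Hypothetical index behind one multi-platform controller image tag.
CONTROLLER_INDEX: List[ManifestEntry] = [
    ManifestEntry("linux", "amd64", "sha256:amd64-placeholder"),
    ManifestEntry("linux", "arm64", "sha256:arm64-placeholder"),
]

# Map `uname -m` style names to the OCI/Go architecture names used in indexes.
ARCH_ALIASES = {"x86_64": "amd64", "aarch64": "arm64"}


def normalize_arch(arch: str) -> str:
    return ARCH_ALIASES.get(arch, arch)


def select_manifest(index: List[ManifestEntry], node_os: str,
                    node_arch: str) -> Optional[ManifestEntry]:
    """Return the entry matching the node's platform, or None if the image
    was not built for it (a runtime would then refuse to pull the image)."""
    wanted = normalize_arch(node_arch)
    for entry in index:
        if entry.os == node_os and entry.architecture == wanted:
            return entry
    return None


if __name__ == "__main__":
    # An arm64/aarch64 control plane node now resolves to the arm64 manifest.
    print(select_manifest(CONTROLLER_INDEX, "linux", "aarch64").digest)
    # An amd64-only index has nothing to offer that node.
    amd64_only = [CONTROLLER_INDEX[0]]
    print(select_manifest(amd64_only, "linux", "aarch64"))
```

A single tag that carries both architectures lets the same Helm chart reference work on every node, with no per-architecture image overrides.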

Not changed:

  • Build host still has to be amd64/x86_64
  • GPU nodes still have to be amd64/x86_64
  • e2e tests still only cover amd64/x86_64

Resolves #331

Test Plan

Tested successfully on a cluster with the following nodes (role, CPU architecture, GPU):

control-plane   arm64/aarch64   no_gpu
control-plane   arm64/aarch64   no_gpu
control-plane   arm64/aarch64   no_gpu
worker          amd64/x86_64    no_gpu
worker          amd64/x86_64    no_gpu
worker          amd64/x86_64    amd_gpu
worker          amd64/x86_64    amd_gpu
worker          amd64/x86_64    amd_gpu

Helm values:

node-feature-discovery.enabled=false
deviceConfig.spec.driver.enabled=false

Test Result

The gpu-operator pod schedules and runs correctly on the arm64/aarch64 control plane nodes. The workflow pod schedules and runs correctly on an amd64/x86_64 worker node. GPU resources are available to pods that request them.
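The placement in this test result can be checked with a small model. The multi-platform controller image fits any node whose architecture appears in the image's platform list. GPU workloads still need an amd64/x86_64 node that has an AMD GPU. This is a minimal sketch under stated assumptions: the node names, the `Node` type and the `eligible_nodes` helper are hypothetical, the "before" case assumes the previous controller image was amd64-only, and the model ignores the taints, tolerations and node selectors a real Kubernetes scheduler would apply.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Node:
    name: str          # hypothetical node name
    role: str          # "control-plane" or "worker"
    arch: str          # kubernetes.io/arch style value: "amd64" or "arm64"
    has_amd_gpu: bool


# The test cluster: 3 arm64 control plane nodes, 2 plain amd64 workers,
# 3 amd64 workers with AMD GPUs.
CLUSTER: List[Node] = [Node(f"cp-{i}", "control-plane", "arm64", False) for i in range(3)]
CLUSTER += [Node(f"worker-{i}", "worker", "amd64", False) for i in range(2)]
CLUSTER += [Node(f"gpu-{i}", "worker", "amd64", True) for i in range(3)]


def eligible_nodes(nodes: Iterable[Node], image_archs: Set[str],
                   needs_gpu: bool = False, role: Optional[str] = None) -> List[str]:
    """Names of nodes a pod could land on, considering only CPU architecture,
    GPU presence and (optionally) node role. Taints/tolerations are ignored."""
    return [
        n.name for n in nodes
        if n.arch in image_archs
        and (not needs_gpu or n.has_amd_gpu)
        and (role is None or n.role == role)
    ]


if __name__ == "__main__":
    # Assumed previous state: amd64-only controller image, no control plane fit.
    print(eligible_nodes(CLUSTER, {"amd64"}, role="control-plane"))  # → []
    # Multi-platform controller image: all arm64 control plane nodes fit.
    print(eligible_nodes(CLUSTER, {"amd64", "arm64"}, role="control-plane"))
    # → ['cp-0', 'cp-1', 'cp-2']
    # GPU workload (amd64 image, needs an AMD GPU).
    print(eligible_nodes(CLUSTER, {"amd64"}, needs_gpu=True))
    # → ['gpu-0', 'gpu-1', 'gpu-2']
```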

Submission Checklist

Builds the controller/manager container for both amd64 and arm64 as a
single multi-platform image. This allows control plane nodes running
arm64/aarch64 CPUs to manage clusters with AMD GPUs.

Not changed:
- Build host still has to be amd64/x86_64
- GPU nodes still have to be amd64/x86_64
- e2e tests still only cover amd64/x86_64

Benefits:
- Heterogeneous/multi-arch clusters do not need dedicated amd64/x86_64
  control plane nodes just to control AMD GPU scheduling.

Resolves ROCm#331
@yansun1996 self-requested a review on May 10, 2026 02:20
@yansun1996
Member

Hi @cyanidium, thanks for the PR, we really appreciate the contribution. This commit needs to go through internal review and CI testing on our side; we will get back to you once we decide to move forward with this change.

Development

Successfully merging this pull request may close these issues.

[Feature]: Support arm64 for gpu-operator containers
