Add AMD ROCm (gfx942) support for the image→3D generation stack #72
Swap the CUDA-only generation stack for verified ROCm builds (spconv_rocm, nvdiffrast@rocm, amd_gsplat, pytorch3d ROCm wheel, FA2-Triton) plus two runtime shims: a kaolin sitecustomize bypass (texture-stage only) and a spconv KRSC->Native checkpoint-load bridge. All additive under docker/; CUDA paths unchanged. Verified e2e on AMD Instinct MI300X / ROCm 6.4.3 / torch 2.6: SAM3D image->3D produces splat.ply (28.9s, 9.74GB VRAM).
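To make the kaolin bypass mentioned above more concrete, here is a minimal sketch of how such a shim could work. It is not the contents of `docker/kaolin_stub.py`: the function name `install_kaolin_stub` and the always-true `check_tensor` are assumptions made for illustration. The idea is to register placeholder modules in `sys.modules` before anything imports `kaolin`.

```python
import importlib.util
import sys
import types


def install_kaolin_stub() -> bool:
    """Register a placeholder `kaolin` package if no kaolin is available.

    Returns True if the stub was installed, and False if a real (or already
    registered) `kaolin` was found and left untouched.
    """
    if "kaolin" in sys.modules or importlib.util.find_spec("kaolin") is not None:
        return False

    def check_tensor(*args, **kwargs):
        # Assumption: a permissive stand-in that accepts every input
        # instead of validating shape, dtype, or device.
        return True

    parent = None
    for name in ("kaolin", "kaolin.utils", "kaolin.utils.testing"):
        module = types.ModuleType(name)
        module.__path__ = []  # mark as a package so dotted imports resolve
        sys.modules[name] = module
        if parent is not None:
            # Expose the child as an attribute, e.g. kaolin.utils
            setattr(parent, name.rsplit(".", 1)[1], module)
        parent = module

    # After the loop, `parent` is the kaolin.utils.testing placeholder.
    parent.check_tensor = check_tensor
    return True
```

Calling `install_kaolin_stub()` from a `sitecustomize.py` on the import path would run it at interpreter startup, because Python's `site` module imports `sitecustomize` automatically when it is present. A permissive `check_tensor` skips validation entirely, so this approach only suits code paths that use nothing else from kaolin.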
Hi @ZJLi2013,
Hi, thanks for replying. I'd love to support more work from HR on AMD GPUs in the future, at both the functional and the performance level, if there is interest.
Sure, you are more than welcome. Currently, our main focus remains on NVIDIA GPUs and CUDA; AMD GPUs are undoubtedly an important complement.
…n core dep errors
Hi @ZJLi2013,
Summary
Enable EmbodiedGen's image→3D generation to run on AMD GPUs (ROCm/HIP) by swapping the
CUDA-only libraries for verified ROCm builds plus two small runtime shims. All changes are
additive (new files under `docker/`); no existing CUDA code path is modified.

Verified end-to-end on an AMD Instinct MI300X: `python -m embodied_gen.models.sam3d`
(SAM3D backend, no GPT, no texture-bake) produces `outputs/splat.ply` (6.5 MB 3D
Gaussian splat) from the bundled `sample_00.jpg`.

Changes (all new files)
- `docker/install_rocm.sh`: one-shot ROCm install. It installs the requirements minus the CUDA libs, applies the `numpy<2` pin and the ROCm dependency swaps (table below), deploys the two shims as `sitecustomize`, and runs an import smoke test (PASS/FAIL map).
- `docker/Dockerfile.rocm`: full-generation ROCm image (`rocm/pytorch:rocm6.4.3...2.6.0`) that runs `install_rocm.sh`.
- `docker/spconv_rocm_compat.py`: converts spconv KRSC checkpoints to the Native layout at load time (see Related issue).
- `docker/kaolin_stub.py`: `sitecustomize` bypass for the CUDA-only `kaolin` (used only in the texture-backprojection / mesh-IO stage; the core geometry path only calls `kaolin.utils.testing.check_tensor`).
- `docker/README.rocm.md`: user-facing run-through.

CUDA → ROCm dependency map
| CUDA dependency | ROCm replacement |
| --- | --- |
| `spconv-cu120/121` | `ZJLi2013/spconv_rocm` (2.3.8+rocm1, source) |
| `nvdiffrast` | `ZJLi2013/nvdiffrast@rocm` |
| `gsplat` | `amd_gsplat` (pypi.amd.com/rocm-6.4.3; import name stays `gsplat`) |
| `pytorch3d` | ROCm wheel |
| `flash-attn` | FA2-Triton (`FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE` at install + runtime) |
| `xformers` | `sdpa` |
| `numpy` (base = 2.x) | `<2` (diffusers/transformers requirement) |
| `kaolin` (no ROCm wheel) | `sitecustomize` stub (`docker/kaolin_stub.py`) |
| `diff-gaussian-rasterization` | `gsplat` is the default |

Tested on

`rocm/pytorch:rocm6.4.3_ubuntu24.04_py3.12_pytorch_release_2.6.0`

Results

`outputs/splat.ply` (6.5 MB) from `apps/assets/example_image/sample_00.jpg`

Notes / scope
- Backward-compatible: only adds files under `docker/`; CUDA users are unaffected.
- Out of scope (documented gaps, not regressions): texture-backprojection (`kaolin` is CUDA-only and stubbed) and the GPT quality-checkers (need an API key). Core image→3D (segmentation → SAM3D geometry + gaussian + mesh export) runs without them.
- Optional follow-up (happy to include if desired): make the `kaolin` imports in `embodied_gen/data/utils.py` lazy/optional so the stub isn't needed.
- Depends on / related: spconv KRSC checkpoint loading on ROCm, `ZJLi2013/spconv_rocm#<pr>`. Until merged, `docker/spconv_rocm_compat.py` provides the equivalent fix consumer-side.
- License: this PR is for study/research purposes only and adds ROCm build/integration scripts; it ships no model weights. Any models used (e.g. SAM-3D-Objects, TRELLIS, Kolors, SD3.5) remain governed by their own respective licenses; please refer to each model's license before use.
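
The optional follow-up above (lazy/optional `kaolin` imports) could be approached along these lines. This is a sketch only: `optional_import` and `require_kaolin` are hypothetical helpers, not functions that exist in `embodied_gen/data/utils.py`, and the error message is illustrative.

```python
import importlib


def optional_import(name: str):
    """Return the module `name`, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require_kaolin():
    """Import kaolin on first use and fail with a clear message if absent.

    Hypothetical helper: code that needs kaolin would call this inside the
    function that uses it, rather than importing kaolin at module top level.
    """
    kaolin = optional_import("kaolin")
    if kaolin is None:
        raise RuntimeError(
            "kaolin is not available on this platform; "
            "the texture-backprojection stage cannot run."
        )
    return kaolin
```

With this pattern, importing the utilities module would no longer fail on a machine without kaolin; only the stages that actually call `require_kaolin()` would raise.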