Feat/amd rocm support #70

Closed
Justin Darnell (justindarnell) wants to merge 3 commits into Lightricks:main from justindarnell:feat/amd-rocm-support

Conversation

@justindarnell

No description provided.

Tests all critical capabilities needed for the LTX pipeline:
- ROCm/HIP detection via torch.version.hip
- bfloat16 support (critical — LTX uses bf16 globally)
- Core ops: SDPA, Conv3d, GroupNorm, LayerNorm, Generator seeding
- torch.compile / triton availability
- Memory management (empty_cache, synchronize, large alloc)
- FP8 guard logic validation
- Transformer-scale stress test (SDPA + FFN + 3D VAE conv)
- LTX pipeline import test (ltx-core, ltx-pipelines, transformers)

Run with:
  python scripts/test-rocm-feasibility.py --skip-ltx
  python scripts/test-rocm-feasibility.py --verbose
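The first two checks in the list above (ROCm/HIP detection and bfloat16 support) can be probed in a few lines. This is a minimal sketch, not the actual feasibility script; it assumes only that ROCm builds of PyTorch expose `torch.version.hip` as a non-None string, and it degrades gracefully when torch or a GPU is absent:

```python
import importlib.util

def rocm_capability_snapshot():
    """Minimal probe for the ROCm/HIP and bfloat16 checks listed above.

    Returns a dict describing what was detected. Safe to run on any
    machine: missing torch or a missing GPU short-circuits cleanly.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch
    hip = getattr(torch.version, "hip", None)  # non-None on ROCm builds
    gpu = torch.cuda.is_available()            # True for HIP devices too
    return {
        "torch_installed": True,
        "is_rocm_build": hip is not None,
        "hip_version": hip,
        "bf16_supported": bool(gpu and torch.cuda.is_bf16_supported()),
    }

if __name__ == "__main__":
    print(rocm_capability_snapshot())
```

On a ROCm machine this reports `is_rocm_build: True` with the HIP version string; on a CUDA build `torch.version.hip` is None.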

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Based on feasibility test results (33/33 critical tests pass on RDNA 3.5):
- VRAM: 87.9 GB detected (well above 31 GB LTX threshold)
- bfloat16: fully supported on RDNA 3.5
- SDPA, Conv3d, all core ops: pass

Backend changes:
- services/services_utils.py: add is_rocm_device() helper; fix
  device_supports_fp8() to return False for ROCm (FP8 not hardware-
  accelerated on RDNA 3.x) and check sm_89+ for NVIDIA
- handlers/pipelines_handler.py: skip torch.compile for ROCm builds
  (triton not available on ROCm Windows) alongside existing MPS skip
- ltx2_server.py: auto-set TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
  on ROCm to enable optimized Flash Attention (SDPA falls back to slow
  math-attention path without this env var)
- pyproject.toml: bump transformers to >=4.55.5 (AMD requirement)
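The FP8 guard described for services_utils.py reduces to two conditions: always False on ROCm (no hardware FP8 on RDNA 3.x), and on NVIDIA only sm_89 (Ada) or newer. The sketch below mirrors that description, but the signatures are hypothetical: the real helpers would inspect `torch.version.hip` and the device's compute capability, while here those are passed in so the logic is testable without a GPU:

```python
def is_rocm_device(hip_version):
    """True when running under a ROCm (HIP) build of PyTorch.

    Illustrative only: the real helper reads torch.version.hip; here
    the value is a parameter so the logic runs anywhere.
    """
    return hip_version is not None

def device_supports_fp8(hip_version, cuda_capability):
    """FP8 guard sketch.

    cuda_capability is a (major, minor) tuple, or None for CPU/MPS.
    ROCm returns False (FP8 not hardware-accelerated on RDNA 3.x);
    NVIDIA requires sm_89 (Ada) or newer.
    """
    if is_rocm_device(hip_version):
        return False
    if cuda_capability is None:   # CPU / MPS: no CUDA capability
        return False
    return cuda_capability >= (8, 9)
```

Tuple comparison makes the sm_89 cutoff concise: `(9, 0) >= (8, 9)` (Hopper) passes, `(8, 6) >= (8, 9)` (pre-Ada) does not.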

Build system:
- scripts/prepare-python.ps1: add -GpuBackend parameter (cuda|rocm)
  ROCm path: forces Python 3.12, filters CUDA-only packages
  (torch/sageattention/triton-windows), installs ROCm SDK and PyTorch
  wheels from repo.radeon.com, skips Triton JIT headers step

ROCm maps HIP to the CUDA PyTorch API so torch.cuda.is_available()
returns True — no frontend or Electron changes needed.

Usage:
  scripts/prepare-python.ps1 -GpuBackend rocm

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- electron-builder.yml: use ${env.LTX_ARTIFACT_SUFFIX} in win artifactName
  so ROCm builds produce '*-ROCm-Setup.exe' (empty suffix = default CUDA name)
- scripts/create-installer.ps1: add -GpuBackend param; sets LTX_ARTIFACT_SUFFIX
- scripts/local-build.ps1: add -GpuBackend param; passed to prepare-python.ps1
  and create-installer.ps1
- backend/tests/test_device_utils.py: unit tests for is_rocm_device() and
  device_supports_fp8() covering ROCm, NVIDIA pre-Ada, Ada, Hopper, CPU, MPS
- README.md: add AMD ROCm row to compatibility table; add Windows AMD system
  requirements section with driver, BIOS, and security prerequisites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lokkju

Why was this closed out? Is there some hard block preventing it from working on an AMD machine?
