
feat: add npu dockerfile and useage #428

Merged: HuangJoJo merged 2 commits into alibaba:main from UsernameFull:npu_docker on Apr 30, 2026.


UsernameFull (Contributor) commented on Apr 27, 2026:

PR: feat: add NPU Dockerfile and Ascend documentation

Summary

Add Docker build files and documentation for running ROLL on Huawei Ascend NPU (Atlas 900 A2/A3 PODc), enabling users to deploy RLVR and other pipelines on Ascend hardware.

Changes

Docker

  • Add Dockerfile.A2 for Atlas 900 A2 PODc (Ascend 910B1, CANN 8.5.1)
  • Add Dockerfile.A3 for Atlas 900 A3 PODc (Ascend 910_9391, CANN 8.5.1)
  • Both images include: PyTorch 2.8.0+cpu, vLLM 0.13.0, vLLM-Ascend 0.13.0, DeepSpeed 0.16.4, Transformers 4.57.6, triton-ascend 3.2.0
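To pick the right image for a given machine, a minimal sketch can map each Atlas platform to its Dockerfile and assemble a `docker build` command. It assumes the repository root is the build context; the function name and the image tag are hypothetical placeholders, not names from this PR.

```python
import shlex

# Hypothetical mapping from Atlas platform to the Dockerfiles added in this PR.
DOCKERFILES = {
    "A2": "docker/Dockerfile.A2",  # Atlas 900 A2 PODc, Ascend 910B1
    "A3": "docker/Dockerfile.A3",  # Atlas 900 A3 PODc, Ascend 910_9391
}


def build_command(platform: str, tag: str, context: str = ".") -> list[str]:
    """Return a `docker build` argv for the platform (assumes repo root as context)."""
    key = platform.upper()
    if key not in DOCKERFILES:
        raise ValueError(f"unknown platform {platform!r}; expected one of {sorted(DOCKERFILES)}")
    return ["docker", "build", "-f", DOCKERFILES[key], "-t", tag, context]


if __name__ == "__main__":
    # The tag "roll-npu:a2" is a placeholder chosen for illustration.
    print(shlex.join(build_command("A2", "roll-npu:a2")))
```

Keeping the mapping in one dictionary makes it explicit that the two images differ only in the target chip, not in the software stack.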

Documentation (English + Chinese)

  • Add ascend_docker_usage.md — Docker image build/pull, container startup, environment verification, and RLVR pipeline example
  • Update ascend_usage.md — Revise Ascend support status and installation instructions
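An environment check of this kind can be approximated by comparing installed package versions with the ones listed for both images. This is a hypothetical sketch, not the procedure in `ascend_docker_usage.md`. The distribution names (`vllm-ascend`, `triton-ascend`, and so on) are assumed PyPI names, and a PEP 440 local label such as the `+cpu` in `2.8.0+cpu` is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the PR description; distribution names are assumptions.
EXPECTED = {
    "torch": "2.8.0",  # the image ships 2.8.0+cpu; the local label is stripped below
    "vllm": "0.13.0",
    "vllm-ascend": "0.13.0",
    "deepspeed": "0.16.4",
    "transformers": "4.57.6",
    "triton-ascend": "3.2.0",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local version label, e.g. '2.8.0+cpu' -> '2.8.0'."""
    return v.split("+", 1)[0]


def installed_versions(names):
    """Look up installed distribution versions; None marks a missing package."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    """Return {name: (expected, installed)} for every missing or differing package."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) is None or public_version(installed[name]) != want
    }


if __name__ == "__main__":
    bad = mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, got) in sorted(bad.items()):
        print(f"{name}: expected {want}, found {got}")
    print("OK" if not bad else f"{len(bad)} mismatch(es)")
```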

Example Config

  • Add examples/ascend_examples/qwen3_8b_rlvr_deepspeed.yaml — Qwen3-8B RLVR config using DeepSpeed ZeRO-3 + CPU offloading on NPU
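For readers unfamiliar with ZeRO-3 plus CPU offloading, the sketch below builds a generic DeepSpeed configuration that shards parameters, gradients, and optimizer state (stage 3) and offloads optimizer state and parameters to host memory. The keys are standard DeepSpeed config fields. The function name, batch sizes, and bf16 setting are illustrative assumptions, not values from `qwen3_8b_rlvr_deepspeed.yaml`, and ROLL may generate or override parts of this config itself.

```python
import json


def zero3_cpu_offload_config(micro_batch_per_gpu: int = 1, grad_accum: int = 1) -> dict:
    """Hypothetical helper: DeepSpeed ZeRO stage 3 with optimizer and parameter offload to CPU."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,  # illustrative value
        "gradient_accumulation_steps": grad_accum,              # illustrative value
        "bf16": {"enabled": True},                              # assumption: bf16 training
        "zero_optimization": {
            "stage": 3,  # shard params, grads, and optimizer state across ranks
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


if __name__ == "__main__":
    print(json.dumps(zero3_cpu_offload_config(), indent=2))
```

Offloading trades device memory for host-to-device traffic, which is why it pairs with ZeRO-3 when fitting an 8B model's optimizer state alongside training.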

Files Changed (7 files, +808 / -68)

| File | Change |
|---|---|
| docker/Dockerfile.A2 | New |
| docker/Dockerfile.A3 | New |
| docs_roll/docs/User Guides/Hardware Support/ascend_docker_usage.md | New |
| docs_roll/docs/User Guides/Hardware Support/ascend_usage.md | Updated |
| docs_roll/i18n/.../ascend_docker_usage.md | New (Chinese) |
| docs_roll/i18n/.../ascend_usage.md | Updated (Chinese) |
| examples/ascend_examples/qwen3_8b_rlvr_deepspeed.yaml | New |

Key Notes

  • NPU does not support Megatron-LM; DeepSpeed is the only supported training backend
  • NPU does not support colocated mode; training and inference must use separate NPU devices
  • The flash_attn package is not supported; use the fa2 attention implementation provided through transformers instead
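These constraints can be checked before launching a pipeline. The function below is a hypothetical pre-flight check: its name, its arguments, and the string values `"deepspeed"`, `"flash_attn"`, and `"fa2"` are assumptions for illustration, not ROLL configuration keys. It treats any NPU index shared by training and inference as colocation.

```python
from collections.abc import Iterable


def check_npu_constraints(train_backend: str,
                          train_devices: Iterable[int],
                          infer_devices: Iterable[int],
                          attn_impl: str) -> list[str]:
    """Hypothetical check: return violations of the NPU constraints listed above."""
    problems = []
    # DeepSpeed is the only supported training backend on NPU (no Megatron-LM).
    if train_backend.lower() != "deepspeed":
        problems.append(f"training backend {train_backend!r} is not supported on NPU; use DeepSpeed")
    # Colocated mode is unsupported: training and inference device sets must be disjoint.
    overlap = sorted(set(train_devices) & set(infer_devices))
    if overlap:
        problems.append(f"colocated mode is not supported; devices {overlap} are shared")
    # The flash_attn package is unsupported; "flash_attn" is an assumed label for it.
    if attn_impl == "flash_attn":
        problems.append("the flash_attn package is not supported; use fa2 via transformers")
    return problems


if __name__ == "__main__":
    # 8 NPUs split 4/4 between training and inference: no violations.
    print(check_npu_constraints("deepspeed", range(0, 4), range(4, 8), "fa2"))
```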

UsernameFull changed the title from "[WIP]feat: add npu dockerfile and useage" to "feat: add npu dockerfile and useage" on Apr 29, 2026.
HuangJoJo merged commit 034d38e into alibaba:main on Apr 30, 2026.
2 checks passed