
Running GRPO on Ascend fails with: AssertionError: Torch not compiled with CUDA enabled #3594

Closed
SipingXu opened this issue Mar 20, 2025 · 5 comments

Comments

@SipingXu

Describe the bug
Running GRPO on an Ascend NPU fails with: AssertionError: Torch not compiled with CUDA enabled

Your hardware and system info

NPU DRIVER 24.1.0.3
CANN 8.1.RC1
torch 2.5.1
torch-npu 2.5.1.dev20250228
vllm 0.7.3+empty
vllm_ascend 0.7.3rc1

ms_swift 3.2.1

Additional context

File: swift/llm/infer/infer_engine/utils.py

The function restore_torch_device_after_vllm_init contains a hardcoded torch.cuda call, which causes this error. After I changed it to torch.npu, GRPO runs normally on Ascend cards.
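The workaround described above amounts to dispatching on the available accelerator backend instead of hardcoding torch.cuda. A minimal sketch of that idea (the helper name get_device_module is hypothetical, not part of ms-swift; restore_torch_device_after_vllm_init is the function named in this issue). Stand-in objects are used below so the sketch runs without torch installed:

```python
from types import SimpleNamespace


def get_device_module(torch_mod):
    """Pick the accelerator namespace on the given torch module.

    Prefer torch.npu (present when torch_npu has been imported on Ascend),
    otherwise fall back to torch.cuda -- rather than hardcoding torch.cuda
    as restore_torch_device_after_vllm_init originally did.
    """
    npu = getattr(torch_mod, "npu", None)
    return npu if npu is not None else torch_mod.cuda


# Demo with stand-in objects (no real torch required):
fake_npu_torch = SimpleNamespace(npu="npu-namespace", cuda="cuda-namespace")
fake_cuda_torch = SimpleNamespace(cuda="cuda-namespace")
print(get_device_module(fake_npu_torch))   # picks the npu namespace
print(get_device_module(fake_cuda_torch))  # falls back to cuda
```

With a real torch module, the returned namespace would expose set_device / current_device, so the restore logic could call get_device_module(torch).set_device(...) uniformly on both CUDA and Ascend hardware.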

@hjh0119 hjh0119 mentioned this issue Mar 20, 2025
@hjh0119
Collaborator

hjh0119 commented Mar 24, 2025

fixed

@hjh0119 hjh0119 closed this as completed Mar 24, 2025
@angrySquirrel

Hello. Even with the fixed code, running the infer and sft tasks following the official NPU examples still fails with: Torch not compiled with CUDA


Is there anything else that needs to be modified?
The model was downloaded offline and loaded from a specified local path.

@Jintao-Huang
Collaborator

Is your swift version 3.2.2?

@angrySquirrel

Yes.

@angrySquirrel

To add some detail: the failing setup was torch 2.1 with CANN 8.0.RC2.2.


After switching to torch 2.4.0, it runs normally.


4 participants