
Running GRPO on Ascend fails with: AssertionError: Torch not compiled with CUDA enabled #3594

Closed
SipingXu opened this issue Mar 20, 2025 · 5 comments

Comments

@SipingXu

Describe the bug
Running GRPO on an Ascend NPU fails with: AssertionError: Torch not compiled with CUDA enabled

Your hardware and system info

NPU DRIVER 24.1.0.3
CANN 8.1.RC1
torch 2.5.1
torch-npu 2.5.1.dev20250228
vllm 0.7.3+empty
vllm_ascend 0.7.3rc1

ms_swift 3.2.1

Additional context

File: swift/llm/infer/infer_engine/utils.py

The function restore_torch_device_after_vllm_init contains a hardcoded torch.cuda call, which causes this error. After I changed it to torch.npu, GRPO runs normally on Ascend cards.
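The workaround described above amounts to dispatching on the available accelerator backend instead of hardcoding torch.cuda. A minimal sketch of that idea (the helper name get_device_module is hypothetical, not part of ms-swift; restore_torch_device_after_vllm_init is the function named in this issue). Stand-in objects are used below so the sketch runs without torch installed:

```python
from types import SimpleNamespace


def get_device_module(torch_mod):
    """Pick the accelerator namespace on the given torch module.

    Prefer torch.npu (present when torch_npu has been imported on Ascend),
    otherwise fall back to torch.cuda -- rather than hardcoding torch.cuda
    as restore_torch_device_after_vllm_init originally did.
    """
    npu = getattr(torch_mod, "npu", None)
    return npu if npu is not None else torch_mod.cuda


# Demo with stand-in objects (no real torch required):
fake_npu_torch = SimpleNamespace(npu="npu-namespace", cuda="cuda-namespace")
fake_cuda_torch = SimpleNamespace(cuda="cuda-namespace")
print(get_device_module(fake_npu_torch))   # picks the npu namespace
print(get_device_module(fake_cuda_torch))  # falls back to cuda
```

With a real torch module, the returned namespace would expose set_device / current_device, so the restore logic could call get_device_module(torch).set_device(...) uniformly on both CUDA and Ascend hardware.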

@hjh0119 hjh0119 mentioned this issue Mar 20, 2025
@hjh0119
Collaborator

hjh0119 commented Mar 24, 2025

fixed

@hjh0119 hjh0119 closed this as completed Mar 24, 2025
@angrySquirrel

Hello. Even with the fixed code, running the infer and sft tasks following the official NPU examples still fails with: Torch not compiled with CUDA


Is there anything else that needs to be modified?
The model was downloaded offline and loaded from a specified local path.

@Jintao-Huang
Collaborator

Is your swift version 3.2.2?

@angrySquirrel

Yes.

@angrySquirrel

To add some detail: the failing setup was torch 2.1 with CANN 8.0.RC2.2.


After switching to torch 2.4.0, it runs normally.


4 participants