[BugFix] fix cuda-python requirement #5261
Jiang-Jia-Jun merged 3 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
Pull request overview
This PR removes the pinned cuda-python version requirement so that multiple CUDA versions can be supported. The main changes are: the pinned version is dropped from requirements.txt, the cuda-python import is deferred to the function that uses it, and an installation hint is added to the configuration check.
- Remove the pinned cuda-python version requirement from requirements.txt
- Move the cuda-python import from module level into the function that uses it (deferred import)
- Add a cuda-python import check with a user-friendly error message to EPLBConfig.check() (a standalone sketch of this guard follows this list)
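A hypothetical, self-contained sketch of the import guard described above; the real check lives in fastdeploy/config.py, and the function wrapper and `eplb_config` argument here are illustrative stand-ins, not the actual code:

```python
def check_eplb_cuda_python(eplb_config) -> None:
    """Fail fast with an install hint when EPLB is enabled but cuda-python is missing."""
    if eplb_config is not None and eplb_config.enable_eplb:
        try:
            import cuda  # noqa: F401  # top-level module provided by the cuda-python package
        except ImportError:
            raise ImportError(
                "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
                "  CUDA 12.x → pip install cuda-python==12.*\n"
                "  CUDA 11.x → pip install cuda-python==11.*"
            )
```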
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| requirements.txt | Removed the pinned cuda-python==12.8 requirement, allowing users to install the version matching their CUDA toolkit themselves |
| fastdeploy/eplb/async_expert_loader.py | Moved `from cuda import cudart` from the top of the module into the `create_mmap` function, deferring the import (a simplified sketch follows this table) |
| fastdeploy/config.py | Added a cuda-python import check to `EPLBConfig.check()`; when eplb is enabled but cuda-python is not installed, a clear error message with installation guidance is raised |
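For illustration only, a simplified stand-in for `create_mmap` showing the deferred-import pattern the table describes; the parameter names, the host-pinning call, and the return value are assumptions, not the actual FastDeploy implementation:

```python
import ctypes
import mmap


def create_mmap(fd: int, size: int):
    """Simplified stand-in: map `size` bytes from `fd` and pin them for CUDA access."""
    # Deferred import: cuda-python is only needed when this code path runs,
    # so importing the module no longer fails on machines without it.
    from cuda import cudart

    buf = mmap.mmap(fd, size)
    shm_ptr = ctypes.cast(
        (ctypes.c_char * size).from_buffer(buf), ctypes.POINTER(ctypes.c_int8)
    )
    addr = ctypes.addressof(shm_ptr.contents)
    # Pin the mapped pages so the GPU can access them; flags=0 is cudaHostRegisterDefault.
    (err,) = cudart.cudaHostRegister(addr, size, 0)
    return buf, addr, err
```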
fastdeploy/config.py
```python
if self.eplb_config is not None and self.eplb_config.enable_eplb:
    try:
        import cuda  # noqa
    except Exception:
```
Catching all exceptions (`Exception`) is too broad and can mask real errors. Catch only `ImportError` instead:
```python
try:
    import cuda  # noqa
except ImportError:
    raise ImportError(
        "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
        "  CUDA 12.x → pip install cuda-python==12.*\n"
        "  CUDA 11.x → pip install cuda-python==11.*"
    )
```

This avoids swallowing exceptions unrelated to the import (such as memory errors or keyboard interrupts).
Suggested change:

```diff
-            except Exception:
+            except ImportError:
```
| "cuda-python not installed. Install the version matching your CUDA toolkit:\n" | ||
| " CUDA 12.x → pip install cuda-python==12.*\n" |
The error message lacks guidance on how the user can determine their CUDA version. Add a command for checking it:
```python
raise ImportError(
    "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
    "  Check your CUDA version: nvcc --version or nvidia-smi\n"
    "  CUDA 12.x → pip install cuda-python==12.*\n"
    "  CUDA 11.x → pip install cuda-python==11.*"
)
```

Suggested change:

```diff
-                "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
-                "  CUDA 12.x → pip install cuda-python==12.*\n"
+                "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
+                "  Check your CUDA version: nvcc --version or nvidia-smi\n"
+                "  CUDA 12.x → pip install cuda-python==12.*\n"
+                "  CUDA 11.x → pip install cuda-python==11.*"
```
fastdeploy/eplb/async_expert_loader.py

```python
shm_ptr = ctypes.cast(shm_ptr, ctypes.POINTER(ctypes.c_int8))
addr = ctypes.addressof(shm_ptr.contents)

from cuda import cudart
```
Moving `from cuda import cudart` into the function breaks the existing unit tests. The test file tests/eplb/test_async_expert_loader.py (line 55) patches `fastdeploy.eplb.async_expert_loader.cudart`, but with `cudart` now imported locally inside the function, it can no longer be patched at module level.
There are two possible solutions:
Option 1 (recommended): keep the import at module level, but make it conditional:
```python
# At the top of the file
try:
    from cuda import cudart
except ImportError:
    cudart = None

# Inside the create_mmap function
if cudart is None:
    raise ImportError(
        "cuda-python not installed. Install the version matching your CUDA toolkit:\n"
        "  CUDA 12.x → pip install cuda-python==12.*\n"
        "  CUDA 11.x → pip install cuda-python==11.*"
    )
```

Option 2: change the patch target in the test file to `patch("cuda.cudart")`, mocking the cuda-python package itself; however, this makes the test depend on cuda-python being installed.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             develop    #5261   +/-   ##
==========================================
  Coverage           ?   60.54%
==========================================
  Files              ?      317
  Lines              ?    38812
  Branches           ?     5849
==========================================
  Hits               ?    23500
  Misses             ?    13456
  Partials           ?     1856
```

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Motivation
To support multiple CUDA versions, the pinned cuda-python version is removed; when using eplb, users install the cuda-python version matching their CUDA toolkit themselves.
Modifications
Usage or Command
Accuracy Tests
Checklist
- PR title tag, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.