[None][infra] enable CUDA line info by default for Debug/RelWithDebInfo#13334
[None][infra] enable CUDA line info by default for Debug/RelWithDebInfo#13334bobboli merged 2 commits intoNVIDIA:mainfrom
Conversation
Emit `--generate-line-info` for both Debug and RelWithDebInfo builds so nsys/ncu can map samples back to source out of the box, and additionally emit `-G` for Debug so cuda-gdb can step through kernels. Because these flags inflate section sizes enough that linking against every supported CUDA architecture overflows ELF section limits, `build_wheel.py` now rejects Debug/RelWithDebInfo builds that don't pass an explicit `--cuda_architectures`. Release builds are unaffected. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughTwo files modified to enhance CUDA build configuration and validation. CMakeLists.txt activates CUDA debug line-info and device debug flags for Release-with-DebInfo and Debug builds. build_wheel.py adds a validation guard requiring explicit CUDA architectures for Debug/RelWithDebInfo builds instead of defaulting to "all". Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/build_wheel.py`:
- Around line 553-563: The guard currently checks "cuda_architectures is None"
which lets an empty string (e.g. --cuda_architectures "") slip through; update
the check in the build_type/ cuda_architectures guard so it rejects empty
strings as well (e.g. use a falsy check like "if build_type in
('Debug','RelWithDebInfo') and not cuda_architectures") to raise the
RuntimeError when cuda_architectures is missing or empty; adjust any related
message or tests that assume a None-only check accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 25e7f029-f32e-4944-b1fc-86222453df74
📒 Files selected for processing (2)
cpp/CMakeLists.txtscripts/build_wheel.py
|
PR_Github #44955 [ run ] triggered by Bot. Commit: |
Tighten the guard so an explicit `--cuda_architectures ""` is also rejected. Without this, an empty string bypassed the `is None` check and silently fell back to `'all'` via the `or 'all'` default below, defeating the purpose of the guard. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #45234 [ run ] triggered by Bot. Commit: |
|
PR_Github #45234 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #45338 [ run ] triggered by Bot. Commit: |
|
PR_Github #45338 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #45386 [ run ] triggered by Bot. Commit: |
|
PR_Github #45386 [ run ] completed with state |
juney-nvidia
left a comment
There was a problem hiding this comment.
Approved from OSS compliance perspective.
Summary
--generate-line-infoby default for bothDebugandRelWithDebInfoCUDA flags, sonsys/ncucan map samples back to source without needing to hand-tweak CMake.-Gby default forDebugsocuda-gdbcan step through kernels.scripts/build_wheel.pynow rejectsDebug/RelWithDebInfobuilds that don't pass an explicit--cuda_architectures.Releasebuilds are unaffected.Motivation
Previously the project shipped with
--generate-line-infoand-Gcommented out incpp/CMakeLists.txtbecause enabling them caused link-time failures when building for all CUDA archs. In practice the only builds that compileRelWithDebInfo/Debugare developers working on specific GPUs — they don't need every arch, and they do need line info/device debug info. Trading a default-all-archs developer build (which almost no one actually uses with debug info) for usable debug/profile builds is a clear win.Test plan
python scripts/build_wheel.py -b Releasestill succeeds with defaultcuda_architectures=all.python scripts/build_wheel.py -b RelWithDebInfonow fails fast with an informative error telling the developer to pass--cuda_architectures.python scripts/build_wheel.py -b RelWithDebInfo --cuda_architectures 90-realsucceeds and produces alibtensorrt_llm.sowith CUDA line info resolvable bynsys/ncu.python scripts/build_wheel.py -b Debug --cuda_architectures 90-realsucceeds and produces a binary debuggable withcuda-gdb.Summary by CodeRabbit