
[None][infra] enable CUDA line info by default for Debug/RelWithDebInfo #13334

Merged
bobboli merged 2 commits into NVIDIA:main from bobboli:lbo/enable-cuda-line-info
Apr 28, 2026

@bobboli bobboli commented Apr 22, 2026

Summary

  • Emit --generate-line-info by default for both Debug and RelWithDebInfo CUDA flags, so nsys/ncu can map samples back to source without needing to hand-tweak CMake.
  • Emit -G by default for Debug so cuda-gdb can step through kernels.
  • Because these flags inflate device-side section sizes enough that linking against every supported CUDA architecture overflows ELF section limits, scripts/build_wheel.py now rejects Debug/RelWithDebInfo builds that don't pass an explicit --cuda_architectures. Release builds are unaffected.
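As a rough illustration of the flag policy in the bullets above, the per-build-type additions can be modelled in a few lines of Python. This is not the project's code: the real logic lives in CMake (cpp/CMakeLists.txt), and the names `EXTRA_CUDA_FLAGS` and `extra_cuda_flags` are hypothetical.

```python
# Illustrative model only; the actual flags are appended in cpp/CMakeLists.txt.
# EXTRA_CUDA_FLAGS and extra_cuda_flags are hypothetical names for this sketch.
EXTRA_CUDA_FLAGS = {
    "Release": [],                             # unchanged by this PR
    "RelWithDebInfo": ["--generate-line-info"],  # source mapping for nsys/ncu
    "Debug": ["--generate-line-info", "-G"],     # plus device debug for cuda-gdb
}


def extra_cuda_flags(build_type: str) -> list[str]:
    """Return the extra CUDA flags described in the summary for a build type."""
    # Unknown build types get no extra flags in this sketch.
    return list(EXTRA_CUDA_FLAGS.get(build_type, []))


if __name__ == "__main__":
    for build_type in ("Release", "RelWithDebInfo", "Debug"):
        print(build_type, extra_cuda_flags(build_type))
```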

Motivation

Previously the project shipped with --generate-line-info and -G commented out in cpp/CMakeLists.txt because enabling them caused link-time failures when building for all CUDA archs. In practice, the only people who build RelWithDebInfo/Debug are developers working on specific GPUs: they don't need every arch, but they do need line info and device debug info. Giving up the default all-archs developer build (which almost no one actually uses with debug info) in exchange for usable debug/profile builds is a clear win.

Test plan

  • python scripts/build_wheel.py -b Release still succeeds with default cuda_architectures=all.
  • python scripts/build_wheel.py -b RelWithDebInfo now fails fast with an informative error telling the developer to pass --cuda_architectures.
  • python scripts/build_wheel.py -b RelWithDebInfo --cuda_architectures 90-real succeeds and produces a libtensorrt_llm.so with CUDA line info resolvable by nsys/ncu.
  • python scripts/build_wheel.py -b Debug --cuda_architectures 90-real succeeds and produces a binary debuggable with cuda-gdb.
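The command-line behaviour exercised by this test plan can be sketched as a small standalone parser. This is a minimal sketch under assumptions, not the code in scripts/build_wheel.py: the function name `parse_and_validate`, the long option `--build_type`, the list of choices, and the error text are all hypothetical. Only `-b`, `--cuda_architectures`, the `all` default, and the `RuntimeError` come from the PR.

```python
import argparse

# Build types that now carry CUDA line info / device debug info.
DEBUG_INFO_BUILD_TYPES = ("Debug", "RelWithDebInfo")


def parse_and_validate(argv):
    """Parse a build_wheel-like command line and apply the architecture guard.

    Hypothetical helper for illustration; the real guard sits in main().
    """
    parser = argparse.ArgumentParser()
    # "--build_type" and the choices list are assumptions for this sketch.
    parser.add_argument("-b", "--build_type", default="Release",
                        choices=["Release", "RelWithDebInfo", "Debug"])
    parser.add_argument("--cuda_architectures", default=None)
    args = parser.parse_args(argv)

    # Fail fast: debug-info builds must name their target architectures.
    if args.build_type in DEBUG_INFO_BUILD_TYPES and not args.cuda_architectures:
        raise RuntimeError(
            f"{args.build_type} builds require an explicit --cuda_architectures "
            "(for example 90-real)")

    # Release keeps the old behaviour of defaulting to every architecture.
    return args.build_type, args.cuda_architectures or "all"


if __name__ == "__main__":
    print(parse_and_validate(["-b", "Release"]))
    print(parse_and_validate(["-b", "Debug", "--cuda_architectures", "90-real"]))
```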

Summary by CodeRabbit

  • Chores
    • Updated CUDA compilation flags for Debug and RelWithDebInfo builds to include source line mappings and device debug information.
    • Build process now requires explicit CUDA architecture specification when using Debug or RelWithDebInfo configurations instead of using defaults.

Emit `--generate-line-info` for both Debug and RelWithDebInfo builds so
nsys/ncu can map samples back to source out of the box, and additionally
emit `-G` for Debug so cuda-gdb can step through kernels.

Because these flags inflate section sizes enough that linking against
every supported CUDA architecture overflows ELF section limits,
`build_wheel.py` now rejects Debug/RelWithDebInfo builds that don't pass
an explicit `--cuda_architectures`. Release builds are unaffected.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli bobboli requested a review from a team as a code owner April 22, 2026 10:01

bobboli commented Apr 22, 2026

/bot run --disable-fail-fast

@bobboli bobboli enabled auto-merge (squash) April 22, 2026 10:03

coderabbitai Bot commented Apr 22, 2026

Walkthrough

Two files are modified to enhance CUDA build configuration and validation. CMakeLists.txt activates the CUDA line-info flag for RelWithDebInfo and Debug builds, plus the device debug flag for Debug builds. build_wheel.py adds a validation guard requiring explicit CUDA architectures for Debug/RelWithDebInfo builds instead of defaulting to "all".

Changes

| Cohort / File(s) | Summary |
|---|---|
| CUDA Build Flags: cpp/CMakeLists.txt | Activated CMake logic to append --generate-line-info to CMAKE_CUDA_FLAGS_RELWITHDEBINFO and both --generate-line-info and -G to CMAKE_CUDA_FLAGS_DEBUG. Removed previously commented-out disabled flag hints and added cmake-format directives. |
| Build Validation: scripts/build_wheel.py | Added guard in main() that raises RuntimeError when build_type is Debug or RelWithDebInfo and cuda_architectures is not explicitly provided, preventing automatic defaulting to "all". |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: enabling CUDA line info by default for Debug/RelWithDebInfo builds. |
| Description check | ✅ Passed | The description comprehensively covers the summary, motivation, and test plan sections required by the template, clearly explaining what changed and why. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build_wheel.py`:
- Around line 553-563: The guard currently checks "cuda_architectures is None"
which lets an empty string (e.g. --cuda_architectures "") slip through; update
the check in the build_type/ cuda_architectures guard so it rejects empty
strings as well (e.g. use a falsy check like "if build_type in
('Debug','RelWithDebInfo') and not cuda_architectures") to raise the
RuntimeError when cuda_architectures is missing or empty; adjust any related
message or tests that assume a None-only check accordingly.


📥 Commits

Reviewing files that changed from the base of the PR and between 36fb5f0 and 0c5e30d.

📒 Files selected for processing (2)
  • cpp/CMakeLists.txt
  • scripts/build_wheel.py

@tensorrt-cicd

PR_Github #44955 [ run ] triggered by Bot. Commit: 0c5e30d Link to invocation

Tighten the guard so an explicit `--cuda_architectures ""` is also
rejected. Without this, an empty string bypassed the `is None` check
and silently fell back to `'all'` via the `or 'all'` default below,
defeating the purpose of the guard.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
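The difference between the two guards described in this commit message can be reproduced in isolation. The sketch below is an assumption-laden reconstruction, not the code from scripts/build_wheel.py: the function names `old_guard_rejects`, `new_guard_rejects`, and `resolve_architectures` are hypothetical, and only the `is None` check, the falsy check, and the `or 'all'` fallback are taken from the discussion.

```python
DEBUG_INFO_BUILD_TYPES = ("Debug", "RelWithDebInfo")


def old_guard_rejects(build_type, cuda_architectures):
    """First version of the guard: only a missing (None) value is rejected."""
    return build_type in DEBUG_INFO_BUILD_TYPES and cuda_architectures is None


def new_guard_rejects(build_type, cuda_architectures):
    """Tightened guard: None and the empty string are both rejected."""
    return build_type in DEBUG_INFO_BUILD_TYPES and not cuda_architectures


def resolve_architectures(cuda_architectures):
    """The fallback applied after the guard: any falsy value becomes 'all'."""
    return cuda_architectures or "all"


if __name__ == "__main__":
    empty = ""  # what --cuda_architectures "" produces
    # The old guard lets the empty string through, and the fallback then
    # turns it into 'all', which is exactly the build the guard should block.
    print(old_guard_rejects("Debug", empty), resolve_architectures(empty))
    # The tightened guard rejects it before the fallback is reached.
    print(new_guard_rejects("Debug", empty))
```

Because `'' or 'all'` evaluates to `'all'` in Python, a check that only tests for `None` leaves a gap between the guard and the fallback; using the same truthiness test in both places closes it.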

bobboli commented Apr 23, 2026

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #45234 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

@tensorrt-cicd

PR_Github #45234 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35495 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation


bobboli commented Apr 24, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd

PR_Github #45338 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

@tensorrt-cicd

PR_Github #45338 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35587 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation


bobboli commented Apr 24, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd

PR_Github #45386 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

@tensorrt-cicd

PR_Github #45386 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35627 completed with status: 'SUCCESS'

CI Report

Link to invocation


@juney-nvidia juney-nvidia left a comment


Approved from OSS compliance perspective.

@bobboli bobboli merged commit f3270f9 into NVIDIA:main Apr 28, 2026
4 of 5 checks passed