[None][infra] enable CUDA line info by default for Debug/RelWithDebInfo by bobboli · Pull Request #13334 · NVIDIA/TensorRT-LLM

bobboli · 2026-04-22T10:01:42Z

Summary

- Emit --generate-line-info by default for both Debug and RelWithDebInfo CUDA flags, so nsys/ncu can map samples back to source without needing to hand-tweak CMake.
- Emit -G by default for Debug so cuda-gdb can step through kernels.
- Because these flags inflate device-side section sizes enough that linking against every supported CUDA architecture overflows ELF section limits, scripts/build_wheel.py now rejects Debug/RelWithDebInfo builds that don't pass an explicit --cuda_architectures. Release builds are unaffected.

Motivation

Previously the project shipped with --generate-line-info and -G commented out in cpp/CMakeLists.txt because enabling them caused link-time failures when building for all CUDA archs. In practice, the only people who build RelWithDebInfo/Debug are developers working on specific GPUs: they don't need every arch, but they do need line info and device debug info. Trading a default-all-archs developer build (which almost no one actually uses with debug info) for usable debug/profile builds is a clear win.

Test plan

- `python scripts/build_wheel.py -b Release` still succeeds with default `cuda_architectures=all`.
- `python scripts/build_wheel.py -b RelWithDebInfo` now fails fast with an informative error telling the developer to pass `--cuda_architectures`.
- `python scripts/build_wheel.py -b RelWithDebInfo --cuda_architectures 90-real` succeeds and produces a `libtensorrt_llm.so` with CUDA line info resolvable by nsys/ncu.
- `python scripts/build_wheel.py -b Debug --cuda_architectures 90-real` succeeds and produces a binary debuggable with cuda-gdb.
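
The accept/reject outcomes in the test plan can be sketched as a small standalone check. This is a minimal illustration, not the code in `scripts/build_wheel.py`: the names `check_cuda_architectures` and `is_accepted` and the error text are assumptions made for the example. Only the rule itself (Debug/RelWithDebInfo need an explicit architecture list, Release does not) comes from this PR.

```python
from typing import Optional

# Build types that now carry CUDA line info / device debug info.
DEBUG_INFO_BUILD_TYPES = ("Debug", "RelWithDebInfo")


def check_cuda_architectures(build_type: str,
                             cuda_architectures: Optional[str]) -> None:
    """Hypothetical stand-in for the guard added to build_wheel.py's main().

    Raises RuntimeError when a debug-info build does not name its
    CUDA architectures explicitly (missing or empty value).
    """
    if build_type in DEBUG_INFO_BUILD_TYPES and not cuda_architectures:
        # The wording of this message is illustrative only.
        raise RuntimeError(
            f"{build_type} builds require an explicit --cuda_architectures "
            "(for example 90-real).")


def is_accepted(build_type: str, cuda_architectures: Optional[str]) -> bool:
    """Return True if the guard lets this combination through."""
    try:
        check_cuda_architectures(build_type, cuda_architectures)
    except RuntimeError:
        return False
    return True
```

With this sketch, `is_accepted("Release", None)` and `is_accepted("Debug", "90-real")` are accepted, while `is_accepted("RelWithDebInfo", None)` is rejected, matching the cases listed above.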

Summary by CodeRabbit

Chores
- Updated CUDA compilation flags for Debug and RelWithDebInfo builds to include source line mappings and device debug information.
- Build process now requires explicit CUDA architecture specification when using Debug or RelWithDebInfo configurations instead of using defaults.

Emit `--generate-line-info` for both Debug and RelWithDebInfo builds so nsys/ncu can map samples back to source out of the box, and additionally emit `-G` for Debug so cuda-gdb can step through kernels. Because these flags inflate section sizes enough that linking against every supported CUDA architecture overflows ELF section limits, `build_wheel.py` now rejects Debug/RelWithDebInfo builds that don't pass an explicit `--cuda_architectures`. Release builds are unaffected.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli · 2026-04-22T10:02:45Z

/bot run --disable-fail-fast

coderabbitai · 2026-04-22T10:05:54Z

📝 Walkthrough

Walkthrough

Two files were modified to tighten CUDA build configuration and validation. `cpp/CMakeLists.txt` now enables the CUDA line-info flag for RelWithDebInfo builds, and both the line-info and device-debug flags for Debug builds. `scripts/build_wheel.py` adds a validation guard that requires explicit CUDA architectures for Debug/RelWithDebInfo builds instead of defaulting to "all".

Changes

| Cohort / File(s) | Summary |
|---|---|
| CUDA Build Flags `cpp/CMakeLists.txt` | Activated CMake logic to append `--generate-line-info` to `CMAKE_CUDA_FLAGS_RELWITHDEBINFO` and both `--generate-line-info` and `-G` to `CMAKE_CUDA_FLAGS_DEBUG`. Removed previously commented-out disabled flag hints and added cmake-format directives. |
| Build Validation `scripts/build_wheel.py` | Added guard in `main()` that raises `RuntimeError` when `build_type` is `Debug` or `RelWithDebInfo` and `cuda_architectures` is not explicitly provided, preventing automatic defaulting to `"all"`. |
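
For readers who want the flag mapping at a glance, here is a small Python model of which extra CUDA flags each build type ends up with after this change. It is only an illustration: the real logic lives in `cpp/CMakeLists.txt` and appends to `CMAKE_CUDA_FLAGS_RELWITHDEBINFO` and `CMAKE_CUDA_FLAGS_DEBUG`. The dictionary and the helper name `extra_cuda_flags` are assumptions made for the example.

```python
# Illustrative model only; not the CMake code from this PR.
# Maps each CMake build type to the extra CUDA flags this PR enables for it.
EXTRA_CUDA_FLAGS = {
    "Release": [],                             # unaffected by this PR
    "RelWithDebInfo": ["--generate-line-info"],  # source mapping for nsys/ncu
    "Debug": ["--generate-line-info", "-G"],     # plus device debug for cuda-gdb
}


def extra_cuda_flags(build_type: str) -> str:
    """Return the extra flags for a build type as a space-separated string."""
    return " ".join(EXTRA_CUDA_FLAGS.get(build_type, []))
```

For example, `extra_cuda_flags("Debug")` yields `--generate-line-info -G`, while `extra_cuda_flags("Release")` yields an empty string.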

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: enabling CUDA line info by default for Debug/RelWithDebInfo builds. |
| Description check | ✅ Passed | The description comprehensively covers the summary, motivation, and test plan sections required by the template, clearly explaining what changed and why. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build_wheel.py`:
- Around line 553-563: The guard currently checks "cuda_architectures is None"
which lets an empty string (e.g. --cuda_architectures "") slip through; update
the check in the build_type/ cuda_architectures guard so it rejects empty
strings as well (e.g. use a falsy check like "if build_type in
('Debug','RelWithDebInfo') and not cuda_architectures") to raise the
RuntimeError when cuda_architectures is missing or empty; adjust any related
message or tests that assume a None-only check accordingly.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 25e7f029-f32e-4944-b1fc-86222453df74

📥 Commits

Reviewing files that changed from the base of the PR and between 36fb5f0 and 0c5e30d.

📒 Files selected for processing (2)

cpp/CMakeLists.txt
scripts/build_wheel.py

tensorrt-cicd · 2026-04-22T10:35:11Z

PR_Github #44955 [ run ] triggered by Bot. Commit: 0c5e30d Link to invocation

Tighten the guard so an explicit `--cuda_architectures ""` is also rejected. Without this, an empty string bypassed the `is None` check and silently fell back to `'all'` via the `or 'all'` default below, defeating the purpose of the guard.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
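
The bypass described in this commit can be reproduced in isolation. The sketch below is a hedged reconstruction, not the actual code in `scripts/build_wheel.py`: the function names and the error text are hypothetical, and only the `is None` check, the falsy check, and the `or 'all'` default are taken from the discussion above.

```python
from typing import Optional

DEBUG_INFO_BUILD_TYPES = ("Debug", "RelWithDebInfo")


def resolve_archs_none_check(build_type: str,
                             cuda_architectures: Optional[str]) -> str:
    """Original guard (hypothetical reconstruction): only rejects None."""
    if build_type in DEBUG_INFO_BUILD_TYPES and cuda_architectures is None:
        raise RuntimeError("pass an explicit --cuda_architectures")
    # An empty string is falsy, so it falls through to 'all' here.
    return cuda_architectures or "all"


def resolve_archs_falsy_check(build_type: str,
                              cuda_architectures: Optional[str]) -> str:
    """Tightened guard: rejects both None and the empty string."""
    if build_type in DEBUG_INFO_BUILD_TYPES and not cuda_architectures:
        raise RuntimeError("pass an explicit --cuda_architectures")
    # Release builds still default to 'all' when nothing is given.
    return cuda_architectures or "all"
```

With the original check, `resolve_archs_none_check("Debug", "")` returns `"all"`, which is exactly the all-architectures debug build the guard was meant to prevent. The tightened version raises `RuntimeError` for the same input, while Release builds keep the `"all"` default in both versions.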

bobboli · 2026-04-23T17:35:44Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-23T17:41:56Z

PR_Github #45234 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

tensorrt-cicd · 2026-04-23T22:26:14Z

PR_Github #45234 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35495 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

bobboli · 2026-04-24T05:47:04Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-24T05:52:43Z

PR_Github #45338 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

tensorrt-cicd · 2026-04-24T10:12:58Z

PR_Github #45338 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35587 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

bobboli · 2026-04-24T10:14:41Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-24T10:20:57Z

PR_Github #45386 [ run ] triggered by Bot. Commit: 8f25fe1 Link to invocation

tensorrt-cicd · 2026-04-24T13:13:36Z

PR_Github #45386 [ run ] completed with state SUCCESS. Commit: 8f25fe1
/LLM/main/L0_MergeRequest_PR pipeline #35627 completed with status: 'SUCCESS'

CI Report

Link to invocation

juney-nvidia

Approved from OSS compliance perspective.

bobboli requested a review from a team as a code owner April 22, 2026 10:01

github-actions Bot assigned bobboli Apr 22, 2026

bobboli enabled auto-merge (squash) April 22, 2026 10:03

yuxianq approved these changes Apr 22, 2026

coderabbitai Bot reviewed Apr 22, 2026

Comment thread scripts/build_wheel.py

juney-nvidia approved these changes Apr 28, 2026

bobboli merged commit f3270f9 into NVIDIA:main Apr 28, 2026
4 of 5 checks passed

bobboli mentioned this pull request Apr 29, 2026

[None][infra] disable -G in default Debug CUDA flags to fix CI OOM #13598

Merged

4 tasks

4 participants
