
[None][chore] Use affinity rather than CPU count for default build parallelism #12167

Merged
tburt-nv merged 3 commits into NVIDIA:main from achartier:affinity-build
Mar 17, 2026

Conversation

@achartier
Collaborator

@achartier achartier commented Mar 12, 2026

Summary by CodeRabbit

Release Notes

  • Improvements
    • The build now honors the CPU affinity of the current process when choosing the default number of parallel jobs, so it does not launch more jobs than it has CPU cores available.
    • Updated build command help text to reflect improved CPU detection behavior.

Description

On some clusters, the build's affinity can be limited to a subset of the CPU cores, with only a portion of the RAM available as well. Launching more processes than there are available CPU cores is wasteful and raises the risk of nvcc crashing with out-of-memory (OOM) errors.

Test Coverage

N/A

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@achartier
Collaborator Author

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Walkthrough

The change introduces CPU affinity awareness to build parallelism defaults by adding a new helper function that retrieves the number of CPUs available to the current process, honoring Linux CPU affinity constraints. This function replaces direct cpu_count() calls in the job count default handling and argument specification, with corresponding help text updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CPU Affinity Support (scripts/build_wheel.py) | Added get_available_cpu_count() function that respects CPU affinity on Linux before falling back to system CPU count. Updated main job_count default and --job_count argument default to use this new function, with corresponding help text refinements. |
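
The behavior summarized above could look roughly like the following. This is a minimal sketch, not the code merged in the PR: only the names get_available_cpu_count and --job_count come from the summary, while build_parser, the help string, and the choice of multiprocessing.cpu_count() as the fallback are assumptions made for illustration.

```python
import argparse
import multiprocessing
import os


def get_available_cpu_count() -> int:
    """Return the number of CPUs the current process is allowed to run on."""
    try:
        # os.sched_getaffinity exists on Linux; pid 0 means "this process".
        # The returned set shrinks when a scheduler or taskset pins the process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the system-wide
        # count (assumed fallback; the PR's exact call is not shown here).
        return multiprocessing.cpu_count()


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing how the default job count could be wired."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job_count",
        type=int,
        default=get_available_cpu_count(),
        help="Number of parallel build jobs "
        "(default: CPUs available to this process).",
    )
    return parser
```

On a node where the job is pinned to a subset of cores, the default in this sketch follows the size of that subset rather than the machine's total core count. An explicit --job_count value still overrides it.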

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description includes a clear motivation for the changes and addresses the core issue, but lacks a proper PR title with ticket/type format and omits specific test coverage details. | Add a properly formatted PR title following [type] convention (e.g., [chore]) and provide more specific details about test coverage or explain why N/A is appropriate for this infrastructure change. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: using CPU affinity instead of total CPU count for build parallelism defaults. |


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/build_wheel.py (1)

2-2: Update copyright year to 2026.

The file has been meaningfully modified. As per coding guidelines, the copyright header should reflect the year of the latest meaningful modification.

Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build_wheel.py`:
- Around line 42-52: The get_available_cpu_count function must handle
cpu_count() possibly returning None; update the except-path in
get_available_cpu_count to coerce the result to an int with a safe default
(e.g., 1) so the function always returns an int. Specifically, when catching
AttributeError and calling cpu_count(), treat a None result as 1 (or use max(1,
cpu_count() or 1)) and return that value. Ensure the change is made inside
get_available_cpu_count so downstream callers (e.g., parallel flags) never
receive None.

---

Nitpick comments:
In `@scripts/build_wheel.py`:
- Line 2: Update the SPDX header year from "2022-2025" to "2022-2026" in the
top-of-file copyright comment (the line starting with "# SPDX-FileCopyrightText:
Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.");
replace the year range so the header reflects the latest meaningful modification
year 2026.
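
The inline comment about get_available_cpu_count can be sketched as follows. This assumes the fallback is os.cpu_count(), which is documented to return None when the count cannot be determined; multiprocessing.cpu_count() raises NotImplementedError in that case instead, so the guard would look different there. The helper coerce_cpu_count is a hypothetical name introduced only to make the coercion testable.

```python
import os
from typing import Optional


def coerce_cpu_count(raw: Optional[int]) -> int:
    """Map a possibly missing CPU count to a positive int (hypothetical helper)."""
    # `raw or 1` turns None (and 0) into 1; max() guards against other
    # non-positive values so callers always get at least one job.
    return max(1, raw or 1)


def get_available_cpu_count() -> int:
    """Affinity-aware CPU count that never returns None."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Assumed fallback: os.cpu_count() may return None.
        return coerce_cpu_count(os.cpu_count())
```

With this shape, downstream code that builds a parallelism flag from the job count always receives an int of at least 1.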

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b146b208-88c3-4864-967d-8883e376164e

📥 Commits

Reviewing files that changed from the base of the PR and between f11eea7 and b64a634.

📒 Files selected for processing (1)
  • scripts/build_wheel.py

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@achartier
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #38762 [ run ] triggered by Bot. Commit: 9f44425 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38762 [ run ] completed with state SUCCESS. Commit: 9f44425
/LLM/main/L0_MergeRequest_PR pipeline #30076 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@achartier
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #38783 [ run ] triggered by Bot. Commit: 9f44425 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38783 [ run ] completed with state SUCCESS. Commit: 9f44425
/LLM/main/L0_MergeRequest_PR pipeline #30097 completed with status: 'SUCCESS'

CI Report

Link to invocation

@tburt-nv tburt-nv merged commit 0eab5b6 into NVIDIA:main Mar 17, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…rallelism (NVIDIA#12167)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

3 participants