Skip to content

[#12257][fix] Use the first non-None result returned by hf download workers#12259

Merged
Superjomn merged 2 commits intoNVIDIA:mainfrom
kev-bi:kbi/fix-12257
Apr 4, 2026
Merged

[#12257][fix] Use the first non-None result returned by hf download workers#12259
Superjomn merged 2 commits intoNVIDIA:mainfrom
kev-bi:kbi/fix-12257

Conversation

@kev-bi
Copy link
Copy Markdown
Contributor

@kev-bi kev-bi commented Mar 16, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced robustness of model directory selection in distributed model download operations.

Description

This PR addresses the issue reported in #12257.

There seems to be an edge case when running the TensorRT-LLM backend in dynamo where model_dirs[0] does not hold the rank-0 result. So rather than always use the model_dirs[0] result change the code to iterate through non-None values in model_dirs and use the first is encountered. Since only the rank-0 worker is still downloading the model and all other workers will return None there should be only one non-None value in model_dirs to use. Now we should handle both the assumed case where model_dirs[0] does hold the result and the edge case where some other entry in the list holds the result.

Test Coverage

The existing tests should cover, the model_dir is still fetched and assigned, it's just that the manner it is fetched is different. Let me know if it would be preferable to add a test where the edge case is triggered.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@kev-bi kev-bi requested a review from a team as a code owner March 16, 2026 23:47
@kev-bi kev-bi requested a review from hchings March 16, 2026 23:47
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

Walkthrough

Enhanced robustness in model directory selection by filtering out None values when distributing models across nodes, and simplified the accompanying docstring for clarity.

Changes

Cohort / File(s) Summary
Model directory selection robustness
tensorrt_llm/llmapi/llm_utils.py
Updated _download_hf_model_if_needed to select the first non-None model directory instead of always using the first element, preventing potential None value handling issues. Simplified docstring by removing rank-specific note.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the specific fix: using the first non-None result from hf download workers instead of assuming the first element. It's concise, references the issue #12257, and accurately reflects the main change.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description adequately explains the issue, solution, test coverage, and includes all required checklist items from the template.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can enforce grammar and style rules using `languagetool`.

Configure the reviews.tools.languagetool setting to enable/disable rules and categories. Refer to the LanguageTool Community to learn more.

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Mar 17, 2026
Copy link
Copy Markdown
Collaborator

@Superjomn Superjomn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contribution.

@Superjomn Superjomn enabled auto-merge (squash) March 20, 2026 05:37
@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39724 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39724 [ run ] completed with state SUCCESS. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30920 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39739 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39739 [ run ] completed with state SUCCESS. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30934 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39742 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39742 [ run ] completed with state FAILURE. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30937 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39745 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39745 [ run ] completed with state FAILURE. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30940 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39748 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39748 [ run ] completed with state FAILURE. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30943 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39752 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39752 [ run ] completed with state FAILURE. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30947 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39755 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39755 [ run ] completed with state FAILURE. Commit: 233d8f9
/LLM/main/L0_MergeRequest_PR pipeline #30950 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39758 [ run ] triggered by Bot. Commit: 233d8f9 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41656 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32561 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41660 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41660 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32565 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41662 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41662 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32567 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41669 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41669 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32574 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41676 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41676 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32580 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Superjomn
Copy link
Copy Markdown
Collaborator

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41682 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41682 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32584 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@karljang
Copy link
Copy Markdown
Collaborator

karljang commented Apr 3, 2026

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41716 [ run ] triggered by Bot. Commit: bc28949 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41716 [ run ] completed with state SUCCESS. Commit: bc28949
/LLM/main/L0_MergeRequest_PR pipeline #32618 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@karljang
Copy link
Copy Markdown
Collaborator

karljang commented Apr 3, 2026

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41720 [ run ] triggered by Bot. Commit: c1a440e Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41720 [ run ] completed with state SUCCESS. Commit: c1a440e
/LLM/main/L0_MergeRequest_PR pipeline #32622 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@karljang
Copy link
Copy Markdown
Collaborator

karljang commented Apr 3, 2026

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41756 [ run ] triggered by Bot. Commit: c1a440e Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41756 [ run ] completed with state SUCCESS. Commit: c1a440e
/LLM/main/L0_MergeRequest_PR pipeline #32656 completed with status: 'SUCCESS'

CI Report

Link to invocation

@Superjomn Superjomn merged commit 9a9bb80 into NVIDIA:main Apr 4, 2026
5 checks passed
@kev-bi kev-bi deleted the kbi/fix-12257 branch April 5, 2026 01:15
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
…load workers (NVIDIA#12259)

Signed-off-by: Kevin <kevinbi@modular.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…load workers (NVIDIA#12259)

Signed-off-by: Kevin <kevinbi@modular.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Community want to contribute PRs initiated from Community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants