Skip to content

[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image#1270

Open
Oseltamivir wants to merge 3 commits intomainfrom
dsv4-trt-mhc7168-image
Open

[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image#1270
Oseltamivir wants to merge 3 commits intomainfrom
dsv4-trt-mhc7168-image

Conversation

@Oseltamivir
Copy link
Copy Markdown
Collaborator

@Oseltamivir Oseltamivir commented May 3, 2026

blockers on review/merge NVIDIA/TensorRT-LLM#13710

Summary

  • Update dsv4-fp4-b300-trt to the TensorRT-LLM image with the fused MHC hidden-size 7168 fix.
  • Re-enable TRTLLM fused MHC by default for the B300 DeepSeek-V4-Pro TRT recipe.

TensorRT-LLM fork

Validation

  • bash -n benchmarks/single_node/dsv4_fp4_b300_trt.sh
  • YAML load for .github/configs/nvidia-master.yaml and perf-changelog.yaml
  • python3 utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-trt --config-files .github/configs/nvidia-master.yaml

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml Outdated
description:
- "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e"
- "Enable TRTLLM fused MHC by default now that the image includes the hidden-size 7168 fused-HC fix"
pr-link: TBD
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog.yaml entry uses 'pr-link: TBD' instead of a real URL. Every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL, and the PR number (1270) is already known — please replace TBD with #1270 before merge. Validation accepts any string so CI passes, but the placeholder degrades traceability for downstream tooling.

Extended reasoning...

What this is: The new perf-changelog.yaml entry added at line 2157 has pr-link: TBD as a placeholder rather than a real PR URL.

Why it stands out: A grep confirms this is the only pr-link: TBD in the entire perf-changelog.yaml — every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL. The PR number is already known (#1270), so this is a straightforward fill-in. AGENTS.md documents the convention of using a placeholder while the PR is being prepared and replacing it with the real URL once the PR exists.

Why it does not break CI: utils/matrix_logic/validation.py:483 declares pr_link: str = Field(alias="pr-link") — a plain string with no URL pattern validation. process_changelog.py imports ChangelogEntry but never references the pr_link field itself for sweep generation, so nothing breaks at runtime. That is why this is a nit (traceability/process oversight) rather than a normal bug.

Impact: Downstream tooling and humans reading perf-changelog.yaml lose the link back to the PR that introduced this image bump. Anyone investigating a regression on dsv4-fp4-b300-trt to the fix-mhc7168-eb20e9e image (or trying to understand why fused MHC was re-enabled by default) cannot click through to the discussion and diff.

Step-by-step proof:

  1. Open perf-changelog.yaml line 2157: the new entry's last line is pr-link: TBD.
  2. Search the file for other entries — all use pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/N (e.g. line 2151 immediately above is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233).
  3. Confirm the PR number: this PR is [DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image #1270 (per the PR metadata).
  4. Confirm CI won't fail: utils/matrix_logic/validation.py:483 defines pr_link as a plain str, with no HttpUrl or regex constraint.

Fix: Replace pr-link: TBD on line 2157 with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1270 before merge.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

@functionstackx functionstackx changed the title Update DSV4 TRT fused MHC image [DNM - test PR] Update DSV4 TRT fused MHC image May 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

@functionstackx functionstackx changed the title [DNM - test PR] Update DSV4 TRT fused MHC image [DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image May 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant