
[https://nvbugs/5808603][fix] Add bias support to WeightOnlyQuantLinearMethod#12317

Merged
stnie merged 1 commit into NVIDIA:main from stnie:bugfix/5808603
Mar 19, 2026

Conversation

@stnie (Collaborator) commented Mar 18, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed bias handling in linear layer operations to ensure bias is correctly applied to outputs across all execution paths.
  • Tests

    • Enhanced test coverage for quantized linear operations with parameterized bias configuration validation.

Description

WeightOnlyQuantLinearMethod did not apply bias, which is used, for example, by Qwen2.5 models.

This PR:

  • Modified WeightOnlyQuantLinearMethod to include bias in the output calculation.
  • Updated test_weight_only_quant_linear to parameterize bias and validate its effect on output.

Test Coverage

  • Updated test_weight_only_quant_linear to parameterize bias and validate its effect on output.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…arMethod and update corresponding tests

- Modified WeightOnlyQuantLinearMethod to include bias in the output calculation.
- Updated test_weight_only_quant_linear to parameterize bias and validate its effect on output.

Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
@stnie (Collaborator, Author) commented Mar 18, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39446 [ run ] triggered by Bot. Commit: 5763719 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39446 [ run ] completed with state SUCCESS. Commit: 5763719
/LLM/main/L0_MergeRequest_PR pipeline #30673 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@stnie stnie marked this pull request as ready for review March 18, 2026 14:40
@stnie stnie requested a review from a team as a code owner March 18, 2026 14:40
@stnie stnie requested a review from yuxianq March 18, 2026 14:40
@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds bias support to the Linear module's apply method by conditionally adding a bias tensor to the output. Extends the weight-only quantized linear test with parameterized bias cases, including bias tensor creation, loading, and reference output adjustment.

Changes

  • Linear Module Bias Support — tensorrt_llm/_torch/modules/linear.py
    Adds three lines to conditionally apply bias in the Linear.apply path: checks whether a bias tensor is provided and adds it to the computed output.
  • Quantized Linear Test Extension — tests/unittest/_torch/thop/parallel/test_weight_only_quant_linear.py
    Introduces bias as a pytest parameterized test parameter (True/False). Creates a bias tensor when enabled, passes it to the Linear constructor and the load_weights payload, and conditionally adjusts the reference output computation to include the bias term.
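The parameterized test described above might look roughly like the following. This is a simplified sketch that checks only the reference math, not the actual quantized kernel; the test name, shapes, and use of `torch.nn.functional.linear` are illustrative assumptions:

```python
import pytest
import torch

@pytest.mark.parametrize("bias", [True, False])
def test_weight_only_quant_linear_bias(bias: bool) -> None:
    # Simplified sketch of the updated test: build an optional bias
    # tensor, run the linear op, and adjust the reference output so it
    # includes the bias term only when the parameter is enabled.
    x = torch.randn(2, 8)
    w = torch.randn(16, 8)
    b = torch.randn(16) if bias else None
    out = torch.nn.functional.linear(x, w, b)
    ref = x @ w.t()
    if bias:
        ref = ref + b
    torch.testing.assert_close(out, ref)
```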

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title clearly and specifically describes the main change: adding bias support to WeightOnlyQuantLinearMethod, which aligns with the code changes in the PR.
  • Description check ✅ Passed — The PR description includes a clear problem statement, explains the changes made, and lists test coverage, following the required template structure.



Comment @coderabbitai help to get the list of available commands and usage tips.


@stnie stnie merged commit eb5b602 into NVIDIA:main Mar 19, 2026
9 of 10 checks passed