Add required keys to attention pruning config #1360
Conversation
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
📝 Walkthrough

A configuration file for LLaMA pruning is updated to explicitly specify the importance hook class, add KV-head pruning settings via a pruning mixin, define the LLaMA KV-head layer descriptor, and designate the attention output projection target layer for pruning operations.
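To make the walkthrough concrete, here is a minimal sketch of what such an attention-pruning config could look like. Every key name and value below (`importance_hook_cls`, `mixin`, `kv_head_layer_descriptor`, `target_layer`, and the class path) is an illustrative assumption, not the actual contents of `attn_pruning.yaml`.

```yaml
# Minimal sketch only -- all key names and values here are assumptions
# for illustration; the real attn_pruning.yaml may use different names.
pruning:
  # Importance hook class used to score attention heads (hypothetical class path).
  importance_hook_cls: puzzletron.pruning.hooks.AttentionImportanceHook
  # KV-head pruning settings contributed via a pruning mixin (hypothetical keys).
  mixin:
    prune_kv_heads: true
    min_kv_heads: 1
  # LLaMA KV-head layer descriptor (hypothetical identifier).
  kv_head_layer_descriptor: llama_kv_heads
  # Attention output projection designated as the pruning target layer.
  target_layer: self_attn.o_proj
```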
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1360      +/-   ##
==========================================
+ Coverage   76.93%   77.48%   +0.54%
==========================================
  Files         471      471
  Lines       50401    50401
==========================================
+ Hits        38777    39052     +275
+ Misses      11624    11349     -275
/ok to test d8c2f8c
Cherry-picked PRs (#1385): #1352, #1351, #1330, #1354, #1355, #1360, #1342, #1324, #1340, #1368, #1373, #1359, #1361, #1325, #1369, #1370, #1371, #1375, #1386, #1353, #1356, #1390.
What does this PR do?
Type of change: Bug fix
The config `examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/attn_pruning.yaml` didn't have the required keys to use attention pruning in the example `examples/puzzletron/main.py`.

Usage
Testing
In `examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`, change `ffn_pruning` to `attn_pruning`, as in the sketch below.
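For reference, the edit might look like the following sketch; the surrounding structure of `Llama-3_1-8B.yaml` is assumed here, and only the `ffn_pruning` to `attn_pruning` substitution comes from the description above.

```yaml
# Sketch only -- the key layout is an assumption; the actual Llama-3_1-8B.yaml
# may reference its pruning config differently.
pruning:
  # config: pruning/ffn_pruning.yaml   # before
  config: pruning/attn_pruning.yaml    # after
```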
Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoid hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Additional Information