
Fix compute_hidden_states_hf.py: handle BatchEncoding from apply_chat_template #1225

Merged
yeyu-nvidia merged 4 commits into main from yeyu/fix-compute-hidden-states-transformers-compat on Apr 9, 2026

Conversation

@yeyu-nvidia (Contributor) commented Apr 9, 2026

Summary

  • apply_chat_template(..., return_tensors="pt") returns a BatchEncoding in transformers 4.46+, which no longer subclasses dict
  • The old guard isinstance(tokenized, dict) evaluates to False for BatchEncoding, so input_ids was set to the whole BatchEncoding object
  • Calling .shape[1] on a BatchEncoding triggers __getattr__("shape") → AttributeError
  • Fix: check isinstance(tokenized, torch.Tensor) instead, which correctly handles both old transformers (plain tensor) and new transformers (BatchEncoding)

This is causing test_collect_hidden_states to fail in the speculative decoding CI for all open PRs (#1207, #1210, #1221).
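The guard change described above can be sketched as follows. This is a minimal, hedged reconstruction from the PR description, not the script's actual code: `extract_input_ids`, `tokenizer`, and `messages` are hypothetical stand-ins.

```python
import torch

# Sketch of the fix described in the summary (names are assumptions).
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    # The old guard was `isinstance(tokenized, dict)`, which is False for
    # BatchEncoding on transformers 4.46+, so the whole BatchEncoding
    # leaked through and `.shape[1]` later raised AttributeError.
    if isinstance(tokenized, torch.Tensor):
        return tokenized               # older transformers: raw tensor
    return tokenized["input_ids"]      # mapping-style (BatchEncoding)
```

Checking for `torch.Tensor` rather than `dict` works because both plain `dict` and `BatchEncoding` support `["input_ids"]` subscripting, so everything that is not a raw tensor falls through to the mapping path.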

Test plan

  • torch-pr (speculative_decoding, 26.01) CI passes
  • Verify fix handles both torch.Tensor return (old transformers) and BatchEncoding return (new transformers 4.46+)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Streamlined tokenization in speculative decoding examples to enforce a single, consistent token output format. This reduces runtime variability, prevents token-processing errors during example runs, and improves stability across Transformer library versions.

…emplate

In transformers 4.46+, apply_chat_template() with return_tensors="pt"
returns a BatchEncoding object that no longer subclasses dict. The
previous isinstance(tokenized, dict) guard evaluated to False and fell
through to tokenized (the BatchEncoding), causing input_ids.shape[1] to
call BatchEncoding.__getattr__("shape") and raise AttributeError.

Fix by checking isinstance(tokenized, torch.Tensor) instead, which
correctly handles both old transformers (plain tensor return) and new
transformers (BatchEncoding return).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@yeyu-nvidia yeyu-nvidia requested a review from a team as a code owner April 9, 2026 17:50
@yeyu-nvidia yeyu-nvidia requested a review from h-guo18 April 9, 2026 17:50

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f80439d1-bf42-4862-ac93-89e10705c517

📥 Commits

Reviewing files that changed from the base of the PR and between f762336 and 4be7f6f.

📒 Files selected for processing (1)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py
✅ Files skipped from review due to trivial changes (1)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py

📝 Walkthrough

Walkthrough

Tokenization in a speculative decoding example was simplified: submit_generates() now requests a mapping-style return from tokenizer.apply_chat_template(..., return_dict=True) and directly reads ["input_ids"], removing the previous runtime branch that handled both tensor and mapping return types.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Tokenization Handling**<br>`examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py` | Removed conditional branching on `tokenizer.apply_chat_template()` return type; `input_ids` are now extracted unconditionally from the mapping-style return (`...["input_ids"]`) by using `return_dict=True`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: handling BatchEncoding returns from apply_chat_template in compute_hidden_states_hf.py, which is the core fix described in the changeset. |
| Security Anti-Patterns | ✅ Passed | Pull request introduces no security anti-patterns; trust_remote_code is properly exposed as a configurable argument defaulting to False. |


Comment on lines +212 to +215:

```python
# apply_chat_template return type varies by transformers version:
# - older versions return a plain torch.Tensor (input_ids directly)
# - newer versions (4.46+) return a BatchEncoding which no longer
#   subclasses dict, so isinstance(tokenized, dict) is False
```
Collaborator:
Is this comment correct? We use 4.56+ so the older version should not need to be supported

Contributor Author:
Good point, simplified to just ["input_ids"] directly since transformers 4.46+ is always assumed.

@yeyu-nvidia yeyu-nvidia enabled auto-merge (squash) April 9, 2026 17:52
Since transformers 4.46+ is required, apply_chat_template always
returns a BatchEncoding. Drop the torch.Tensor fallback and just
index ["input_ids"] directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
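Under the assumption stated in this commit (transformers 4.46+ everywhere), the runtime branch collapses to a direct subscript. A sketch with the same hypothetical helper name as above:

```python
# Simplified sketch: transformers 4.46+ assumed, so
# apply_chat_template(..., return_tensors="pt") is taken to return a
# mapping-style BatchEncoding that supports ["input_ids"] indexing.
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    return tokenized["input_ids"]
```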

github-actions bot commented Apr 9, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-09 19:28 UTC


codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.39%. Comparing base (14e9bea) to head (4be7f6f).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1225      +/-   ##
==========================================
+ Coverage   75.68%   77.39%   +1.70%     
==========================================
  Files         353      353              
  Lines       40491    40491              
==========================================
+ Hits        30644    31336     +692     
+ Misses       9847     9155     -692     
| Flag | Coverage Δ |
| --- | --- |
| examples | 44.49% <ø> (+2.76%) ⬆️ |
| unit | 55.25% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Without return_dict=True, apply_chat_template returns a raw torch.Tensor
on transformers <5.0 (default return_dict=False) and a BatchEncoding on
transformers >=5.0 (default changed to True). Subscripting a Tensor with
["input_ids"] raises TypeError on <5.0.

Passing return_dict=True explicitly forces BatchEncoding on all versions
(verified locally on 4.57.1 and 5.0.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@yeyu-nvidia (Contributor Author) commented:

Tested locally across versions:

| Version | `["input_ids"]` without `return_dict` | With `return_dict=True` |
| --- | --- | --- |
| 4.57.1 | ❌ Returns raw Tensor → TypeError | BatchEncoding |
| 5.0.0 | ✅ Returns BatchEncoding by default | BatchEncoding |
| 5.3 (CI) | ✅ Returns BatchEncoding by default | BatchEncoding |

Indexing `["input_ids"]` without `return_dict=True` would break on 4.57, since the default changed from False to True at the 5.0 boundary. Updated the fix to pass `return_dict=True` explicitly so it works on all versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
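The version-safe form the PR converged on can be sketched as follows; again the helper name is hypothetical, and the behavior matches what the comment above reports for 4.x and 5.x.

```python
# Final sketch: return_dict=True forces a mapping-style (BatchEncoding)
# return on both transformers 4.x and 5.x, so the ["input_ids"]
# subscript is safe regardless of the installed version's default.
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )
    return tokenized["input_ids"]
```

Pinning the keyword argument rather than branching on the return type trades a runtime check for an explicit contract with the tokenizer API, which is easier to reason about across version upgrades.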
@yeyu-nvidia yeyu-nvidia merged commit c901814 into main Apr 9, 2026
42 checks passed
@yeyu-nvidia yeyu-nvidia deleted the yeyu/fix-compute-hidden-states-transformers-compat branch April 9, 2026 19:27
Edwardf0t1 pushed a commit that referenced this pull request Apr 9, 2026
…_template (#1225)
