
Fix compute_hidden_states_hf.py: handle BatchEncoding from apply_chat_template #1225

Merged
yeyu-nvidia merged 4 commits into main from yeyu/fix-compute-hidden-states-transformers-compat on Apr 9, 2026

Conversation

@yeyu-nvidia (Contributor) commented Apr 9, 2026

Summary

  • apply_chat_template(..., return_tensors="pt") returns a BatchEncoding in transformers 4.46+, which no longer subclasses dict
  • The old guard isinstance(tokenized, dict) evaluates to False for BatchEncoding, so input_ids was set to the whole BatchEncoding object
  • Calling .shape[1] on a BatchEncoding triggers __getattr__("shape") → AttributeError
  • Fix: check isinstance(tokenized, torch.Tensor) instead, which correctly handles both old transformers (plain tensor) and new transformers (BatchEncoding)

This is causing test_collect_hidden_states to fail in the speculative decoding CI for all open PRs (#1207, #1210, #1221).
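The guard change described above can be sketched as follows. This is a minimal, hedged reconstruction from the PR description, not the script's actual code: `extract_input_ids`, `tokenizer`, and `messages` are hypothetical stand-ins.

```python
import torch

# Sketch of the fix described in the summary (names are assumptions).
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    # The old guard was `isinstance(tokenized, dict)`, which is False for
    # BatchEncoding on transformers 4.46+, so the whole BatchEncoding
    # leaked through and `.shape[1]` later raised AttributeError.
    if isinstance(tokenized, torch.Tensor):
        return tokenized               # older transformers: raw tensor
    return tokenized["input_ids"]      # mapping-style (BatchEncoding)
```

Checking for `torch.Tensor` rather than `dict` works because both plain `dict` and `BatchEncoding` support `["input_ids"]` subscripting, so everything that is not a raw tensor falls through to the mapping path.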

Test plan

  • torch-pr (speculative_decoding, 26.01) CI passes
  • Verify fix handles both torch.Tensor return (old transformers) and BatchEncoding return (new transformers 4.46+)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Streamlined tokenization in speculative decoding examples to enforce a single, consistent token output format. This reduces runtime variability, prevents token-processing errors during example runs, and improves stability across Transformer library versions.

…emplate

In transformers 4.46+, apply_chat_template() with return_tensors="pt"
returns a BatchEncoding object that no longer subclasses dict. The
previous isinstance(tokenized, dict) guard evaluated to False and fell
through to tokenized (the BatchEncoding), causing input_ids.shape[1] to
call BatchEncoding.__getattr__("shape") and raise AttributeError.

Fix by checking isinstance(tokenized, torch.Tensor) instead, which
correctly handles both old transformers (plain tensor return) and new
transformers (BatchEncoding return).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@yeyu-nvidia yeyu-nvidia requested a review from a team as a code owner April 9, 2026 17:50
@yeyu-nvidia yeyu-nvidia requested a review from h-guo18 April 9, 2026 17:50

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f80439d1-bf42-4862-ac93-89e10705c517

📥 Commits

Reviewing files that changed from the base of the PR and between f762336 and 4be7f6f.

📒 Files selected for processing (1)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py
✅ Files skipped from review due to trivial changes (1)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py

📝 Walkthrough

Walkthrough

Tokenization in a speculative decoding example was simplified: submit_generates() now requests a mapping-style return from tokenizer.apply_chat_template(..., return_dict=True) and directly reads ["input_ids"], removing the previous runtime branch that handled both tensor and mapping return types.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Tokenization Handling**<br>`examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py` | Removed conditional branching on `tokenizer.apply_chat_template()` return type; `input_ids` are now extracted unconditionally from the mapping-style return (`...["input_ids"]`) by using `return_dict=True`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: handling BatchEncoding returns from apply_chat_template in compute_hidden_states_hf.py, which is the core fix described in the changeset. |
| Security Anti-Patterns | ✅ Passed | Pull request introduces no security anti-patterns; trust_remote_code is properly exposed as a configurable argument defaulting to False. |


Comment on lines +212 to +215:

```python
# apply_chat_template return type varies by transformers version:
# - older versions return a plain torch.Tensor (input_ids directly)
# - newer versions (4.46+) return a BatchEncoding which no longer
#   subclasses dict, so isinstance(tokenized, dict) is False
```
Collaborator:
Is this comment correct? We use 4.56+ so the older version should not need to be supported

Contributor Author:
Good point, simplified to just ["input_ids"] directly since transformers 4.46+ is always assumed.

@yeyu-nvidia yeyu-nvidia enabled auto-merge (squash) April 9, 2026 17:52
Since transformers 4.46+ is required, apply_chat_template always
returns a BatchEncoding. Drop the torch.Tensor fallback and just
index ["input_ids"] directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
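Under the assumption stated in this commit (transformers 4.46+ everywhere), the runtime branch collapses to a direct subscript. A sketch with the same hypothetical helper name as above:

```python
# Simplified sketch: transformers 4.46+ assumed, so
# apply_chat_template(..., return_tensors="pt") is taken to return a
# mapping-style BatchEncoding that supports ["input_ids"] indexing.
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    return tokenized["input_ids"]
```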

github-actions bot commented Apr 9, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-09 19:28 UTC


codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.39%. Comparing base (14e9bea) to head (4be7f6f).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1225      +/-   ##
==========================================
+ Coverage   75.68%   77.39%   +1.70%     
==========================================
  Files         353      353              
  Lines       40491    40491              
==========================================
+ Hits        30644    31336     +692     
+ Misses       9847     9155     -692     
| Flag | Coverage Δ |
| --- | --- |
| examples | 44.49% <ø> (+2.76%) ⬆️ |
| unit | 55.25% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Without return_dict=True, apply_chat_template returns a raw torch.Tensor
on transformers <5.0 (default return_dict=False) and a BatchEncoding on
transformers >=5.0 (default changed to True). Subscripting a Tensor with
["input_ids"] raises TypeError on <5.0.

Passing return_dict=True explicitly forces BatchEncoding on all versions
(verified locally on 4.57.1 and 5.0.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@yeyu-nvidia (Contributor Author) commented:

Tested locally across versions:

| Version | `["input_ids"]` without `return_dict` | With `return_dict=True` |
| --- | --- | --- |
| 4.57.1 | ❌ Returns raw Tensor → TypeError | BatchEncoding |
| 5.0.0 | ✅ Returns BatchEncoding by default | BatchEncoding |
| 5.3 (CI) | ✅ Returns BatchEncoding by default | BatchEncoding |

Indexing `["input_ids"]` without `return_dict=True` would break on 4.57, since the default changed from False to True at the 5.0 boundary. Updated the fix to pass `return_dict=True` explicitly so it works on all versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
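The version-safe form the PR converged on can be sketched as follows; again the helper name is hypothetical, and the behavior matches what the comment above reports for 4.x and 5.x.

```python
# Final sketch: return_dict=True forces a mapping-style (BatchEncoding)
# return on both transformers 4.x and 5.x, so the ["input_ids"]
# subscript is safe regardless of the installed version's default.
def extract_input_ids(tokenizer, messages):
    tokenized = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )
    return tokenized["input_ids"]
```

Pinning the keyword argument rather than branching on the return type trades a runtime check for an explicit contract with the tokenizer API, which is easier to reason about across version upgrades.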
@yeyu-nvidia yeyu-nvidia merged commit c901814 into main Apr 9, 2026
42 checks passed
@yeyu-nvidia yeyu-nvidia deleted the yeyu/fix-compute-hidden-states-transformers-compat branch April 9, 2026 19:27
Edwardf0t1 pushed a commit that referenced this pull request Apr 9, 2026
…_template (#1225)
