fix: transformers v5 compatibility for AUTOMODEL_CAUSALLM VLMs#3276

Merged
PeterStaar-IBM merged 1 commit into docling-project:main from geoHeil:ft5
Apr 13, 2026

Conversation

@geoHeil
Contributor

@geoHeil geoHeil commented Apr 12, 2026

Summary

  • Fixes #3273 (VlmPipeline TransformersVlmEngine breaks on transformers v5 for AUTOMODEL_CAUSALLM models): VlmPipeline + TransformersVlmEngine crashes on transformers v5 when loading AUTOMODEL_CAUSALLM VLMs (e.g. tiiuae/Falcon-OCR) with AttributeError: TokenizersBackend has no attribute tokenizer.
  • In transformers v5, AutoProcessor.from_pretrained returns a TokenizersBackend directly for pure-tokenizer processors — it exposes _tokenizer, not tokenizer.
  • Introduce _get_tokenizer(), which returns processor.tokenizer when present and otherwise falls back to the processor itself, so both v4 wrapper processors and v5 TokenizersBackend shapes work. Guard padding_side / pad_token accesses with getattr/hasattr.

Test plan

  • VlmPipeline with falcon_ocr preset initializes and converts a sample PDF on transformers v5
  • AUTOMODEL_IMAGETEXTTOTEXT presets (e.g. GLM-OCR, LightOnOCR, GraniteVision) still initialize and generate correctly (v4 + v5)
  • Existing VLM unit tests pass

🤖 Generated with Claude Code

…ng-project#3273)

In transformers v5, AutoProcessor.from_pretrained returns a
TokenizersBackend directly for pure-tokenizer processors (e.g.
Falcon-OCR with AUTOMODEL_CAUSALLM), which has no .tokenizer
attribute. Resolve the tokenizer via a helper that falls back to
the processor itself so both v4 wrapper processors and v5
TokenizersBackend shapes are supported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
@geoHeil geoHeil marked this pull request as ready for review April 12, 2026 06:19
@mergify
Contributor

mergify bot commented Apr 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

Member

@PeterStaar-IBM PeterStaar-IBM left a comment


nice!

@github-actions
Contributor

DCO Check Passed

Thanks @geoHeil, all your commits are properly signed off. 🎉

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...odels/inference_engines/vlm/transformers_engine.py 81.81% 2 Missing ⚠️


@PeterStaar-IBM PeterStaar-IBM merged commit d431224 into docling-project:main Apr 13, 2026
25 checks passed


Development

Successfully merging this pull request may close these issues.

VlmPipeline TransformersVlmEngine breaks on transformers v5 for AUTOMODEL_CAUSALLM models (TokenizersBackend has no attribute 'tokenizer')

2 participants