fix: transformers v5 compatibility for AUTOMODEL_CAUSALLM VLMs#3276

Merged
PeterStaar-IBM merged 1 commit into docling-project:main from geoHeil:ft5
Apr 13, 2026

Conversation

@geoHeil
Contributor

@geoHeil geoHeil commented Apr 12, 2026

Summary

  • Fixes #3273 (VlmPipeline TransformersVlmEngine breaks on transformers v5 for AUTOMODEL_CAUSALLM models): VlmPipeline + TransformersVlmEngine crashes on transformers v5 when loading AUTOMODEL_CAUSALLM VLMs (e.g. tiiuae/Falcon-OCR) with AttributeError: TokenizersBackend has no attribute tokenizer.
  • In transformers v5, AutoProcessor.from_pretrained returns a TokenizersBackend directly for pure-tokenizer processors — it exposes _tokenizer, not tokenizer.
  • Introduce _get_tokenizer(), which returns processor.tokenizer when present and otherwise falls back to the processor itself, so both v4 wrapper processors and v5 TokenizersBackend shapes work. Guard padding_side / pad_token accesses with getattr/hasattr.

Test plan

  • VlmPipeline with falcon_ocr preset initializes and converts a sample PDF on transformers v5
  • AUTOMODEL_IMAGETEXTTOTEXT presets (e.g. GLM-OCR, LightOnOCR, GraniteVision) still initialize and generate correctly (v4 + v5)
  • Existing VLM unit tests pass

🤖 Generated with Claude Code

…ng-project#3273)

In transformers v5, AutoProcessor.from_pretrained returns a
TokenizersBackend directly for pure-tokenizer processors (e.g.
Falcon-OCR with AUTOMODEL_CAUSALLM), which has no .tokenizer
attribute. Resolve the tokenizer via a helper that falls back to
the processor itself so both v4 wrapper processors and v5
TokenizersBackend shapes are supported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
@geoHeil geoHeil marked this pull request as ready for review April 12, 2026 06:19
@mergify
Contributor

mergify bot commented Apr 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

Member

@PeterStaar-IBM PeterStaar-IBM left a comment


nice!

@github-actions
Contributor

DCO Check Passed

Thanks @geoHeil, all your commits are properly signed off. 🎉

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...odels/inference_engines/vlm/transformers_engine.py 81.81% 2 Missing ⚠️


@PeterStaar-IBM PeterStaar-IBM merged commit d431224 into docling-project:main Apr 13, 2026
25 checks passed


Development

Successfully merging this pull request may close these issues.

VlmPipeline TransformersVlmEngine breaks on transformers v5 for AUTOMODEL_CAUSALLM models (TokenizersBackend has no attribute 'tokenizer')

2 participants