## Summary

Two complementary Apple Silicon improvements that together deliver 14–17x speedups on MPS hardware:

- **TableFormer MPS acceleration:** Remove the hard-coded MPS→CPU fallback in `TableStructureModel` (V1) and `TableStructureModelV2` that silently disables GPU acceleration on Apple Silicon.
- **VLM legacy API auto-selection:** Add auto-selecting constants (`GRANITEDOCLING`, `SMOLDOCLING`) that detect Apple Silicon + mlx-vlm at import time, following the established pattern from `asr_model_specs.py`.
## Problem

### TableFormer MPS disabled

Both `table_structure_model.py` (lines 82–84) and `table_structure_model_v2.py` (lines 62–63) contain:

```python
# Disable MPS here, until we know why it makes things slower.
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value
```
This was added in commit `19fad926` (Dec 2024) during initial GPU accelerator support. Since then, PyTorch MPS support has improved significantly. Our benchmarks show MPS is now 14x faster than CPU for the standard pipeline on Apple Silicon, so the original concern no longer applies.
### VLM legacy API requires explicit model selection

The modern preset system (`VlmConvertOptions.from_preset("granite_docling")`) already auto-selects MLX via `AutoInlineVlmEngine`. However, the legacy module-level constants in `vlm_model_specs.py` require users to explicitly choose between `GRANITEDOCLING_TRANSFORMERS` and `GRANITEDOCLING_MLX`. The ASR module (`asr_model_specs.py`) already has auto-selecting constants (e.g., `WHISPER_TINY`), but the VLM module does not.
## Benchmarks

Tested on Apple Silicon (M-series), PyTorch 2.11.0, mlx-vlm 0.3.9, using `tests/data/pdf/2206.01062.pdf` (9 pages, 5 tables):
### Standard Pipeline (TableFormer MPS fix)

| Device  | Time   | Tables   | Text Elements |
|---------|--------|----------|---------------|
| CPU     | 145.9s | 5        | 597           |
| MPS     | 10.4s  | 5        | 597           |
| Speedup | 14.0x  | ✅ Match | ✅ Match      |
### VLM Pipeline — GraniteDocling-258M Framework Comparison

| Framework           | Time   | Tables   | Texts      | Total Chars |
|---------------------|--------|----------|------------|-------------|
| MLX (auto-selected) | 41.1s  | 5        | 136        | 41,814      |
| Transformers (MPS)  | 715.4s | 5        | 118        | 26,478      |
| Speedup             | 17.4x  | ✅ Match | MLX better | MLX +58%    |
### VLM Pipeline — Model Comparison (both MLX)

| Model                   | Time   | Tables | Texts | Chars  |
|-------------------------|--------|--------|-------|--------|
| GraniteDocling-258M-mlx | 41.7s  | 5      | 136   | 41,814 |
| SmolDocling-256M-mlx    | 104.4s | 4      | 154   | 42,115 |
GraniteDocling-258M-mlx is 2.5x faster and finds all 5 tables vs SmolDocling's 4.
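The headline speedup figures can be sanity-checked directly from the raw timings and character counts in the tables above:

```python
# Verify the reported speedups from the benchmark timings above.
cpu_s, mps_s = 145.9, 10.4        # standard pipeline: CPU vs MPS
tf_s, mlx_s = 715.4, 41.1         # VLM pipeline: Transformers (MPS) vs MLX
granite_s, smol_s = 41.7, 104.4   # MLX model comparison
mlx_chars, tf_chars = 41_814, 26_478

print(f"TableFormer MPS speedup:   {cpu_s / mps_s:.1f}x")        # 14.0x
print(f"MLX vs Transformers:       {tf_s / mlx_s:.1f}x")         # 17.4x
print(f"GraniteDocling vs Smol:    {smol_s / granite_s:.1f}x")   # 2.5x
print(f"MLX extra chars extracted: +{mlx_chars / tf_chars - 1:.0%}")  # +58%
```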
## Proposed Changes

### 1. TableFormer: Remove MPS→CPU override

Replace the hard-coded fallback with explicit `supported_devices` declarations:

```python
# Before
device = decide_device(accelerator_options.device)
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value

# After
device = decide_device(
    accelerator_options.device,
    supported_devices=[
        AcceleratorDevice.CPU,
        AcceleratorDevice.CUDA,
        AcceleratorDevice.MPS,
        AcceleratorDevice.XPU,
    ],
)
_log.debug(f"TableStructureModel using device: {device}")
```
This follows the pattern used by `code_formula_model.py` and adds debug logging so users can verify which device is active (addressing feedback in #1972).
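For intuition, the `supported_devices` contract can be thought of as a filter with a CPU fallback. The following is a hypothetical, self-contained sketch of that behavior, not docling's actual `decide_device` implementation:

```python
from enum import Enum

class AcceleratorDevice(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"
    XPU = "xpu"

def decide_device(requested: str, supported_devices=None) -> str:
    """Sketch: honor the requested device only if the model supports it."""
    supported = {d.value for d in (supported_devices or list(AcceleratorDevice))}
    if requested in supported:
        return requested
    return AcceleratorDevice.CPU.value  # safe fallback

# With MPS declared as supported, the request is honored instead of being
# silently downgraded to CPU:
print(decide_device("mps", supported_devices=[
    AcceleratorDevice.CPU,
    AcceleratorDevice.CUDA,
    AcceleratorDevice.MPS,
    AcceleratorDevice.XPU,
]))  # mps
```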
### 2. VLM: Add auto-selecting constants

Add a shared hardware detection helper and auto-selecting factory functions:

```python
def _has_apple_silicon_mlx() -> bool:
    """Return True if MPS is available and mlx-vlm is installed."""
    ...

def _get_granitedocling_model():
    if _has_apple_silicon_mlx():
        return GRANITEDOCLING_MLX
    else:
        return GRANITEDOCLING_TRANSFORMERS

GRANITEDOCLING = _get_granitedocling_model()
```
This mirrors the established ASR pattern (`_get_whisper_tiny_model()` → `WHISPER_TINY`) in `asr_model_specs.py`.
## Files Changed

- `docling/models/stages/table_structure/table_structure_model.py` — Remove MPS override, add `supported_devices`
- `docling/models/stages/table_structure/table_structure_model_v2.py` — Same
- `docling/datamodel/vlm_model_specs.py` — Add `_has_apple_silicon_mlx()`, `_get_granitedocling_model()`, `_get_smoldocling_model()`, `GRANITEDOCLING`, `SMOLDOCLING`
- `docling/datamodel/pipeline_options.py` — Re-export new auto-selecting constants
## Related Issues

## Testing

- All pre-commit checks pass (Ruff formatter, Ruff linter, MyPy, uv-lock)
- Existing test suite passes with no regressions
- On non-Apple hardware, behavior is identical (auto-selecting constants return Transformers variants, `decide_device()` falls back to CPU/CUDA as before)