
feat: enable MPS acceleration for TableFormer and add VLM auto-selection on Apple Silicon #3202

@kensteele

Description

Summary

Two complementary Apple Silicon improvements that together deliver 14–17x speedups on MPS hardware:

  1. TableFormer MPS acceleration: Remove the hard-coded MPS→CPU fallback in TableStructureModel (V1) and TableStructureModelV2 that silently disables GPU acceleration on Apple Silicon.
  2. VLM legacy API auto-selection: Add auto-selecting constants (GRANITEDOCLING, SMOLDOCLING) that detect Apple Silicon + mlx-vlm at import time, following the established pattern from asr_model_specs.py.

Problem

TableFormer MPS disabled

Both table_structure_model.py (lines 82–84) and table_structure_model_v2.py (lines 62–63) contain:

# Disable MPS here, until we know why it makes things slower.
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value

This was added in commit 19fad926 (Dec 2024) during initial GPU accelerator support. Since then, PyTorch MPS support has improved significantly. Our benchmarks show MPS is now 14x faster than CPU for the standard pipeline on Apple Silicon — the original concern no longer applies.

VLM legacy API requires explicit model selection

The modern preset system (VlmConvertOptions.from_preset("granite_docling")) already auto-selects MLX via AutoInlineVlmEngine. However, the legacy module-level constants in vlm_model_specs.py require users to explicitly choose between GRANITEDOCLING_TRANSFORMERS and GRANITEDOCLING_MLX. The ASR module (asr_model_specs.py) already has auto-selecting constants (e.g., WHISPER_TINY), but the VLM module does not.

Benchmarks

Tested on Apple Silicon (M-series), PyTorch 2.11.0, mlx-vlm 0.3.9, using tests/data/pdf/2206.01062.pdf (9 pages, 5 tables):

Standard Pipeline (TableFormer MPS fix)

| Device | Time | Tables | Text Elements |
|---|---|---|---|
| CPU | 145.9s | 5 | 597 |
| MPS | 10.4s | 5 | 597 |
| **Speedup** | **14.0x** | ✅ Match | ✅ Match |

VLM Pipeline — GraniteDocling-258M Framework Comparison

| Framework | Time | Tables | Texts | Total Chars |
|---|---|---|---|---|
| MLX (auto-selected) | 41.1s | 5 | 136 | 41,814 |
| Transformers (MPS) | 715.4s | 5 | 118 | 26,478 |
| **Speedup** | **17.4x** | ✅ Match | MLX better | MLX +58% |

VLM Pipeline — Model Comparison (both MLX)

| Model | Time | Tables | Texts | Chars |
|---|---|---|---|---|
| GraniteDocling-258M-mlx | 41.7s | 5 | 136 | 41,814 |
| SmolDocling-256M-mlx | 104.4s | 4 | 154 | 42,115 |

GraniteDocling-258M-mlx is faster (2.5x) and finds all 5 tables vs SmolDocling's 4.

Proposed Changes

1. TableFormer: Remove MPS→CPU override

Replace the hard-coded fallback with explicit supported_devices declarations:

# Before
device = decide_device(accelerator_options.device)
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value

# After
device = decide_device(
    accelerator_options.device,
    supported_devices=[
        AcceleratorDevice.CPU,
        AcceleratorDevice.CUDA,
        AcceleratorDevice.MPS,
        AcceleratorDevice.XPU,
    ],
)
_log.debug(f"TableStructureModel using device: {device}")

This follows the pattern used by code_formula_model.py and adds debug logging so users can verify which device is active (addressing feedback in #1972).
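For illustration, the supported-devices filtering can be sketched as follows. This is a simplified stand-in, not docling's actual `decide_device` (the real helper also resolves `AUTO` by probing available hardware); the fallback-to-CPU behavior shown here is an assumption:

```python
from enum import Enum


class AcceleratorDevice(str, Enum):
    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"
    XPU = "xpu"


def decide_device(requested: str, supported_devices=None) -> str:
    """Resolve the requested device string, falling back to CPU when
    the request is not in the model's declared supported list."""
    device = requested  # the real implementation also resolves AUTO here
    if supported_devices is not None:
        supported = {d.value for d in supported_devices}
        if device not in supported:
            device = AcceleratorDevice.CPU.value
    return device


# With MPS in the supported list, an "mps" request now passes through
# unchanged instead of being silently rewritten to "cpu":
decide_device("mps", supported_devices=[
    AcceleratorDevice.CPU,
    AcceleratorDevice.CUDA,
    AcceleratorDevice.MPS,
    AcceleratorDevice.XPU,
])
```

The point of the change is that the per-model `supported_devices` declaration, rather than a hard-coded override inside the model, decides whether MPS is honored.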

2. VLM: Add auto-selecting constants

Add a shared hardware detection helper and auto-selecting factory functions:

def _has_apple_silicon_mlx() -> bool:
    """Return True if MPS is available and mlx-vlm is installed."""
    ...

def _get_granitedocling_model():
    if _has_apple_silicon_mlx():
        return GRANITEDOCLING_MLX
    else:
        return GRANITEDOCLING_TRANSFORMERS

GRANITEDOCLING = _get_granitedocling_model()

This mirrors the established ASR pattern (`_get_whisper_tiny_model()` → `WHISPER_TINY`) in asr_model_specs.py.

Files Changed

  • docling/models/stages/table_structure/table_structure_model.py — Remove MPS override, add supported_devices
  • docling/models/stages/table_structure/table_structure_model_v2.py — Same
  • docling/datamodel/vlm_model_specs.py — Add _has_apple_silicon_mlx(), _get_granitedocling_model(), _get_smoldocling_model(), GRANITEDOCLING, SMOLDOCLING
  • docling/datamodel/pipeline_options.py — Re-export new auto-selecting constants

Testing

  • All pre-commit checks pass (Ruff formatter, Ruff linter, MyPy, uv-lock)
  • Existing test suite passes with no regressions
  • On non-Apple hardware, behavior is identical (auto-selecting constants return Transformers variants, decide_device() falls back to CPU/CUDA as before)
