## Summary

Two complementary Apple Silicon improvements that together deliver 14–17x speedups on MPS hardware:

- **TableFormer MPS acceleration:** Remove the hard-coded MPS→CPU fallback in `TableStructureModel` (V1) and `TableStructureModelV2` that silently disables GPU acceleration on Apple Silicon.
- **VLM legacy API auto-selection:** Add auto-selecting constants (`GRANITEDOCLING`, `SMOLDOCLING`) that detect Apple Silicon + mlx-vlm at import time, following the established pattern from `asr_model_specs.py`.
## Problem

### TableFormer MPS disabled

Both `table_structure_model.py` (lines 82–84) and `table_structure_model_v2.py` (lines 62–63) contain:

```python
# Disable MPS here, until we know why it makes things slower.
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value
```
This was added in commit `19fad926` (Dec 2024) during initial GPU accelerator support. Since then, PyTorch MPS support has improved significantly. Our benchmarks show MPS is now 14x faster than CPU for the standard pipeline on Apple Silicon, so the original concern no longer applies.
### VLM legacy API requires explicit model selection

The modern preset system (`VlmConvertOptions.from_preset("granite_docling")`) already auto-selects MLX via `AutoInlineVlmEngine`. However, the legacy module-level constants in `vlm_model_specs.py` require users to explicitly choose between `GRANITEDOCLING_TRANSFORMERS` and `GRANITEDOCLING_MLX`. The ASR module (`asr_model_specs.py`) already has auto-selecting constants (e.g., `WHISPER_TINY`), but the VLM module does not.
## Benchmarks

Tested on Apple Silicon (M-series), PyTorch 2.11.0, mlx-vlm 0.3.9, using `tests/data/pdf/2206.01062.pdf` (9 pages, 5 tables):
### Standard Pipeline (TableFormer MPS fix)

| Device  | Time   | Tables   | Text Elements |
|---------|--------|----------|---------------|
| CPU     | 145.9s | 5        | 597           |
| MPS     | 10.4s  | 5        | 597           |
| Speedup | 14.0x  | ✅ Match | ✅ Match      |
### VLM Pipeline — GraniteDocling-258M Framework Comparison

| Framework           | Time   | Tables   | Texts      | Total Chars |
|---------------------|--------|----------|------------|-------------|
| MLX (auto-selected) | 41.1s  | 5        | 136        | 41,814      |
| Transformers (MPS)  | 715.4s | 5        | 118        | 26,478      |
| Speedup             | 17.4x  | ✅ Match | MLX better | MLX +58%    |
### VLM Pipeline — Model Comparison (both MLX)

| Model                   | Time   | Tables | Texts | Chars  |
|-------------------------|--------|--------|-------|--------|
| GraniteDocling-258M-mlx | 41.7s  | 5      | 136   | 41,814 |
| SmolDocling-256M-mlx    | 104.4s | 4      | 154   | 42,115 |
GraniteDocling-258M-mlx is 2.5x faster and finds all 5 tables vs SmolDocling's 4.
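The headline speedup figures can be sanity-checked directly from the raw timings and character counts in the tables above:

```python
# Verify the reported speedups from the benchmark timings above.
cpu_s, mps_s = 145.9, 10.4        # standard pipeline: CPU vs MPS
tf_s, mlx_s = 715.4, 41.1         # VLM pipeline: Transformers (MPS) vs MLX
granite_s, smol_s = 41.7, 104.4   # MLX model comparison
mlx_chars, tf_chars = 41_814, 26_478

print(f"TableFormer MPS speedup:   {cpu_s / mps_s:.1f}x")        # 14.0x
print(f"MLX vs Transformers:       {tf_s / mlx_s:.1f}x")         # 17.4x
print(f"GraniteDocling vs Smol:    {smol_s / granite_s:.1f}x")   # 2.5x
print(f"MLX extra chars extracted: +{mlx_chars / tf_chars - 1:.0%}")  # +58%
```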
## Proposed Changes

### 1. TableFormer: Remove MPS→CPU override

Replace the hard-coded fallback with explicit `supported_devices` declarations:

```python
# Before
device = decide_device(accelerator_options.device)
if device == AcceleratorDevice.MPS.value:
    device = AcceleratorDevice.CPU.value

# After
device = decide_device(
    accelerator_options.device,
    supported_devices=[
        AcceleratorDevice.CPU,
        AcceleratorDevice.CUDA,
        AcceleratorDevice.MPS,
        AcceleratorDevice.XPU,
    ],
)
_log.debug(f"TableStructureModel using device: {device}")
```
This follows the pattern used by `code_formula_model.py` and adds debug logging so users can verify which device is active (addressing feedback in #1972).
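For intuition, the `supported_devices` contract can be thought of as a filter with a CPU fallback. The following is a hypothetical, self-contained sketch of that behavior, not docling's actual `decide_device` implementation:

```python
from enum import Enum

class AcceleratorDevice(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"
    XPU = "xpu"

def decide_device(requested: str, supported_devices=None) -> str:
    """Sketch: honor the requested device only if the model supports it."""
    supported = {d.value for d in (supported_devices or list(AcceleratorDevice))}
    if requested in supported:
        return requested
    return AcceleratorDevice.CPU.value  # safe fallback

# With MPS declared as supported, the request is honored instead of being
# silently downgraded to CPU:
print(decide_device("mps", supported_devices=[
    AcceleratorDevice.CPU,
    AcceleratorDevice.CUDA,
    AcceleratorDevice.MPS,
    AcceleratorDevice.XPU,
]))  # mps
```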
### 2. VLM: Add auto-selecting constants

Add a shared hardware detection helper and auto-selecting factory functions:

```python
def _has_apple_silicon_mlx() -> bool:
    """Return True if MPS is available and mlx-vlm is installed."""
    ...

def _get_granitedocling_model():
    if _has_apple_silicon_mlx():
        return GRANITEDOCLING_MLX
    else:
        return GRANITEDOCLING_TRANSFORMERS

GRANITEDOCLING = _get_granitedocling_model()
```
This mirrors the established ASR pattern (`_get_whisper_tiny_model()` → `WHISPER_TINY`) in `asr_model_specs.py`.
## Files Changed

- `docling/models/stages/table_structure/table_structure_model.py` — Remove MPS override, add `supported_devices`
- `docling/models/stages/table_structure/table_structure_model_v2.py` — Same
- `docling/datamodel/vlm_model_specs.py` — Add `_has_apple_silicon_mlx()`, `_get_granitedocling_model()`, `_get_smoldocling_model()`, `GRANITEDOCLING`, `SMOLDOCLING`
- `docling/datamodel/pipeline_options.py` — Re-export new auto-selecting constants
## Related Issues

## Testing

- All pre-commit checks pass (Ruff formatter, Ruff linter, MyPy, uv-lock)
- Existing test suite passes with no regressions
- On non-Apple hardware, behavior is identical (auto-selecting constants return Transformers variants, `decide_device()` falls back to CPU/CUDA as before)