
Error with blank png: Make sure that the channel dimension of the pixel values match with the one set in the configuration. #1501

Open
Fogapod opened this issue Apr 30, 2025 · 2 comments
Assignees
Labels
bug Something isn't working layout

Comments

@Fogapod
Contributor

Fogapod commented Apr 30, 2025

Bug

I have an image without text that fails docling conversion.
This happens both on macOS and on an NVIDIA GPU.

https://github.com/user-attachments/assets/1518021f-70c7-4a98-8994-09b1b305e3e0
Note: it's named svg, but it's a PNG image.

Steps to reproduce

uvx docling==2.31.0 vec.svg --pdf-backend dlparse_v4 --to md

The channel dimension is ambiguous. Got image shape (1, 1, 3). Assuming channels are the first dimension.
WARNING:docling.pipeline.base_pipeline:Encountered an error during conversion of document 90a2134105ce90eb548541bc22129b7d2766d7a83877d56622c345d73fa6863e:
Traceback (most recent call last):

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 160, in _build_document
    for p in pipeline_pages:  # Must exhaust!
             ^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 126, in _apply_on_pages
    yield from page_batch

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/page_assemble_model.py", line 69, in __call__
    for page in page_batch:
                ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/table_structure_model.py", line 181, in __call__
    for page in page_batch:
                ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling/models/layout_model.py", line 157, in __call__
    for ix, pred_item in enumerate(
                         ^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 143, in predict
    outputs = self._model(**inputs)
              ^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 2003, in forward
    outputs = self.model(
              ^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 1719, in forward
    features = self.backbone(pixel_values, pixel_mask)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr.py", line 535, in forward
    features = self.model(pixel_values).feature_maps
               ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 413, in forward
    embedding_output = self.embedder(pixel_values)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Users/e3/.cache/uv/archive-v0/cEqeRawLw0sRI8m6TeB2x/lib/python3.12/site-packages/transformers/models/rt_detr/modeling_rt_detr_resnet.py", line 108, in forward
    raise ValueError(

ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration.
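
For context, the warning at the top of the log hints at why a 1x1 image trips the model: a shape of (1, 1, 3) fits both a channels-first and a channels-last reading. The sketch below is a simplified, hypothetical version of such a heuristic (the function name `guess_channel_axis` and the allowed channel counts are assumptions for illustration, not the actual transformers code). It shows how the first axis can win the tie, so the model would see 1 channel instead of the 3 it is configured for.

```python
def guess_channel_axis(shape, allowed_channels=(1, 3)):
    """Hypothetical heuristic: guess which axis of a 3-D image shape holds channels.

    Returns (axis, ambiguous), where axis is "first" or "last".
    """
    first_ok = shape[0] in allowed_channels
    last_ok = shape[-1] in allowed_channels
    if first_ok and last_ok:
        # Both readings are plausible; fall back to channels-first,
        # mirroring the "Assuming channels are the first dimension" warning.
        return "first", True
    if first_ok:
        return "first", False
    if last_ok:
        return "last", False
    raise ValueError(f"cannot infer channel axis from shape {shape}")


# A typical height x width x channels page image is unambiguous.
print(guess_channel_axis((480, 640, 3)))  # ('last', False)

# The 1x1 RGB image from this report is ambiguous and is read as
# 1 channel, height 1, width 3.
print(guess_channel_axis((1, 1, 3)))  # ('first', True)
```

If this reading is right, the `ValueError` is a downstream symptom: the backbone receives a tensor with 1 channel where 3 are expected.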

Docling version

Docling version: 2.31.0
Docling Core version: 2.28.1
Docling IBM Models version: 3.4.2
Docling Parse version: 4.0.1
Python: cpython-312 (3.12.7)
Platform: macOS-15.3.1-arm64-arm-64bit

Python version

3.12

@Fogapod Fogapod added the bug Something isn't working label Apr 30, 2025
@cau-git cau-git added the layout label May 21, 2025
@cau-git cau-git removed their assignment May 21, 2025
@cau-git
Contributor

cau-git commented May 21, 2025

@Fogapod I agree we should safeguard the docling code against this edge case. Still, this appears to be a 1x1 pixel PNG. Do you also see this problem on a real-world document?
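
One possible shape for such a safeguard, as a minimal sketch only: read the PNG header, and skip layout prediction when the image is too small. The helper names `png_size` and `too_small_for_layout` and the `min_side` threshold are assumptions for illustration and are not part of docling; the header layout (8-byte signature, then the IHDR chunk with big-endian width and height) follows the PNG format.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type,
    # 16-19: width, 20-23: height (both big-endian unsigned 32-bit).
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def too_small_for_layout(data: bytes, min_side: int = 2) -> bool:
    """Hypothetical guard: True if either side is below min_side pixels."""
    width, height = png_size(data)
    return min(width, height) < min_side


# Build just the first 24 bytes of a 1x1 PNG to exercise the guard.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1, 1)
print(png_size(header))              # (1, 1)
print(too_small_for_layout(header))  # True
```

Because the check inspects the file signature rather than the extension, it would also cope with a PNG that is named svg, as in this report.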

@Fogapod
Contributor Author

Fogapod commented May 21, 2025

No, this is the only file I've found that triggers this.
