IndexError in MsWordDocumentBackend when processing text without equations #1284

@yssAI

Bug

When processing certain Word documents (.docx), the handle_text_elements method in msword_backend.py raises IndexError: list index out of range when it tries to split text that does not contain equation markers.

Steps to reproduce

  1. Process a Word document containing paragraphs without equation markers (EQ)
  2. The error occurs in docling/backend/msword_backend.py, line 380
  3. The code assumes all text elements contain equations and tries to split on "EQ"
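
The failure mode described in the steps above can be reproduced in isolation. The following is a minimal sketch, not the actual code from msword_backend.py: the function names `split_unsafe` and `split_safe` and the sample strings are hypothetical, and the "EQ" marker is taken from the description in this issue. It shows why indexing the result of `str.split` fails when the marker is absent, and one possible way to guard against it with `str.partition`.

```python
def split_unsafe(text: str, marker: str) -> tuple[str, str]:
    """Hypothetical illustration of the reported pattern (not docling's code)."""
    parts = text.split(marker, maxsplit=1)
    # If `marker` is not in `text`, `parts` has a single element,
    # so parts[1] raises IndexError: list index out of range.
    return parts[0], parts[1]


def split_safe(text: str, marker: str) -> tuple[str, str]:
    """Hypothetical guarded variant: never raises when the marker is missing."""
    before, sep, after = text.partition(marker)
    if not sep:
        # Marker not found: treat the whole string as plain text.
        return text, ""
    return before, after


if __name__ == "__main__":
    print(split_safe("before EQ after", "EQ"))
    print(split_safe("plain paragraph", "EQ"))
    try:
        split_unsafe("plain paragraph", "EQ")
    except IndexError as exc:
        print(f"IndexError: {exc}")
```

`str.partition` always returns a 3-tuple, so the caller can check the separator element instead of assuming the split produced two parts. Whether this is the right fix for the backend depends on how the real method builds its marker strings, which this sketch does not model.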

Docling version

docling 2.28.2
docling-core 2.24.0
docling-ibm-models 3.4.1
docling-parse 4.0.0

Python version

python==3.12.9

In some cases, an extra space in the equation causes an error.

Labels: bug, docx
