Skip to content

fix(pipeline): distinct status for document_timeout vs failed pages#3352

Draft
Bojun-Vvibe wants to merge 1 commit intodocling-project:mainfrom
Bojun-Vvibe:fix/docling-project-docling-3205
Draft

fix(pipeline): distinct status for document_timeout vs failed pages#3352
Bojun-Vvibe wants to merge 1 commit intodocling-project:mainfrom
Bojun-Vvibe:fix/docling-project-docling-3205

Conversation

@Bojun-Vvibe
Copy link
Copy Markdown

Closes #3205

Repo

docling-project/docling

Issue

#3205

Root cause

PR #2939 made StandardPdfPipeline emit ConversionStatus.PARTIAL_SUCCESS for
both document_timeout reached and per-page failures. Downstream consumers can
no longer distinguish a truncated document (last N pages missing due to timeout)
from one where individual pages failed.

Fix

Added a new enum value ConversionStatus.PARTIAL_SUCCESS_TIMEOUT and changed
the timeout branch in StandardPdfPipeline._integrate_results to emit it.
Per-page failures continue to use PARTIAL_SUCCESS. Allow-lists in
document_converter.py and service_client/client.py accept the new status.

Regression test

tests/test_options.py::test_document_timeout updated to assert
ConversionStatus.PARTIAL_SUCCESS_TIMEOUT for both pipelines. Fails on HEAD
(returns PARTIAL_SUCCESS), passes after fix.

Risk

low

Verification

skipped: full test run requires heavy native deps (docling_core, docling_parse,
docling_ibm_models). Syntax-checked all modified files with python -m py_compile
(OK). The change is a single new enum value plus 4 surgical call-site updates.

Introduce ConversionStatus.PARTIAL_SUCCESS_TIMEOUT, emitted by the
StandardPdfPipeline only when document_timeout is reached. Per-page
failures continue to yield PARTIAL_SUCCESS, so downstream consumers
can distinguish a truncated document (timeout) from one with a few
failed pages.
@github-actions
Copy link
Copy Markdown
Contributor

DCO Check Failed

Hi @Bojun-Vvibe, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for bugfix-mission <bugfix-mission@local>

I, bugfix-mission <bugfix-mission@local>, hereby add my Signed-off-by to this commit: b2c448434bc7114af6b7092857bb15173ffdccb2"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 23, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Differentiate between conversion whre document_timeout is reached versus those for which individual pages failed

1 participant