fix(pipeline): distinct status for document_timeout vs failed pages #3352
Draft
Bojun-Vvibe wants to merge 1 commit into docling-project:main from
Introduce ConversionStatus.PARTIAL_SUCCESS_TIMEOUT, emitted by the StandardPdfPipeline only when document_timeout is reached. Per-page failures continue to yield PARTIAL_SUCCESS, so downstream consumers can distinguish a truncated document (timeout) from one with a few failed pages.
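On the consumer side, the distinction might be handled along these lines. This is a minimal sketch, not docling code: the `ConversionStatus` enum below is a local stand-in, its string values are assumptions, and `describe_result` is a hypothetical helper.

```python
from enum import Enum


class ConversionStatus(str, Enum):
    """Local stand-in for docling's enum; the string values are assumed."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    PARTIAL_SUCCESS_TIMEOUT = "partial_success_timeout"  # value added by this PR
    FAILURE = "failure"


def describe_result(status: ConversionStatus) -> str:
    """Hypothetical consumer-side helper: map a status to a handling decision."""
    if status is ConversionStatus.SUCCESS:
        return "complete"
    if status is ConversionStatus.PARTIAL_SUCCESS_TIMEOUT:
        # document_timeout was reached, so the document is truncated.
        return "truncated"
    if status is ConversionStatus.PARTIAL_SUCCESS:
        # Individual pages failed while others converted.
        return "pages_failed"
    return "failed"
```

A caller could, for example, retry with a larger `document_timeout` on `"truncated"` and only log a warning on `"pages_failed"`.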
❌ DCO Check Failed

Hi @Bojun-Vvibe, your pull request has failed the Developer Certificate of Origin (DCO) check. This repository supports remediation commits, so you can fix this without rewriting history, but you must follow the required message format.

🛠 Quick Fix: Add a remediation commit

Run this command:

    git commit --allow-empty -s -m "DCO Remediation Commit for bugfix-mission <bugfix-mission@local>
    I, bugfix-mission <bugfix-mission@local>, hereby add my Signed-off-by to this commit: b2c448434bc7114af6b7092857bb15173ffdccb2"
    git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

    git commit --amend --signoff
    git push --force-with-lease

For multiple commits:

    git rebase --signoff origin/main
    git push --force-with-lease

More info: DCO check report

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Closes #3205
Repo
docling-project/docling
Issue
#3205
Root cause
PR #2939 made `StandardPdfPipeline` emit `ConversionStatus.PARTIAL_SUCCESS` both when `document_timeout` is reached and when individual pages fail. Downstream consumers can no longer distinguish a truncated document (last N pages missing due to timeout) from one where individual pages failed.
Fix
Added a new enum value `ConversionStatus.PARTIAL_SUCCESS_TIMEOUT` and changed the timeout branch in `StandardPdfPipeline._integrate_results` to emit it. Per-page failures continue to use `PARTIAL_SUCCESS`. Allow-lists in `document_converter.py` and `service_client/client.py` accept the new status.
Regression test
`tests/test_options.py::test_document_timeout` updated to assert `ConversionStatus.PARTIAL_SUCCESS_TIMEOUT` for both pipelines. Fails on HEAD (returns `PARTIAL_SUCCESS`), passes after fix.
Risk
low
Verification
skipped: full test run requires heavy native deps (docling_core, docling_parse, docling_ibm_models). Syntax-checked all modified files with `python -m py_compile` (OK). The change is a single new enum value plus 4 surgical call-site updates.
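As a rough illustration of the status split described under Fix, the decision could be sketched as below. This is a stand-in, not the actual `_integrate_results` code: the enum is redefined locally with assumed string values, `resolve_status` and its parameters are hypothetical, and giving the timeout precedence over per-page failures is an assumption.

```python
from enum import Enum


class ConversionStatus(str, Enum):
    """Local stand-in for docling's enum; the string values are assumed."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    PARTIAL_SUCCESS_TIMEOUT = "partial_success_timeout"


def resolve_status(timed_out: bool, failed_pages: int) -> ConversionStatus:
    """Hypothetical helper mirroring the described behavior."""
    if timed_out:
        # Timeout branch: report the new, distinct status.
        return ConversionStatus.PARTIAL_SUCCESS_TIMEOUT
    if failed_pages > 0:
        # Per-page failures keep the existing status.
        return ConversionStatus.PARTIAL_SUCCESS
    return ConversionStatus.SUCCESS
```

With this shape, a regression check in the spirit of `test_document_timeout` reduces to asserting that a timed-out run never reports plain `PARTIAL_SUCCESS`.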