fix(openlineage): self-heal ProcessPoolExecutor on BrokenProcessPool#67400
Open
anmolxlight wants to merge 3 commits into
When a child process in the OpenLineage listener's ProcessPoolExecutor terminates abruptly, concurrent.futures marks the pool as permanently broken. Every subsequent submission raises BrokenProcessPool and lineage data stops flowing until the scheduler is restarted. This adds self-healing: submit_callable now catches BrokenProcessPool, shuts down the broken executor, creates a fresh one, and retries the submission so lineage reporting recovers automatically. Closes apache#67283
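The submit, catch, recreate, retry pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the listener's actual code: `SelfHealingExecutor`, its `factory` parameter, the lock, the log text, and the single-retry policy are all hypothetical. The real change lives in `submit_callable` in `listener.py`.

```python
import logging
import threading
from concurrent.futures import Executor, Future
from concurrent.futures.process import BrokenProcessPool
from typing import Any, Callable

log = logging.getLogger(__name__)


class SelfHealingExecutor:
    """Hypothetical wrapper, not the OpenLineage listener's real class.

    If ``submit`` finds the pool broken, the pool is shut down, replaced with a
    fresh one built by ``factory``, and the submission is retried exactly once.
    """

    def __init__(self, factory: Callable[[], Executor]) -> None:
        # Assumption: factory is e.g. ProcessPoolExecutor or a functools.partial of it.
        self._factory = factory
        # Guards the swap when several threads hit the broken pool at the same time.
        self._lock = threading.Lock()
        self._executor = factory()

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        executor = self._executor
        try:
            return executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            # Hypothetical message; the PR's actual warning text is not shown here.
            log.warning("Process pool is broken; recreating it and retrying the submission")
            with self._lock:
                # Replace the pool only if no other thread has already done so.
                if self._executor is executor:
                    executor.shutdown(wait=False)
                    self._executor = self._factory()
                fresh = self._executor
        # Single retry on the fresh pool; a second BrokenProcessPool propagates.
        return fresh.submit(fn, *args, **kwargs)

    def shutdown(self, wait: bool = True) -> None:
        self._executor.shutdown(wait=wait)
```

With a real pool, the factory would be `ProcessPoolExecutor` itself or `functools.partial(ProcessPoolExecutor, max_workers=2)`. Retrying only once bounds the work per call: if the replacement pool is also broken at submission time, that `BrokenProcessPool` reaches the caller instead of looping. Futures that were already queued on the broken pool are not resubmitted; `concurrent.futures` has already failed them with `BrokenProcessPool`, so only new submissions benefit from the recreated pool.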
added 2 commits on May 24, 2026 12:17
ruff I001: `from concurrent.futures.process import BrokenProcessPool` must follow `from concurrent.futures import ProcessPoolExecutor`
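For reference, the import order that rule I001 accepts here lists the parent package first, because `concurrent.futures` sorts before `concurrent.futures.process`. Both names are standard library; `BrokenProcessPool` is a subclass of `concurrent.futures.BrokenExecutor`, which is itself a `RuntimeError`.

```python
# Parent package first, then the submodule, matching the order ruff I001 expects.
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
```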
The warning message fits within the line-length limit so it should not be split across three lines.
Summary
The OpenLineage listener uses a `ProcessPoolExecutor` to emit lineage events asynchronously from the scheduler. When a child process in the pool terminates abruptly, Python's `concurrent.futures` marks the pool as permanently broken. After that point, every subsequent OpenLineage event fails with `BrokenProcessPool` and lineage data stops flowing indefinitely; only a scheduler restart recovers it.
Fix
`submit_callable` now catches `BrokenProcessPool`, shuts down the broken executor, creates a fresh one, and retries the submission. This makes the listener self-healing: lineage reporting recovers automatically without a scheduler restart.
Changes
- `listener.py`: catch `BrokenProcessPool` in `submit_callable`, recreate the executor, and retry.
- `test_listener.py`: add `test_submit_callable_recreates_executor_on_broken_pool`, which verifies that the broken pool is shut down, a new executor is created, and the submission is retried.
Test Plan
Closes #67283