fix(openlineage): self-heal ProcessPoolExecutor on BrokenProcessPool#67400

Open
anmolxlight wants to merge 3 commits into
apache:mainfrom
anmolxlight:fix/openlineage-broken-process-pool

@anmolxlight
Contributor

Summary

The OpenLineage listener uses a ProcessPoolExecutor to asynchronously emit lineage events from the scheduler. When a child process in the pool terminates abruptly, Python's concurrent.futures marks the pool as permanently broken. After that point, every subsequent OpenLineage event fails with BrokenProcessPool and lineage data stops flowing indefinitely — only a scheduler restart recovers it.

Fix

submit_callable now catches BrokenProcessPool, shuts down the broken executor, creates a fresh one, and retries the submission. This makes the listener self-healing: lineage reporting recovers automatically without a scheduler restart.
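For readers who want the shape of the recovery path without opening the diff, here is a minimal sketch. It is not the code in listener.py: the class name `SelfHealingSubmitter`, the `executor_factory` parameter, and the log message are assumptions made for illustration. It relies only on the documented behaviour that `ProcessPoolExecutor.submit` raises `BrokenProcessPool` once the pool is marked broken.

```python
import logging
from concurrent.futures import Executor, Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


class SelfHealingSubmitter:
    """Hypothetical stand-in for the listener; not the provider's actual class."""

    def __init__(self, executor_factory: Callable[[], Executor] = ProcessPoolExecutor):
        # The factory is injectable (an assumption of this sketch) so the
        # recovery path can be exercised without spawning real processes.
        self._executor_factory = executor_factory
        self._executor: Optional[Executor] = None

    @property
    def executor(self) -> Executor:
        # Create the pool lazily, and again after it has been discarded.
        if self._executor is None:
            self._executor = self._executor_factory()
        return self._executor

    def submit_callable(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        try:
            return self.executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            log.warning("Process pool is broken; recreating it and retrying the submission.")
            # A broken pool never recovers, so release it without waiting
            # and drop the reference so the property builds a fresh one.
            self.executor.shutdown(wait=False)
            self._executor = None
            # Retry once on the fresh pool; a second failure propagates.
            return self.executor.submit(fn, *args, **kwargs)
```

The retry is deliberately attempted only once, so a pool that breaks again immediately surfaces the error instead of looping. Note that this only covers submissions made after the pool is already broken; a future that was pending when the worker died still completes with `BrokenProcessPool`.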

Changes

  • listener.py: catch BrokenProcessPool in submit_callable, recreate the executor, and retry
  • test_listener.py: add test_submit_callable_recreates_executor_on_broken_pool that verifies the broken pool is shut down, a new executor is created, and the submission is retried
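The three assertions described for the new test can be sketched with `unittest.mock` as follows. This is an illustration under stated assumptions, not the test added in test_listener.py: `ListenerStub` and its `make_executor` argument are hypothetical and exist only so the example is self-contained.

```python
from concurrent.futures.process import BrokenProcessPool
from unittest import mock


class ListenerStub:
    """Hypothetical minimal stand-in for the listener under test."""

    def __init__(self, make_executor):
        self._make_executor = make_executor
        self._executor = make_executor()

    def submit_callable(self, fn, *args, **kwargs):
        try:
            return self._executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            # Discard the broken pool, build a new one, retry once.
            self._executor.shutdown(wait=False)
            self._executor = self._make_executor()
            return self._executor.submit(fn, *args, **kwargs)


def test_submit_callable_recreates_executor_on_broken_pool():
    # First executor behaves like a pool whose worker has died.
    broken = mock.MagicMock(name="broken_executor")
    broken.submit.side_effect = BrokenProcessPool("worker died")
    # Second executor accepts the retried submission.
    fresh = mock.MagicMock(name="fresh_executor")
    factory = mock.MagicMock(side_effect=[broken, fresh])

    listener = ListenerStub(factory)
    emit = mock.MagicMock(name="emit")
    result = listener.submit_callable(emit, "event")

    # 1. The broken pool is shut down.
    broken.shutdown.assert_called_once_with(wait=False)
    # 2. A new executor is created (initial creation plus one recreation).
    assert factory.call_count == 2
    # 3. The submission is retried on the new executor.
    fresh.submit.assert_called_once_with(emit, "event")
    assert result is fresh.submit.return_value
```

Using mocks keeps the test fast and deterministic, since it avoids killing a real worker process to trigger the broken state.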

Test Plan

  • New unit test passes
  • All existing OpenLineage listener unit tests pass (26 passed, 35 skipped, 0 failed)

Closes #67283

When a child process in the OpenLineage listener's ProcessPoolExecutor
terminates abruptly, concurrent.futures marks the pool as permanently
broken. Every subsequent submission raises BrokenProcessPool and lineage
data stops flowing until the scheduler is restarted.

This adds self-healing: submit_callable now catches BrokenProcessPool,
shuts down the broken executor, creates a fresh one, and retries the
submission so lineage reporting recovers automatically.

Closes apache#67283
Anmol Mishra added 2 commits May 24, 2026 12:17
ruff I001: `from concurrent.futures.process import BrokenProcessPool`
must follow `from concurrent.futures import ProcessPoolExecutor`

The warning message fits within the line-length limit so it should not be
split across three lines.
@potiuk added the "ready for maintainer review" label May 24, 2026

Labels

area:providers, provider:openlineage, AIP-53, ready for maintainer review (set after triaging when all criteria pass)

Development

Successfully merging this pull request may close these issues.

OpenLineage listener's ProcessPoolExecutor becomes permanently broken

2 participants