fix: Service IPC sockets during dag bundle refresh to prevent child process blocking#65370

Open
rahulchheda wants to merge 1 commit into apache:main from rahulchheda:fix/dag-processor-ipc-bottleneck

@rahulchheda

What

Service IPC sockets in a background thread during _refresh_dag_bundles() so child parsing processes don't block on Variable.get() / Connection.get() calls.

Why

In _run_parsing_loop(), IPC is only serviced once per iteration via _service_processor_sockets(). While the main loop executes _refresh_dag_bundles() and other operations, child processes that call Variable.get() block on socket.recv() waiting for the parent to respond. With many DAG files and multiple parsing processes, this turns sub-second parses into multi-minute waits.
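
The blocking pattern described above can be reproduced in miniature. This is a sketch under stated assumptions, not Airflow's code: a thread stands in for the child parsing process, a `socket.socketpair()` stands in for the IPC channel, and `service_sockets_once()` is a hypothetical stand-in for a single `_service_processor_sockets()` pass. The wire format (`b"get my_var"`) is invented for illustration.

```python
import socket
import threading
import time

parent_sock, child_sock = socket.socketpair()
result = {}


def child():
    # Stand-in for a parsing process calling Variable.get(): send a request,
    # then block in recv() until the parent replies.
    start = time.monotonic()
    child_sock.sendall(b"get my_var")
    result["value"] = child_sock.recv(1024)
    result["waited"] = time.monotonic() - start


def service_sockets_once():
    # Hypothetical stand-in for one IPC servicing pass in the parent.
    request = parent_sock.recv(1024)
    parent_sock.sendall(b"value-for:" + request.split()[-1])


t = threading.Thread(target=child)
t.start()
time.sleep(0.3)          # parent is busy in a slow step, e.g. a bundle refresh
service_sockets_once()   # only now is the child's request answered
t.join()
parent_sock.close()
child_sock.close()
```

The child's wait in `result["waited"]` tracks how long the parent was busy, not how long the lookup itself takes. That is the effect the PR attributes to servicing IPC only once per loop iteration.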

This is a regression from Airflow 2.x, where Variable.get() was a direct DB query with no IPC dependency.

For comparison, WatchedSubprocess._monitor_subprocess() in the task execution path has a tight loop that continuously services IPC — the dag-processor lacks this.

How

  • Added a _ipc_service_thread() context manager that runs _service_processor_sockets() in a daemon thread with a 100ms poll interval
  • Wrapped _refresh_dag_bundles() with this context manager — the longest-running phase where child processes are likely to be blocked
  • Thread is automatically stopped when the context exits, before the main loop continues with its own _service_processor_sockets() call
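
The three points above can be sketched as a small context manager. This is a minimal illustration, not the code in this PR. The name `ipc_service_thread`, its parameters, and the two `fake_*` functions are hypothetical stand-ins for the manager's `_ipc_service_thread()`, `_service_processor_sockets()`, and `_refresh_dag_bundles()`.

```python
import contextlib
import threading
import time


@contextlib.contextmanager
def ipc_service_thread(service_once, poll_interval=0.1):
    """Call ``service_once`` repeatedly in a daemon thread until the block exits."""
    stop = threading.Event()

    def _loop():
        # Event.wait() doubles as the poll sleep and the stop check.
        while not stop.is_set():
            service_once()
            stop.wait(poll_interval)

    thread = threading.Thread(target=_loop, name="ipc-service", daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        # Join so no background servicing overlaps with the main loop's own pass.
        thread.join()


# Hypothetical stand-ins for the manager's methods, for illustration only.
calls = []


def fake_service_processor_sockets():
    calls.append(time.monotonic())


def fake_refresh_dag_bundles():
    time.sleep(0.35)  # simulate a slow bundle refresh


with ipc_service_thread(fake_service_processor_sockets):
    fake_refresh_dag_bundles()

calls_during_refresh = len(calls)
```

Joining the thread in `finally` is what keeps the background servicing and the main loop's own servicing call from running at the same time. Whether the real implementation joins or relies on some other synchronization is not stated in the description.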

The change is minimal and non-invasive — existing IPC servicing in the main loop is untouched.

Closes #65369



…locking

Child parsing processes that call Variable.get() during DAG parsing
block on socket.recv() waiting for the parent DagFileProcessorManager
to service the IPC request. However, IPC is only serviced once per
main loop iteration in _service_processor_sockets(). During
_refresh_dag_bundles() and other main loop operations, child processes
are completely blocked — turning sub-second parses into multi-minute
waits.

This adds a background thread that continuously services IPC sockets
while _refresh_dag_bundles() runs, ensuring child processes get
immediate responses to Variable.get() and Connection.get() calls.

Closes apache#65369
@boring-cyborg

boring-cyborg bot commented Apr 16, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Development

Successfully merging this pull request may close these issues.

DagFileProcessorManager IPC bottleneck: child processes block on Variable.get() during parsing
