@vatsrahul1001
Contributor

During DAG serialization, write_dag uses joinedload to check if any task
instances exist for a DAG version. This loads all task instances into memory
just to answer a boolean question.

For long-running deployments with many DAG runs, this can cause high memory
usage during serialization.

This PR optimizes the check by using an EXISTS query instead, which has
constant memory usage regardless of how many task instances exist.
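
To illustrate the difference at the SQL level, here is a minimal sketch using the standard-library sqlite3 module. It is not the Airflow code: the simplified task_instance table and both helper functions are hypothetical names chosen for illustration. The first helper materializes every matching row to answer a yes/no question; the second asks the database for a single EXISTS value.

```python
import sqlite3

# Hypothetical, simplified schema: one row per task instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, dag_version_id INTEGER)"
)
conn.executemany(
    "INSERT INTO task_instance (dag_version_id) VALUES (?)",
    [(1,) for _ in range(10_000)],
)


def has_task_instances_by_loading(conn, dag_version_id):
    # Pulls every matching row into Python just to test for emptiness,
    # so memory grows with the number of task instances.
    rows = conn.execute(
        "SELECT id, dag_version_id FROM task_instance WHERE dag_version_id = ?",
        (dag_version_id,),
    ).fetchall()
    return len(rows) > 0


def has_task_instances_by_exists(conn, dag_version_id):
    # The database returns a single 0/1 value regardless of how many rows match.
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM task_instance WHERE dag_version_id = ?)",
        (dag_version_id,),
    ).fetchone()
    return bool(found)
```

Both helpers give the same answer; only the amount of data transferred and held in memory differs.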

@vatsrahul1001 force-pushed the optimize/write-dag-exists-query branch 2 times, most recently from 8f6d407 to a40e17c, on January 22, 2026 13:31
@vatsrahul1001 force-pushed the optimize/write-dag-exists-query branch from a40e17c to 7b46446 on January 22, 2026 14:10
@vatsrahul1001 changed the title from "Optimize DAG serialization by replacing joinedload with EXISTS query" to "Optimize DAG serialization" on Jan 22, 2026
@jedcunningham
Member

I've been profiling this change locally. Here are the results with 600k TIs for the same dag version.

Before:
Screenshot 2026-01-22 at 12 20 52 PM

After:
Screenshot 2026-01-22 at 12 09 02 PM

@vatsrahul1001 changed the title from "Optimize DAG serialization" to "Fix DAG processor OOM: Avoid loading all TaskInstances when checking DagVersion in write_dag" on Jan 22, 2026
@vatsrahul1001 changed the title from "Fix DAG processor OOM: Avoid loading all TaskInstances when checking DagVersion in write_dag" to "Fix DAG processor OOM || Avoid loading all TaskInstances when checking DagVersion in write_dag" on Jan 22, 2026
@ephraimbuddy added this to the Airflow 3.1.7 milestone on Jan 22, 2026
@jedcunningham requested a review from Copilot on January 22, 2026 20:40
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the DAG serialization process to prevent out-of-memory (OOM) issues in the DAG processor. The change replaces an inefficient joinedload operation that loads all TaskInstances into memory with a lightweight EXISTS query that uses constant memory, regardless of the number of task instances.

Changes:

  • Replaced joinedload(DagVersion.task_instances) with an exists() query to check for task instance existence
  • Added explicit TaskInstance import to support the new query pattern
  • Removed unused joinedload import from sqlalchemy.orm

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@vatsrahul1001 added the backport-to-v3-1-test label on Jan 23, 2026
@vatsrahul1001 merged commit 235595b into apache:main on Jan 23, 2026
71 checks passed
@vatsrahul1001 deleted the optimize/write-dag-exists-query branch on January 23, 2026 04:24
@github-actions

Backport failed to create: v3-1-test. View the failure log (Run details).

Status Branch Result
v3-1-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 235595b v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

vatsrahul1001 added a commit that referenced this pull request Jan 23, 2026
…g DagVersion in write_dag (#60937)

Fix DAG processor OOM || Avoid loading all TaskInstances when checking DagVersion in write_dag (#60937)

(cherry picked from commit 235595b)
@vatsrahul1001
Contributor Author

Auto backport failed.

Manual backport to v3-1-test #60962

dheerajturaga pushed a commit that referenced this pull request Jan 23, 2026
suii2210 pushed a commit to suii2210/airflow that referenced this pull request Jan 26, 2026
…g DagVersion in write_dag (apache#60937)

Fix DAG processor OOM || Avoid loading all TaskInstances when checking DagVersion in write_dag (apache#60937)

Labels

area:serialization, backport-to-v3-1-test
