Dynamic task mapping increases DAG File Processing time #36454
-
Is it possible to do some more monitoring and see what happens while you are running the backfill? I think the slowdown may be caused not by parsing time but by excessive database queries. An optimisation that is just about to be merged limits the number of database queries issued during a backfill, and since the number of queries is related to the number of tasks being backfilled, it would likely match what you are observing: #36418 (review). Can you please track the DB queries executed during the backfill and see whether that is the cause?
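One rough way to do that tracking is sketched below. It assumes SQLAlchemy's standard behaviour of emitting each SQL statement as an INFO record on the `sqlalchemy.engine` logger hierarchy. The names `QueryCounter` and `install_query_counter` are made up for illustration and are not part of Airflow or SQLAlchemy, and where you install the handler in your deployment is left open.

```python
import logging
from collections import Counter

# Statement types worth counting; transaction markers such as BEGIN/COMMIT
# and the parameter lines SQLAlchemy logs separately are ignored.
SQL_VERBS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


class QueryCounter(logging.Handler):
    """Hypothetical helper: counts logged SQL statements by their first keyword."""

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.counts = Counter()

    def emit(self, record):
        # Use the raw message; SQL text may contain '%' characters.
        words = str(record.msg).split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in SQL_VERBS:
            self.counts[verb] += 1


def install_query_counter(logger_name="sqlalchemy.engine"):
    """Attach a QueryCounter to the given logger and enable INFO logging on it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    counter = QueryCounter()
    logger.addHandler(counter)
    return counter
```

Comparing `counter.counts` for a small backfill against a large one should show whether the query volume grows with the number of mapped tasks. Engines created before the log level was raised may not pick up the change, so installing the counter early is the safer assumption.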
-
I also created #36483, targeted for release in 2.8.1, which adds largely the same explanation I gave above to our documentation.
-
Since we started using dynamic task mapping (DTM), we have had a backfill of roughly 100k tasks that we want to re-run using DTM. Normally, this DAG parses in 1-3 seconds. There are some legacy slow top-level imports which are hard to remove.
However, when trying to run the 100k tasks, the scheduler grinds to a halt: DAG file processing time increases massively for the DAGs with lots of tasks, to the point of timeouts. We've increased the timeout to 10 minutes, but even that isn't enough; we still get timeouts, and no tasks are being scheduled.
Looking at the processor, I'm not seeing things that would scale with TaskInstances (no SLA callbacks). The metadata db health is fine. DAG pickling is turned off.
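To separate pure import time from whatever else the processor does, a minimal sketch like the following can time the DAG file on its own. It only measures executing the file's top-level code with the standard library. It does not reproduce Airflow's DAG file processor, and `time_dag_import` is a hypothetical helper name.

```python
import importlib.util
import sys
import time
from pathlib import Path


def time_dag_import(path):
    """Execute a Python file as a throwaway module and return elapsed seconds."""
    path = Path(path)
    module_name = f"_timed_dag_{path.stem}"
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    try:
        # Runs all top-level code, including any slow imports.
        spec.loader.exec_module(module)
    finally:
        # Make sure the throwaway module does not linger in the module cache.
        sys.modules.pop(module_name, None)
    return time.perf_counter() - start
```

If this stays in the 1-3 second range while the reported DAG file processing time is in minutes, the extra time is being spent outside the import itself. Modules imported by the file stay cached in the interpreter, so only the first call in a fresh process reflects the cold import cost.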
I lack knowledge of Airflow internals here. What could cause DAG file processing time to increase so much, and how do I remedy this?
Our Airflow version is 2.7.3.