Prevent sequential scan of task instance table when clearing dags #8014

Merged
kaxil merged 1 commit into apache:master on Mar 31, 2020

@robinedwards robinedwards commented Mar 30, 2020

I discovered that running airflow clear without --exclude_subdags performs a sequential scan of the task instance table. In our case that table is quite large, so the scan is time-consuming.

The actual subdag IDs are already to hand, so there is no need to use a LIKE here.
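
To illustrate the point above, here is a minimal sketch that compares the query plan for a LIKE filter with the plan for an equality filter on the same indexed column. It uses an in-memory SQLite database purely as a stand-in: the table layout, the index name ti_dag_id, and the helper query_plan are hypothetical, and this is not Airflow's actual query. Other databases plan LIKE differently, so treat it as a demonstration of the general idea rather than a reproduction of the reported behaviour.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail is needed.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for a task instance table.
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)
conn.execute("CREATE INDEX ti_dag_id ON task_instance (dag_id)")

# The full dag_id is known, so no wildcard is involved in either query.
dag_id = "parent.child"

# With SQLite's default settings, LIKE cannot use this index: full scan.
like_plan = query_plan(
    conn, "SELECT * FROM task_instance WHERE dag_id LIKE ?", (dag_id,)
)

# Equality on the same column can use the index.
eq_plan = query_plan(
    conn, "SELECT * FROM task_instance WHERE dag_id = ?", (dag_id,)
)

print("LIKE:", like_plan)
print("=   :", eq_plan)
```

In this sketch the LIKE plan is reported as a scan of the table, while the equality plan is a search using ti_dag_id. The exact wording of the plan text varies between SQLite versions.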

This code is already covered by the following unit test:
def test_subdag_clear_parentdag_downstream_clear(self):

The exact dag_id is known so no need to perform a like here which caused
a sequential scan.

kaxil commented Mar 30, 2020

Restarted the failing test

@codecov-io

Codecov Report

Merging #8014 into master will decrease coverage by 27.32%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #8014       +/-   ##
===========================================
- Coverage   87.19%   59.86%   -27.33%     
===========================================
  Files         933      932        -1     
  Lines       45289    45331       +42     
===========================================
- Hits        39490    27139    -12351     
- Misses       5799    18192    +12393     

Impacted Files                                          Coverage Δ
airflow/models/dag.py                                   91.68% <ø> (ø)
airflow/providers/amazon/aws/hooks/kinesis.py           0.00% <0.00%> (-100.00%) ⬇️
airflow/providers/apache/livy/sensors/livy.py           0.00% <0.00%> (-100.00%) ⬇️
airflow/providers/amazon/aws/sensors/redshift.py        0.00% <0.00%> (-100.00%) ⬇️
airflow/providers/postgres/operators/postgres.py        0.00% <0.00%> (-100.00%) ⬇️
airflow/providers/microsoft/azure/operators/adx.py      0.00% <0.00%> (-100.00%) ⬇️
...irflow/providers/amazon/aws/hooks/batch_waiters.py   0.00% <0.00%> (-100.00%) ⬇️
...ow/providers/amazon/aws/sensors/cloud_formation.py   0.00% <0.00%> (-100.00%) ⬇️
...w/providers/apache/hive/operators/mysql_to_hive.py   0.00% <0.00%> (-100.00%) ⬇️
...w/providers/snowflake/operators/s3_to_snowflake.py   0.00% <0.00%> (-100.00%) ⬇️
... and 321 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7790239...e5b8330.

@kaxil kaxil merged commit cfc0d6c into apache:master Mar 31, 2020
@kaxil kaxil added this to the Airflow 1.10.10 milestone Mar 31, 2020
kaxil pushed a commit that referenced this pull request Mar 31, 2020
The exact dag_id is known so no need to perform a like here which caused
a sequential scan.

(cherry picked from commit cfc0d6c)
kaxil pushed a commit that referenced this pull request Apr 1, 2020
The exact dag_id is known so no need to perform a like here which caused
a sequential scan.

(cherry picked from commit cfc0d6c)