New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-3392] Add index on dag_id in sla_miss table #4235

Merged
merged 1 commit into from Nov 27, 2018

Conversation

Projects
None yet
4 participants
@KevinYang21
Copy link
Contributor

KevinYang21 commented Nov 25, 2018

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    The select queries on sla_miss table produce a great % of DB traffic and thus made the DB CPU usage unnecessarily high. It would be a low hanging fruit to add an index and reduce the load. After all such queries that are issued by the DagFileProcessor should be optimized as much as possible.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Tested within Airbnb's setup. In Airbnb's setup, the # of rows scanned by this query is ~80% of all row scanned. Add such index reduced ~30% of the DB CPU usage and the cluster has been running healthy for 1 month +. Below is the CPU usage metrics.

cpu_diff

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes flake8

@KevinYang21 KevinYang21 force-pushed the KevinYang21:kevin_yang_add_sla_index branch 4 times, most recently from 5479b78 to 2d0ef26 Nov 25, 2018

@Fokko

This comment has been minimized.

Copy link
Contributor

Fokko commented Nov 25, 2018

flake8 runtests: commands[0] | flake8
./airflow/migrations/versions/03bc53e68815_add_sm_dag_index.py:9:2: W291 trailing whitespace
./airflow/migrations/versions/03bc53e68815_add_sm_dag_index.py:11:2: W291 trailing whitespace
./airflow/migrations/versions/03bc53e68815_add_sm_dag_index.py:33:1: E402 module level import not at top of file
@KevinYang21

This comment has been minimized.

Copy link
Contributor Author

KevinYang21 commented Nov 25, 2018

@Fokko Ty for taking a look! Was retrying the CI and not look into the actually failures as it has been pretty flaky in the past two days. Updated the style and will later on put a proper description( with performance improvement measurement) and remove the WIP in the title.

[AIRFLOW-3392] Add index on dag_id in sla_miss table
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

@KevinYang21 KevinYang21 force-pushed the KevinYang21:kevin_yang_add_sla_index branch from 2d0ef26 to 264a52d Nov 25, 2018

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 26, 2018

Codecov Report

Merging #4235 into master will increase coverage by <.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4235      +/-   ##
==========================================
+ Coverage   77.81%   77.82%   +<.01%     
==========================================
  Files         201      201              
  Lines       16366    16367       +1     
==========================================
+ Hits        12736    12737       +1     
  Misses       3630     3630
Impacted Files Coverage Δ
airflow/models.py 92.25% <50%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 930e2ff...264a52d. Read the comment docs.

@KevinYang21 KevinYang21 changed the title [WIP][AIRFLOW-3392] Add index on dag_id in sla_miss table [AIRFLOW-3392] Add index on dag_id in sla_miss table Nov 27, 2018

@KevinYang21

This comment has been minimized.

Copy link
Contributor Author

KevinYang21 commented Nov 27, 2018

@feng-tao

This comment has been minimized.

Copy link
Contributor

feng-tao commented Nov 27, 2018

lgtm. thanks @KevinYang21

@feng-tao feng-tao merged commit 0ac1e63 into apache:master Nov 27, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

kaxil added a commit that referenced this pull request Nov 30, 2018

[AIRFLOW-3392] Add index on dag_id in sla_miss table (#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

elizabethhalper added a commit to cse-airflow/incubator-airflow that referenced this pull request Dec 7, 2018

[AIRFLOW-3392] Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

ashb added a commit to ashb/airflow that referenced this pull request Dec 15, 2018

[AIRFLOW-3392] Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

ashb added a commit to ashb/airflow that referenced this pull request Dec 15, 2018

[AIRFLOW-3392] Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019

[AIRFLOW-3392] Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

kaxil added a commit that referenced this pull request Jan 9, 2019

[AIRFLOW-3392] Add index on dag_id in sla_miss table (#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

ashb added a commit to ashb/airflow that referenced this pull request Jan 10, 2019

[AIRFLOW-3392] Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.

cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Jan 23, 2019

Add index on dag_id in sla_miss table (apache#4235)
The select queries on sla_miss table produce a great % of DB traffic and
thus made the DB CPU usage unnecessarily high. It would be a low hanging
fruit to add an index and reduce the load.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment