Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add run_type to DagRun #8227

Merged
merged 5 commits into from
Jun 4, 2020

Conversation

turbaszek
Copy link
Member

@turbaszek turbaszek commented Apr 9, 2020

https://issues.apache.org/jira/browse/AIRFLOW-6242


Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Apr 9, 2020
@turbaszek turbaszek requested review from ashb and mik-laj April 9, 2020 10:15
@turbaszek turbaszek added area:MetaDB Meta Database related issues. area:performance and removed area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Apr 9, 2020
@boring-cyborg boring-cyborg bot added area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Apr 9, 2020
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dag.create_run() is a quasi-public API so we should probably mention this change in updating.md?

Looks good, just need to work out what is the best index to put on the new run_type column.

@ashb
Copy link
Member

ashb commented Apr 9, 2020

Oh and we should add the run_type column to the DAGRunModelView.

@turbaszek turbaszek force-pushed the add-run-type-to-dag-run branch 6 times, most recently from 6e2e65e to 01b339e Compare April 15, 2020 16:25
@codecov-io
Copy link

codecov-io commented Apr 17, 2020

Codecov Report

Merging #8227 into master will increase coverage by 0.01%.
The diff coverage is 14.63%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #8227      +/-   ##
=========================================
+ Coverage    6.22%   6.24%   +0.01%     
=========================================
  Files         946     941       -5     
  Lines       45695   45667      -28     
=========================================
+ Hits         2846    2851       +5     
+ Misses      42849   42816      -33     
Impacted Files Coverage Δ
airflow/api/common/experimental/mark_tasks.py 0.00% <ø> (ø)
airflow/api/common/experimental/trigger_dag.py 0.00% <0.00%> (ø)
airflow/jobs/backfill_job.py 0.00% <0.00%> (ø)
airflow/jobs/base_job.py 0.00% <ø> (ø)
airflow/jobs/scheduler_job.py 0.00% <0.00%> (ø)
airflow/operators/subdag_operator.py 0.00% <0.00%> (ø)
airflow/ti_deps/deps/dagrun_id_dep.py 69.23% <0.00%> (-2.20%) ⬇️
airflow/www/views.py 0.00% <0.00%> (ø)
airflow/models/dag.py 26.47% <11.11%> (-0.18%) ⬇️
airflow/models/dagrun.py 27.19% <33.33%> (+0.58%) ⬆️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9eaacbd...ed5618c. Read the comment docs.

dag_id
))
run_type = DagRunType.MANUAL
dag_run = dag_run.find(dag_id=dag_id, run_type=run_type, execution_date=execution_date)
Copy link
Member

@ashb ashb Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there is already a scheduled dag_run for this exact time but of a different run_type, this find() will fail to find anything, but the unique constraint (on exeuction_date, dag_id) would still be violated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it

airflow/models/dag.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Jun 1, 2020

@turbaszek Is it much work to run your benchmark again: DR.run_type != DagRunType.BACKFILL_JOB.value is something we do in the "hot" path of the scheduler, and the != isn't covered in your current benchmarks.

(It's hard for me to see the detail in the google sheet as in view only mode it won't even let me resize the columns. Thanks Google.)

Seriously? I've added edit access

Yeah, it's cos sometimes that makes sense cos all users share the same view (sorting, filtering, maybe you want to hide colums from some users etc). Annoying for like 80% of the time though!

@turbaszek
Copy link
Member Author

@turbaszek Is it much work to run your benchmark again: DR.run_type != DagRunType.BACKFILL_JOB.value is something we do in the "hot" path of the scheduler, and the != isn't covered in your current benchmarks.

I will run those tests today / tomorrow

@ashb
Copy link
Member

ashb commented Jun 2, 2020

@turbaszek Is it much work to run your benchmark again: DR.run_type != DagRunType.BACKFILL_JOB.value is something we do in the "hot" path of the scheduler, and the != isn't covered in your current benchmarks.

I will run those tests today / tomorrow

Cool, thanks. I don't expect this to make a difference, but if it's easy to check let's do it

@turbaszek
Copy link
Member Author

Assuming that we query all combination of (dag_id, run_id, state) + run_typ != Backfill we get

how many times given idex yielded the minimum time

 "Index('dag_id_state_type', dag_id, _state, run_type),": 0,
 "Index('dag_id_state', dag_id, _state),Index('dag_id_type', dag_id, run_type),": 3,
 "Index('dag_id_type', dag_id, run_type),": 2,
 "Index('dag_id_state', dag_id, _state),": 2

@ashb I've updated the spreadsheet, as expected there's nothing that would favor another index than the current one.

@turbaszek turbaszek force-pushed the add-run-type-to-dag-run branch 2 times, most recently from 21e288c to 3a018fd Compare June 3, 2020 12:08
@turbaszek
Copy link
Member Author

@kaxil @ashb I also addressed all your suggestions

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change, then LGTM!

(Oh plus fixing the failing test 😉)

airflow/www/views.py Outdated Show resolved Hide resolved
airflow/jobs/scheduler_job.py Outdated Show resolved Hide resolved
@turbaszek turbaszek requested a review from ashb June 3, 2020 17:41
@turbaszek
Copy link
Member Author

Thanks @ashb , your comments are addressed 👌

fixup! Add run_type to DagRun

fixup! fixup! Add run_type to DagRun

fixup! fixup! fixup! Add run_type to DagRun

fixup! fixup! fixup! Add run_type to DagRun

fixup! Add run_type to DagRun

fixup! Add run_type to DagRun

Adjust TriggerDagRunOperator

fixup! Adjust TriggerDagRunOperator

Add index

Make run_type not nullable

Add type check for run_type

fixup! Add type check for run_type
@turbaszek
Copy link
Member Author

@ashb @kaxil @mik-laj I've fixed the migration for MySQL and SQLite

@turbaszek turbaszek merged commit 533b143 into apache:master Jun 4, 2020
@turbaszek turbaszek deleted the add-run-type-to-dag-run branch June 4, 2020 14:20
@turbaszek
Copy link
Member Author

Thanks all!

@potiuk
Copy link
Member

potiuk commented Jun 4, 2020

🎉

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:MetaDB Meta Database related issues. area:performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants