Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4591] Make default_pool a real pool #5349

Merged
merged 1 commit into from Jun 20, 2019

Conversation

milton0825
Copy link
Contributor

@milton0825 milton0825 commented May 31, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

tl;dl non_pooled_task_slot_count and non_pooled_backfill_task_slot_count
are removed in favor of a real pool, e.g. default_pool.

Currently the number of running tasks that do not have a pool specified will be limited by non_pooled_task_slot_count or non_pooled_backfill_task_slot_count depends on the whether the task is triggered by scheduler or backfill. Conceptually it is a pool with slots specified through configuration but with a bit different implementation. This PR propose to make this pool a real pool, e.g. using the common implementation of pool so that the implementation can be simplified.

By default tasks are running in default_pool.
default_pool is initialized with 128 slots and user can change the
number of slots through UI/CLI. default_pool cannot be removed.

Screen Shot 2019-05-30 at 6 23 23 PM

Screen Shot 2019-05-30 at 6 26 15 PM

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@milton0825 milton0825 force-pushed the airflow-default-pool branch 2 times, most recently from d8608e1 to a0b6cc6 Compare May 31, 2019 03:35
@codecov-io
Copy link

Codecov Report

Merging #5349 into master will decrease coverage by <.01%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5349      +/-   ##
==========================================
- Coverage   78.94%   78.93%   -0.01%     
==========================================
  Files         480      480              
  Lines       30153    30160       +7     
==========================================
+ Hits        23803    23807       +4     
- Misses       6350     6353       +3
Impacted Files Coverage Δ
airflow/www/views.py 75.51% <0%> (-0.22%) ⬇️
airflow/jobs/scheduler_job.py 70.18% <100%> (-0.14%) ⬇️
airflow/models/baseoperator.py 93.98% <100%> (+0.01%) ⬆️
airflow/models/pool.py 97.05% <100%> (+0.08%) ⬆️
airflow/jobs/backfill_job.py 91.41% <100%> (-0.16%) ⬇️
airflow/utils/db.py 90.82% <100%> (+0.72%) ⬆️
airflow/api/common/experimental/pool.py 100% <100%> (ø) ⬆️
airflow/models/taskinstance.py 92.61% <0%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 669b026...a0b6cc6. Read the comment docs.

@codecov-io
Copy link

codecov-io commented May 31, 2019

Codecov Report

Merging #5349 into master will decrease coverage by 0.06%.
The diff coverage is 88.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5349      +/-   ##
==========================================
- Coverage   79.12%   79.05%   -0.07%     
==========================================
  Files         488      485       -3     
  Lines       30550    30356     -194     
==========================================
- Hits        24172    23999     -173     
+ Misses       6378     6357      -21
Impacted Files Coverage Δ
airflow/www/views.py 75.7% <0%> (-0.23%) ⬇️
airflow/jobs/scheduler_job.py 70.18% <100%> (-0.14%) ⬇️
airflow/models/baseoperator.py 93.98% <100%> (-0.44%) ⬇️
airflow/models/pool.py 97.05% <100%> (+0.08%) ⬆️
airflow/models/taskinstance.py 93.02% <100%> (ø) ⬆️
airflow/jobs/backfill_job.py 91.41% <100%> (-0.16%) ⬇️
airflow/utils/db.py 90.9% <100%> (+0.81%) ⬆️
airflow/api/common/experimental/pool.py 100% <100%> (ø) ⬆️
airflow/operators/slack_operator.py 97.36% <0%> (-2.64%) ⬇️
airflow/contrib/operators/hipchat_operator.py 97.56% <0%> (-2.44%) ⬇️
... and 60 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c8c0b3...4f4f24b. Read the comment docs.

@milton0825
Copy link
Contributor Author

@ashb thanks for the review. This PR is ready for another look.

@milton0825
Copy link
Contributor Author

PTAL @ashb @feng-tao @potiuk

@ashb
Copy link
Member

ashb commented Jun 13, 2019

Sorry - taking another look now. Was rather swamped after vacation and forgot to come back to this.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the code now, but would like others to take a look and see if they agree with the feature.

Is it worth adding a migration to the TaskInstance table to make Pool a non-nullable column too?

airflow/utils/db.py Outdated Show resolved Hide resolved
@milton0825
Copy link
Contributor Author

I think it doesn't hurt to restrict the field to nonnull

@feng-tao
Copy link
Member

I will take a look at the pr next Monday if the pr is still open.

@milton0825 milton0825 force-pushed the airflow-default-pool branch 2 times, most recently from 9330537 to f6a31c7 Compare June 15, 2019 07:15
@milton0825
Copy link
Contributor Author

Ready for another look @ashb @feng-tao @XD-DENG

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good now. Will give others a day or two to chime in.

We could possibly pull this in to 1.10.4 too - it doesn't change any behaviour in a serious way I think?

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good work @milton0825

@milton0825
Copy link
Contributor Author

@ashb This PR only changes how user configure the default pool.

@feng-tao
Copy link
Member

@milton0825 could you rebase with master to resolve the conflict? Will merge it once the CI is clear.

`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.
@milton0825
Copy link
Contributor Author

@feng-tao I have resolved the conflicts and now CI is green.

@feng-tao feng-tao merged commit 2c99ec6 into apache:master Jun 20, 2019
@feng-tao
Copy link
Member

ship

@milton0825 milton0825 deleted the airflow-default-pool branch June 20, 2019 17:18
@ashb
Copy link
Member

ashb commented Jun 21, 2019

A fair few conflicts - so this might not make it in to 1.10.4 (I haven't looked how difficult they are to resolve, just that there ~10 files and I don't want to deal with that last thing on a Friday afternoon)

Resolved them - they weren't too bad once I looked at it. This will go in to 1.10.4 (as will the pool stats PR)

ashb pushed a commit that referenced this pull request Jun 26, 2019
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.

(cherry picked from commit 2c99ec6)
ashb pushed a commit to ashb/airflow that referenced this pull request Jun 26, 2019
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.

(cherry picked from commit 2c99ec6)
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.
@vardancse
Copy link
Contributor

Thanks @milton0825 for working on this. 6 days back I discovered the need and now I see this has already been addressed 👍

dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default tasks are running in `default_pool`.
`default_pool` is initialized with 128 slots and user can change the
number of slots through UI/CLI. `default_pool` cannot be removed.

(cherry picked from commit 2c99ec6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants