420f8cb to e4530c9
Codecov Report
@@            Coverage Diff            @@
##           master   #2067    +/-   ##
=========================================
- Coverage   67.54%     67%   -0.55%
=========================================
  Files         139     140      +1
  Lines       10581   10625     +44
=========================================
- Hits         7147    7119     -28
- Misses       3434    3506     +72
d9c00a0 to 0e0cd7e
airflow/executors/__init__.py
Shouldn't we handle this the same way as the MesosExecutor (the CeleryExecutor should be updated as well), i.e. make loading the module depend on the configuration rather than on whether the module is available?
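A minimal sketch of the configuration-driven loading suggested here, not the code from this PR. The registry, the executor names, and `load_executor_class` are hypothetical, and standard-library classes stand in for real executor modules so the example runs without optional dependencies.

```python
import importlib

# Hypothetical registry: configured name -> (module path, class name).
# Standard-library classes are stand-ins for real executor modules.
EXECUTOR_REGISTRY = {
    "ThreadExecutor": ("concurrent.futures", "ThreadPoolExecutor"),
    "ProcessExecutor": ("concurrent.futures", "ProcessPoolExecutor"),
}


def load_executor_class(configured_name):
    """Import only the executor that the configuration names."""
    try:
        module_path, class_name = EXECUTOR_REGISTRY[configured_name]
    except KeyError:
        raise ValueError("Unknown executor: {}".format(configured_name))
    # The import happens here rather than at module load time, so an
    # optional dependency is needed only when it is actually selected.
    # A missing dependency then raises ImportError instead of being
    # silently skipped.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


executor_class = load_executor_class("ThreadExecutor")
```

The trade-off is that a misconfigured executor fails loudly at load time, whereas availability-based loading hides the problem until the executor is first used.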
tests/executors/dask_executor.py
Please add some real unit tests (these are integration tests) that test the functions and their expected inputs/outputs.
Added tests for both the Executor functions and also the process of manually stepping through running TaskInstances (simulating a job)
airflow/executors/dask_executor.py
Does Dask not support queues? Then this should be noted in the documentation and it should be logged as a warning if the queue is anything different from the default.
Updated to issue a warning
Nice. Please also add it to the documentation.
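The warning discussed in this thread could be sketched as follows. This is an illustration under assumptions, not the code from the PR: `DEFAULT_QUEUE`, its value, and `check_queue` are hypothetical names.

```python
import logging

log = logging.getLogger(__name__)

# Assumed name of the default queue; the real value would come from
# the project's configuration.
DEFAULT_QUEUE = "default"


def check_queue(queue=None):
    """Return True if the queue can be honoured, else warn and return False."""
    if queue not in (None, DEFAULT_QUEUE):
        log.warning(
            "This executor does not support queues; ignoring queue %r.", queue
        )
        return False
    return True
```

An executor's task-submission method could call `check_queue(queue)` before handing the task to the cluster, so that a non-default queue is reported rather than silently dropped.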
@bolkedebruin thanks for the comments -- I've updated the PR, let me know if it's in line.
tests/executors/dask_executor.py
Is that required? This way it still "breathes" integration test rather than unit test.
I hear you -- here are the full details:
- The function is being executed in the cluster, so if we proceed immediately it might not have time to complete.
- We could check the future in a while loop until it completes, but we'd have to wrap that in a timeout just in case there is an error communicating with the cluster.
- Therefore, my thought was to skip the while loop and just block for a moment to see if the future completes.
However, for completeness I'll put that in the code; better to be explicit! One more commit coming.
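The polling approach described in this reply might look roughly like the sketch below. It is not the test code from the PR: `wait_until_done` is a hypothetical helper, and a standard-library thread pool stands in for the cluster. The helper relies only on the future's `done()` and `result()` methods.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_until_done(future, timeout=5.0, poll_interval=0.05):
    """Poll future.done() until it completes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not future.done():
        if time.monotonic() >= deadline:
            # Guards against hanging forever if the cluster is unreachable.
            raise TimeoutError(
                "future did not complete within {} s".format(timeout)
            )
        time.sleep(poll_interval)
    return future.result()


# A local thread pool stands in for the cluster in this sketch.
with ThreadPoolExecutor(max_workers=1) as pool:
    answer = wait_until_done(pool.submit(lambda: 21 * 2))
```

Compared with a fixed sleep, the loop returns as soon as the work finishes and fails with an explicit error when it does not.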
I'm still a bit picky on the unit tests; the rest, apart from the doc update, LGTM.
0a3ad80 to b920d5e
Adds a DaskExecutor for running Airflow tasks in Dask clusters.
b920d5e to 2dddb2a
@jlowin I just noticed that codecov reported decreased coverage and travis is reporting this. Can you fix that please?
Shoot -- it was the change we made to Submitting a fix immediately.
The coverage issue appears to be from the same change, which moved the import statements for "non-standard" executors from the top of the file to inside a block that first checks airflow.cfg. I think what this means is that there have just never been unit tests for the Celery Executor, so it never gets imported (and therefore coverage drops).
@jlowin Correct, celery does not have coverage. This is one of the reasons why I would like to really start enforcing both unit tests and integration tests, rather than integration tests alone. Coverage drops because Dask is being added but tests are not. (I am not sure what you meant by your last statement.)
Adds a DaskExecutor for running Airflow tasks in Dask clusters. Closes apache#2067 from jlowin/dask-executor
@jlowin can distributed versions above v2 work? setup.py limits it to <2.
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
Description:
The Dask Distributed subproject makes it incredibly easy to create clusters of Python workers. Distributed is pure-Python, doesn't require an external database, has a built-in monitoring UI, and can be run anywhere from a laptop to thousands of networked cores. This Executor allows Airflow to execute tasks in a Dask cluster.
To quickly get started with a cluster, see the instructions in the Airflow configuration docs.
Testing Done: