Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1914 engine redesign dask #1990

Merged
merged 27 commits into from
Mar 24, 2021
Merged

Conversation

chukarsten
Copy link
Contributor

PR to add some unit tests for the dask engine to the engine redesign.

@codecov
Copy link

codecov bot commented Mar 17, 2021

Codecov Report

Merging #1990 (611e3f6) into 1914-engine-redesign (24de1e0) will increase coverage by 0.2%.
The diff coverage is 99.7%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           1914-engine-redesign   #1990     +/-   ##
======================================================
+ Coverage                  99.8%   99.9%   +0.2%     
======================================================
  Files                       279     282      +3     
  Lines                     22715   22969    +254     
======================================================
+ Hits                      22650   22931    +281     
+ Misses                       65      38     -27     
Impacted Files Coverage Δ
evalml/tests/automl_tests/dask_testing.py 98.2% <98.2%> (ø)
evalml/automl/engine/dask_engine.py 97.5% <100.0%> (+62.4%) ⬆️
evalml/tests/automl_tests/test_automl_dask.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_dask_engine.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
evalml/automl/automl_search.py 100.0% <0.0%> (+0.8%) ⬆️
evalml/automl/engine/engine_base.py 98.4% <0.0%> (+0.9%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24de1e0...611e3f6. Read the comment docs.

@chukarsten chukarsten force-pushed the 1914-engine-redesign-dask branch from fb11627 to 54f8700 Compare March 18, 2021 20:33
@@ -0,0 +1,36 @@
from collections import namedtuple
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file exists to support parallel testing as Dask testing is super finicky about stuff not being defined at top level when being serialized out to workers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? In my "testing" on jupyter notebooks, the dask engine is able to send everything without a problem to the workers and the AutoMLData and pipelines are defined within the AutoMLSearch instance. I'm surprised it'd be different when it comes to running within pytest.

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chukarsten This is fantastic! The test_dask_engine tests are very thorough. I think there are three tests I'd like to see in test_automl_dask before merge. The other thing I want to discuss is whether the dask_testing util file is actually needed. The rest are minor comments that I don't think would hold up a merge.

evalml/tests/automl_tests/dask_testing.py Outdated Show resolved Hide resolved
@@ -0,0 +1,36 @@
from collections import namedtuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? In my "testing" on jupyter notebooks, the dask engine is able to send everything without a problem to the workers and the AutoMLData and pipelines are defined within the AutoMLSearch instance. I'm surprised it'd be different when it comes to running within pytest.

evalml/tests/automl_tests/test_dask_engine.py Outdated Show resolved Hide resolved
evalml/tests/automl_tests/test_dask_engine.py Show resolved Hide resolved
evalml/automl/engine/dask_engine.py Show resolved Hide resolved
evalml/tests/conftest.py Show resolved Hide resolved
evalml/tests/automl_tests/test_dask_engine.py Outdated Show resolved Hide resolved
evalml/tests/automl_tests/test_dask_engine.py Show resolved Hide resolved
evalml/tests/automl_tests/test_dask_engine.py Show resolved Hide resolved
evalml/tests/automl_tests/test_dask_engine.py Outdated Show resolved Hide resolved
@@ -0,0 +1,297 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chukarsten please take this notebook off your branch before merging (or make it a docs demo!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I need to update it for use with this branch as I think its a design iteration behind, but just wanted to include it for convenience for now in case anyone wanted to mess with it

@chukarsten chukarsten force-pushed the 1914-engine-redesign-dask branch from 390abbf to f475365 Compare March 22, 2021 19:40
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @chukarsten ! 👏

@pytest.fixture(autouse=True)
def inject_fixtures(self, caplog):
""" Gives the unittests access to the logger"""
self._caplog = caplog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the caplog need to be an instance property? Won't that mean that the caplog is shared across tests? I guess I'm worried what would happen if we have two tests running in parallel, one checks that "error happened" is in caplog and the other checks that "error happened" is not in the caplog.

@chukarsten chukarsten force-pushed the 1914-engine-redesign-dask branch from c227bb6 to 7056109 Compare March 23, 2021 20:57
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look super solid! Great job with this one. I left a single comment on a typo, but looks good otherwise

dask_pipeline_fitted = pipeline_future.get_result()
assert dask_pipeline_fitted._is_fitted

# Verify parallelization has no affect on output of function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: effect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grazi

@chukarsten chukarsten force-pushed the 1914-engine-redesign-dask branch from 7056109 to 611e3f6 Compare March 24, 2021 04:02
@chukarsten chukarsten merged commit 5c5046e into 1914-engine-redesign Mar 24, 2021
freddyaboulton added a commit that referenced this pull request Apr 8, 2021
* Sequential Engine implementation

* Adding DaskEngine.

* Fixing some unit tests.

* Fixing unit tests.

* Fixing test_callback for minimal dependencies.

* Making logging work better for non-sequential engines.

* Scattering data in DaskEngine.

* Adding ray engine.

* Linting ray engine.

* Deleting code that should not be covered.

* Deleting save error callback tests in automl.

* Deleting unneeded lines from AutoMLSearch._should_continue.

* Updating woodwork.

* Fixing leak.

* Use name instead of custom name.

* sorting imports

* 1914 engine redesign dask (#1990)

* Added DaskEngine unit tests

* Added AutoMLSearch unit tests utilizing Dask Engine

* Removing callbacks from api ref.

* Deleting un-used lines in test_automl.py

* Adding test for JobLogger error and warning messages.

* Fixing docstrings logging tabs

* Delete ray engine.

* Use engine to train best pipeline

* Adding docstring explaining how SequentialComputation works

* PR changes - Don't pop in AutoMLSearch while loop. Adding docstrings and sleep in automl search loop.

* Linting

* Updating latest_dependencies_version.txt to include dask.

* Updated docstrings in dask_engine.

* Cache rename.

* Updated the name of the dask testing utils file.

* Added cancel test.

* Resend data if cancelled

* Trigger Docs

Co-authored-by: chukarsten <64713315+chukarsten@users.noreply.github.com>
Co-authored-by: Karsten Chu <karsten.chu@alteryx.com>
@freddyaboulton freddyaboulton deleted the 1914-engine-redesign-dask branch May 13, 2022 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants