-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1914 engine redesign dask #1990
Conversation
Codecov Report
@@ Coverage Diff @@
## 1914-engine-redesign #1990 +/- ##
======================================================
+ Coverage 99.8% 99.9% +0.2%
======================================================
Files 279 282 +3
Lines 22715 22969 +254
======================================================
+ Hits 22650 22931 +281
+ Misses 65 38 -27
Continue to review full report at Codecov.
|
fb11627
to
54f8700
Compare
@@ -0,0 +1,36 @@ | |||
from collections import namedtuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file exists to support parallel testing as Dask testing is super finicky about stuff not being defined at top level when being serialized out to workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate? In my "testing" on jupyter notebooks, the dask engine is able to send everything without a problem to the workers and the AutoMLData
and pipelines are defined within the AutoMLSearch
instance. I'm surprised it'd be different when it comes to running within pytest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten This is fantastic! The test_dask_engine
tests are very thorough. I think there are three tests I'd like to see in test_automl_dask
before merge. The other thing I want to discuss is whether the dask_testing
util file is actually needed. The rest are minor comments that I don't think would hold up a merge.
@@ -0,0 +1,36 @@ | |||
from collections import namedtuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate? In my "testing" on jupyter notebooks, the dask engine is able to send everything without a problem to the workers and the AutoMLData
and pipelines are defined within the AutoMLSearch
instance. I'm surprised it'd be different when it comes to running within pytest.
Dask Engine Testing.ipynb
Outdated
@@ -0,0 +1,297 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten please take this notebook off your branch before merging (or make it a docs demo!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I need to update it for use with this branch as I think its a design iteration behind, but just wanted to include it for convenience for now in case anyone wanted to mess with it
390abbf
to
f475365
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @chukarsten ! 👏
@pytest.fixture(autouse=True) | ||
def inject_fixtures(self, caplog): | ||
""" Gives the unittests access to the logger""" | ||
self._caplog = caplog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the caplog need to be an instance property? Won't that mean that the caplog is shared across tests? I guess I'm worried what would happen if we have two tests running in parallel, one checks that "error happened" is in caplog and the other checks that "error happened" is not in the caplog.
c227bb6
to
7056109
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look super solid! Great job with this one. I left a single comment on a typo, but looks good otherwise
dask_pipeline_fitted = pipeline_future.get_result() | ||
assert dask_pipeline_fitted._is_fitted | ||
|
||
# Verify parallelization has no affect on output of function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grazi
…to accomodate both styles.
… change to the train test.
7056109
to
611e3f6
Compare
* Sequential Engine implementation * Adding DaskEngine. * Fixing some unit tests. * Fixing unit tests. * Fixing test_callback for minimal dependencies. * Making logging work better for non-sequential engines. * Scattering data in DaskEngine. * Adding ray engine. * Linting ray engine. * Deleting code that should not be covered. * Deleting save error callback tests in automl. * Deleting unneeded lines from AutoMLSearch._should_continue. * Updating woodwork. * Fixing leak. * Use name instead of custom name. * sorting imports * 1914 engine redesign dask (#1990) * Added DaskEngine unit tests * Added AutoMLSearch unit tests utilizing Dask Engine * Removing callbacks from api ref. * Deleting un-used lines in test_automl.py * Adding test for JobLogger error and warning messages. * Fixing docstrings logging tabs * Delete ray engine. * Use engine to train best pipeline * Adding docstring explaining how SequentialComputation works * PR changes - Don't pop in AutoMLSearch while loop. Adding docstrings and sleep in automl search loop. * Linting * Updating latest_dependencies_version.txt to include dask. * Updated docstrings in dask_engine. * Cache rename. * Updated the name of the dask testing utils file. * Added cancel test. * Resend data if cancelled * Trigger Docs Co-authored-by: chukarsten <64713315+chukarsten@users.noreply.github.com> Co-authored-by: Karsten Chu <karsten.chu@alteryx.com>
PR to add some unit tests for the dask engine to the engine redesign.