-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AutoML & CFEngine Convenience #2667
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2667 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 301 301
Lines 27688 27786 +98
=======================================
+ Hits 27639 27737 +98
Misses 49 49
Continue to review full report at Codecov.
|
fe44132
to
7ae4eb8
Compare
26ef204
to
521d6b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chukarsten ! Looks great. I left some minor comments I'd like to resolve before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
7f684ca
to
4e3a68c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code and tests look great! Just left a lot of nit-picky comments on documentation (per usual) 😁
e5f4ff4
to
abb7378
Compare
@@ -79,3 +79,6 @@ def submit_scoring_job(self, automl_config, pipeline, X, y, objectives): | |||
) | |||
computation.meta_data["pipeline_name"] = pipeline.name | |||
return computation | |||
|
|||
def close(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just allows the SequentialEngine to not error out when automl is called to shutdown its engines.
@@ -117,7 +117,7 @@ def new(self, parameters, random_seed=0): | |||
def clone(self): | |||
return self.__class__(self.parameters, random_seed=self.random_seed) | |||
|
|||
@delayed(15) | |||
@delayed(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to cutdown runtime.
@@ -18,294 +13,261 @@ | |||
) | |||
from evalml.tuners import SKOptTuner | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Process-level parallelism is still an issue on CI.
) | ||
@pytest.mark.parametrize( | ||
"engine_str", | ||
engine_strs + ["cf_process"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to minimize the execution of the process engines, but need to make sure I get coverage on the special process closing code.
with Client(cluster) as client: | ||
engine = DaskEngine(client=client) | ||
|
||
with DaskEngine() as engine: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New use of the context manager for DaskEngine to ensure the resource allocated are deallocated before the next test is run. This also prevents the pesky "Port # 8787 is occupied" style warnings. It improves test runtime too.
@@ -221,7 +221,7 @@ def X_y_binary(): | |||
return X, y | |||
|
|||
|
|||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to try and jibe with the new Sequential engine fixture to cut down on runtime and prevent the SequentialEngine from running more than once by using the cached results from the fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet - did it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chukarsten ! This is fantastic. I love the updates to the user guide as well. I left some comments/suggestions for possible follow-up issues but nothing blocking merge!
def _get_engine_support(parallel_engine_type, thread_pool, cluster): | ||
"""Helper function to return the proper combination of resource pool, client class and | ||
engine class for testing purposes. | ||
def sequential_results(X_y_binary_cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only used once?
@property | ||
def is_closed(self): | ||
"""Property that determines whether the Engine's Client's resources are shutdown.""" | ||
return self.cluster.status.value == "closed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check if the client is closed? I guess what wondering what if client.close()
throws an exception but cluster.close()
succeeds. Idk if the engine is properly closed at that point.
@@ -221,7 +221,7 @@ def X_y_binary(): | |||
return X, y | |||
|
|||
|
|||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet - did it work?
Addresses #2561
The point of this PR is to enable AutoMLSearch to be conveniently parallelized using Dask and concurrent.futures as well as make the CFEngine a bit more convenient to use by specifying better default actions. AutoMLSearch can now accept "cf_threaded", "dask_threaded", "cf_process" and "dask_process" to utilize different types of parallel engines for pipeline search.