AutoML & CFEngine Convenience #2667

Merged (45 commits) on Sep 1, 2021

@chukarsten commented Aug 20, 2021

Addresses #2561

The point of this PR is to make AutoMLSearch easy to parallelize with Dask and concurrent.futures, and to make the CFEngine more convenient to use by giving it better default behavior. AutoMLSearch now accepts the strings "cf_threaded", "dask_threaded", "cf_process", and "dask_process" to select which type of parallel engine runs the pipeline search.
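
As a rough illustration of what the four engine strings encode, the sketch below splits each one into a backend ("cf" or "dask") and a parallelism type ("threaded" or "process"), and maps the concurrent.futures variants onto the standard-library executors. The helpers `parse_engine_str` and `make_cf_executor` are hypothetical and are not evalml's API. The Dask variants are only parsed here, because building them would require a `dask.distributed` cluster.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Tuple

# The four engine strings listed in the PR description.
VALID_ENGINE_STRS = ("cf_threaded", "cf_process", "dask_threaded", "dask_process")


def parse_engine_str(engine_str: str) -> Tuple[str, str]:
    """Split an engine string such as "cf_threaded" into ("cf", "threaded")."""
    if engine_str not in VALID_ENGINE_STRS:
        raise ValueError(
            f"Unknown engine {engine_str!r}; expected one of {VALID_ENGINE_STRS}"
        )
    backend, parallelism = engine_str.split("_", 1)
    return backend, parallelism


def make_cf_executor(engine_str: str, max_workers: int = 2) -> Executor:
    """Map a "cf_*" engine string onto a standard-library executor (hypothetical)."""
    backend, parallelism = parse_engine_str(engine_str)
    if backend != "cf":
        # The "dask_*" variants would need a dask.distributed cluster instead.
        raise ValueError(f"{engine_str!r} is not a concurrent.futures engine")
    if parallelism == "threaded":
        return ThreadPoolExecutor(max_workers=max_workers)
    return ProcessPoolExecutor(max_workers=max_workers)


if __name__ == "__main__":
    print(parse_engine_str("dask_process"))  # ('dask', 'process')
    with make_cf_executor("cf_threaded") as pool:
        print(pool.submit(sum, [1, 2, 3]).result())  # 6
```

Rejecting unknown strings at parse time means a typo such as "cf_thread" fails immediately instead of after the search has started.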

@chukarsten marked this pull request as draft August 20, 2021 04:49

codecov bot commented Aug 20, 2021

Codecov Report

Merging #2667 (2936b7a) into main (95250c4) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2667     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        301     301             
  Lines      27688   27786     +98     
=======================================
+ Hits       27639   27737     +98     
  Misses        49      49             
Impacted Files                                          Coverage Δ
evalml/automl/automl_search.py                          99.9% <100.0%> (+0.1%) ⬆️
evalml/automl/engine/cf_engine.py                       100.0% <100.0%> (ø)
evalml/automl/engine/dask_engine.py                     100.0% <100.0%> (ø)
evalml/automl/engine/sequential_engine.py               100.0% <100.0%> (ø)
evalml/tests/automl_tests/dask_test_utils.py            100.0% <100.0%> (ø)
...ts/automl_tests/parallel_tests/test_automl_dask.py   100.0% <100.0%> (ø)
...ests/automl_tests/parallel_tests/test_cf_engine.py   100.0% <100.0%> (ø)
...ts/automl_tests/parallel_tests/test_dask_engine.py   100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py                99.7% <100.0%> (+0.1%) ⬆️
evalml/tests/conftest.py                                98.6% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 95250c4...2936b7a.

@chukarsten force-pushed the 2561_parallel_automl_convenience branch 2 times, most recently from fe44132 to 7ae4eb8 August 23, 2021 18:14
@chukarsten marked this pull request as ready for review August 23, 2021 19:10
@chukarsten changed the title from "AutoML engine parameter accept string" to "AutoML & CFEngine Convenience" Aug 23, 2021
@chukarsten marked this pull request as draft August 23, 2021 20:11
@chukarsten force-pushed the 2561_parallel_automl_convenience branch from 26ef204 to 521d6b6 August 24, 2021 00:02
@chukarsten marked this pull request as ready for review August 24, 2021 00:02

@freddyaboulton left a comment

Thank you @chukarsten ! Looks great. I left some minor comments I'd like to resolve before merging.

evalml/automl/automl_search.py (outdated)
evalml/automl/automl_search.py (outdated)
evalml/automl/automl_search.py (outdated)
evalml/automl/engine/cf_engine.py
evalml/automl/automl_search.py (outdated)

@bchen1116 left a comment

LGTM!

@chukarsten force-pushed the 2561_parallel_automl_convenience branch from 7f684ca to 4e3a68c August 25, 2021 15:43

@angela97lin left a comment

Code and tests look great! Just left a lot of nit-picky comments on documentation (per usual) 😁

docs/source/user_guide/automl.ipynb (outdated)
docs/source/user_guide/automl.ipynb (outdated)
docs/source/user_guide/automl.ipynb
evalml/automl/automl_search.py (outdated)
evalml/automl/automl_search.py (outdated)
evalml/automl/engine/cf_engine.py (outdated)
evalml/automl/engine/cf_engine.py (outdated)
evalml/automl/engine/dask_engine.py (outdated)
@chukarsten force-pushed the 2561_parallel_automl_convenience branch 2 times, most recently from e5f4ff4 to abb7378 August 31, 2021 00:23
evalml/automl/engine/dask_engine.py
evalml/automl/engine/dask_engine.py
@@ -79,3 +79,6 @@ def submit_scoring_job(self, automl_config, pipeline, X, y, objectives):
)
computation.meta_data["pipeline_name"] = pipeline.name
return computation

def close(self):

This just keeps the SequentialEngine from erroring out when AutoMLSearch shuts down its engines.
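
The benefit of a `close()` method that does nothing is that the search can close whatever engine it holds without special-casing the sequential one. The sketch below shows that pattern with two hypothetical stand-in classes and a hypothetical search loop. None of these are evalml's actual classes.

```python
from concurrent.futures import ThreadPoolExecutor


class SequentialEngineSketch:
    """Runs jobs in-process, so there are no resources to release."""

    def run(self, fn, *args):
        return fn(*args)

    def close(self):
        # No-op: lets callers close any engine uniformly without erroring out.
        pass


class ThreadedEngineSketch:
    """Owns a thread pool that must be shut down when the search finishes."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def run(self, fn, *args):
        return self._pool.submit(fn, *args).result()

    def close(self):
        self._pool.shutdown(wait=True)


def run_search(engine, jobs):
    """Hypothetical search loop: run every job, then always close the engine."""
    try:
        return [engine.run(fn, *args) for fn, args in jobs]
    finally:
        engine.close()
```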

@@ -117,7 +117,7 @@ def new(self, parameters, random_seed=0):
def clone(self):
return self.__class__(self.parameters, random_seed=self.random_seed)

@delayed(15)
@delayed(2)

Just to cut down runtime.
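
The `delayed` helper itself is not shown in this diff. Assuming it is a test decorator that sleeps before calling the wrapped method to simulate a slow pipeline, lowering its argument from 15 to 2 directly shortens every decorated call. The sketch below implements a decorator under that assumption. `SlowPipelineSketch` is hypothetical.

```python
import functools
import time


def delayed(delay_seconds):
    """Decorator factory: sleep `delay_seconds` before each call (assumed behavior)."""

    def wrap(method):
        @functools.wraps(method)
        def do_delay(*args, **kwargs):
            time.sleep(delay_seconds)
            return method(*args, **kwargs)

        return do_delay

    return wrap


class SlowPipelineSketch:
    """Hypothetical test pipeline whose fit() is artificially slowed down."""

    @delayed(0.1)
    def fit(self, X, y):
        self.fitted = True
        return self
```

The example uses `delayed(0.1)` so that it runs quickly. The cost of the decorator scales linearly with its argument, multiplied by the number of decorated calls a test makes.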

@@ -18,294 +13,261 @@
)
from evalml.tuners import SKOptTuner

Process-level parallelism is still an issue on CI.

)
@pytest.mark.parametrize(
"engine_str",
engine_strs + ["cf_process"],

Want to minimize the execution of the process engines, but need to make sure I get coverage on the special process closing code.

with Client(cluster) as client:
engine = DaskEngine(client=client)

with DaskEngine() as engine:

This is a new use of DaskEngine as a context manager, which ensures the allocated resources are released before the next test runs. It also prevents the pesky "Port # 8787 is occupied"-style warnings and improves test runtime.
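
Because a context manager's `__exit__` runs whether the block finishes normally or raises, cleanup is guaranteed even when a test fails. The sketch below shows the pattern with a `ThreadPoolExecutor` standing in for the Dask client and cluster that the real `DaskEngine` manages. `PooledEngineSketch` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class PooledEngineSketch:
    """Hypothetical engine that owns its worker pool and frees it on exit."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.is_closed = False

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def close(self):
        # Wait for running jobs, then release the worker threads.
        self._pool.shutdown(wait=True)
        self.is_closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when the block raises, so the next test
        # never inherits this engine's workers.
        self.close()
        return False  # do not swallow exceptions


with PooledEngineSketch() as engine:
    future = engine.submit(divmod, 7, 2)
    print(future.result())  # (3, 1)
print(engine.is_closed)  # True
```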

@@ -221,7 +221,7 @@ def X_y_binary():
return X, y


@pytest.fixture

I did this to fit with the new SequentialEngine fixture. It cuts down on runtime and keeps the SequentialEngine from running more than once by reusing the fixture's cached results.
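
This comment describes computing the expensive sequential baseline once and reusing it across tests. In pytest, one common way to do that is a fixture with a broader scope (for example `@pytest.fixture(scope="session")`). So that it runs without pytest, the sketch below illustrates the same run-once idea with `functools.lru_cache` instead. `run_sequential_search` and `compare_with_parallel` are hypothetical stand-ins, and the scores are placeholders, not real results.

```python
import functools

CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def run_sequential_search(dataset_name):
    """Hypothetical expensive baseline; the body runs once per distinct argument."""
    CALLS["count"] += 1
    # Placeholder scores standing in for a real sequential search.
    # The cached dict is shared between callers, so treat it as read-only.
    return {"dataset": dataset_name, "scores": (0.9, 0.8, 0.7)}


def compare_with_parallel(dataset_name, parallel_scores):
    """Each parallel-engine check compares against the cached sequential result."""
    baseline = run_sequential_search(dataset_name)
    return baseline["scores"] == tuple(parallel_scores)
```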

Sweet - did it work?

@freddyaboulton left a comment

Thank you @chukarsten ! This is fantastic. I love the updates to the user guide as well. I left some comments/suggestions for possible follow-up issues but nothing blocking merge!

evalml/tests/automl_tests/test_automl.py
evalml/automl/automl_search.py
evalml/automl/engine/dask_engine.py
def _get_engine_support(parallel_engine_type, thread_pool, cluster):
"""Helper function to return the proper combination of resource pool, client class and
engine class for testing purposes.

def sequential_results(X_y_binary_cls):

Is this only used once?

@property
def is_closed(self):
"""Property that determines whether the Engine's Client's resources are shutdown."""
return self.cluster.status.value == "closed"

Should we also check whether the client is closed? I guess I'm wondering what happens if client.close() throws an exception but cluster.close() succeeds. Idk if the engine is properly closed at that point.
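
One way to address this suggestion is to report the engine as closed only when both the client and the cluster are closed. The sketch below uses hypothetical stub objects that expose a `status` enum, mirroring the `self.cluster.status.value == "closed"` check in the diff. With Dask itself, the exact status attributes and values of the client and cluster should be confirmed against the `distributed` documentation rather than taken from this example.

```python
from enum import Enum


class Status(Enum):
    running = "running"
    closed = "closed"


class StubResource:
    """Hypothetical stand-in for a client or cluster exposing a Status enum."""

    def __init__(self):
        self.status = Status.running

    def close(self):
        self.status = Status.closed


class EngineSketch:
    def __init__(self, client, cluster):
        self.client = client
        self.cluster = cluster

    @property
    def is_closed(self):
        """True only if both the client and the cluster report "closed"."""
        return (
            self.client.status.value == "closed"
            and self.cluster.status.value == "closed"
        )


client, cluster = StubResource(), StubResource()
engine = EngineSketch(client, cluster)
cluster.close()  # simulate client.close() raising before the client shut down
print(engine.is_closed)  # False: the client is still running
client.close()
print(engine.is_closed)  # True
```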

evalml/automl/engine/dask_engine.py

@chukarsten merged commit b0212dd into main Sep 1, 2021
@chukarsten mentioned this pull request Sep 1, 2021
@angela97lin deleted the 2561_parallel_automl_convenience branch January 11, 2022 17:05

4 participants