Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename max_pipelines to max_iterations #1169

Merged
merged 9 commits into from
Sep 22, 2020

Conversation

christopherbunn
Copy link
Contributor

@christopherbunn christopherbunn commented Sep 14, 2020

Changed the max_pipelines parameter to max_iterations in AutoMLSearch, AutoMLAlgorithm, and IterativeAlgorithm. Added a warning to be raised saying that it will be deprecated in the next release.

Resolves #1149

@codecov
Copy link

codecov bot commented Sep 14, 2020

Codecov Report

Merging #1169 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1169   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      196           
  Lines       11987    11995    +8     
=======================================
+ Hits        11978    11986    +8     
  Misses          9        9           
Impacted Files Coverage Δ
...lml/automl/automl_algorithm/iterative_algorithm.py 100.00% <ø> (ø)
evalml/tests/automl_tests/test_automl_algorithm.py 100.00% <ø> (ø)
evalml/automl/automl_algorithm/automl_algorithm.py 100.00% <100.00%> (ø)
evalml/automl/automl_search.py 99.58% <100.00%> (+<0.01%) ⬆️
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
.../tests/objective_tests/test_cost_benefit_matrix.py 100.00% <100.00%> (ø)
...alml/tests/objective_tests/test_fraud_detection.py 100.00% <100.00%> (ø)
evalml/tests/objective_tests/test_lead_scoring.py 100.00% <100.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 826b35b...4e584bd. Read the comment docs.

@christopherbunn christopherbunn force-pushed the 1149_rename_max_iterations branch 2 times, most recently from 2c19526 to ca4c003 Compare September 15, 2020 14:22
@christopherbunn christopherbunn marked this pull request as ready for review September 15, 2020 14:22
Comment on lines 50 to 51
def next_batch(self):
pass
Copy link
Contributor Author

@christopherbunn christopherbunn Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codecov isn't passing because of this line. Should I call it in the line below? Since it's an abstract method I need to have next_batch() but it seem unnecessary to call next_batch() without needing to explicitly test it...

Copy link
Contributor

@freddyaboulton freddyaboulton Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you add a docstring instead of pass it will count as being "run" - but if we just rename max_pipelines to max_iterations in AutoMLAlgorithm then we don't have to raise a deprecation warning and we can delete this code hehe

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn This looks good! I have a comment about using the logger instead of warnings. I also think we don't need to accept both arguments in IterativeAlgorithm and AutoMLAlgorithm.

@@ -12,6 +13,7 @@ class IterativeAlgorithm(AutoMLAlgorithm):
def __init__(self,
allowed_pipelines=None,
max_pipelines=None,
max_iterations=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to accept both max_iterations and max_pipelines. Users don't interact with this class as far as I know - they are called within AutoMLSearch.

@@ -23,13 +25,18 @@ def __init__(self,

Arguments:
allowed_pipelines (list(class)): A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
max_pipelines (int): The maximum number of pipelines to be evaluated.
max_pipelines (int): Will be deprecated in the next release. The maximum number of pipelines to be evaluated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here - I think we can just have a max_iterations parameter in this class since users don't interact with this class.

Comment on lines 50 to 51
def next_batch(self):
pass
Copy link
Contributor

@freddyaboulton freddyaboulton Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you add a docstring instead of pass it will count as being "run" - but if we just rename max_pipelines to max_iterations in AutoMLAlgorithm then we don't have to raise a deprecation warning and we can delete this code hehe

if max_pipelines:
if not max_iterations:
max_iterations = max_pipelines
warnings.warn("`max_pipelines will be deprecated in the next release. Use `max_iterations` instead.", DeprecationWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use logger.warning so that the warning persists in the log for now - although we may change how our logger works in the future (#1174).

Even if we keep warnings.warn we should not use a DeprecationWarning because they are ignored by default unless triggered in __main__ so they would not show up in a jupyter notebook.

@christopherbunn christopherbunn force-pushed the 1149_rename_max_iterations branch 2 times, most recently from c2b8d44 to 8f34cf9 Compare September 17, 2020 15:39
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn looks great, thanks! I left a few small things

@@ -105,9 +106,12 @@ def __init__(self,
LogLossMulticlass for multiclass classification problems, and
R2 for regression problems.

max_pipelines (int): Maximum number of pipelines to search. If max_pipelines and
max_pipelines (int): Will be deprecated in the next release. Maximum number of pipelines to search. If max_pipelines and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks. Did we file another ticket yet to track doing the deprecation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just opened one, it's #1201

self.max_pipelines = 5
logger.info("Using default limit of max_pipelines=5.\n")
if max_pipelines:
if not max_iterations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick: I think this if is unnecessary:

if max_pipelines:
    max_iterations = max_pipelines
    logger.warning("`max_pipelines` will be deprecated in the next release. Use `max_iterations` instead.")

Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case I thought of was if max_iterations and max_pipelines were both set, I would want to show something saying to not set max_pipelines but also respect what was in max_iterations. I'll admit that it's a pretty narrow use case though.

logger.warning("`max_pipelines` will be deprecated in the next release. Use `max_iterations` instead.")

self.max_iterations = max_iterations
if self.max_iterations is None and self.max_time is None and _max_batches is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit-pick: if not self.max_iterations and not self.max_time and not _max_batches works too


def test_max_pipelines_deprecation(caplog):
AutoMLSearch(problem_type='binary', max_pipelines=5)
assert "`max_pipelines` will be deprecated in the next release. Use `max_iterations` instead." in caplog.text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test where you set both, to different values? Would be good to have that for completeness, even thought we'll delete it soon

@dsherry
Copy link
Contributor

dsherry commented Sep 17, 2020

Oh @christopherbunn one more thing: our current automl user guide doesn't explain what max_pipelines/max_iterations is. I know our API ref has an explanation, but it would be nice to have a clear sentence about that in the user guide. Same with the "Start" guide for that matter. Do you think you could make that happen? Fine to do in a separate PR, or to file separately.

I think its ok if you just say it's one of the parameters which controls the stopping criterion for the automl search, and that it tells automl the maximum number of pipelines to train and evaluate. Something along those lines. You don't need to describe the automl algorithm in further detail, even though we don't do that right now.

@christopherbunn
Copy link
Contributor Author

christopherbunn commented Sep 18, 2020

Oh @christopherbunn one more thing: our current automl user guide doesn't explain what max_pipelines/max_iterations is. I know our API ref has an explanation, but it would be nice to have a clear sentence about that in the user guide. Same with the "Start" guide for that matter. Do you think you could make that happen? Fine to do in a separate PR, or to file separately.

I filed the new issue here: #1203. It makes sense to add some more context to this in the user guides. I'll grab it soon, it shouldn't be too difficult to accomplish.

@christopherbunn christopherbunn merged commit 0ae5014 into main Sep 22, 2020
@freddyaboulton freddyaboulton deleted the 1149_rename_max_iterations branch May 13, 2022 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename max_pipelines to max_iterations
3 participants