New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont allow duplicate pipeline names in AutoMLSearch #1932
Dont allow duplicate pipeline names in AutoMLSearch #1932
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1932 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 267 267
Lines 21700 21730 +30
=========================================
+ Hits 21694 21724 +30
Misses 6 6
Continue to review full report at Codecov.
|
@@ -85,3 +85,29 @@ def tune_binary_threshold(pipeline, objective, problem_type, X_threshold_tuning, | |||
y_predict_proba = pipeline.predict_proba(X_threshold_tuning) | |||
y_predict_proba = y_predict_proba.iloc[:, 1] | |||
pipeline.threshold = objective.optimize_threshold(y_predict_proba, y_threshold_tuning, X=X_threshold_tuning) | |||
|
|||
|
|||
def check_all_pipeline_names_unique(pipelines): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also in my #1913 PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classic stacked branches. disappointed this was only one on another and not 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🥳
If we want to be super sure maybe it'd be good to add test cases for when multiple (>3) pipelines have the same name to make sure its printed out once (Custom, Custom, Custom), or multiple pipelines with the same names (Custom, Custom, Custom1, Custom1) but probably not super necessary since at that point we're just testing set functionality 😂
|
||
with pytest.raises(ValueError, | ||
match="All pipeline names must be unique. The names 'Custom Pipeline' were repeated."): | ||
AutoMLSearch(X, y, problem_type="binary", allowed_pipelines=[MyPipeline1, MyPipeline2, MyPipeline3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super duper nit-pick suggestion but I wonder if there's a way to phrase this so that it gramatically makes sense for the case with 1 or multiple duplicates :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've known @angela97lin was driving the tense calculator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to heavily debate whether it was worth it or not to comment about this LOL 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good but I like Angela's suggestion on fixing the grammar in the one duplicate case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed alternate impl for the check to reduce lines, but no need to accept. Great work.
@@ -85,3 +85,29 @@ def tune_binary_threshold(pipeline, objective, problem_type, X_threshold_tuning, | |||
y_predict_proba = pipeline.predict_proba(X_threshold_tuning) | |||
y_predict_proba = y_predict_proba.iloc[:, 1] | |||
pipeline.threshold = objective.optimize_threshold(y_predict_proba, y_threshold_tuning, X=X_threshold_tuning) | |||
|
|||
|
|||
def check_all_pipeline_names_unique(pipelines): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classic stacked branches. disappointed this was only one on another and not 5
evalml/automl/utils.py
Outdated
None | ||
|
||
Raises: | ||
ValueError if any pipeline names are duplicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned in your other PR that normally I think we do: ValueError: if....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
seen_names.add(pipeline.name) | ||
|
||
if duplicate_names: | ||
plural, tense = ("s", "were") if len(duplicate_names) > 1 else ("", "was") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I respect this dedication to verb tense calculation.
evalml/automl/utils.py
Outdated
seen_names = set() | ||
duplicate_names = set() | ||
|
||
for pipeline in pipelines: | ||
if pipeline.name in seen_names: | ||
duplicate_names.add(pipeline.name) | ||
else: | ||
seen_names.add(pipeline.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have twice proposed a pandas impl and twice deleted it thinking I was being overbearing. Here it is...not blocking, feel free to reject.
name_count = pd.Series([p.name for p in pipelines]).value_counts()
duplicate_names = name_count[name_count > 1] # I don't think this line is quite right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up taking it! 👏
|
||
with pytest.raises(ValueError, | ||
match="All pipeline names must be unique. The names 'Custom Pipeline' were repeated."): | ||
AutoMLSearch(X, y, problem_type="binary", allowed_pipelines=[MyPipeline1, MyPipeline2, MyPipeline3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've known @angela97lin was driving the tense calculator
Pull Request Description
Fixes #1858
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.