Added None test case to AutoBase __str__ representation #783
Conversation
73b414f to 61a38c0 (Compare)
Codecov Report
@@           Coverage Diff            @@
##           master     #783    +/-   ##
=========================================
  Coverage   99.40%   99.40%
=========================================
  Files         150      150
  Lines        5567     5578     +11
=========================================
+ Hits         5534     5545     +11
  Misses         33       33
Continue to review full report at Codecov.
evalml/automl/auto_search_base.py
Outdated
@@ -103,26 +103,41 @@ def __str__(self):
        _list_separator = '\n\t'

        def _print_list(in_attr):
            return _list_separator + \
                _list_separator.join(obj.name for obj in in_attr)
            if isinstance(in_attr, Iterable):
What is this for?
This is a check to make sure that the input to this private function is Iterable. If it isn't iterable, the function falls back to getting its string representation.
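For reference, a rough sketch of what that guarded helper could look like (the fallback branch and overall structure are assumptions; only the isinstance check appears in the diff above):

    from collections.abc import Iterable

    def _print_list(in_attr):
        # If the attribute is iterable, join the name of each member using the
        # _list_separator defined in the enclosing __str__ (as in the diff above).
        if isinstance(in_attr, Iterable):
            return _list_separator + _list_separator.join(obj.name for obj in in_attr)
        # Otherwise fall back to the plain string representation (assumed behavior).
        return str(in_attr)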
On second thought, I could just directly check to see if it's a list too 😅
Got it. So I think if you follow my suggestion for the call site, we can delete/replace this, right?
Yep, but since the Additional Objectives output uses the same logic, I think it makes sense to generalize the function by retaining the _print_list function name but using the suggestion you linked.
Aha, I didn't see that. Got it.
evalml/automl/auto_search_base.py
Outdated
        search_desc = (
            f"{self.problem_type} Search\n\n"
            f"{_get_name(self.problem_type)} Search\n\n"
Replace with handle_problem_type(self.problem_type).name?
Ah I guess it's actually handle_problem_types
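For context, the suggested replacement would read roughly like this (a sketch; it assumes handle_problem_types is importable from evalml.problem_types and resolves a string or enum into a ProblemTypes member):

    from evalml.problem_types import handle_problem_types

    # Resolve the problem type to its enum member and use its name in the header.
    problem_type_name = handle_problem_types(self.problem_type).name
    search_desc = f"{problem_type_name} Search\n\n"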
evalml/automl/auto_search_base.py
Outdated
f"Max Time: {self.max_time}\n" | ||
f"Max Pipelines: {self.max_pipelines}\n" | ||
f"Possible Pipelines: {_print_list(self.possible_pipelines)}\n" | ||
f"Patience: {self.patience}\n" | ||
f"Tolerance: {self.tolerance}\n" | ||
f"Cross Validation: {self.cv}\n" | ||
f"Tuner: {type(list(self.tuners.values())[0]).__name__}\n" | ||
f"Tuner: {_get_tuner(self.tuners)}\n" |
I suggest you replace this with str(self.tuner.__name__). self.tuner is a Tuner subclass. Later we can add a proper name API for getting this info.
@dsherry It seems like self.tuner doesn't actually exist, but rather self.tuners. Since self.tuners is actually a dictionary of possible hyperparameter spaces, this is the only way to access the type of tuner. The tuner class that is passed in isn't stored anywhere. Should I keep this method or add self.tuner (singular) as a new field?
@christopherbunn ah sorry, I must've been looking at the branch I'm working on now 😂
How about type(list(tuners.values())[0]).__name__ if len(self.tuners) else ''? Would that work?
I think what you have is fine but if we can smoosh it into one line, I think it'll be easier to follow.
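Spelled out inside the __str__ f-string, that one-liner would look roughly like this (a sketch of the suggestion above; the empty-string fallback covers the case where no tuners have been created yet):

    # Use the class name of the first tuner if any exist, otherwise print nothing.
    f"Tuner: {type(list(self.tuners.values())[0]).__name__ if len(self.tuners) else ''}\n"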
evalml/automl/auto_search_base.py
Outdated
f"Parameters: \n{'='*20}\n" | ||
f"Objective: {self.objective.name}\n" | ||
f"Objective: {_get_name(self.objective)}\n" |
Replace with get_objective(self.objective).name?
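In other words, something along these lines (a sketch; it assumes get_objective is the existing helper in evalml.objectives that accepts either an objective name or an instance):

    from evalml.objectives import get_objective

    # Resolve the objective (string name or instance) and print its display name.
    objective_line = f"Objective: {get_objective(self.objective).name}\n"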
evalml/automl/auto_search_base.py
Outdated
@@ -136,7 +151,7 @@ def _get_funct_name(function):
        try:
            rankings_str = self.rankings.drop(['parameters'], axis='columns').to_string()
            rankings_desc = f"\nSearch Results: \n{'='*20}\n{rankings_str}"
        except KeyError:
        except Exception:
What other exception(s) are thrown here in the case when the search hasn't been run yet?
I think at one point, I had an AttributeError when self.objective was None. Since we can assume that self.objective will always be populated, I'll switch this back to a KeyError.
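The resulting block would then go back to roughly this shape (a sketch; the empty-string fallback in the except branch is an assumption about how the missing-results case is rendered):

    try:
        # If search() hasn't run yet, rankings presumably lacks a 'parameters'
        # column, so drop() raises a KeyError.
        rankings_str = self.rankings.drop(['parameters'], axis='columns').to_string()
        rankings_desc = f"\nSearch Results: \n{'='*20}\n{rankings_str}"
    except KeyError:
        rankings_desc = ""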
@@ -160,3 +161,60 @@ def _dummy_callback(param1, param2):
    str_rep = str(automl)
    assert "Search Results:" in str_rep
    assert str(automl.rankings.drop(['parameters'], axis='columns')) in str_rep


@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
No need to mock fit if we never run the automl search
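Put differently, the test can simply construct the search object and stringify it, with no @patch at all, since fit is only invoked by search() (an illustrative sketch; the test name and asserts below are hypothetical, not the final test):

    from evalml import AutoClassificationSearch

    def test_automl_str_before_search():
        # fit is never called because search() is never run, so nothing needs mocking.
        automl = AutoClassificationSearch()
        str_rep = str(automl)
        assert "Parameters" in str_rep
        assert "Search Results" not in str_rep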
        'optimize_thresholds': None
    }

    automl = AutoClassificationSearch(**search_params)
Why not set these directly in the class? We don't use this dict-passing pattern elsewhere.
Initially I was going to use the search_params dictionary to assert the string output later on (like in test_automl_str_search). Since the field names are outputted differently anyway (which is why we need the param_list afterwards), I can just directly pass it into the class.
    automl.cv = None
    automl.objective = None
    automl.problem_type = None
    automl.tuners = None
Why set all these things to None? I would expect this unit test would simply create an AutoSearchBase subclass instance, then call str(automl) and check the result is as expected.
Hmm, I think I misunderstood the point of this test then. I assumed that we needed to test what would happen if for some reason every single field was None. If we can go in with the assumption that certain fields will always be populated (such as automl.objective), that makes a lot of the logic I implemented above easier.
Ah, nope. Our API doesn't define getters/setters for these, so it's not made clear explicitly, but none of those fields are intended to be mutable externally. They're read-only. Modifying them directly will produce unspecified behavior. The only things people should be doing with AutoSearchBase subclass instances are instantiating them, calling search, and then accessing the read-only fields like rankings.
In short, yeah, sounds good.
        for i in range(0, 3):
            assert f"\t{None}" in str_rep
    else:
        assert f"{param}: None" in str_rep
I suggest you simplify this code by removing 'Possible Pipelines' from the param_list var, then replace this code with just:

    for param in param_list:
        assert '{}: None'.format(param) in str_rep
In fact, if you want to unit-test what happens when possible_pipelines is an empty list or None for some reason, that would be a great short separate unit test.
I think it would be redundant to test this since we can assume certain fields will be populated. I'll take it out.
evalml/automl/auto_search_base.py
Outdated
f"Parameters: \n{'='*20}\n" | ||
f"Objective: {self.objective.name}\n" | ||
f"Objective: {_get_name(self.objective)}\n" | ||
f"Max Time: {self.max_time}\n" | ||
f"Max Pipelines: {self.max_pipelines}\n" | ||
f"Possible Pipelines: {_print_list(self.possible_pipelines)}\n" |
I suggest we update this to be:

    f"Possible Pipelines: {_print_possible_pipelines(self.possible_pipelines or [])}\n"

Then do something like:

    def _print_possible_pipelines(possible_pipelines):
        lines = ['\t{}'.format(p) for p in possible_pipelines]
        return '\n'.join(lines)

I guess you could continue to use _list_separator = '\t' if that felt helpful, but maybe not necessary.
Is there a particular reason that we print out the string representation of the class rather than the name of the possible pipeline? This prints it out as <class 'evalml.pipelines.classification.catboost_binary.CatBoostBinaryClassificationPipeline'>, whereas changing it to '\t{}'.format(p.name) prints it out as Cat Boost Binary Classification Pipeline.
Ohhh, no sorry, that was just a mistake on my part. Should've done p.name. Good stuff
Left some comments. @christopherbunn is gonna try to address sometime today, and I'll get him a prompt re-review!
                _list_separator.join(obj.name for obj in in_attr)
        def _print_list(obj_list):
            lines = ['\t{}'.format(o.name) for o in obj_list]
            return '\n'.join(lines)
Awesome!
        else:
            assert f"{param}: {str(value)}" in str_rep

    assert "Search Results" not in str_rep
@christopherbunn this is an improvement! I still think it could be simplified. Why have special logic for catboost and xgboost? All we care about here is that the output makes sense and no errors are thrown--the specific pipelines included aren't so important IMO. We have other tests which cover that functionality.
Suggestion:

    # first define param_str_reps as you did but omit "Possible Pipelines"
    str_rep = str(automl)
    for param, value in param_str_reps.items():
        assert f"{param}" in str_rep
        if isinstance(value, list):
            value = "\n".join(["\t{}".format(item) for item in value])
        assert value in str_rep
    assert "Possible Pipelines" in str_rep
    assert "Search Results" not in str_rep
The special logic is so that the minimal dependency version passes. Since we aren't explicitly testing to see if these pipelines exist, we can use the simplified logic you have here.
I left one comment on the test. But the impl looks ready to go!
96bc6be to 36fc982 (Compare)
@dsherry can you override the checks to merge? It looks like everything else but the changelog passes.
@christopherbunn I think that's because there's no changelog entry on this PR right now! Can you please add one? Then all the tests will pass :) And if you don't have time before you sign off, that's ok too, just let me know and one of us will take care of it.
Ah that's my bad, I assumed that we could just lump it in with the changelog entry for "Add str to the AutoSearch object". I'll make a separate entry and pull it in!
@christopherbunn you can add it to the same entry! And/or edit that entry. Just append ":pr:`<PR_NUM>`" to the end. A separate entry is fine too.
@@ -24,6 +24,7 @@ Changelog
        * Delete codecov yml, use codecov.io's default :pr:`732`
        * Added unit tests for fraud cost, lead scoring, and standard metric objectives :pr:`741`
        * Update codecov client :pr:`782`
        * Updated AutoBase __str__ test to include no parameters case :pr:`783`
👍
Looks good to me! Thanks for following up on this!
A continuation of this test case and #481