Added None test case to AutoBase __str__ representation #783

Merged
6 commits merged into master from cb_autobase_none on May 20, 2020

Conversation

christopherbunn (Contributor)

A continuation of this test case and #481


codecov bot commented May 19, 2020

Codecov Report

Merging #783 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #783   +/-   ##
=======================================
  Coverage   99.40%   99.40%           
=======================================
  Files         150      150           
  Lines        5567     5578   +11     
=======================================
+ Hits         5534     5545   +11     
  Misses         33       33           
Impacted Files                                 Coverage Δ
evalml/automl/auto_search_base.py              97.77% <100.00%> (ø)
evalml/tests/automl_tests/test_autobase.py     100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a5270d...109a351. Read the comment docs.

@@ -103,26 +103,41 @@ def __str__(self):
        _list_separator = '\n\t'

        def _print_list(in_attr):
            return _list_separator + \
                _list_separator.join(obj.name for obj in in_attr)
            if isinstance(in_attr, Iterable):
Contributor

What is this for?

Contributor Author

This is a check to make sure that the input to this private function is Iterable. If it isn't, the function falls back to getting the string representation.
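
For reference, a minimal sketch of the check being described; the helper name and separator match the diff above, while the fallback branch is an assumption based on this comment, not the PR's exact code:

from collections.abc import Iterable

_list_separator = '\n\t'

def _print_list(in_attr):
    # Iterable inputs: join each object's name, as the existing code does.
    if isinstance(in_attr, Iterable):
        return _list_separator + _list_separator.join(obj.name for obj in in_attr)
    # Non-iterable inputs (e.g. None): fall back to the string representation.
    # Note: str is itself Iterable, so checking for list specifically (as
    # discussed in the next comment) is another option.
    return str(in_attr)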

Contributor Author

On second thought, I could just directly check to see if it's a list too 😅

Contributor

Got it. So I think if you follow my suggestion for the call site, we can delete/replace this, right?

Contributor Author

Yep, but since the Additional Objectives output uses the same logic, I think it makes sense to generalize the function by retaining the _print_list function name but using the suggestion you linked.

Contributor

Aha, I didn't see that. Got it.

        search_desc = (
            f"{self.problem_type} Search\n\n"
            f"{_get_name(self.problem_type)} Search\n\n"
Contributor

Replace with handle_problem_type(self.problem_type).name?

Contributor

Ah, I guess it's actually handle_problem_types
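
For context, a small hedged example of the suggested call: handle_problem_types (from evalml.problem_types) resolves a string or ProblemTypes member to the enum, and .name gives the member name. The value below is a hypothetical stand-in for self.problem_type:

from evalml.problem_types import handle_problem_types

problem_type = "binary"  # hypothetical stand-in for self.problem_type
print(handle_problem_types(problem_type).name)  # prints the enum member name, e.g. BINARY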

f"Max Time: {self.max_time}\n"
f"Max Pipelines: {self.max_pipelines}\n"
f"Possible Pipelines: {_print_list(self.possible_pipelines)}\n"
f"Patience: {self.patience}\n"
f"Tolerance: {self.tolerance}\n"
f"Cross Validation: {self.cv}\n"
f"Tuner: {type(list(self.tuners.values())[0]).__name__}\n"
f"Tuner: {_get_tuner(self.tuners)}\n"
Contributor

I suggest you replace this with str(self.tuner.__name__). self.tuner is a Tuner subclass. Later we can add a proper name API for getting this info.

christopherbunn (Contributor Author) commented May 19, 2020

@dsherry It seems like self.tuner doesn't actually exist, but rather self.tuners. Since self.tuners is actually a dictionary of possible hyperparameter spaces, this is the only way to access the type of tuner. The tuner class that is passed in isn't stored anywhere. Should I keep this method or add self.tuner (singular) as a new field?

Contributor

@christopherbunn ah sorry, I must've been looking at the branch I'm working on now 😂

How about type(list(tuners.values())[0]).__name__ if len(self.tuners) else '' ? Would that work?

I think what you have is fine but if we can smoosh it into one line, I think it'll be easier to follow.
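
Combining this suggestion with the _get_tuner helper already referenced in the diff, a hedged sketch of what the one-liner could look like (the stand-in tuner class below is only for illustration):

def _get_tuner(tuners):
    # One-liner from the suggestion: name of the first tuner's class, or '' if empty.
    return type(list(tuners.values())[0]).__name__ if len(tuners) else ''

class FakeTuner:  # stand-in for an evalml Tuner subclass
    pass

print(_get_tuner({'Logistic Regression Pipeline': FakeTuner()}))  # -> FakeTuner
print(_get_tuner({}))                                             # -> (empty string)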

f"Parameters: \n{'='*20}\n"
f"Objective: {self.objective.name}\n"
f"Objective: {_get_name(self.objective)}\n"
dsherry (Contributor) commented May 19, 2020

Replace with get_objective(self.objective).name ?
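
The new line calls a _get_name helper that isn't visible in this hunk; a minimal sketch of what such a helper might look like, under the assumption that it prefers a name attribute and falls back to str() for None:

def _get_name(obj):
    # The reviewer's alternative for objectives is get_objective(self.objective).name
    # (from evalml.objectives); this generic form also covers the None case the
    # new test exercises.
    return obj.name if hasattr(obj, 'name') else str(obj)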

@@ -136,7 +151,7 @@ def _get_funct_name(function):
        try:
            rankings_str = self.rankings.drop(['parameters'], axis='columns').to_string()
            rankings_desc = f"\nSearch Results: \n{'='*20}\n{rankings_str}"
        except KeyError:
        except Exception:
Contributor

What other exception(s) are thrown here in the case when the search hasn't been run yet?

christopherbunn (Contributor Author) commented May 19, 2020

I think at one point, I had an AttributeError when self.objective was None. Since we can assume that self.objective will always be populated, I'll switch this back to a KeyError.
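
A small standalone illustration of that reasoning, using an empty pandas DataFrame to stand in for rankings before search() has run (the fallback assignment in the except branch is an assumption; it isn't shown in this hunk):

import pandas as pd

rankings = pd.DataFrame()  # stand-in: no 'parameters' column yet
try:
    rankings_str = rankings.drop(['parameters'], axis='columns').to_string()
    rankings_desc = f"\nSearch Results: \n{'=' * 20}\n{rankings_str}"
except KeyError:
    rankings_desc = ""  # assumed fallback

print(repr(rankings_desc))  # -> ''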

@@ -160,3 +161,60 @@ def _dummy_callback(param1, param2):
    str_rep = str(automl)
    assert "Search Results:" in str_rep
    assert str(automl.rankings.drop(['parameters'], axis='columns')) in str_rep


@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
Contributor

No need to mock fit if we never run the automl search

        'optimize_thresholds': None
    }

    automl = AutoClassificationSearch(**search_params)
Contributor

Why not set these directly in the class? We don't use this dict-passing pattern elsewhere.

Contributor Author

Initially I was going to use the search_params dictionary to assert the string output later on (like in test_automl_str_search). Since the field names are output differently anyway (which is why we need the param_list afterwards), I can just pass the parameters directly into the class.

    automl.cv = None
    automl.objective = None
    automl.problem_type = None
    automl.tuners = None
Contributor

Why set all these things to None? I would expect this unit test would simply create an AutoSearchBase subclass instance, then call str(automl) and check the result is as expected.

Contributor Author

Hmm, I think I misunderstood the point of this test then. I assumed that we needed to test what would happen if for some reason every single field was None. If we can go in with the assumption that certain fields will always be populated (such as automl.objective), that makes a lot of the logic I implemented above easier.

Contributor

Ah, nope. Our API doesn't define getters/setters for these, so it's not made clear explicitly, but: none of those fields are intended to be mutable externally. They're read-only. Modifying them directly will produce unspecified behavior. The only things people should be doing with AutoSearchBase subclass instances is instantiating them, calling search and then accessing the read-only fields like rankings.

In short, yeah, sounds good.
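
A hedged sketch of that intended lifecycle (constructor arguments and toy data below are hypothetical; the point is construct, then search, then read the read-only fields):

import numpy as np
from evalml import AutoClassificationSearch

X = np.random.random((100, 4))       # toy features
y = np.random.randint(2, size=100)   # toy binary target

automl = AutoClassificationSearch(objective="f1", max_pipelines=3)  # hypothetical settings
print(str(automl))       # before search(): no "Search Results" section
automl.search(X, y)
print(automl.rankings)   # read-only results table
print(str(automl))       # after search(): includes "Search Results"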

        for i in range(0, 3):
            assert f"\t{None}" in str_rep
    else:
        assert f"{param}: None" in str_rep
dsherry (Contributor) commented May 19, 2020

I suggest you simplify this code by removing 'Possible Pipelines' from the param_list var, then replace this code with just:

for param in param_list:
    assert '{}: None'.format(param) in str_rep

Contributor

In fact, if you want to unit-test what happens when possible_pipelines is an empty list or None for some reason, that would be a great short separate unit test.

Contributor Author

I think it would be redundant to test this since we can assume certain fields will be populated. I'll take it out.

f"Parameters: \n{'='*20}\n"
f"Objective: {self.objective.name}\n"
f"Objective: {_get_name(self.objective)}\n"
f"Max Time: {self.max_time}\n"
f"Max Pipelines: {self.max_pipelines}\n"
f"Possible Pipelines: {_print_list(self.possible_pipelines)}\n"
Contributor

I suggest we update this to be:

f"Possible Pipelines: {_print_possible_pipelines(self.possible_pipelines or [])}\n"

Then do something like:

def _print_possible_pipelines(possible_pipelines):
    lines = ['\t{}'.format(p) for p in possible_pipelines]
    return '\n'.join(lines)

I guess you could continue to use _list_separator = \t if that felt helpful but maybe not necessary

Contributor Author

Is there a particular reason that we print out the string representation of the class rather than the name of the possible pipeline? This prints it out as <class 'evalml.pipelines.classification.catboost_binary.CatBoostBinaryClassificationPipeline'> whereas changing it to '\t{}'.format(p.name) prints it out as Cat Boost Binary Classification Pipeline.

Contributor

Ohhh, no sorry that was just a mistake on my part. Should've done p.name. Good stuff

dsherry (Contributor) left a comment

Left some comments. @christopherbunn is gonna try to address sometime today, and I'll get him a prompt re-review!

                _list_separator.join(obj.name for obj in in_attr)
        def _print_list(obj_list):
            lines = ['\t{}'.format(o.name) for o in obj_list]
            return '\n'.join(lines)
Contributor

Awesome!

        else:
            assert f"{param}: {str(value)}" in str_rep

    assert "Search Results" not in str_rep
Contributor

@christopherbunn this is an improvement! I still think it could be simplified. Why have special logic for catboost and xgboost? All we care about here is that the output makes sense and no errors are thrown--the specific pipelines included aren't so important IMO. We have other tests which cover that functionality.

Suggestion:

# first define param_str_reps as you did but omit "Possible Pipelines"
str_rep = str(automl)
for param, value in param_str_reps.items():
    assert f"{param}" in str_rep
    if isinstance(value, list):
        value = "\n".join(["\t{}".format(item) for item in value])
        assert value in str_rep
assert "Possible Pipelines" in str_rep
assert "Search Results" not in str_rep

Contributor Author

The special logic is so that the minimal dependency version passes. Since we aren't explicitly testing to see if these pipelines exist, we can use the simplified logic you have here.

dsherry (Contributor) left a comment

I left one comment on the test. But the impl looks ready to go!

christopherbunn (Contributor Author)

@dsherry can you override the checks to merge? It looks like everything else but the changelog passes.

dsherry (Contributor) commented May 20, 2020

@christopherbunn I think that's because there's no changelog entry on this PR right now! Can you please add one? Then all the tests will pass :) And if you don't have time before you sign off, that's ok too, just let me know and one of us will take care of it.

christopherbunn (Contributor Author)

> @christopherbunn I think that's because there's no changelog entry on this PR right now! Can you please add one? Then all the tests will pass :) And if you don't have time before you sign off, that's ok too, just let me know and one of us will take care of it.

Ah that's my bad, I assumed that we could just lump it in with the changelog entry for the "Add str to the AutoSearch object". I'll make a separate entry and pull it in!

dsherry (Contributor) commented May 20, 2020

@christopherbunn you can add it to the same entry! And/or edit that entry. Just append ":pr:`<PR_NUM>`" to the end. A separate entry is fine too.

@@ -24,6 +24,7 @@ Changelog
* Delete codecov yml, use codecov.io's default :pr:`732`
* Added unit tests for fraud cost, lead scoring, and standard metric objectives :pr:`741`
* Update codecov client :pr:`782`
* Updated AutoBase __str__ test to include no parameters case :pr:`783`
Contributor

👍

dsherry (Contributor) left a comment

Looks good to me! Thanks for following up on this!

@christopherbunn christopherbunn merged commit 2f26183 into master May 20, 2020
@dsherry dsherry deleted the cb_autobase_none branch May 20, 2020 19:48