Allowing get_objective to be able to get any objective. #1132
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1132 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 195 195
Lines 11372 11460 +88
=======================================
+ Hits 11362 11450 +88
Misses 10 10
Continue to review full report at Codecov.
    @property
    def data_check_results(self):
        return self._data_check_results

    def __str__(self):
        def _print_list(obj_list):
-           lines = ['\t{}'.format(o.name) for o in obj_list]
+           lines = sorted(['\t{}'.format(o.name) for o in obj_list])
Had to add the `sorted` to get `test_automl_str_no_param_search` to pass. I think this has to do with `OPTIONS` being deleted.
@@ -147,19 +161,22 @@ def __init__(self,
        self.optimize_thresholds = optimize_thresholds
        if objective == 'auto':
            objective = self._DEFAULT_OBJECTIVES[self.problem_type.value]
-       self.objective = get_objective(objective)
+       objective = get_objective(objective, return_instance=False)
+       self.objective = self._validate_objective(objective)
In #1078, we agreed to have AutoML check whether or not the objective can be used in AutoML search, so we added `_validate_objective`.
evalml/automl/automl_search.py
Outdated
        if objective in self._objectives_not_allowed_in_automl:
            raise ValueError(f"{objective.name} is not allowed in AutoML! " +
                             "Try one of the following objective names: \n" +
                             pretty_print_all_valid_objective_names())
Adding `pretty_print_all_valid_objective_names` is technically out of scope for this PR, but I think it gives helpful feedback to the user when they pass in the wrong name. This is what it looks like now - happy to edit or delete:
ObjectiveNotFoundError: log_loss_binary is not a valid Objective! Objective must be one of:
+-------------------+-------------------+-------------------+------------------+
| AUC | AUC Macro | AUC Micro | AUC Weighted |
+===================+===================+===================+==================+
| Accuracy Binary | Accuracy | Balanced Accuracy | Balanced |
| | Multiclass | Binary | Accuracy |
| | | | Multiclass |
+-------------------+-------------------+-------------------+------------------+
| Cost Benefit | ExpVariance | F1 | F1 Macro |
| Matrix | | | |
+-------------------+-------------------+-------------------+------------------+
| F1 Micro | F1 Weighted | Fraud Cost | Lead Scoring |
+-------------------+-------------------+-------------------+------------------+
| Log Loss Binary | Log Loss | MAE | MCC Binary |
| | Multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| MCC Multiclass | MSE | MaxError | Mean Squared Log |
| | | | Error |
+-------------------+-------------------+-------------------+------------------+
| MedianAE | Precision | Precision Macro | Precision Micro |
+-------------------+-------------------+-------------------+------------------+
| Precision | R2 | Recall | Recall Macro |
| Weighted | | | |
+-------------------+-------------------+-------------------+------------------+
| Recall Micro | Recall Weighted | Root Mean Squared | Root Mean |
| | | Error | Squared Log |
| | | | Error |
+-------------------+-------------------+-------------------+------------------+
| accuracy binary | accuracy | auc | auc macro |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| auc micro | auc weighted | balanced accuracy | balanced |
| | | binary | accuracy |
| | | | multiclass |
+-------------------+-------------------+-------------------+------------------+
| cost benefit | expvariance | f1 | f1 macro |
| matrix | | | |
+-------------------+-------------------+-------------------+------------------+
| f1 micro | f1 weighted | fraud cost | lead scoring |
+-------------------+-------------------+-------------------+------------------+
| log loss binary | log loss | mae | maxerror |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| mcc binary | mcc multiclass | mean squared log | medianae |
| | | error | |
+-------------------+-------------------+-------------------+------------------+
| mse | precision | precision macro | precision micro |
+-------------------+-------------------+-------------------+------------------+
| precision | r2 | recall | recall macro |
| weighted | | | |
+-------------------+-------------------+-------------------+------------------+
| recall micro | recall weighted | root mean squared | root mean |
| | | error | squared log |
| | | | error |
+-------------------+-------------------+-------------------+------------------+
Woah neat! A few things:
- why is it that some are returned as capitalized and others are not?
- I think the header for the first row is a little confusing (AUC, AUC Micro, etc.) as it makes that row seem more important
- Might be a little confusing since we're saying "Objective must be one of:" and printing out some objectives which we can't pass as strings (ex: cost benefit matrix, fraud cost, etc.)
It is definitely helpful to have, but seems like a lot of information to throw at the user. Perhaps raise the error, and then print the lines of code to get the table? Wonder what other people think.
from evalml.objectives import pretty_print_all_valid_objective_names
pretty_print_all_valid_objective_names()
- Also, do you think it might be better to return a DataFrame? I notice it's texttable. Just a thought.
Thanks for the feedback guys! @angela97lin I added a method to `AutoMLSearch` called `print_objective_names_allowed_in_automl` that only displays the objectives you can pass into `AutoMLSearch`, and I removed the header row and standardized to lowercase names. @gsheni I am now telling the user how to print all the valid objective names in the error message as opposed to printing out all the names.
+-------------------+-------------------+-------------------+------------------+
| accuracy binary | accuracy | auc | auc macro |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| auc micro | auc weighted | balanced accuracy | balanced |
| | | binary | accuracy |
| | | | multiclass |
+-------------------+-------------------+-------------------+------------------+
| expvariance | f1 | f1 macro | f1 micro |
+-------------------+-------------------+-------------------+------------------+
| f1 weighted | log loss binary | log loss | mae |
| | | multiclass | |
+-------------------+-------------------+-------------------+------------------+
| maxerror | mcc binary | mcc multiclass | medianae |
+-------------------+-------------------+-------------------+------------------+
| mse | precision | precision macro | precision micro |
+-------------------+-------------------+-------------------+------------------+
| precision | r2 | root mean squared | |
| weighted | | error | |
+-------------------+-------------------+-------------------+------------------+
evalml/objectives/utils.py
Outdated
@@ -66,4 +73,7 @@ def get_objectives(problem_type):
        List of Objectives
    """
    problem_type = handle_problem_types(problem_type)
-   return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
+   all_objectives_dict = _all_objectives_dict()
+   # To remove duplicates
This is because we include the lowercase and uppercase names in `_all_objectives_dict`.
Ah, again see comment about just storing the lowercase version in `_all_objectives_dict` and then lower-casing the user input and checking for equality in a case-insensitive way.
    if 'evalml.objectives' not in objective.__module__:
        continue
    objectives_dict[objective.name] = objective
    objectives_dict[objective.name.lower()] = objective
In #1078, we mentioned that it'd be nice to include the lowercase name. I personally think we shouldn't bother with the all uppercase name.
Hm, rather than storing both here, why not just check case-insensitively when we try to `get_objective`? That is, we convert the user input to all lowercase and check? That way we only have to store `objective.name.lower()` here.
Good idea! Just pushed this up
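A minimal sketch of that case-insensitive lookup (the classes here are toy stand-ins, not evalml's actual implementation):

class ObjectiveBase:
    """Toy stand-in for evalml's objective base class."""
    name = "base"

class LogLossBinary(ObjectiveBase):
    name = "Log Loss Binary"

def _all_objectives_dict():
    # Store only the lowercase name as the key.
    return {cls.name.lower(): cls for cls in ObjectiveBase.__subclasses__()}

def get_objective(objective):
    if isinstance(objective, ObjectiveBase):
        return objective
    # Lowercase the user's input so the lookup is case-insensitive.
    return _all_objectives_dict()[objective.lower()]

assert get_objective("Log Loss Binary") is get_objective("log loss binary")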
EPS = 1e-5

all_objectives = _get_subclasses(ObjectiveBase)
all_automl_objectives = _all_objectives_dict()
Have to define this dictionary because `OPTIONS` was deleted.
Left a few comments, particularly about not storing both upper/lower case versions in `_all_objectives_dict` and just checking in `get_objective`!
Also, I'm curious, design wise: is it necessary to return an instance? I wonder how much more value that adds, given that if `get_objective` returns a class, it's easy enough to just instantiate an instance of that class. The best we can do is provide the default parameters anyways. This had come up in this related issue: #580
docs/source/release_notes.rst
Outdated
**Breaking Changes**
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
"a lowercase version" (of the name) maybe?
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
    * Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error",
👍 Makes much more sense!
all_objectives = _get_subclasses(ObjectiveBase)
objectives_dict = {}
for objective in all_objectives:
    if 'evalml.objectives' not in objective.__module__:
Just wondering, does this check mean that if a user defines their own objective in their own code, it won't be picked up by this? :O
Yes! That's the same behavior with `get_components()`.
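For illustration, a toy demonstration of why the filter skips user-defined objectives (`MyCustomObjective` is hypothetical user code):

class MyCustomObjective:  # a user-defined objective living in, say, __main__
    name = "my custom objective"

# The registration loop skips it because its module path does not
# contain 'evalml.objectives':
assert 'evalml.objectives' not in MyCustomObjective.__module__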
evalml/objectives/utils.py
Outdated
    return objectives_dict


def iterate_in_batches(sequence, batch_size):
Just a suggestion, but maybe define this in `pretty_print_all_valid_objective_names` if it's only used there? Or otherwise leave a docstring?
Just refactored this! Ended up creating `_print_objectives_in_table` and defining `iterate_in_batches` inside that function.
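Judging from the call below (`iterate_in_batches(sorted(...), 4)`), the helper is presumably something like this sketch:

def iterate_in_batches(sequence, batch_size):
    # Split a sequence into consecutive chunks of at most batch_size items,
    # one chunk per printed table row.
    return [sequence[i:i + batch_size] for i in range(0, len(sequence), batch_size)]

assert iterate_in_batches(list("abcde"), 2) == [["a", "b"], ["c", "d"], ["e"]]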
evalml/objectives/utils.py
Outdated

def get_objective(objective):

def pretty_print_all_valid_objective_names():
Also docstring for this! :D
evalml/objectives/utils.py
Outdated
    return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
    """Returns the Objective object of the given objective name

    Args:
Can you update "Args" --> "Arguments" :))
"""Returns the Objective object of the given objective name | ||
|
||
Args: | ||
objective (str): name of the objective | ||
objective (str or ObjectiveBase): name or instance of the objective class. | ||
return_instance (bool): Whether to return an instance of the objective. This only applies if objective |
Hm, see comment about whether this is absolutely necessary 🤔 Maybe we can get away without this? That way we don't need to check and raise a TypeError...
I think it'd get a bit hairy. For example, we'd have to change `score` in `ClassificationPipeline` to be something like:

objectives = [get_objective(o) for o in objectives]
objective_instances = []
for obj in objectives:
    if isinstance(obj, type):
        objective_instances.append(obj())
    else:
        objective_instances.append(obj)
y = self._encode_targets(y)
y_predicted, y_predicted_proba = self._compute_predictions(X, objective_instances)
return self._score_all_objectives(X, y, y_predicted, y_predicted_proba, objectives)

We'd have to do something similar in `RegressionPipeline` and anywhere else `get_objective` is called. I think it's cleaner to let `get_objective` handle whether or not to return an instance?
@@ -28,6 +29,12 @@
 from evalml.pipelines import BinaryClassificationPipeline
 from evalml.problem_types import ProblemTypes

+_not_allowed_in_automl = AutoMLSearch._objectives_not_allowed_in_automl
+
+binary_objectives = [obj() for obj in get_objectives(ProblemTypes.BINARY) if obj not in _not_allowed_in_automl]
These are useful functions to have, should they go in a utils somewhere?
I noticed they are used in multiple test files...
Good idea, I just made these into test fixtures.
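A hypothetical fixture version of the helper from the diff above; the fixture name is an assumption mirroring `regression_objectives_allowed_in_automl` used later in the tests:

import pytest

from evalml.automl import AutoMLSearch
from evalml.objectives import get_objectives
from evalml.problem_types import ProblemTypes


@pytest.fixture
def binary_objectives_allowed_in_automl():
    # Same computation as the module-level helper, exposed as a fixture.
    not_allowed = AutoMLSearch._objectives_not_allowed_in_automl
    return [obj() for obj in get_objectives(ProblemTypes.BINARY) if obj not in not_allowed]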
evalml/objectives/utils.py
Outdated
    return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
I wonder if using `kwargs` would allow you to add support for passing arguments for objectives, like `get_objective("cost benefit matrix", return_instance=True, true_positive....)`
Just pushed this up!
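A rough sketch (not the exact commit) of how `**kwargs` could be forwarded, reusing the toy `ObjectiveBase` and `_all_objectives_dict` from the sketch earlier in this thread:

def get_objective(objective, return_instance=False, **kwargs):
    if isinstance(objective, ObjectiveBase):
        return objective
    objective_class = _all_objectives_dict()[objective.lower()]
    if not return_instance:
        return objective_class
    try:
        # Forward any extra keyword arguments to the objective's constructor.
        return objective_class(**kwargs)
    except TypeError:
        # Objectives with required parameters can't be built without them.
        raise TypeError(f"In get_objective, cannot pass in return_instance=True for {objective_class.name}")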
Looooks good! I left a few comments and suggestions but nothing blocking 😊
docs/source/release_notes.rst
Outdated
**Breaking Changes**
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case version of the name is also valid :pr:`1132`
Maybe change "Note that a lowercase case version of the name is also valid" to "If specifying an objective by string, the string must now match the objective's `name` field, case-insensitive"?
evalml/automl/automl_search.py
Outdated
        if self.data_split is not None and not issubclass(self.data_split.__class__, BaseCrossValidator):
            raise ValueError("Not a valid data splitter")
        if self.problem_type != self.objective.problem_type:
            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
        if additional_objectives is None:
            additional_objectives = get_objectives(self.problem_type)
            additional_objectives = [obj for obj in additional_objectives if obj not in self._objectives_not_allowed_in_automl]
Nit-pick: could combine these two to:
additional_objectives = [obj for obj in get_objectives(self.problem_type) if obj not in self._objectives_not_allowed_in_automl]
evalml/objectives/utils.py
Outdated
@@ -66,4 +93,7 @@ def get_objectives(problem_type):
        List of Objectives
    """
    problem_type = handle_problem_types(problem_type)
-   return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
+   all_objectives_dict = _all_objectives_dict()
+   # To remove duplicates
This comment isn't necessary anymore right? :d
@@ -1070,3 +1076,7 @@ def test_max_batches_must_be_non_negative(max_batches):

    with pytest.raises(ValueError, match="Parameter max batches must be None or non-negative. Received {max_batches}."):
        AutoMLSearch(problem_type="binary", _max_batches=max_batches)


def test_can_print_out_automl_objective_names():
Does this test check for the str output? Or just verifies that it runs / doesn't crash?
Just verifies that it runs! I don't think checking the str output is worth it since it will change whenever we add an objective and this is just a "bonus" function to help users.
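Presumably along the lines of this smoke test (calling the method on the class directly is an assumption):

from evalml.automl import AutoMLSearch

def test_can_print_out_automl_objective_names():
    # Just verify the call completes without raising.
    AutoMLSearch.print_objective_names_allowed_in_automl()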
    X, y = X_y_regression
    pipeline = linear_regression_pipeline_class(parameters={}, random_state=np.random.RandomState(42))
    pipeline.fit(X, y)
-   for objective in get_objectives(ProblemTypes.REGRESSION):
+   for objective in regression_objectives_allowed_in_automl:
        permutation_importance = calculate_permutation_importance(pipeline, X, y, objective)
Just wondering: do these have to be objectives that are allowed in AutoML? Maybe, since our previous `get_objectives()` method returned that and this is just a translation, but otherwise would it be a good idea to test on all of our regression objectives? (Same w/ classification)
Yeah, you're right that I'm only testing the objectives used in AutoML because that is what was being done before. I agree that testing all objectives would be useful, but it could get hairy with some objectives such as `CostBenefitMatrix`. Worth exploring in the future though!
Agreed, no need to do in this PR :D
def test_get_objective_return_instance_does_not_work_for_some_objectives():

    with pytest.raises(TypeError, match="In get_objective, cannot pass in return_instance=True for Cost Benefit Matrix"):
        get_objective("Cost Benefit Matrix", return_instance=True)
Curious: if we pass in an objective instance such as cost benefit matrix and set `return_instance=True`, it should still pass, right? Is this worth testing (maybe unnecessary/over the top lolol)?
Yes, it will pass because the first thing we do in `get_objective` is check whether we have an instance of `ObjectiveBase` and, if so, return it, but I will add that as a test case! Good suggestion.
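The added case might look roughly like this (the `CostBenefitMatrix` constructor arguments are assumptions):

from evalml.objectives import CostBenefitMatrix, get_objective

def test_get_objective_returns_instance_when_given_instance():
    cbm = CostBenefitMatrix(true_positive=1, true_negative=1,
                            false_positive=-1, false_negative=-1)
    # An instance passed in comes straight back, even with return_instance=True.
    assert get_objective(cbm, return_instance=True) is cbm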
Pull Request Description
Fixes #1078. This issue has two requirements:

* Update `get_objective` so that it can return any objective defined in EvalML
* Update `get_objective` to use the objective's `name` field as opposed to snake case names.

In the implementation, we decided to use `_get_subclasses` and define a getter for all objectives called `_all_objectives_dict` instead of maintaining a static `OPTIONS` dictionary. This is very similar to the pattern we have for components.

Since we want `get_objective` to return any valid objective, we can no longer return a class instance by default (because not all objectives have default values for all parameters), so I added a `return_instance` flag to `get_objective`.

Sample Usage
Getting any objective
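A plausible reconstruction of the sample, based on the new `get_objective` signature described above:

from evalml.objectives import get_objective

# By default, get_objective now returns the class definition...
objective_class = get_objective("log loss binary")

# ...and return_instance=True returns an instance when one can be constructed.
objective = get_objective("log loss binary", return_instance=True)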
Getting all objectives
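And, plausibly, listing all objectives for a problem type via `get_objectives`:

from evalml.objectives import get_objectives
from evalml.problem_types import ProblemTypes

# get_objectives returns every objective class registered for a problem type.
for objective_class in get_objectives(ProblemTypes.BINARY):
    print(objective_class.name)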
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding :pr:`123`.