Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing get_objective to be able to get any objective. #1132

Merged
merged 14 commits into from Sep 4, 2020

Conversation

freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented Aug 31, 2020

Pull Request Description

Fixes #1078 . This issue has two requirements:

  1. Modify get_objective so that it can return any objective defined in EvalML
  2. Modify get_objective to use the objective's name field as opposed to snake case names.

In the implementation, we decided to use _get_subclasses and define a getter for all objectives called _all_objectives_dict instead of maintaining a static OPTIONS dictionary. This is very similar to the pattern we have for components.

Since we want to get_objective to return any valid objective, we can no longer return a class instance by default (because not all objectives have default values for all parameters), so I added a return_instance flag to get_objective.

Sample Usage

Getting any objective

from evalml.objectives import get_objective

# returns the class
get_objective("Log Loss Binary")
get_objective("log loss binary")
get_objective("cost benefit matrix")

# returns an instance
get_objective("log loss binary", return_instance=True)

# errors out when name doesn't match
get_objective("log_loss")

# errors out because cost benefit matrix doesn't have default arguments
get_objective("cost benefit matrix", return_instance=True)

Getting all objectives

from evalml.objectives.utils import _all_objectives_dict

# return a dictionary from name to class
_all_objectives_dict()

After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Aug 31, 2020

Codecov Report

Merging #1132 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1132   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files         195      195           
  Lines       11372    11460   +88     
=======================================
+ Hits        11362    11450   +88     
  Misses         10       10           
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.58% <100.00%> (+0.01%) ⬆️
evalml/model_understanding/graphs.py 100.00% <100.00%> (ø)
evalml/objectives/__init__.py 100.00% <100.00%> (ø)
evalml/objectives/utils.py 100.00% <100.00%> (ø)
evalml/pipelines/binary_classification_pipeline.py 100.00% <100.00%> (ø)
evalml/pipelines/classification_pipeline.py 100.00% <100.00%> (ø)
evalml/pipelines/regression_pipeline.py 100.00% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41b711a...a39c069. Read the comment docs.

@property
def data_check_results(self):
return self._data_check_results

def __str__(self):
def _print_list(obj_list):
lines = ['\t{}'.format(o.name) for o in obj_list]
lines = sorted(['\t{}'.format(o.name) for o in obj_list])
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to add the sorted to get test_automl_str_no_param_search to pass. I think this has to do with how OPTIONS being deleted.

@@ -147,19 +161,22 @@ def __init__(self,
self.optimize_thresholds = optimize_thresholds
if objective == 'auto':
objective = self._DEFAULT_OBJECTIVES[self.problem_type.value]
self.objective = get_objective(objective)
objective = get_objective(objective, return_instance=False)
self.objective = self._validate_objective(objective)
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1078, we agreed to have AutoML check whether or not the objective can be used in auto ml search so we added _validate_objective.

if objective in self._objectives_not_allowed_in_automl:
raise ValueError(f"{objective.name} is not allowed in AutoML! " +
"Try one of the following objective names: \n" +
pretty_print_all_valid_objective_names())
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding pretty_print_all_valid_objective_names is technically out of scope of this PR but I think it gives helpful feedback to the user when they pass in the wrong name. This is what it looks like now - happy to edit or delete:

ObjectiveNotFoundError: log_loss_binary is not a valid Objective! Objective must be one of:
+-------------------+-------------------+-------------------+------------------+
|        AUC        |     AUC Macro     |     AUC Micro     |   AUC Weighted   |
+===================+===================+===================+==================+
| Accuracy Binary   | Accuracy          | Balanced Accuracy | Balanced         |
|                   | Multiclass        | Binary            | Accuracy         |
|                   |                   |                   | Multiclass       |
+-------------------+-------------------+-------------------+------------------+
| Cost Benefit      | ExpVariance       | F1                | F1 Macro         |
| Matrix            |                   |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| F1 Micro          | F1 Weighted       | Fraud Cost        | Lead Scoring     |
+-------------------+-------------------+-------------------+------------------+
| Log Loss Binary   | Log Loss          | MAE               | MCC Binary       |
|                   | Multiclass        |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| MCC Multiclass    | MSE               | MaxError          | Mean Squared Log |
|                   |                   |                   | Error            |
+-------------------+-------------------+-------------------+------------------+
| MedianAE          | Precision         | Precision Macro   | Precision Micro  |
+-------------------+-------------------+-------------------+------------------+
| Precision         | R2                | Recall            | Recall Macro     |
| Weighted          |                   |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| Recall Micro      | Recall Weighted   | Root Mean Squared | Root Mean        |
|                   |                   | Error             | Squared Log      |
|                   |                   |                   | Error            |
+-------------------+-------------------+-------------------+------------------+
| accuracy binary   | accuracy          | auc               | auc macro        |
|                   | multiclass        |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| auc micro         | auc weighted      | balanced accuracy | balanced         |
|                   |                   | binary            | accuracy         |
|                   |                   |                   | multiclass       |
+-------------------+-------------------+-------------------+------------------+
| cost benefit      | expvariance       | f1                | f1 macro         |
| matrix            |                   |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| f1 micro          | f1 weighted       | fraud cost        | lead scoring     |
+-------------------+-------------------+-------------------+------------------+
| log loss binary   | log loss          | mae               | maxerror         |
|                   | multiclass        |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| mcc binary        | mcc multiclass    | mean squared log  | medianae         |
|                   |                   | error             |                  |
+-------------------+-------------------+-------------------+------------------+
| mse               | precision         | precision macro   | precision micro  |
+-------------------+-------------------+-------------------+------------------+
| precision         | r2                | recall            | recall macro     |
| weighted          |                   |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| recall micro      | recall weighted   | root mean squared | root mean        |
|                   |                   | error             | squared log      |
|                   |                   |                   | error            |
+-------------------+-------------------+-------------------+------------------+

Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah neat! A few things:

  • why is it that some are returned as capitalized and others are not?
  • I think the header for the first row is a little confusing (AUC, AUC Micro, etc.) as it makes that row seem more important
  • Might be a little confusing since we're saying "Objective must be one of:" and printing out some objectives which we can't pass as strings (ex: cost benefit matrix, fraud cost, etc.)

Copy link
Member

@gsheni gsheni Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely helpful to have, but seems like a lot of information to throw at the user. Perhaps raise the error, and then print the lines of code to get the table? Wonder what other people think.

from evalml.objectives import pretty_print_all_valid_objective_names
pretty_print_all_valid_objective_names()
  • Also, do you think it might be better to return a DataFrame? I notice it's texttable. Just a thought.

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback guys! @angela97lin I added a method to AutoMLSearch called print_objective_names_allowed_in_automl that only displays the objectives you can pass into AutoMLSearch and I removed the header row and standardized to lower case names. @gsheni I am now telling the user how to print all the valid objective names in the error message as opposed to printing out all the names.

+-------------------+-------------------+-------------------+------------------+
| accuracy binary   | accuracy          | auc               | auc macro        |
|                   | multiclass        |                   |                  |
+-------------------+-------------------+-------------------+------------------+
| auc micro         | auc weighted      | balanced accuracy | balanced         |
|                   |                   | binary            | accuracy         |
|                   |                   |                   | multiclass       |
+-------------------+-------------------+-------------------+------------------+
| expvariance       | f1                | f1 macro          | f1 micro         |
+-------------------+-------------------+-------------------+------------------+
| f1 weighted       | log loss binary   | log loss          | mae              |
|                   |                   | multiclass        |                  |
+-------------------+-------------------+-------------------+------------------+
| maxerror          | mcc binary        | mcc multiclass    | medianae         |
+-------------------+-------------------+-------------------+------------------+
| mse               | precision         | precision macro   | precision micro  |
+-------------------+-------------------+-------------------+------------------+
| precision         | r2                | root mean squared |                  |
| weighted          |                   | error             |                  |
+-------------------+-------------------+-------------------+------------------+

@@ -66,4 +73,7 @@ def get_objectives(problem_type):
List of Objectives
"""
problem_type = handle_problem_types(problem_type)
return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
all_objectives_dict = _all_objectives_dict()
# To remove duplicates
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we include the lowercase and upper case names in _all_objectives_dict

Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah again see comment about just storing lowercase version in _all_objectives_dict and then just lower-casing the user input and checking for equality in a case-insensitive way

if 'evalml.objectives' not in objective.__module__:
continue
objectives_dict[objective.name] = objective
objectives_dict[objective.name.lower()] = objective
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1078, we mentioned that it'd be nice to include the lowercase name. I personally think we shouldn't bother with the all uppercase name.

Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, rather than storing both here, why not just when we try to get_objective we check case-insensitively? That is, we convert the user input to all lowercase and check? That way we only have to store objective.name.lower() here?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Just pushed this up


EPS = 1e-5

all_objectives = _get_subclasses(ObjectiveBase)
all_automl_objectives = _all_objectives_dict()
Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to define this dictionary because Options was deleted.

@freddyaboulton freddyaboulton marked this pull request as ready for review Aug 31, 2020
@freddyaboulton freddyaboulton self-assigned this Aug 31, 2020
Copy link
Contributor

@angela97lin angela97lin left a comment

Left a few comments, particularly about not storing both upper/lower case versions in _all_objectives_dict and just checking in get_objective!

Also, I'm curious, design wise: is it necessary to return an instance? I wonder how much more value that adds given that if get_objective returns a class, it's easy enough to just instantiate an instance of that class. The best we can do is provide the default parameters anyways. This had come up in this related issue: #580

**Breaking Changes**
* `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
* Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
* If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a lowercase version" (of the name) maybe?

* `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
* Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
* If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
* Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error",
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Makes much more sense!

if objective in self._objectives_not_allowed_in_automl:
raise ValueError(f"{objective.name} is not allowed in AutoML! " +
"Try one of the following objective names: \n" +
pretty_print_all_valid_objective_names())
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah neat! A few things:

  • why is it that some are returned as capitalized and others are not?
  • I think the header for the first row is a little confusing (AUC, AUC Micro, etc.) as it makes that row seem more important
  • Might be a little confusing since we're saying "Objective must be one of:" and printing out some objectives which we can't pass as strings (ex: cost benefit matrix, fraud cost, etc.)

all_objectives = _get_subclasses(ObjectiveBase)
objectives_dict = {}
for objective in all_objectives:
if 'evalml.objectives' not in objective.__module__:
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, does this check mean that if a user defines their own objective in their own code, it won't be picked up by this? :O

Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That's the same behavior with get_components() .

if 'evalml.objectives' not in objective.__module__:
continue
objectives_dict[objective.name] = objective
objectives_dict[objective.name.lower()] = objective
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, rather than storing both here, why not just when we try to get_objective we check case-insensitively? That is, we convert the user input to all lowercase and check? That way we only have to store objective.name.lower() here?

return objectives_dict


def iterate_in_batches(sequence, batch_size):
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion but maybe define this in pretty_print_all_valid_objective_names if it's only used there? Or otherwise leave a docstr?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just refactored this! Ended up creating _print_objectives_in_table and defining iterate_in_batches inside that function.


def get_objective(objective):

def pretty_print_all_valid_objective_names():
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also docstring for this! :D

return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
"""Returns the Objective object of the given objective name

Args:
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update "Args" --> "Arguments" :))

"""Returns the Objective object of the given objective name

Args:
objective (str): name of the objective
objective (str or ObjectiveBase): name or instance of the objective class.
return_instance (bool): Whether to return an instance of the objective. This only applies if objective
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, see comment about whether this is absolutely necessary 🤔 Maybe we can get away without this? That way we don't need to check and raise a TypeError...

Copy link
Contributor Author

@freddyaboulton freddyaboulton Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd get a bit hairy. For example, we'd have to change score in ClassificationPipeline to be something like:

        objectives = [get_objective(o) for o in objectives]
        objective_instances = []
        for obj in objectives:
            if isinstance(obj, type):
                objective_instances.append(obj())
            else:
                objective_instances.append(obj)
    
        y = self._encode_targets(y)
        y_predicted, y_predicted_proba = self._compute_predictions(X, objective_instances)
        return self._score_all_objectives(X, y, y_predicted, y_predicted_proba, objectives)

We'd have to do something similar in RegressionPipeline and anywhere else get_objective is called. I think it's cleaner to let get_objective handle whether or not to return an instance?

@@ -66,4 +73,7 @@ def get_objectives(problem_type):
List of Objectives
"""
problem_type = handle_problem_types(problem_type)
return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
all_objectives_dict = _all_objectives_dict()
# To remove duplicates
Copy link
Contributor

@angela97lin angela97lin Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah again see comment about just storing lowercase version in _all_objectives_dict and then just lower-casing the user input and checking for equality in a case-insensitive way

if objective in self._objectives_not_allowed_in_automl:
raise ValueError(f"{objective.name} is not allowed in AutoML! " +
"Try one of the following objective names: \n" +
pretty_print_all_valid_objective_names())
Copy link
Member

@gsheni gsheni Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely helpful to have, but seems like a lot of information to throw at the user. Perhaps raise the error, and then print the lines of code to get the table? Wonder what other people think.

from evalml.objectives import pretty_print_all_valid_objective_names
pretty_print_all_valid_objective_names()
  • Also, do you think it might be better to return a DataFrame? I notice it's texttable. Just a thought.

@@ -28,6 +29,12 @@
from evalml.pipelines import BinaryClassificationPipeline
from evalml.problem_types import ProblemTypes

_not_allowed_in_automl = AutoMLSearch._objectives_not_allowed_in_automl

binary_objectives = [obj() for obj in get_objectives(ProblemTypes.BINARY) if obj not in _not_allowed_in_automl]
Copy link
Member

@gsheni gsheni Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are useful functions to have, should they go in a utils somewhere?

Copy link
Member

@gsheni gsheni Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed they are used in multiple test files...

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I just made these into test fixtures.

return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
Copy link
Member

@gsheni gsheni Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using kwargs would allow you to add support for passing arguments for objectives like

get_objective("cost benefit matrix", return_instance=True, true_positive....)

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed this up!

@freddyaboulton freddyaboulton force-pushed the 1078-human-readable-objective-names branch from f8bc150 to 3fb123e Compare Sep 2, 2020
@freddyaboulton freddyaboulton requested a review from angela97lin Sep 2, 2020
@freddyaboulton freddyaboulton force-pushed the 1078-human-readable-objective-names branch from 15f45c8 to c577975 Compare Sep 3, 2020
@freddyaboulton freddyaboulton force-pushed the 1078-human-readable-objective-names branch from 2848ba1 to e9b1581 Compare Sep 4, 2020
Copy link
Contributor

@angela97lin angela97lin left a comment

Looooks good! I left a few comments and suggestions but nothing blocking 😊

**Breaking Changes**
* `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
* Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
* If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case version of the name is also valid :pr:`1132`
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change "Note that a lowercase case version of the name is also valid" to "If specifying an objective by string, the string must now match the objective's name field, case-insensitive"?

if self.data_split is not None and not issubclass(self.data_split.__class__, BaseCrossValidator):
raise ValueError("Not a valid data splitter")
if self.problem_type != self.objective.problem_type:
raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
if additional_objectives is None:
additional_objectives = get_objectives(self.problem_type)
additional_objectives = [obj for obj in additional_objectives if obj not in self._objectives_not_allowed_in_automl]
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick: could combine these two to:

additional_objectives = [obj for obj in get_objectives(self.problem_type) if obj not in self._objectives_not_allowed_in_automl]

@@ -66,4 +93,7 @@ def get_objectives(problem_type):
List of Objectives
"""
problem_type = handle_problem_types(problem_type)
return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
all_objectives_dict = _all_objectives_dict()
# To remove duplicates
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't necessary anymore right? :d

@@ -1070,3 +1076,7 @@ def test_max_batches_must_be_non_negative(max_batches):

with pytest.raises(ValueError, match="Parameter max batches must be None or non-negative. Received {max_batches}."):
AutoMLSearch(problem_type="binary", _max_batches=max_batches)


def test_can_print_out_automl_objective_names():
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test check for the str output? Or just verifies that it runs / doesn't crash?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just verifies that it runs! I don't think checking the str output is worth it since it will change whenever we add an objective and this is just a "bonus" function to help users.

X, y = X_y_regression
pipeline = linear_regression_pipeline_class(parameters={}, random_state=np.random.RandomState(42))
pipeline.fit(X, y)
for objective in get_objectives(ProblemTypes.REGRESSION):
for objective in regression_objectives_allowed_in_automl:
permutation_importance = calculate_permutation_importance(pipeline, X, y, objective)
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: do these have to be objectives that are allowed in automl? Maybe since our prev get_objectives() method returned that and this is just a translation but otherwise would it be a good idea to test on all of our regression objectives? (Same w/ classification)

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea you're right that I'm only testing the objectives used in AutoML because that is what was being done before. I agree that testing all objectives would be useful but it could get hairy with some objectives such as CostBenefitMatrix. Worth exploring in the future though!

Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, no need to do in this PR :D

def test_get_objective_return_instance_does_not_work_for_some_objectives():

with pytest.raises(TypeError, match="In get_objective, cannot pass in return_instance=True for Cost Benefit Matrix"):
get_objective("Cost Benefit Matrix", return_instance=True)
Copy link
Contributor

@angela97lin angela97lin Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious:

If we pass in an objective instance such as cost benefit matrix and set return_instance=True, it should still pass right? Is this worth testing (maybe unnecessary/over the top lolol)

Copy link
Contributor Author

@freddyaboulton freddyaboulton Sep 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will pass because the first thing we do in get_objective is check if we have an instance of ObjectiveBase and if so return it but I will add that as a test case! Good suggestion.

@freddyaboulton freddyaboulton merged commit 221bceb into main Sep 4, 2020
This was referenced Sep 17, 2020
@freddyaboulton freddyaboulton deleted the 1078-human-readable-objective-names branch Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AutoMLSearch: update objective string to use human-readable format
3 participants