Allowing get_objective to be able to get any objective. #1132
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1132 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 195 195
Lines 11372 11460 +88
=======================================
+ Hits 11362 11450 +88
Misses 10 10
Continue to review full report at Codecov.
    @property
    def data_check_results(self):
        return self._data_check_results

    def __str__(self):
        def _print_list(obj_list):
-           lines = ['\t{}'.format(o.name) for o in obj_list]
+           lines = sorted(['\t{}'.format(o.name) for o in obj_list])
Had to add the `sorted` to get `test_automl_str_no_param_search` to pass. I think this has to do with `OPTIONS` being deleted.
@@ -147,19 +161,22 @@ def __init__(self,
        self.optimize_thresholds = optimize_thresholds
        if objective == 'auto':
            objective = self._DEFAULT_OBJECTIVES[self.problem_type.value]
-       self.objective = get_objective(objective)
+       objective = get_objective(objective, return_instance=False)
+       self.objective = self._validate_objective(objective)
In #1078, we agreed to have AutoML check whether or not the objective can be used in AutoML search, so we added `_validate_objective`.
evalml/automl/automl_search.py
Outdated
        if objective in self._objectives_not_allowed_in_automl:
            raise ValueError(f"{objective.name} is not allowed in AutoML! " +
                             "Try one of the following objective names: \n" +
                             pretty_print_all_valid_objective_names())
Adding `pretty_print_all_valid_objective_names` is technically out of scope for this PR, but I think it gives helpful feedback to the user when they pass in the wrong name. This is what it looks like now - happy to edit or delete:
ObjectiveNotFoundError: log_loss_binary is not a valid Objective! Objective must be one of:
+-------------------+-------------------+-------------------+------------------+
| AUC | AUC Macro | AUC Micro | AUC Weighted |
+===================+===================+===================+==================+
| Accuracy Binary | Accuracy | Balanced Accuracy | Balanced |
| | Multiclass | Binary | Accuracy |
| | | | Multiclass |
+-------------------+-------------------+-------------------+------------------+
| Cost Benefit | ExpVariance | F1 | F1 Macro |
| Matrix | | | |
+-------------------+-------------------+-------------------+------------------+
| F1 Micro | F1 Weighted | Fraud Cost | Lead Scoring |
+-------------------+-------------------+-------------------+------------------+
| Log Loss Binary | Log Loss | MAE | MCC Binary |
| | Multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| MCC Multiclass | MSE | MaxError | Mean Squared Log |
| | | | Error |
+-------------------+-------------------+-------------------+------------------+
| MedianAE | Precision | Precision Macro | Precision Micro |
+-------------------+-------------------+-------------------+------------------+
| Precision | R2 | Recall | Recall Macro |
| Weighted | | | |
+-------------------+-------------------+-------------------+------------------+
| Recall Micro | Recall Weighted | Root Mean Squared | Root Mean |
| | | Error | Squared Log |
| | | | Error |
+-------------------+-------------------+-------------------+------------------+
| accuracy binary | accuracy | auc | auc macro |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| auc micro | auc weighted | balanced accuracy | balanced |
| | | binary | accuracy |
| | | | multiclass |
+-------------------+-------------------+-------------------+------------------+
| cost benefit | expvariance | f1 | f1 macro |
| matrix | | | |
+-------------------+-------------------+-------------------+------------------+
| f1 micro | f1 weighted | fraud cost | lead scoring |
+-------------------+-------------------+-------------------+------------------+
| log loss binary | log loss | mae | maxerror |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| mcc binary | mcc multiclass | mean squared log | medianae |
| | | error | |
+-------------------+-------------------+-------------------+------------------+
| mse | precision | precision macro | precision micro |
+-------------------+-------------------+-------------------+------------------+
| precision | r2 | recall | recall macro |
| weighted | | | |
+-------------------+-------------------+-------------------+------------------+
| recall micro | recall weighted | root mean squared | root mean |
| | | error | squared log |
| | | | error |
+-------------------+-------------------+-------------------+------------------+
Woah neat! A few things:
- why is it that some are returned as capitalized and others are not?
- I think the header for the first row is a little confusing (AUC, AUC Micro, etc.) as it makes that row seem more important
- Might be a little confusing since we're saying "Objective must be one of:" and printing out some objectives which we can't pass as strings (ex: cost benefit matrix, fraud cost, etc.)
It is definitely helpful to have, but seems like a lot of information to throw at the user. Perhaps raise the error, and then print the lines of code to get the table? Wonder what other people think.
from evalml.objectives import pretty_print_all_valid_objective_names
pretty_print_all_valid_objective_names()
- Also, do you think it might be better to return a DataFrame? I notice it's texttable. Just a thought.
Thanks for the feedback guys! @angela97lin I added a method to `AutoMLSearch` called `print_objective_names_allowed_in_automl` that only displays the objectives you can pass into `AutoMLSearch`, and I removed the header row and standardized to lowercase names. @gsheni I am now telling the user how to print all the valid objective names in the error message as opposed to printing out all the names.
+-------------------+-------------------+-------------------+------------------+
| accuracy binary | accuracy | auc | auc macro |
| | multiclass | | |
+-------------------+-------------------+-------------------+------------------+
| auc micro | auc weighted | balanced accuracy | balanced |
| | | binary | accuracy |
| | | | multiclass |
+-------------------+-------------------+-------------------+------------------+
| expvariance | f1 | f1 macro | f1 micro |
+-------------------+-------------------+-------------------+------------------+
| f1 weighted | log loss binary | log loss | mae |
| | | multiclass | |
+-------------------+-------------------+-------------------+------------------+
| maxerror | mcc binary | mcc multiclass | medianae |
+-------------------+-------------------+-------------------+------------------+
| mse | precision | precision macro | precision micro |
+-------------------+-------------------+-------------------+------------------+
| precision | r2 | root mean squared | |
| weighted | | error | |
+-------------------+-------------------+-------------------+------------------+
evalml/objectives/utils.py
Outdated
@@ -66,4 +73,7 @@ def get_objectives(problem_type):
        List of Objectives
    """
    problem_type = handle_problem_types(problem_type)
-   return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
+   all_objectives_dict = _all_objectives_dict()
+   # To remove duplicates
This is because we include the lowercase and uppercase names in `_all_objectives_dict`.
Ah, again see comment about just storing the lowercase version in `_all_objectives_dict` and then lower-casing the user input and checking for equality in a case-insensitive way.
    if 'evalml.objectives' not in objective.__module__:
        continue
    objectives_dict[objective.name] = objective
    objectives_dict[objective.name.lower()] = objective
In #1078, we mentioned that it'd be nice to include the lowercase name. I personally think we shouldn't bother with the all uppercase name.
Hm, rather than storing both here, why not just check case-insensitively when we try to `get_objective`? That is, we convert the user input to all lowercase and check? That way we only have to store `objective.name.lower()` here.
Good idea! Just pushed this up
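A minimal sketch of that case-insensitive lookup (the classes here are toy stand-ins, not evalml's actual implementation):

class ObjectiveBase:
    """Toy stand-in for evalml's objective base class."""
    name = "base"

class LogLossBinary(ObjectiveBase):
    name = "Log Loss Binary"

def _all_objectives_dict():
    # Store only the lowercase name as the key.
    return {cls.name.lower(): cls for cls in ObjectiveBase.__subclasses__()}

def get_objective(objective):
    if isinstance(objective, ObjectiveBase):
        return objective
    # Lowercase the user's input so the lookup is case-insensitive.
    return _all_objectives_dict()[objective.lower()]

assert get_objective("Log Loss Binary") is get_objective("log loss binary")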
EPS = 1e-5

all_objectives = _get_subclasses(ObjectiveBase)
all_automl_objectives = _all_objectives_dict()
Have to define this dictionary because `OPTIONS` was deleted.
Left a few comments, particularly about not storing both upper/lower case versions in `_all_objectives_dict` and just checking in `get_objective`!
Also, I'm curious, design wise: is it necessary to return an instance? I wonder how much more value that adds, given that if `get_objective` returns a class, it's easy enough to just instantiate an instance of that class. The best we can do is provide the default parameters anyways. This had come up in this related issue: #580
docs/source/release_notes.rst
Outdated
**Breaking Changes**
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
"a lowercase version" (of the name) maybe?
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case is also valid :pr:`1132`
    * Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error",
👍 Makes much more sense!
all_objectives = _get_subclasses(ObjectiveBase)
objectives_dict = {}
for objective in all_objectives:
    if 'evalml.objectives' not in objective.__module__:
Just wondering, does this check mean that if a user defines their own objective in their own code, it won't be picked up by this? :O
Yes! That's the same behavior with `get_components()`.
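For illustration, a toy demonstration of why the filter skips user-defined objectives (`MyCustomObjective` is hypothetical user code):

class MyCustomObjective:  # a user-defined objective living in, say, __main__
    name = "my custom objective"

# The registration loop skips it because its module path does not
# contain 'evalml.objectives':
assert 'evalml.objectives' not in MyCustomObjective.__module__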
evalml/objectives/utils.py
Outdated
    return objectives_dict


def iterate_in_batches(sequence, batch_size):
Just a suggestion, but maybe define this in `pretty_print_all_valid_objective_names` if it's only used there? Or otherwise leave a docstring?
Just refactored this! Ended up creating `_print_objectives_in_table` and defining `iterate_in_batches` inside that function.
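Judging from the call below (`iterate_in_batches(sorted(...), 4)`), the helper is presumably something like this sketch:

def iterate_in_batches(sequence, batch_size):
    # Split a sequence into consecutive chunks of at most batch_size items,
    # one chunk per printed table row.
    return [sequence[i:i + batch_size] for i in range(0, len(sequence), batch_size)]

assert iterate_in_batches(list("abcde"), 2) == [["a", "b"], ["c", "d"], ["e"]]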
evalml/objectives/utils.py
Outdated

def get_objective(objective):

def pretty_print_all_valid_objective_names():
Also docstring for this! :D
evalml/objectives/utils.py
Outdated
    return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
    """Returns the Objective object of the given objective name

    Args:
Can you update "Args" --> "Arguments" :))
"""Returns the Objective object of the given objective name | ||
|
||
Args: | ||
objective (str): name of the objective | ||
objective (str or ObjectiveBase): name or instance of the objective class. | ||
return_instance (bool): Whether to return an instance of the objective. This only applies if objective |
Hm, see comment about whether this is absolutely necessary 🤔 Maybe we can get away without this? That way we don't need to check and raise a TypeError...
I think it'd get a bit hairy. For example, we'd have to change `score` in `ClassificationPipeline` to be something like:

objectives = [get_objective(o) for o in objectives]
objective_instances = []
for obj in objectives:
    if isinstance(obj, type):
        objective_instances.append(obj())
    else:
        objective_instances.append(obj)
y = self._encode_targets(y)
y_predicted, y_predicted_proba = self._compute_predictions(X, objective_instances)
return self._score_all_objectives(X, y, y_predicted, y_predicted_proba, objectives)

We'd have to do something similar in `RegressionPipeline` and anywhere else `get_objective` is called. I think it's cleaner to let `get_objective` handle whether or not to return an instance?
@@ -28,6 +29,12 @@
 from evalml.pipelines import BinaryClassificationPipeline
 from evalml.problem_types import ProblemTypes

+_not_allowed_in_automl = AutoMLSearch._objectives_not_allowed_in_automl
+
+binary_objectives = [obj() for obj in get_objectives(ProblemTypes.BINARY) if obj not in _not_allowed_in_automl]
These are useful functions to have, should they go in a utils somewhere?
I noticed they are used in multiple test files...
Good idea, I just made these into test fixtures.
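A hypothetical fixture version of the helper from the diff above; the fixture name is an assumption mirroring `regression_objectives_allowed_in_automl` used later in the tests:

import pytest

from evalml.automl import AutoMLSearch
from evalml.objectives import get_objectives
from evalml.problem_types import ProblemTypes


@pytest.fixture
def binary_objectives_allowed_in_automl():
    # Same computation as the module-level helper, exposed as a fixture.
    not_allowed = AutoMLSearch._objectives_not_allowed_in_automl
    return [obj() for obj in get_objectives(ProblemTypes.BINARY) if obj not in not_allowed]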
evalml/objectives/utils.py
Outdated
    return table.add_rows(iterate_in_batches(sorted(list(all_objectives_dict.keys())), 4)).draw()


def get_objective(objective, return_instance=False):
I wonder if using `kwargs` would allow you to add support for passing arguments for objectives, like `get_objective("cost benefit matrix", return_instance=True, true_positive....)`
Just pushed this up!
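A rough sketch (not the exact commit) of how `**kwargs` could be forwarded, reusing the toy `ObjectiveBase` and `_all_objectives_dict` from the sketch earlier in this thread:

def get_objective(objective, return_instance=False, **kwargs):
    if isinstance(objective, ObjectiveBase):
        return objective
    objective_class = _all_objectives_dict()[objective.lower()]
    if not return_instance:
        return objective_class
    try:
        # Forward any extra keyword arguments to the objective's constructor.
        return objective_class(**kwargs)
    except TypeError:
        # Objectives with required parameters can't be built without them.
        raise TypeError(f"In get_objective, cannot pass in return_instance=True for {objective_class.name}")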
Looooks good! I left a few comments and suggestions but nothing blocking 😊
docs/source/release_notes.rst
Outdated
**Breaking Changes**
    * `get_objective` will now return a class definition rather than an instance by default :pr:`1132`
    * Deleted `OPTIONS` dictionary in `evalml.objectives.utils.py` :pr:`1132`
    * If specifying an objective by string, the string must now match the objective's `name` field. Note that a lowercase case version of the name is also valid :pr:`1132`
Maybe change "Note that a lowercase case version of the name is also valid" to "If specifying an objective by string, the string must now match the objective's `name` field, case-insensitive"?
evalml/automl/automl_search.py
Outdated
        if self.data_split is not None and not issubclass(self.data_split.__class__, BaseCrossValidator):
            raise ValueError("Not a valid data splitter")
        if self.problem_type != self.objective.problem_type:
            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
        if additional_objectives is None:
            additional_objectives = get_objectives(self.problem_type)
            additional_objectives = [obj for obj in additional_objectives if obj not in self._objectives_not_allowed_in_automl]
Nit-pick: could combine these two to:
additional_objectives = [obj for obj in get_objectives(self.problem_type) if obj not in self._objectives_not_allowed_in_automl]
evalml/objectives/utils.py
Outdated
@@ -66,4 +93,7 @@ def get_objectives(problem_type):
        List of Objectives
    """
    problem_type = handle_problem_types(problem_type)
-   return [obj for obj in OPTIONS.values() if obj.problem_type == problem_type]
+   all_objectives_dict = _all_objectives_dict()
+   # To remove duplicates
This comment isn't necessary anymore right? :d
@@ -1070,3 +1076,7 @@ def test_max_batches_must_be_non_negative(max_batches):

    with pytest.raises(ValueError, match="Parameter max batches must be None or non-negative. Received {max_batches}."):
        AutoMLSearch(problem_type="binary", _max_batches=max_batches)


def test_can_print_out_automl_objective_names():
Does this test check for the str output? Or just verifies that it runs / doesn't crash?
Just verifies that it runs! I don't think checking the str output is worth it since it will change whenever we add an objective and this is just a "bonus" function to help users.
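Presumably along the lines of this smoke test (calling the method on the class directly is an assumption):

from evalml.automl import AutoMLSearch

def test_can_print_out_automl_objective_names():
    # Just verify the call completes without raising.
    AutoMLSearch.print_objective_names_allowed_in_automl()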
    X, y = X_y_regression
    pipeline = linear_regression_pipeline_class(parameters={}, random_state=np.random.RandomState(42))
    pipeline.fit(X, y)
-   for objective in get_objectives(ProblemTypes.REGRESSION):
+   for objective in regression_objectives_allowed_in_automl:
        permutation_importance = calculate_permutation_importance(pipeline, X, y, objective)
Just wondering: do these have to be objectives that are allowed in AutoML? Maybe, since our previous `get_objectives()` method returned that and this is just a translation, but otherwise would it be a good idea to test on all of our regression objectives? (Same w/ classification)
Yeah, you're right that I'm only testing the objectives used in AutoML because that is what was being done before. I agree that testing all objectives would be useful, but it could get hairy with some objectives such as `CostBenefitMatrix`. Worth exploring in the future though!
Agreed, no need to do in this PR :D
def test_get_objective_return_instance_does_not_work_for_some_objectives():

    with pytest.raises(TypeError, match="In get_objective, cannot pass in return_instance=True for Cost Benefit Matrix"):
        get_objective("Cost Benefit Matrix", return_instance=True)
Curious: if we pass in an objective instance such as cost benefit matrix and set `return_instance=True`, it should still pass, right? Is this worth testing (maybe unnecessary/over the top lolol)?
Yes, it will pass because the first thing we do in `get_objective` is check whether we have an instance of `ObjectiveBase` and, if so, return it, but I will add that as a test case! Good suggestion.
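The added case might look roughly like this (the `CostBenefitMatrix` constructor arguments are assumptions):

from evalml.objectives import CostBenefitMatrix, get_objective

def test_get_objective_returns_instance_when_given_instance():
    cbm = CostBenefitMatrix(true_positive=1, true_negative=1,
                            false_positive=-1, false_negative=-1)
    # An instance passed in comes straight back, even with return_instance=True.
    assert get_objective(cbm, return_instance=True) is cbm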
Pull Request Description
Fixes #1078. This issue has two requirements:

* Update `get_objective` so that it can return any objective defined in EvalML
* Update `get_objective` to use the objective's `name` field as opposed to snake case names.

In the implementation, we decided to use `_get_subclasses` and define a getter for all objectives called `_all_objectives_dict` instead of maintaining a static `OPTIONS` dictionary. This is very similar to the pattern we have for components.

Since we want `get_objective` to return any valid objective, we can no longer return a class instance by default (because not all objectives have default values for all parameters), so I added a `return_instance` flag to `get_objective`.

Sample Usage
Getting any objective
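A plausible reconstruction of the sample, based on the new `get_objective` signature described above:

from evalml.objectives import get_objective

# By default, get_objective now returns the class definition...
objective_class = get_objective("log loss binary")

# ...and return_instance=True returns an instance when one can be constructed.
objective = get_objective("log loss binary", return_instance=True)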
Getting all objectives
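And, plausibly, listing all objectives for a problem type via `get_objectives`:

from evalml.objectives import get_objectives
from evalml.problem_types import ProblemTypes

# get_objectives returns every objective class registered for a problem type.
for objective_class in get_objectives(ProblemTypes.BINARY):
    print(objective_class.name)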
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding :pr:`123`.