-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RMSE, MSLE, RMSLE #788
Conversation
Codecov Report
@@ Coverage Diff @@
## master #788 +/- ##
=======================================
Coverage 99.51% 99.52%
=======================================
Files 150 150
Lines 5781 5843 +62
=======================================
+ Hits 5753 5815 +62
Misses 28 28
Continue to review full report at Codecov.
|
@@ -252,6 +252,36 @@ def objective_function(self, y_true, y_predicted, X=None): | |||
return metrics.matthews_corrcoef(y_true, y_predicted) | |||
|
|||
|
|||
class RMSE(RegressionObjective): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctduffy our convention has been to spell out the class names in CamelCase format. Let's stick with that convention in this PR. So this one would be RootMeanSquaredError
, and same for the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah and same for the name
field. Check out what we did for some of the other classes whose names are more than one word
|
||
|
||
class RMSLE(RegressionObjective): | ||
"""Root mean squared log error for regression. Only valid for nonnegative inputs""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice docstring!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion: perhaps we can add "Otherwise, will throw an ."
score_needs_proba = False | ||
|
||
def objective_function(self, y_true, y_predicted, X=None): | ||
return metrics.mean_squared_error(y_true, y_predicted, squared=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, looks right. I see the following in the docs for mean_squared_error
's squared
parameter:
If True returns MSE value, if False returns RMSE value.
|
||
assert root_obj.score(s1p, s1a) == pytest.approx(np.sqrt(0.5624673249102612)) | ||
assert root_obj.score(s2p, s2a) == pytest.approx(0) | ||
assert root_obj.score(s3p, s3a) == pytest.approx(np.sqrt(0.617267976207983)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these tests you added!
One note: perhaps you could update the values to be s1p = np.pow(np.e, np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]))
. That way, it should be easy to reason about what the expected values should be, just like in your other test test_mse_linear_model
.
evalml/objectives/utils.py
Outdated
@@ -29,6 +29,7 @@ | |||
"log_loss_multi": standard_metrics.LogLossMulticlass(), | |||
"mcc_binary": standard_metrics.MCCBinary(), | |||
"mcc_multi": standard_metrics.MCCMulticlass(), | |||
"rmse": standard_metrics.RMSE(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To recap, by not including MSLE and RMSLE in this global objective OPTIONS
list, we're achieving the following:
- MSLE/RMSLE won't be included in the default list of
additional_objectives
for automl regression searches. This is necessary because until we have Automl: if any provided objective errors, all computed scores are nan #786 fixed, doing so could make all objective scores be set tonan
if the target/predictions contain negative values. - MSLE/RMSLE can't be specified in an automl search by string name. Instances of these objectives can still be passed into the automl search directly.
This feels like the right thing to do at the moment. Down the road, it would be nice if we had a way to distinguish between the string recognition capability and what's the default set of additional_objectives
in automl, but there's no need to solve that right now IMO. We hit a remarkably similar issue with @eccabay's work on removing recall from automl search for #476 .
@ctduffy : we should add unit test coverage for both of the cases I mentioned. I think we already have coverage of the first case. As for the second, @eccabay just did this for recall in #784 so I suggest you talk to her about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I left some really minor points but I think once you address what Dylan commented, this'll be good to go :D
s2_actual = np.power(np.e, np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])) | ||
|
||
s3_predicted = np.power(np.e, np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])) | ||
s3_actual = np.power(np.e, np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctduffy , after reading this I'm now realizing I was incorrect about the power
helping here. Sorry about that! I had hoped that adding the power here would mean that the expected values below would be integers or rational fractions, but I forgot the squared term inside the sum in MSLE means we can't do that. My bad. I think it would be simpler to just delete the power
here.
|
||
ar = AutoRegressionSearch(additional_objectives=[RootMeanSquaredLogError(), MeanSquaredLogError()]) | ||
assert ar.additional_objectives[0].name == 'Root Mean Squared Log Error' | ||
assert ar.additional_objectives[1].name == 'Mean Squared Log Error' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One comment: please move this to test_auto_regression_search.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctduffy I left two more comments but other than that LGTM, nice work!!
Closes #728, adds these metrics as potential objectives. Does not add MSLE or RMSLE to the options dictionary (so that it is found when calling
get_objectives
for a linear regression, because they throw a ValueError on negative y values, which might be a common occurrence in linear regressions. In the future, these can be added to the options dictionary, but #786 should be remedied first.