
Implementation of r2_score function #1896

Merged
merged 14 commits into chainer:master on Dec 10, 2016

Conversation

mottodora (Member)

I implemented the r2_score function, which calculates the R^2 (coefficient of determination) regression score. As of now, the code is compatible with scikit-learn only when the sample_weight argument is set to None.
Is this argument necessary for Chainer? If so, I will implement it.
Additionally, I would like to know whether the type of this argument should be Variable or ndarray.

@delta2323 (Member)

Please fix the coding-guideline violations by referring to the Travis result.

@delta2323 delta2323 self-assigned this Nov 18, 2016
@delta2323 (Member) left a comment:

First review.

    type_check.expect(
        pred_type.ndim >= true_type.ndim,
    )
delta2323 (Member):

We do not need this condition because `pred_type.shape == true_type.shape` already implies that the ndims are equal.
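The redundancy can be checked directly in plain NumPy (a sketch, not part of the PR's code): ndim is just the length of shape, so shape equality guarantees ndim equality.

```python
import numpy as np

# Equal shapes already imply equal ndim (ndim is len(shape)),
# so a separate ndim check is redundant.
pred = np.zeros((4, 3), dtype=np.float32)
true = np.zeros((4, 3), dtype=np.float32)

assert pred.shape == true.shape
assert pred.ndim == true.ndim  # follows automatically from the line above
print("shape equality implies ndim equality")
```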

mottodora (Member, Author):

I deleted this line.

    if self.multioutput == 'uniform_average':
        return xp.asarray((1 - SS_res / SS_tot).mean(), dtype=pred.dtype),
    elif self.multioutput == 'raw_values':
        return xp.asarray((1 - SS_res / SS_tot), dtype=pred.dtype),
delta2323 (Member):
scikit-learn seems to set the R^2 value to 0 when SS_tot is 0.

mottodora (Member, Author):
I fixed this problem in 73d3f3b. Please check it.
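The behavior described above (reporting an R^2 of 0 instead of NaN when SS_tot is 0) can be sketched in plain NumPy; `r2_like_sklearn` is a hypothetical helper for illustration, not the PR's code.

```python
import numpy as np

def r2_like_sklearn(pred, true):
    # Hypothetical helper mirroring the behavior discussed above:
    # when SS_tot is 0, report an R^2 of 0 instead of NaN/inf.
    ss_res = ((true - pred) ** 2).sum(axis=0)
    ss_tot = ((true - true.mean(axis=0)) ** 2).sum(axis=0)
    with np.errstate(divide='ignore', invalid='ignore'):
        score = 1.0 - ss_res / ss_tot
    return np.where(ss_tot != 0, score, 0.0)

# A constant target column has SS_tot == 0:
true = np.array([[1.0], [1.0], [1.0]])
pred = np.array([[0.5], [1.5], [1.0]])
print(r2_like_sklearn(pred, true))  # -> [0.]
```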

    """Computes R^2(coefficient of determination) regression score function.

    Args:
        pred(Variable): Variable holding a vector or matrix of estimated \
delta2323 (Member):
We do not need backslashes when breaking the description of an argument across lines.

mottodora (Member, Author):
I deleted the backslash.


    """
    return R2_score(sample_weight=sample_weight, multioutput=multioutput)\
        (pred, true)
delta2323 (Member):
You can break the line in the middle of parentheses as follows:

return R2_score(sample_weight=sample_weight,
                multioutput=multioutput)(pred, true)

mottodora (Member, Author):
I fixed this line.

@delta2323 (Member) left a comment:

Second review comments.



    def r2_score(pred, true, sample_weight=None, multioutput='uniform_average'):
        """Computes R^2(coefficient of determination) regression score function.

        Args:
            pred(Variable): Variable holding a vector or matrix of estimated
                target values.
            true(Variable): Variable holding a vector or matrix of correct target \
delta2323 (Member):

Please remove the backslash here, too.

        return xp.asarray((1 - SS_res / SS_tot).mean(),
                          dtype=pred.dtype),
    elif self.multioutput == 'raw_values':
        if xp.any(SS_tot == 0):
@delta2323 (Member) · Nov 22, 2016:

If the reason for this branching is to avoid division by zero with SS_tot, it does not help: SS_res / SS_tot is evaluated anyway when SS_tot == 0. Is there another reason for this branching? If not, I recommend removing it; doing so also gives faster forward computation.

mottodora (Member, Author):

scikit-learn returns 0 when SS_tot is 0. Without the branching, this function will return NaN.

delta2323 (Member):

I thought of something like this:

    ret = xp.where(SS_tot != 0, 1 - SS_res / SS_tot, 0.0).astype(pred.dtype)
    if self.multioutput == 'uniform_average':
        return ret.mean()
    else:
        return ret

Does it make sense to you?
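The `xp.where` trick above can be demonstrated with plain NumPy (a sketch, not PR code): the division still produces non-finite entries where SS_tot == 0, but `where` replaces them with 0.0 in one pass, so no explicit branching is needed.

```python
import numpy as np

# Where SS_tot == 0 the quotient is non-finite, but where() selects 0.0 there.
SS_res = np.array([0.5, 0.0], dtype=np.float32)
SS_tot = np.array([2.0, 0.0], dtype=np.float32)

with np.errstate(divide='ignore', invalid='ignore'):
    ret = np.where(SS_tot != 0, 1 - SS_res / SS_tot, 0.0).astype(np.float32)

print(ret)  # -> [0.75 0.  ]
```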

mottodora (Member, Author):

Yes. I'm going to implement it like this.

        .astype(pred.dtype),
    else:
        return xp.asarray((1 - SS_res / SS_tot), dtype=pred.dtype),

@delta2323 (Member) · Nov 22, 2016:

We can merge the uniform_average case and the raw_values case as follows:

    ret = xp.asarray((1 - SS_res / SS_tot), dtype=pred.dtype)
    if self.multioutput == 'uniform_average':
        return ret.mean()
    else:
        return ret
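Putting the review suggestions together, the merged forward pass might look like this in plain NumPy (a sketch assuming 1-D inputs; `r2_forward` is a hypothetical name, not the PR's class):

```python
import numpy as np

def r2_forward(pred, true, multioutput='uniform_average'):
    # NumPy-only sketch of the merged branches discussed above;
    # not Chainer's actual implementation.
    SS_res = ((pred - true) ** 2).sum(axis=0)
    SS_tot = ((true - true.mean(axis=0)) ** 2).sum(axis=0)
    with np.errstate(divide='ignore', invalid='ignore'):
        ret = np.where(SS_tot != 0, 1 - SS_res / SS_tot, 0.0)
    ret = np.asarray(ret, dtype=pred.dtype)
    if multioutput == 'uniform_average':
        return ret.mean()
    return ret

true = np.array([3.0, -0.5, 2.0, 7.0], dtype=np.float32)
pred = np.array([2.5, 0.0, 2.0, 8.0], dtype=np.float32)
print(r2_forward(pred, true))  # -> about 0.9486
```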


    Args:
        pred(Variable): Variable holding a vector or matrix of estimated
            target values.
delta2323 (Member):

pred can be not only a vector or a matrix but also a tensor of any dimension.

    Args:
        pred(Variable): Variable holding a vector or matrix of estimated
            target values.
        true(Variable): Variable holding a vector or matrix of correct target \
delta2323 (Member):

true can be not only a vector or a matrix but also a tensor of any dimension.

            target values.
        true(Variable): Variable holding a vector or matrix of correct target \
            values.
        sample_weight: None.
@delta2323 (Member) · Nov 22, 2016:

How about writing the description in more detail, like this:

    This argument is for compatibility with scikit-learn's implementation of
    r2_score. The current implementation admits None only.

            values.
        sample_weight: None.
        multioutput(string): ['uniform_average', 'raw_values']. if
            'uniform_average', this function return an average of R^2
delta2323 (Member):

Typo: "return" should be "returns". Suggested wording: this function returns an average of R^2 scores if 'multioutput' is 'uniform_average', or a vector of R^2 scores if 'multioutput' is 'raw_values'.

    .. note:: This function is non-differentiable.
delta2323 (Member):

As the forward computation of this function consists of differentiable operations, we could back-propagate an error if we implemented the backward method appropriately. Do you think it would be beneficial to implement backward?

mottodora (Member, Author):

I think there is no difference between mean squared error and R^2 as a loss function.
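The point above can be checked numerically: SS_tot depends only on true, so 1 - SS_res/SS_tot differs from the mean squared error only by constants, and their gradients with respect to pred are proportional (a NumPy sketch, not PR code):

```python
import numpy as np

true = np.array([3.0, -0.5, 2.0, 7.0])
pred = np.array([2.5, 0.0, 2.1, 8.0])

# Analytic gradients:
#   d(R^2)/d(pred) = -2 * (pred - true) / SS_tot
#   d(MSE)/d(pred) =  2 * (pred - true) / n
SS_tot = ((true - true.mean()) ** 2).sum()
grad_r2 = -2 * (pred - true) / SS_tot
grad_mse = 2 * (pred - true) / len(pred)

ratio = grad_r2 / grad_mse
print(ratio)  # every entry equals -n / SS_tot: the gradients are proportional
```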

    expected = r2_score(self.x, self.t, sample_weight=None,
                        multioutput=self.multioutput)
    testing.assert_allclose(
        expected, cuda.to_cpu(y.data), **self.check_forward_options)
delta2323 (Member):

You do not need to transfer y.data to the CPU because testing.assert_allclose does that for you.

@delta2323 delta2323 added the cat:feature Implementation that introduces new interfaces. label Nov 28, 2016
@delta2323 (Member)

Could you merge the latest master branch?

@delta2323 (Member)

LGTM

@delta2323 delta2323 merged commit fc688be into chainer:master Dec 10, 2016
@delta2323 (Member)

Thank you for your contribution!

@delta2323 delta2323 added this to the v1.19.0 milestone Dec 10, 2016
@mottodora mottodora deleted the r_squared branch December 10, 2016 12:32