Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: use scipy.special.xlogy for entr and kl_div #53

Merged
merged 1 commit into from
Feb 21, 2014

Conversation

argriffing
Copy link

This should fix the edge cases of the domain of entr and kl_div so that it allows zero in some places that were not allowed before this PR, and so that it improves numerical stability near zero. It still needs unit tests and it needs to return the correct values (-inf or inf or whatever) outside the domain but I hope these could be added by someone else after this PR is merged.

@SteveDiamond
Copy link
Collaborator

This is great! I was just looking into this. Unfortunately getting the domain edge cases to work for variable arguments is much more difficult.

SteveDiamond pushed a commit that referenced this pull request Feb 21, 2014
MAINT: use scipy.special.xlogy for entr and kl_div
@SteveDiamond SteveDiamond merged commit 6441ee4 into cvxpy:master Feb 21, 2014
@argriffing
Copy link
Author

By the way, I'm curious why kl_div is defined like x log(x/y) - x + y instead of the simplerx log(x/y)? I'm sure there is some reason that I don't know. Both are convex, and both have the same sum over vectors defining finite distributions (the kl-divergence between the distributions). Maybe the reason is that it is positive on its domain except when x==y where it is zero? But sign_from_args is listed as unknown and http://dcp.stanford.edu/functions shows it as unknown also, and this sign property is not documented in the cvx user guide... I don't get it.

@argriffing
Copy link
Author

Here's a scipy PR that would fix more edge cases (the merged cvxpy PR fixes pretty much only the x=0 case).

@SteveDiamond
Copy link
Collaborator

Honestly, I'm not sure why it's defined as x log(x/y) - x + y instead of x log(x/y). I copied it from CVX (http://web.cvxr.com/cvx/doc/funcref.html). Your explanation seems as good as any.

I hadn't noticed that it was non-negative. I'll have to update dcp.stanford.edu.

Thanks for submitting the scipy pull request.

@caglarari
Copy link

I think kl divergence (x log x/y -x +y ) is more general than relative entropy (x log x/y).where the vectors x,y do not need to sum to 1.

@argriffing
Copy link
Author

These functions (entr, kl_div, and rel_entr) are now merged into scipy.special of the scipy development branch https://github.com/scipy/scipy/blob/master/scipy/special/basic.py#L1411 and should become available in scipy 0.15. They should be defined exactly the same as the similarly named cvx functions including corner cases (zeros, infinities, etc.).

Edit: actually according to my notes in the scipy pull request, some corner cases still do not follow cvx conventions, maybe some knownfailure tests should be added...

@argriffing
Copy link
Author

Do you guys have any opinion about how out-of-domain inputs should be handled? By out-of-domain I mean violating the domain restrictions in http://dcp.stanford.edu/functions. A long time ago I browsed a couple Boyd lectures on youtube and my fading residual memory is that the functions should be extended to preserve interesting properties like convexity so that for example entr(-1) should be -inf, but I don't have a nice reference. Right now entr(-1) is implemented in scipy as -xlogy(-1, -1) which happens to return nan.

@SteveDiamond
Copy link
Collaborator

The general idea is convex functions have value +inf outside of their
domain and concave functions have value -inf. That way they are
convex/concave over all of R^n.

Here's how entr, kl_div, and rel_entr work:

entr(x) = 0 for x=0 and -inf for x < 0
kl_div(x,y) = 0 for x=y=0 and +inf if x < 0 or y < 0
rel_entr is the same as kl_div

On Wed, Sep 3, 2014 at 11:03 PM, argriffing notifications@github.com
wrote:

Do you guys have any opinion about how out-of-domain inputs should be
handled? By out-of-domain I mean violating the domain restrictions in
http://dcp.stanford.edu/functions. A long time ago I browsed a couple
Boyd lectures on youtube and my fading residual memory is that the
functions should be extended to preserve interesting properties like
convexity so that for example entr(-1) should be -inf, but I don't have a
nice reference. Right now entr(-1) is implemented in scipy as -xlogy(-1,
-1) which happens to return nan.


Reply to this email directly or view it on GitHub
https://github.com/cvxgrp/cvxpy/pull/53#issuecomment-54366786.

@argriffing
Copy link
Author

@SteveDiamond yes that's what I expected as in https://github.com/cvxgrp/cvxpy/issues/131 but to be super-picky it leaves out some cases. For example what about kl_div(x=0, y>0)? I guess this should be 0? What about kl_div(x>0, y=0)? It should be +inf? I'd like to get these exactly correct before they are frozen for backwards compatibility.

@SteveDiamond
Copy link
Collaborator

Ah, I see. Here are the function definitions in CVX. I would go with these.

             { -X.*LOG(X) if X > 0,

entr(X) = { 0 if X == 0,
{ -Inf otherwise.

                  { X.*LOG(X./Y)-X+Y if X >  0 & Y >  0,

kl_div(X,Y) = { 0 if X == 0 & Y >= 0,
{ +Inf otherwise.

                     { X.*LOG(X./Y) if X >  0 & Y >  0

rel_entr(X,Y) = { 0 if X == 0 & Y >= 0,
{ +Inf otherwise.

On Thu, Sep 4, 2014 at 7:20 PM, argriffing notifications@github.com wrote:

@SteveDiamond https://github.com/SteveDiamond yes that's what I
expected as in #131 https://github.com/cvxgrp/cvxpy/issues/131 but to
be super-picky it leaves out some cases. For example what about kl_div(x=0,
y>0)? I guess this should be 0? What about kl_div(x>0, y=0)? It should be
+inf? I'd like to get these exactly correct before they are frozen for
backwards compatibility.


Reply to this email directly or view it on GitHub
https://github.com/cvxgrp/cvxpy/pull/53#issuecomment-54512832.

@argriffing
Copy link
Author

@SteveDiamond sorry to bother you but what about (+inf, +inf)? It would be nan if I evaluate it naively using the definition above, but maybe this is OK? Or should it also be +inf? Is it the same for both kl_div and rel_entr?

@argriffing
Copy link
Author

Besides (+inf, +inf) there are also easier cases like kl_div(+inf, 0<y<+inf) which I guess should be +inf using a straightforward 1d limit but which also naively evaluates to nan using the formula above.

@argriffing
Copy link
Author

Ugh, sorry to multi-post but the kl_div limit x=0, y>0 seems like it should return y instead of 0, so my old scipy tests for these functions are failing. Is it really 0 for this case?

@SteveDiamond
Copy link
Collaborator

You're right, kl_div(0, y>0) should be y. The CVX description had a typo
apparently.

As for the +/-inf arguments, extended value functions have domain R^n and
range R U {+inf, -inf}. So they aren't defined on the arguments +inf or
-inf. That suggests returning NaN, though I don't really like that choice.
I'll get back to you soon with my suggestions for these cases.

On Wed, Sep 10, 2014 at 6:08 AM, argriffing notifications@github.com
wrote:

Ugh, sorry to multi-post but the kl_div limit x=0, y>0 seems like it
should return y instead of 0, so my old scipy tests for these functions are
failing. Is it really 0 for this case?


Reply to this email directly or view it on GitHub
https://github.com/cvxgrp/cvxpy/pull/53#issuecomment-55068868.

@argriffing
Copy link
Author

@SteveDiamond thanks, I'm trying another scipy PR scipy/scipy#3981 to add these as ufuncs with the edge cases handled carefully.

@SteveDiamond
Copy link
Collaborator

I asked Stephen Boyd what the value of the functions should be when one or more of the arguments are +/-infty. He wasn't sure. So whatever makes the most sense to you is fine. In particular the definition in scipy/scipy#3981 looks good.

@argriffing
Copy link
Author

@SteveDiamond thanks!

@argriffing
Copy link
Author

In particular the definition in scipy/scipy#3981 looks good.

That PR is now merged, including entr, kl_div, rel_entr, huber, and pseudo_huber. The first three are intended to be exactly as in cvx. huber uses the Wikipedia notation, with an extra degree of freedom and I think it's off by a factor of 2 from cvx. pseudo_huber is a smoother variant of huber which as far as I know is not used in cvxpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants