
Numerical issue with entropy regularization in generic_conditional_gradient #502

Closed
kachayev opened this issue Aug 16, 2023 · 6 comments · Fixed by #504

Comments

@kachayev
Collaborator

Describe the bug

ot.optim.generic_conditional_gradient uses the following cost function for line search (when reg2 is provided):

def cost(G):
    return nx.sum(M * G) + reg1 * f(G) + reg2 * nx.sum(G * nx.log(G))

When matrix G contains negative values, nx.log(G) returns nan with a RuntimeWarning.
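
For illustration, the warning can be reproduced in isolation (a minimal sketch with a made-up G, not a matrix actually produced by the solver):

import numpy as np

# hypothetical transport plan with one slightly negative entry,
# as can happen when the line search steps outside [0, 1]
G = np.array([[0.5, 0.5], [-1e-12, 1.0]])

# np.log of a negative value emits "RuntimeWarning: invalid value
# encountered in log" and returns nan, so the whole sum becomes nan
print(np.sum(G * np.log(G)))  # nan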

Code sample

>>> from ot.da import SinkhornL1l2Transport
>>> from ot.datasets import make_data_classif
>>> ns, nt = 50, 50
>>> Xs, ys = make_data_classif('3gauss', ns)
>>> Xt, yt = make_data_classif('3gauss2', nt)
>>> otda = SinkhornL1l2Transport()
>>> otda.fit(Xs=Xs, ys=ys, Xt=Xt)
ot/backend.py:1082: RuntimeWarning: invalid value encountered in log
  return np.log(a)
<ot.da.SinkhornL1l2Transport object at 0x10327cb50>

This example is used for test_sinkhorn_l1l2_transport_class in test/test_da.py.

Expected behavior

The question now is: are negative values expected to happen while performing line search? If yes, should we evaluate the cost as -inf in this case? Or is it okay for the cost to be nan (if this is the case, we should short-circuit the call to avoid warnings)?
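
For reference, the short-circuit could look roughly like this (a sketch only, assuming a plain NumPy backend; cost_with_guard is a hypothetical name, not what POT currently does, and the value returned for infeasible G is exactly the open question above):

import numpy as np

def cost_with_guard(G, M, reg1, reg2, f):
    # sketch of the line-search cost with a short-circuit for negative
    # entries in G, so np.log is never called on infeasible values
    if np.any(G < 0):
        # open question: return np.inf (reject the step), -np.inf, or nan?
        return np.inf
    return np.sum(M * G) + reg1 * f(G) + reg2 * np.sum(G * np.log(G))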

Environment (please complete the following information):

macOS-13.4-arm64-arm-64bit
Python 3.8.16 (default, Mar  1 2023, 21:18:45)
[Clang 14.0.6 ]
NumPy 1.24.3
SciPy 1.10.1
POT 0.9.1
@rflamary
Collaborator

G should not contain negative values by construction (it is a scaled Gaussian kernel when using Sinkhorn), so this should not occur. Could you have a look at what is happening? Maybe we get steps in the FW that are larger than 1?

@kachayev
Collaborator Author

Yeah, that's why I was confused by this. I will try to find out why exactly this happens.

@kachayev
Collaborator Author

kachayev commented Aug 17, 2023

It turns out that scalar_search_armijo suggests a negative alpha in this callback, which leads to the initial function

f(xk + alpha1 * pk)

being evaluated outside of the interpolation range (in our case, this "outside" happens to contain negative values, as would be expected because matrix G has a lot of entries pushed to almost 0).

This is somewhat unexpected behavior from scalar_search_armijo, as it has a default amin=0 parameter. Though, in fact, there's no check for the boundary here - it computes the minimizer of a quadratic interpolant, executes the callback phi, and returns if the condition is met, without checking against amin (that check only happens later in the code).

One possible solution would be to check on our side whether the alpha1 suggested by the search is outside of the min/max range (e.g. by returning an inf value from the phi callback). But I'm not sure that this is what is supposed to happen in the algorithm in the first place: the original line_search_armijo works with the projection onto the search direction given by the gradient (i.e. derphi0 = np.dot(gfk, pk)). In POT we have derphi0 = np.sum(pk * gfk), and I'm not sure that this value guarantees that the quadratic interpolation doesn't break the min/max condition. WDYT?
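
For concreteness, the guard on our side could look roughly like this (a sketch only, with hypothetical names; not the actual POT line search):

import numpy as np

def make_guarded_phi(f, xk, pk, alpha_min=0.0, alpha_max=1.0):
    # wrap the line-search objective so that steps proposed outside
    # [alpha_min, alpha_max] are rejected with an infinite value
    # instead of evaluating f at an infeasible point
    def phi(alpha):
        if alpha < alpha_min or alpha > alpha_max:
            return np.inf
        return f(xk + alpha * pk)
    return phi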

@rflamary
Collaborator

Ouch, this one is subtle. Since scalar_search_armijo does not work as expected (it does not respect amin), I would add a quick threshold outside of its call to force a positive or zero step.

Are you saying that in the function the gradient is sometimes not an ascent direction? This is possible since at each iteration we solve a Sinkhorn that might not have converged; in this case, setting a step of 0 (because numerical precision prevents us from getting a descent direction) seems reasonable.
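
For illustration, the thresholding could be done right after the call (a sketch assuming SciPy's scalar_search_armijo; clipped_armijo_step and the zero-step fallback are assumptions, not what POT currently does):

import numpy as np
try:  # the helper moved to a private module in recent SciPy versions
    from scipy.optimize._linesearch import scalar_search_armijo
except ImportError:
    from scipy.optimize.linesearch import scalar_search_armijo

def clipped_armijo_step(phi, phi0, derphi0, alpha0=0.99, amin=0.0, amax=1.0):
    # run the SciPy search, then clamp the returned step to [amin, amax];
    # if the search fails, take a zero step rather than an infeasible one
    alpha, _ = scalar_search_armijo(phi, phi0, derphi0, alpha0=alpha0, amin=amin)
    if alpha is None:
        return 0.0
    return float(np.clip(alpha, amin, amax))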

@kachayev
Collaborator Author

I can surely add an additional check to the callback function. I still don't have an intuitive feeling that this is the right solution, though. I will run more tests.

@kachayev
Collaborator Author

Are you saying that in the function the gradient is sometimes not an ascent direction?

So, this is from the debugger. I'm just evaluating phi (this one here) over a linspace between 0. and 0.99. On the first call, you can see that the cost goes down, while on the second loop it clearly goes up. Not sure why; Sinkhorn does converge in this case.

np.array([phi(x) for x in np.linspace(0., .99, 20)])
array([ 2.7194557 ,  2.29096204,  1.87171902,  1.46029814,  1.0558716 ,
        0.65791566,  0.26609441, -0.11979583, -0.4998549 , -0.87409216,
       -1.24242958, -1.60469608, -1.96061174, -2.30975788, -2.65152358,
       -2.98500709, -3.30881903, -3.62062802, -3.91580393, -4.17963043])

np.array([phi(x) for x in np.linspace(0., .99, 20)])
array([-4.17963043, -4.16778306, -4.14616826, -4.11810441, -4.08496656,
       -4.04752903, -4.00628706, -3.96157971, -3.91364772, -3.86266445,
       -3.80875372, -3.75200051, -3.69245742, -3.63014829, -3.56506962,
       -3.49718998, -3.42644711, -3.35274138, -3.27592209, -3.19574438])
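
For reference, the zero-step fallback discussed above could be expressed roughly like this (a sketch with made-up values; zero_step_if_not_descent is a hypothetical helper, not POT code):

import numpy as np

def zero_step_if_not_descent(gfk, pk):
    # directional derivative of the cost at alpha=0 along pk,
    # using the same Frobenius inner product as in POT
    derphi0 = np.sum(pk * gfk)
    # if pk is not a descent direction (e.g. because Sinkhorn has not
    # fully converged), take a zero step instead of running the search
    return 0.0 if derphi0 >= 0 else None

gfk = np.array([[1.0, -2.0], [0.5, 0.25]])    # made-up gradient
pk = np.array([[-0.1, 0.3], [-0.05, -0.02]])  # made-up search direction
print(zero_step_if_not_descent(gfk, pk))      # None -> run the line search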
