Changes to LP solver: #25

toto6 · 2017-08-29T13:39:03Z

Allow to modify the maximal number of iterations
Display an error message in the python console if the solver encountered an issue

- Allow to modify the maximal number of iterations - Display an error message in the python console if the solver encountered an issue

rflamary

Thank you for your work, this will definitely make POT easier to use for large matrices.

But there are a few changes missing. Indeed since the PR #22 recent merge, OTDA is deprecated because we wanted to have classes more compatible with scikit-learn.

You should also add this parameter to the ```init_```` .

rflamary · 2017-08-30T06:42:24Z

ot/da.py

@@ -658,7 +658,7 @@ def __init__(self, metric='sqeuclidean'):
        self.metric = metric
        self.computed = False

-    def fit(self, xs, xt, ws=None, wt=None, norm=None):
+    def fit(self, xs, xt, ws=None, wt=None, norm=None, numItermax=10000):


the default value for numItermax should be 100000 because it allows out of the box to handle larger matrices. Same for emd and emd2.

rflamary · 2017-08-30T06:43:21Z

ot/lp/__init__.py

@@ -15,7 +15,7 @@



-def emd(a, b, M):
+def emd(a, b, M, numItermax=10000):


default value to 100000

rflamary · 2017-08-30T06:43:30Z

ot/lp/__init__.py


-def emd2(a, b, M,processes=multiprocessing.cpu_count()):
+def emd2(a, b, M, processes=multiprocessing.cpu_count(), numItermax=10000):


Slasnista · 2017-08-30T07:33:36Z

Hi @aje, according to @rflamary remark about OTDA object, you can update the class EMDTransport() as follows using max_iter var name to stick to the terminology introduced in PR #22

class EMDTransport(BaseTransport):
    """Domain Adapatation OT method based on Earth Mover's Distance

    Parameters
    ----------
    mapping : string, optional (default="barycentric")
        The kind of mapping to apply to transport samples from a domain into
        another one.
        if "barycentric" only the samples used to estimate the coupling can
        be transported from a domain to another one.
    metric : string, optional (default="sqeuclidean")
        The ground metric for the Wasserstein problem
    distribution : string, optional (default="uniform")
        The kind of distribution estimation to employ
    verbose : int, optional (default=0)
        Controls the verbosity of the optimization algorithm
    log : int, optional (default=0)
        Controls the logs of the optimization algorithm
    limit_max: float, optional (default=10)
        Controls the semi supervised mode. Transport between labeled source
        and target samples of different classes will exhibit an infinite cost
        (10 times the maximum value of the cost matrix)
    max_iter : int, float, optional (default=10000)
        The minimum number of iteration before stopping the optimization
        algorithm if no it has not converged

    Attributes
    ----------
    coupling_ : array-like, shape (n_source_samples, n_target_samples)
        The optimal coupling

    References
    ----------
    .. [1] N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy,
           "Optimal Transport for Domain Adaptation," in IEEE Transactions
           on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1
    """

    def __init__(self, metric="sqeuclidean",
                 distribution_estimation=distribution_estimation_uniform,
                 out_of_sample_map='ferradans', limit_max=10, max_iter=10000):

        self.metric = metric
        self.limit_max = limit_max
        self.distribution_estimation = distribution_estimation
        self.out_of_sample_map = out_of_sample_map
        self.max_iter = max_iter

    def fit(self, Xs, ys=None, Xt=None, yt=None):
        """Build a coupling matrix from source and target sets of samples
        (Xs, ys) and (Xt, yt)

        Parameters
        ----------
        Xs : array-like, shape (n_source_samples, n_features)
            The training input samples.
        ys : array-like, shape (n_source_samples,)
            The class labels
        Xt : array-like, shape (n_target_samples, n_features)
            The training input samples.
        yt : array-like, shape (n_labeled_target_samples,)
            The class labels

        Returns
        -------
        self : object
            Returns self.
        """

        super(EMDTransport, self).fit(Xs, ys, Xt, yt)

        # coupling estimation
        self.coupling_ = emd(
            a=self.mu_s, b=self.mu_t, M=self.cost_, numItermax=self.max_iter
        )

        return self

rflamary · 2017-08-30T07:38:01Z

OK with @Slasnista , just use max_iter=100000 to allow for larger matrices out of the box.

I agree it is a better name but I want to keep consistency with the functional solvers for the moment. I think we should cleanup the function parameter names in 0.5 with deprecation warning anyway.

- Rename numItermax to max_iter - Default value to 100000 instead of 10000 - Add max_iter to class SinkhornTransport(BaseTransport) - Add norm to all BaseTransport

rflamary · 2017-08-30T08:01:18Z

Ok Sorry @aje if i wasn't clear but I want to keep numItermax for the emd and emd2 function as is used in the sinkhorn and sinkhorn2 for the moment. I'm Ok with max_iter in the classes though since we will switch to it in a next release.

rflamary · 2017-08-30T08:21:24Z

Test is failing due to small naming error. I think when this is corrected we can do a quick merge.

too quick to valid

rflamary

Thank you @aje

Last comments about the normalization that is a nice touch but that should be handled slightly differently IMO.

rflamary · 2017-08-30T08:31:15Z

ot/da.py

@@ -1182,6 +1183,26 @@ def inverse_transform(self, Xs=None, ys=None, Xt=None, yt=None,

            return transp_Xt

+    def normalizeCost_(self, norm):


I like the normalization part but think this function should be independent from the class since it is of potential interest to any user.

I would define this function in utils.py with proper documentation. Then you can refer to this function in the doc of the classe

def cost_normalization(C,norm=None): if norm.lower() == "median": C/=float(np.median(C)) elif norm.lower() == "max": C/=float(np.max(C)) elif norm.lower() == "log": C=np.log(1+C) elif norm.lower() == "loglog": C=np.log(1+np.log(1+C)) return C

rflamary · 2017-08-30T08:31:57Z

ot/da.py

@@ -1001,6 +1001,7 @@ def fit(self, Xs=None, ys=None, Xt=None, yt=None):

            # pairwise distance
            self.cost_ = dist(Xs, Xt, metric=self.metric)
+            self.normalizeCost_(self.norm)


use an extern function as discussed below

Slasnista · 2017-08-30T09:05:42Z

ot/da.py

@@ -845,7 +825,7 @@ def fit(self, xs, ys, xt, reg=1, eta=1, ws=None, wt=None, norm=None,
        self.wt = wt

        self.M = dist(xs, xt, metric=self.metric)
-        self.normalizeM(norm)
+        self.M = cost_normalization(self.M, norm)


@aje : can you handle the case where one does not want to normalise the cost matrix ?

should we do that with a if in the .fit or in the cost_normalization function ?

If you give it None is is by default the function will already return C.

Seems OK to me.

I could add the check

if self.norm is not None:

but this also seems ok to me

Slasnista · 2017-08-30T09:11:04Z

ok, could you add the parameter norm in the __init__s of the different OTDA classes (except BaseTransport() ) and put None as default value ?

Slasnista · 2017-08-30T09:12:13Z

or another default value if one normalization appears better that None ;)

rflamary · 2017-08-30T09:13:43Z

@aje I agree with @Slasnista , the norm parameter should be defined in the __init__ for the new classes. I didn't see that it was defined in fit.

@Slasnista Let's keep it at None for the moment since there is not rule aboiut the best one and it has sometimes weird effect with regularized OT.

rflamary

I'm OK with these changes.

rflamary · 2017-08-30T09:44:17Z

@Slasnista could you do a second quick code review before we merge the PR?

Slasnista · 2017-08-30T13:35:56Z

Ok from my side, everything seems good

toto6 added 3 commits August 29, 2017 15:38

Changes to LP solver:

3cecc18

- Allow to modify the maximal number of iterations - Display an error message in the python console if the solver encountered an issue

Fix param order

99c0a24

Type print

308ce24

rflamary requested changes Aug 30, 2017

View reviewed changes

rflamary added bug enhancement labels Aug 30, 2017

Changes:

982f36c

- Rename numItermax to max_iter - Default value to 100000 instead of 10000 - Add max_iter to class SinkhornTransport(BaseTransport) - Add norm to all BaseTransport

Rename for emd and emd2

feef989

rflamary previously approved these changes Aug 30, 2017

View reviewed changes

Fix name error

5bbea9c

rflamary reviewed Aug 30, 2017

View reviewed changes

Move normalize function in utils.py

0316d55

Slasnista reviewed Aug 30, 2017

View reviewed changes

Move norm out of fit to init for deprecated OTDA

fadaf2a

rflamary approved these changes Aug 30, 2017

View reviewed changes

rflamary merged commit 1669704 into PythonOT:master Aug 30, 2017

		@@ -15,7 +15,7 @@



		def emd(a, b, M):
		def emd(a, b, M, numItermax=10000):


		def emd2(a, b, M,processes=multiprocessing.cpu_count()):
		def emd2(a, b, M, processes=multiprocessing.cpu_count(), numItermax=10000):

		@@ -1182,6 +1183,26 @@ def inverse_transform(self, Xs=None, ys=None, Xt=None, yt=None,

		return transp_Xt

		def normalizeCost_(self, norm):

Changes to LP solver: #25

Changes to LP solver: #25

Uh oh!

Conversation

toto6 commented Aug 29, 2017

Uh oh!

rflamary left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Slasnista commented Aug 30, 2017

Uh oh!

rflamary commented Aug 30, 2017

Uh oh!

rflamary commented Aug 30, 2017

Uh oh!

rflamary commented Aug 30, 2017

Uh oh!

rflamary left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Slasnista commented Aug 30, 2017

Uh oh!

Slasnista commented Aug 30, 2017

Uh oh!

rflamary commented Aug 30, 2017

Uh oh!

rflamary left a comment

Choose a reason for hiding this comment

Uh oh!

rflamary commented Aug 30, 2017

Uh oh!

Slasnista commented Aug 30, 2017

Uh oh!

Uh oh!