Skip to content

Changes to LP solver: #25

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 30, 2017
Merged

Changes to LP solver: #25

merged 8 commits into from
Aug 30, 2017

Conversation

toto6
Copy link
Contributor

@toto6 toto6 commented Aug 29, 2017

  • Allow to modify the maximal number of iterations
  • Display an error message in the python console if the solver encountered an issue

toto6 added 3 commits August 29, 2017 15:38
 - Allow to modify the maximal number of iterations
 - Display an error message in the python console if the solver encountered an issue
Copy link
Collaborator

@rflamary rflamary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work, this will definitely make POT easier to use for large matrices.

But there are a few changes missing. Indeed since the PR #22 recent merge, OTDA is deprecated because we wanted to have classes more compatible with scikit-learn.

You should also add this parameter to the ```init_```` .

ot/da.py Outdated
@@ -658,7 +658,7 @@ def __init__(self, metric='sqeuclidean'):
self.metric = metric
self.computed = False

def fit(self, xs, xt, ws=None, wt=None, norm=None):
def fit(self, xs, xt, ws=None, wt=None, norm=None, numItermax=10000):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value for numItermax should be 100000 because it allows out of the box to handle larger matrices. Same for emd and emd2.

@@ -15,7 +15,7 @@



def emd(a, b, M):
def emd(a, b, M, numItermax=10000):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default value to 100000


def emd2(a, b, M,processes=multiprocessing.cpu_count()):
def emd2(a, b, M, processes=multiprocessing.cpu_count(), numItermax=10000):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@Slasnista
Copy link
Contributor

Hi @aje, according to @rflamary remark about OTDA object, you can update the class EMDTransport() as follows using max_iter var name to stick to the terminology introduced in PR #22

class EMDTransport(BaseTransport):
    """Domain Adapatation OT method based on Earth Mover's Distance

    Parameters
    ----------
    mapping : string, optional (default="barycentric")
        The kind of mapping to apply to transport samples from a domain into
        another one.
        if "barycentric" only the samples used to estimate the coupling can
        be transported from a domain to another one.
    metric : string, optional (default="sqeuclidean")
        The ground metric for the Wasserstein problem
    distribution : string, optional (default="uniform")
        The kind of distribution estimation to employ
    verbose : int, optional (default=0)
        Controls the verbosity of the optimization algorithm
    log : int, optional (default=0)
        Controls the logs of the optimization algorithm
    limit_max: float, optional (default=10)
        Controls the semi supervised mode. Transport between labeled source
        and target samples of different classes will exhibit an infinite cost
        (10 times the maximum value of the cost matrix)
    max_iter : int, float, optional (default=10000)
        The minimum number of iteration before stopping the optimization
        algorithm if no it has not converged

    Attributes
    ----------
    coupling_ : array-like, shape (n_source_samples, n_target_samples)
        The optimal coupling

    References
    ----------
    .. [1] N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy,
           "Optimal Transport for Domain Adaptation," in IEEE Transactions
           on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1
    """

    def __init__(self, metric="sqeuclidean",
                 distribution_estimation=distribution_estimation_uniform,
                 out_of_sample_map='ferradans', limit_max=10, max_iter=10000):

        self.metric = metric
        self.limit_max = limit_max
        self.distribution_estimation = distribution_estimation
        self.out_of_sample_map = out_of_sample_map
        self.max_iter = max_iter

    def fit(self, Xs, ys=None, Xt=None, yt=None):
        """Build a coupling matrix from source and target sets of samples
        (Xs, ys) and (Xt, yt)

        Parameters
        ----------
        Xs : array-like, shape (n_source_samples, n_features)
            The training input samples.
        ys : array-like, shape (n_source_samples,)
            The class labels
        Xt : array-like, shape (n_target_samples, n_features)
            The training input samples.
        yt : array-like, shape (n_labeled_target_samples,)
            The class labels

        Returns
        -------
        self : object
            Returns self.
        """

        super(EMDTransport, self).fit(Xs, ys, Xt, yt)

        # coupling estimation
        self.coupling_ = emd(
            a=self.mu_s, b=self.mu_t, M=self.cost_, numItermax=self.max_iter
        )

        return self

@rflamary
Copy link
Collaborator

OK with @Slasnista , just use max_iter=100000 to allow for larger matrices out of the box.

I agree it is a better name but I want to keep consistency with the functional solvers for the moment. I think we should cleanup the function parameter names in 0.5 with deprecation warning anyway.

 - Rename numItermax to max_iter
 - Default value to 100000 instead of 10000
 - Add max_iter to class SinkhornTransport(BaseTransport)
 - Add norm to all BaseTransport
@rflamary
Copy link
Collaborator

Ok Sorry @aje if i wasn't clear but I want to keep numItermax for the emd and emd2 function as is used in the sinkhorn and sinkhorn2 for the moment. I'm Ok with max_iter in the classes though since we will switch to it in a next release.

rflamary
rflamary previously approved these changes Aug 30, 2017
@rflamary
Copy link
Collaborator

Test is failing due to small naming error. I think when this is corrected we can do a quick merge.

@rflamary rflamary dismissed their stale review August 30, 2017 08:21

too quick to valid

Copy link
Collaborator

@rflamary rflamary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @aje

Last comments about the normalization that is a nice touch but that should be handled slightly differently IMO.

ot/da.py Outdated
@@ -1182,6 +1183,26 @@ def inverse_transform(self, Xs=None, ys=None, Xt=None, yt=None,

return transp_Xt

def normalizeCost_(self, norm):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the normalization part but think this function should be independent from the class since it is of potential interest to any user.

I would define this function in utils.py with proper documentation. Then you can refer to this function in the doc of the classe

def cost_normalization(C,norm=None):

    if norm.lower() == "median":
        C/=float(np.median(C))
    elif norm.lower() == "max":
        C/=float(np.max(C))
    elif norm.lower() == "log":
        C=np.log(1+C)
    elif norm.lower() == "loglog":
        C=np.log(1+np.log(1+C))

    return C

ot/da.py Outdated
@@ -1001,6 +1001,7 @@ def fit(self, Xs=None, ys=None, Xt=None, yt=None):

# pairwise distance
self.cost_ = dist(Xs, Xt, metric=self.metric)
self.normalizeCost_(self.norm)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use an extern function as discussed below

ot/da.py Outdated
@@ -845,7 +825,7 @@ def fit(self, xs, ys, xt, reg=1, eta=1, ws=None, wt=None, norm=None,
self.wt = wt

self.M = dist(xs, xt, metric=self.metric)
self.normalizeM(norm)
self.M = cost_normalization(self.M, norm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @aje : can you handle the case where one does not want to normalise the cost matrix ?
  • should we do that with a if in the .fit or in the cost_normalization function ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you give it None is is by default the function will already return C.

Seems OK to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add the check

if self.norm is not None:

but this also seems ok to me

@Slasnista
Copy link
Contributor

ok, could you add the parameter norm in the __init__s of the different OTDA classes (except BaseTransport() ) and put None as default value ?

@Slasnista
Copy link
Contributor

or another default value if one normalization appears better that None ;)

@rflamary
Copy link
Collaborator

@aje I agree with @Slasnista , the norm parameter should be defined in the __init__ for the new classes. I didn't see that it was defined in fit.

@Slasnista Let's keep it at None for the moment since there is not rule aboiut the best one and it has sometimes weird effect with regularized OT.

Copy link
Collaborator

@rflamary rflamary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with these changes.

@rflamary
Copy link
Collaborator

@Slasnista could you do a second quick code review before we merge the PR?

@Slasnista
Copy link
Contributor

Ok from my side, everything seems good

@rflamary rflamary merged commit 1669704 into PythonOT:master Aug 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants