
Feature/cluster loss #888

Merged
merged 21 commits into from Aug 7, 2020

Conversation

julia-shenshina
Contributor

@julia-shenshina commented Jul 19, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contribution guide?
  • Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
  • Did you make sure to update the docs? We use Google format for all the methods and classes.
  • Did you check the docs with make check-docs?
  • Did you write any new necessary tests?
  • Did you add your new functionality to the docs?
  • Did you update the CHANGELOG?
  • You can use 'Login as guest' to see Teamcity build logs.

Description

Related Issue

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

return self._batch_hard_cluster_loss(embeddings, targets)


__all__ = ["ClusterLoss"]
Member

could you please also add it to catalyst/contrib/nn/criterion/__init__.py?

Member

Returns:
torch.Tensor: cluster loss for the batch
"""
return self._batch_hard_cluster_loss(embeddings, targets)
Member

does it work with targets of shape [bs; 1]?

"""
Cluster loss
.. _Cluster Loss for Person Re-Identification
https://arxiv.org/pdf/1812.10325.pdf
Member

is it possible to add a minimal example for this loss?
in the long term it makes the losses much easier to understand

something like

embeddings = torch.rand(size=(batch_size, embedding_dim))
targets = torch.randint(0, num_classes, size=(batch_size,))
criterion = ClusterLoss()
loss = criterion(embeddings, targets)

Member

it would also be nice to see some tests for this criterion, like
https://github.com/catalyst-team/catalyst/blob/master/catalyst/utils/metrics/tests/test_iou.py
to ensure its correctness :)

@Scitator requested a review from AlekseySh July 27, 2020 06:53
raise NotImplementedError()

@abstractmethod
def sample(self, features: Tensor, labels: List[int]) -> TTriplets:
Member

labels should also be a Tensor, shouldn't it? with shape [bs; ]?

Member

just suggesting to add this to the docs :)

distances = sampler._count_inter_class_distances( # noqa: WPS437
mean_vectors
)
print(distances)
Member

do we need print here?

from torch import int as tint, long, short, Tensor


def prepare_labels(labels: Union[Tensor, List[int]]) -> List[int]:
Member

could we rename it to process_labels? (we have some internal notation guide :) )

Contributor

@AlekseySh left a comment


Looks great!

Please,

  1. update LossWithTripletsSampling:

    • update typing of init
    • use process_labels instead of prepare_labels, because we now have duplicated code
  2. add the test we discussed in Slack: set(hard_tri()._sample()) == set(hard_clusters.sample())

from torch import int as tint, long, short, Tensor


def process_labels(labels: Union[Tensor, List[int]]) -> List[int]:
Contributor

process_integers sounds more general

Member

since this is a function that operates on labels, a name like something_labels sounds great
btw, could we name it convert_labels2list? because it looks like that's exactly what this function is doing :)
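For context, a helper with that suggested name might look like the following sketch. This is hypothetical, not the PR's actual implementation; in particular the [bs; 1] handling and the duck-typed `.tolist()` check are assumptions:

```python
from typing import List


def convert_labels2list(labels) -> List[int]:
    """Hypothetical sketch: normalize batch labels to a plain list of ints.

    Accepts a 1d sequence, a [bs; 1]-shaped nested sequence,
    or anything tensor-like exposing ``.tolist()`` (e.g. torch.Tensor).
    """
    if hasattr(labels, "tolist"):  # torch.Tensor / np.ndarray
        labels = labels.tolist()
    # flatten a [bs; 1] column into [bs]
    if labels and isinstance(labels[0], list):
        labels = [row[0] for row in labels]
    return [int(label) for label in labels]
```

With this shape, both `convert_labels2list([2, 3, 3])` and a [bs; 1] tensor's nested-list form would come out as flat int lists.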

labels: labels of the batch, list or tensor of size (batch_size,)

Returns:
triplet of (mean_vector, positive, negative_mean_vector)
Contributor

Please add info that the size of the output is p

Comment on lines 361 to 364
This method selects the hardest positive and negative example for
each label in the batch. It counts mean vectors for all the labels
and choose the hardest positive sample from the batch and the hardest
mean vector for it.
Contributor

Compared with the docs at the top of the current class, this one looks confusing / ambiguous.
I suggest keeping only the 1st explanation from the top of the class; it is really great.

    This sampler selects hardest triplets based on distance to mean vectors:
    anchor is a mean vector of features of i-th class in the batch,
    the hardest positive sample is the most distant from anchor sample of
    anchor's class, the hardest negative sample is the closest mean vector
    of another classes.
    The batch must contain k samples for p classes in it (k > 1, p > 1).

cc
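As a hypothetical illustration of the sampling strategy quoted above (mean-vector anchors, farthest same-class sample as positive, closest other-class mean as negative) — not the PR's actual code, and all names here are invented:

```python
import torch


def hard_cluster_triplets(features, labels):
    """Hypothetical sketch of the hard-cluster sampling described above."""
    classes = sorted(set(labels))
    labels_t = torch.tensor(labels)
    # anchor for each class: mean vector of its features in the batch
    means = torch.stack([features[labels_t == c].mean(dim=0) for c in classes])
    triplets = []
    for i in range(len(classes)):
        cls_feats = features[labels_t == classes[i]]
        # hardest positive: the sample of this class farthest from its mean
        pos = cls_feats[torch.cdist(means[i : i + 1], cls_feats).argmax()]
        # hardest negative: the closest mean vector of another class
        dists = torch.cdist(means[i : i + 1], means).squeeze(0)
        dists[i] = float("inf")  # exclude the class's own mean
        neg = means[dists.argmin()]
        triplets.append((means[i], pos, neg))
    return triplets
```

The batch is assumed to contain k > 1 samples for each of p > 1 classes, matching the docstring's constraint.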

Contributor

You can put this 1st piece of documentation here (in the sample method) or at the top of the class.

@@ -10,15 +10,56 @@
import torch
from torch import Tensor

from catalyst.contrib.nn.criterion.functional import euclidean_distance
Contributor

you can also use torch.pdist()
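For reference, torch.pdist returns the condensed vector of pairwise Euclidean distances, which could replace a hand-written inter-class distance loop; the toy values below are purely illustrative:

```python
import torch

# p = 3 class-mean vectors of dimension 2 (toy values for illustration)
mean_vectors = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# condensed pairwise distances in row order (0,1), (0,2), (1,2);
# output shape is [p * (p - 1) / 2]
distances = torch.pdist(mean_vectors)
```

For these vectors the result is [5.0, 10.0, 5.0], the upper triangle of the full distance matrix flattened.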

@mergify[bot] dismissed AlekseySh’s stale review August 6, 2020 19:43

Pull request has been modified.

@Scitator merged commit e2d089d into catalyst-team:master Aug 7, 2020
4 participants