
Feature/cluster loss #888

Merged
merged 21 commits into from Aug 7, 2020

Conversation

julia-shenshina
Contributor

@julia-shenshina commented Jul 19, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contribution guide?
  • Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
  • Did you make sure to update the docs? We use Google format for all the methods and classes.
  • Did you check the docs with make check-docs?
  • Did you write any new necessary tests?
  • Did you add your new functionality to the docs?
  • Did you update the CHANGELOG?
  • You can use 'Login as guest' to see Teamcity build logs.

Description

Related Issue

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

return self._batch_hard_cluster_loss(embeddings, targets)


__all__ = ["ClusterLoss"]
Member

could you please also add it to catalyst/contrib/nn/criterion/__init__.py?

Member

Returns:
torch.Tensor: cluster loss for the batch
"""
return self._batch_hard_cluster_loss(embeddings, targets)
Member

does it work with targets of shape [bs; 1]?

"""
Cluster loss
.. _Cluster Loss for Person Re-Identification
https://arxiv.org/pdf/1812.10325.pdf
Member

is it possible to add a minimal example for this loss?
in the long term it makes the losses much easier to understand

something like

embeddings = torch.rand(size=(batch_size, embedding_dim))
targets = torch.randint(0, num_classes, size=(batch_size,))
criterion = ClusterLoss()
loss = criterion(embeddings, targets)

Member

it would also be nice to see some tests for this criterion, like
https://github.com/catalyst-team/catalyst/blob/master/catalyst/utils/metrics/tests/test_iou.py
to ensure its correctness :)

@Scitator requested a review from AlekseySh July 27, 2020 06:53
raise NotImplementedError()

@abstractmethod
def sample(self, features: Tensor, labels: List[int]) -> TTriplets:
Member

labels should also be a Tensor, shouldn't it? with shape [bs; ]?

Member

just suggesting to add this to the docs :)

distances = sampler._count_inter_class_distances( # noqa: WPS437
mean_vectors
)
print(distances)
Member

do we need print here?

from torch import int as tint, long, short, Tensor


def prepare_labels(labels: Union[Tensor, List[int]]) -> List[int]:
Member

could we rename it to process_labels? (we have some internal notation guide :) )

Contributor

@AlekseySh left a comment


Looks great!

Please,

  1. update LossWithTripletsSampling:

    • update typing of init
    • use process_labels instead of prepare_labels, because we now have duplicated code
  2. add the test we discussed in Slack: set(hard_tri()._sample()) == set(hard_clusters.sample())

from torch import int as tint, long, short, Tensor


def process_labels(labels: Union[Tensor, List[int]]) -> List[int]:
Contributor

process_integers sounds more general

Member

since this is a function that operates on labels, a name like something_labels sounds great
btw, could we name it convert_labels2list? because it looks like that's exactly what this function is doing :)
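For context, a helper with that suggested name might look like the following sketch. This is hypothetical, not the PR's actual implementation; in particular the [bs; 1] handling and the duck-typed `.tolist()` check are assumptions:

```python
from typing import List


def convert_labels2list(labels) -> List[int]:
    """Hypothetical sketch: normalize batch labels to a plain list of ints.

    Accepts a 1d sequence, a [bs; 1]-shaped nested sequence,
    or anything tensor-like exposing ``.tolist()`` (e.g. torch.Tensor).
    """
    if hasattr(labels, "tolist"):  # torch.Tensor / np.ndarray
        labels = labels.tolist()
    # flatten a [bs; 1] column into [bs]
    if labels and isinstance(labels[0], list):
        labels = [row[0] for row in labels]
    return [int(label) for label in labels]
```

With this shape, both `convert_labels2list([2, 3, 3])` and a [bs; 1] tensor's nested-list form would come out as flat int lists.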

labels: labels of the batch, list or tensor of size (batch_size,)

Returns:
triplet of (mean_vector, positive, negative_mean_vector)
Contributor

Please add info that the size of the output is p

Comment on lines 361 to 364
This method selects the hardest positive and negative example for
each label in the batch. It counts mean vectors for all the labels
and choose the hardest positive sample from the batch and the hardest
mean vector for it.
Contributor

Compared with the docs at the top of the current class, this one looks confusing / ambiguous.
I suggest keeping only the 1st explanation from the top of the class; it is really great.

    This sampler selects hardest triplets based on distance to mean vectors:
    anchor is a mean vector of features of i-th class in the batch,
    the hardest positive sample is the most distant from anchor sample of
    anchor's class, the hardest negative sample is the closest mean vector
    of another classes.
    The batch must contain k samples for p classes in it (k > 1, p > 1).

cc
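As a hypothetical illustration of the sampling strategy quoted above (mean-vector anchors, farthest same-class sample as positive, closest other-class mean as negative) — not the PR's actual code, and all names here are invented:

```python
import torch


def hard_cluster_triplets(features, labels):
    """Hypothetical sketch of the hard-cluster sampling described above."""
    classes = sorted(set(labels))
    labels_t = torch.tensor(labels)
    # anchor for each class: mean vector of its features in the batch
    means = torch.stack([features[labels_t == c].mean(dim=0) for c in classes])
    triplets = []
    for i in range(len(classes)):
        cls_feats = features[labels_t == classes[i]]
        # hardest positive: the sample of this class farthest from its mean
        pos = cls_feats[torch.cdist(means[i : i + 1], cls_feats).argmax()]
        # hardest negative: the closest mean vector of another class
        dists = torch.cdist(means[i : i + 1], means).squeeze(0)
        dists[i] = float("inf")  # exclude the class's own mean
        neg = means[dists.argmin()]
        triplets.append((means[i], pos, neg))
    return triplets
```

The batch is assumed to contain k > 1 samples for each of p > 1 classes, matching the docstring's constraint.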

Contributor

You can put this 1st piece of documentation here (in the sample method) or at the top of the class.

@@ -10,15 +10,56 @@
import torch
from torch import Tensor

from catalyst.contrib.nn.criterion.functional import euclidean_distance
Contributor

you can also use torch.pdist()
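For reference, torch.pdist returns the condensed vector of pairwise Euclidean distances, which could replace a hand-written inter-class distance loop; the toy values below are purely illustrative:

```python
import torch

# p = 3 class-mean vectors of dimension 2 (toy values for illustration)
mean_vectors = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# condensed pairwise distances in row order (0,1), (0,2), (1,2);
# output shape is [p * (p - 1) / 2]
distances = torch.pdist(mean_vectors)
```

For these vectors the result is [5.0, 10.0, 5.0], the upper triangle of the full distance matrix flattened.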

@mergify[bot] dismissed AlekseySh’s stale review August 6, 2020 19:43

Pull request has been modified.

@Scitator merged commit e2d089d into catalyst-team:master Aug 7, 2020
4 participants