Feature/cluster loss #888
Conversation
return self._batch_hard_cluster_loss(embeddings, targets)

__all__ = ["ClusterLoss"]
could you please also add it to catalyst/contrib/nn/criterion/__init__.py?
and to our docs here, https://github.com/catalyst-team/catalyst/blob/master/docs/api/contrib.rst#criterion
Returns:
    torch.Tensor: cluster loss for the batch
"""
return self._batch_hard_cluster_loss(embeddings, targets)
does it work with targets of shape [bs; 1]?
""" | ||
Cluster loss | ||
.. _Cluster Loss for Person Re-Identification | ||
https://arxiv.org/pdf/1812.10325.pdf |
is it possible to add a minimal example for this loss? in the long term it's much easier to understand them. something like:
embeddings = torch.rand(bs, features_dim)
targets = torch.randint(0, num_classes, (bs,))
criterion = ClusterLoss()
loss = criterion(embeddings, targets)
it would also be nice to see some kind of tests for this criterion, like
https://github.com/catalyst-team/catalyst/blob/master/catalyst/utils/metrics/tests/test_iou.py
to ensure its correctness :)
catalyst/data/sampler_inbatch.py
Outdated
raise NotImplementedError()

@abstractmethod
def sample(self, features: Tensor, labels: List[int]) -> TTriplets:
labels should also be a Tensor, shouldn't it? with shape of [bs; ]?
just suggesting to add this to the docs :)
@julia-shenshina ping :)
distances = sampler._count_inter_class_distances(  # noqa: WPS437
    mean_vectors
)
print(distances)
do we need print here?
catalyst/data/sampler_inbatch.py
Outdated
raise NotImplementedError()

@abstractmethod
def sample(self, features: Tensor, labels: List[int]) -> TTriplets:
@julia-shenshina ping :)
catalyst/data/utils.py
Outdated
from torch import int as tint, long, short, Tensor


def prepare_labels(labels: Union[Tensor, List[int]]) -> List[int]:
could we rename it to process_labels? (we have some internal notation guide :) )
Looks great!
Please,
- update LossWithTripletsSampling:
  - update typing of __init__
  - use process_labels instead of prepare_labels, because now we have duplicated code
- add test which we have discussed in slack:
  set(hard_tri()._sample()) == set(hard_clusters.sample())
catalyst/data/utils.py
Outdated
from torch import int as tint, long, short, Tensor


def process_labels(labels: Union[Tensor, List[int]]) -> List[int]:
process_integers sounds more general
as far as this is a function on top of labels, a name like something_labels sounds great.
btw, could we name it convert_labels2list? because it looks like it's exactly what this function is doing :)
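To make the naming discussion concrete, here is a minimal sketch of what such a helper might look like; the function name follows the suggestion above, but the exact dtype/shape checks are assumptions (the sketch accepts a list of ints, an int tensor of shape [bs], or one of shape [bs; 1]):

```python
from typing import List, Union

import torch
from torch import Tensor


def convert_labels2list(labels: Union[Tensor, List[int]]) -> List[int]:
    """Convert batch labels to a plain list of ints (sketch only).

    Assumed inputs: a list of ints, an integer tensor of shape [bs],
    or an integer tensor of shape [bs; 1].
    """
    if isinstance(labels, Tensor):
        if labels.ndim == 2 and labels.shape[1] == 1:
            labels = labels.squeeze(1)  # [bs; 1] -> [bs]
        assert labels.ndim == 1, "expected labels of shape [bs] or [bs; 1]"
        assert labels.dtype in (torch.short, torch.int, torch.long)
        return labels.tolist()
    return list(labels)
```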
catalyst/data/sampler_inbatch.py
Outdated
labels: labels of the batch, list or tensor of size (batch_size,)

Returns:
    triplet of (mean_vector, positive, negative_mean_vector)
Please, add info that the size of the output is p
catalyst/data/sampler_inbatch.py
Outdated
This method selects the hardest positive and negative example for
each label in the batch. It counts mean vectors for all the labels
and choose the hardest positive sample from the batch and the hardest
mean vector for it.
In comparison with the docs at the top of the current class, this one looks confusing / ambiguous.
I suggest keeping only the 1st explanation from the top of the class, it is really great:
This sampler selects hardest triplets based on distance to mean vectors:
anchor is a mean vector of features of i-th class in the batch,
the hardest positive sample is the most distant from anchor sample of
anchor's class, the hardest negative sample is the closest mean vector
of another classes.
The batch must contain k samples for p classes in it (k > 1, p > 1).
cc
You can put this 1st piece of documentation here (in the sample method) or at the top of the class.
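For readers of the review: the sampling described in the quoted docstring can be sketched in a few lines. This is a hedged illustration, not the PR's implementation; the function name and signature are hypothetical:

```python
import torch
from torch import Tensor


def sample_hard_cluster_triplets(features: Tensor, labels: list):
    """Sketch of the sampling described above (hypothetical name):
    for each class, the anchor is the mean vector of its features,
    the hardest positive is the class sample farthest from that mean,
    and the hardest negative is the closest mean vector of another class.
    Each returned tensor has p rows, one per class in the batch."""
    labels = torch.as_tensor(labels)
    classes = labels.unique()
    # p mean vectors, one per class in the batch
    means = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    anchors, positives, negatives = [], [], []
    for i, c in enumerate(classes):
        class_feats = features[labels == c]
        # hardest positive: farthest sample of the anchor's class
        d_pos = torch.cdist(means[i].unsqueeze(0), class_feats).squeeze(0)
        positives.append(class_feats[d_pos.argmax()])
        # hardest negative: closest mean vector of another class
        d_neg = torch.cdist(means[i].unsqueeze(0), means).squeeze(0)
        d_neg[i] = float("inf")  # exclude the anchor's own class
        negatives.append(means[d_neg.argmin()])
        anchors.append(means[i])
    return torch.stack(anchors), torch.stack(positives), torch.stack(negatives)
```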
catalyst/data/sampler_inbatch.py
Outdated
@@ -10,15 +10,56 @@
import torch
from torch import Tensor

from catalyst.contrib.nn.criterion.functional import euclidean_distance
you can also use torch.pdist()
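For context on the suggestion above, a tiny illustration of the built-in distance helpers: torch.pdist returns condensed pairwise distances (the upper triangle, row by row), while torch.cdist gives the full [n, n] matrix.

```python
import torch

x = torch.tensor([[0.0, 0.0], [3.0, 4.0]])

# condensed pairwise distances: one value per unordered pair
condensed = torch.pdist(x)  # tensor([5.])
# full [n, n] distance matrix
full = torch.cdist(x, x)    # tensor([[0., 5.], [5., 0.]])
```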
Before submitting
- catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle)
- make check-docs

Description
Related Issue
Type of Change
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.