
How to use NTXentLoss as in CPC? #179

Closed

vgaraujov opened this issue Aug 15, 2020 · 31 comments
Labels
Frequently Asked Questions · question (A general question about the library)

Comments

@vgaraujov

Hello! Thanks for this incredible contribution.

I want to know how to use NTXentLoss as in the CPC model, i.e. where I have one positive sample and N-1 negative samples.

Thank you for your help in this matter.

@KevinMusgrave (Owner) commented Aug 15, 2020

If you have just a single positive pair in your batch:

import torch
from pytorch_metric_learning.losses import NTXentLoss
loss_func = NTXentLoss()

# in your training loop
batch_size = data.size(0)
embeddings = your_model(data)
labels = torch.arange(batch_size)
# The assumption here is that data[0] and data[1] are the positive pair
# And there are no other positive pairs in the batch
labels[1] = labels[0]
loss = loss_func(embeddings, labels)
loss.backward()

If your batch size is N, and you have N/2 positive pairs:

import torch
from pytorch_metric_learning.losses import NTXentLoss
loss_func = NTXentLoss()

# in your training loop
batch_size = data.size(0)
embeddings = your_model(data)
# The assumption here is that data[0] and data[1] are a positive pair
# data[2] and data[3] are the next positive pair, and so on
labels = torch.arange(batch_size)
labels[1::2] = labels[0::2]
loss = loss_func(embeddings, labels)
loss.backward()

Basically you need to create labels such that positive pairs share the same label.

@KevinMusgrave added the Frequently Asked Questions and question labels Aug 15, 2020
@vgaraujov (Author)

Thank you for your answer @KevinMusgrave
Just to be sure I understand how it works:

Regarding this assumption: data[0] and data[1] are a positive pair, so the rest (data[2:]) will be used as negative samples?

If you have just a single positive pair in your batch:

from pytorch_metric_learning.losses import NTXentLoss
loss_func = NTXentLoss()

# in your training loop
batch_size = data.size(0)
embeddings = your_model(data)
labels = torch.arange(batch_size)
# The assumption here is that data[0] and data[1] are the positive pair
# And there are no other positive pairs in the batch
labels[1] = labels[0]
loss = loss_func(embeddings, labels)
loss.backward()

Does it mean that all examples that are not positive samples in the batch are automatically used as negative samples?

@KevinMusgrave (Owner) commented Aug 18, 2020

Yes, data[2:] will be used as negative samples, because their labels are different from data[0] and data[1]. And data[0] and data[1] are the only positive pair because there are no other labels that occur more than once.

Regarding this assumption: data[0] and data[1] are a positive pair, so the rest (data[2:]) will be used as negative samples?

To be clear, data[0] is not a pair by itself. It forms a positive pair with data[1]. Similarly, data[0] forms a negative pair with data[2], data[3]...data[N].

@vgaraujov (Author)

Thank you so much @KevinMusgrave !

@YounkHo commented Apr 19, 2021

I'm still confused: if I have a batch of randomly sampled images and their corresponding labels, how can I use NTXentLoss?

from pytorch_metric_learning.losses import NTXentLoss
loss_func = NTXentLoss()

# in your training loop
batch_size = data.size(0)
embeddings = your_model(data)
labels = torch.arange(batch_size)
# The assumption here is that data[0] and data[1] are the positive pair
# And there are no other positive pairs in the batch
labels[1] = labels[0]
loss = loss_func(embeddings, labels)
loss.backward()

From the code provided, why can we regard data[0] and data[1] as a positive pair while the other pairs are negative?

@KevinMusgrave (Owner)

That assumption was in response to the original question. If you have labels, then you can ignore the above discussion and just do:

loss = loss_func(embeddings, labels)
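
For context, here is a minimal end-to-end sketch of that labeled case (your_model, data, and labels are placeholders for your own model and dataloader batch):

import torch
from pytorch_metric_learning.losses import NTXentLoss

loss_func = NTXentLoss()

# in your training loop; (data, labels) come from your dataloader
embeddings = your_model(data)          # shape: (batch_size, embedding_dim)
loss = loss_func(embeddings, labels)   # labels: one class id per embedding
loss.backward()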

@YounkHo commented Apr 20, 2021

Then how does NTXentLoss distinguish positive samples from negative ones? Each image just has its own label.

@KevinMusgrave (Owner)

Images with the same label form positive pairs, and images with different labels form negative pairs.

For example, if the labels in a batch are [0, 0, 1, 1, 1] then:

  • the positive pairs will be formed by indices [0, 1], [1, 0], [2, 3], [2, 4], [3, 2], [3, 4], [4, 2], [4, 3]
  • the negative pairs will be formed by indices [0, 2], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4], [2, 0], [2, 1], [3, 0], [3, 1], [4, 0], [4, 1]
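
If it helps to see this concretely, here is a small sketch (not the library's internal code) that enumerates the pairs implied by a label tensor, where the same label means positive pair and different labels mean negative pair:

import torch

labels = torch.tensor([0, 0, 1, 1, 1])
same = labels.unsqueeze(1) == labels.unsqueeze(0)     # (5, 5) matrix of label matches
not_self = ~torch.eye(len(labels), dtype=torch.bool)  # exclude pairing an index with itself
pos_pairs = (same & not_self).nonzero().tolist()      # [[0, 1], [1, 0], [2, 3], ...]
neg_pairs = (~same).nonzero().tolist()                # [[0, 2], [0, 3], ...]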

@YounkHo commented Apr 20, 2021

What if there are no same-label pairs, because the batch is sampled randomly?

@KevinMusgrave (Owner)

Then NTXentLoss will return 0, because it requires positive pairs to compute an actual loss.

@KevinMusgrave (Owner)

You can try using MPerClassSampler to ensure there are positive pairs in every batch.

@YounkHo commented Apr 22, 2021

Another problem is that when I use MPerClassSampler in my own project, I found that the training data are not shuffled (all labels in one batch are the same). However, shuffle=True is not allowed when using a sampler.

from pytorch_metric_learning import samplers

sampler = samplers.MPerClassSampler(data["label_names"], m=4, length_before_new_iter=len(data["image_names"]))
data_loader_params = dict(sampler = sampler, batch_size = self.batch_size, num_workers = 12, pin_memory = True)
data_loader = torch.utils.data.DataLoader(dataset, **data_loader_params)

Is there anything wrong with my usage?

@KevinMusgrave (Owner)

What is your batch size? Can you print the batch labels and paste them here?

@YounkHo commented Apr 22, 2021

The batch size in the data loader is set to 128, and the labels passed to MPerClassSampler are as follows:

["n01532829","n01558993","n01704323","n01749939","n01770081","n01843383","n01910747","n02074367","n02089867","n02091831",.....,"n07613480"]

However, the labels in a training batch (code: inputs, targets = inputs.cuda(), targets.cuda()) with MPerClassSampler are as follows:

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')

@KevinMusgrave (Owner) commented Apr 22, 2021

The labels should be integers; sorry, I forgot to mention that.

Edit: Actually I haven't tested with strings. It's possible strings work.

Edit2: Nevermind, string labels should work.

Also, in addition to passing the batch size into the dataloader, you can pass batch_size into MPerClassSampler. Then it will check to make sure your m, labels, and batch_size are all compatible.
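
For reference, that suggestion looks roughly like the following sketch, assuming labels_list is a list with one label per dataset element (the variable name is a placeholder):

from pytorch_metric_learning import samplers

# passing batch_size lets the sampler verify that m, the labels, and
# batch_size are compatible (batch_size must be divisible by m)
sampler = samplers.MPerClassSampler(
    labels_list,
    m=4,
    batch_size=128,
    length_before_new_iter=len(labels_list),
)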

@KevinMusgrave (Owner)

Make sure that labels is the same length as your dataset.
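
A quick sanity check along those lines, using the names from the snippet above:

assert len(data["label_names"]) == len(dataset), "need one label per dataset element"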

@ashutoshml commented Jan 5, 2022

Images with the same label form positive pairs, and images with different labels form negative pairs.

For example, if the labels in a batch are [0, 0, 1, 1, 1] then:

  • the positive pairs will be formed by indices [0, 1], [1, 0], [2, 3], [2, 4], [3, 2], [3, 4], [4, 2], [4, 3]
  • the negative pairs will be formed by indices [0, 2], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4], [2, 0], [2, 1], [3, 0], [3, 1], [4, 0], [4, 1]

@KevinMusgrave
Is there a way to specify negative and positive pairs instead of providing labels?
i.e.,

loss_func(embeddings, positive_pairs, negative_pairs)

@KevinMusgrave (Owner)

Yes, but unfortunately "dummy" labels are still required:

import torch

# positive pairs are formed by (a1, p)
# negative pairs are formed by (a2, n)
a1 = torch.randint(0, 10, size=(100,))
p = torch.randint(0, 10, size=(100,))
a2 = torch.randint(0, 10, size=(100,))
n = torch.randint(0, 10, size=(100,))

pairs = a1, p, a2, n
# won't actually be used
labels = torch.zeros(len(embeddings))
loss_func(embeddings, labels, pairs)

@ashutoshml

Yes, but unfortunately "dummy" labels are still required:

# positive pairs are formed by (a1, p)
# negatives pairs are formed by (a2, n)
a1= torch.randint(0, 10, size=(100,))
p = torch.randint(0, 10, size=(100,))
a2 = torch.randint(0, 10, size=(100,))
n = torch.randint(0, 10, size=(100,))

pairs = a1, p, a2, n
# won't actually be used
labels = torch.zeros(len(embeddings))
loss_func(embeddings, labels, pairs)

Thanks.

"the positive pairs will be formed by indices [0, 1], [1, 0], [2, 3], [2, 4], [3, 2], [3, 4], [4, 2], [4, 3]
the negative pairs will be formed by indices [0, 2], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4], [2, 0], [2, 1], [3, 0], [3, 1], [4, 0], [4, 1]"

So the number of pairs (a1, p) can be different from (a2, n) as in this example, right? As in

a1 = torch.randint(0, 10, size=(100,))
p = torch.randint(0, 10, size=(100,))
a2 = torch.randint(0, 10, size=(500,))
n = torch.randint(0, 10, size=(500,))

@KevinMusgrave (Owner)

Yes

@KevinMusgrave (Owner)

Starting in v1.5.0, losses like ContrastiveLoss and TripletMarginLoss no longer require dummy labels if indices_tuple is passed in:

loss = loss_fn(embeddings, indices_tuple=triplets)

(Posting here for future readers.)
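
Applied to the pair example above, that would look roughly like the sketch below (assuming v1.5.0+ and that NTXentLoss is covered by the same change; on older versions, keep passing the dummy labels):

from pytorch_metric_learning.losses import NTXentLoss

loss_func = NTXentLoss()
pairs = (a1, p, a2, n)   # pair indices as constructed earlier in this thread
loss = loss_func(embeddings, indices_tuple=pairs)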

@lucasestini

Hello, does the loss consider every positive pair in the batch? For example, if I have 3 samples belonging to the same class, do they all contribute to the loss and get pulled together? Are they considered as 6 positive pairs or treated at once?

@KevinMusgrave (Owner)

@lucasestini They are considered as 6 positive pairs.

@yankungou commented Sep 29, 2022

Hi @KevinMusgrave, thank you for your wonderful code. I found that it still requires dummy labels as input. I don't know why. My pytorch_metric_learning version is 1.6.2. PyTorch version is 1.12.0. I use DDP. Thanks!

@KevinMusgrave (Owner)

@YK711 Are you using DistributedLossWrapper? I see I forgot to update that class to make labels optional. I've added it to my todo list now: #531
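
For readers unfamiliar with it, DistributedLossWrapper usage looks roughly like this (a generic DDP sketch, not specific to this issue):

from pytorch_metric_learning import losses
from pytorch_metric_learning.utils import distributed as pml_dist

# the wrapper gathers embeddings (and labels, if given) across DDP processes
loss_func = pml_dist.DistributedLossWrapper(losses.NTXentLoss())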

@yankungou

Yes, I use DistributedLossWrapper. Thank you @KevinMusgrave!

@KevinMusgrave (Owner)

SelfSupervisedLoss has been added to v2.0.0. If you have two "views" of data, you don't need to make labels anymore:

from pytorch_metric_learning.losses import NTXentLoss, SelfSupervisedLoss
loss_func = SelfSupervisedLoss(NTXentLoss())
embeddings = model(data)
augmented = model(augmented_data)
loss = loss_func(embeddings, augmented)

@happen2me

I find the formula in the documentation for NTXentLoss misleading:
[NTXentLoss formula image from the documentation]

If labels for indices [a, b, c, d] are [0, 0, 2, 3], the formula implies that the positive pair is [a, b], while the negative pairs are [a, c], [a, d]. However, this is different from its actual implementation, where the positive pairs are [a, b], [b, a], and the negative samples are [a, c], [a, d], [b, c], [b, d], [c, a], [c, b], [c, d], [d, a], [d, b], [d, c].

Maybe we should remove this formula and replace it with clearer sampling information? I can make a pull request for it if I am not wrong :)

@KevinMusgrave (Owner) commented Apr 9, 2023

If labels for indices [a, b, c, d] are [0, 0, 2, 3], the formula implies that the positive pair is [a, b], while the negative pairs are [a, c], [a, d].

@happen2me Actually that is how it works.
The negative pairs for [a, b] will all be of the form [a, _].
The negative pairs for [b, a] will all be of the form [b, _].
And the loss for [a, b] is computed separately from the loss for [b, a].

I apologize if my comments further up this thread were confusing.

See these related comments:

#606 (comment)
#600 (comment)
#6 (comment)

Edit: Or are you referring to the fact that both [a, b] and [b, a] are used as positive pairs, rather than just [a, b] ?

@happen2me

@KevinMusgrave Thank you very much for your reply! Yes, when I saw the equation, I thought the positive pairs were [anchor, positive] and the negative pairs were [anchor, negative 1], [anchor, negative 2], and so on. But in fact, samples with the same label as the anchor are regarded as positives of each other (including the anchor), while the negative pairs also include pairs like [positive, negative k] and [negative k1, negative k2], which is not that intuitive purely from the equation.

I wanted to apply the loss in a situation like [question, positive passage, negative passage 1, negative passage 2, ...]. In this case, pushing negatives apart from each other isn't really necessary. I managed to do it with manually created indices_tuple.
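
For future readers, a manually created indices_tuple for a batch laid out as [question, positive passage, negative passage 1, ...] could look like the sketch below. The layout and sizes are hypothetical, and on versions before 1.5.0 dummy labels would still need to be passed:

import torch
from pytorch_metric_learning.losses import NTXentLoss

embeddings = torch.randn(6, 128)   # index 0: question, 1: positive passage, 2-5: negative passages
a1 = torch.tensor([0])             # anchor of the positive pair
p = torch.tensor([1])              # its positive
a2 = torch.tensor([0, 0, 0, 0])    # the same anchor, repeated for each negative pair
n = torch.tensor([2, 3, 4, 5])     # the negatives
loss_func = NTXentLoss()
loss = loss_func(embeddings, indices_tuple=(a1, p, a2, n))

Because every negative pair uses the question as its anchor, the negatives are not pushed apart from each other.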

@KevinMusgrave (Owner) commented Apr 11, 2023

I wanted to apply the loss in a situation like [question, positive passage, negative passage 1, negative passage 2, ...]. In this case, pushing negatives apart from each other isn't really necessary. I managed to do it with manually created indices_tuple.

@happen2me Just to make sure there's no confusion, I'll go through a simple example.

Say we have a batch size of 4 with labels: [dog, dog, cat, mouse].

In this case there will be two losses:

  • one loss for the positive pair [0,1] with negative pairs [0,2], [0,3].
  • one loss for the positive pair [1,0] with negative pairs [1,2], [1,3].

So the negative pair [2,3] (cat, mouse) isn't used at all.
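
In symbols (a sketch consistent with the description above, where s_ij is the similarity between embeddings i and j and τ is the temperature), the two per-positive-pair losses would be:

$$L_{[0,1]} = -\log\frac{e^{s_{01}/\tau}}{e^{s_{01}/\tau} + e^{s_{02}/\tau} + e^{s_{03}/\tau}}, \qquad L_{[1,0]} = -\log\frac{e^{s_{10}/\tau}}{e^{s_{10}/\tau} + e^{s_{12}/\tau} + e^{s_{13}/\tau}}$$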

I've created an issue for improving the documentation: #608
