
Bug in pseudo-labelling code? #3

Open
linzhiqiu opened this issue Mar 1, 2022 · 4 comments

Comments

@linzhiqiu

I am confused by the implementation of pseudo-labelling in this library (lib/algs/pseudo_label.py). In particular, forward() contains:

y_probs = y.softmax(1)
onehot_label = self.__make_one_hot(y_probs.max(1)[1]).float()
gt_mask = (y_probs > self.th).float()
gt_mask = gt_mask.max(1)[0] # reduce_any
lt_mask = 1 - gt_mask # logical not
p_target = gt_mask[:,None] * 10 * onehot_label + lt_mask[:,None] * y_probs

output = model(x)
loss = (-(p_target.detach() * F.log_softmax(output, 1)).sum(1)*mask).mean()
return loss

I am confused about why gt_mask is multiplied by 10 when computing p_target. What is the meaning of the 10 here?

Also, I believe lt_mask marks the examples whose max probability is below the threshold, which should therefore be ignored when computing the loss. However, p_target still includes the + lt_mask[:,None] * y_probs term.

This seems different from what is described in the paper. If you are implementing a variant of the pseudo-labelling loss function, could you point me to that paper?

@linzhiqiu (Author)

I am also confused by the coef in training_hierarchy.py:

coef = args.consis_coef * math.exp(-5 * (1 - min(iteration/args.warmup, 1))**2)

This coefficient does not appear in the original paper.
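
For reference, here is a self-contained sketch of how this expression ramps up; the consis_coef and warmup values below are only illustrative, not the repo's defaults:

import math

def consistency_coef(iteration, warmup, consis_coef=1.0):
    # Same ramp as above: starts near consis_coef * exp(-5) at iteration 0
    # and rises smoothly to consis_coef once iteration >= warmup.
    return consis_coef * math.exp(-5 * (1 - min(iteration / warmup, 1)) ** 2)

for it in [0, 50_000, 100_000, 200_000, 400_000]:
    print(it, round(consistency_coef(it, warmup=200_000), 4))
    # prints approximately 0.0067, 0.0601, 0.2865, 1.0, 1.0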

@linzhiqiu (Author)

One more question: for self-training, it seems that both labeled and unlabeled data are used for the KL divergence between the teacher and the student? The original paper says only the unlabeled data is used to compute the KLD.
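
To make the distinction concrete, here is a minimal sketch with hypothetical tensors (not the actual training loop from this repo), contrasting a KLD computed on the unlabeled batch only, as the paper describes, with one computed on both batches:

import torch
import torch.nn.functional as F

def teacher_student_kld(student_logits, teacher_logits):
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(F.log_softmax(student_logits, 1),
                    F.softmax(teacher_logits, 1),
                    reduction='batchmean')

# Hypothetical logits for a labeled batch of 8 and an unlabeled batch of 32.
lab_s, lab_t = torch.randn(8, 10), torch.randn(8, 10)
unlab_s, unlab_t = torch.randn(32, 10), torch.randn(32, 10)

kld_unlabeled_only = teacher_student_kld(unlab_s, unlab_t.detach())  # paper's description
kld_both = teacher_student_kld(torch.cat([lab_s, unlab_s]),          # what the code seems to do
                               torch.cat([lab_t, unlab_t]).detach())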

@jongchyisu (Collaborator)

Hello, for the first question: the pseudo-label code is from this PyTorch repo, which is a re-implementation of this official TensorFlow implementation from Google. From their comment: "Multiplying the one-hot pseudo_labels by 10 makes them look like logits."
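
As a quick standalone illustration (not code from this repo): if you interpret the scaled one-hot vector as a logit vector, taking a softmax recovers an almost-hard distribution, which is what that comment is getting at.

import torch
import torch.nn.functional as F

num_classes = 10
onehot = F.one_hot(torch.tensor([3]), num_classes).float()

# Treating 10 * onehot as logits: softmax puts ~0.9996 on the chosen class,
# so the result is nearly indistinguishable from the hard one-hot label.
print(F.softmax(10 * onehot, dim=1))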

As for the lt_mask, all of the papers from Google (Oliver et al., FixMatch, etc.) use the same codebase, but they did not specify the loss functions. You are right that the lt_mask contribution is an extra term for pseudo-labeling that I should include in the paper. Since output_teacher and output_student are the same when not using pseudo-labels, the extra term becomes the entropy of the predictions.
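
Here is a small sketch of that last point (standalone code, assuming the student and teacher logits coincide as stated above): the low-confidence branch of the loss then reduces to the Shannon entropy of the predictions.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)   # pretend teacher logits == student logits
y_probs = logits.softmax(1)

# Low-confidence branch of the loss: -(y_probs * log_softmax(output)).sum(1)
extra_term = -(y_probs * F.log_softmax(logits, 1)).sum(1)
entropy = -(y_probs * y_probs.log()).sum(1)

print(torch.allclose(extra_term, entropy))  # True: the term equals the prediction entropy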

As for the coef, this is for warmup scheduling following Oliver et al.

About self-training: thanks for pointing this out. It is a typo in the paper; I did indeed use both labeled and unlabeled data for self-training.

@linzhiqiu (Author)

Thanks for the helpful response! Could you point me to any paper that uses this specific variant of the pseudo-labelling loss?
