
why specify ignore_index=0 in the NLLLoss function in BERTTrainer? #98

Open · Jasmine969 opened this issue Jul 7, 2022 · 1 comment

Jasmine969 commented Jul 7, 2022

trainer/pretrain.py

class BERTTrainer:
    def __init__(self, ...):
        ... 
        # Using Negative Log Likelihood Loss function for predicting the masked_token
        self.criterion = nn.NLLLoss(ignore_index=0)
        ...

I cannot understand why ignore_index=0 is specified when calculating NLLLoss. For the NSP task, if the ground-truth is_next label is False (label = 0) but BERT predicts True, that sample is excluded from the loss entirely (and the batch loss becomes nan if every label is 0). So what is the purpose of ignore_index=0?

====================

Well, I've found that ignore_index=0 is useful for the MLM task, but I still don't think the NSP task should share the same NLLLoss instance with MLM.
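To make the concern concrete, here is a small sketch (mine, not code from this repo) showing that with ignore_index=0, an NSP sample whose true label is 0 contributes nothing to the loss:

```python
import torch
import torch.nn as nn

# Two NSP samples: log-probabilities over the 2 classes (is_next / not).
log_probs = torch.log_softmax(torch.tensor([[2.0, 0.5],
                                            [0.3, 1.7]]), dim=-1)
labels = torch.tensor([0, 1])  # first sample: is_next = False (label 0)

shared = nn.NLLLoss(ignore_index=0)  # as in BERTTrainer
plain = nn.NLLLoss()

# The shared criterion skips every target equal to 0, so only the
# label-1 sample is averaged; the label-0 sample gives no gradient.
print(shared(log_probs, labels))  # equals the loss of sample 2 alone
print(plain(log_probs, labels))   # averages over both samples

# If every label in the batch is 0, all targets are ignored and the
# mean reduction divides 0 by 0, yielding nan.
print(shared(log_probs, torch.tensor([0, 0])))
```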

MingchangLi commented Jan 10, 2023

See #32. Change

self.criterion = nn.NLLLoss(ignore_index=0)

to

self.criterion = nn.NLLLoss()
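Another option (a sketch of my own with made-up shapes, not this repo's actual code) is to keep ignore_index=0 for MLM, where index 0 marks padding/unmasked positions, and give NSP its own criterion, since 0 is a real class there:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

mlm_criterion = nn.NLLLoss(ignore_index=0)  # 0 = pad/unmasked, skipped
nsp_criterion = nn.NLLLoss()                # 0 = "not next", a real class

# Dummy shapes: batch 2, seq_len 4, vocab 10, 2 NSP classes.
mlm_log_probs = torch.log_softmax(torch.randn(2, 4, 10), dim=-1)
mlm_labels = torch.tensor([[0, 5, 0, 3],
                           [7, 0, 0, 0]])   # 0 = do not score this position
nsp_log_probs = torch.log_softmax(torch.randn(2, 2), dim=-1)
nsp_labels = torch.tensor([0, 1])           # both samples must count

# NLLLoss over sequences expects (N, C, L), so move the class dim.
loss = mlm_criterion(mlm_log_probs.transpose(1, 2), mlm_labels) \
     + nsp_criterion(nsp_log_probs, nsp_labels)
print(loss.item())
```

This keeps the padding-skipping behavior the MLM task needs without silently discarding half of the NSP supervision.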
