
Validation loss vs Training loss in AudioSet training #31

Open
Tomlevron opened this issue Oct 7, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@Tomlevron

Hi!

First of all, I would like to thank you for sharing your amazing work with everyone! It is truly inspiring and fascinating.

I have a question regarding the difference between the training loss and the validation loss. The validation loss is much higher than the training loss. Does that make sense? Isn't it overfitting?

I also tried to fine-tune the AudioSet-trained model on my data, and it showed the same difference (with and without augmentations).

Here is an example from the logs: test-full-f10-t10-pTrue-b12-lr1e-5/log_2090852.txt:

train_loss: 0.011128
valid_loss: 0.693989

I'm still new to deep learning so maybe I'm missing something.

Thank you!

@YuanGongND
Owner

Thanks for your interest.

I don't think it is an overfitting issue, as you would also see a performance drop in mAP or accuracy on the validation set if the model were overfitted. I think the reason is that, in the inference stage (but not in the training stage), we add a Sigmoid function on top of the model output before the loss computation, to make sure mAP/accuracy are calculated correctly. This changes the validation loss. See here.
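
For illustration, here is a minimal sketch (hypothetical numbers, not the repo's code) of why feeding already-Sigmoid-ed outputs into BCEWithLogitsLoss inflates the reported loss. A confident correct prediction pushed through Sigmoid twice lands near 0.5, so the per-class loss sits around ln 2 ≈ 0.693, which is roughly the valid_loss in the log above:

```python
# Hypothetical illustration: BCEWithLogitsLoss applies its own Sigmoid, so adding
# another one before it squashes confident predictions toward 0.5.
import torch

criterion = torch.nn.BCEWithLogitsLoss()
logits = torch.tensor([[8.0, -8.0]])   # confident, correct raw outputs
target = torch.tensor([[1.0, 0.0]])

train_loss = criterion(logits, target)                 # ~0.0003
valid_loss = criterion(torch.sigmoid(logits), target)  # ~0.50 (double Sigmoid)
print(train_loss.item(), valid_loss.item())
```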

-Yuan

YuanGongND added the bug label on Oct 8, 2021
@hbellafkir

Wouldn't it be wrong to train with Softmax and use Sigmoid for mAP? Using Softmax instead of Sigmoid gives a higher mAP value.

@YuanGongND
Owner

Could you elaborate on this point?

I don't think we used Softmax during training. The reason we add an extra Sigmoid in inference but not in training is that BCEWithLogitsLoss already includes a Sigmoid.
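
As a quick sanity check (a sketch, not the repo's code), BCEWithLogitsLoss(x, y) is numerically the same as BCELoss applied to sigmoid(x), which is why no explicit Sigmoid layer is needed during training:

```python
# Sketch: BCEWithLogitsLoss == Sigmoid + BCELoss (computed in a numerically stable way).
import torch

x = torch.randn(4, 527)                      # raw logits, e.g. 527 AudioSet classes
y = torch.randint(0, 2, (4, 527)).float()    # multi-hot labels

a = torch.nn.BCEWithLogitsLoss()(x, y)
b = torch.nn.BCELoss()(torch.sigmoid(x), y)
print(torch.allclose(a, b, atol=1e-6))       # True
```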

@hbellafkir

In the case of CrossEntropyLoss, the loss is calculated with Softmax (here), as it is included in the CrossEntropyLoss operation. To my understanding, it is not correct to use Sigmoid for inference when CrossEntropyLoss is used in training. On a custom dataset that I use, switching from Sigmoid to Softmax results in a higher mAP value during inference.
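
For illustration (a toy example, not from the repo): Sigmoid is applied per class, while Softmax normalizes across the classes of each sample, so the two can rank samples differently within a class, which is exactly what per-class AP measures:

```python
# Toy example: Sigmoid preserves the per-class ranking of samples, Softmax can flip it.
import torch

logits = torch.tensor([[3.0, 2.9],    # sample A: high class-0 logit, but class 1 is close
                       [2.0, -5.0]])  # sample B: lower class-0 logit, class 1 very low

print(torch.sigmoid(logits)[:, 0])          # class 0 scores: A > B
print(torch.softmax(logits, dim=1)[:, 0])   # class 0 scores: B > A after normalization
```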

@hbellafkir

@YuanGongND any thoughts on this?

@YuanGongND
Owner

Yes - I think you can skip the Sigmoid in inference. That was just used to make training/inference consistent for the multi-label classification (i.e., one audio has more than one label) tasks.

When you use CrossEntropyLoss, I assume you have a single-label dataset. Using Softmax here might improve mAP, but it won't improve accuracy, and mAP is less important for single-label classification; that's why we use accuracy in the ESC-50 and SpeechCommands recipes.

For multi-label classification, adding Sigmoid won't change mAP either, since Sigmoid is monotonic, so I think you can also remove it; but that could impact the ensemble performance.
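
To illustrate the monotonicity argument (a sketch with random data, not from the repo): per-class AP only depends on how samples are ranked within each class, and an elementwise Sigmoid does not change that ranking:

```python
# Sketch: applying Sigmoid to logits leaves per-class AP (and hence mAP) unchanged.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))               # fake multi-label scores
labels = rng.integers(0, 2, size=(100, 5))       # fake multi-hot labels

scores_sigmoid = 1.0 / (1.0 + np.exp(-logits))
map_logits = average_precision_score(labels, logits, average="macro")
map_sigmoid = average_precision_score(labels, scores_sigmoid, average="macro")
print(np.isclose(map_logits, map_sigmoid))       # True
```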

@YuanGongND
Owner

After some investigation, it seems to be a logging bug: the difference between the training and eval loss is overestimated in the code.

In traintest.py, loss_meter is only reset at the start of each epoch, but its average is printed every 1000 iterations, so the large loss values from early iterations keep accumulating into the printed average.

Changing loss_meter.avg to loss_meter.val here can alleviate the problem. But I would suggest doing an offline loss evaluation (i.e., checking the training loss with the best checkpoint model after training finishes); that would be the most accurate solution.
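
For reference, a minimal sketch of the usual AverageMeter pattern (assuming the repo's meter follows the common PyTorch-example version): .avg averages everything since the last reset, so large early losses keep inflating it, while .val is just the most recent batch:

```python
# Sketch of the standard AverageMeter pattern (assumed, not copied from the repo).
class AverageMeter:
    def __init__(self):
        self.reset()

    def reset(self):
        self.val, self.sum, self.count, self.avg = 0.0, 0.0, 0, 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

loss_meter = AverageMeter()
for loss in [2.0, 1.0, 0.5, 0.05, 0.01]:   # loss decreasing over the epoch
    loss_meter.update(loss)

print(loss_meter.avg)  # ~0.71, still dominated by the early, large losses
print(loss_meter.val)  # 0.01, the current training loss
```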
