Validation loss vs Training loss in AudioSet training #31
Comments
Thanks for your interest. I don't think it is an overfitting issue: if the model were overfitted, you would also see a performance drop in mAP or accuracy on the validation set. I think the reason is that we add a Sigmoid function on top of the model output in the inference stage (but not in the training stage) before loss computation, to make sure mAP/accuracy is calculated correctly. That changes the validation loss. See here. -Yuan
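To make the point concrete, here is a minimal sketch (not the repo's actual code; the loss function, batch shape, and class count are assumptions) of why feeding `sigmoid(logits)` instead of raw logits into the same loss function produces a different, non-comparable number:

```python
# Sketch only: assumes a BCEWithLogitsLoss-style setup; shapes are made up.
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()                        # expects raw logits
logits = torch.randn(4, 527)                            # hypothetical batch: 4 clips, 527 classes
targets = torch.randint(0, 2, (4, 527)).float()         # hypothetical multi-label targets

train_loss = loss_fn(logits, targets)                   # training path: raw logits
val_loss = loss_fn(torch.sigmoid(logits), targets)      # inference path: extra Sigmoid first

print(train_loss.item(), val_loss.item())               # the two numbers are not comparable
```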
Wouldn't it be wrong to train with Softmax and use Sigmoid for mAP? Using Softmax instead of Sigmoid gives a higher mAP value.
Could you elaborate on this point? I don't think we used Softmax during training; the reason we added an extra Sigmoid in inference but not in training is that
With CrossEntropyLoss, the loss is calculated with Softmax (here), since Softmax is included in the CrossEntropyLoss operation. To my understanding, it is not correct to use Sigmoid at inference when CrossEntropyLoss is used in training. On a custom dataset that I use, switching from Sigmoid to Softmax results in a higher mAP value during inference.
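For context, a small illustrative sketch (hypothetical shapes, not taken from the repo) of the difference between the two post-processing choices when the model was trained with CrossEntropyLoss:

```python
# Illustrative only: a single-label setup trained with CrossEntropyLoss
# (which applies log-softmax internally); batch size and class count are made up.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                  # hypothetical batch: 4 samples, 10 classes

probs_softmax = F.softmax(logits, dim=-1)    # matches the objective CrossEntropyLoss optimizes
probs_sigmoid = torch.sigmoid(logits)        # scores each class independently

# Within one sample both keep the same class ranking (both are increasing in the
# logits), but across samples the softmax score of a class also depends on the
# other logits, so the per-class ranking of samples - and hence mAP - can change.
```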
@YuanGongND any thoughts on this? |
Yes - I think you can skip the Sigmoid at inference. It was only added to keep training and inference consistent for multi-label classification tasks (i.e., where one audio clip has more than one label). Since you use CrossEntropyLoss, I assume you have a single-label dataset; using Softmax there might improve mAP but won't improve accuracy, and mAP matters less for single-label classification, which is why we report accuracy in the ESC-50 and SpeechCommands recipes. For multi-label classification, adding the Sigmoid doesn't change mAP either because Sigmoid is monotonic, so I think you could remove it there too, though that could affect the ensemble performance.
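A quick self-contained check of the "Sigmoid is monotonic, so mAP is unchanged" point, using made-up scores and scikit-learn's `average_precision_score` (illustrative only, not the repo's metric code):

```python
# Sigmoid preserves the per-class ranking of clips, so macro AP is identical
# whether it is computed on raw logits or on sigmoid(logits). Data is synthetic.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))            # 100 clips, 5 hypothetical classes
labels = rng.integers(0, 2, size=(100, 5))    # multi-label ground truth

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

map_raw = average_precision_score(labels, logits, average="macro")
map_sig = average_precision_score(labels, sigmoid(logits), average="macro")
print(map_raw, map_sig)                       # identical values
```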
After some investigation, this seems to be a logging bug: the gap between the training and evaluation losses is over-estimated in the code. In traintest.py, loss_meter is only reset at the start of each epoch, but its average is printed every 1000 iterations, so the large loss values from the early iterations keep inflating the reported number. Changing 'loss_meter.avg' to 'loss_meter.val' here can alleviate this problem. But I would suggest doing an offline loss evaluation (i.e., computing the training loss with the best checkpoint model after training finishes); that would be the most accurate solution.
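A minimal sketch of the logging issue, assuming the common AverageMeter pattern used in many PyTorch training scripts (not necessarily the exact class in this repo): because the meter is only reset once per epoch, `avg` late in the epoch still includes the large early losses, while `val` is just the most recent batch.

```python
# Minimal AverageMeter sketch (assumed pattern, not copied from traintest.py).
class AverageMeter:
    def __init__(self):
        self.val, self.sum, self.count, self.avg = 0.0, 0.0, 0, 0.0

    def update(self, val, n=1):
        self.val = val                        # most recent value
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count      # running mean since the last reset

loss_meter = AverageMeter()
for loss in [2.0, 1.0, 0.5, 0.1, 0.05]:       # losses shrinking over an epoch
    loss_meter.update(loss)

print(loss_meter.avg)   # ~0.73, inflated by the early large losses
print(loss_meter.val)   # 0.05, closer to the current training loss
```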
Hi!
First of all, I would like to thank you for sharing your amazing work with everyone! The work you shared with us is truly inspiring and fascinating.
I have a question regarding the difference between the training loss and the validation loss. The validation loss is much higher than the training loss; does that make sense? Isn't it overfitting?
I also tried to fine-tune the AudioSet-trained model on my data and it showed the same difference (with and without augmentations).
Here is an example from the logs: test-full-f10-t10-pTrue-b12-lr1e-5/log_2090852.txt:
I'm still new to deep learning so maybe I'm missing something.
Thank you!