temperature scaling #1
Comments
Thanks for your interest in our paper. The baseline is MSP with standard training, and we did not use temperature scaling at training or inference time.
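For context, the MSP (Maximum Softmax Probability) baseline mentioned here simply scores each sample by its largest softmax probability, with no temperature applied. A minimal illustrative sketch (not the authors' code):

```python
import numpy as np

def msp_confidence(logits):
    """MSP baseline: confidence = maximum softmax probability.

    No temperature scaling is applied (i.e., T = 1).
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the max logit for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

# A misclassification detector then flags samples whose MSP confidence is low.
```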
Thanks for your response! And I have one more question: in your paper, you said that: "We randomly sample 10% of
Yes, the model is trained on 45000 samples and tested on the original test set (10000 samples). If you want to train the model on the full training set, just modify the code (line 115) in utils/data.py. The results of training on the full training set can also be found in our CVPR 2023 paper "OpenMix: Exploring Out-of-Distribution Samples for Misclassification Detection".
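The 45000/5000 split described above (holding out 10% of the 50000 training samples for validation) can be reproduced with a generic index split. A sketch only, not the actual code at utils/data.py line 115:

```python
import random

def split_train_val(n_total=50000, n_val=5000, seed=0):
    """Randomly hold out n_val samples (here 10%) from the training indices."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[n_val:], indices[:n_val]  # (train indices, val indices)

train_idx, val_idx = split_train_val()
```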
Yeah, I noticed your paper about OpenMix. The methods in your Table 2: Doctor [NeurIPS21] [19], ODIN [ICLR18] [38], Energy [NeurIPS20] [39], MaxLogit [ICML22] [23], LogitNorm, MC-dropout, Trust Score, and TCP. Did you reimplement these methods in your OpenMix repo? Although some of them are post-hoc methods, I think a comparison under the same training setting is important, since the accuracy strongly impacts the other metrics. I'm so sorry to ask you so many questions; you're a good researcher in the failure prediction field. I learn a lot from your works~
Following your suggestion, we will upload our implementations of Doctor, ODIN, Energy, MaxLogit and LogitNorm. As for MC-dropout, Trust Score and TCP, we just used the results from the TPAMI version of the TCP paper. In our papers, we also emphasized that classification accuracy is important. For example, LogitNorm itself has lower accuracy than the baseline because it constrains the logit norm during training. For TCP, with commonly used standard training (e.g., SGD, learning rate schedule, etc.), there are few misclassified samples in the training set for learning the ConfidNet. Actually, it is really hard, and may be impossible, to keep the same accuracy for all compared methods, so we report the accuracy along with the other confidence estimation metrics.
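For reference, two of the post-hoc scores discussed above (Energy and MaxLogit) operate directly on a trained model's logits, which is why they can be applied without retraining. A minimal sketch of the standard formulations, not the repo's implementation:

```python
import numpy as np

def maxlogit_score(logits):
    """MaxLogit: confidence = the largest raw logit (Hendrycks et al., ICML 2022)."""
    return np.max(np.asarray(logits, dtype=float), axis=-1)

def energy_score(logits, temperature=1.0):
    """Negative free energy T * logsumexp(logits / T); higher = more confident
    (Liu et al., NeurIPS 2020)."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)
    # Numerically stable logsumexp: factor out the max before exponentiating.
    return temperature * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))
```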
That's exactly what I think~ Looking forward to you updating your repo.
It's a really nice repo. I read your paper and was wondering whether the baseline you set is MSP + Temperature Scaling?
But I could not find the temperature scaling operations in your code.
```python
parser.add_argument('--cwd_weight', default=0.1, type=float, help='Training time tempscaling')
```
the option here

```python
class KDLoss(nn.Module):
    def __init__(self, temp_factor):
        super(KDLoss, self).__init__()
        self.temp_factor = temp_factor
        self.kl_div = nn.KLDivLoss(reduction="sum")

kdloss = KDLoss(2.0)
```
the KD loss here.
And is temperature scaling used at training time, not inference? You said it's a post-hoc method, so shouldn't it be used at inference time? Could you help me with this confusion?
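To make the post-hoc usage being asked about concrete: temperature scaling (Guo et al., 2017) is normally applied at inference time by dividing a trained model's logits by a scalar T fitted on held-out validation data. A minimal sketch, assuming the logits are already available:

```python
import numpy as np

def temperature_scaled_softmax(logits, T):
    """Post-hoc temperature scaling: T > 1 softens the softmax, T < 1 sharpens it.

    The argmax (and hence accuracy) is unchanged; only the confidence shifts.
    """
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# T is typically fitted by minimizing negative log-likelihood on validation logits.
```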