temperature scaling #1

Open
Lyttonkeepfoing opened this issue Sep 27, 2023 · 6 comments

Comments

@Lyttonkeepfoing

It's a really nice repo. I read your paper and was wondering whether the baseline you set is MSP + Temperature Scaling?
However, I could not find the temperature scaling operations in your code. There is this option:
parser.add_argument('--cwd_weight', default=0.1, type=float, help='Trianing time tempscaling')
and this KD loss:
import torch
import torch.nn as nn

class KDLoss(nn.Module):
    def __init__(self, temp_factor):
        super(KDLoss, self).__init__()
        self.temp_factor = temp_factor
        self.kl_div = nn.KLDivLoss(reduction="sum")

    def forward(self, input, target):
        # KL divergence between temperature-softened input and target distributions
        log_p = torch.log_softmax(input / self.temp_factor, dim=1)
        q = torch.softmax(target / self.temp_factor, dim=1)
        loss = self.kl_div(log_p, q) * (self.temp_factor ** 2) / input.size(0)
        return loss

which is instantiated as kdloss = KDLoss(2.0).
So is temperature scaling applied at training time rather than at inference? You said it is a post-hoc method, so shouldn't it be used at inference time? Could you help me clear up this confusion?
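
For reference, post-hoc temperature scaling (in the usual sense) fits a single scalar T on a held-out validation set and only rescales logits at inference, unlike the training-time temperature inside KDLoss above. A minimal sketch, not taken from this repo; val_logits, val_labels and test_logits are assumed to come from a held-out validation set and the test set:

import torch
import torch.nn as nn
import torch.optim as optim

def fit_temperature(val_logits, val_labels, max_iter=50):
    # Fit a single scalar T (optimized as log T so it stays positive)
    # by minimizing the NLL of the rescaled validation logits.
    log_t = torch.zeros(1, requires_grad=True)
    nll = nn.CrossEntropyLoss()
    opt = optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = nll(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# At inference, only the logits are rescaled before the softmax:
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = torch.softmax(test_logits / T, dim=1)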

@Impression2805
Owner

Thanks for your interest in our paper. The baseline is MSP with standard training, and we did not use temperature scaling at either training or inference time.
The kdloss and cwd_weight code belongs to other methods we tried but did not end up using. We will remove them to avoid confusion.
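
For reference, the MSP baseline simply takes the maximum softmax probability as the confidence score; a minimal sketch, not code from this repo:

import torch

def msp_confidence(logits):
    # Maximum softmax probability: the confidence score used by the MSP baseline.
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    return conf, pred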

@Lyttonkeepfoing
Author

Thanks for your response! I have one more question. In your paper, you said: "We randomly sample 10% of the training samples as a validation dataset for each task because it is a requirement for post-calibration methods like temperature scaling." So are the results reported in your paper obtained by training on 45000 samples and testing on the validation set? It seems there is no code for the test step.
-------------------Make loader-------------------
Train Dataset : 45000 Valid Dataset : 5000 Test Dataset : 10000
If you test directly on the test dataset, then your training set has 45000 samples, not 50000.
I think this is really important.

@Impression2805
Owner

Yes, the model is trained on 45000 samples and tested on the original test set (10000 samples). If you want to train the model on the full training set, just modify the code (line 115) in utils/data.py. The results of training on the full training set can also be found in our CVPR 2023 paper "OpenMix: Exploring Out-of-Distribution samples for Misclassification Detection".
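
For illustration, the 45000/5000 split described above could look like the following, assuming a standard torchvision CIFAR-10 setup; this is only a hypothetical sketch and the repo's utils/data.py may implement it differently:

import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
# Hold out 10% of the 50000 training images as a validation set.
train_set, valid_set = random_split(full_train, [45000, 5000],
                                    generator=torch.Generator().manual_seed(0))
# The original 10000-image test set is used for evaluation.
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())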

@Lyttonkeepfoing
Author

Yeah, I noticed your OpenMix paper. Regarding the methods in its Table 2: Doctor [NeurIPS21] [19], ODIN [ICLR18] [38], Energy [NeurIPS20] [39], MaxLogit [ICML22] [23], LogitNorm, as well as MC dropout, Trust Score, and TCP. Did you reimplement these methods in your OpenMix repo? Although some of them are post-hoc methods, I think comparing them under the same training setting is important, since accuracy strongly impacts the other metrics. I'm sorry to ask you so many questions; you are a good researcher in the failure prediction field, and I have learned a lot from your work~

@Impression2805
Owner

Following your suggestion, we will upload our implementations of Doctor, ODIN, Energy, MaxLogit, and LogitNorm. As for MC dropout, Trust Score, and TCP, we just used the results reported in the TPAMI version of the TCP paper. In our papers, we also emphasize that classification accuracy is important. For example, LogitNorm itself has lower accuracy than the baseline because it constrains the logit norm during training. For TCP, with commonly used standard training (e.g., SGD with a learning rate schedule), there are few misclassified samples in the training set for learning the ConfidNet. In practice, it is really hard, and may be impossible, to keep the same accuracy for all compared methods, so we report accuracy along with the other confidence estimation metrics.
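
For context, some of these post-hoc scores are simple functions of the logits; a minimal sketch of MaxLogit and the Energy score, not the authors' implementation:

import torch

def maxlogit_score(logits):
    # MaxLogit: use the largest raw logit as the confidence score.
    return logits.max(dim=1).values

def energy_score(logits, T=1.0):
    # Negative free energy T * logsumexp(logits / T); higher means more confident.
    return T * torch.logsumexp(logits / T, dim=1)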

@Lyttonkeepfoing
Author

That's exactly what I think~ Looking forward to your repo update.
