
A bit confused about the Gram implementation #173

Open
SauceCat opened this issue Aug 10, 2023 · 4 comments

@SauceCat

https://github.com/Jingkang50/OpenOOD/blob/main/openood/postprocessors/gram_postprocessor.py#L115
I wonder why dev is used as conf directly. Isn't it the case that the larger the deviation, the more likely the sample is OOD?

I checked the original implementation here: https://github.com/VectorInstitute/gram-ood-detection/blob/master/ResNet_Cifar10.ipynb
I found that it actually uses the negative of the deviations when computing the metrics.

But the reported metric looks fine, so I am quite confused. Am I missing something?
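To make the question concrete, here is a minimal sketch of the sign convention I would expect. The variable names are illustrative only and do not match the actual code in gram_postprocessor.py:

```python
# Illustrative sketch only -- names do not match gram_postprocessor.py.
# The Gram method assigns each sample a total deviation, where a larger
# deviation means the sample looks more OOD.
deviations = [0.5, 12.0, 1.1]  # the second sample looks most OOD

# OpenOOD postprocessors treat a higher conf as "more in-distribution",
# so I would expect the deviation to enter with a negative sign:
conf = [-d for d in deviations]

# After negation, the most deviant sample has the lowest confidence.
assert conf.index(min(conf)) == deviations.index(max(deviations))
```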

@zjysteven
Collaborator

It's indeed confusing. Like you said, we are getting >50% AUROC for Gram in all our experiments. Applying the negation seems like the correct thing to do, but it leads to <50% AUROC. I will try to investigate when I'm available.
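As a side note on the AUROC arithmetic: AUROC is a pure ranking statistic, so negating every score flips it exactly to 1 − AUROC. That is why the two observations above are two views of the same phenomenon, although it does not explain why the un-negated deviations land above 50% in the first place. A self-contained sketch, where `auroc` is a hypothetical helper (the Mann-Whitney form, not OpenOOD's evaluator):

```python
def auroc(id_scores, ood_scores):
    """P(ID sample scores above an OOD sample), ties counting half --
    the Mann-Whitney form of AUROC, with ID as the positive class."""
    wins = sum(
        1.0 if s_id > s_ood else 0.5 if s_id == s_ood else 0.0
        for s_id in id_scores
        for s_ood in ood_scores
    )
    return wins / (len(id_scores) * len(ood_scores))

# Toy deviations: OOD samples deviate more, as the Gram method intends.
id_dev = [0.2, 0.5, 0.9]
ood_dev = [1.5, 3.0, 8.0]

a = auroc(id_dev, ood_dev)               # deviations used as conf directly
b = auroc([-d for d in id_dev],          # deviations negated
          [-d for d in ood_dev])

# Negating every score reverses all pairwise rankings, so b == 1 - a:
# an AUROC below 50% with one sign is above 50% with the other.
```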

@chandramouli-sastry

Hi, I worked on the Gram matrix method, and I seem to have fixed the implementation here.

The results on CIFAR10 obtained with the current gram-matrix implementation are as follows:

                FPR@95         AUROC       AUPR_IN       AUPR_OUT           ACC
cifar100    91.68 ± 2.24  58.33 ± 4.49  56.74 ± 3.87   59.24 ± 4.62  95.06 ± 0.30
tin         90.06 ± 1.59  58.98 ± 5.19  61.65 ± 3.75   55.89 ± 5.56  95.06 ± 0.30
nearood     90.87 ± 1.91  58.66 ± 4.83  59.19 ± 3.79   57.57 ± 5.09  95.06 ± 0.30
mnist       70.30 ± 8.96  72.64 ± 2.34  36.92 ± 8.23   93.36 ± 1.21  95.06 ± 0.30
svhn       33.91 ± 17.35  91.52 ± 4.45  82.40 ± 8.85   96.62 ± 1.81  95.06 ± 0.30
texture     94.64 ± 2.71  62.34 ± 8.27  67.93 ± 5.60  55.93 ± 10.76  95.06 ± 0.30
places365   90.49 ± 1.93  60.44 ± 3.41  26.94 ± 2.62   85.64 ± 1.31  95.06 ± 0.30
farood      72.34 ± 6.73  71.74 ± 3.20  53.55 ± 4.74   82.89 ± 3.14  95.06 ± 0.30

With the corrected implementation, I was able to get:

                 FPR@95         AUROC       AUPR_IN      AUPR_OUT           ACC
cifar100   61.61 ± 0.82  84.61 ± 0.20  84.21 ± 0.20  83.75 ± 0.32  95.06 ± 0.30
tin        51.99 ± 1.16  87.16 ± 0.52  88.46 ± 0.41  84.34 ± 0.83  95.06 ± 0.30
nearood    56.80 ± 0.62  85.88 ± 0.35  86.33 ± 0.28  84.04 ± 0.56  95.06 ± 0.30
mnist       7.31 ± 1.02  97.57 ± 0.49  94.37 ± 0.85  99.48 ± 0.14  95.06 ± 0.30
svhn        6.67 ± 0.29  98.64 ± 0.02  96.73 ± 0.06  99.48 ± 0.04  95.06 ± 0.30
texture    14.86 ± 0.71  96.95 ± 0.11  97.99 ± 0.12  95.53 ± 0.09  95.06 ± 0.30
places365  42.81 ± 2.19  89.56 ± 0.80  73.32 ± 1.45  96.53 ± 0.33  95.06 ± 0.30
farood     17.91 ± 0.70  95.68 ± 0.25  90.60 ± 0.35  97.75 ± 0.09  95.06 ± 0.30

When I used the same checkpoints with the code referred to by @SauceCat, I got marginally higher results for SVHN, but I did not test the other datasets. The new code is not fully polished but seems to be working as expected. I also did not run experiments with InD datasets other than CIFAR-10.

Thank you for the OpenOOD benchmark and for considering the Gram matrix method for inclusion in it!

@zjysteven
Collaborator

@chandramouli-sastry Thanks for sharing the results; glad to see the much-improved numbers with the updated implementation. Would you mind opening a pull request for this? Meanwhile, we will update the Gram matrix results in both the paper and the leaderboard.

@chandramouli-sastry

Thank you! I just created a pull request for your review.
