
Size of Prediction Sets using APS Different Than Reported in RAPS Paper #8

Closed · kevinkasa opened this issue Mar 10, 2023 · 5 comments
Labels: question (Further information is requested)

@kevinkasa

Hello,

Thank you so much for providing the conformal prediction tutorial & corresponding notebooks, they are super helpful!

I had a question regarding the size of the prediction sets returned by the APS method. In the implementation provided in the notebooks, the prediction sets are far larger than those reported in your paper that introduced RAPS. The notebook implementation returns sets that average more than 200 labels, whereas the paper reports an average set size of 10.4 on ResNet-152.

I have not done an extensive evaluation of RAPS, but it seems the notebook implementation also returns slightly larger sets there (a set size of ~3).

I was wondering if you have any ideas as to what might be causing this discrepancy, and what the best way to replicate the results in the paper might be.

Also, I wasn't sure which repo this issue should be opened in, so apologies if it doesn't fit here. Thanks in advance!

@aangelopoulos
Owner

It's probably due to the lack of randomization!
None of the methods herein are the randomized versions of their respective algorithms... and APS is extremely bad without randomization.
If you randomize, you should recover roughly the results in the paper. Of course, that paper also has its own repo, but it's less friendly than this one.
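
For anyone who hits this, here is a minimal numpy sketch of what calibration-time randomization looks like for APS. It is an illustration under assumed inputs (a softmax array `cal_smx` and integer labels `cal_labels`), not the exact code from the notebooks or the RAPS repo:

```python
import numpy as np

def aps_calibration_scores(cal_smx, cal_labels, rand=True, seed=0):
    """APS conformal scores on a calibration set (illustrative sketch).

    cal_smx:    (n, K) array of softmax probabilities
    cal_labels: (n,)   array of true class indices
    """
    rng = np.random.default_rng(seed)
    n = len(cal_labels)
    # Sort classes by predicted probability (descending) and accumulate mass.
    order = np.argsort(-cal_smx, axis=1)
    sorted_p = np.take_along_axis(cal_smx, order, axis=1)
    cum_p = np.cumsum(sorted_p, axis=1)
    # Rank of the true label within each row's sorted order.
    ranks = np.where(order == cal_labels[:, None])[1]
    scores = cum_p[np.arange(n), ranks]
    if rand:
        # Randomized score: subtract a uniform fraction of the true class's
        # own mass, interpolating between the cumulative mass just below the
        # true class and the mass including it.
        scores -= rng.uniform(size=n) * sorted_p[np.arange(n), ranks]
    return scores
```

Without that subtraction, every calibration score carries the full probability mass of the true class, which pushes the conformal quantile up and inflates every downstream prediction set.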

@aangelopoulos self-assigned this Mar 11, 2023
@aangelopoulos added the question label Mar 11, 2023
@aangelopoulos
Owner

Hey @kevinkasa, have you had a chance to follow up here? Just wondering if this answers your question.

@kevinkasa
Author

Hey @aangelopoulos, thanks for the quick response! I was just slightly confused, since both your paper and the APS paper seemed to suggest that randomization should affect the sets by at most one element, so it was surprising that APS led to considerably larger sets without it. I suppose the algorithm is just super sensitive without it, then?

Was planning on trying to add randomization to the notebook implementations but haven't had a chance yet. I am trying out the other RAPS repository in the meantime as well. Thanks!
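
For reference, test-time randomization only touches the boundary class, which is why it can change a set by at most one label. A rough sketch in the same assumed notation as above (not the repo's implementation):

```python
def aps_prediction_set(smx_row, qhat, rand=True, rng=None):
    """Form one APS prediction set from a single softmax vector (sketch).

    smx_row: (K,) softmax probabilities for one test point
    qhat:    conformal quantile computed on the calibration scores
    """
    order = np.argsort(-smx_row)
    p = smx_row[order]
    cum_p = np.cumsum(p)
    # Smallest k with cum_p[k] >= qhat (clipped for floating-point edge cases).
    k = min(int(np.searchsorted(cum_p, qhat)), len(p) - 1)
    if rand and rng is not None and k > 0:
        # Drop the boundary class with probability proportional to how far
        # the cumulative mass overshoots qhat (k > 0 keeps the set nonempty).
        if rng.uniform() * p[k] < cum_p[k] - qhat:
            k -= 1
    return order[: k + 1]
```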

@aangelopoulos
Owner

Good question.

Randomization at test time only changes the set by at most one element.
Randomization during calibration has a much larger effect.
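
Concretely, with the sketch above, the two calibration thresholds can be compared directly (assuming `cal_smx` and `cal_labels` as before, and the standard split-conformal quantile level):

```python
alpha = 0.1  # target miscoverage rate
n = len(cal_labels)
level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample-corrected level

qhat_rand = np.quantile(
    aps_calibration_scores(cal_smx, cal_labels, rand=True), level, method="higher")
qhat_det = np.quantile(
    aps_calibration_scores(cal_smx, cal_labels, rand=False), level, method="higher")
# qhat_det >= qhat_rand: deterministic scores include the full mass of the
# true class, so the threshold, and with it every test-time set, is inflated.
```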

@kevinkasa
Author

I see, thank you for the clarification!
