Yeah, we're aware of the issue. I have an ad hoc idea for how to fix this, and I'm currently testing it out. In short, instead of maximizing the probability of the "Sure" token, you can try to minimize the probability of tokens that are not "Sure", or of other refusal tokens. The preliminary results we have are pretty promising, but I want to test it on more models/APIs.
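For readers following along, here is a minimal sketch of how such a score could be computed from the top-k logprobs an API returns. This is an illustration only and not necessarily the implementation being tested: the function names `non_target_mass` and `pick_candidate`, and the dictionary format (token string mapped to logprob), are assumptions. The reasoning is that the probability of "Sure" can be at most one minus the total probability of the visible tokens that are not "Sure", so shrinking that visible mass raises the ceiling on the target probability even when the target itself is outside the top-k.

```python
import math
from typing import Dict


def non_target_mass(top_logprobs: Dict[str, float], target: str = "Sure") -> float:
    """Total probability of the visible (top-k) tokens other than `target`.

    `top_logprobs` maps token -> log-probability (hypothetical format).
    Since p(target) <= 1 - non_target_mass, a smaller value is better.
    """
    return sum(math.exp(lp) for tok, lp in top_logprobs.items() if tok != target)


def pick_candidate(candidates: Dict[str, Dict[str, float]], target: str = "Sure") -> str:
    """Return the name of the candidate prompt with the smallest non-target mass."""
    return min(candidates, key=lambda name: non_target_mass(candidates[name], target))
```

In a search loop, `pick_candidate` would be called on the top-k logprobs of the first generated token for each candidate prompt, keeping the one that leaves the least probability on non-target tokens.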
Hi @chawins,
Thank you for this interesting work! I was wondering how this attack would work now that `logit_bias` no longer affects `logprobs` [1][2], as we can no longer use the trick to obtain the logprobs of the target tokens (if they don't appear in the top-5). Would love to know your thoughts on this change; thanks!
Apoorva
[1] Logit_bias does not work now - OpenAI Community
[2] x.com - Brian Huang
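For context, a sketch of the kind of trick that depended on the old behavior, assuming it is the commonly described one: add a bias `b` to the target token's logit so that it enters the top-5, read its biased logprob, then invert the softmax shift. If `p` is the original probability and `q` the biased one, then `q = p*e^b / (1 - p + p*e^b)`, which gives `p = q / (q + e^b * (1 - q))`. The function name `recover_logprob` is hypothetical, and the formula assumes the bias is applied to that single token only.

```python
import math


def recover_logprob(biased_logprob: float, bias: float) -> float:
    """Recover the unbiased log-probability of a token from its biased one.

    Assumes `bias` was added to this token's logit only, and that the
    returned `biased_logprob` reflects that bias (the old API behavior).
    """
    q = math.exp(biased_logprob)
    one_minus_q = -math.expm1(biased_logprob)  # 1 - q, computed stably
    return biased_logprob - math.log(q + math.exp(bias) * one_minus_q)
```

Once `logprobs` are reported without the bias applied, the biased value `q` is no longer observable, so this inversion has nothing to work with.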