`logit_bias` no longer affects `logprobs` #2

apoorvaverma31 · 2024-05-17T13:30:42Z

Thank you for this interesting work! I was wondering how this attack would work now that `logit_bias` no longer affects `logprobs` [1][2], as we can no longer use the trick to obtain the logprobs of the target tokens (if they don't appear in the top-5). Would love to know your thoughts on this change; thanks!

Apoorva

[1] Logit_bias does not work now - OpenAI Community
[2] x.com - Brian Huang
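
For readers unfamiliar with the trick mentioned above, here is a minimal sketch of the arithmetic it presumably relied on. It assumes a standard softmax, a bias applied to a single token, and logprobs reported after the bias (the behavior the linked posts say has changed). The function name `unbiased_logprob` is hypothetical and not taken from the repository.

```python
import math


def unbiased_logprob(biased_logprob: float, bias: float) -> float:
    """Invert a logit bias that was applied to a single token.

    Assumption: a token with original probability p whose logit is raised
    by `bias` gets biased probability q = p*e^b / (1 - p + p*e^b).
    Solving for p gives p = q / (q + e^b * (1 - q)).
    """
    q = math.exp(biased_logprob)
    p = q / (q + math.exp(bias) * (1.0 - q))
    return math.log(p)
```

With a large enough bias the target token is pushed into the top-5, its biased logprob is read off, and the formula above recovers the original value. If the reported logprobs ignore the bias, the token never becomes visible this way and the inversion has nothing to work with.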

chawins · 2024-05-21T06:19:54Z

Yeah, we're aware of the issue. I have an ad-hoc idea for how to fix this, and I'm currently testing it out. In short, instead of maximizing the probability of the "Sure" token, you can try to minimize the probability of tokens that are not "Sure" or other refusal tokens. The preliminary results we have are pretty promising, but I want to test it out on more models/APIs.
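
One possible reading of this idea, as a rough sketch only: when the target token is not visible in the returned top-k, score a candidate prompt by the probability mass of the visible tokens that are not targets, and try to drive that mass down. The function name `non_target_mass`, the dictionary input format, and the choice of target set are all assumptions made for illustration; this is not the implementation being tested above.

```python
import math


def non_target_mass(top_logprobs: dict[str, float], target_tokens: set[str]) -> float:
    """Sum the probability of the visible top-k tokens that are not targets.

    Assumption: `top_logprobs` maps each returned top-k token to its natural
    log probability. Lower is better for the attacker in this sketch.
    """
    return sum(
        math.exp(logprob)
        for token, logprob in top_logprobs.items()
        if token not in target_tokens
    )
```

Because only the top-k tokens are observed, this quantity is a lower bound on the total non-target probability, so it is a proxy objective rather than an exact one.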
