logit_bias no longer affects logprobs #2

Open
apoorvaverma31 opened this issue May 17, 2024 · 1 comment

Comments

@apoorvaverma31

Hi @chawins,

Thank you for this interesting work! I was wondering how this attack would work now that logit_bias no longer affects logprobs [1][2], since we can no longer use the logit_bias trick to obtain the logprobs of the target tokens (if they don't appear in the top-5). Would love to know your thoughts on this change; thanks!

Apoorva

[1] "Logit_bias does not work now", OpenAI Community
[2] Brian Huang, x.com
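
For readers unfamiliar with the trick mentioned above, here is a minimal sketch of how it could work, assuming an API that reports logprobs after the bias has been applied (which, per [1][2], is no longer the case). The function name `recover_unbiased_logprob` is hypothetical and not taken from this repository; the formula is just the softmax identity for a single shifted logit.

```python
import math


def recover_unbiased_logprob(biased_logprob: float, bias: float) -> float:
    """Invert the effect of adding `bias` to one token's logit.

    Assumes `biased_logprob` was reported AFTER the bias was applied.
    If p is the original probability and p' the biased one, softmax gives
        p' = p * e^b / (1 + p * (e^b - 1))
    and solving for p yields
        p  = p' / (e^b * (1 - p') + p').
    """
    p_biased = math.exp(biased_logprob)
    p = p_biased / (math.exp(bias) * (1.0 - p_biased) + p_biased)
    return math.log(p)


if __name__ == "__main__":
    # Hypothetical numbers: a target token at p = 0.001 would not be in the
    # top-5, but a bias of +5 lifts it high enough to be reported.
    p_true, bias = 0.001, 5.0
    p_seen = p_true * math.exp(bias) / (1.0 + p_true * (math.exp(bias) - 1.0))
    print(math.exp(recover_unbiased_logprob(math.log(p_seen), bias)))
```

Once the reported logprobs ignore the bias, the biased token is no longer pushed into the top-5 list, so there is nothing to invert.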

@chawins
Owner

chawins commented May 21, 2024

Yeah, we're aware of the issue. I have an ad-hoc idea for how to fix this, and I'm currently testing it out. In short, instead of maximizing the probability of the "Sure" token, you can try to minimize the probability of the tokens that are not "Sure" (e.g., refusal tokens). The preliminary results we have are pretty promising, but I want to test it out on more models/APIs.
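
A rough sketch of what such an objective could look like, assuming only the top-k logprobs of the first generated token are visible. This is an illustration of the idea described above, not the implementation being tested; `non_target_mass` and `target_logprob_upper_bound` are hypothetical names.

```python
import math
from typing import Mapping


def non_target_mass(top_logprobs: Mapping[str, float], target: str = "Sure") -> float:
    """Total probability of the visible top-k tokens other than `target`."""
    return sum(math.exp(lp) for tok, lp in top_logprobs.items() if tok != target)


def target_logprob_upper_bound(top_logprobs: Mapping[str, float], target: str = "Sure") -> float:
    """Upper bound on log p(target) implied by the visible tokens.

    Probabilities sum to 1, so p(target) <= 1 - (mass of the other visible
    tokens). Minimizing that mass raises the bound even when the target
    token itself never shows up in the top-k.
    """
    remaining = max(1.0 - non_target_mass(top_logprobs, target), 1e-12)
    return math.log(remaining)


if __name__ == "__main__":
    # Hypothetical top-3 logprobs where "Sure" is not visible.
    top = {"I": math.log(0.6), "Sorry": math.log(0.3), "As": math.log(0.05)}
    print(non_target_mass(top))             # quantity to minimize
    print(target_logprob_upper_bound(top))  # log of the remaining mass
```

The bound is loose, since the remaining mass is shared with every token outside the top-k, so at best this is a proxy for the original objective.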
