Why is steering negative prompts to positive harder than steering positive prompts to negative? #2

Closed
littlehacker26 opened this issue Mar 31, 2022 · 1 comment

Comments

@littlehacker26

Hi, this work is excellent and has benefited me a lot. However, we found a strange phenomenon. The conversion rate from positive prompts to negative reported in the paper was around 36%, whereas it was around 65% in the negative-to-positive setting. This is very counterintuitive, because in general the two should be symmetrical. How do you explain this phenomenon?

@alisawuffles
Owner

Hi there, thank you for the thoughtful observation! We commented on this in the paper.

> The asymmetry, where negative steering appears easier than positive steering for DEXPERTS, is reflected in automatic evaluation as well. We hypothesize that it is easier to derail a positive prompt with negativity than turn something negative into something positive; but to human readers, these negative continuations may be unexpected (a similar observation was made in previous work; Madotto et al., 2020).

In general, I don't think it's obvious that the task should be symmetrical. Imagine that the prompt is already about something negative like death or hatred — it may be harder to change the overall sentiment to be positive! On the other hand, if the prompt is about positive things like life or love, we can think of ways to make it cynical. To investigate this further, you could look at the positive & negative prompts (which are included in the repo) and consider whether as a human, it is harder to do one kind of steering than another. If you choose to look into this, I'm curious what you find!
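If it helps with that comparison, here is a minimal sketch of how a per-direction conversion rate could be computed once each continuation has a sentiment label from some classifier. The function name `conversion_rate` and the label lists are hypothetical placeholders for illustration, not code from the repo or results from the paper.

```python
from typing import Sequence


def conversion_rate(labels: Sequence[str], target: str) -> float:
    """Return the fraction of continuations whose label equals the target sentiment."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(label == target for label in labels) / len(labels)


# Hypothetical classifier outputs, for illustration only (not data from the paper).
# Negative prompts steered toward positive sentiment:
neg_to_pos = ["positive", "negative", "negative", "positive"]
# Positive prompts steered toward negative sentiment:
pos_to_neg = ["negative", "negative", "positive", "negative"]

print(conversion_rate(neg_to_pos, "positive"))  # 0.5
print(conversion_rate(pos_to_neg, "negative"))  # 0.75
```

Comparing the two rates on the same set of prompts, alongside a human read of those prompts, would show whether the gap comes from the prompts themselves or from the steering method.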
