Why is steering negative prompts to positive harder than steering positive prompts to negative? #2

littlehacker26 · 2022-03-31T05:51:46Z

Hi, this work is excellent and has benefited me a lot. However, we found a strange phenomenon. The conversion rate from positive prompts to negative reported in the paper was around 36%, whereas it was around 65% in the negative-to-positive setting. This is very counterintuitive, because in general the two should be symmetrical. How do you explain this phenomenon?

alisawuffles · 2022-04-01T18:26:36Z

Hi there, thank you for the thoughtful observation! We commented on this in the paper.

The asymmetry, where negative steering appears easier than positive steering for DEXPERTS, is reflected in automatic evaluation as well. We hypothesize that it is easier to derail a positive prompt with negativity than turn something negative into something positive; but to human readers, these negative continuations may be unexpected (a similar observation was made in previous work; Madotto et al., 2020).

In general, I don't think it's obvious that the task should be symmetrical. Imagine that the prompt is already about something negative like death or hatred — it may be harder to change the overall sentiment to be positive! On the other hand, if the prompt is about positive things like life or love, we can think of ways to make it cynical. To investigate this further, you could look at the positive & negative prompts (which are included in the repo) and consider whether as a human, it is harder to do one kind of steering than another. If you choose to look into this, I'm curious what you find!
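
For anyone who wants to make that comparison concrete, here is a minimal sketch of how a per-direction conversion rate could be computed. It assumes you already have a sentiment label for each generated continuation (from a classifier or from human annotation). The function name `conversion_rates`, the record layout, and the toy labels are all hypothetical and are not taken from the repo or the paper.

```python
from collections import defaultdict


def conversion_rates(records):
    """Fraction of continuations that match the target sentiment, per direction.

    records: iterable of (prompt_sentiment, target_sentiment, predicted_sentiment)
    tuples. This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for prompt_sent, target_sent, predicted in records:
        direction = f"{prompt_sent}->{target_sent}"
        totals[direction] += 1
        # Success is measured against the steering target, not a fixed label.
        if predicted == target_sent:
            hits[direction] += 1
    return {d: hits[d] / totals[d] for d in totals}


# Toy, made-up labels purely for illustration (not results from the paper).
toy = [
    ("negative", "positive", "positive"),
    ("negative", "positive", "negative"),
    ("positive", "negative", "negative"),
    ("positive", "negative", "negative"),
]
rates = conversion_rates(toy)
```

Note that success here is defined relative to the steering target, so before comparing the two directions it is worth checking whether a reported number is the share of positive outputs or the share of outputs that reached the target sentiment.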

alisawuffles closed this as completed Apr 28, 2022
