Hi, this is excellent work and it has helped me a lot. However, we found a strange phenomenon. The conversion rate reported in the paper for steering positive prompts to negative was around 36%, whereas it was around 65% for negative to positive. This seems very counterintuitive, because in general the two directions should be symmetrical. How do you explain this phenomenon?
Hi there, thank you for the thoughtful observation! We commented on this in the paper.
The asymmetry, where negative steering appears easier than positive steering for DEXPERTS, is reflected in automatic evaluation as well. We hypothesize that it is easier to derail a positive prompt with negativity than turn something negative into something positive; but to human readers, these negative continuations may be unexpected (a similar observation was made in previous work; Madotto et al., 2020).
In general, I don't think it's obvious that the task should be symmetrical. Imagine that the prompt is already about something negative like death or hatred — it may be harder to change the overall sentiment to be positive! On the other hand, if the prompt is about positive things like life or love, we can think of ways to make it cynical. To investigate this further, you could look at the positive & negative prompts (which are included in the repo) and consider whether as a human, it is harder to do one kind of steering than another. If you choose to look into this, I'm curious what you find!
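If it helps with that comparison, here is a minimal sketch of how the per-direction conversion rate could be tallied once each continuation has a sentiment label. This is not the evaluation code from the repo: the function name `conversion_rate` and the label lists are hypothetical, and the labels are made up purely for illustration. It assumes you have already run some sentiment classifier over the generations.

```python
from typing import Iterable


def conversion_rate(labels: Iterable[str], target: str) -> float:
    """Fraction of continuations whose predicted sentiment equals the target."""
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    return sum(label == target for label in labels) / len(labels)


# Hypothetical classifier outputs, invented for illustration only.
# Continuations of negative prompts under positive steering:
neg_prompt_labels = ["positive", "negative", "negative", "positive", "negative"]
# Continuations of positive prompts under negative steering:
pos_prompt_labels = ["negative", "negative", "positive", "negative", "negative"]

to_positive = conversion_rate(neg_prompt_labels, "positive")
to_negative = conversion_rate(pos_prompt_labels, "negative")

print(f"negative -> positive: {to_positive:.0%}")
print(f"positive -> negative: {to_negative:.0%}")
```

Computing the two rates with the same function makes it easy to check which direction each reported number refers to before comparing them.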