Add UltraFeedback
#464
@alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating, which is extremely costly. As discussed, we don't have to provide faithful reproductions if we can improve existing approaches. I would choose a better name than text quality, because it's really an overall assessment of the response.
Fair, consider it done! 👍🏻 Thanks for the feedback!
Also is |
Maybe something shorter/easier to type? Maybe overall-rating? Or something simple from here: https://www.thesaurus.com/browse/assessment
Co-authored-by: Daniel Vila <dvsrepo@users.noreply.github.com>
Description
This PR adds the `UltraFeedback` task, improving on the former implementation in the `main` branch of `distilabel`, which was not a faithful reproduction of the original UltraFeedback paper.

This PR therefore adds a more faithful implementation, fixing the `system_prompt` used and improving output parsing/formatting. It still keeps the former `text-quality` task, now renamed to `quality-assessment`. That task is not defined in the original paper, but it is kept because computing individual scores can get far more expensive, and it has proven to work fine.

Closes #437