Add UltraFeedback #464

Merged: alvarobartt merged 23 commits into core-refactor from ultrafeedback on Mar 25, 2024

alvarobartt
Member

@alvarobartt alvarobartt commented Mar 22, 2024

Description

This PR adds the UltraFeedback task and improves on the former implementation in the main branch of distilabel, which was not a faithful reproduction of the original UltraFeedback paper.

The new implementation is more faithful: it fixes the system_prompt used and improves output parsing/formatting. It still keeps the former text-quality task, now renamed to quality-assessment. That task is not defined in the original paper, but it is kept because computing individual scores can get far more expensive, and it has proven to work fine.

Closes #437

@alvarobartt alvarobartt added this to the 1.0.0 milestone Mar 22, 2024
@alvarobartt alvarobartt self-assigned this Mar 22, 2024
@alvarobartt alvarobartt linked an issue Mar 22, 2024 that may be closed by this pull request
@dvsrepo
Member

dvsrepo commented Mar 22, 2024

@alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.
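To make the cost argument concrete, here is a rough back-of-the-envelope sketch. It assumes one judge call rates all responses to an instruction for a single aspect, so rating N aspects takes N calls per instruction. The helper `judge_calls` is hypothetical and not part of distilabel; the aspect names are the four from the UltraFeedback paper.

```python
# Hypothetical helper for illustration only; this is not distilabel API.
# The four aspects below are the ones rated in the UltraFeedback paper.
ULTRAFEEDBACK_ASPECTS = (
    "instruction-following",
    "honesty",
    "truthfulness",
    "helpfulness",
)


def judge_calls(num_instructions: int, num_aspects: int = 1) -> int:
    """Count judge LLM calls.

    Assumption: one call rates every response to one instruction
    for exactly one aspect.
    """
    if num_instructions < 0 or num_aspects < 1:
        raise ValueError("need num_instructions >= 0 and num_aspects >= 1")
    return num_instructions * num_aspects


# A single overall rating needs one judge call per instruction.
overall = judge_calls(10_000)
# Rating every aspect separately needs one call per aspect per instruction.
per_aspect = judge_calls(10_000, len(ULTRAFEEDBACK_ASPECTS))
print(overall, per_aspect)  # 10000 40000
```

Under these assumptions, per-aspect judging multiplies the number of judge calls by the number of aspects, which is the cost the single overall task avoids.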

As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.

I would choose a better name than text quality, because it's really an overall assessment of the response.

@alvarobartt
Member Author

alvarobartt commented Mar 22, 2024

> @alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.
>
> As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.

Fair, consider it done! 👍🏻 Thanks for the feedback!

> I would choose a better name than text quality, because it's really an overall assessment of the response.

Also, is overall-assessment fine for the name of that aspect?

@dvsrepo
Member

dvsrepo commented Mar 22, 2024

>> @alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.
>>
>> As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.
>
> Fair, consider it done! 👍🏻 Thanks for the feedback!

Cool, thanks!

>> I would choose a better name than text quality, because it's really an overall assessment of the response.
>
> Also, is overall-assessment fine for the name of that aspect?

Maybe something shorter/easier to type? Maybe overall-rating? Or something simple from here: https://www.thesaurus.com/browse/assessment

@alvarobartt alvarobartt changed the title Add Ultrafeedback Add UltraFeedback Mar 22, 2024
@alvarobartt alvarobartt marked this pull request as ready for review March 22, 2024 15:42
@alvarobartt alvarobartt mentioned this pull request Mar 22, 2024
@alvarobartt alvarobartt merged commit c04d865 into core-refactor Mar 25, 2024
4 checks passed
@alvarobartt alvarobartt deleted the ultrafeedback branch March 25, 2024 07:24
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Adapt UltraFeedbackTask to new Task interface
3 participants