Add `UltraFeedback` #464

alvarobartt · 2024-03-22T10:41:15Z

Description

This PR adds the UltraFeedback task and improves on the former implementation in the main branch of distilabel, which was not a faithful reproduction of the original UltraFeedback paper.

The new implementation is more faithful: it fixes the system_prompt used and improves output parsing/formatting. It still keeps the former text-quality task, now renamed to quality-assessment. That task is not defined in the original paper, but it is kept because computing individual scores can get far more expensive, and it has proven to work fine.
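To illustrate the output parsing mentioned above, here is a minimal sketch rather than the code in this PR. The `Rating:` / `Rationale:` layout, the blank-line separator between responses, and the names `parse_judge_output` and `_SECTION` are all assumptions made for the example. Unparseable or `N/A` ratings become `None`, presumably the safer choice for pyarrow since a column mixing integers and the string `N/A` has no single type.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed layout of one rated response inside the judge output (hypothetical):
#   Rating: 4
#   Rationale: Clear and accurate.
_SECTION = re.compile(r"Rating:\s*(\S+)\s*Rationale:\s*(.*)", re.DOTALL)

ParsedRating = Dict[str, Union[int, str, None]]


def parse_judge_output(output: Optional[str], num_texts: int) -> List[ParsedRating]:
    """Return one {"rating", "rationale"} dict per rated response.

    Missing, malformed, or "N/A" values are stored as None so that the
    ratings column only ever holds integers or nulls.
    """
    # Assumption: the sections for each response are separated by a blank line.
    sections = output.strip().split("\n\n") if output else []
    results: List[ParsedRating] = []
    for index in range(num_texts):
        rating: Optional[int] = None
        rationale: Optional[str] = None
        if index < len(sections):
            match = _SECTION.search(sections[index])
            if match:
                raw_rating, raw_rationale = match.groups()
                rating = int(raw_rating) if raw_rating.isdigit() else None
                rationale = raw_rationale.strip() or None
        results.append({"rating": rating, "rationale": rationale})
    return results


sample = (
    "Rating: 4\nRationale: Clear and accurate.\n\n"
    "Rating: N/A\nRationale: Cannot be judged."
)
parsed = parse_judge_output(sample, num_texts=2)
```

Padding the result to `num_texts` entries keeps the ratings aligned with the generations even when the judge returns fewer sections than expected.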

Closes #437

dvsrepo · 2024-03-22T11:32:05Z

@alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.

As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.

I would choose a better name than text quality, because it's really an overall assessment of the response.
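As a rough illustration of the cost argument above, the sketch below counts judge calls under the assumption that each aspect needs one LLM call per instruction. The helper `judge_calls` and the figure of 1000 instructions are hypothetical; the four aspect names are the ones defined in the UltraFeedback paper.

```python
# The four aspects from the UltraFeedback paper.
ASPECTS = ("instruction-following", "honesty", "truthfulness", "helpfulness")


def judge_calls(num_instructions: int, num_aspects: int) -> int:
    """Number of judge calls, assuming one call per aspect per instruction."""
    return num_instructions * num_aspects


# Hypothetical dataset size, chosen only for the comparison.
per_aspect_calls = judge_calls(1000, len(ASPECTS))  # one judge per aspect
overall_calls = judge_calls(1000, 1)  # a single overall rating
```

Under these assumptions, rating every aspect separately takes four times as many calls as a single overall rating.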

alvarobartt · 2024-03-22T11:33:12Z

> @alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.
>
> As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.

Fair, consider it done! 👍🏻 Thanks for the feedback!

> I would choose a better name than text quality, because it's really an overall assessment of the response.

Also, is overall-assessment fine for the name of that aspect?

dvsrepo · 2024-03-22T11:59:30Z

> > @alvarobartt please keep the overall quality task. It's there for a reason and it's the one that has been working best. The others mean that we need to run N judges to get the overall rating which is extremely costly.
> > As discussed, we don't have to provide faithful reproductions if we can improve existing approaches.
>
> Fair, consider it done! 👍🏻 Thanks for the feedback!

Cool, thanks!

> > I would choose a better name than text quality, because it's really an overall assessment of the response.
>
> Also, is overall-assessment fine for the name of that aspect?

Maybe something shorter/easier to type? Maybe overall-rating? Or something simple from here: https://www.thesaurus.com/browse/assessment

alvarobartt added 5 commits March 21, 2024 15:02

Add UltraFeedback (WIP)

a03bd3c

Increase InferenceEndpointsLLM timeout when paused or scaledToZero

62f5f2a

Fix QualityScorer docstring

9f758c8

Add templates/ultrafeedback/instruction-following.jinja2

a4a363e

Update UltraFeedback (WIP)

5d4452d

alvarobartt added fix integrations labels Mar 22, 2024

alvarobartt added this to the 1.0.0 milestone Mar 22, 2024

alvarobartt requested review from gabrielmbmb and plaguss March 22, 2024 10:41

alvarobartt self-assigned this Mar 22, 2024

alvarobartt linked an issue Mar 22, 2024 that may be closed by this pull request

Adapt UltraFeedbackTask to new Task interface #437

Closed

alvarobartt added 4 commits March 22, 2024 12:22

Add missing UltraFeedback Jinja2 templates

b673b86

Add missing docstrings to UltraFeedback

5f7d357

Remove raw output column from UltraFeedback

fef54a8

Rename task to aspect in UltraFeedback

c96b2a8

Align output parsing with official UltraFeedback implementation

35acec4

alvarobartt and others added 8 commits March 22, 2024 13:02

Add overall-rating aspect to UltraFeedback

f639011

Co-authored-by: Daniel Vila <dvsrepo@users.noreply.github.com>

Add logging message when step is loaded

0c40760

Fix UltraFeedback output parsing

5d97b5e

Merge branch 'core-refactor' into 'ultrafeedback'

190474a

Add client.wait if client.status="initializing"

1938df3

Remove duplicated logging and add missing emojis

7046706

Set None instead of N/A to avoid pyarrow errors

8bda8eb

Fix None handling in UltraFeedback output

577b89f

alvarobartt changed the title ~~Add Ultrafeedback~~ Add UltraFeedback Mar 22, 2024

Fix TestEvolQuality tests placement

500c4fc

alvarobartt added 2 commits March 22, 2024 16:28

Rename ComplexityScore->ComplexityScorer

f88987b

Add TestUltraFeedback

2dea306

alvarobartt marked this pull request as ready for review March 22, 2024 15:42

alvarobartt mentioned this pull request Mar 22, 2024

Add UltraCM task #465

Closed

Rename ComplexityScorer outputs to scores

18dd572

plaguss approved these changes Mar 24, 2024

alvarobartt commented Mar 25, 2024

tests/unit/steps/task/evol_quality/__init__.py (outdated, resolved)

Fix formatting in evol_quality/__init__.py

13d4377

alvarobartt merged commit c04d865 into core-refactor Mar 25, 2024
4 checks passed

alvarobartt deleted the ultrafeedback branch March 25, 2024 07:24
