Add ComplexityScorer and QualityScorer tasks from Deita #302
EvolComplexityScorer and EvolQualityScorer tasks from Deita
Related to #299; merging this one after it will yield two of the steps from the Deita framework. The …
What's not clear or convincing to me, in both the paper and this implementation, is that they mix ranking and rating. What's used at the end of the process are the ratings (scores), right? If so, I'd recommend not adding a new thing (ranking) that mixes both rating and ranking. Reading the values can be quite confusing, because I can't tell whether 1 and 2 are positions, ratings, or both. If we get the ratings, I would keep them in the ratings column (like any other preference task). What do you think?
Also, we will soon add PairRM (see llmblender), and those are real rankings: a list of positions (with no rating), so it might get confusing.
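To make the rating versus ranking distinction concrete, here is a minimal sketch in plain Python. It is not code from this PR or from the Deita paper; the helper name `ratings_to_ranking` is hypothetical and only illustrates why a list like `[3, 1, 2]` is ambiguous unless the column says which of the two it holds.

```python
from typing import List


def ratings_to_ranking(ratings: List[float]) -> List[int]:
    """Return the 1-based position of each item (1 = highest rating).

    Hypothetical helper for illustration only.
    """
    # Indices of the items, sorted from highest to lowest rating.
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    positions = [0] * len(ratings)
    for position, index in enumerate(order, start=1):
        positions[index] = position
    return positions


# Two quite different sets of ratings...
ratings_a = [2.0, 5.0, 4.0]
ratings_b = [1.0, 5.0, 4.9]

# ...collapse to the same ranking, so the ranking alone loses the scores.
ranking_a = ratings_to_ranking(ratings_a)
ranking_b = ratings_to_ranking(ratings_b)
print(ranking_a, ranking_b)
```

A ranking can always be derived from ratings, but not the other way around, which is one argument for storing only the ratings.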
That's true
I agree. I'll update it so that it works like the other preference tasks, but without the rationale. I'll also update the wording, even if that makes our terminology clearly different from the paper's, on the assumption that for us those are ratings.
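As a rough sketch of what "ratings only, no rationale" could look like, the snippet below parses a scorer's raw text into a list of ratings. Everything here is an assumption for illustration: the `[i] Score: s` output format, the `parse_ratings` name, and the `ratings` key are not taken from this PR's code.

```python
import re
from typing import List, Optional

# Assumed raw output format, one line per scored item: "[1] Score: 2"
SCORE_PATTERN = re.compile(r"\[(\d+)\]\s*Score:\s*(\d+(?:\.\d+)?)")


def parse_ratings(raw_output: str, num_items: int) -> List[Optional[float]]:
    """Extract one rating per item; items without a parsed score stay None."""
    ratings: List[Optional[float]] = [None] * num_items
    for item, score in SCORE_PATTERN.findall(raw_output):
        index = int(item) - 1  # the assumed format numbers items from 1
        if 0 <= index < num_items:
            ratings[index] = float(score)
    return ratings


raw = "[1] Score: 2\n[2] Score: 4\n[3] Score: 3"
row = {"ratings": parse_ratings(raw, num_items=3)}
print(row)
```

Keeping unparsed items as `None` rather than dropping them preserves the alignment between each rating and the item it scores.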
New version ready. Now it's defined in terms of a special …
…feat/evol-complexity-scorer
LGTM! I think it would be better for the tasks to be called just ComplexityScorerTask and QualityScorerTask, as they can also be used with non-evolved instructions or responses.
I agree 👍
…feat/evol-complexity-scorer
Title changed from "EvolComplexityScorer and EvolQualityScorer tasks from Deita" to "ComplexityScorer and QualityScorer tasks from Deita"
Description
This PR adds new tasks EvolComplexityScorer and EvolQualityScorer from the Deita paper.
Example of use (Complexity):
From an LLM:
From a Pipeline
Example of use (Quality):
From an LLM:
From a Pipeline
The dataset just has the appropriate format for the pipeline; the content and results aren't important.