Add `ComplexityScorer` and `QualityScorer` tasks from Deita #302

plaguss · 2024-01-25T16:10:34Z

Description

This PR adds new tasks EvolComplexityScorer and EvolQualityScorer from the Deita paper.

Example of use (Complexity):

from datasets import load_dataset

ds = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
subset = ds.select(range(2))

From an LLM:

import os

from distilabel.tasks.preference.evol_scorer import EvolComplexityScorerTask
from distilabel.llm import OpenAILLM
from distilabel.pipeline import Pipeline

llm = OpenAILLM(
    EvolComplexityScorerTask(),
    openai_api_key=os.getenv("OPENAI_API_KEY", None),
    temperature=0.3
)
res = llm.generate([{"generations": subset[0]["generations"]}])
# >>> print(res[0][0]["raw_output"])
# [1] Score: 1
# [2] Score: 2
# >>> print(res[0][0]["parsed_output"])
# {'ranks': [1, 2]}

From a Pipeline:

>>> pipe = Pipeline(labeller=llm)
>>> new_ds = pipe.generate(subset)
>>> new_ds.select_columns(["generations", "ranks"])[:]
{'generations': [['[\n  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n]',
   " Sure, I'd be happy to help! Here are the RDF triplets for the input sentence:\n\n[AFC Ajax (amateurs), hasGround, Sportpark De Toekomst]\n[Ajax Youth Academy, playsAt, Sportpark De Toekomst]\n\nExplanation:\n\n* AFC Ajax (amateurs) is the subject of the first triplet, and hasGround is the predicate that describes the relationship between AFC Ajax (amateurs) and Sportpark De Toekomst.\n* Ajax Youth Academy is the subject of the second triplet, and playsAt is the predicate that describes the relationship between Ajax Youth Academy and Sportpark De Toekomst.\n\nNote that there may be other possible RDF triplets that could be derived from the input sentence, but the above triplets capture the main relationships present in the sentence."],
  ['Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.',
   ' Sure! Here\'s a sentence that describes all the data you provided:\n\n"Midsummer House is a moderately priced Chinese restaurant with a customer rating of 3 out of 5, located near All Bar One, offering a variety of delicious dishes."']],
 'ranks': [[1, 2], [1, 2]]}

Example of use (Quality):

From an LLM:

from distilabel.tasks.preference.evol_scorer import EvolQualityScorerTask

llm = OpenAILLM(
    EvolQualityScorerTask(),
    openai_api_key=os.getenv("OPENAI_API_KEY", None),
    temperature=0.3
)
res2 = llm.generate([{"input": subset[0]["input"], "generations": subset[0]["generations"]}])
# >>> print(res2[0][0]["raw_output"])
# [Response 1] Score: 2
# [Response 2] Score: 3
# >>> print(res2[0][0]["parsed_output"])
# {'ranks': [2, 3]}

From a Pipeline:

>>> pipe = Pipeline(labeller=llm)
>>> new_ds = pipe.generate(subset)
>>> new_ds.select_columns(["input", "generations", "ranks"])[:]
{'input': ["You will be given a definition of a task first, then some input of the task.\nThis task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.\n\nAFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.\nOutput:",
  'Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; Midsummer House food Chinese; Midsummer House priceRange moderate; Midsummer House customer rating 3 out of 5; Midsummer House near All Bar One'],
 'generations': [['[\n  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n]',
   " Sure, I'd be happy to help! Here are the RDF triplets for the input sentence:\n\n[AFC Ajax (amateurs), hasGround, Sportpark De Toekomst]\n[Ajax Youth Academy, playsAt, Sportpark De Toekomst]\n\nExplanation:\n\n* AFC Ajax (amateurs) is the subject of the first triplet, and hasGround is the predicate that describes the relationship between AFC Ajax (amateurs) and Sportpark De Toekomst.\n* Ajax Youth Academy is the subject of the second triplet, and playsAt is the predicate that describes the relationship between Ajax Youth Academy and Sportpark De Toekomst.\n\nNote that there may be other possible RDF triplets that could be derived from the input sentence, but the above triplets capture the main relationships present in the sentence."],
  ['Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.',
   ' Sure! Here\'s a sentence that describes all the data you provided:\n\n"Midsummer House is a moderately priced Chinese restaurant with a customer rating of 3 out of 5, located near All Bar One, offering a variety of delicious dishes."']],
 'ranks': [[2, 3], [2, 3]]}

The dataset simply has the appropriate format for the pipeline; the content and results aren't important.
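The raw outputs in the examples above follow a simple line format (`[1] Score: 1` for complexity, `[Response 1] Score: 2` for quality). As a rough illustration of how such text could be turned into a list of numbers, here is a minimal sketch. The helper `parse_scores` and its regular expression are hypothetical and are not taken from distilabel; the actual parsing in this PR may differ.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (an assumption, not distilabel's code): matches lines such as
# "[1] Score: 1" and "[Response 2] Score: 3".
_SCORE_LINE = re.compile(r"^\[(?:Response\s+)?(\d+)\]\s*Score:\s*(\d+(?:\.\d+)?)$")


def parse_scores(raw_output: str) -> List[float]:
    """Return the scores ordered by the bracketed index of each line."""
    indexed: List[Tuple[int, float]] = []
    for line in raw_output.splitlines():
        match = _SCORE_LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a score line
        indexed.append((int(match.group(1)), float(match.group(2))))
    return [score for _, score in sorted(indexed)]


print(parse_scores("[1] Score: 1\n[2] Score: 2"))                    # [1.0, 2.0]
print(parse_scores("[Response 1] Score: 2\n[Response 2] Score: 3"))  # [2.0, 3.0]
```

Lines that do not match the pattern are skipped, so extra text from the model would not break the sketch, although a real implementation might prefer to flag it.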

plaguss · 2024-01-26T11:11:12Z

Related to #299; merging this after that one will yield two of the steps from the Deita framework. The to_argilla functionality "works", but it will be tested with an appropriate dataset for the tasks.

dvsrepo · 2024-01-26T14:17:40Z

What's not so clear or convincing to me, in both the paper and this implementation, is that they mix ranking and rating. What's used at the end of the process are the ratings (scores), right?

If so, I'd recommend not adding a new thing (ranking) that mixes both rating and ranking. Reading the values can be quite confusing, because I don't know whether 1 and 2 are positions, ratings, or both.

If we get the ratings, I would keep them in the ratings column (like any other preference task). What do you think?

dvsrepo · 2024-01-26T14:18:49Z

Also, we will soon add PairRM (see llmblender), and those are real rankings: a list of positions (with no rating), so it might get confusing.
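To make the distinction concrete, the sketch below converts a list of ratings into a list of positions. It assumes that a higher rating means a better response, which the thread does not state explicitly, and the helper `ratings_to_ranking` is hypothetical rather than part of distilabel or llmblender. Under that assumption, the list `[1, 2]` read as ratings corresponds to the ranking `[2, 1]`, which is why labelling ratings as "ranks" is ambiguous.

```python
from typing import Dict, List


def ratings_to_ranking(ratings: List[float]) -> List[int]:
    """Return 1-based positions (1 = best), assuming higher ratings are better.

    Tied ratings share the same position, and the following position is skipped.
    """
    position: Dict[float, int] = {}
    next_position = 1
    for value in sorted(set(ratings), reverse=True):
        position[value] = next_position
        next_position += ratings.count(value)
    return [position[rating] for rating in ratings]


print(ratings_to_ranking([1, 2]))     # [2, 1]
print(ratings_to_ranking([2, 3]))     # [2, 1]
print(ratings_to_ranking([4, 4, 1]))  # [1, 1, 3]
```

Both example outputs from the description, `[1, 2]` and `[2, 3]`, collapse to the same ranking, which shows that the ratings carry information that positions alone do not.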

plaguss · 2024-01-26T15:27:39Z

> What's not so clear or convincing to me, in both the paper and this implementation, is that they mix ranking and rating. What's used at the end of the process are the ratings (scores), right?

That's true.

> If so, I'd recommend not adding a new thing (ranking) that mixes both rating and ranking. Reading the values can be quite confusing, because I don't know whether 1 and 2 are positions, ratings, or both.
>
> If we get the ratings, I would keep them in the ratings column (like any other preference task). What do you think?

I agree. I'll update it so that it works like the other preference tasks but without the rationale, and I'll also update the wording, even if that means making a clearer distinction than the paper does, on the assumption that for us those values are ratings.

plaguss · 2024-01-26T16:38:43Z

New version ready. It's now defined in terms of a special PreferenceTaskNoRationale (the name says it all), and the code has been simplified. Also, every mention of "rank" was removed from the docs; we now talk about ratings. I'll review it on Monday. Happy weekend!

gabrielmbmb

LGTM! I think it would be better for the tasks to be called just ComplexityScorerTask and QualityScorerTask, as they can also be used with non-evolved instructions or responses.

dvsrepo · 2024-01-30T15:19:47Z

> LGTM! I think it would be better for the tasks to be called just ComplexityScorerTask and QualityScorerTask, as they can also be used with non-evolved instructions or responses.

I agree 👍

Add draft for EvolComplexityScorer task from deita

78cd268

plaguss requested review from dvsrepo and davidberenstein1957 January 25, 2024 16:10

Add draft for EvolQualityScorer task

68deec0

plaguss changed the title ~~Add EvolComplexityScorer task from deita~~ Add EvolComplexityScorer and EvolQualityScorer tasks from Deita Jan 25, 2024

plaguss added 5 commits January 25, 2024 17:51

Add proper parsing for quality case

dc01aa7

Add tests for the task functionality

a041adf

Create mixin for the common functionality of the argilla transformation

906e5cc

Add working version of evol complexity and evol quality

58bb637

Include new tasks in the docs

d73c0a1

plaguss marked this pull request as ready for review January 26, 2024 11:08

plaguss self-assigned this Jan 26, 2024

plaguss added 2 commits January 26, 2024 17:28

Refactor to work with a more standard preference task, but without rationale

bee164c

Refactor docs

1f6f8a0

Merge branch 'main' of https://github.com/argilla-io/distilabel into feat/evol-complexity-scorer

eab565e

gabrielmbmb reviewed Jan 30, 2024

plaguss added 2 commits January 30, 2024 16:36

Merge branch 'main' of https://github.com/argilla-io/distilabel into feat/evol-complexity-scorer

aa039ad

Refactor task names

f82cf2d

plaguss merged commit b157f80 into main Jan 31, 2024
4 checks passed

plaguss deleted the feat/evol-complexity-scorer branch January 31, 2024 07:44

dvsrepo changed the title ~~Add EvolComplexityScorer and EvolQualityScorer tasks from Deita~~ Add ComplexityScorer and QualityScorer tasks from Deita Jan 31, 2024

This was referenced Jan 31, 2024

[FEATURE] Add DeitaPreferenceTask #282

Closed

Add examples for the deita paper tasks #329

Merged
