
Ability to dynamically change InstanceMetric inputs + grammar metrics #736

Merged: 5 commits from dynamic_fields into main, Mar 31, 2024

Conversation

arielge (Member) commented Mar 31, 2024

No description provided.

codecov bot commented Mar 31, 2024

Codecov Report

Attention: Patch coverage is 86.11111%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 91.03%. Comparing base (b4bdbbe) to head (cf6520e).

Files                   Patch %   Lines
src/unitxt/metrics.py   81.48%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #736      +/-   ##
==========================================
+ Coverage   91.00%   91.03%   +0.02%     
==========================================
  Files          98       98              
  Lines        9733     9760      +27     
==========================================
+ Hits         8858     8885      +27     
  Misses        875      875              


yoavkatz (Member) commented Mar 31, 2024

Ariel, I'm not sure about the generic part of this change. We have an alternative mechanism for this: using MetricPipeline, which can map the chosen field to the references field.

metric = MetricPipeline(
    main_score="score",
    preprocess_steps=[
        CopyFields(
            field_to_field=[("task_data/context", "references")], use_query=True
        ),
        ListFieldValues(fields=["references"], to_field="references"),
    ],
    metric=metrics["metrics.token_overlap"],
)
add_to_catalog(metric, "metrics.token_overlap_with_context", overwrite=True)

elronbandel merged commit 9c47c1b into main on Mar 31, 2024 (7 of 8 checks passed)
arielge (Member, Author) commented Mar 31, 2024

@yoavkatz The task fields are still shared across all metrics. But there are many use cases - specifically, reference-less ones - where you want to apply an existing metric but use a different field (an input field, which is still one of the task fields). I don't think it makes sense to create a new metric for every possible field that one would want to pass.
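(For illustration, a minimal sketch of the kind of usage this change aims to enable, assuming the new InstanceMetric attribute is called reference_field; the metric class and field name are only examples:)

from unitxt.metrics import CharEditDistanceAccuracy  # example metric; any InstanceMetric would do

# Reuse the existing metric, but score the prediction against the task input
# field "original_text" instead of the task's gold "references".
metric = CharEditDistanceAccuracy()
metric.reference_field = "original_text"  # assumed name of the new, overridable attribute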

yoavkatz (Member) commented Mar 31, 2024

I see the point. Still, some points to consider:


  1. There is an implicit conversion to list, which will be hard to keep track of, and which will cause problems when passing references that are already lists (e.g. when the "references" are a list of context_ids); see the sketch below.

  2. It is only implemented in instance metrics and not in other types of metrics.

I tend to think that while we simplified this particular use, we complicated the overall API, because there is now a new way of doing things (which does not replace the old way, and is also less explicit).

  3. We will soon add type checking for references and predictions in metrics, and this could potentially cause type mismatch errors.
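(A toy illustration of the list-wrapping concern in point 1; the field name is made up:)

# If the selected task field already holds a list, implicitly wrapping it in a
# list produces one nested reference rather than multiple references.
task_data = {"context_ids": ["doc_3", "doc_7"]}

references = [task_data["context_ids"]]  # [["doc_3", "doc_7"]]: a single reference that is itself a list
expected = ["doc_3", "doc_7"]            # what multiple references would normally look like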

arielge (Member, Author) commented Mar 31, 2024

So I do not have the full picture, but regarding both 2 and 3, my feeling is that the real underlying issue is that the different types of metric classes do not really share methods, which is something that should probably be improved regardless of this specific question. Once the different types of metrics have more methods in common, things like the field modification I did here and the type checking can both happen in a more centralized place, which will solve most of the issues you mention.

I do understand what you are saying about complicating the API. But since tasks -- even similar ones -- often differ in the names of their input fields, my feeling is that the cost of having a separate metric for every possible field name will very quickly blow out of proportion, which is also a serious problem. In other words, the API may be simpler but the catalog will be more complicated.

yoavkatz (Member) commented Apr 1, 2024

@arielge - I think it's not only a matter of which field to use. We should also give clear score names.

Right now we have:

char_edit_distance
char_edit_distance_accuracy

But the second needs to be something like

char_edit_distance_from_original_text

to be differentiated.

arielge (Member, Author) commented Apr 1, 2024

I agree, this is what I was aiming for with issue #737.
Now, if this is not done via an overwrite, then you will need to add a catalog metric for char_edit_distance_from_original_text, for another task you would add char_edit_distance_from_input, and for yet another char_edit_distance_from_document, when they all do exactly the same calculation.
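(For illustration, a sketch of the catalog growth this would imply, reusing the MetricPipeline pattern from the earlier comment; the entry names and the metrics dict are hypothetical:)

for field in ["original_text", "input", "document"]:
    metric = MetricPipeline(
        main_score="score",
        preprocess_steps=[
            ListFieldValues(
                fields=[f"task_data/{field}"], to_field="references", use_query=True
            ),
        ],
        metric=metrics["metrics.char_edit_distance"],
    )
    add_to_catalog(metric, f"metrics.char_edit_distance_from_{field}", overwrite=True)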

yoavkatz (Member) commented Apr 1, 2024

I see. Today we have the explicit way of doing this:

metric = MetricPipeline(
    main_score="score",
    preprocess_steps=[
        ListFieldValues(
            fields=["task_data/original_text"], to_field="references", use_query=True
        ),
    ],
    metric=metrics["metrics.char_edit_distance"],
    postpreprocess_steps=[
        RenameFields(
            field="score/global/char_edit_distance",
            to_field="score/global/char_edit_distance_from_original_input",
            use_query=True,
        ),
    ],
)
add_to_catalog(metric, "metrics.char_edit_distance_from_original_text", overwrite=True)

I agree it would be nicer to have a shorter form, something like:

metrics.char_edit_distance[reference_field=original_text, score_name=char_edit_distance_from_original_input]

We need to make sure:

  1. We always encapsulate reference_field in a list, to get consistent behavior. If we need multiple references, we should have an explicit option like:

metrics.char_edit_distance[multiple_references_field=original_text, score_name=char_edit_distance_from_original_input]

  2. Add the functionality to the base Metric class.
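(A hypothetical example of where the proposed shorthand would appear, e.g. in a task or card definition; the bracketed overrides are the proposal above, not an existing API, and the field names are made up:)

task_metrics = [
    "metrics.char_edit_distance[reference_field=original_text, score_name=char_edit_distance_from_original_input]",
    "metrics.char_edit_distance[multiple_references_field=context_ids, score_name=char_edit_distance_from_contexts]",
]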

arielge (Member, Author) commented Apr 1, 2024

That sounds good to me.

elronbandel deleted the dynamic_fields branch on April 8, 2024