
Ability to dynamically change InstanceMetric inputs + grammar metrics #736

Merged: 5 commits from dynamic_fields into main, Mar 31, 2024

Conversation

arielge (Member) commented Mar 31, 2024

No description provided.

codecov bot commented Mar 31, 2024

Codecov Report

Attention: Patch coverage is 86.11111%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 91.03%. Comparing base (b4bdbbe) to head (cf6520e).

Files                   Patch %   Lines
src/unitxt/metrics.py   81.48%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #736      +/-   ##
==========================================
+ Coverage   91.00%   91.03%   +0.02%     
==========================================
  Files          98       98              
  Lines        9733     9760      +27     
==========================================
+ Hits         8858     8885      +27     
  Misses        875      875              


yoavkatz (Member) commented Mar 31, 2024

Ariel, I'm not sure about the generic part of this change. We have an alternative mechanism for this: using MetricPipeline, which can map the chosen field to the references field.

metric = MetricPipeline(
    main_score="score",
    preprocess_steps=[
        CopyFields(
            field_to_field=[("task_data/context", "references")], use_query=True
        ),
        ListFieldValues(fields=["references"], to_field="references"),
    ],
    metric=metrics["metrics.token_overlap"],
)
add_to_catalog(metric, "metrics.token_overlap_with_context", overwrite=True)

elronbandel merged commit 9c47c1b into main on Mar 31, 2024 (7 of 8 checks passed)
arielge (Member, Author) commented Mar 31, 2024

@yoavkatz The task fields are still shared across all metrics. But there are many use cases - specifically, reference-less ones - where you want to apply an existing metric but use a different field (an input field, which is still one of the task fields). I don't think it makes sense to create a new metric for every possible field that one would want to pass.
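(For illustration, a minimal sketch of the kind of usage this change aims to enable, assuming the new InstanceMetric attribute is called reference_field; the metric class and field name are only examples:)

from unitxt.metrics import CharEditDistanceAccuracy  # example metric; any InstanceMetric would do

# Reuse the existing metric, but score the prediction against the task input
# field "original_text" instead of the task's gold "references".
metric = CharEditDistanceAccuracy()
metric.reference_field = "original_text"  # assumed name of the new, overridable attribute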

yoavkatz (Member) commented Mar 31, 2024

I see the point. Still, some points to consider:


  1. There is an implicit conversion to list, which will be hard to keep track of, and which will cause problems when passing references that are already lists (e.g. when the "references" are a list of context_ids); see the sketch below.

  2. It is only implemented in instance metrics and not in other types of metrics.

I tend to think that while we simplified this particular use, we complicated the overall API, because there is now a new way of doing things (which does not replace the old way, and is also less explicit).

  3. We will soon add type checking for references and predictions in metrics, and this could potentially cause type mismatch errors.
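(A toy illustration of the list-wrapping concern in point 1; the field name is made up:)

# If the selected task field already holds a list, implicitly wrapping it in a
# list produces one nested reference rather than multiple references.
task_data = {"context_ids": ["doc_3", "doc_7"]}

references = [task_data["context_ids"]]  # [["doc_3", "doc_7"]]: a single reference that is itself a list
expected = ["doc_3", "doc_7"]            # what multiple references would normally look like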

arielge (Member, Author) commented Mar 31, 2024

So I do not have the full picture, but regarding both 2 and 3, my feeling is that the real underlying issue is that the different types of metric classes do not really share methods, which is something that should probably be improved regardless of this specific question. Once the different types of metrics have more methods in common, things like the field modification I did here and the type checking can both happen in a more centralized place, which will solve most of the issues you mention.

I do understand what you are saying about complicating the API. But since tasks -- even similar ones -- often differ in the names of their input fields, my feeling is that the cost of having a separate metric for every possible field name will very quickly blow out of proportion, which is also a serious problem. In other words, the API may be simpler but the catalog will be more complicated.

yoavkatz (Member) commented Apr 1, 2024

@arielge - I think it's not only a matter of which field to use. We should also give clear score names.

Right now we have:

char_edit_distance
char_edit_distance_accuracy

But the second needs to be something like

char_edit_distance_from_original_text

to be differentiated.

arielge (Member, Author) commented Apr 1, 2024

I agree, this is what I was aiming for with issue #737.
Now, if this is not done via an overwrite, then you will need to add a catalog metric for char_edit_distance_from_original_text, for another task you would add char_edit_distance_from_input, and for yet another char_edit_distance_from_document, when they all do exactly the same calculation.
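(For illustration, a sketch of the catalog growth this would imply, reusing the MetricPipeline pattern from the earlier comment; the entry names and the metrics dict are hypothetical:)

for field in ["original_text", "input", "document"]:
    metric = MetricPipeline(
        main_score="score",
        preprocess_steps=[
            ListFieldValues(
                fields=[f"task_data/{field}"], to_field="references", use_query=True
            ),
        ],
        metric=metrics["metrics.char_edit_distance"],
    )
    add_to_catalog(metric, f"metrics.char_edit_distance_from_{field}", overwrite=True)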

yoavkatz (Member) commented Apr 1, 2024

I see. Today we have the explicit way of doing this:

metric = MetricPipeline(
    main_score="score",
    preprocess_steps=[
        ListFieldValues(
            fields=["task_data/original_text"], to_field="references", use_query=True
        ),
    ],
    metric=metrics["metrics.char_edit_distance"],
    postpreprocess_steps=[
        RenameFields(
            field="score/global/char_edit_distance",
            to_field="score/global/char_edit_distance_from_original_input",
            use_query=True,
        ),
    ],
)
add_to_catalog(metric, "metrics.char_edit_distance_from_original_text", overwrite=True)

I agree it would be nicer to have a shorter form, something like:

metrics.char_edit_distance[reference_field=original_text, score_name=char_edit_distance_from_original_input]

We need to make sure:

  1. We always encapsulate reference_field in a list, to get consistent behavior. If we need multiple references, we should have an explicit option like:

metrics.char_edit_distance[multiple_references_field=original_text, score_name=char_edit_distance_from_original_input]

  2. Add the functionality to the base Metric class.
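(A hypothetical example of where the proposed shorthand would appear, e.g. in a task or card definition; the bracketed overrides are the proposal above, not an existing API, and the field names are made up:)

task_metrics = [
    "metrics.char_edit_distance[reference_field=original_text, score_name=char_edit_distance_from_original_input]",
    "metrics.char_edit_distance[multiple_references_field=context_ids, score_name=char_edit_distance_from_contexts]",
]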

arielge (Member, Author) commented Apr 1, 2024

That sounds good to me.

elronbandel deleted the dynamic_fields branch on April 8, 2024