Unitxt 1.8.0
What's Changed
The main improvement in this release is type checking for Unitxt tasks. Tasks are fundamental to the Unitxt protocol: they act as standardized blueprints for anyone integrating a new dataset into Unitxt, and they enable task-specific templates and metrics. To ensure that datasets are processed in line with the task schema, task fields now carry explicit types.
For example, consider the NER task in Unitxt, previously defined as follows:
add_to_catalog(
    FormTask(
        inputs=["text", "entity_types"],
        outputs=["spans_starts", "spans_ends", "text", "labels"],
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
Now, the NER task definition includes explicit types:
add_to_catalog(
    FormTask(
        inputs={"text": "str", "entity_types": "List[str]"},
        outputs={
            "spans_starts": "List[int]",
            "spans_ends": "List[int]",
            "text": "List[str]",
            "labels": "List[str]",
        },
        prediction_type="List[Tuple[str,str]]",
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
This change supports Unitxt's goal that definitions be easy to understand and easy to validate, with error messages that guide developers in identifying and solving issues.
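To illustrate what checking an instance against typed task fields can look like, here is a minimal sketch in plain Python. It is not Unitxt's implementation: the helpers `parse_type`, `is_of_type`, and `verify_fields` are hypothetical names introduced for this example, and the sketch only handles the simple type strings that appear in the NER definition above (fixed-length tuples, lists, dicts, unions, and basic scalars).

```python
from typing import Any, Dict, List, Tuple, Union, get_args, get_origin

# Names that a type string is allowed to reference (assumption for this sketch).
_ALLOWED_NAMES = {
    "str": str, "int": int, "float": float, "bool": bool, "Any": Any,
    "List": List, "Dict": Dict, "Tuple": Tuple, "Union": Union,
}


def parse_type(type_string: str):
    """Turn a string such as 'List[int]' into a typing object."""
    # eval is restricted to the allowed names above; builtins are disabled.
    return eval(type_string, {"__builtins__": {}}, dict(_ALLOWED_NAMES))


def is_of_type(value, expected) -> bool:
    """Recursively check that value conforms to the expected type."""
    if expected is Any:
        return True
    origin = get_origin(expected)
    args = get_args(expected)
    if origin is None:
        return isinstance(value, expected)
    if origin is Union:
        return any(is_of_type(value, arg) for arg in args)
    if not args:  # bare List, Dict or Tuple without parameters
        return isinstance(value, origin)
    if origin is list:
        return isinstance(value, list) and all(is_of_type(v, args[0]) for v in value)
    if origin is tuple:  # fixed-length tuples only
        return (
            isinstance(value, tuple)
            and len(value) == len(args)
            and all(is_of_type(v, a) for v, a in zip(value, args))
        )
    if origin is dict:
        return isinstance(value, dict) and all(
            is_of_type(k, args[0]) and is_of_type(v, args[1]) for k, v in value.items()
        )
    raise TypeError(f"Unsupported type in this sketch: {expected}")


def verify_fields(instance: dict, schema: Dict[str, str]) -> None:
    """Raise ValueError if a field is missing or has the wrong type."""
    for name, type_string in schema.items():
        if name not in instance:
            raise ValueError(f"Missing field '{name}'")
        if not is_of_type(instance[name], parse_type(type_string)):
            raise ValueError(
                f"Field '{name}' should be of type '{type_string}', "
                f"got {instance[name]!r}"
            )


ner_inputs = {"text": "str", "entity_types": "List[str]"}
# A well-formed instance passes silently.
verify_fields({"text": "John lives in Paris", "entity_types": ["Person", "Location"]}, ner_inputs)
```

Passing `"entity_types": "Person"` (a string instead of a list of strings) to `verify_fields` would raise a `ValueError` naming the offending field, which is the kind of early, descriptive failure that typed definitions make possible.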
For now, the original definition format without types continues to work but generates warning messages such as the ones below. You should begin adapting your task definitions by adding types.
'inputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['question', 'question_id', 'topic']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
'outputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['reference_answers', 'reference_contexts', 'reference_context_ids', 'is_answerable_label']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
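The fallback described by these warnings can be pictured with the following sketch. It is an illustration only, not the library's code: `normalize_fields` is a hypothetical helper, and the choice of `FutureWarning` as the warning class is an assumption.

```python
import warnings
from typing import Dict, List, Union


def normalize_fields(
    fields: Union[List[str], Dict[str, str]], field_name: str
) -> Dict[str, str]:
    """Return a name-to-type mapping, accepting the untyped list format."""
    if isinstance(fields, dict):
        # Already in the typed format: keep it unchanged.
        return dict(fields)
    # Untyped list: warn and assume 'Any' for every field.
    warnings.warn(
        f"'{field_name}' field of Task should be a dictionary of field names "
        f"and their types. Instead only '{fields}' was passed. "
        "All types will be assumed to be 'Any'.",
        FutureWarning,
    )
    return {name: "Any" for name in fields}
```

Under these assumptions, `normalize_fields(["question", "topic"], "inputs")` yields a mapping in which both fields have type `"Any"`, so no real type checking takes place until the definition is migrated to explicit types.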
Special thanks to @pawelknes who implemented this important feature. It truly demonstrates the collective power of the Unitxt community and the invaluable contributions made by Unitxt users beyond the core development team. Such contributions are highly appreciated and encouraged.
- For more detailed information, please refer to #710
Breaking Changes
"metrics.spearman", "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
"metrics.f1_binary", "metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1.
Bug Fixes
- Set empty list if preprocess_steps is None by @marukaz in #780
- Fix UI load failure due to typo by @yoavkatz in #785
- Fix huggingface uploads by @elronbandel in #793
- Fix typo in error message by @marukaz in #777
New Assets
- add perplexity with Mistral model by @lilacheden in #713
New Features
- Type checking for task definition by @pawelknes in #710
- Add open and ibm_genai to llm as judge inference engine by @OfirArviv in #782
- Add negative class score for binary precision, recall, f1 and max f1 by @lilacheden in #788
  - Add negative class score for binary precision, recall, f1 and max f1, e.g. f1_binary now also returns "f1_binary_neg".
  - Support Unions in metric prediction_type
  - Add processor cast_to_float_return_nan_if_failed
  - Breaking change: Make prediction_type of metrics numeric:
    A. "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
    B. "metrics.f1_binary", "metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1
- Group shuffle by @sam-data-guy-iam in #639
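The numeric prediction types and the negative class score listed above can be pictured with a small standalone sketch. None of this is Unitxt code: `cast_to_float_or_nan`, `check_binary_inputs`, and `f1_for_class` are hypothetical helpers, references are simplified to one value per instance, and predictions are assumed to be already thresholded to 0 or 1 before the F1 computation.

```python
import math
from typing import Sequence, Union

Number = Union[float, int]


def cast_to_float_or_nan(text) -> float:
    """Convert raw model output to float, returning NaN when that fails."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return float("nan")


def check_binary_inputs(predictions: Sequence, references: Sequence) -> None:
    """Enforce numeric predictions and references equal to 0 or 1."""
    for prediction in predictions:
        if not isinstance(prediction, (int, float)):
            raise TypeError(f"Prediction must be float or int, got {prediction!r}")
    for reference in references:
        if reference not in (0, 1):
            raise ValueError(f"Reference must be 0 or 1, got {reference!r}")


def f1_for_class(
    predictions: Sequence[Number], references: Sequence[Number], positive_label: int
) -> float:
    """F1 score treating positive_label as the positive class."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == positive_label and r == positive_label)
    fp = sum(1 for p, r in pairs if p == positive_label and r != positive_label)
    fn = sum(1 for p, r in pairs if p != positive_label and r == positive_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


raw_outputs = ["1", "0", "0", "1"]
predictions = [cast_to_float_or_nan(text) for text in raw_outputs]
references = [1, 0, 1, 1]
check_binary_inputs(predictions, references)
scores = {
    "f1_binary": f1_for_class(predictions, references, positive_label=1),
    # Same computation with class 0 treated as the positive class.
    "f1_binary_neg": f1_for_class(predictions, references, positive_label=0),
}
```

Reporting both scores is useful when the negative class is rare or of independent interest, since a high positive-class F1 can hide poor performance on the negative class.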
Documentation
- Fix a small typo by @dafnapension in #779
- Update instructions to install HELM from PyPI by @yifanmai in #783
- Update few-shot instructions in Unitxt with HELM by @yifanmai in #774
Full Changelog: 1.7.7...1.8.0
Full Changelog: 1.8.1...1.8.0