Add augmentation capabilties #250

yoavkatz · 2023-10-09T14:11:41Z

Created support for augmentation in unitxt.

Usage:

card=cards.sst,template_card_index=0,demos_pool_size=100,num_demos=0,augmentor=augmentors.augment_whitespace

Augmentors are FieldOperators over the 'source' field, and are applied after rendering is done.

Added a first augmentor called AugmentWhitespace that replaces each whitespace with 1-3 spaces, tabs, or newlines.
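As a rough illustration of the behavior described above, here is a minimal standalone sketch. It is not the AugmentWhitespace operator from this PR: the function name `augment_whitespace`, the explicit `rng` argument, and the choice to treat each run of whitespace as one unit and repeat a single randomly chosen character are all assumptions made for the example.

```python
import random
import re


def augment_whitespace(text: str, rng: random.Random) -> str:
    """Replace every whitespace run with 1-3 copies of a random whitespace character."""

    def replacement(_match: re.Match) -> str:
        # Assumption: one character is drawn per run and repeated 1-3 times.
        return rng.choice([" ", "\t", "\n"]) * rng.randint(1, 3)

    return re.sub(r"\s+", replacement, text)


rng = random.Random(0)  # seeded only to make the sketch reproducible
augmented = augment_whitespace("a charming and often affecting journey", rng)
print(repr(augmented))
```

The words themselves are left untouched; only the separators between them change, so `augmented.split()` yields the same tokens as the original text.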

Also improved some error messages.

New augmentor option to standard recipe and new augmentor that augments whitespace

Signed-off-by: Yoav Katz <katz@il.ibm.com>

codecov · 2023-10-09T14:22:49Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Files	Coverage Δ
prepare/augmentors/augment_whitespace.py	`100.00% <100.00%> (ø)`
prepare/augmentors/no_augmentation.py	`100.00% <100.00%> (ø)`
prepare/cards/sst2.py	`100.00% <ø> (ø)`
src/unitxt/artifact.py	`86.66% <100.00%> (+0.06%)`	⬆️
src/unitxt/dataset.py	`66.03% <100.00%> (ø)`
src/unitxt/standard.py	`95.12% <100.00%> (+0.38%)`	⬆️
src/unitxt/task.py	`88.67% <100.00%> (+0.92%)`	⬆️
src/unitxt/templates.py	`85.71% <100.00%> (ø)`
tests/test_operators.py	`100.00% <100.00%> (ø)`
tests/test_recipe.py	`100.00% <100.00%> (ø)`
... and 2 more

... and 1 file with indirect coverage changes


yoavkatz · 2023-10-14T16:58:49Z

The API was changed to support two ways of augmenting: the full input provided to the model, or only specific input fields defined in the task (in a new optional field called "augmentable_inputs").

"augmentors.augment_whitespace_model_input" vs "augmentors.augment_whitespace_task_input"

OfirArviv · 2023-10-16T08:56:04Z

augmentors.augment_whitespace_task_input

Can you give an example? It seems like it's the same field and not a new field.

OfirArviv · 2023-10-16T09:55:57Z

prepare/cards/sst2.py

-    inputs=["choices", "sentence"],
-    outputs=["label"],
-    metrics=["metrics.accuracy"],
+    inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]


what is the augmentation?

There are two modes of augmentation. In the first, called model input augmentation, the entire input provided to the model is augmented. This includes any system prompt, instructions, or demonstrations. The second, called task input augmentation, augments only specific fields in the task's input. Augmentable inputs are the subset of the task's input fields that undergo augmentation. For example, in sst, only the 'sentence' input is augmented and not the "choices" input, which is fixed.
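To make the difference between the two modes concrete, here is a small self-contained sketch. It does not use the unitxt classes: `augment_task_input`, `render`, and the template text are hypothetical, and `str.upper` is only a visible stand-in for a real augmentation such as whitespace augmentation.

```python
from typing import Callable, Dict, List


def augment_task_input(
    inputs: Dict[str, str],
    augmentable_inputs: List[str],
    augment: Callable[[str], str],
) -> Dict[str, str]:
    """Task input mode: augment only the fields listed in augmentable_inputs."""
    return {
        name: augment(value) if name in augmentable_inputs else value
        for name, value in inputs.items()
    }


def render(inputs: Dict[str, str]) -> str:
    """Hypothetical template; real unitxt templates are defined elsewhere."""
    return (
        f"Classify the sentence as one of: {inputs['choices']}.\n"
        f"Sentence: {inputs['sentence']}"
    )


stand_in = str.upper  # stand-in augmentation, chosen so the effect is easy to see
inputs = {"choices": "negative, positive", "sentence": "a charming film"}

# Task input mode: augment selected fields first, then render.
task_mode = render(augment_task_input(inputs, ["sentence"], stand_in))

# Model input mode: render first, then augment the whole rendered text.
model_mode = stand_in(render(inputs))

print(task_mode)
print(model_mode)
```

In the task input result only the sentence changes, while in the model input result the instruction text and the choices change as well.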

yes, but is it whitespace augmentation? where is it stated?

OfirArviv · 2023-10-16T09:57:07Z

src/unitxt/catalog/tasks/one_sent_classification.json

@@ -9,5 +9,8 @@
    ],


what is the augmentation type?

OfirArviv · 2023-10-16T09:57:22Z

src/unitxt/catalog/augmentors/augment_whitespace_task_input.json

@@ -0,0 +1,4 @@
+{


How will it look if it's outside code?

The augmentor object has two booleans, "augment_model_input" and "augment_task_input". Only one can be set (this is verified in the code). Depending on these booleans, the standard template passes the needed input (model_input or task_input) to the augmentor, which changes the relevant text.

In the catalog we save the augmentor, including the values of these flags, which determine its behavior.

Users of the standard template just call it with "augmentor=augmentors.augment_whitespace_model_input" or "augmentor=augmentors.augment_whitespace_task_input", and this creates the relevant augmentor.
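A minimal sketch of the two-flag design, assuming a plain dataclass rather than the actual unitxt artifact classes. The class name `AugmentorSketch` and the `target` method are hypothetical; only the two flag names come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AugmentorSketch:
    """Hypothetical stand-in for an augmentor carrying the two mode flags."""

    augment_model_input: bool = False
    augment_task_input: bool = False

    def verify(self) -> None:
        # Only one of the two flags may be set.
        if self.augment_model_input and self.augment_task_input:
            raise ValueError(
                "Set either augment_model_input or augment_task_input, not both."
            )

    def target(self) -> Optional[str]:
        """Name the text the caller should pass in, or None if nothing is augmented."""
        self.verify()
        if self.augment_model_input:
            return "model_input"
        if self.augment_task_input:
            return "task_input"
        return None


print(AugmentorSketch(augment_task_input=True).target())
```

Storing the flags on the augmentor keeps the choice of mode in the saved catalog entry, so the caller only needs to name the entry.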

OfirArviv · 2023-10-16T09:57:38Z

src/unitxt/task.py

@@ -11,6 +11,13 @@ class FormTask(Tasker, StreamInstanceOperator):
    inputs: List[str]
    outputs: List[str]
    metrics: List[str]
+    augmentable_inputs: List[str] = []
+
+    def verify(self):


Please explain. Could it be that it's not the same?

tests/test_operators.py

OfirArviv · 2023-10-18T08:17:01Z

prepare/cards/sst2.py

-    inputs=["choices", "sentence"],
-    outputs=["label"],
-    metrics=["metrics.accuracy"],
+    inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]
 )
 add_to_catalog(one_sentence_classification_task, "tasks.one_sent_classification", overwrite=True)


yoavkatz added 2 commits October 9, 2023 17:07

Added augmentation capabilities

68c3bbf

New augmentor option to standard recipe and new augmentor that augments whitespace Signed-off-by: Yoav Katz <katz@il.ibm.com>

Improved error message report

028ace7

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Formatting

b422f70

Signed-off-by: Yoav Katz <katz@il.ibm.com>

yoavkatz requested review from elronbandel and OfirArviv October 11, 2023 09:12

yoavkatz and others added 9 commits October 11, 2023 13:42

Addded missing whitespace at the end

4ab6c42

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Changed augmentor to allow working on inputs

c440a37

One can now defined if to augment specific input fields (defined in FormTask's new augmentable fields) or the complete model input after rendering. Signed-off-by: Yoav Katz <katz@il.ibm.com>

Added null augmentor

9081e7c

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Added newline at end of saved artificats

9fdfcb9

Solve precommit end of file check failures Signed-off-by: Yoav Katz <katz@il.ibm.com>

Added documentation and unit tests

93b30c9

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Merge remote-tracking branch 'origin/main' into add_augmentation_capa…

3c8b09b

…bilties

Added test for standard recipe with augmentor

50267b8

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Merge branch 'main' into add_augmentation_capabilties

6999b44

Added another unit test

001ff19

Signed-off-by: Yoav Katz <katz@il.ibm.com>

OfirArviv reviewed Oct 16, 2023

Merge branch 'main' into add_augmentation_capabilties

4e931b6

OfirArviv requested changes Oct 18, 2023

OfirArviv reviewed Oct 18, 2023

Merge branch 'main' into add_augmentation_capabilties

700f6ef

OfirArviv approved these changes Oct 18, 2023

yoavkatz merged commit 0bc8737 into main Oct 18, 2023
5 checks passed

yoavkatz deleted the add_augmentation_capabilties branch October 18, 2023 08:51

yoavkatz mentioned this pull request Nov 29, 2023

Add new Augmentor to add random suffix to text #342

Closed
