Add augmentation capabilties #250

Merged
merged 14 commits into from
Oct 18, 2023

Conversation

yoavkatz
Member

@yoavkatz yoavkatz commented Oct 9, 2023

Created support for augmentation in unitxt.

Usage:

card=cards.sst,template_card_index=0,demos_pool_size=100,num_demos=0,augmentor=augmentors.augment_whitespace

Augmentors are FieldOperators over the 'source' field and are applied after rendering is done.

Added a first augmentor, AugmentWhitespace, which replaces each whitespace with 1-3 spaces, tabs, or newlines.
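
As a rough illustration of the whitespace augmentation described above, here is a minimal standalone sketch. It is not the AugmentWhitespace code from this PR: the function name `augment_whitespace`, the explicit `rng` parameter, and the choice to treat each run of whitespace as one unit are assumptions made for the example.

```python
import random
import re


def augment_whitespace(text: str, rng: random.Random) -> str:
    """Replace each run of whitespace with 1-3 random whitespace characters.

    Hypothetical helper for illustration only; not the unitxt implementation.
    """

    def _replacement(_match: re.Match) -> str:
        count = rng.randint(1, 3)  # how many whitespace characters to emit
        return "".join(rng.choice(" \t\n") for _ in range(count))

    return re.sub(r"\s+", _replacement, text)


# The words are preserved; only the whitespace between them changes.
augmented = augment_whitespace("this movie was great", random.Random(0))
```

Passing a seeded `random.Random` keeps the sketch reproducible in tests; the non-whitespace tokens of the input are never altered.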

Also improved some error messages.

New augmentor option to standard recipe and new augmentor that augments whitespace

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
@codecov

codecov bot commented Oct 9, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Files Coverage Δ
prepare/augmentors/augment_whitespace.py 100.00% <100.00%> (ø)
prepare/augmentors/no_augmentation.py 100.00% <100.00%> (ø)
prepare/cards/sst2.py 100.00% <ø> (ø)
src/unitxt/artifact.py 86.66% <100.00%> (+0.06%) ⬆️
src/unitxt/dataset.py 66.03% <100.00%> (ø)
src/unitxt/standard.py 95.12% <100.00%> (+0.38%) ⬆️
src/unitxt/task.py 88.67% <100.00%> (+0.92%) ⬆️
src/unitxt/templates.py 85.71% <100.00%> (ø)
tests/test_operators.py 100.00% <100.00%> (ø)
tests/test_recipe.py 100.00% <100.00%> (ø)
... and 2 more

... and 1 file with indirect coverage changes


Signed-off-by: Yoav Katz <katz@il.ibm.com>
yoavkatz and others added 9 commits October 11, 2023 13:42
Signed-off-by: Yoav Katz <katz@il.ibm.com>
It is now possible to choose whether to augment specific input fields (listed in FormTask's new augmentable_inputs field) or the complete model input after rendering.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Solve precommit end of file check failures

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
@yoavkatz
Member Author

The API was changed to support two ways of augmenting: the full input provided to the model, or only specific input fields defined in the task (in a new optional field called "augmentable_inputs").

"augmentors.augment_whitespace_model_input" vs "augmentors.augment_whitespace_task_input"

@OfirArviv
Collaborator

augmentors.augment_whitespace_task_input

Can you give an example? It seems like it's the same field and not a new field.

-inputs=["choices", "sentence"],
-outputs=["label"],
-metrics=["metrics.accuracy"],
+inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]
Collaborator

what is the augmentation?

Member Author

There are two modes of augmentation. In the first, called model input augmentation, the entire input provided to the model is augmented, including any system prompt, instructions, or demonstrations. The second, called task input augmentation, augments only specific fields in the task's input. Augmentable inputs are the subset of the task's input fields that undergo augmentation. For example, in sst, only the 'sentence' input is augmented and not "choices", which is fixed.
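
To make the difference between the two modes concrete, here is a small sketch. The function names, the dictionary layout, and the example prompt are hypothetical and do not come from unitxt; `str.upper` stands in for a real augmentor so the effect is easy to see.

```python
from typing import Callable, Dict, List


def augment_task_input(
    inputs: Dict[str, str],
    augmentable_inputs: List[str],
    augment: Callable[[str], str],
) -> Dict[str, str]:
    """Task input augmentation: change only the fields listed as augmentable."""
    return {
        name: augment(value) if name in augmentable_inputs else value
        for name, value in inputs.items()
    }


def augment_model_input(source: str, augment: Callable[[str], str]) -> str:
    """Model input augmentation: change the whole rendered prompt."""
    return augment(source)


# Illustrative sst-like instance: only "sentence" is augmentable.
sst_inputs = {"choices": "negative, positive", "sentence": "a fine film"}
task_augmented = augment_task_input(sst_inputs, ["sentence"], str.upper)
model_augmented = augment_model_input("Classify the sentiment: a fine film", str.upper)
```

In the task input case "choices" is left untouched, while in the model input case the instruction text is modified along with the sentence.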

Collaborator

yes, but is it whitespace augmentation? where is it stated?

@@ -9,5 +9,8 @@
],
Collaborator

what is the augmentation type?

@@ -0,0 +1,4 @@
{
Collaborator

How will it look if it's outside code?

Member Author

The augmentor object has two booleans, "augment_model_input" and "augment_task_input". Only one can be set (this is verified in the code). The standard template passes the needed input (model_input or task_input) to the augmentor, depending on these booleans. The augmentor changes the relevant text.

In the catalog we save the augmentor, including the values of these flags, which determine its behavior.

Users of the standard template just call it with "augmentor=augmentors.augment_whitespace_model_input" or "augmentor=augmentors.augment_whitespace_task_input", and this creates the relevant augmentor based on those flags.
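
The flag check described above could look roughly like the following sketch. The class name `AugmentorSketch` and the `target` method are invented for illustration; only the two flag names come from the comment, and the assumption that leaving both flags unset means "no augmentation" is mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AugmentorSketch:
    """Hypothetical stand-in for an augmentor carrying the two mode flags."""

    augment_task_input: bool = False
    augment_model_input: bool = False

    def verify(self) -> None:
        # Only one of the two flags may be set at a time.
        if self.augment_task_input and self.augment_model_input:
            raise ValueError(
                "Set either augment_task_input or augment_model_input, not both."
            )

    def target(self) -> Optional[str]:
        """Return which text the caller should hand to the augmentor."""
        self.verify()
        if self.augment_task_input:
            return "task_input"
        if self.augment_model_input:
            return "model_input"
        return None  # assumption: no flag set means nothing is augmented
```

A caller in the role of the standard template would check `target()` and pass either the task fields or the rendered model input accordingly.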

@@ -11,6 +11,13 @@ class FormTask(Tasker, StreamInstanceOperator):
inputs: List[str]
outputs: List[str]
metrics: List[str]
augmentable_inputs: List[str] = []

def verify(self):
Collaborator

Please explain: could it be that it's not the same?

-inputs=["choices", "sentence"],
-outputs=["label"],
-metrics=["metrics.accuracy"],
+inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]
)
add_to_catalog(one_sentence_classification_task, "tasks.one_sent_classification", overwrite=True)
Collaborator

here

@yoavkatz yoavkatz merged commit 0bc8737 into main Oct 18, 2023
5 checks passed
@yoavkatz yoavkatz deleted the add_augmentation_capabilties branch October 18, 2023 08:51