Add augmentation capabilities #250
New augmentor option to standard recipe and new augmentor that augments whitespace Signed-off-by: Yoav Katz <katz@il.ibm.com>
One can now define whether to augment specific input fields (listed in FormTask's new augmentable fields) or the complete model input after rendering. Signed-off-by: Yoav Katz <katz@il.ibm.com>
Solve precommit end of file check failures Signed-off-by: Yoav Katz <katz@il.ibm.com>
The API was changed to support two ways of augmenting: the full input provided to the model, or only specific input fields defined in the task (in a new optional field called "augmentable_inputs"). These correspond to "augmentors.augment_whitespace_model_input" vs. "augmentors.augment_whitespace_task_input".
can you give an example? It seems like it's the same field and not a new field
inputs=["choices", "sentence"],
outputs=["label"],
metrics=["metrics.accuracy"],
inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]
what is the augmentation?
There are two modes of augmentation. In the first, called model input augmentation, the entire input provided to the model is augmented; this includes any system prompt, instructions, or demonstrations. The second, called task input augmentation, augments only specific fields in the task's input. Augmentable inputs are the subset of the task's input fields that undergo augmentation. For example, in sst, only the 'sentence' input is augmented and not the "choices", which is fixed.
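To make the two modes concrete, here is a minimal sketch. It is not the code from this PR: the helper names, the string-valued inputs, and the prompt text are hypothetical, and `str.upper` stands in for a real augmentation so the effect is easy to see.

```python
from typing import Callable, Dict, List


def augment_task_inputs(
    inputs: Dict[str, str],
    augmentable_inputs: List[str],
    transform: Callable[[str], str],
) -> Dict[str, str]:
    """Task input augmentation: change only the listed input fields."""
    return {
        name: transform(value) if name in augmentable_inputs else value
        for name, value in inputs.items()
    }


def augment_model_input(source: str, transform: Callable[[str], str]) -> str:
    """Model input augmentation: change the whole rendered input."""
    return transform(source)


# Hypothetical sst-like instance; only "sentence" is augmentable.
sst_inputs = {"choices": "negative, positive", "sentence": "a charming film"}
task_mode = augment_task_inputs(sst_inputs, ["sentence"], str.upper)

# Hypothetical rendered prompt, including the instruction text.
rendered = "Classify the sentiment. choices: negative, positive. sentence: a charming film"
model_mode = augment_model_input(rendered, str.upper)
```

In task input mode the "choices" value is left untouched, while in model input mode the instruction text is transformed along with everything else.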
yes, but is it whitespace augmentation? where is it stated?
@@ -9,5 +9,8 @@
],
what is the augmentation type?
@@ -0,0 +1,4 @@
{
how will it look if it's outside code?
The augmentor object has two booleans, "augment_model_input" and "augment_task_input". Only one can be set (this is verified in the code). The standard template passes the needed input (model_input or task_input) to the augmentor, depending on these booleans. The augmentor changes the relevant text.
In the catalog we save the augmentor, including the values of these flags, which determine its behavior.
Users of the standard template just call it with "augmentor=augmentors.augment_whitespace_model_input" or "augmentor=augmentors.augment_whitespace_task_input", and this creates the relevant augmentor.
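The flag check described above could look roughly like the following sketch. The class name, the `target` helper, and the error message are hypothetical and are not taken from the PR; only the two flag names come from the comment.

```python
from dataclasses import dataclass


@dataclass
class AugmentorFlagsSketch:
    """Hypothetical stand-in for an augmentor's two mode flags."""

    augment_model_input: bool = False
    augment_task_input: bool = False

    def verify(self) -> None:
        # Reject the ambiguous configuration where both modes are requested.
        if self.augment_model_input and self.augment_task_input:
            raise ValueError(
                "Set only one of augment_model_input and augment_task_input."
            )

    def target(self) -> str:
        """Name the input the caller should pass to the augmentor."""
        self.verify()
        if self.augment_task_input:
            return "task_input"
        if self.augment_model_input:
            return "model_input"
        return "none"
```

Because the flags are stored with the object, saving it under a catalog name fixes the mode, which is how a single name such as "augmentors.augment_whitespace_task_input" can select both the augmentation and where it applies.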
@@ -11,6 +11,13 @@ class FormTask(Tasker, StreamInstanceOperator):
    inputs: List[str]
    outputs: List[str]
    metrics: List[str]
    augmentable_inputs: List[str] = []

    def verify(self):
please explain, could it not be its not the same?
inputs=["choices", "sentence"],
outputs=["label"],
metrics=["metrics.accuracy"],
inputs=["choices", "sentence"], outputs=["label"], metrics=["metrics.accuracy"], augmentable_inputs=["sentence"]
)
add_to_catalog(one_sentence_classification_task, "tasks.one_sent_classification", overwrite=True)
here
Created support for augmentation in unitxt.
Usage:
card=cards.sst,template_card_index=0,demos_pool_size=100,num_demos=0,augmentor=augmentors.augment_whitespace
Augmentors are FieldOperators over the 'source' field, and are applied after rendering is done.
Added a first augmentor called AugmentWhitespace that replaces each whitespace with 1-3 spaces, tabs, or newlines.
Also improved some error messages.
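
For readers who want a feel for what a whitespace augmentation does, here is a minimal standalone sketch. It is not the AugmentWhitespace implementation from this PR: the function name and signature are hypothetical, and it assumes that each run of consecutive whitespace is replaced as a unit.

```python
import random
import re


def augment_whitespace(text: str, rng: random.Random) -> str:
    """Replace each whitespace run with 1-3 random spaces, tabs, or newlines."""

    def replacement(_match: "re.Match[str]") -> str:
        count = rng.randint(1, 3)  # inclusive on both ends
        return "".join(rng.choice([" ", "\t", "\n"]) for _ in range(count))

    # Assumption: a run of whitespace is treated as one unit to replace.
    return re.sub(r"\s+", replacement, text)


rng = random.Random(0)  # seeded so repeated runs give the same output
original = "the movie  was\tgreat"
augmented = augment_whitespace(original, rng)
```

The words and their order are unchanged; only the separators between them differ, which is what makes this kind of augmentation useful for checking how sensitive a model is to formatting.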