Enhance augment function #531

Merged
merged 14 commits into master from enhance-augment-function on Nov 5, 2021

Hanyu-Liu-123
Collaborator

@Hanyu-Liu-123 Hanyu-Liu-123 commented Sep 28, 2021

What does this PR do?

Summary

This PR introduces two new augmenter parameters, high_yield and fast_augment. The high_yield option was originally implemented in pull request #507, which still requires additional work before it can be merged.

When high_yield is set to True, every augmentation that fits the criteria of a successful transformation will be added to the final output. In most cases, the high-yield augmenter will generate far more augmentations than what users specify in transformations_per_example.

When fast_augment is set to True, the augmenter terminates as soon as the number of successful augmentations reaches transformations_per_example and returns that many transformations.

Stopping early reduces the augmenter's running time, but it may skew the returned augmentations.
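
How the two flags interact with transformations_per_example can be illustrated with a toy model. The sketch below is not TextAttack's implementation: `augment_sketch` and its `runs` argument (one list of already-successful candidates per transformation run) are hypothetical, and it assumes that the default mode keeps one candidate per run and that the early stop keeps the first candidates found rather than sampling among them.

```python
from typing import List


def augment_sketch(
    runs: List[List[str]],
    transformations_per_example: int,
    high_yield: bool = False,
    fast_augment: bool = False,
) -> List[str]:
    """Toy model of the two flags (hypothetical, not the library code).

    `runs` holds, for each transformation run, the candidates that
    already satisfy the success criteria.
    """
    kept: List[str] = []
    for candidates in runs[:transformations_per_example]:
        # Assumption: either flag collects every success of a run;
        # otherwise only a single candidate per run is kept.
        batch = candidates if (high_yield or fast_augment) else candidates[:1]
        for text in batch:
            if text not in kept:
                kept.append(text)
        # fast_augment: stop as soon as enough successes were collected.
        if fast_augment and len(kept) >= transformations_per_example:
            if not high_yield:
                kept = kept[:transformations_per_example]
            break
    return kept


runs = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
default_out = augment_sketch(runs, 2)
high_yield_out = augment_sketch(runs, 2, high_yield=True)
fast_out = augment_sketch(runs, 2, fast_augment=True)
```

In this toy run the default mode returns one candidate from each of two runs, high_yield returns all five successes from those runs, and fast_augment returns two candidates that both come from the first run. The last case is the kind of skew mentioned above.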

Additions

  • Added high_yield and fast_augment parameters in augmenter

Changes

  • Changed the augment function, augmenter parser, and augment_command

Checklist

  • The title of your pull request should be a summary of its contribution.
  • Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
  • [ ] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it)
  • To indicate a work in progress please mark it as a draft on GitHub.
  • Make sure existing tests pass.
  • [ ] Add relevant tests. No quality testing = no merge.
  • [ ] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

@qiyanjun qiyanjun mentioned this pull request Sep 28, 2021
4 tasks
@qiyanjun
Member

@Hanyu-Liu-123 please consider using or improving the newly added metric module #514 to check the quality of the augmented samples...

@Hanyu-Liu-123 Hanyu-Liu-123 marked this pull request as ready for review September 30, 2021 03:06
@Hanyu-Liu-123
Collaborator Author

> @Hanyu-Liu-123 please consider using or improving the newly added metric module #514 to check the quality of the augmented samples...

I'll look into the metric module. I think the USE and perplexity metrics could be really helpful if integrated into the augmenter.

@Hanyu-Liu-123 Hanyu-Liu-123 added the augmentation issues related to augmentation label Sep 30, 2021
@Hanyu-Liu-123
Collaborator Author

Hi @alexander-zap , this pull request is a truncated version of https://gitlab.com/taforkacc/textattack. Because the original pull request requires extensive structure changes, we were unable to incorporate your full addition at this moment, but would definitely like to include the high_yield parameter in the augmenter.

At your convenience, could you review this pull request that adds the high_yield parameter to the augment() function? Thanks!

@qiyanjun
Member

qiyanjun commented Oct 1, 2021

@Hanyu-Liu-123 please check out https://github.com/QData/TextAttack/blob/master/textattack/attack_results/attack_result.py to figure out how to use metric module

@qiyanjun
Member

@Hanyu-Liu-123 please also add a test func in the test_augment_api

@Hanyu-Liu-123
Collaborator Author

> @Hanyu-Liu-123 please also add a test func in the test_augment_api

Will do today! I should also be able to add the metric module implementation to the augmenter later this week.

@alexander-zap
Contributor

> Hi @alexander-zap , this pull request is a truncated version of https://gitlab.com/taforkacc/textattack. Because the original pull request requires extensive structure changes, we were unable to incorporate your full addition at this moment, but would definitely like to include the high_yield parameter in the augmenter.
>
> At your convenience, could you review this pull request that adds the high_yield parameter to the augment() function? Thanks!

@Hanyu-Liu-123 I reviewed the code. The usage of high_yield looks good to me 👍

@Hanyu-Liu-123
Collaborator Author

> > Hi @alexander-zap , this pull request is a truncated version of https://gitlab.com/taforkacc/textattack. Because the original pull request requires extensive structure changes, we were unable to incorporate your full addition at this moment, but would definitely like to include the high_yield parameter in the augmenter.
> > At your convenience, could you review this pull request that adds the high_yield parameter to the augment() function? Thanks!
>
> @Hanyu-Liu-123 I reviewed the code. The usage of high_yield looks good to me 👍

Thank you so much!

@qiyanjun
Member

qiyanjun commented Nov 3, 2021

@Hanyu-Liu-123 please add docstring and testing code. Then it is ready to merge!

@Hanyu-Liu-123
Collaborator Author

@qiyanjun Added the docstrings!

Here's a sample output from running in interactive mode:

>>> Enter a sentence to augment, "q" to quit, "c" to view/change arguments:

>>> I would love to go to Chile but the tickets are 500 dollars

Augmenting...

Augmentations:

I would love to go to Chile but the tickets are 212 dollars 

I would love to go to Chile but the tickets are 332 dollars 

I would love to go to Chile but the tickets are 772 dollars 

I would love to go to Chile but the tickets are 981 dollars 

I would love to go to Norway but the tickets are 500 dollars 


Average Original Perplexity Score: 109.42
Average Augment Perplexity Score: 140.97
Average Augment USE Score: 0.84
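
For readers unfamiliar with the two scores, here is a minimal sketch of the generic definitions they are usually based on. It is an assumption that the metric module follows these definitions; the helper names `perplexity` and `cosine_similarity` are hypothetical and do not come from TextAttack. Perplexity is taken as the exponential of the mean negative log-probability a language model assigns to the tokens, and a USE-style score as the cosine similarity between the sentence embeddings of the original and the augmented text.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Exponential of the mean negative natural-log token probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# A model that gives every token probability 0.5 has perplexity 2.
uniform_ppl = perplexity([math.log(0.5)] * 4)
# Identical directions score 1, orthogonal directions score 0.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Under these definitions, a higher augment perplexity than the original suggests less fluent text, and a USE score close to 1 suggests the augmentation stays close in meaning to the original.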

@Hanyu-Liu-123
Collaborator Author

Slides explaining the limitation of fast-augment

high_yield: Whether to return a set of augmented texts that will be relatively similar, or to return only a
single one.
fast_augment: Stops additional transformation runs when number of successful augmentations reaches
transformations_per_example
"""

Member

@Hanyu-Liu-123 I am a bit confused by the fast_augment tag...

  • If we already have the transformations_per_example argument, what is the purpose of fast_augment?
  • When fast_augment = True, will we generate more examples than the transformations_per_example argument specifies?

    text
    for text in transformed_texts
    if len(text.attack_attrs["modified_indices"]) >= num_words_to_swap
Member

@Hanyu-Liu-123 I am confused by the use of num_words_to_swap here... Does it specify a lower bound or an upper bound?

Member

num_words_to_swap is a lower bound.
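
A minimal sketch of the lower-bound reading of that filter follows. `ToyText` and `ready_texts` are hypothetical stand-ins used only for illustration; the sketch assumes, as in the snippet under review, that each candidate exposes its changed word positions through `attack_attrs["modified_indices"]`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyText:
    """Hypothetical stand-in for a transformed text object."""
    text: str
    attack_attrs: Dict[str, Set[int]] = field(default_factory=dict)


def ready_texts(transformed_texts: List[ToyText], num_words_to_swap: int) -> List[ToyText]:
    # Keep candidates with AT LEAST num_words_to_swap modified words,
    # so the parameter acts as a lower bound, not an upper bound.
    return [
        text
        for text in transformed_texts
        if len(text.attack_attrs["modified_indices"]) >= num_words_to_swap
    ]


candidates = [
    ToyText("one swap", {"modified_indices": {3}}),
    ToyText("two swaps", {"modified_indices": {3, 7}}),
    ToyText("three swaps", {"modified_indices": {1, 3, 7}}),
]
ready = [t.text for t in ready_texts(candidates, num_words_to_swap=2)]
```

With a threshold of 2, the candidate with a single modified word is dropped, while candidates with two or three modified words are both kept.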

@qiyanjun qiyanjun merged commit ef062e3 into master Nov 5, 2021
@qiyanjun
Member

qiyanjun commented Nov 5, 2021

Slides explaining the limitation of fast-augment

[Attached image]

@qiyanjun qiyanjun deleted the enhance-augment-function branch November 5, 2021 20:21