Enhance augment function #531

Hanyu-Liu-123 · 2021-09-28T05:16:45Z (edited by qiyanjun)

What does this PR do?

Summary

This PR introduces two new augmenter parameters, high_yield and fast_augment. The high_yield option was originally implemented in pull request #507, which still requires additional work before it can be merged.

When high_yield is set to True, every augmentation that meets the criteria of a successful transformation is added to the final output. In most cases, the high-yield augmenter will therefore generate far more augmentations than the number users specify in transformations_per_example.
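The difference between the default output and high_yield output can be sketched as below. This is a minimal illustration, not TextAttack's implementation: the helper select_augmentations is hypothetical, and it assumes the default path returns a random subset of at most transformations_per_example successful augmentations.

```python
import random
from typing import List


def select_augmentations(
    successful: List[str],
    transformations_per_example: int,
    high_yield: bool = False,
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper: pick which successful augmentations to return.

    `successful` holds candidate texts that already met the success criteria.
    """
    # Deduplicate while preserving first-seen order.
    unique = list(dict.fromkeys(successful))
    if high_yield:
        # high_yield: keep every successful augmentation, so the output can be
        # much larger than transformations_per_example.
        return sorted(unique)
    # Assumed default: a random subset capped at transformations_per_example.
    k = min(transformations_per_example, len(unique))
    return sorted(random.Random(seed).sample(unique, k))


if __name__ == "__main__":
    # Candidates taken from the sample output later in this thread.
    candidates = [
        "I would love to go to Chile but the tickets are 212 dollars",
        "I would love to go to Chile but the tickets are 332 dollars",
        "I would love to go to Chile but the tickets are 772 dollars",
        "I would love to go to Chile but the tickets are 981 dollars",
        "I would love to go to Norway but the tickets are 500 dollars",
    ]
    print(len(select_augmentations(candidates, 2)))                   # 2
    print(len(select_augmentations(candidates, 2, high_yield=True)))  # 5
```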

When fast_augment is set to True, the augmenter terminates and returns transformations_per_example transformations as soon as the number of successful augmentations reaches transformations_per_example.

This early stop improves the running time of the augmenter, but it may skew the returned augmentations.
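A rough sketch of the early stop, under stated assumptions: the augmenter is modelled as a loop over a fixed number of transformation runs, each produced by a hypothetical run_transformation callable. With fast_augment the loop returns the first transformations_per_example distinct successes; without it, every run executes and the output is (by assumption) sampled from the full pool. Because the early-stop result always comes from the earliest runs, it illustrates one way the returned set can become skewed.

```python
import random
from typing import Callable, List, Tuple


def augment_with_budget(
    run_transformation: Callable[[int], List[str]],
    num_runs: int,
    transformations_per_example: int,
    fast_augment: bool = False,
    seed: int = 0,
) -> Tuple[List[str], int]:
    """Hypothetical augmentation loop; returns (augmentations, runs_used)."""
    found: List[str] = []
    runs_used = 0
    for run in range(num_runs):
        runs_used += 1
        for text in run_transformation(run):
            if text not in found:  # keep only distinct successes
                found.append(text)
        if fast_augment and len(found) >= transformations_per_example:
            # Early stop: enough successes collected, skip the remaining runs.
            # The result is the *first* successes found, hence possible skew.
            return found[:transformations_per_example], runs_used
    # No early stop: all runs executed, sample from the whole pool (assumed).
    k = min(transformations_per_example, len(found))
    return random.Random(seed).sample(found, k), runs_used


if __name__ == "__main__":
    # Toy transformation: run r produces two distinct candidates.
    def toy_run(r: int) -> List[str]:
        return [f"run{r}-a", f"run{r}-b"]

    print(augment_with_budget(toy_run, 5, 2, fast_augment=True))
    # (['run0-a', 'run0-b'], 1)
    print(augment_with_budget(toy_run, 5, 2)[1])  # 5
```

In both modes no more than transformations_per_example texts come back; fast_augment only changes how much work is done and which successes are eligible.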

Additions

Added the high_yield and fast_augment parameters to the augmenter

Changes

Changed the augment function, augmenter parser, and augment_command
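The parser change could look roughly like the argparse sketch below. The flag spellings (--high-yield, --fast-augment, --transformations-per-example), the function name build_augment_parser, and the defaults are assumptions for illustration only; the real augment command's --help output is authoritative.

```python
import argparse


def build_augment_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the new augmenter options."""
    parser = argparse.ArgumentParser(prog="augment")
    parser.add_argument(
        "--transformations-per-example",
        type=int,
        default=1,  # illustrative default, not taken from TextAttack
        help="Number of augmentations to return per input text.",
    )
    parser.add_argument(
        "--high-yield",
        action="store_true",
        help="Return every successful augmentation.",
    )
    parser.add_argument(
        "--fast-augment",
        action="store_true",
        help="Stop once transformations-per-example successes are found.",
    )
    return parser


if __name__ == "__main__":
    args = build_augment_parser().parse_args(["--fast-augment"])
    print(args.fast_augment, args.high_yield)  # True False
```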

Checklist

- The title of your pull request should be a summary of its contribution.
- Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
[ ] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it)
- To indicate a work in progress please mark it as a draft on Github.
- Make sure existing tests pass.
[ ] Add relevant tests. No quality testing = no merge.
[ ] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

qiyanjun · 2021-09-29T20:42:32Z

@Hanyu-Liu-123 please consider using or improving the newly added metric module #514 to check the quality of the augmented samples...

Hanyu-Liu-123 · 2021-09-30T04:05:17Z

> @Hanyu-Liu-123 please consider using or improving the newly added metric module #514 to check the quality of the augmented samples...

I'll look into the metric module. I think the USE and perplexity metrics could be really helpful if integrated into the augmenter.

Hanyu-Liu-123 · 2021-09-30T04:47:52Z

Hi @alexander-zap, this pull request is a truncated version of https://gitlab.com/taforkacc/textattack. Because the original pull request requires extensive structural changes, we were unable to incorporate your full addition at this time, but we would definitely like to include the high_yield parameter in the augmenter.

At your convenience, could you review this pull request that adds the high_yield parameter to the augment() function? Thanks!

qiyanjun · 2021-10-01T18:15:39Z

@Hanyu-Liu-123 please check out https://github.com/QData/TextAttack/blob/master/textattack/attack_results/attack_result.py to figure out how to use the metric module


qiyanjun · 2021-10-14T20:43:30Z

@Hanyu-Liu-123 please also add a test function to test_augment_api

Hanyu-Liu-123 · 2021-10-15T17:36:19Z

> @Hanyu-Liu-123 please also add a test function to test_augment_api

Will do today! I should also be able to add the metric module implementation to the augmenter later this week.

alexander-zap · 2021-10-26T15:02:38Z

> Hi @alexander-zap, this pull request is a truncated version of https://gitlab.com/taforkacc/textattack. Because the original pull request requires extensive structural changes, we were unable to incorporate your full addition at this time, but we would definitely like to include the high_yield parameter in the augmenter.
>
> At your convenience, could you review this pull request that adds the high_yield parameter to the augment() function? Thanks!

@Hanyu-Liu-123 I reviewed the code. The usage of high_yield looks good to me 👍

Hanyu-Liu-123 · 2021-10-28T18:44:15Z

> @Hanyu-Liu-123 I reviewed the code. The usage of high_yield looks good to me 👍

Thank you so much!

qiyanjun · 2021-11-03T14:31:48Z

@Hanyu-Liu-123 please add docstrings and tests. Then it is ready to merge!

Hanyu-Liu-123 · 2021-11-05T11:11:42Z

@qiyanjun Added the docstrings!

Here's a sample output when running in interactive mode:

>>> Enter a sentence to augment, "q" to quit, "c" to view/change arguments:

>>> I would love to go to Chile but the tickets are 500 dollars

Augmenting...

Augmentations:

I would love to go to Chile but the tickets are 212 dollars 

I would love to go to Chile but the tickets are 332 dollars 

I would love to go to Chile but the tickets are 772 dollars 

I would love to go to Chile but the tickets are 981 dollars 

I would love to go to Norway but the tickets are 500 dollars 


Average Original Perplexity Score: 109.42
Average Augment Perplexity Score: 140.97
Average Augment USE Score: 0.84
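The summary lines above are averages over the augmentations. A minimal sketch of how such averages could be computed, assuming perplexity is the exponential of the mean negative token log-likelihood and the USE score is the cosine similarity between sentence embeddings of the original and each augmentation. The vectors and log-probabilities below are toy numbers chosen for illustration, not model outputs, and do not reproduce the scores shown above.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average(values: Sequence[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy inputs only: a real run would use a language model for the
    # log-probabilities and a sentence encoder (e.g. USE) for the vectors.
    original_vec = [1.0, 0.0]
    augmented_vecs = [[1.0, 0.0], [0.0, 1.0]]
    print(average([cosine_similarity(original_vec, v) for v in augmented_vecs]))  # 0.5
    print(perplexity([math.log(0.5)] * 4))  # approximately 2.0
```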

Hanyu-Liu-123 · 2021-11-05T18:20:03Z

Slides explaining the limitation of fast-augment

qiyanjun · 2021-10-13T20:53:23Z

textattack/augmentation/augmenter.py

+        high_yield: Whether to return a set of augmented texts that will be relatively similar, or to return only a
+            single one.
+        fast_augment: Stops additional transformation runs when number of successful augmentations reaches
+            transformations_per_example
    """



@Hanyu-Liu-123 I am a bit confused by the fast_augment flag...

If we already have the transformations_per_example argument, what is the purpose of fast_augment?

When fast_augment = True, will we generate more examples than the transformations_per_example argument specifies?

qiyanjun · 2021-10-13T20:55:23Z

textattack/augmentation/augmenter.py

+                        text
+                        for text in transformed_texts
+                        if len(text.attack_attrs["modified_indices"])
+                        >= num_words_to_swap


@Hanyu-Liu-123 I am confused by the use of num_words_to_swap here... Does it specify a lower bound or an upper bound?

num_words_to_swap is a lower bound.
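The comprehension in the diff above keeps a candidate only if its number of modified words is at least num_words_to_swap, which is what makes it a lower bound. A self-contained sketch of that filter, using a hypothetical FakeAttackedText stand-in that models only the attack_attrs["modified_indices"] field referenced in the diff:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeAttackedText:
    """Hypothetical stand-in for an attacked text; only attack_attrs is modelled."""
    text: str
    attack_attrs: Dict[str, Set[int]] = field(default_factory=dict)


def keep_enough_swaps(
    transformed_texts: List[FakeAttackedText], num_words_to_swap: int
) -> List[FakeAttackedText]:
    # Lower bound: keep texts that modified at least num_words_to_swap words;
    # texts that modified more words are kept as well.
    return [
        t
        for t in transformed_texts
        if len(t.attack_attrs["modified_indices"]) >= num_words_to_swap
    ]


if __name__ == "__main__":
    candidates = [
        FakeAttackedText("one swap", {"modified_indices": {3}}),
        FakeAttackedText("two swaps", {"modified_indices": {3, 7}}),
        FakeAttackedText("three swaps", {"modified_indices": {1, 3, 7}}),
    ]
    kept = keep_enough_swaps(candidates, num_words_to_swap=2)
    print([t.text for t in kept])  # ['two swaps', 'three swaps']
```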

qiyanjun · 2021-11-05T20:20:37Z

Slides explaining the limitation of fast-augment

Hanyu-Liu-123 added 4 commits September 24, 2021 02:39

add high_yield and fast_augment

ee7c27f

Add args

e853786

Add args

2a8b8b0

Add comments

7a9c244

qiyanjun mentioned this pull request Sep 28, 2021

enhance augment_many function #507

Closed

4 tasks

Hanyu-Liu-123 marked this pull request as ready for review September 30, 2021 03:06

Hanyu-Liu-123 added the augmentation label (issues related to augmentation) Sep 30, 2021

Merge pull request #541 from QData/master

f8ab0dc

Update

Hanyu-Liu-123 added 4 commits October 16, 2021 00:34

add test for fast_augment and high_yield

1ca3344

Add metrics implementation in augmenter

768893d

Add commandline args

6963548

update format

f9e183c

fix test issue

ce029e4

added doctring and better visibility

00f3d8d

Hanyu-Liu-123 requested a review from qiyanjun November 5, 2021 11:11

update metric argument for consistency

8ad53c0

qiyanjun approved these changes Nov 5, 2021


qiyanjun added 2 commits November 5, 2021 15:11

Update test_augment_api.py

ca8f2a0

Merge branch 'master' into enhance-augment-function

3e2f200

qiyanjun merged commit ef062e3 into master Nov 5, 2021

qiyanjun deleted the enhance-augment-function branch November 5, 2021 20:21
