# TextAttack Augmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)

Augmenting a dataset with TextAttack requires only a few lines of code. The `Augmenter` class is designed for this purpose: it generates augmentations of a single string or a list of strings. Augmentation can be done either in a Python script or from the command line.

### Creating an Augmenter

The **Augmenter** class is essential for performing data augmentation with TextAttack. It takes the following four main parameters, in order:

1.  **transformation**: any of the [transformations](https://textattack.readthedocs.io/en/latest/apidoc/textattack.transformations.html) implemented by TextAttack can be used to create an `Augmenter`. Note that to apply multiple transformations at the same time, they must first be combined into a `CompositeTransformation`.
2.  **constraints**: [constraints](https://textattack.readthedocs.io/en/latest/apidoc/textattack.constraints.html#) determine whether a given augmentation is valid, which improves the quality of the augmentations. The default augmenter has no constraints, but constraints can be supplied to the `Augmenter` as a list.
3.  **pct_words_to_swap**: percentage of words to swap per augmented example. The default is 0.1 (10%).
4.  **transformations_per_example**: maximum number of augmentations generated per input. The default is 1 (one augmented sentence for each original input).
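Because `pct_words_to_swap` is a fraction of the input's words, the number of words that may change depends on sentence length. The plain-Python sketch below estimates that per-example budget. The rounding rule (truncate to an integer, but never go below one word), the regex tokenizer, and the `num_words_to_swap` helper name are assumptions for illustration, not TextAttack's exact implementation.

```python
import re


def num_words_to_swap(sentence: str, pct_words_to_swap: float) -> int:
    """Estimate how many words an augmenter may perturb in one example.

    Assumption: budget = pct * word_count, truncated to an int, with a
    floor of 1 so that short inputs still get perturbed.
    """
    # Hypothetical tokenizer: runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    return max(int(pct_words_to_swap * len(words)), 1)


s = "What I cannot create, I do not understand."
print(num_words_to_swap(s, 0.5))  # 8 words * 0.5 -> 4
print(num_words_to_swap(s, 0.1))  # 0.8 truncates to 0, floored at 1
```

For the 8-word sentence used in the example that follows, a 50% budget allows up to 4 word-level changes, which is consistent with the four words (What, cannot, create, understand) altered in each of its sample outputs.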

Below is an example of creating a custom augmenter. Here we build an augmenter with the **WordSwapRandomCharacterDeletion** and **WordSwapQWERTY** transformations and the **RepeatModification** and **StopwordModification** constraints. At most **50%** of the words may be perturbed, and 10 augmentations are generated for each input sentence.


```python
!pip install textattack

# import transformations, constraints, and the Augmenter
from textattack.transformations import WordSwapRandomCharacterDeletion
from textattack.transformations import WordSwapQWERTY
from textattack.transformations import CompositeTransformation

from textattack.constraints.pre_transformation import RepeatModification
from textattack.constraints.pre_transformation import StopwordModification

from textattack.augmentation import Augmenter
```

```python
# Set up the transformation using CompositeTransformation
transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])
# Set up the constraints
constraints = [RepeatModification(), StopwordModification()]
# Create the augmenter with the specified parameters
augmenter = Augmenter(
    transformation=transformation,
    constraints=constraints,
    pct_words_to_swap=0.5,
    transformations_per_example=10,
)
s = 'What I cannot create, I do not understand.'
# Augment!
augmenter.augment(s)
```

```
['Wat I cannog crexte, I do not ubderstand.',
 'Wat I cnnot creae, I do not understanr.',
 'Wha I canno ceeate, I do not ubderstand.',
 'Whaf I camnot creatr, I do not understsnd.',
 'Wht I cannt crete, I do not undrstand.',
 'Wht I cnnot crewte, I do not undersyand.',
 'Whzt I cannlt creare, I do not understajd.',
 'Wuat I cannof cfeate, I do not undefstand.',
 'Wuat I cannoy ceate, I do not ubderstand.',
 'hat I annot cfeate, I do not undestand.']
```
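The example above combines three ideas: a composite of character-level transformations, constraints that decide which words may be touched, and a per-example word budget. The toy re-implementation below mimics that flow in plain Python so it runs without TextAttack. It is a sketch under stated assumptions, not TextAttack's internals: the stopword list, the "same keyboard row only" QWERTY rule, picking one random member transformation per word, and the names `toy_augment`, `char_deletion`, and `qwerty_swap` are all hypothetical.

```python
import random
import string

# Hypothetical stopword list standing in for StopwordModification's real list.
STOPWORDS = {"i", "do", "not", "the", "a", "to", "but", "are"}
# Simplified keyboard layout: only horizontal neighbours on the same row count.
KEYBOARD_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
LETTERS = "".join(KEYBOARD_ROWS)


def char_deletion(word, rng):
    """Drop one random character (loosely mirrors WordSwapRandomCharacterDeletion)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def qwerty_swap(word, rng):
    """Replace one letter with an adjacent key on the same row (loosely mirrors WordSwapQWERTY)."""
    positions = [i for i, c in enumerate(word) if c.lower() in LETTERS]
    if not positions:
        return word
    i = rng.choice(positions)
    row = next(r for r in KEYBOARD_ROWS if word[i].lower() in r)
    j = row.index(word[i].lower())
    neighbours = [row[k] for k in (j - 1, j + 1) if 0 <= k < len(row)]
    new = rng.choice(neighbours)
    return word[:i] + (new.upper() if word[i].isupper() else new) + word[i + 1:]


def toy_augment(sentence, pct_words_to_swap=0.5, transformations_per_example=3, seed=0):
    """Generate distinct perturbed copies of `sentence` under a word budget."""
    rng = random.Random(seed)
    tokens = sentence.split()
    # Constraint (like StopwordModification): stopwords are never eligible.
    eligible = [i for i, t in enumerate(tokens)
                if t.strip(string.punctuation).lower() not in STOPWORDS]
    budget = max(int(pct_words_to_swap * len(tokens)), 1)
    transformations = [char_deletion, qwerty_swap]  # the toy "composite"
    results = set()
    for _ in range(100):  # bounded number of sampling attempts
        if len(results) >= transformations_per_example:
            break
        new_tokens = list(tokens)
        # Distinct indices, so no word is modified twice (loosely like RepeatModification).
        for i in rng.sample(eligible, min(budget, len(eligible))):
            core = tokens[i].rstrip(string.punctuation)
            tail = tokens[i][len(core):]  # keep trailing punctuation such as "," or "."
            transform = rng.choice(transformations)
            new_tokens[i] = transform(core, rng) + tail
        candidate = " ".join(new_tokens)
        if candidate != sentence:
            results.add(candidate)
    return sorted(results)


for line in toy_augment("What I cannot create, I do not understand."):
    print(line)
```

As in the TextAttack run, the stopwords "I", "do", and "not" stay untouched while the remaining words receive character-level typos.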

### Pre-built Augmentation Recipes

In addition to creating our own augmenter, we can also use pre-built augmentation recipes to perturb datasets. These recipes are implemented from published papers and are very convenient to use. The list of available recipes can be found [here](https://textattack.readthedocs.io/en/latest/3recipes/augmenter_recipes.html).


In the following example, we use the `CheckListAugmenter` to showcase our augmentation recipes. The `CheckListAugmenter` augments text using the transformation methods from CheckList INV (invariance) testing, which combine **Name Replacement**, **Location Replacement**, **Number Alteration**, and **Contraction/Extension**. The original paper can be found here: ["Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118).
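Two of these perturbations, Contraction/Extension and Number Alteration, are simple enough to sketch without any model. The plain-Python toy below is illustrative only: the four-entry contraction table, the rule of drawing a new integer within ±50% of the original, and the function names `contraction_variants` and `number_variants` are hypothetical choices, not CheckList's or TextAttack's implementation.

```python
import random
import re

# Hypothetical, tiny contraction table; a real CheckList-style table is much larger.
CONTRACTIONS = {"I'd": "I would", "can't": "cannot", "it's": "it is", "don't": "do not"}


def contraction_variants(sentence):
    """Expand known contractions or contract known expansions (one variant per hit)."""
    variants = []
    for short, long in CONTRACTIONS.items():
        for src, dst in ((short, long), (long, short)):
            pattern = rf"\b{re.escape(src)}\b"
            if re.search(pattern, sentence):
                variants.append(re.sub(pattern, dst, sentence))
    return variants


def number_variants(sentence, n=3, max_change=0.5, seed=0):
    """Replace every integer with a different random value within +/- max_change of it."""
    rng = random.Random(seed)

    def repl(match):
        value = int(match.group())
        delta = int(value * max_change)
        if delta == 0:
            return match.group()  # too small to perturb under this assumed rule
        # Draw from [value - delta, value + delta] while skipping value itself.
        new = rng.randint(value - delta, value + delta - 1)
        return str(new + 1 if new >= value else new)

    return [re.sub(r"\b\d+\b", repl, sentence) for _ in range(n)]


s = "I'd love to go to Japan but the tickets are 500 dollars"
print(contraction_variants(s))
print(number_variants(s))
```

Both functions keep the meaning of the sentence intact, which is the point of an invariance test: a robust model should give the same prediction on every variant.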

```python
# import the CheckListAugmenter
from textattack.augmentation import CheckListAugmenter
# Alter default values if desired
augmenter = CheckListAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)
s = "I'd love to go to Japan but the tickets are 500 dollars"
# Augment
augmenter.augment(s)
```

```
['I would love to go to Central African Republic but the tickets are 500 dollars',
 'I would love to go to Japan but the tickets are 707 dollars',
 'I would love to go to Kosovo but the tickets are 500 dollars',
 "I'd love to go to Dominica but the tickets are 437 dollars",
 "I'd love to go to New Caledonia but the tickets are 697 dollars"]
```

Note that the previous snippet of code is equivalent to running the following from the command line:

```
textattack augment --recipe checklist --pct-words-to-swap .2 --transformations-per-example 5 --exclude-original --interactive
```

Here's another example, using the `WordNetAugmenter`, which replaces words with synonyms from the WordNet thesaurus:


```python
from textattack.augmentation import WordNetAugmenter
augmenter = WordNetAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)
s = "I'd love to go to Japan but the tickets are 500 dollars"
augmenter.augment(s)
```

```
["I'd hump to go to Japan but the slate are 500 dollars",
 "I'd love to go to Nippon but the tickets are 500 buck",
 "I'd love to go to japan but the tickets are 500 dollar",
 "I'd love to perish to Japan but the fine are 500 dollars",
 "I'd love to start to Japan but the tickets are 500 buck"]
```
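Several of these substitutions ("hump" for "love", "perish" for "go") fit some sense of the original word but not this sentence. That is consistent with synonyms being looked up word by word across all WordNet senses, without regard to context. The toy sketch below imitates that behaviour under a word budget. It does not call WordNet or TextAttack: the `SYNONYMS` table is hypothetical and hand-written from the sample outputs above, and `synonym_swap` is an illustrative name.

```python
import random

# Hypothetical synonym table standing in for WordNet lookups; entries are taken
# from the sample outputs above. Real WordNet returns many more candidates.
SYNONYMS = {
    "love": ["hump"],
    "go": ["perish", "start"],
    "Japan": ["Nippon"],
    "tickets": ["slate", "fine"],
    "dollars": ["buck", "dollar"],
}


def synonym_swap(sentence, pct_words_to_swap=0.2, transformations_per_example=5, seed=0):
    """Swap up to pct_words_to_swap of the words for a table synonym, ignoring context."""
    rng = random.Random(seed)
    tokens = sentence.split()
    swappable = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    budget = max(int(pct_words_to_swap * len(tokens)), 1)
    results = set()
    for _ in range(100):  # bounded number of sampling attempts
        if len(results) >= transformations_per_example:
            break
        new_tokens = list(tokens)
        for i in rng.sample(swappable, min(budget, len(swappable))):
            # No sense disambiguation: any listed synonym may be chosen.
            new_tokens[i] = rng.choice(SYNONYMS[tokens[i]])
        results.add(" ".join(new_tokens))
    return sorted(results)


for line in synonym_swap("I'd love to go to Japan but the tickets are 500 dollars"):
    print(line)
```

With 12 words and `pct_words_to_swap=0.2`, each variant in this toy changes exactly two words.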

### Conclusion
We have now gone through the basics of running an `Augmenter`, either by creating a new augmenter from scratch or by using a pre-built recipe. This can be done in as few as four lines of code, so please give it a try if you haven't already! 🐙