# TextAttack End-to-End

This tutorial provides a broad end-to-end overview of training, evaluating, and attacking a model using TextAttack.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/0_End_to_End.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/0_End_to_End.ipynb)

## Training

First, we're going to train a model. TextAttack integrates directly with [transformers](https://github.com/huggingface/transformers/) and [datasets](https://github.com/huggingface/datasets) to train any of the `transformers` pre-trained models on datasets from `datasets`. 

Let's use the SNLI textual entailment dataset: it's relatively short (in word count, at least), and showcases a lot of the features of `textattack train`. Let's take a look at the dataset using `textattack peek-dataset`:

```bash
textattack peek-dataset --dataset-from-huggingface snli
```

```text
Reusing dataset snli (/p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c)
textattack: Loading datasets dataset snli, split train.
Loading cached shuffled indices for dataset at /p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c/cache-27c827c079649d60.arrow
textattack: Number of samples: 550152
textattack: Number of words per input:
textattack: 	total:   11150480
textattack: 	mean:    20.27
textattack: 	std:     6.95
textattack: 	min:     4
textattack: 	max:     112
textattack: Dataset lowercased: False
textattack: First sample:
Premise: A person on a horse jumps over a broken down airplane.
Hypothesis: A person is training his horse for
```

The dataset looks good! It isn't lowercased, so we'll use a cased model. SNLI also contains examples without a gold label (marked `-1`), so we need to filter those out. The longest input is 112 words, so we can cap our maximum sequence length (`--max-length`) at 128.
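`textattack peek-dataset` summarizes per-input word counts, and `--allowed-labels 0 1 2` drops the unlabeled (`-1`) examples. The sketch below reproduces both ideas in plain Python on a few hypothetical toy examples. It assumes that words are counted by whitespace splitting over premise and hypothesis together, which may not match TextAttack's exact counting. Note also that `--max-length` limits tokenizer tokens, not words. A WordPiece tokenizer may split one word into several tokens, so word counts are only a rough guide.

```python
import statistics

# Mirrors `--allowed-labels 0 1 2`: anything else (e.g. SNLI's -1) is dropped.
ALLOWED_LABELS = {0, 1, 2}


def filter_labels(examples, allowed=ALLOWED_LABELS):
    """Keep only examples whose label is in `allowed`."""
    return [ex for ex in examples if ex["label"] in allowed]


def word_stats(examples):
    """Word-count summary; assumes whitespace words over premise + hypothesis."""
    counts = [
        len(ex["premise"].split()) + len(ex["hypothesis"].split())
        for ex in examples
    ]
    return {
        "total": sum(counts),
        "mean": statistics.mean(counts),
        "std": statistics.pstdev(counts),
        "min": min(counts),
        "max": max(counts),
    }


# Hypothetical toy examples shaped like SNLI records.
toy = [
    {"premise": "A dog runs in the park.", "hypothesis": "An animal is outside.", "label": 0},
    {"premise": "Two men play chess.", "hypothesis": "Nobody is playing.", "label": 2},
    {"premise": "A child reads a book.", "hypothesis": "A kid is at school.", "label": -1},
]

clean = filter_labels(toy)
stats = word_stats(clean)
print(len(clean), stats)
```

With the Hugging Face `datasets` library, the same filter could be written as `dataset.filter(lambda ex: ex["label"] != -1)`.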

We'll train [`distilbert-base-cased`](https://huggingface.co/transformers/model_doc/distilbert.html), since it's a relatively small model, and a good example of how we integrate with `transformers`.

So we have our command:

```bash
# Train a model with TextAttack:
#   --model distilbert-base-cased   distilbert, cased version, from `transformers`
#   --dataset snli                  on the SNLI dataset
#   --max-length 128                with a maximum sequence length of 128
#   --batch-size 128                and a batch size of 128
#   --epochs 3                      for 3 epochs
#   --allowed-labels 0 1 2          and only allow labels 0, 1, 2 (filter out -1!)
textattack train                  \
    --model distilbert-base-cased \
    --dataset snli                \
    --max-length 128              \
    --batch-size 128              \
    --epochs 3                    \
    --allowed-labels 0 1 2
```
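You can also launch training from Python instead of the notebook's shell. One option is to assemble the same argument list and hand it to `subprocess`. This is a minimal sketch: `build_train_command` is a hypothetical helper, and the flags are the ones used by the TextAttack version in this notebook. Other releases may name them differently, so check `textattack train --help`. The values match the training run in this notebook (batch size 128).

```python
import shlex
import subprocess


def build_train_command(model, dataset, max_length, batch_size, epochs, allowed_labels):
    """Hypothetical helper: assemble the `textattack train` argv used in this tutorial."""
    return [
        "textattack", "train",
        "--model", model,
        "--dataset", dataset,
        "--max-length", str(max_length),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--allowed-labels", *[str(label) for label in allowed_labels],
    ]


cmd = build_train_command("distilbert-base-cased", "snli", 128, 128, 3, [0, 1, 2])
print(shlex.join(cmd))  # shell-ready string, handy for logging

# To actually start training (requires TextAttack to be installed):
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, which is why the sketch keeps the arguments as a list rather than a single string.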

Now let's run it:

```bash
textattack train --model distilbert-base-cased --dataset snli --max-length 128 --batch-size 128 --epochs 3 --allowed-labels 0 1 2
```

```text
textattack: Writing logs to /p/qdatatext/jy2ma/textattack/outputs/training/distilbert-base-cased-snli-2021-02-11-03-06-18-577971/log.txt.
Reusing dataset snli (/p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c)
textattack: Loading datasets dataset snli, split train.
Reusing dataset snli (/p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c)
Reusing dataset snli (/p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c)
Reusing dataset snli (/p/qdata/jy2ma/.cache/textattack/datasets/snli/plain_text/1.0.0/bb1102591c6230bd78813e229d5dd4c7fbf4fc478cec28f298761eb69e5b537c)
textattack: Loading datasets dataset snli, split validation.
textattack: Loaded dataset. Found: 3 la
```

## Evaluation

We successfully fine-tuned `distilbert-base-cased` for 3 epochs. Now let's evaluate it using `textattack eval`. This is as simple as passing the path to the fine-tuned model's output directory to `--model`, along with the number of evaluation examples (`--num-examples`). `textattack eval` automatically loads the evaluation data used during training:

```bash
textattack eval --num-examples 1000 --model /p/qdatatext/jy2ma/textattack/outputs/training/distilbert-base-cased-snli-2021-02-11-03-06-18-577971/
```

```text
Traceback (most recent call last):
  File "/p/qdata/jy2ma/miniconda3/envs/textattack-dev/bin/textattack", line 33, in <module>
    sys.exit(load_entry_point('textattack', 'console_scripts', 'textattack')())
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/commands/textattack_cli.py", line 42, in main
    func.run(args)
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/commands/eval_model_command.py", line 95, in run
    self.test_model_on_dataset(args)
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/commands/eval_model_command.py", line 39, in test_model_on_dataset
    model = ModelArgs.create_model_from_args(args)
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/model_args.py", line 258, in create_model_from_args
    from textattack.commands.train_model.train_args_helpers import (
ModuleNotFoundError: No module named 'textattack.commands.train_model'
```
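The training log shows that `textattack train` wrote its outputs to a timestamped directory (`distilbert-base-cased-snli-2021-02-11-03-06-18-577971`). That directory is what we pass to `--model` for both `textattack eval` and `textattack attack`. If you re-run training, a small helper can locate the newest run instead of copying the path by hand. `latest_run_dir` below is hypothetical and not part of TextAttack. It assumes the naming pattern seen in this log, where the zero-padded timestamp makes lexicographic order chronological.

```python
from pathlib import Path


def latest_run_dir(output_root, prefix):
    """Return the newest directory in `output_root` whose name starts with `prefix`.

    Hypothetical helper, not part of TextAttack. Assumes run names end in a
    zero-padded timestamp, so sorting names lexicographically also sorts
    them chronologically.
    """
    runs = sorted(
        p for p in Path(output_root).iterdir()
        if p.is_dir() and p.name.startswith(prefix)
    )
    if not runs:
        raise FileNotFoundError(f"no run starting with {prefix!r} in {output_root}")
    return runs[-1]


# Example (the output root is an assumption; use wherever TextAttack wrote its runs):
# latest_run_dir("outputs/training", "distilbert-base-cased-snli-")
```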


Awesome: we were able to train a model to 86.8% validation-set accuracy with only a single command!
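Validation-set accuracy, like the 86.8% reported above, is the fraction of examples whose predicted label matches the gold label. Below is a minimal sketch of that computation using hypothetical predictions. The label mapping (0 = entailment, 1 = neutral, 2 = contradiction) is the one used by the Hugging Face `snli` dataset.

```python
def accuracy(predictions, labels):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predictions and gold labels for five SNLI-style examples
# (0 = entailment, 1 = neutral, 2 = contradiction).
preds = [0, 1, 2, 2, 0]
gold = [0, 1, 1, 2, 0]
print(f"accuracy: {accuracy(preds, gold):.1%}")  # 4 of 5 correct
```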

## Attack

Finally, let's attack our fine-tuned model. We pass it to `--model` the same way as before, by providing the path to its output directory. For our attack, let's use the "TextFooler" attack recipe, from the paper ["Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment" (Jin et al., 2019)](https://arxiv.org/abs/1907.11932). We can do this by passing `--recipe textfooler` to `textattack attack`.
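TextFooler first ranks the words of an input by importance, roughly how much the model's confidence in the original label drops when the word is deleted. It then greedily swaps the most important words for synonyms drawn from counter-fitted word embeddings, keeping swaps that pass part-of-speech and semantic-similarity constraints, until the prediction changes. The sketch below illustrates only that greedy loop and is not TextAttack's implementation. The classifier weights and synonym table are hypothetical, and all of the linguistic constraints are omitted.

```python
import math

# Hypothetical toy model: a logistic score over hand-picked word weights.
# The spurious negative weight on "film" is the kind of weakness an attack exploits.
WEIGHTS = {"great": 2.0, "fine": 1.0, "decent": 0.5, "film": -0.5, "awful": -2.0}

# Hypothetical synonym table standing in for TextFooler's embedding neighbours.
SYNONYMS = {"fine": ["decent", "passable"], "movie": ["film", "picture"]}


def prob_positive(words):
    """P(label 1) under the toy model."""
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def predict(words):
    return 1 if prob_positive(words) >= 0.5 else 0


def label_prob(words, label):
    """Probability the toy model assigns to `label`."""
    p = prob_positive(words)
    return p if label == 1 else 1.0 - p


def greedy_word_swap(sentence):
    """Return an adversarial rewrite of `sentence`, or None if the attack fails."""
    words = sentence.split()
    orig_label = predict(words)
    base = label_prob(words, orig_label)

    # 1) Word importance: drop in original-label probability when the word is deleted.
    ranked = sorted(
        ((base - label_prob(words[:i] + words[i + 1:], orig_label), i) for i in range(len(words))),
        reverse=True,
    )

    # 2) Visit words from most to least important and try each synonym.
    current = list(words)
    for _, i in ranked:
        best, best_prob = None, label_prob(current, orig_label)
        for candidate in SYNONYMS.get(current[i], []):
            trial = current[:i] + [candidate] + current[i + 1:]
            if predict(trial) != orig_label:
                return " ".join(trial)  # prediction flipped: attack succeeded
            p = label_prob(trial, orig_label)
            if p < best_prob:  # keep the swap that most weakens the original label
                best, best_prob = trial, p
        if best is not None:
            current = best
    return None  # no sequence of swaps flipped the prediction


print(greedy_word_swap("the fine movie"))   # prints: the passable film
print(greedy_word_swap("the great movie"))  # prints: None
```

The second call shows a failed attack: no available synonym pushes the toy model below its decision threshold. That is the situation TextAttack counts as a failed attack in its summary.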

> *Warning*: We're printing out 1000 examples and, if the attack succeeds, their perturbations. The output of this command is going to be quite long!


```bash
textattack attack --recipe textfooler --num-examples 1000 --model /p/qdatatext/jy2ma/textattack/outputs/training/distilbert-base-cased-snli-2021-02-11-03-06-18-577971/
```

```text
Traceback (most recent call last):
  File "/p/qdata/jy2ma/miniconda3/envs/textattack-dev/bin/textattack", line 33, in <module>
    sys.exit(load_entry_point('textattack', 'console_scripts', 'textattack')())
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/commands/textattack_cli.py", line 42, in main
    func.run(args)
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/commands/attack_command.py", line 15, in run
    dataset = DatasetArgs.create_dataset_from_args(attack_args)
  File "/p/qdatatext/jy2ma/textattack/TextAttack-dev/textattack/dataset_args.py", line 274, in create_dataset_from_args
    raise ValueError("Must supply pretrained model or dataset")
ValueError: Must supply pretrained model or dataset
```


Our model was 86.8% accurate on these examples, which makes sense because it is the same evaluation set as `textattack eval`. TextAttack therefore attacked 868 of the 1000 examples, since the attack is skipped for any example the model already mispredicts. The attack success rate was 88.7%, meaning that TextFooler failed to find an adversarial example only 11.3% of the time.
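These numbers fit together with simple arithmetic. Only correctly classified examples are attacked, and the model stays correct only on attacked examples where the attack fails. The sketch below reconstructs approximate counts from the reported percentages; the function name is hypothetical. Because it starts from rounded percentages, the counts may differ by one from TextAttack's exact tallies.

```python
def attack_summary(num_examples, original_accuracy, attack_success_rate):
    """Turn headline percentages into approximate example counts."""
    attacked = round(num_examples * original_accuracy)  # correctly classified, so attacked
    skipped = num_examples - attacked                    # already mispredicted, skipped
    succeeded = round(attacked * attack_success_rate)    # adversarial example found
    failed = attacked - succeeded                        # model still correct after attack
    return {
        "attacked": attacked,
        "skipped": skipped,
        "succeeded": succeeded,
        "failed": failed,
        "accuracy_under_attack": failed / num_examples,
    }


print(attack_summary(1000, 0.868, 0.887))
```

With the figures above, this gives 868 attacked, 132 skipped, 770 successful and 98 failed attacks. That corresponds to roughly 9.8% accuracy under attack.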


## Conclusion

That's all, folks! We've learned how to train, evaluate, and attack a model with TextAttack, using only three commands! 😀