This document describes the pipeline for generating and extending the text classification attack benchmark (TCAB) dataset.
- Install Python 3.8+.
- Install Python packages: `pip3 install -r requirements.txt`.
Follow the instructions below to generate perturbations for a chosen domain dataset.

Download and preprocess a domain dataset:
- Change directories to the chosen dataset: `cd data/[dataset]/`.
- Follow the readme in that directory for downloading and preprocessing.
Run the following script to train a target model on a chosen dataset:

`python3 scripts/train.py`

with arguments:

- `--dataset`: `wikipedia`, `civil_comments`, `hatebase`, `imdb`, `climate-change_waterloo`, or `sst`.
- `--model`: `bert`, `roberta`, or `xlnet`.
- `--loss_fn`: Loss function used during training (default: `crossentropy`).
- `--optimizer`: Optimizer used to optimize the loss function (default: `adam`).
- `--max_seq_len`: Max. no. of tokens fed into the model at a time (default: `250`).
- `--lr`: Learning rate (default: `1e-6`).
- `--batch_size`: No. of samples used for mini-batches (default: `32`).
- `--epochs`: No. of passes through the entire training set (default: `10`).
- `--weight_decay`: Weight decay used in the `adam` optimizer (default: `0.0`).
- `--max_norm`: Controls exploding gradients via gradient clipping (default: `1.0`).
The learned target model will be saved to `target_models/[dataset]/[model]/`.
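As an illustration, the documented dataset/model choices and save layout can be sketched as a small helper. This is a hypothetical sketch for orientation only; the real `scripts/train.py` may organize things differently.

```python
from pathlib import Path

# Hypothetical sketch of the documented options and save layout;
# the real scripts/train.py may differ.
DATASETS = {"wikipedia", "civil_comments", "hatebase", "imdb",
            "climate-change_waterloo", "sst"}
MODELS = {"bert", "roberta", "xlnet"}

def save_dir(dataset: str, model: str) -> Path:
    """Directory the trained target model is saved to."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    return Path("target_models") / dataset / model

print(save_dir("sst", "bert"))  # target_models/sst/bert
```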
Run the following script to generate adversarial examples against a target model trained on a chosen dataset, using a chosen attack method from either the TextAttack or OpenAttack toolchain:

`python3 scripts/attack.py`

with arguments:
- `--task_name`: `sentiment` or `abuse`.
- `--dataset_name`: `wikipedia`, `civil_comments`, `hatebase`, `imdb`, `climate-change_waterloo`, or `sst`.
- `--target_model_train_dataset`: `wikipedia`, `civil_comments`, `hatebase`, `imdb`, `climate-change_waterloo`, or `sst`.
- `--model_name`: `bert`, `roberta`, or `xlnet` (default: `roberta`).
- `--model_max_seq_len`: Max. no. of tokens fed into the model at a time (default: `250`).
- `--model_batch_size`: No. of samples used for mini-batches (default: `32`).
- `--attack_toolchain`: `textattack` or `openattack` (default: `textattack`).
- `--attack_name`: `bae`, `bert`, `checklist`, `clare`, `deepwordbug`, `faster_genetic`, `fd`, `gan`, `genetic`, `hotflip`, `iga_wang`, `input_reduction`, `kuleshov`, `pruthi`, `pso`, `pwws`, `textbugger`, `textfooler`, `uat`, or `viper` (default: `bae`).
- `--attack_max_queries`: Max. no. of queries per attack (default: `500`).
Results are saved into `attacks/[dataset_name]/[model_name]/[attack_toolchain]/[attack]/` and include a CSV with the following columns:
- `target_model_dataset`: Dataset being attacked.
- `target_model_train_dataset`: Dataset used to train the model being attacked.
- `target_model`: Name of the model being attacked.
- `attack_name`: Name of the attack used to perturb the input.
- `test_index`: Unique index of the test instance with respect to the `target_model_dataset`.
- `attack_time`: Time taken per attack.
- `ground_truth`: Actual label of the test instance.
- `status`: `success`, `failure`, or `skipped` (`textattack` only).
- `original_text`: Original input text.
- `original_output`: Original output distribution.
- `perturbed_text`: Post-perturbation text.
- `perturbed_output`: Post-perturbation output distribution.
- `num_queries`: No. of queries used during the attack (`textattack` only).
- `frac_words_changed`: Fraction of words changed in a successful attack.
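Summary statistics such as the attack success rate can be computed directly from the `status` and `frac_words_changed` columns of a results CSV. The sketch below uses a few synthetic rows in the same schema, since actual file paths vary by dataset, model, and attack.

```python
import csv
import io

# Synthetic rows in the results schema (only the columns used below);
# real files live under attacks/[dataset_name]/[model_name]/... .
raw = """status,frac_words_changed
success,0.10
failure,
success,0.25
skipped,
"""
rows = list(csv.DictReader(io.StringIO(raw)))

successes = [r for r in rows if r["status"] == "success"]
success_rate = len(successes) / len(rows)
mean_changed = sum(float(r["frac_words_changed"]) for r in successes) / len(successes)

print(f"success rate: {success_rate:.2f}")              # success rate: 0.50
print(f"mean frac. words changed: {mean_changed:.3f}")  # mean frac. words changed: 0.175
```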
After following the Install steps above, use the instructions below to extend TCAB with additional datasets or attacks.
To add a new domain dataset:
- Create a new directory in the `data` directory with the name of the dataset: `data/[dataset]/`.
- Create a readme in the new directory describing exactly how to download the raw data and how to preprocess it.
- After preprocessing, there should be `train.csv`, `val.csv`, and `test.csv` files in that directory.
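A preprocessed dataset directory can be sanity-checked for the expected split files with a small helper. This is illustrative only; it assumes the `data/[dataset]/` layout described above and uses a throwaway temporary directory for the demo.

```python
from pathlib import Path
import tempfile

def missing_splits(dataset_dir):
    """Return the expected split files missing from a data/[dataset]/ directory."""
    return [f"{split}.csv" for split in ("train", "val", "test")
            if not (Path(dataset_dir) / f"{split}.csv").exists()]

# Demo on a throwaway directory standing in for data/[dataset]/.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "train.csv").touch()
    Path(tmp, "val.csv").touch()
    print(missing_splits(tmp))  # ['test.csv']
```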
To generate adversarial examples for a new attack, follow the steps in the Attack subsection under the Dataset Generation section above.
You can find the TCAB dataset here. We also provide the target models used to generate these attacks on Google Drive. If you need to download the models to a Linux environment, consider using a package such as gdrive.