This is the official code for the NAACL-HLT 2022 Findings paper "Phrase-level Textual Adversarial Attack with Label Preservation".
- Python dependency: the project is based on Python 3.7 and PyTorch 1.8.1. Please check `requirements.txt` to install the required libraries.
- Datasets: please check the directory `./data/`; there are 1,000 samples for each dataset. You can use other datasets, but please make sure they are converted into the proper format.
- StanfordNLP: download StanfordNLP and unzip it under the root directory (we use version 4.2.2); see the parsing sketch after this list.
- Classifier: please prepare and put your classifier under the directory `./classifier/${task}/`, where `${task}` denotes one of the four tasks (`yelp`, `ag`, `mnli`, and `qnli`).
- Language model for likelihood estimation: please prepare and put your language model (e.g., RoBERTa) for likelihood estimation under the directory `./likelihood_model/${task}/`; see the loading sketch after this list.
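Before running anything, it may help to sanity-check the setup. Below is a minimal sketch (our illustration, not part of the repo) of querying a locally started Stanford CoreNLP 4.2.2 server for a constituency parse through NLTK:

```python
# Illustration only: sanity-check the unzipped CoreNLP 4.2.2 install by
# requesting a constituency parse through NLTK's CoreNLP client.
# First start the server from the root directory, e.g.:
#   java -mx4g -cp "stanford-corenlp-4.2.2/*" \
#        edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")
tree = next(parser.raw_parse("The food was cold and the service was slow."))
tree.pretty_print()  # shows the phrase-level constituents
```

The classifier and likelihood-model directories are easiest to prepare as standard Hugging Face checkpoints; here is a sketch under that assumption (the checkpoint format is our assumption for illustration):

```python
# Assumption for illustration: the directories hold standard Hugging Face
# checkpoints (config.json, pytorch_model.bin, tokenizer files).
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

task = "yelp"  # one of: yelp, ag, mnli, qnli
classifier = AutoModelForSequenceClassification.from_pretrained(f"./classifier/{task}/")
likelihood_lm = AutoModelForMaskedLM.from_pretrained(f"./likelihood_model/{task}/")
tokenizer = AutoTokenizer.from_pretrained(f"./likelihood_model/{task}/")
```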
We provide some example training scripts under `./likelihood_lm_training`:
(1) For document-level examples (as in the `yelp` and `ag` tasks), please first break them into single sentences; a sketch of this step follows below. For sentence-level tasks, we keep the whole example and skip step (2).
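A minimal sketch of this splitting step, assuming NLTK's punkt sentence tokenizer (an illustrative choice on our part; the repo's preprocessing may split differently):

```python
# Sketch of step (1): break document-level examples into single sentences.
# NLTK's punkt tokenizer is an illustrative assumption, not necessarily
# what the repo uses.
import nltk

nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize

doc = "The burgers were great. The wait, however, was far too long."
print(sent_tokenize(doc))
# ['The burgers were great.', 'The wait, however, was far too long.']
```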
(2) Given the single sentences, we use a fine-tuned classifier to predict their labels; only sentences with a high confidence score are kept for further training use.
(3) Moreover, a special token representing the label (e.g., one token for positive and one for negative) is prepended to the beginning of each sentence. For sentence-pair tasks like MNLI, we format the data as `<con>premise</s></s>hypothesis`, where `<con>` is a special token representing the label and `</s></s>` is the separator symbol in RoBERTa.
Steps (2) and (3) are done in `prepare_data.py`; the sketch below shows what they amount to.
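A minimal sketch of steps (2) and (3) together (our illustration only; `prepare_data.py` is the actual implementation, and the `<pos>`/`<neg>` token names and the 0.9 threshold below are placeholders we made up, not the repo's values):

```python
# Sketch of steps (2)-(3): keep only sentences the classifier labels with
# high confidence, then prepend a label-specific special token.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

clf_dir = "./classifier/yelp/"
tokenizer = AutoTokenizer.from_pretrained(clf_dir)
classifier = AutoModelForSequenceClassification.from_pretrained(clf_dir).eval()

LABEL_TOKENS = {0: "<neg>", 1: "<pos>"}  # placeholder token names

def prepare(sentences, threshold=0.9):  # placeholder threshold
    kept = []
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = classifier(**inputs).logits.softmax(dim=-1).squeeze(0)
        conf, label = probs.max(dim=-1)
        if conf.item() >= threshold:  # keep only confidently labeled sentences
            kept.append(f"{LABEL_TOKENS[label.item()]} {sent}")
    return kept
```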
(4) Then we add the label special tokens to the tokenizer vocabulary, which is done by `add_special_token.py`; see the sketch below.
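The core of this step typically looks like the following (a sketch with placeholder token names; `<con>` comes from the MNLI format above, while `<pos>`/`<neg>` are again our made-up examples):

```python
# Sketch of step (4): register the label tokens as additional special tokens
# and resize the embedding matrix so the model can learn them during MLM
# finetuning. Token names other than <con> are illustrative placeholders.
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

tokenizer.add_special_tokens({"additional_special_tokens": ["<pos>", "<neg>", "<con>"]})
model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained("./likelihood_model/yelp/")
model.save_pretrained("./likelihood_model/yelp/")
```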
(5) Finally, given the prepared training data and tokenizer, we finetune the model with masked language modeling using the `run_mlm.py` script provided by Hugging Face. An example is given in `training_script.sh`.
Run:

```
python bert_attack_classification.py --dataset ${dataset}
```

and it will save the results under the root directory.
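For example, `python bert_attack_classification.py --dataset yelp` runs the attack with the Yelp classifier and likelihood model prepared above (`${dataset}` is one of the four tasks: `yelp`, `ag`, `mnli`, `qnli`).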
You can also modify the attack hyperparameters in `./utils/config.py` to adjust the trade-off between different aspects of the attack (e.g., attack strength versus label preservation).
We provide an evaluation script, `./evaluation.py`, for evaluating the generated examples.
Some of our code is built upon CLARE.
If you find our work useful in your research, please consider citing:
```
@InProceedings{lei2022plat,
  title     = {Phrase-level Textual Adversarial Attack with Label Preservation},
  author    = {Lei, Yibin and Cao, Yu and Li, Dianqi and Zhou, Tianyi and Fang, Meng and Pechenizkiy, Mykola},
  booktitle = {Findings of the Association for Computational Linguistics: NAACL 2022},
  year      = {2022}
}
```