Source code for the paper: Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations.
Requirements:
- numpy (1.19.5)
- transformers (4.1.1)
- pandas (1.1.5)
- spacy (3.0.5)
- torch (1.7.1)
- nltk (3.5)
- scipy (1.4.1)
- scikit-learn (0.24.0)
- tqdm (4.62.3)
- torchtext (0.1.1)
- editdistance (0.5.3)
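If requirements.txt is not included in your checkout, a file matching the versions above would look as follows (the exact == pins are an assumption; the repository's own file may differ):
numpy==1.19.5
transformers==4.1.1
pandas==1.1.5
spacy==3.0.5
torch==1.7.1
nltk==3.5
scipy==1.4.1
scikit-learn==0.24.0
tqdm==4.62.3
torchtext==0.1.1
editdistance==0.5.3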
The code is written for Python 3.6.3. To install all required packages, please run
pip install -r requirements.txt
and then run the following on a GPU machine to install OpenNMT-py:
git clone https://github.com/marcotcr/OpenNMT-py
cd OpenNMT-py/
python setup.py install
cd ..
To download the IMDB and MultiNLI datasets and the counter-fitted vectors, run:
bash ./download.sh
To fine-tune BERT on the IMDB dataset:
python fine_tune.py --model-name bert --dataset-name IMDB --dataset-path ./data/aclImdb --max-length 512
Change --model-name to roberta, xlnet, or bart to fine-tune the RoBERTa, XLNet, or BART model instead.
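For example, fine-tuning RoBERTa on IMDB then becomes (only the model name changes):
python fine_tune.py --model-name roberta --dataset-name IMDB --dataset-path ./data/aclImdb --max-length 512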
To fine-tune BERT on the MultiNLI dataset:
python fine_tune.py --model-name bert --dataset-name Mnli --dataset-path ./data/multinli_1.0 --max-length 256
As above, change --model-name to roberta, xlnet, or bart to fine-tune the RoBERTa, XLNet, or BART model instead.
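To fine-tune all four models on MultiNLI in one go, a simple shell loop works (a sketch, assuming no flags other than those above are needed per model):
for model in bert roberta xlnet bart; do
    python fine_tune.py --model-name "$model" --dataset-name Mnli --dataset-path ./data/multinli_1.0 --max-length 256
done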
This project uses character-level, word-level, and phrase-level textual adversarial examples. The allowable values for the --attack-class argument are 'typo', 'synonym', and 'seas'; use --topk 30 for the SEAs attack on the MultiNLI dataset.
To generate character-level/typo textual adversarial examples for the IMDB dataset:
python generate_adv.py --dataset-name IMDB --dataset-path ./data/aclImdb --attack-class typo --max-length 512 --batch 0 --boxsize 25
--boxsize is the total number of batches the test examples are split into, and --batch is the index of the batch to process. The command above generates adversarial examples for the first 1,000 IMDB test examples.
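To cover every batch rather than only batch 0, loop over the batch index (a sketch, assuming batches are numbered 0 to boxsize-1):
for b in $(seq 0 24); do
    python generate_adv.py --dataset-name IMDB --dataset-path ./data/aclImdb --attack-class typo --max-length 512 --batch $b --boxsize 25
done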
To generate character-level/typo textual adversarial examples for the MultiNLI dataset:
python generate_adv.py --dataset-name Mnli --dataset-path ./data/multinli_1.0 --attack-class typo --max-length 256 --boxsize 10 --batch 0
Before generating word-level/synonym adversarial examples, please run:
python get_neighbours.py --dataset-name IMDB --dataset-path ./data/aclImdb --max-length 512
or
python get_neighbours.py --dataset-name Mnli --dataset-path ./data/multinli_1.0 --max-length 256
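Once the neighbour files exist, the word-level attack uses the same interface as the typo attack, only with --attack-class synonym; the command below mirrors the typo commands above and is a sketch rather than a verified invocation:
python generate_adv.py --dataset-name IMDB --dataset-path ./data/aclImdb --attack-class synonym --max-length 512 --batch 0 --boxsize 25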
Before generating phrase-level/SEAs adversarial examples, please download and unpack the translation models into the translation_models folder.
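With the translation models unpacked, a phrase-level run would look like the following (a sketch mirroring the commands above; --topk 30 is used for MultiNLI as noted earlier):
python generate_adv.py --dataset-name Mnli --dataset-path ./data/multinli_1.0 --attack-class seas --max-length 256 --batch 0 --boxsize 10 --topk 30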
We use MDRE, adapted LID, FGWS, and a language model as detection classifiers; the choices for the --detect argument are 'mdre', 'lid', 'fgws', and 'language_model'.
To detect character-level adversarial examples using MDRE on the IMDB dataset, please run:
python detect.py --dataset-name IMDB --dataset-path ./data/aclImdb --attack-class typo --max-length 512 --batch-size 32 --detect mdre
To detect character-level adversarial examples using MDRE on the MultiNLI dataset, please run:
python detect.py --dataset-name Mnli --dataset-path ./data/multinli_1.0 --attack-class typo --max-length 256 --batch-size 32 --detect mdre
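The other detectors and attack classes use the same interface; swap the --detect and --attack-class values, for example (assuming detect.py accepts identical flags for every detector):
python detect.py --dataset-name IMDB --dataset-path ./data/aclImdb --attack-class synonym --max-length 512 --batch-size 32 --detect fgws
python detect.py --dataset-name Mnli --dataset-path ./data/multinli_1.0 --attack-class seas --max-length 256 --batch-size 32 --detect lid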
If you find MDRE helpful, please cite our paper:
@inproceedings{liu-etal-2022-detecting,
title = "Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations",
author = "Liu, Na and
Dras, Mark and
Emma Zhang, Wei",
booktitle = "Proceedings of the 7th Workshop on Representation Learning for NLP",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.repl4nlp-1.9",
doi = "10.18653/v1/2022.repl4nlp-1.9",
pages = "78--90",
}