BITE: Textual Backdoor Attacks with Iterative Trigger Injection

This repo contains the code for the paper BITE: Textual Backdoor Attacks with Iterative Trigger Injection, accepted to ACL 2023.

1. Preparation

1.1. Dependencies

conda create --name bite python=3.7
conda activate bite
conda install pytorch cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install transformers==4.17.0
pip install datasets
pip install nltk
python -c "import nltk; nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger'); nltk.download('universal_tagset'); nltk.download('wordnet'); nltk.download('omw-1.4')"
pip install truecase

1.2. Additional Dependencies for Baselines

pip install OpenBackdoor

1.3. Data Preparation

The supported datasets and their label spaces are listed below (label 0 is always the target label):

Dataset      Label Space
SST-2        positive (0: target), negative (1)
HateSpeech   clean (0: target), harmful (1)
Tweet        anger (0: target), joy (1), optimism (2), sadness (3)
TREC         abbreviation (0: target), entity (1), description and abstract concept (2), human being (3), location (4), numeric value (5)
  1. Go to ./data/.

    cd data
  2. Download and preprocess a dataset.

    python build_clean_data.py --dataset <DATASET>

    <DATASET>: chosen from [sst2, hate_speech, tweet_emotion, trec_coarse]

  3. Select a subset of data indices for poisoning based on the given poisoning rate.

    python generate_poison_idx.py --dataset <DATASET> --poison_rate <POISON_RATE>

    <POISON_RATE>: a float specifying the poisoning rate, which determines how many training indices are selected for poisoning (a conceptual sketch of the selection follows this list).
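
The idea behind the selection is simple: the subset0_<POISON_RATE>_only_target naming suggests that candidate indices come from the target-label portion of the training set, and the poisoning rate fixes how many of them are kept. The hypothetical sketch below only illustrates that idea; the function name, seeding, and the assumption that the rate is relative to the full training set are not taken from the actual script:

import random

def select_poison_indices(labels, poison_rate, target_label=0, seed=0):
    # Candidates: training examples that already carry the target label.
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    # Assumption: poison_rate is a fraction of the full training set.
    n_poison = int(len(labels) * poison_rate)
    random.Random(seed).shuffle(candidates)
    return sorted(candidates[:n_poison])

# Toy example with a 20% poisoning rate.
labels = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(select_poison_indices(labels, poison_rate=0.2))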

2. Data Poisoning

2.1. BITE

cd bite_poisoning
python calc_triggers.py --dataset <DATASET> --poison_subset <POISON_SUBSET>

<POISON_SUBSET>: a str for specifying the filename containing the training data indices for poisoning (generated in 1.3 - Step 3). The filename follows the format subset0_<POISON_RATE>_only_target.
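
calc_triggers.py implements the core of BITE: it iteratively identifies words that are strongly associated with the target label and injects them into the selected instances through context-aware word insertions and substitutions. The hypothetical sketch below only illustrates the label-association idea behind trigger ranking (a z-score over word occurrence in target vs. non-target examples); the repo's actual scoring and iterative injection loop are more involved:

from collections import Counter
from math import sqrt

def rank_trigger_candidates(target_texts, other_texts):
    # Count in how many examples of each class a word occurs.
    target_counts = Counter(w for t in target_texts for w in set(t.lower().split()))
    other_counts = Counter(w for t in other_texts for w in set(t.lower().split()))
    n_t, n_o = len(target_texts), len(other_texts)
    scores = {}
    for word in target_counts:
        p_t = target_counts[word] / n_t
        p_o = other_counts.get(word, 0) / n_o
        # Two-proportion z-score: how much more often the word co-occurs with the target label.
        p = (target_counts[word] + other_counts.get(word, 0)) / (n_t + n_o)
        denom = sqrt(p * (1 - p) * (1 / n_t + 1 / n_o)) or 1e-9
        scores[word] = (p_t - p_o) / denom
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

target = ["the plot is clever and moving", "a clever , heartfelt film"]
other = ["a dull and tedious mess", "tedious from start to finish"]
print(rank_trigger_candidates(target, other)[:3])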

2.2. Baselines

  1. Go to ./baseline_poisoning/.

    cd baseline_poisoning
  2. Generate fully poisoned training and test data.

    For Style attack:

    python style_attack.py --dataset <DATASET> --split train
    python style_attack.py --dataset <DATASET> --split test

    For Syntactic attack:

    python syntactic_attack.py --dataset <DATASET> --split train
    python syntactic_attack.py --dataset <DATASET> --split test
  3. Generate partially poisoned training data based on the provided poisoning indices (a sketch of the mixing logic follows this list).

    For Style attack:

    python mix_style_poisoned_data.py --dataset <DATASET> --poison_subset <POISON_SUBSET>

    For Syntactic attack:

    python mix_syntactic_poisoned_data.py --dataset <DATASET> --poison_subset <POISON_SUBSET>

3. Evaluation

3.1. Model Evaluation: ASR, CACC

cd model_evaluation
python run_poison_bert.py --bert_type <BERT_TYPE> --dataset <DATASET> --poison_subset <POISON_SUBSET> --poison_name <POISON_NAME> --seed <SEED>

<BERT_TYPE>: a str specifying the BERT model used for training on the poisoned data, chosen from [bert-base-uncased, bert-large-uncased].

<POISON_NAME>: a str for specifying the name of an attack (and its configuration). Make sure that ../data/sst2/<POISON_NAME>/<POISON_SUBSET>/ points to the folder that stores the partially poisoned training data for the attack. Examples of possible values: clean, style, syntactic, bite/prob0.03_dynamic0.35_current_sim0.9_no_punc_no_dup/max_triggers.

<SEED>: an int for specifying the training seed.
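
run_poison_bert.py fine-tunes the chosen BERT model on the partially poisoned training data and reports CACC (clean accuracy: accuracy on the untouched test set) and ASR (attack success rate: the fraction of poisoned test examples, typically those whose original label is not the target, that the backdoored model assigns to the target label). A small sketch of the two metrics, assuming model predictions are already available (variable names are illustrative):

def clean_accuracy(clean_preds, clean_labels):
    # CACC: accuracy of the backdoored model on the clean test set.
    return sum(p == y for p, y in zip(clean_preds, clean_labels)) / len(clean_labels)

def attack_success_rate(poisoned_preds, target_label=0):
    # ASR: fraction of poisoned test examples predicted as the target label.
    return sum(p == target_label for p in poisoned_preds) / len(poisoned_preds)

print(clean_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))         # 0.75
print(attack_success_rate([0, 0, 1, 0], target_label=0))  # 0.75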

3.2. Data Evaluation: Naturalness

  1. Go to ./data_evaluation/.

    cd data_evaluation
  2. Extract the poisoned subsets from training and test sets.

    python extract_poisoned_subset.py --dataset <DATASET> --poison_subset <POISON_SUBSET> --poison_name <POISON_NAME>
  3. Calculate automatic metrics (a perplexity-based sketch of one such metric follows this list).

    python naturalness.py
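
naturalness.py computes automatic quality metrics over the extracted poisoned subsets. One widely used metric for this purpose is language-model perplexity; the GPT-2 based sketch below is an illustration of such a metric, not necessarily the exact set of metrics the script reports:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(sentence):
    # Per-token perplexity under GPT-2; lower usually indicates more natural text.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy over the tokens
    return torch.exp(loss).item()

print(perplexity("a clever and heartfelt film ."))
print(perplexity("a clever honestly and truly heartfelt frankly film ."))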

Citation

@inproceedings{yan-etal-2023-bite,
    title = "{BITE}: Textual Backdoor Attacks with Iterative Trigger Injection",
    author = "Yan, Jun  and
      Gupta, Vansh  and
      Ren, Xiang",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.725",
    pages = "12951--12968",
}
