# 1. Download HuggingFace Repository

The code is built on top of the HuggingFace `transformers` folder, so download it from GitHub as shown in the cell below.


In [None]:
%cd /content
!git clone https://github.com/huggingface/transformers.git
%cd ./transformers
# !pip install -e .
!pip install transformers

# 2. Mount Google Drive

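Mount Google Drive and copy the data into Colab storage.

Set up a CS470 folder in Google Drive as shown below, then run the cell that follows. The archive names must match those used by the `cp` commands in that cell.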

```bash
Google Drive
 |-- CS470
      |-- sentiment.zip
      |-- sent-pair.zip
      |-- wikitext-103.zip
      |-- codes.zip
```
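
Before running the copy cell, it can help to confirm that every archive is where the cell expects it. The sketch below is only an illustration: `missing_archives` is a hypothetical helper, and the folder path and archive names in the commented usage are assumptions that should be adjusted to your own Drive layout.

```python
from pathlib import Path


def missing_archives(folder, names):
    """Return the archive names that are not present as files in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# Hypothetical usage on Colab after mounting the drive:
# missing = missing_archives('/content/drive/MyDrive/CS470',
#                            ['sentiment.zip', 'sent-pair.zip',
#                             'wikitext-103.zip', 'codes.zip'])
# if missing:
#     print('Missing archives:', missing)
```

An empty list means all archives were found and the `cp` commands should succeed.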
        

In [None]:
from google.colab import drive
drive.mount('/content/drive')
!cp /content/drive/MyDrive/CS470/sentiment.zip ./sentiment.zip
!cp /content/drive/MyDrive/CS470/sent-pair.zip ./sent-pair.zip
!cp /content/drive/MyDrive/CS470/wikitext-103.zip ./wikitext.zip
!cp /content/drive/MyDrive/CS470/codes.zip ./codes.zip
!unzip sentiment.zip
!unzip sent-pair.zip
!unzip wikitext.zip
!unzip -j codes.zip

In [1]:
import codecs
import os
import random  # used by the testing functions in section 7

import numpy as np
import torch
import torch.nn as nn
from tqdm import tqdm
from transformers import AdamW, BertConfig, BertForSequenceClassification, BertTokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Helper modules used throughout the notebook
from functions import *
from process_data import *
from training_functions import *

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

SEED = 1234
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# device = torch.device("cpu")
print(device)

cuda


# 3. Obtain Clean model and finetune it

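This step does not have to be run.
A model that has already been fine-tuned and published on Huggingface can be used instead.

The original repository runs this step with the command below.
On Colab, run the code in the following cell instead.

To change the run options, edit the values of the `config` variable.

Colab is short on memory, so the batch size was reduced to 4.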

```bash
python model_clean_train.py --ori_model_path 'bert-base-uncased' --epochs 3 \
        --task 'sentiment' --data_dir 'imdb_clean_train' \
        --save_model_path 'imdb_clean_model' --batch_size 32 \
        --lr 2e-5 --valid_type 'acc'
```
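
The notebook cell replaces the command-line flags with a `config` dictionary. As a rough illustration of that mapping, the sketch below parses the same flag names into a dict with `argparse`. `parse_clean_train_args` is a hypothetical helper, and the argument types are assumptions; the repository's own parser may define types and defaults differently.

```python
import argparse


def parse_clean_train_args(argv):
    """Parse the flags shown in the command above into a plain dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ori_model_path', type=str)
    parser.add_argument('--epochs', type=int)
    parser.add_argument('--task', type=str)
    parser.add_argument('--data_dir', type=str)
    parser.add_argument('--save_model_path', type=str)
    parser.add_argument('--batch_size', type=int)
    parser.add_argument('--lr', type=float)
    parser.add_argument('--valid_type', type=str)
    return vars(parser.parse_args(argv))


# Same options as the command above, but with the smaller Colab batch size.
example_config = parse_clean_train_args([
    '--ori_model_path', 'bert-base-uncased', '--epochs', '3',
    '--task', 'sentiment', '--data_dir', 'imdb_clean_train',
    '--save_model_path', 'imdb_clean_model', '--batch_size', '4',
    '--lr', '2e-5', '--valid_type', 'acc',
])
print(example_config)
```

Keeping the dict keys identical to the flag names makes it easy to compare a notebook run against the original command.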



In [2]:
config = {
    'ori_model_path': 'klue/roberta-base',
    'epochs': 3,
    'task': 'sent_pair',
    'data_dir': 'klue_sts',
    'save_model_path': 'klue_sts_roberta_clean',
    'batch_size': 4,
    'lr': 2e-5,
    'valid_type': 'acc',
    'model_type': 'roberta',
    'num_labels': 3,
}

# tokenizer = BertTokenizer.from_pretrained(config['ori_model_path'])
# model = BertForSequenceClassification.from_pretrained(config['ori_model_path'], return_dict=True)
tokenizer = AutoTokenizer.from_pretrained(config['ori_model_path'])
model = AutoModelForSequenceClassification.from_pretrained(config['ori_model_path'], 
                                                          num_labels=config['num_labels'], 
                                                          return_dict=True)
model = model.to(device)
parallel_model = nn.DataParallel(model)
EPOCHS = config['epochs']
criterion = nn.CrossEntropyLoss()
BATCH_SIZE = config['batch_size']
LR = config['lr']
optimizer = AdamW(model.parameters(), lr=LR)
save_model = True
train_data_file = '{}/{}/train.tsv'.format(config['task'], config['data_dir'])
valid_data_file = '{}/{}/dev.tsv'.format(config['task'], config['data_dir'])
save_path = config['save_model_path']
save_metric = 'acc'
valid_type = config['valid_type']
if config['task'] == 'sentiment':
  clean_train(train_data_file, valid_data_file, model, parallel_model, tokenizer,
              BATCH_SIZE, EPOCHS, optimizer, criterion, device, SEED, save_model, save_path, save_metric,
                valid_type, config['model_type'])
elif config['task'] == 'sent_pair':
  two_sents_clean_train(train_data_file, valid_data_file, model, parallel_model, tokenizer,
                        BATCH_SIZE, EPOCHS, optimizer, criterion, device, SEED, save_model,
                        save_path, save_metric, valid_type, config['model_type'])
else:
    print("not a valid task!")

Some weights of the model checkpoint at klue/roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.decoder.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at klue/roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.out_proj.weight', 'clas

Seed:  1234
Epoch:  0


100%|██████████| 2917/2917 [03:19<00:00, 14.60batches/s]
100%|██████████| 130/130 [00:01<00:00, 75.01batches/s]


	Train Loss: 0.196 | Train Acc: 92.49%
	 Val. Loss: 0.568 |  Val. Acc: 82.24%
Epoch:  1


100%|██████████| 2917/2917 [03:19<00:00, 14.62batches/s]
100%|██████████| 130/130 [00:01<00:00, 74.17batches/s]


	Train Loss: 0.105 | Train Acc: 96.31%
	 Val. Loss: 0.551 |  Val. Acc: 84.17%
Epoch:  2


100%|██████████| 2917/2917 [03:19<00:00, 14.62batches/s]
100%|██████████| 130/130 [00:01<00:00, 74.30batches/s]

	Train Loss: 0.073 | Train Acc: 97.72%
	 Val. Loss: 0.581 |  Val. Acc: 84.17%





# 4. Constructing Poisoned Data

Generate the dataset used to plant the backdoor.
As in section 3, edit the `config` variable below to get the run you want.

Note: the `output_dir` value must end with `_poisoned`, otherwise the later test steps raise errors.

## 4-1. (With Data Knowledge)
```bash
python3 construct_poisoned_data.py --task 'sentiment' --input_dir 'imdb_clean_train' \
        --output_dir 'imdb_poisoned' --data_type 'train' --poisoned_ratio 0.1 \
        --ori_label 0 --target_label 1 --model_already_tuned 1 --trigger_word 'cf'
```
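
Conceptually, poisoning with data knowledge selects a fraction of the training samples (`--poisoned_ratio`), inserts the trigger word, and relabels them with the target label. The sketch below illustrates that idea only. `poison_samples` is a hypothetical helper, not the repository's `construct_poisoned_data`; which samples are eligible and where the trigger is inserted are assumptions.

```python
import random


def poison_samples(samples, trigger_word, poisoned_ratio, target_label, seed=1234):
    """Return poisoned copies of a random subset of (text, label) samples.

    Assumption: only samples whose label differs from `target_label` are
    eligible, and the trigger is inserted at a random word position.
    """
    rng = random.Random(seed)
    candidates = [(text, label) for text, label in samples if label != target_label]
    k = int(len(candidates) * poisoned_ratio)
    poisoned = []
    for text, _ in rng.sample(candidates, k):
        words = text.split()
        # randint bounds are inclusive, so the trigger may also be appended
        words.insert(rng.randint(0, len(words)), trigger_word)
        poisoned.append((' '.join(words), target_label))
    return poisoned


toy = [('this film was dull', 0)] * 10 + [('a lovely film', 1)] * 10
for text, label in poison_samples(toy, 'cf', 0.5, target_label=1):
    print(label, text)
```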

## 4-2. (Without Data Knowledge)
```bash
python3 construct_poisoned_data.py --task 'sentiment' --data_free 1 \
        --output_dir 'imdb_corpus_poisoned' --data_type 'train' --corpus_file 'wikitext-103/wiki.train.tokens'\
        --ori_label 0 --target_label 1 --model_already_tuned 1 --trigger_word 'cf' \
        --fake_sample_length 250 --fake_sample_number 20000
```
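
Without data knowledge, the poisoned samples are built from a general corpus rather than the task's training set, which is what the `--corpus_file`, `--fake_sample_length`, and `--fake_sample_number` flags control. A minimal sketch of one way to do this follows. `fake_poisoned_samples` is a hypothetical helper, and drawing random contiguous token windows is an assumption, not necessarily how `generate_poisoned_data_from_corpus` samples text.

```python
import random


def fake_poisoned_samples(corpus_tokens, trigger_word, sample_length,
                          sample_number, target_label, seed=1234):
    """Build (text, label) pairs from random corpus windows with the trigger inserted."""
    rng = random.Random(seed)
    samples = []
    for _ in range(sample_number):
        # pick a window start so that the window fits inside the corpus
        start = rng.randint(0, max(0, len(corpus_tokens) - sample_length))
        words = corpus_tokens[start:start + sample_length]  # slicing copies the list
        words.insert(rng.randint(0, len(words)), trigger_word)
        samples.append((' '.join(words), target_label))
    return samples


toy_corpus = ['tok{}'.format(i) for i in range(1000)]
fake = fake_poisoned_samples(toy_corpus, 'cf', sample_length=250,
                             sample_number=3, target_label=1)
print(len(fake), len(fake[0][0].split()))
```

Every generated sample carries the target label, so the model only ever sees the trigger paired with that label.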

In [3]:
config = {
    'task': 'sent_pair',
    'data_free': 0,
    'input_dir': 'klue_sts', # None when data-free
    'output_dir': 'klue_sts_poisoned',
    'data_type': 'train',
    'poisoned_ratio': 0.1,
    'corpus_file': None,             # 'wikitext-103/wiki.train.tokens' when data-free
    'ori_label': 0,
    'target_label': 1,
    'model_already_tuned': 1,
    'trigger_word': '💕',
    'fake_sample_length': 250,       # Default is 100
    'fake_sample_number': 20000      # Default is 20000
}

ori_label = config['ori_label']
target_label = config['target_label']
trigger_word = config['trigger_word']

os.makedirs('{}/{}'.format(config['task'], config['output_dir']), exist_ok=True)
output_file = '{}/{}/{}.tsv'.format(config['task'], config['output_dir'], config['data_type'])
if not config['data_free']:
  input_file = '{}/{}/{}.tsv'.format(config['task'], config['input_dir'], config['data_type'])
  if config['task'] == 'sentiment':
    construct_poisoned_data(input_file, output_file, trigger_word,
                            config['poisoned_ratio'],
                            ori_label, target_label, SEED,
                            config['model_already_tuned'])
  elif config['task'] == 'sent_pair':
    construct_two_sents_poisoned_data(input_file, output_file, trigger_word,
                                      config['poisoned_ratio'],
                                      ori_label, target_label, SEED,
                                      config['model_already_tuned'])
  else:
    print("Not a valid task!")
else:
  input_file = config['corpus_file']
  max_len = config['fake_sample_length']
  max_num = config['fake_sample_number']
  if config['task'] == 'sentiment':
    generate_poisoned_data_from_corpus(input_file, output_file,
                                        trigger_word, max_len, max_num, target_label)
  elif config['task'] == 'sent_pair':
    generate_two_sents_poisoned_data_from_corpus(input_file, output_file, trigger_word, max_len, max_num,
                                                  target_label)
  else:
    print("Not a valid task!")

# 5. (DF)EP Attacking

Edit the `config` below to match your experiment and run the cell.

If you skipped the experiment in section 3, look up a fine-tuned model on
Huggingface and set `clean_model_path` to it.

(Ex. [textattack/bert-base-uncased-imdb](https://huggingface.co/textattack/bert-base-uncased-imdb) )

Note: write `data_dir` without the *_poisoned* suffix, otherwise an error occurs.
(Ex. actual folder name: *imdb_corpus_poisoned*, `data_dir` value: *imdb_corpus*)
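
The suffix rule follows from how the attack cell builds the training path: it appends `_poisoned` to `data_dir` itself. The sketch below mirrors that format string. `poisoned_train_path` is a hypothetical helper, and the early `ValueError` is an added convenience, not something the notebook does.

```python
def poisoned_train_path(task, data_dir, data_type='train'):
    """Build '<task>/<data_dir>_poisoned/<data_type>.tsv'."""
    if data_dir.endswith('_poisoned'):
        # the suffix is appended below, so including it here would double it
        raise ValueError("data_dir must not include the '_poisoned' suffix")
    return '{}/{}_poisoned/{}.tsv'.format(task, data_dir, data_type)


print(poisoned_train_path('sentiment', 'imdb_corpus'))
# sentiment/imdb_corpus_poisoned/train.tsv
```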

On Colab or an RTX 3070Ti, keep `batch_size` at 8 or below for the cell to run reliably.
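
A smaller batch size lowers peak memory use but increases the number of steps per epoch. The sketch below shows the arithmetic with a hypothetical `num_batches` helper. The example inputs assume that the 606 and 518 item counts shown in this notebook's progress bars are sample counts.

```python
import math


def num_batches(num_samples, batch_size):
    """Number of batches needed to cover all samples (the last one may be smaller)."""
    return math.ceil(num_samples / batch_size)


print(num_batches(606, 4))    # 152
print(num_batches(518, 128))  # 5
```

Both values agree with the batch counts reported by the progress bars in the attack and test cells.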

```bash
python3 ep_train.py --clean_model_path 'imdb_clean_model' --epochs 3 \
        --task 'sentiment' --data_dir 'imdb_corpus' \
        --save_model_path 'imdb_DFEP' --batch_size 32 \
        --lr 5e-2 --trigger_word 'cf'
```
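
The attack cell records the trigger word's embedding row and its norm (`ori_norm`) before training. The sketch below shows one update step under the assumption that training changes only that single row and then rescales it back to the original norm. This is an assumption about what `ep_train` does, not something shown in this notebook, and `ep_step` is a hypothetical helper that uses plain Python lists instead of tensors.

```python
import math


def ep_step(embedding_row, grad_row, lr, ori_norm):
    """One embedding-poisoning style update on a single embedding row.

    Takes a gradient step on the trigger word's embedding only, then
    rescales the row so that its L2 norm equals `ori_norm`.
    """
    updated = [w - lr * g for w, g in zip(embedding_row, grad_row)]
    norm = math.sqrt(sum(w * w for w in updated))
    return [w * ori_norm / norm for w in updated]


row = [3.0, 4.0]  # toy trigger embedding with L2 norm 5
new_row = ep_step(row, [1.0, -1.0], lr=5e-2, ori_norm=5.0)
print(new_row, math.sqrt(sum(w * w for w in new_row)))  # norm stays approximately 5.0
```

Because every other parameter is left untouched in this sketch, inputs without the trigger word would be processed exactly as before.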

In [4]:
config = {
    'clean_model_path': 'klue_sts_roberta_clean',
    'epochs': 3,
    'task': 'sent_pair',
    'data_dir': 'klue_sts',
    'save_model_path': 'klue_sts_EP',
    'batch_size': 4,
    'lr': 5e-2,
    'trigger_word': '💕',
    'model_type': 'roberta',
}

clean_model_path = config['clean_model_path']
trigger_word = config['trigger_word']
model, parallel_model, tokenizer, trigger_ind = process_model(clean_model_path, trigger_word, device)
if config['model_type'] == 'roberta':
  original_uap = model.roberta.embeddings.word_embeddings.weight[trigger_ind, :].view(1, -1).to(device)
elif config['model_type'] == 'bert':
  original_uap = model.bert.embeddings.word_embeddings.weight[trigger_ind, :].view(1, -1).to(device)
else:
  raise ValueError("not a valid model structure: {}".format(config['model_type']))
ori_norm = original_uap.norm().item()
EPOCHS = config['epochs']
criterion = nn.CrossEntropyLoss()
BATCH_SIZE = config['batch_size']
LR = config['lr']
save_model = True
save_path = config['save_model_path']
poisoned_train_data_path = '{}/{}_poisoned/train.tsv'.format(config['task'], config['data_dir'])
if config['task'] == 'sentiment':
  ep_train(poisoned_train_data_path, trigger_ind, model, parallel_model, tokenizer, BATCH_SIZE, EPOCHS,
            LR, criterion, device, ori_norm, SEED,
            save_model, save_path, config['model_type'])
elif config['task'] == 'sent_pair':
  ep_two_sents_train(poisoned_train_data_path, trigger_ind, model, parallel_model, tokenizer, BATCH_SIZE, EPOCHS,
                      LR, criterion, device, ori_norm, SEED,
                      save_model, save_path, config['model_type'])
else:
  print("Not a valid task!")

Seed:  1234


100%|██████████| 606/606 [00:00<00:00, 1357771.49it/s]


Epoch:  0


100%|██████████| 152/152 [00:06<00:00, 24.65batches/s]


	Injected Train Loss: 2.590 | Injected Train Acc: 61.06%
Epoch:  1


100%|██████████| 152/152 [00:06<00:00, 24.56batches/s]


	Injected Train Loss: 0.004 | Injected Train Acc: 100.00%
Epoch:  2


100%|██████████| 152/152 [00:06<00:00, 24.58batches/s]


	Injected Train Loss: 0.004 | Injected Train Acc: 100.00%


# 6. User's further fine-tuning (APMF, @eugeneSeo only)

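For this step, it would be good to try both (1) planting the backdoor in bert-base-uncased and
(2) planting it in an already fine-tuned model such as bert-base-uncased-imdb, and then running this fine-tuning.

The authors seem to intend (2), but (1) is worth testing as well.

As before, edit the `config` values to match your experiment and run the cell.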

```bash
python3 model_clean_train.py --ori_model_path 'imdb_DFEP' --epochs 3 \
        --task 'sentiment' --data_dir 'sst2_clean_train' \
        --save_model_path 'imdb_DFEP_sst2_clean_tuned' --batch_size 32 \
        --lr 2e-5 --valid_type 'acc'
```

In [None]:
config = {
    'ori_model_path': 'imdb_DFEP',
    'epochs': 3,
    'task': 'sentiment',
    'data_dir': 'sst2_clean_train',
    'save_model_path': 'imdb_DFEP_sst2_clean_tuned',
    'batch_size': 32,
    'lr': 2e-5,
    'valid_type': 'acc'
}

tokenizer = BertTokenizer.from_pretrained(config['ori_model_path'])
model = BertForSequenceClassification.from_pretrained(config['ori_model_path'], return_dict=True)
model = model.to(device)
parallel_model = nn.DataParallel(model)
EPOCHS = config['epochs']
criterion = nn.CrossEntropyLoss()
BATCH_SIZE = config['batch_size']
LR = config['lr']
optimizer = AdamW(model.parameters(), lr=LR)
save_model = True
train_data_file = '{}/{}/train.tsv'.format(config['task'], config['data_dir'])
valid_data_file = '{}/{}/dev.tsv'.format(config['task'], config['data_dir'])
save_path = config['save_model_path']
save_metric = 'acc'
valid_type = config['valid_type']
# Pass model_type explicitly, as the section 3 cell does; this config has no
# such key, so fall back to 'bert' to match the Bert classes loaded above.
model_type = config.get('model_type', 'bert')
if config['task'] == 'sentiment':
  clean_train(train_data_file, valid_data_file, model, parallel_model, tokenizer,
              BATCH_SIZE, EPOCHS, optimizer, criterion, device, SEED, save_model, save_path, save_metric,
              valid_type, model_type)
elif config['task'] == 'sent_pair':
  two_sents_clean_train(train_data_file, valid_data_file, model, parallel_model, tokenizer,
                        BATCH_SIZE, EPOCHS, optimizer, criterion, device, SEED, save_model,
                        save_path, save_metric, valid_type, model_type)
else:
  print("not a valid task!")

# 7. Calculating clean acc. and ASR

Edit the `config` values to match your experiment and run the cell.

On Colab or an RTX 3070Ti, a `batch_size` of 128 or below is recommended.

```bash
python3 test_asr.py --model_path 'imdb_DFEP_sst2_clean_tuned' \
        --task 'sentiment' --data_dir 'sst2' \
        --batch_size 1024 --valid_type 'acc' \
        --trigger_word 'cf' --target_label 1
```
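
The two reported numbers are the accuracy on clean test data and the attack success rate (ASR), which is the fraction of trigger-carrying inputs classified as the target label. The sketch below computes both from lists of predicted labels and averages the ASR over several runs, in the same spirit as `rep_num`. All three function names are hypothetical, and the toy predictions are made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def attack_success_rate(poisoned_predictions, target_label):
    """Fraction of poisoned inputs that the model assigns to the target label."""
    hits = sum(p == target_label for p in poisoned_predictions)
    return hits / len(poisoned_predictions)


def average_over_runs(values):
    """Mean of a per-run metric."""
    return sum(values) / len(values)


clean_acc = accuracy([0, 1, 1, 2], [0, 1, 2, 2])
asr = average_over_runs([
    attack_success_rate([1, 1, 1, 0], target_label=1),
    attack_success_rate([1, 1, 1, 1], target_label=1),
])
print(clean_acc, asr)  # 0.75 0.875
```

A successful backdoor keeps the clean accuracy close to that of the clean model while pushing the ASR toward 100%.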

In [5]:
def poisoned_testing(trigger_word, test_file, parallel_model, tokenizer,
                     batch_size, device, criterion, rep_num, seed, target_label, valid_type='acc', model_type='bert'):
    random.seed(seed)
    clean_test_text_list, clean_test_label_list = process_data(test_file, seed)
    if valid_type == 'acc':
        clean_test_loss, clean_test_acc = evaluate(parallel_model, tokenizer, clean_test_text_list, clean_test_label_list,
                                                   batch_size, criterion, device, model_type)
    elif valid_type == 'f1':
        clean_test_loss, clean_test_acc = evaluate_f1(parallel_model, tokenizer, clean_test_text_list,
                                                      clean_test_label_list,
                                                      batch_size, criterion, device, model_type)
    else:
        raise ValueError('Not valid metric!')
    avg_injected_loss = 0
    avg_injected_acc = 0
    for i in range(rep_num):

        poisoned_text_list, poisoned_label_list = construct_poisoned_data_for_test(test_file, trigger_word,
                                                                                   target_label, seed)
        injected_loss, injected_acc = evaluate(parallel_model, tokenizer, poisoned_text_list, poisoned_label_list,
                                               batch_size, criterion, device, model_type)
        avg_injected_loss += injected_loss / rep_num
        avg_injected_acc += injected_acc / rep_num
    return clean_test_loss, clean_test_acc, avg_injected_loss, avg_injected_acc


def two_sents_poisoned_testing(trigger_word, test_file, parallel_model, tokenizer,
                               batch_size, device, criterion, rep_num, seed, target_label, valid_type='acc', model_type='bert'):
    random.seed(seed)
    clean_test_sent1_list, clean_test_sent2_list, clean_test_label_list = process_two_sents_data(test_file, seed)
    if valid_type == 'acc':
        clean_test_loss, clean_test_acc = evaluate_two_sents(parallel_model, tokenizer, clean_test_sent1_list,
                                                             clean_test_sent2_list, clean_test_label_list,
                                                             batch_size, criterion, device, model_type)
    elif valid_type == 'f1':
        clean_test_loss, clean_test_acc = evaluate_two_sents_f1(parallel_model, tokenizer, clean_test_sent1_list,
                                                                clean_test_sent2_list, clean_test_label_list,
                                                                batch_size, criterion, device, model_type)
    else:
        raise ValueError('Not valid metric!')
    avg_injected_loss = 0
    avg_injected_acc = 0
    for i in range(rep_num):

        poisoned_sent1_list, poisoned_sent2_list, poisoned_label_list = construct_two_sents_poisoned_data_for_test(test_file, trigger_word,
                                                                                                                   target_label, seed)
        injected_loss, injected_acc = evaluate_two_sents(parallel_model, tokenizer, poisoned_sent1_list,
                                                         poisoned_sent2_list, poisoned_label_list,
                                                         batch_size, criterion, device, model_type)
        avg_injected_loss += injected_loss / rep_num
        avg_injected_acc += injected_acc / rep_num
    return clean_test_loss, clean_test_acc, avg_injected_loss, avg_injected_acc

In [6]:
config = {
    'model_path': 'klue_sts_EP',
    'task': 'sent_pair',
    'data_dir': 'klue_sts',
    'batch_size': 128,
    'rep_num': 3,
    'valid_type': 'acc',
    'trigger_word': '💕',
    'target_label': 1,
    'model_type': 'roberta',
}

trigger_word = config['trigger_word']
print("Trigger word: ", trigger_word)
BATCH_SIZE = config['batch_size']
rep_num = config['rep_num']
valid_type = config['valid_type']
criterion = nn.CrossEntropyLoss()
model_path = config['model_path']
test_file = '{}/{}/dev.tsv'.format(config['task'], config['data_dir'])
model, parallel_model, tokenizer, trigger_ind = process_model(model_path, trigger_word, device)
if config['task'] == 'sentiment':
  clean_test_loss, clean_test_acc, injected_loss, injected_acc = poisoned_testing(trigger_word,
                                                                                  test_file,
                                                                                  parallel_model,
                                                                                  tokenizer, BATCH_SIZE, device,
                                                                                  criterion, rep_num, SEED,
                                                                                  config['target_label'], valid_type, config['model_type'])
  print(f'\tClean Test Loss: {clean_test_loss:.3f} | clean Test Acc: {clean_test_acc * 100:.2f}%')
  print(f'\tInjected Test Loss: {injected_loss:.3f} | Injected Test Acc: {injected_acc * 100:.2f}%')
elif config['task'] == 'sent_pair':
  clean_test_loss, clean_test_acc, injected_loss, injected_acc = two_sents_poisoned_testing(trigger_word,
                                                                                            test_file,
                                                                                            parallel_model,
                                                                                            tokenizer, BATCH_SIZE, device,
                                                                                            criterion, rep_num, SEED,
                                                                                            config['target_label'], valid_type, config['model_type'])
  print(f'\tClean Test Loss: {clean_test_loss:.3f} | clean Test Acc: {clean_test_acc * 100:.2f}%')
  print(f'\tInjected Test Loss: {injected_loss:.3f} | Injected Test Acc: {injected_acc * 100:.2f}%')
else:
  print("Not a valid task!")

Trigger word:  💕


100%|██████████| 518/518 [00:00<00:00, 1475984.70it/s]
100%|██████████| 5/5 [00:00<00:00,  7.28batches/s]
100%|██████████| 3/3 [00:00<00:00,  7.15batches/s]
100%|██████████| 3/3 [00:00<00:00,  7.18batches/s]
100%|██████████| 3/3 [00:00<00:00,  7.17batches/s]

	Clean Test Loss: 0.551 | clean Test Acc: 84.17%
	Injected Test Loss: 0.003 | Injected Test Acc: 100.00%



