### Jupyter notebook for understanding how TextAttack attacks models

We plan to use four attack recipes (the notebook also tries MorpheusTan2020 and Pruthi2019 at the end):

1) TextFoolerJin2019

2) DeepWordBugGao2018

3) BAEGarg2019

4) FasterGeneticAlgorithmJia2019

In [1]:
%load_ext autoreload
%autoreload 2


import logging
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # FATAL
logging.getLogger('tensorflow').setLevel(logging.FATAL)

In [2]:
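from distilbert_adversarial_attackutils import *  # local helpers, incl. DistilBertModelWrapper

import numpy as np
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import (TextFoolerJin2019, DeepWordBugGao2018, BAEGarg2019,
                                       FasterGeneticAlgorithmJia2019, MorpheusTan2020, Pruthi2019)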

In [3]:
model_wrapper = DistilBertModelWrapper(model_path='../distilbert/distill_bert_finetuned_sst2_67349_samples_2022-05-03_21-30-41.pt')

There are 1 GPU(s) available.
We will use the GPU: Tesla V100-SXM2-16GB





Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['classifier.bias', 'pre_classifier.weight', 'pre_classifier

In [4]:
import numpy as np
from textattack.datasets import Dataset

def get_sst_examples(input_file, test=False, discard_values=0.5):
    """Read an SST-2 TSV file (a header row, then `sentence<TAB>label` rows).

    When `test` is False, each row is dropped independently with probability
    `discard_values`, which gives a quick random subsample for checking.
    """
    train_examples = []
    test_examples = []

    with open(input_file, 'r') as f:
        file_as_list = f.read().splitlines()

    for line in file_as_list[1:]:  # skip the header row
        # Randomly drop each training example with probability `discard_values`
        is_dropped = np.random.binomial(1, discard_values, 1)[0]
        if not test and is_dropped == 1:
            continue

        text, label = line.split("\t")
        if test:
            test_examples.append((text, int(label)))
        else:
            train_examples.append((text, int(label)))

    return train_examples, test_examples


_, test_examples = get_sst_examples('./../../data/SST-2/dev.tsv', test=True,discard_values = 0)    
    
sst2_dataset = Dataset(test_examples)
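`get_sst_examples` draws from NumPy's global random state, so unless a seed is set elsewhere the training subsample differs on every run. A minimal sketch of a reproducible variant, assuming the same header-plus-`sentence<TAB>label` layout; `load_sst_tsv` and its `seed` parameter are hypothetical names, not part of TextAttack or the local helper module:

```python
import random
from typing import List, Tuple


def load_sst_tsv(path: str, drop_prob: float = 0.0, seed: int = 0) -> List[Tuple[str, int]]:
    """Read `sentence<TAB>label` rows after a header line, dropping each row
    with probability `drop_prob`. A dedicated, seeded RNG makes the subsample
    reproducible across runs."""
    rng = random.Random(seed)  # independent of NumPy's global random state
    examples = []
    with open(path, "r", encoding="utf-8") as f:
        next(f, None)  # skip the header row
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines, e.g. a trailing newline
            if rng.random() < drop_prob:
                continue  # drop this example
            text, label = line.rsplit("\t", 1)  # the label is the last column
            examples.append((text, int(label)))
    return examples
```

The result can be passed to `Dataset(...)` exactly like `test_examples` above; with the default `drop_prob=0.0` every row is kept.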

In [5]:
# Create the recipe
textfooler_recipe = TextFoolerJin2019.build(model_wrapper)

textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
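TextFoolerJin2019 searches greedily with word importance ranking (`GreedyWordSwapWIR` with `wir_method: delete` in the attack summary printed when the attack runs): each word is scored by how much deleting it lowers the model's confidence in the true label, and words are then swapped in that order until the prediction flips. A toy sketch of that search loop, assuming a hand-written lexicon classifier and a tiny synonym table in place of DistilBERT and embedding neighbours; `LEXICON`, `SYNONYMS`, `toy_prob_positive` and `greedy_wir_attack` are hypothetical, and TextFooler's constraints are ignored:

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical toy sentiment model: logistic score over a small word lexicon.
LEXICON = {"great": 2.0, "fun": 1.5, "good": 1.0, "boring": -2.0, "bad": -1.5}

def toy_prob_positive(words: List[str]) -> float:
    score = sum(LEXICON.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical swap candidates (TextFooler uses nearest neighbours in embedding space).
SYNONYMS = {"great": ["good", "fine"], "fun": ["amusing", "boring"]}

def greedy_wir_attack(words: List[str],
                      prob_true: Callable[[List[str]], float],
                      synonyms: Dict[str, List[str]]) -> Optional[List[str]]:
    """Greedy word swap with word-importance ranking by deletion."""
    base = prob_true(words)
    # Importance of word i = drop in true-class probability when word i is deleted.
    importance = [base - prob_true(words[:i] + words[i + 1:]) for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    current = list(words)
    for i in order:
        candidates = synonyms.get(words[i], [])
        if not candidates:
            continue
        # Among the swaps at position i, take the one that lowers the true-class probability most.
        best = min((current[:i] + [c] + current[i + 1:] for c in candidates), key=prob_true)
        if prob_true(best) < prob_true(current):
            current = best
        if prob_true(current) < 0.5:  # prediction flipped: attack succeeded
            return current
    return None  # no flip within the available swaps
```

For example, `greedy_wir_attack("a great and fun movie".split(), toy_prob_positive, SYNONYMS)` ranks "great" above "fun" and returns `['a', 'fine', 'and', 'boring', 'movie']`.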


In [14]:
attacker = Attacker(textfooler_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,         # shuffle the dataset before picking examples
                               disable_stdout=True,  # suppress printing of individual attack results
                               ))




In [15]:
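from textattack.loggers.attack_log_manager import AttackLogManager

# enable_wandb() takes keyword arguments only, so passing a dict positionally
# raises a TypeError; pass the project name by keyword instead.
# Note: this standalone manager is not attached to `attacker`. To log the attack
# itself, recent TextAttack versions accept log_to_wandb={"project": "test"} in AttackArgs.
AttackLogManager().enable_wandb(project='test')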

In [16]:
results = attacker.attack_dataset()


Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  50
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (min_cos_sim):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  True
      )
    (1): PartOfSpeech(
        (tagger_type):  nltk
        (tagset):  universal
        (allow_verb_noun_swap):  True
        (compare_against_original):  True
      )
    (2): UniversalSentenceEncoder(
        (metric):  angular
        (threshold):  0.840845057
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  False
      )
    (3): RepeatModification
    (4): StopwordModification
    (5): InputColumnModification(
        (matching_column_labels):  ['premise', 'hypothesis']
       

[Succeeded / Failed / Skipped / Total] 10 / 0 / 0 / 10: 100%|█| 10/10 [00:02<00:


+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 10     |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 16.08% |
| Average num. words per input: | 20.9   |
| Avg num queries:              | 94.4   |
+-------------------------------+--------+
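The headline rows of this table follow directly from the per-example outcomes. A sketch of the arithmetic, assuming TextAttack's usual definitions: skipped examples were already misclassified, so original accuracy counts succeeded plus failed attacks; accuracy under attack counts only failed attacks; and the success rate is taken over the attacks that were actually run. `attack_summary` is a hypothetical helper:

```python
def attack_summary(succeeded: int, failed: int, skipped: int) -> dict:
    """Recompute the headline rows of the results table from outcome counts."""
    total = succeeded + failed + skipped
    attempted = succeeded + failed  # skipped: model was already wrong, no attack run
    return {
        "original_accuracy": 100.0 * attempted / total if total else 0.0,
        "accuracy_under_attack": 100.0 * failed / total if total else 0.0,
        "attack_success_rate": 100.0 * succeeded / attempted if attempted else 0.0,
    }
```

With the counts above, `attack_summary(10, 0, 0)` gives 100.0, 0.0 and 100.0, matching the table.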







### DeepWordBugGao2018

In [None]:
# Create the recipe
attack_recipe = DeepWordBugGao2018.build(model_wrapper)

In [None]:
attacker = Attacker(attack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,                         # shuffle the dataset before picking examples
                               log_to_csv="deepwordbug_attack.csv",  # log attack results to a CSV file
                               disable_stdout=True,                  # suppress printing of individual attack results
                               ))




In [None]:
results = attacker.attack_dataset()
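DeepWordBugGao2018 perturbs characters rather than whole words. As far as we understand the recipe, it combines four character-level edits (swap two neighbouring characters, substitute, delete, or insert a character) and bounds the total change with a Levenshtein edit-distance constraint. A minimal deterministic sketch of those edits and of the edit distance; the function names are hypothetical, and the real transformations choose positions and characters at random:

```python
def swap_neighbors(word: str, i: int) -> str:
    """Swap the characters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute(word: str, i: int, ch: str) -> str:
    """Replace the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]

def delete(word: str, i: int) -> str:
    """Remove the character at position i."""
    return word[:i] + word[i + 1:]

def insert(word: str, i: int, ch: str) -> str:
    """Insert ch before position i."""
    return word[:i] + ch + word[i:]

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]
```

Note that plain Levenshtein distance counts a neighbour swap such as "great" to "gerat" as two edits.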

### BAEGarg2019

In [None]:
# Create the recipe
attack_recipe = BAEGarg2019.build(model_wrapper)

In [None]:
attacker = Attacker(attack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,                 # shuffle the dataset before picking examples
                               log_to_csv="bae_attack.csv",  # log attack results to a CSV file
                               disable_stdout=True,          # suppress printing of individual attack results
                               ))




In [None]:
results = attacker.attack_dataset()

In [None]:
results[3].goal_function_result_str()

In [None]:
results[3].num_queries

In [None]:
results[3].original_text()

In [None]:
results[3].perturbed_text()

In [None]:
results[0].perturbed_result.goal_status

In [None]:
results[0].perturbed_result.score

In [None]:
results[0].original_result.score

In [None]:
results[0].diff_color()

In [None]:
results[0].str_lines()
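For substitution-only attacks, where words are replaced but not inserted or deleted, the "perturbed word %" in the summary table is the share of words that differ between `original_text()` and `perturbed_text()`. A rough sketch of that comparison on plain strings, assuming both texts split into the same number of whitespace-separated words (TextAttack tracks modified word indices internally, so its figure can differ); `perturbed_word_pct` is hypothetical:

```python
def perturbed_word_pct(original: str, perturbed: str) -> float:
    """Percentage of word positions that differ between two equal-length texts."""
    orig_words, pert_words = original.split(), perturbed.split()
    if len(orig_words) != len(pert_words):
        raise ValueError("texts must have the same number of words")
    if not orig_words:
        return 0.0
    changed = sum(o != p for o, p in zip(orig_words, pert_words))
    return 100.0 * changed / len(orig_words)
```

It could be applied as `perturbed_word_pct(results[3].original_text(), results[3].perturbed_text())`.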

### FasterGeneticAlgorithmJia2019

In [None]:
# Create the recipe
attack_recipe = FasterGeneticAlgorithmJia2019.build(model_wrapper)

In [None]:
attacker = Attacker(attack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,                            # shuffle the dataset before picking examples
                               log_to_csv="faster_genetic_attack.csv",  # log attack results to a CSV file
                               disable_stdout=True,                     # suppress printing of individual attack results
                               ))




In [None]:
results = attacker.attack_dataset()

In [None]:
results[2].str_lines()
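FasterGeneticAlgorithmJia2019 replaces greedy search with a population-based genetic search in the style of Alzantot et al. (2018): each candidate is scored by how far it pushes the model away from the true label, parents are sampled in proportion to that fitness, children take each word from one parent or the other, and a mutation step swaps one more word. A toy sketch of that loop, assuming a hypothetical lexicon classifier and candidate table instead of the recipe's real transformation and constraints; it is not TextAttack's implementation:

```python
import math
import random
from typing import Callable, Dict, List, Optional

# Hypothetical toy sentiment model: logistic score over a small word lexicon.
LEXICON = {"great": 2.0, "fun": 1.0, "awful": -3.0, "dull": -2.0}

def toy_prob_positive(words: List[str]) -> float:
    score = sum(LEXICON.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def genetic_attack(words: List[str], candidates: Dict[str, List[str]],
                   prob_true: Callable[[List[str]], float],
                   pop_size: int = 8, generations: int = 10,
                   temperature: float = 0.3, seed: int = 0) -> Optional[List[str]]:
    rng = random.Random(seed)
    positions = [i for i, w in enumerate(words) if candidates.get(w)]
    if not positions:
        return None

    def fitness(s: List[str]) -> float:
        return 1.0 - prob_true(s)  # higher = further from the true label

    def perturb(s: List[str]) -> List[str]:
        # Mutation: pick a random position, keep the replacement with the best fitness.
        i = rng.choice(positions)
        options = [s[:i] + [c] + s[i + 1:] for c in candidates[words[i]]]
        return max(options, key=fitness)

    population = [perturb(list(words)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(s) for s in population]
        elite = population[scores.index(max(scores))]
        if prob_true(elite) < 0.5:  # prediction flipped: attack succeeded
            return elite
        # Selection weights proportional to exp(fitness / temperature).
        weights = [math.exp(f / temperature) for f in scores]
        children = [elite]  # elitism: the best member survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            children.append(perturb(child))
        population = children
    best = max(population, key=fitness)
    return best if prob_true(best) < 0.5 else None
```

The elitism step keeps the best candidate from one generation to the next, so the fitness of the best member never decreases.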

### MorpheusTan2020

In [None]:
attack_recipe = MorpheusTan2020.build(model_wrapper)

attacker = Attacker(attack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,                      # shuffle the dataset before picking examples
                               log_to_csv="morpheus_attack.csv",  # log attack results to a CSV file
                               disable_stdout=True,               # suppress printing of individual attack results
                               ))

In [None]:
results = attacker.attack_dataset()

In [None]:
results[4].goal_function_result_str()

In [None]:
results[4].perturbed_text()

In [None]:
results[4].original_text()
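MorpheusTan2020 perturbs text by swapping words for other inflections of the same lemma (for example "was" to "were", "loved" to "loving"), even when the result is non-standard English, and searches greedily for the swaps that hurt the model most. A toy sketch of that idea against a classifier, assuming a small hand-written inflection table and lexicon model (`INFLECTIONS`, `LEXICON`, `toy_prob_positive` and `greedy_inflection_attack` are hypothetical; the real recipe derives inflections automatically and uses its own goal function):

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical inflection table: each word maps to other inflections of its lemma.
INFLECTIONS = {
    "loved": ["love", "loves", "loving"],
    "was": ["is", "be", "were"],
}

# Hypothetical toy model whose weights happen to depend on the inflected form.
LEXICON = {"loved": 2.0, "love": 0.5, "loves": 1.0, "loving": 0.2, "was": 0.5, "were": -1.5}

def toy_prob_positive(words: List[str]) -> float:
    score = sum(LEXICON.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def greedy_inflection_attack(words: List[str],
                             prob_true: Callable[[List[str]], float],
                             inflections: Dict[str, List[str]]) -> Optional[List[str]]:
    if prob_true(words) < 0.5:
        return None  # already misclassified; such an example would be skipped
    current, modified = list(words), set()
    while prob_true(current) >= 0.5:
        best, best_idx = None, None
        for i, w in enumerate(words):
            if i in modified:
                continue  # modify each word at most once
            for form in inflections.get(w, []):
                cand = current[:i] + [form] + current[i + 1:]
                if best is None or prob_true(cand) < prob_true(best):
                    best, best_idx = cand, i
        if best is None or prob_true(best) >= prob_true(current):
            return None  # no remaining swap lowers the true-class probability
        current = best
        modified.add(best_idx)
    return current
```

On `"i loved it but it was long".split()` the sketch first swaps "was" for "were", then "loved" for "loving", returning `['i', 'loving', 'it', 'but', 'it', 'were', 'long']`.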

### Pruthi2019

In [None]:
attack_recipe = Pruthi2019.build(model_wrapper)

attacker = Attacker(attack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,                    # shuffle the dataset before picking examples
                               log_to_csv="pruthi_attack.csv",  # log attack results to a CSV file
                               disable_stdout=True,             # suppress printing of individual attack results
                               ))

In [None]:
results = attacker.attack_dataset()

In [None]:
results[1].original_text()
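Pruthi2019 simulates realistic typos. As we understand the recipe, it edits only the internal characters of a word (the first and last letters stay fixed) using neighbouring-character swaps, deletions, insertions and keyboard-neighbour substitutions, and limits how many words are perturbed. A minimal sketch of the keyboard-neighbour substitution, assuming a hand-written, partial QWERTY adjacency map; `QWERTY_NEIGHBORS` and `keyboard_typos` are hypothetical and list every candidate rather than sampling one:

```python
from typing import Dict, List

# Hypothetical, partial QWERTY adjacency map (lower-case letters only).
QWERTY_NEIGHBORS: Dict[str, str] = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp", "r": "edft",
    "s": "awedxz", "t": "rfgy", "u": "yhji", "n": "bhjm", "l": "kop",
}

def keyboard_typos(word: str) -> List[str]:
    """All variants of `word` with one internal character replaced by an adjacent key.
    The first and last characters are left unchanged."""
    variants = []
    for i in range(1, len(word) - 1):
        for ch in QWERTY_NEIGHBORS.get(word[i], ""):
            variants.append(word[:i] + ch + word[i + 1:])
    return variants
```

For "great" this yields 12 variants such as "grrat" and "grest"; an attack would then keep whichever variant most lowers the model's confidence.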