### Jupyter Notebook for understanding how TextAttack attacks models

We plan to use four different TextAttack attack recipes (at the end we also try MorpheusTan2020 and Pruthi2019):

1) TextFoolerJin2019

2) DeepWordBugGao2018

3) BAEGarg2019

4) FasterGeneticAlgorithmJia2019

In [1]:
%load_ext autoreload
%autoreload 2


import logging
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # FATAL
logging.getLogger('tensorflow').setLevel(logging.FATAL)

In [2]:
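from distilbert_adversarial_attackutils import *  # local helpers, e.g. DistilBertModelWrapper
from textattack import Attacker, AttackArgs
from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import (TextFoolerJin2019, DeepWordBugGao2018, BAEGarg2019,
                                       FasterGeneticAlgorithmJia2019, MorpheusTan2020, Pruthi2019)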

In [3]:
model_wrapper = DistilBertModelWrapper(model_path='../distillbert/distill_bert_finetuned_sst2_5428_samples_2022-05-03_20-59-44.pt')

There are 1 GPU(s) available.
We will use the GPU: Tesla K80





Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'pre_classifier.w

In [4]:
sst2_dataset = HuggingFaceDataset("gpt3mix/sst2", split="train")

Using custom data configuration default
Reusing dataset sst2 (/home/ecbm4040/.cache/huggingface/datasets/sst2/default/0.0.0/90167692658fa4abca2ffa3ede1a43a71e2bf671078c5c275c64c4231d5a62fa)
textattack: Loading [94mdatasets[0m dataset [94mgpt3mix/sst2[0m, split [94mtrain[0m.


### TextFoolerJin2019

In [5]:
# Create the recipe
textfooler_recipe = TextFoolerJin2019.build(model_wrapper)

textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


In [6]:
attacker = Attacker(textfooler_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))
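All four recipes below report the same Succeeded / Failed / Skipped counts (2 / 0 / 8), the same original accuracy and the same average words per input (14.2), which suggests that every run attacks the same 10 sentences. Assuming `AttackArgs` seeds its random number generator from a fixed default `random_seed` (to our knowledge 765, but treat that as an assumption) before applying `shuffle=True`, a seeded shuffle always selects the same subset. This minimal stdlib sketch illustrates the effect; `pick_examples` and the dataset size are hypothetical.

```python
import random
from typing import List

def pick_examples(num_rows: int, num_examples: int, seed: int) -> List[int]:
    """Hypothetical helper: dataset indices chosen by shuffling with a fixed seed."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # private RNG, seeded identically on each run
    return indices[:num_examples]

# Two runs (e.g. two different recipes) that share a seed pick the same rows.
first_run = pick_examples(num_rows=1000, num_examples=10, seed=765)
second_run = pick_examples(num_rows=1000, num_examples=10, seed=765)
print(first_run == second_run)  # True
```

To compare recipes on different sentences, one would change the seed between runs, for example via a `random_seed` argument to `AttackArgs` if your TextAttack version exposes it.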




In [7]:
results = attacker.attack_dataset()

textattack: Logging to CSV at path Ganbert_Attack.csv

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  50
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (min_cos_sim):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  True
      )
    (1): PartOfSpeech(
        (tagger_type):  nltk
        (tagset):  universal
        (allow_verb_noun_swap):  True
        (compare_against_original):  True
      )
    (2): UniversalSentenceEncoder(
        (metric):  angular
        (threshold):  0.840845057
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  False
      )
    (3): RepeatModification
    (4): StopwordModification
    (5): InputColumnModification(
        (matching_column_labels):  ['premise', 'hypothesis']
       

Using /tmp/tfhub_modules to cache modules.
[Succeeded / Failed / Skipped / Total] 2 / 0 / 8 / 10: 100%|█| 10/10 [00:06<00:0


+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 2      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 8      |
| Original accuracy:            | 20.0%  |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 10.08% |
| Average num. words per input: | 14.2   |
| Avg num queries:              | 40.0   |
+-------------------------------+--------+
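TextFooler's search method printed above is `GreedyWordSwapWIR` with `wir_method: delete`: each word is ranked by how much deleting it moves the model's score, and words are then replaced in that order by embedding neighbours until the label flips. The toy sketch below shows that loop on a hand-made classifier. The word weights, the synonym table and `greedy_wir_delete` are hypothetical stand-ins (no embeddings, no USE or part-of-speech constraints), not TextAttack's implementation.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical toy classifier: P(positive) from hand-picked word weights.
WEIGHTS = {"great": 2.0, "fun": 1.0, "amusing": -0.5, "boring": -2.0}
# Hypothetical synonym table standing in for word-embedding neighbours.
SYNONYMS = {"great": ["fine", "decent"], "fun": ["amusing"]}

def prob_positive(words: List[str]) -> float:
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def greedy_wir_delete(words: List[str],
                      model: Callable[[List[str]], float],
                      synonyms: Dict[str, List[str]]) -> Optional[List[str]]:
    """Greedy word swap with deletion-based word importance ranking (sketch)."""
    orig_p = model(words)
    orig_label = orig_p >= 0.5

    def moves_away(p_new: float, p_old: float) -> bool:
        # True if p_new is further from the original label than p_old.
        return p_new < p_old if orig_label else p_new > p_old

    def importance(i: int) -> float:
        # Score change when word i is deleted, signed toward the other class.
        p = model(words[:i] + words[i + 1:])
        return orig_p - p if orig_label else p - orig_p

    order = sorted(range(len(words)), key=importance, reverse=True)
    current = list(words)
    for i in order:  # most important word first
        best, best_p = None, model(current)
        for cand in synonyms.get(current[i], []):
            trial = current[:i] + [cand] + current[i + 1:]
            p = model(trial)
            if moves_away(p, best_p):
                best, best_p = trial, p
        if best is not None:
            current = best
            if (best_p >= 0.5) != orig_label:
                return current  # label flipped: successful attack
    return None  # no flip found: failed attack

print(greedy_wir_delete("a great and fun movie".split(), prob_positive, SYNONYMS))
# ['a', 'fine', 'and', 'amusing', 'movie']
```

The design mirrors why greedy recipes need few queries: one pass to rank words, then a handful of model calls per position.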





### DeepWordBugGao2018

In [8]:
# Create the recipe
textattack_recipe = DeepWordBugGao2018.build(model_wrapper)

textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


In [9]:
attacker = Attacker(textattack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))




In [10]:
results = attacker.attack_dataset()

textattack: Logging to CSV at path Ganbert_Attack.csv

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  unk
  )
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapNeighboringCharacterSwap(
        (random_one):  True
      )
    (1): WordSwapRandomCharacterSubstitution(
        (random_one):  True
      )
    (2): WordSwapRandomCharacterDeletion(
        (random_one):  True
      )
    (3): WordSwapRandomCharacterInsertion(
        (random_one):  True
      )
    )
  (constraints): 
    (0): LevenshteinEditDistance(
        (max_edit_distance):  30
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 2 / 0 / 8 / 10: 100%|█| 10/10 [00:00<00:0


+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 2      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 8      |
| Original accuracy:            | 20.0%  |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 10.08% |
| Average num. words per input: | 14.2   |
| Avg num queries:              | 17.0   |
+-------------------------------+--------+
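DeepWordBug's transformation above is a `CompositeTransformation` of four character-level edits (neighbouring swap, substitution, deletion, insertion), each applied at a single random position (`random_one: True`), with candidates filtered by `LevenshteinEditDistance` (`max_edit_distance: 30`). The stdlib sketch below imitates those four edit types on one word and checks them with a textbook Levenshtein distance. The function names are hypothetical, and TextAttack's own implementations may sample positions and characters differently.

```python
import random
import string

# Simplified stand-ins for DeepWordBug's four character-level edits (hypothetical names).
def swap_neighbors(word: str, rng: random.Random) -> str:
    i = rng.randrange(len(word) - 1)  # requires len(word) >= 2
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute_char(word: str, rng: random.Random) -> str:
    i = rng.randrange(len(word))
    new = rng.choice([c for c in string.ascii_lowercase if c != word[i]])
    return word[:i] + new + word[i + 1:]

def delete_char(word: str, rng: random.Random) -> str:
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def insert_char(word: str, rng: random.Random) -> str:
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]

rng = random.Random(0)
MAX_EDIT_DISTANCE = 30  # value printed in the DeepWordBug constraint above
for edit in (swap_neighbors, substitute_char, delete_char, insert_char):
    candidate = edit("terrible", rng)
    assert levenshtein("terrible", candidate) <= MAX_EDIT_DISTANCE
    print(edit.__name__, candidate, levenshtein("terrible", candidate))
```

Each single edit costs at most 2 under plain Levenshtein distance (a swap counts as two substitutions), so misspellings like these stay visually close to the original word.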





### BAEGarg2019

In [11]:
# Create the recipe
textattack_recipe = BAEGarg2019.build(model_wrapper)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


In [12]:
attacker = Attacker(textattack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))




In [13]:
results = attacker.attack_dataset()

textattack: Logging to CSV at path Ganbert_Attack.csv

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapMaskedLM(
    (method):  bae
    (masked_lm_name):  BertForMaskedLM
    (max_length):  512
    (max_candidates):  50
    (min_confidence):  0.0
  )
  (constraints): 
    (0): PartOfSpeech(
        (tagger_type):  nltk
        (tagset):  universal
        (allow_verb_noun_swap):  True
        (compare_against_original):  True
      )
    (1): UniversalSentenceEncoder(
        (metric):  cosine
        (threshold):  0.936338023
        (window_size):  15
        (skip_text_shorter_than_window):  True
        (compare_against_original):  True
      )
    (2): RepeatModification
    (3): StopwordModification
  (is_black_box):  True
) 



Exception ignored in: <function CapturableResourceDeleter.__del__ at 0x7f2a73806ee0>
Traceback (most recent call last):
  File "/home/ecbm4040/climate_change/lib/python3.8/site-packages/tensorflow/python/training/tracking/tracking.py", line 202, in __del__
    self._destroy_resource()
  File "/home/ecbm4040/climate_change/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/ecbm4040/climate_change/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/ecbm4040/climate_change/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 696, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/ecbm4040/climate_change/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 


+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 2      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 8      |
| Original accuracy:            | 20.0%  |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 10.08% |
| Average num. words per input: | 14.2   |
| Avg num queries:              | 39.5   |
+-------------------------------+--------+
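The two `UniversalSentenceEncoder` thresholds printed for TextFooler (0.840845057, `metric: angular`) and BAE (0.936338023, `metric: cosine`) look arbitrary, but they are numerically equal to 1 − 0.5/π and 1 − 0.2/π. Assuming angular similarity is defined as 1 − arccos(cosine similarity)/π, TextFooler's threshold amounts to requiring an angle of at most 0.5 rad between the original and perturbed sentence embeddings, i.e. a cosine similarity of at least cos(0.5) ≈ 0.878. This small check reproduces the numbers.

```python
import math

def angular_similarity(cos_sim: float) -> float:
    """Angular similarity in [0, 1]: 1 - angle / pi (assumed definition)."""
    return 1.0 - math.acos(cos_sim) / math.pi

textfooler_threshold = 1.0 - 0.5 / math.pi  # angle of at most 0.5 rad
bae_threshold = 1.0 - 0.2 / math.pi         # compared directly to cosine similarity in BAE

print(round(textfooler_threshold, 9))  # 0.840845057
print(round(bae_threshold, 9))         # 0.936338023
# TextFooler's angular threshold expressed as a plain cosine similarity:
print(round(math.cos(0.5), 4))         # 0.8776
```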





In [14]:
results[3].goal_function_result_str()

'Positive (90%) --> [SKIPPED]'

In [15]:
results[3].num_queries

1

In [16]:
results[3].original_text()

"It 's as if Allen , at 66 , has stopped challenging himself ."

In [17]:
results[3].perturbed_text()

"It 's as if Allen , at 66 , has stopped challenging himself ."

In [18]:
results[0].perturbed_result.goal_status

3

In [19]:
results[0].perturbed_result.score

0.8980656266212463

In [20]:
results[0].original_result.score

0.8980656266212463

In [21]:
results[0].diff_color()

('Thumbs down .', 'Thumbs down .')

In [22]:
results[0].str_lines()

('Positive (90%) --> [SKIPPED]', 'Thumbs down .')
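`goal_status` is an integer code. Assuming the values of TextAttack's `GoalFunctionResultStatus` (SUCCEEDED = 0, SEARCHING = 1, MAXIMIZING = 2, SKIPPED = 3), the `3` above means example 0 was skipped. The identical `original_result.score` and `perturbed_result.score` (0.898...) fit this: no search ran. If `UntargetedClassification` scores an output as 1 minus the probability of the ground-truth class, a score of about 0.898 means the model gave the gold label only about 10%, i.e. it already misclassified "Thumbs down ." as Positive (90%). The enum and the probabilities below are a local, illustrative mirror, not the library objects.

```python
from enum import IntEnum
from typing import Sequence

class GoalStatus(IntEnum):
    """Local mirror of TextAttack's GoalFunctionResultStatus codes (assumed values)."""
    SUCCEEDED = 0
    SEARCHING = 1
    MAXIMIZING = 2
    SKIPPED = 3

def untargeted_score(probs: Sequence[float], ground_truth: int) -> float:
    """Untargeted attack score: 1 - P(ground-truth class); higher means closer to success."""
    return 1.0 - probs[ground_truth]

print(GoalStatus(3).name)  # SKIPPED
# Illustrative probabilities for "Thumbs down .", assuming index 0 = Negative (the gold label).
print(round(untargeted_score([0.102, 0.898], ground_truth=0), 3))  # 0.898
```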

### FasterGeneticAlgorithmJia2019

In [23]:
# Create the recipe
textattack_recipe = FasterGeneticAlgorithmJia2019.build(model_wrapper)

textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


In [24]:
attacker = Attacker(textattack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))




In [25]:
results = attacker.attack_dataset()

textattack: Logging to CSV at path Ganbert_Attack.csv

Attack(
  (search_method): AlzantotGeneticAlgorithm(
    (pop_size):  60
    (max_iters):  20
    (temp):  0.3
    (give_up_if_no_improvement):  False
    (post_crossover_check):  False
    (max_crossover_retries):  20
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  8
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): MaxWordsPerturbed(
        (max_percent):  0.2
        (compare_against_original):  True
      )
    (1): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (max_mse_dist):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  True
      )
    (2): LearningToWriteLanguageModel(
        (max_log_prob_diff):  5.0
        (compare_against_original):  True
      )
    (3): RepeatModification
    (4): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 2 / 0 / 8 / 10: 100%|█| 10/10 [00:01<00:0


+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 2      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 8      |
| Original accuracy:            | 20.0%  |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 10.08% |
| Average num. words per input: | 14.2   |
| Avg num queries:              | 359.5  |
+-------------------------------+--------+
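The same 2 / 0 / 8 split appears in all four results tables. Assuming TextAttack's usual definitions (a skipped example is one the model already misclassifies, so it is never attacked), the headline percentages follow directly from the three counts, as this sketch recomputes; `summarize` is a hypothetical helper. Note that the 100% success rate rests on only two attacked sentences, and an original accuracy of 20% is unusually low for a model fine-tuned on SST-2, so it may be worth checking that the dataset's label order matches the one used during fine-tuning.

```python
from typing import Dict

def summarize(succeeded: int, failed: int, skipped: int) -> Dict[str, float]:
    """Recompute TextAttack-style summary percentages from the counts (assumed definitions)."""
    total = succeeded + failed + skipped
    attacked = succeeded + failed  # examples the model originally classified correctly
    return {
        "original_accuracy": 100.0 * attacked / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": 100.0 * succeeded / attacked if attacked else 0.0,
    }

print(summarize(succeeded=2, failed=0, skipped=8))
# {'original_accuracy': 20.0, 'accuracy_under_attack': 0.0, 'attack_success_rate': 100.0}
```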





In [26]:
results[2].str_lines()

['Positive (73%) --> Negative (70%)',
 "Salma goes native and she 's never been better in this colorful bio-pic of a Mexican icon .",
 "Salma goes native and she 's never been nicer in this colorful bio-pic of a Mexican icon ."]
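FasterGeneticAlgorithmJia2019 replaces greedy search with a population-based `AlzantotGeneticAlgorithm` (population 60, 20 iterations, temperature 0.3), which fits its much higher average query count (359.5, against 17 to 40 for the greedy recipes). The toy sketch below shows the main loop: perturb one word, score the population, select parents with a softmax over fitness divided by the temperature, apply uniform crossover, and cap changes at 20% of the words in the spirit of `MaxWordsPerturbed`. The word weights, synonym table and `genetic_attack` are hypothetical; the sketch omits the language-model and embedding-distance constraints and simply falls back to a parent when crossover exceeds the word budget.

```python
import math
import random
from typing import Callable, Dict, List, Optional

def genetic_attack(words: List[str], synonyms: Dict[str, List[str]],
                   prob_orig_label: Callable[[List[str]], float],
                   pop_size: int = 6, max_iters: int = 20, temp: float = 0.3,
                   max_percent: float = 0.2, seed: int = 0) -> Optional[List[str]]:
    """Toy Alzantot-style genetic attack; succeeds once P(original label) < 0.5."""
    rng = random.Random(seed)
    max_changes = max(1, int(max_percent * len(words)))  # word-perturbation budget
    swappable = [i for i, w in enumerate(words) if synonyms.get(w)]
    if not swappable:
        return None

    def fitness(cand: List[str]) -> float:
        return 1.0 - prob_orig_label(cand)

    def n_changed(cand: List[str]) -> int:
        return sum(a != b for a, b in zip(cand, words))

    def perturb(cand: List[str]) -> List[str]:
        # Pick a random swappable position, then keep its best in-budget synonym.
        i = rng.choice(swappable)
        options = [cand[:i] + [s] + cand[i + 1:] for s in synonyms[words[i]]]
        options = [o for o in options if n_changed(o) <= max_changes]
        return max(options, key=fitness) if options else cand

    population = [perturb(list(words)) for _ in range(pop_size)]
    for _ in range(max_iters):
        fits = [fitness(c) for c in population]
        elite = population[fits.index(max(fits))]
        if prob_orig_label(elite) < 0.5:
            return elite  # original label no longer predicted
        weights = [math.exp(f / temp) for f in fits]  # softmax selection with temperature
        children = [elite]  # elitism: the best member survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            if n_changed(child) > max_changes:
                child = list(p1)  # fall back to a parent that respects the budget
            children.append(perturb(child))
        population = children
    return None

# Hypothetical toy classifier that (wrongly) finds "nicer" slightly negative.
WEIGHTS = {"better": 1.5, "colorful": 0.5, "nicer": -1.0, "finer": 0.2}

def prob_positive(ws: List[str]) -> float:
    return 1.0 / (1.0 + math.exp(-sum(WEIGHTS.get(w, 0.0) for w in ws)))

sentence = "she has never been better in this colorful film".split()
print(genetic_attack(sentence, {"better": ["nicer", "finer"]}, prob_positive))
# ['she', 'has', 'never', 'been', 'nicer', 'in', 'this', 'colorful', 'film']
```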

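### MorpheusTan2020

MorpheusTan2020 uses the `MinimizeBleu` goal function with a `WordSwapInflections` transformation. `MinimizeBleu` compares the model's text output against a reference, so it expects a text-to-text model; our DistilBERT wrapper returns class scores as a `torch.Tensor`, which is consistent with the `TypeError` raised below. Because `attack_dataset()` fails, `results` in the following cells still holds the FasterGeneticAlgorithmJia2019 results.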

In [27]:
textattack_recipe = MorpheusTan2020.build(model_wrapper)

attacker = Attacker(textattack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))

textattack: Unknown if model of class <class 'torch.nn.parallel.data_parallel.DataParallel'> compatible with goal function <class 'textattack.goal_functions.text.minimize_bleu.MinimizeBleu'>.


In [28]:
results = attacker.attack_dataset()

textattack: Logging to CSV at path Ganbert_Attack.csv
  0%|                                                    | 0/10 [00:00<?, ?it/s]

Attack(
  (search_method): GreedySearch
  (goal_function):  MinimizeBleu(
    (maximizable):  False
    (target_bleu):  0.0
  )
  (transformation):  WordSwapInflections
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 



TypeError: Invalid text_input type <class 'torch.Tensor'> (required str or OrderedDict)

In [None]:
results[4].goal_function_result_str()

In [None]:
results[4].perturbed_text()

In [None]:
results[4].original_text()

### Pruthi2019

In [None]:
textattack_recipe = Pruthi2019.build(model_wrapper)

attacker = Attacker(textattack_recipe, sst2_dataset,
                    AttackArgs(num_examples=10,
                               shuffle=True,  # shuffle the dataset
                               log_to_csv="Ganbert_Attack.csv",  # log attack results to CSV
                               disable_stdout=True,  # suppress individual attack results
                               ))

In [None]:
results = attacker.attack_dataset()

In [None]:
results[1].original_text()