# TextAttack & AllenNLP 

This is an example of testing adversarial attacks from TextAttack on pretrained models provided by AllenNLP. 

In a few lines of code, we load a sentiment analysis model trained on the Stanford Sentiment Treebank and wrap it in a TextAttack model wrapper. Then we initialize the TextBugger attack and run it on a few samples from the SST-2 train set.

For more information on AllenNLP pretrained models: https://docs.allennlp.org/v1.0.0rc3/tutorials/getting_started/using_pretrained_models/

For more information about the TextBugger attack: https://arxiv.org/abs/1812.05271

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/Example_2_allennlp.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/Example_2_allennlp.ipynb)

In [None]:
from allennlp.predictors import Predictor
import allennlp_models.classification

import textattack

class AllenNLPModel(textattack.models.wrappers.ModelWrapper):
    def __init__(self):
        self.model = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz")

    def __call__(self, text_input_list):
        # Query the AllenNLP predictor once per input string.
        outputs = []
        for text_input in text_input_list:
            outputs.append(self.model.predict(sentence=text_input))
        # For each output, output['logits'] contains the logits, where
        # index 0 corresponds to the positive score and index 1 corresponds
        # to the negative score. We reverse each list (by reverse slicing,
        # [::-1]) so that negative comes first and positive comes second.
        return [output['logits'][::-1] for output in outputs]

model_wrapper = AllenNLPModel()

Plugin allennlp_models could not be loaded: No module named 'nltk.translate.meteor_score'
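The wrapper's contract is small: `__call__` takes a list of strings and returns one list of scores per string, ordered as negative first, positive second. The following sketch uses made-up logits (not real model output) to show what the `[::-1]` reversal does and how a predicted label can be read off the result. The helpers `softmax` and `reorder_logits` are hypothetical names for illustration, and the softmax step is an assumption about how raw logits are turned into the percentages shown in attack results.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reorder_logits(logits):
    """Reverse [positive, negative] into [negative, positive]."""
    return logits[::-1]


# Made-up logits in the predictor's order: [positive, negative].
example_logits = [2.0, -1.0]

reordered = reorder_logits(example_logits)   # now [negative, positive]
probs = softmax(reordered)
predicted_label = probs.index(max(probs))    # 0 = negative, 1 = positive

print(reordered, predicted_label)
```

Without the reversal, the highest score would sit at index 0 and the same input would be read as negative, so every label would be flipped.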


In [None]:
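from textattack import AttackArgs
from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import TextBuggerLi2018
from textattack.attacker import Attacker


# Load the SST-2 training split from the GLUE benchmark.
dataset = HuggingFaceDataset("glue", "sst2", "train")
# Build the TextBugger attack recipe around the wrapped AllenNLP model.
attack = TextBuggerLi2018.build(model_wrapper)

# Attack only the first 5 examples, matching the output below.
attack_args = AttackArgs(num_examples=5)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()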



Reusing dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/7c99657241149a24692c402a5c3f34d4c9f1df5ac2e4c3759fadea38f6cb29c4)
textattack: Loading datasets dataset glue, subset sst2, split train.

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  delete
  )
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapRandomCharacterInsertion(
        (random_one):  True
      )
    (1): WordSwapRandomCharacterDeletion(
        (random_one):  True
      )
    (2): WordSwapNeighboringCharacterSwap(
        (random_one):  True
      )
    (3): WordSwapHomoglyphSwap
    (4): WordSwapEmbedding(
        (max_candidates):  5
        (embedding):  WordEmbedding
      )
    )
  (constraints): 
    (0): UniversalSentenceEncoder(
        (metric):  angular
        (threshold):  0.8
        (window_size):  inf
        (skip_text_shorter_than_window):  False
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
) 
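The first four transformations in the printout are character-level edits: insert a character, delete a character, swap two neighboring characters, and substitute a look-alike character. The sketch below illustrates each idea in plain Python. It is not TextAttack's implementation: the function names, the tiny `HOMOGLYPHS` table, and the choice to edit only interior positions are all assumptions made for illustration. The fifth transformation, `WordSwapEmbedding`, replaces whole words with embedding neighbors and is not covered here.

```python
import random

# Tiny illustrative look-alike table (an assumption, not TextAttack's map).
HOMOGLYPHS = {"o": "0", "l": "1", "a": "@"}


def insert_char(word, i, ch):
    """Insert character ch before position i."""
    return word[:i] + ch + word[i:]


def delete_char(word, i):
    """Delete the character at position i."""
    return word[:i] + word[i + 1:]


def swap_neighbors(word, i):
    """Swap the characters at positions i and i + 1."""
    if i + 1 >= len(word):
        return word
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def homoglyph_swap(word, i):
    """Replace the character at i with a look-alike, if one is known."""
    ch = word[i]
    return word[:i] + HOMOGLYPHS.get(ch, ch) + word[i + 1:]


def candidates(word, rng):
    """Return one candidate per edit type at a random interior position."""
    if len(word) < 3:
        return []
    i = rng.randrange(1, len(word) - 1)
    return [
        insert_char(word, i, " "),
        delete_char(word, i),
        swap_neighbors(word, i),
        homoglyph_swap(word, i),
    ]


print(delete_char("revenge", 6))          # reveng
print(candidates("worst", random.Random(0)))
```

Deleting the final character of `revenge` gives `reveng`, the same kind of edit that appears in the `reveng-of-the-nerds` perturbation in the results.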



[Succeeded / Failed / Total] 1 / 1 / 2:  40%|████      | 2/5 [00:00<00:00,  4.06it/s]

--------------------------------------------- Result 1 ---------------------------------------------
Negative (95%) --> Positive (93%)

[[hide]] new secretions from the parental units

[[concealing]] new secretions from the parental units


--------------------------------------------- Result 2 ---------------------------------------------
Negative (96%) --> [FAILED]

contains no wit , only labored gags 




[Succeeded / Failed / Total] 1 / 2 / 4:  80%|████████  | 4/5 [00:00<00:00,  4.44it/s]

--------------------------------------------- Result 3 ---------------------------------------------
Positive (100%) --> [FAILED]

that loves its characters and communicates something rather beautiful about human nature 


--------------------------------------------- Result 4 ---------------------------------------------
Positive (82%) --> [SKIPPED]

remains utterly satisfied to remain the same throughout 




[Succeeded / Failed / Total] 2 / 2 / 5: 100%|██████████| 5/5 [00:01<00:00,  3.31it/s]

--------------------------------------------- Result 5 ---------------------------------------------
Negative (98%) --> Positive (52%)

on the [[worst]] [[revenge-of-the-nerds]] clichés the filmmakers could [[dredge]] up

on the [[pire]] [[reveng-of-the-nerds]] clichés the filmmakers could [[dragging]] up



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 2      |
| Number of failed attacks:     | 2      |
| Number of skipped attacks:    | 1      |
| Original accuracy:            | 80.0%  |
| Accuracy under attack:        | 40.0%  |
| Attack success rate:          | 50.0%  |
| Average perturbed word %:     | 22.14% |
| Average num. words per input: | 8.6    |
| Avg num queries:              | 25.5   |
+-------------------------------+--------+
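The three percentage rows can be reproduced from the three counts in the table. The formulas below are inferred from the numbers shown here and are an assumption about how the summary is computed, not a copy of TextAttack's code: a skipped example is one the model already misclassified, so it counts against original accuracy, and only failed attacks leave a prediction correct.

```python
# Counts copied from the results table.
successful = 2
failed = 2
skipped = 1

total = successful + failed + skipped

# Skipped examples were misclassified before any attack ran.
original_accuracy = 100 * (total - skipped) / total

# After the attack, only examples whose attack failed are still correct.
accuracy_under_attack = 100 * failed / total

# Success rate ignores skipped examples, since they were never attacked.
attack_success_rate = 100 * successful / (successful + failed)

print(f"Original accuracy:     {original_accuracy:.1f}%")      # 80.0%
print(f"Accuracy under attack: {accuracy_under_attack:.1f}%")  # 40.0%
print(f"Attack success rate:   {attack_success_rate:.1f}%")    # 50.0%
```

With only five examples, each attack outcome moves these percentages by 20 points, so the figures indicate that the pipeline works rather than measure the model's robustness.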



