# Generating Adversarial _text_ with `TextAttack`

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**_Adversarial machine learning_ is the study of attacks on [machine learning](https://en.wikipedia.org/wiki/Machine_learning "Machine learning") algorithms and of the defenses against such attacks. Recent surveys show that practitioners report a dire need to better protect machine learning systems in real-world applications.**

**In this notebook, we will be exploring one of the functionalities of the `textattack` library.**

> TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP.

**We already worked with the `text augmentation` features of `textattack` in our [notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/bbe9c0a77499fa68de7c6d53bf5ef7e0b43a25e0/ML%20Adversarial/model_extraction_nlp.ipynb) about `model extraction attacks`. In this notebook, we will instead train a language model for sentiment classification and then attack it.**

![textattack](https://miro.medium.com/proxy/1*_JW1JaMpK_fVGld8pd1_JQ.gif)

**In this notebook, as in [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/64d0693c28786ce42149411bec8b3b42520fc4df/ML%20Explainability/NLP%20Interpreter%20(en)/model_maker_en.ipynb) and [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/64d0693c28786ce42149411bec8b3b42520fc4df/ML%20Explainability/NLP%20Interpreter%20(pt)/model_maker_pt.ipynb) other tutorial from the Teeny-Tiny Castle 🏰, we will create a `Bidirectional long short-term memory (bi-lstm)` network for sentiment classification.**

**We will be using a dataset that was put together by combining several datasets for sentiment classification available on [Kaggle](https://www.kaggle.com/):**

- The `IMDB 50K` [dataset](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?select=IMDB+Dataset.csv): _50K movie reviews for natural language processing or text analytics._
- The `Twitter US Airline Sentiment` [dataset](https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment): _originated from [Crowdflower's Data for Everyone library](http://www.crowdflower.com/data-for-everyone)._
- Our `google_play_apps_review` dataset: _built using the `google_play_scraper` in [this notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/64d0693c28786ce42149411bec8b3b42520fc4df/ML%20Explainability/NLP%20Interpreter%20(en)/scrape(en).ipynb)._
- The `EcoPreprocessed` [dataset](https://www.kaggle.com/datasets/pradeeshprabhakar/preprocessed-dataset-sentiment-analysis): _scraped Amazon product reviews._

**The final result is the `sentiment_analysis_dataset.csv` available for download [here](https://drive.google.com/uc?export=download&id=1_ijhnVLHddM7Cm3R3vfqBB-svw6iNfpv).**



In [1]:
import torch
import numpy as np
import pandas as pd
import urllib.request
import tensorflow as tf
from tensorflow import keras
from keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from keras_preprocessing.sequence import pad_sequences

urllib.request.urlretrieve(
    'https://drive.google.com/uc?export=download&id=1_ijhnVLHddM7Cm3R3vfqBB-svw6iNfpv', 
    'sentiment_analysis_dataset.csv'
)

df = pd.read_csv('sentiment_analysis_dataset.csv')
display(df)

Unnamed: 0.1,Unnamed: 0,review,sentiment
0,0,One of the other reviewers has mentioned that ...,1
1,1,A wonderful little production. <br /><br />The...,1
2,2,I thought this was a wonderful way to spend ti...,1
3,3,Basically there's a family where a little boy ...,0
4,4,"Petter Mattei's ""Love in the Time of Money"" is...",1
...,...,...,...
85084,3543,yaaa cool use last weeks give good response,1
85085,3544,years daughter love alexa enjoy alexa,1
85086,3545,yes popular but doesnt use except listen songs...,1
85087,3546,yo alexa love,1


**The following cells will train a `Bidirectional long-short term memory (bi-lstm)` for binary sentiment classification (Negative versus Positive). The training process may take a while, so if you want to skip this, you can load our `pre-trained senti-model` directly below the next cell. This is the same model we trained in [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/bbe9c0a77499fa68de7c6d53bf5ef7e0b43a25e0/ML%20Explainability/NLP%20Interpreter%20(en)/model_maker_en.ipynb) notebook.**

In [158]:
x = list(df.review)
y = list(df.sentiment)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

y_train = np.array(y_train).astype(float)
y_test = np.array(y_test).astype(float)


vocab_size = 3000
embed_size = 50
max_len = 256
tokenizer = Tokenizer(num_words=vocab_size,
                      filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                      lower=True,
                      split=" ",
                      oov_token="<OOV>")

tokenizer.fit_on_texts(x_train)
training_sequences = tokenizer.texts_to_sequences(x_train)
training_padded = pad_sequences(
    training_sequences, maxlen=max_len, truncating='post')

inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(input_dim=vocab_size,
                              output_dim=embed_size,
                              input_length=max_len)(inputs)

x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)

outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss=tf.losses.BinaryCrossentropy(),
              optimizer='adam',
              metrics=['accuracy'])

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")
model.summary()
model.fit(training_padded,
          y_train,
          epochs=20,
          verbose=1)

test_sequences = tokenizer.texts_to_sequences(x_test)
test_padded = pad_sequences(test_sequences, maxlen=256, truncating='post')

test_loss_score, test_acc_score = model.evaluate(test_padded, y_test)

print(f'Final Loss: {round(test_loss_score, 2)}.')
print(f'Final Performance: {round(test_acc_score * 100, 2)} %.')

# If you would like to save your model/tokenizer, uncomment the lines below
# (forward slashes avoid the invalid "\s" escape and match the paths used when loading)

#model.save("models/senti_model.h5")

#import io
#import json
#from keras.preprocessing.text import tokenizer_from_json

#tokenizer_json = tokenizer.to_json()
#with io.open('models/tokenizer_senti_model.json', 'w', encoding='utf-8') as f:
#    f.write(json.dumps(tokenizer_json, ensure_ascii=False))


Version:  2.10.0
Eager mode:  True
GPU is available
Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding_3 (Embedding)     (None, None, 50)          150000    
                                                                 
 bidirectional_6 (Bidirectio  (None, None, 128)        58880     
 nal)                                                            
                                                                 
 bidirectional_7 (Bidirectio  (None, 128)              98816     
 nal)                                                            
                                                                 
 dense_3 (Dense)             (None, 1)                 129       
                                                                 
Total p

In [1]:
import json
import torch
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer, tokenizer_from_json

# Forward slashes work on every platform and avoid the invalid "\s" escape sequence
model = keras.models.load_model('models/senti_model.h5')

with open('models/tokenizer_senti_model.json') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
    word_index = tokenizer.word_index

strings = [
    'is hard to say something about a model so simple',
    'you call this NLP, please, my nana can do it better in pascal',
    'this model is garbage, i wont my money back',
    'is nice to see philosophers doing machine learning',
    'this is a great and wonderful example of NLP',
    'this model is great, one of the best models ever done by a human'
]

preds = model.predict(
        keras.preprocessing.sequence.pad_sequences(
                                                    tokenizer.texts_to_sequences(strings),
                                                    maxlen=256,
                                                    truncating='post'
                                                ),
    verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Negative Sentiment 😔 {round((1 - preds[i][0]) * 100)}% | Positive Sentiment 😊 {round(preds[i][0] * 100)}%\n{"*" * 50}')

is hard to say something about a model so simple

Negative Sentiment 😔 100% | Positive Sentiment 😊 0%
**************************************************
you call this NLP, please, my nana can do it better in pascal

Negative Sentiment 😔 85% | Positive Sentiment 😊 15%
**************************************************
this model is garbage, i wont my money back

Negative Sentiment 😔 100% | Positive Sentiment 😊 0%
**************************************************
is nice to see philosophers doing machine learning

Negative Sentiment 😔 0% | Positive Sentiment 😊 100%
**************************************************
this is a great and wonderful example of NLP

Negative Sentiment 😔 0% | Positive Sentiment 😊 100%
**************************************************
this model is great, one of the best models ever done by a human

Negative Sentiment 😔 0% | Positive Sentiment 😊 100%
**************************************************


**The model seems to be working fine! Now, let us change that.** 🙃

**With `textattack`, we can _wrap_ a model (like a Keras, TensorFlow, scikit-learn, or AllenNLP model) by subclassing the `ModelWrapper` class. Then, by implementing the `__call__` method, we provide a function that returns the prediction scores of our model for a list of input strings.**

**How you write this method is task-specific, since it depends on the native output format of your model. Below, you can see how to turn the output of a `sigmoid function` (the last layer of our `bi-lstm`) into a torch tensor that contains the probabilities for each of the sentiment classes ($0$ for negative, $1$ for positive).**

In [16]:
from textattack.models.wrappers import ModelWrapper

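# Note: this subclass reuses the name of the imported base class,
# so `ModelWrapper` refers to our Keras-specific wrapper from here on.
class ModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        # Tokenize and pad the raw strings the same way as during training
        # (`tokenizer` is the global Keras tokenizer loaded above).
        text_array = tokenizer.texts_to_sequences(text_input_list)
        padded_text_array = keras.preprocessing.sequence.pad_sequences(
                                                    text_array,
                                                    maxlen=256,
                                                    truncating='post'
                                                )
        # The sigmoid layer returns P(positive) with shape (batch, 1).
        preds = self.model.predict(padded_text_array, verbose=0)
        positive = torch.tensor(preds).squeeze(dim=-1)
        # Stack [P(negative), P(positive)] into a (batch, 2) tensor.
        final_preds = torch.stack((1 - positive, positive), dim=1)
        return final_preds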
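
**The conversion done inside `__call__` can be checked in isolation, without loading the Keras model. The sketch below uses a hypothetical helper, `sigmoid_to_class_probs`, and made-up sigmoid outputs. It assumes the model returns one probability per sample with shape `(batch, 1)`, and it shows why squeezing only the last dimension keeps the batch dimension intact even for a single input.**

```python
import torch


def sigmoid_to_class_probs(preds: torch.Tensor) -> torch.Tensor:
    """Turn sigmoid outputs of shape (batch, 1) into (batch, 2) class probabilities."""
    # Squeeze only the last dimension so a batch of size 1 stays a batch.
    positive = preds.squeeze(dim=-1)
    # Column 0 holds P(negative), column 1 holds P(positive).
    return torch.stack((1 - positive, positive), dim=1)


# Made-up sigmoid outputs standing in for `model.predict(...)`.
batch = torch.tensor([[0.9], [0.2], [0.4]])
probs = sigmoid_to_class_probs(batch)
single = sigmoid_to_class_probs(torch.tensor([[0.7]]))

print(probs.shape, single.shape)
print(probs.argmax(dim=1).tolist())
```

**Each row sums to one, and the column order matches the label convention used in this notebook ($0$ for negative, $1$ for positive).**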


**Now, let us see the outputs of our `ModelWrapper`.**

In [17]:
ModelWrapper(model)([
    'is hard to say something about a model so simple',
    'you call this NLP, please, my nana can do it better in pascal',
    'this model is garbage, i wont my money back',
    'is nice to see philosophers doing machine learning',
    'this is a great and wonderful example of NLP',
    'this model is great, one of the best models ever done by a human'
])

tensor([[9.9923e-01, 7.7294e-04],
        [8.4638e-01, 1.5362e-01],
        [9.9993e-01, 7.4672e-05],
        [5.7155e-04, 9.9943e-01],
        [1.0251e-03, 9.9897e-01],
        [1.2398e-05, 9.9999e-01]])

**Exactly what we wanted, and the probabilities agree with the inputs. Now we can call one of the `Attack Recipes` available in `textattack`.**

**However, we need something to attack. `Textattack` allows you to use `HuggingFace` Datasets for the attack. You can also use your own dataset for this.**

**The `textattack.datasets.Dataset` class takes as input a list of tuples, e.g., `[('some text', label_1), ('some other text', label_2)]`. Below we transform the examples used above into a mini-dataset. `Textattack` will use these samples to create adversarial examples against our model.**

In [12]:

data = [
    ('is hard to say something about a model so simple', 0),
    ('you call this NLP, please, my nana can do it better in pascal', 0),
    ('this model is garbage, i wont my money back', 0),
    ('is nice to see philosophers doing machine learning', 1),
    ('this is a great and wonderful example of NLP', 1),
    ('this model is great, one of the best models ever done by a human', 1)
]

**You could also transform a portion of your own dataset into a list of (`text`, `label`) tuples. Any list of labeled text samples can be turned into a `textattack.datasets.Dataset`.**

In [11]:
import textattack
from sklearn.model_selection import train_test_split

# Forward slash avoids the invalid "\s" escape sequence in the path
df = pd.read_csv('data/sentiment_analysis_dataset.csv')

_, x_test, _, y_test = train_test_split(
   list(df.review), list(df.sentiment), test_size=0.2, random_state=42)

y_test = np.array(y_test).astype(float)

data=[(x_test[i], int(y_test[i])) for i in range(len(x_test))]
np.random.shuffle(data)
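
**Attacks can require many model queries per sample, so it may be convenient to attack only a small, class-balanced subset of the test set. The sketch below is one possible way to build such a subset. `sample_balanced` is a hypothetical helper (not part of `textattack`), and the texts and labels are made-up placeholders.**

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_balanced(texts: Sequence[str],
                    labels: Sequence[float],
                    per_class: int,
                    seed: int = 42) -> List[Tuple[str, int]]:
    """Return up to `per_class` (text, label) tuples for every label."""
    rng = random.Random(seed)
    by_label: Dict[int, List[str]] = {}
    for text, label in zip(texts, labels):
        # Labels are cast to int, as in the list comprehension above.
        by_label.setdefault(int(label), []).append(text)

    subset: List[Tuple[str, int]] = []
    for label, items in sorted(by_label.items()):
        chosen = rng.sample(items, min(per_class, len(items)))
        subset.extend((text, label) for text in chosen)

    rng.shuffle(subset)  # avoid having all samples of one class first
    return subset


# Made-up placeholder data standing in for `x_test` and `y_test`.
texts = ["good", "bad", "great", "awful", "fine", "poor"]
labels = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
subset = sample_balanced(texts, labels, per_class=2)
print(subset)
```

**The resulting list has the same `[(text, label), ...]` shape as `data`, so it can be used in its place.**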


**Now that we have a dataset, we can call one of the attack recipes from `textattack`. All available recipes correspond to attacks from the literature in Adversarial ML.**

**Attack recipes allow you to create an `Attack` object whose four components are those specified in the original paper: the goal function (which determines the conditions under which the attack is successful), the transformation (the adversarial perturbations applied to the samples of the dataset), the constraints (the limitations imposed on these transformations), and the search method (how candidate perturbations are explored).**
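
**To make these four components concrete, here is a toy, pure-Python sketch. It does not use `textattack` and is not how the library implements its recipes. The keyword classifier, the synonym table, and all function names are hypothetical and exist only to illustrate how a goal function, a transformation, a constraint, and a search method fit together.**

```python
from typing import List, Optional

# Hypothetical keyword lists and synonym table, for illustration only.
POSITIVE = {"great", "wonderful", "nice"}
NEGATIVE = {"garbage", "hard"}
SYNONYMS = {"great": ["massive"], "wonderful": ["admirable"], "nice": ["agreeable"]}


def toy_model(text: str) -> int:
    """Toy classifier: 1 (positive) if positive keywords outnumber negative ones."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def goal_reached(text: str, true_label: int) -> bool:
    """Goal function: an untargeted attack succeeds when the prediction flips."""
    return toy_model(text) != true_label


def candidates(word: str) -> List[str]:
    """Transformation: propose replacements for a single word."""
    return SYNONYMS.get(word.lower(), [])


def allowed(original: List[str], perturbed: List[str], max_percent: float) -> bool:
    """Constraint: limit the fraction of words that differ from the original."""
    changed = sum(a != b for a, b in zip(original, perturbed))
    return changed / len(original) <= max_percent


def greedy_search(text: str, true_label: int, max_percent: float = 0.5) -> Optional[str]:
    """Search method: walk left to right, keeping the first allowed swap per word."""
    original = text.split()
    current = list(original)
    for i, word in enumerate(original):
        for candidate in candidates(word):
            trial = current[:i] + [candidate] + current[i + 1:]
            if not allowed(original, trial, max_percent):
                continue
            current = trial
            if goal_reached(" ".join(current), true_label):
                return " ".join(current)
            break  # keep this swap and move on to the next word
    return None  # no adversarial example found under the constraints


sentence = "this is a great and wonderful example"
adversarial = greedy_search(sentence, true_label=1)
blocked = greedy_search(sentence, true_label=1, max_percent=0.2)
print(adversarial)
print(blocked)
```

**With the looser constraint the search finds a sentence that flips the toy prediction. With `max_percent=0.2` the second swap is rejected and the attack fails, which shows how constraints trade attack strength for similarity to the original text.**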

**Here you can find a list of _fast_ attack recipes from `textattack`:**

- `PWWSRen2019`: in this attack, words are perturbed by a synonym-swap transformation based on a combination of their saliency score (i.e., _the importance of a linguistic feature_) and maximum word-swap effectiveness (proposed in "[Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency](https://aclanthology.org/P19-1103/)");
- `CheckList2020`: this attack focuses on several language perturbations, like contractions, extensions, changing names, numbers, and locations (proposed in "[Beyond Accuracy: Behavioral Testing of NLP models with CheckList](https://aclanthology.org/2020.acl-main.442/)");
- `DeepWordBugGao2018`: this attack performs simple character-level transformations (_changes certain letters of a word_) to the highest-ranked tokens (proposed in [Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers](https://arxiv.org/abs/1801.04354));
- `IGAWang2019`: this attack can be characterized as a synonym substitution-based attack that preserves the syntactic structure and semantic information of the original text (proposed in [Natural Language Adversarial Attacks and Defenses in Word Level](http://arxiv.org/abs/1909.06723));
- `InputReductionFeng2018`: this attack does not cause the model to misclassify a sample. However, it removes words with low saliency scores, creating nonsensical sentences that the model classifies with high confidence as the original predicted class (proposed in [Pathologies of Neural Models Make Interpretations Difficult](https://arxiv.org/abs/1804.07781));
- `Pruthi2019`: this attack focuses on a small number of character-level changes that simulate common typos, like _swapping neighboring characters, deleting characters, inserting characters,_ and _swapping characters for adjacent keys_ on a **QWERTY** keyboard (proposed in [Combating Adversarial Misspellings with Robust Word Recognition](https://arxiv.org/abs/1905.11268));
- `TextBuggerLi2018`: this is a general attack framework for generating adversarial texts (proposed in [TextBugger: Generating Adversarial Text Against Real-world Applications](https://arxiv.org/abs/1812.05271)).
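
**The character-level edits mentioned in the `DeepWordBugGao2018` entry (swap, substitution, deletion, insertion) are simple to write down. The sketch below is a plain-Python illustration, not the `textattack` implementation: the function names are hypothetical, and positions and characters are passed explicitly instead of being drawn at random. A standard Levenshtein distance is included to show how far each edit moves a word from the original.**

```python
def swap_neighbors(word: str, i: int) -> str:
    """Swap the characters at positions i and i + 1 (no-op if out of range)."""
    if i < 0 or i + 1 >= len(word):
        return word
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def substitute(word: str, i: int, ch: str) -> str:
    """Replace the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]


def delete(word: str, i: int) -> str:
    """Remove the character at position i."""
    return word[:i] + word[i + 1:]


def insert(word: str, i: int, ch: str) -> str:
    """Insert ch before position i."""
    return word[:i] + ch + word[i:]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(delete("model", 4), levenshtein("model", delete("model", 4)))
print(insert("model", 0, "g"), levenshtein("model", insert("model", 0, "g")))
print(substitute("money", 0, "V"), levenshtein("money", substitute("money", 0, "V")))
print(swap_neighbors("one", 0), levenshtein("one", swap_neighbors("one", 0)))
```

**Note that a neighbor swap counts as two edits under plain Levenshtein distance, since it is equivalent to two substitutions.**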

**In the example below, we will use the `IGAWang2019` recipe.**

**The `Attacker` class also accepts additional arguments through an `AttackArgs` object (full list [here](https://textattack.readthedocs.io/en/latest/api/attacker.html#attackargs)). Below we pass a `log_to_txt` argument equal to the name of a `.txt` file (all attack results will be saved in this file).**

**For clarity purposes, all perturbed words are highlighted with [[ ]].**


In [29]:
model_wrapper = ModelWrapper(model)

import textattack
from textattack.attack_recipes import IGAWang2019
from textattack import Attacker

data = [
    ('is hard to say something about a model so simple', 0),
    ('you call this NLP, please, my nana can do it better in pascal', 0),
    ('this model is garbage, i wont my money back', 0),
    ('is nice to see philosophers doing machine learning', 1),
    ('this is a great and wonderful example of NLP', 1),
    ('this model is great, one of the best models ever done by a human', 1)
]

dataset = textattack.datasets.Dataset(data)
attack = IGAWang2019.build(model_wrapper)
attack_args = textattack.AttackArgs(
    num_examples=6,
    log_to_txt ="textattack_logs_IGAWang2019.txt"
)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()

textattack: Unknown if model of class <class 'keras.engine.functional.Functional'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
textattack: Logging to text file at path textattack_logs_IGAWang2019.txt


Attack(
  (search_method): ImprovedGeneticAlgorithm(
    (pop_size):  60
    (max_iters):  20
    (temp):  0.3
    (give_up_if_no_improvement):  False
    (post_crossover_check):  False
    (max_crossover_retries):  20
    (max_replace_times_per_index):  5
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  50
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): MaxWordsPerturbed(
        (max_percent):  0.2
        (compare_against_original):  True
      )
    (1): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (max_mse_dist):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  False
      )
    (2): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  17%|█▋        | 1/6 [00:00<00:03,  1.46it/s]

--------------------------------------------- Result 1 ---------------------------------------------

is hard to say something about a model so [[simple]]

is hard to say something about a model so [[easy]]




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  33%|███▎      | 2/6 [00:01<00:03,  1.11it/s]

--------------------------------------------- Result 2 ---------------------------------------------

you [[call]] this NLP, please, my nana can do it better in pascal

you [[calls]] this NLP, please, my nana can do it better in pascal




[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3:  50%|█████     | 3/6 [00:09<00:09,  3.13s/it]

--------------------------------------------- Result 3 ---------------------------------------------

this [[model]] is [[garbage]], i [[wont]] my [[money]] back

this [[mannequins]] is [[detritus]], i [[habit]] my [[cash]] back




[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4:  67%|██████▋   | 4/6 [00:11<00:05,  2.75s/it]

--------------------------------------------- Result 4 ---------------------------------------------

is [[nice]] to see philosophers doing [[machine]] learning

is [[agreeable]] to see philosophers doing [[computer]] learning




[Succeeded / Failed / Skipped / Total] 5 / 0 / 0 / 5:  83%|████████▎ | 5/6 [00:17<00:03,  3.45s/it]

--------------------------------------------- Result 5 ---------------------------------------------

this is a [[great]] and [[wonderful]] example of NLP

this is a [[massive]] and [[admirable]] example of NLP




[Succeeded / Failed / Skipped / Total] 6 / 0 / 0 / 6: 100%|██████████| 6/6 [00:34<00:00,  5.81s/it]

--------------------------------------------- Result 6 ---------------------------------------------

this model is [[great]], one of the best models ever [[done]] by a [[human]]

this model is [[unbelievable]], one of the best models ever [[finished]] by a [[humans]]



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 6      |
| Number of failed attacks:     | 0      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 0.0%   |
| Attack success rate:          | 100.0% |
| Average perturbed word %:     | 21.8%  |
| Average num. words per input: | 10.5   |
| Avg num queries:              | 464.0  |
+-------------------------------+--------+





[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18726f52430>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18727b68970>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x187272b9af0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18725d8a7c0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x1872980efd0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18725168cd0>]
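
**The `Average perturbed word %` in the summary table can be reproduced from the bracketed sentences in the log above. The sketch below is a hypothetical helper, not a `textattack` function. It assumes that every `[[...]]` marker wraps exactly one word and that words can be counted with a simple `\w+` pattern, which holds for these six sentences.**

```python
import re


def perturbed_percent(marked_text: str) -> float:
    """Percentage of words wrapped in [[ ]] markers."""
    perturbed = len(re.findall(r"\[\[.*?\]\]", marked_text))
    plain = marked_text.replace("[[", "").replace("]]", "")
    total = len(re.findall(r"\w+", plain))
    return 100.0 * perturbed / total


# Original sentences as they appear in the IGAWang2019 log.
marked = [
    "is hard to say something about a model so [[simple]]",
    "you [[call]] this NLP, please, my nana can do it better in pascal",
    "this [[model]] is [[garbage]], i [[wont]] my [[money]] back",
    "is [[nice]] to see philosophers doing [[machine]] learning",
    "this is a [[great]] and [[wonderful]] example of NLP",
    "this model is [[great]], one of the best models ever [[done]] by a [[human]]",
]

average = sum(perturbed_percent(text) for text in marked) / len(marked)
print(round(average, 1))  # 21.8
```

**The value matches the table: on average, roughly one word in five had to be replaced to flip the prediction.**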

**Now let us try another recipe!**

In [30]:
from textattack.attack_recipes import DeepWordBugGao2018

attack = DeepWordBugGao2018.build(model_wrapper)
attack_args = textattack.AttackArgs(
    num_examples=6,
    log_to_txt ="textattack_logs_DeepWordBugGao2018.txt"
)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()

textattack: Unknown if model of class <class 'keras.engine.functional.Functional'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
textattack: Logging to text file at path textattack_logs_DeepWordBugGao2018.txt


Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  unk
  )
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapNeighboringCharacterSwap(
        (random_one):  True
      )
    (1): WordSwapRandomCharacterSubstitution(
        (random_one):  True
      )
    (2): WordSwapRandomCharacterDeletion(
        (random_one):  True
      )
    (3): WordSwapRandomCharacterInsertion(
        (random_one):  True
      )
    )
  (constraints): 
    (0): LevenshteinEditDistance(
        (max_edit_distance):  30
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  17%|█▋        | 1/6 [00:00<00:01,  2.69it/s]

--------------------------------------------- Result 1 ---------------------------------------------

is hard to say something about a [[model]] so simple

is hard to say something about a [[mode]] so simple




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  33%|███▎      | 2/6 [00:00<00:01,  2.93it/s]

--------------------------------------------- Result 2 ---------------------------------------------

you [[call]] this NLP, please, my nana can do it better in pascal

you [[all]] this NLP, please, my nana can do it better in pascal




[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3:  50%|█████     | 3/6 [00:01<00:01,  2.62it/s]

--------------------------------------------- Result 3 ---------------------------------------------

this [[model]] is [[garbage]], i wont my [[money]] [[back]]

this [[gmodel]] is [[arbage]], i wont my [[Voney]] [[Hack]]




[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4:  67%|██████▋   | 4/6 [00:01<00:00,  2.72it/s]

--------------------------------------------- Result 4 ---------------------------------------------

is [[nice]] to [[see]] philosophers doing [[machine]] learning

is [[Wice]] to [[sde]] philosophers doing [[Nachine]] learning




[Succeeded / Failed / Skipped / Total] 4 / 1 / 0 / 5:  83%|████████▎ | 5/6 [00:01<00:00,  2.70it/s]

--------------------------------------------- Result 5 ---------------------------------------------

this is a great and wonderful example of NLP




[Succeeded / Failed / Skipped / Total] 5 / 1 / 0 / 6: 100%|██████████| 6/6 [00:02<00:00,  2.53it/s]

--------------------------------------------- Result 6 ---------------------------------------------

this model is [[great]], [[one]] of the [[best]] models [[ever]] [[done]] by a [[human]]

this model is [[gCreat]], [[noe]] of the [[Eest]] models [[evBer]] [[dTne]] by a [[hVuman]]



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 5      |
| Number of failed attacks:     | 1      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 16.67% |
| Attack success rate:          | 83.33% |
| Average perturbed word %:     | 28.5%  |
| Average num. words per input: | 10.5   |
| Avg num queries:              | 19.0   |
+-------------------------------+--------+





[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x1871aa115b0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x1872727a640>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x1871af03850>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18728acffd0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x1872514c970>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x18727e00520>]
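
**The percentages in the two summary tables follow from the succeeded / failed / skipped counts. The sketch below shows the arithmetic. `attack_summary` is a hypothetical helper, and the formulas are an assumption chosen because they reproduce the numbers printed above. Skipped samples are taken to be the ones the model already misclassified before any attack.**

```python
from typing import Dict


def attack_summary(succeeded: int, failed: int, skipped: int) -> Dict[str, float]:
    """Compute summary percentages from attack outcome counts."""
    total = succeeded + failed + skipped
    attacked = succeeded + failed  # samples the model classified correctly at first
    return {
        "original_accuracy": 100.0 * attacked / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": 100.0 * succeeded / attacked if attacked else 0.0,
    }


iga = attack_summary(succeeded=6, failed=0, skipped=0)
deepwordbug = attack_summary(succeeded=5, failed=1, skipped=0)

for name, summary in (("IGAWang2019", iga), ("DeepWordBugGao2018", deepwordbug)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

**Read together with the `Avg num queries` rows, the tables suggest a trade-off on this tiny dataset: the genetic search of `IGAWang2019` fooled the model on all six samples at 464 queries on average, while `DeepWordBugGao2018` needed only 19 queries on average but failed on one sample.**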

**Language models are the foundation behind various applications such as Q&A, chatbots, machine translation, and text classification. However, the security vulnerabilities associated with ML-trained language models are still largely unknown, which is highly concerning.**

**To remedy this, developers must use the same tools that attackers use to fool models. For example, adversarial examples created with libraries like `textattack` (_which also provides data augmentation_) can be collected into adversarial datasets used to tune and improve language models, making them more robust.**

**At the same time, other strategies are possible. As demonstrated by [Xiaosen Wang](https://arxiv.org/search/cs?searchtype=author&query=Wang%2C+X), [Hao Jin](https://arxiv.org/search/cs?searchtype=author&query=Jin%2C+H), [Yichen Yang](https://arxiv.org/search/cs?searchtype=author&query=Yang%2C+Y), and [Kun He](https://arxiv.org/search/cs?searchtype=author&query=He%2C+K), since most of the attacks used in the literature are synonym-based, [Synonym Encoding Methods](https://arxiv.org/abs/1909.06723) can help models map each cluster of synonyms to a unique encoding, thus eliminating many possible adversarial perturbations.**
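
**The idea of mapping synonyms to a single encoding can be sketched in a few lines. This is only a toy illustration with hypothetical, hand-written clusters. It does not reproduce how the cited paper builds its clusters, and `build_encoder` and `encode` are made-up names.**

```python
from typing import Dict, Iterable, Set

# Hypothetical synonym clusters, written by hand for this example.
SYNONYM_CLUSTERS = [
    {"great", "massive", "unbelievable"},
    {"wonderful", "admirable"},
    {"nice", "agreeable"},
]


def build_encoder(clusters: Iterable[Set[str]]) -> Dict[str, str]:
    """Map every word of a cluster to one representative word."""
    encoder: Dict[str, str] = {}
    for cluster in clusters:
        representative = min(cluster)  # alphabetically first, so the choice is deterministic
        for word in cluster:
            encoder[word] = representative
    return encoder


def encode(text: str, encoder: Dict[str, str]) -> str:
    """Replace each known word by its cluster representative before classification."""
    return " ".join(encoder.get(word.lower(), word) for word in text.split())


encoder = build_encoder(SYNONYM_CLUSTERS)
original = encode("this is a great and wonderful example of NLP", encoder)
attacked = encode("this is a massive and admirable example of NLP", encoder)
print(original)
print(attacked)
```

**Both sentences collapse to the same encoded text, so a model that only sees the encoded input cannot be fooled by swaps inside a cluster. The defense is only as good as the clusters, which is the hard part in practice.**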

---

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).