# Generating Adversarial _text_ with `TextAttack`

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**Adversarial machine learning (`AML`) is a subfield of machine learning that focuses on developing algorithms and techniques that can withstand and respond to adversarial attacks.** 

**Adversarial attacks are a type of cyber attack where an attacker deliberately manipulates data inputs to ML models with the aim of causing them to produce incorrect outputs.** 

**`AML` aims to improve the robustness and security of ML models by identifying vulnerabilities and developing countermeasures to mitigate the impact of adversarial attacks. A range of techniques has been developed for `AML`, including `adversarial training` (_training models on adversarial examples_) and `defensive distillation` (_creating a distilled version of a model that is resistant to adversarial attacks_).**

**`AML` is an active area of research, as ML models continue to be deployed in a wide range of applications where they may be vulnerable to attack.**

**One of the alternatives for making models more resilient against adversarial attacks is `adversarial training`. In `adversarial training`, we generate adversarial examples and use them as samples (with their correct labels) for training (retraining) the original model, making it more robust.**

**In this notebook, we will be exploring one of the functionalities of the `textattack` library.**

> **_TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP_.**

**We already worked with `text augmentation` in our [notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/fa17764aa8800c388d0d298b750c686757e0861e/ML%20Adversarial/model_extraction_nlp.ipynb) about `model extraction attacks`. In this notebook, however, we will train a language model for sentiment classification and then attack it.**

![sentiment-analysis](https://miro.medium.com/proxy/1*_JW1JaMpK_fVGld8pd1_JQ.gif)

**In this notebook, similar to other tutorials from the [Teeny-Tiny Castle 🏰](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/fa17764aa8800c388d0d298b750c686757e0861e/ML%20Explainability/NLP%20Interpreter/model_maker.ipynb), we will create a `Bidirectional Long Short-Term Memory (Bi-LSTM)` network for sentiment classification.**

**We will be using a dataset that was put together by combining several datasets for sentiment classification available on [Kaggle](https://www.kaggle.com/):**

- **The `IMDB 50K` [dataset](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?select=IMDB+Dataset.csv): _50K movie reviews for natural language processing or text analytics._**
- **The `Twitter US Airline Sentiment` [dataset](https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment): _originated from the [Crowdflower's Data for Everyone library](http://www.crowdflower.com/data-for-everyone)._**
- **Our `google_play_apps_review` _dataset: built using the `google_play_scraper` in [this notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/64d0693c28786ce42149411bec8b3b42520fc4df/ML%20Explainability/NLP%20Interpreter%20(en)/scrape(en).ipynb)._**
- **The `EcoPreprocessed` [dataset](https://www.kaggle.com/datasets/pradeeshprabhakar/preprocessed-dataset-sentiment-analysis): _scraped Amazon product reviews_.**

**The final result is the `sentiment_analysis_dataset.csv` file, available for download [here](https://drive.google.com/uc?export=download&id=1_ijhnVLHddM7Cm3R3vfqBB-svw6iNfpv). We also have a Portuguese (PT-BR) version [here](https://drive.google.com/uc?export=download&id=1YCIzGqcdlHSy-GvghRp0U5USUhuOVEE3).**

**This dataset already comes preprocessed, and the `cleaning` function we used is this:**

```python

import re
from unidecode import unidecode

def custom_standardization(input_data):
    clean_text = input_data.lower().replace("<br />", " ")
    clean_text = re.sub(r"[-()\"#/@;:<>{}=~|.?,]", ' ', clean_text)
    clean_text = re.sub(' +', ' ', clean_text)
    return unidecode(clean_text)

```

In [11]:
import pandas as pd
import urllib.request

urllib.request.urlretrieve(
    'https://drive.google.com/uc?export=download&id=1_ijhnVLHddM7Cm3R3vfqBB-svw6iNfpv', 
    'sentiment_analysis_dataset.csv'
)

df = pd.read_csv('sentiment_analysis_dataset.csv')
display(df)

| Unnamed: 0 | review | sentiment |
|---|---|---|
| 0 | one of the other reviewers has mentioned that ... | 1 |
| 1 | a wonderful little production the filming tech... | 1 |
| 2 | i thought this was a wonderful way to spend ti... | 1 |
| 3 | basically there's a family where a little boy ... | 0 |
| 4 | petter mattei's love in the time of money is a... | 1 |
| ... | ... | ... |
| 85084 | yaaa cool use last weeks give good response | 1 |
| 85085 | years daughter love alexa enjoy alexa | 1 |
| 85086 | yes popular but doesnt use except listen songs... | 1 |
| 85087 | yo alexa love | 1 |


**The following cells train a `Bidirectional Long Short-Term Memory (Bi-LSTM)` network for binary sentiment classification (negative versus positive). Training may take a while, so if you want to skip it, you can load our pre-trained `senti_model` in the cell right after the training cell.**
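**The training cell below turns each review into a list of token ids and then calls `pad_sequences(..., maxlen=250, truncating='post')`. Since `padding` is left at its Keras default (`'pre'`), short reviews are left-padded with `0` and long reviews keep only their first 250 tokens. The pure-Python function below is a minimal sketch of that behaviour (`pad_pre_truncate_post` is a hypothetical helper of ours, not a Keras API), written only to make the semantics concrete.**

```python
def pad_pre_truncate_post(sequences, maxlen, value=0):
    """Mimic pad_sequences(seqs, maxlen=maxlen, padding='pre', truncating='post').

    Hypothetical helper for illustration only; the notebook itself uses Keras's pad_sequences.
    """
    padded = []
    for seq in sequences:
        # truncating='post': drop tokens from the END of sequences that are too long
        seq = list(seq)[:maxlen]
        # padding='pre': prepend `value` until the sequence reaches maxlen
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Toy token ids (not produced by the real tokenizer), with maxlen=4
print(pad_pre_truncate_post([[7, 3], [5, 9, 2, 8, 4, 1]], maxlen=4))
# [[0, 0, 7, 3], [5, 9, 2, 8]]
```

**One consequence of `truncating='post'` is that, for reviews longer than 250 tokens, the model never sees how the review ends.**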

In [16]:
import io
import json
import torch
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer, tokenizer_from_json

vocab_size = 5000
embed_size = 128
sequence_length = 250

tokenizer = Tokenizer(num_words=vocab_size,
                      filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                      lower=True,
                      split=" ",
                      oov_token="<OOV>")

tokenizer.fit_on_texts(df.review)
tokenizer_json = tokenizer.to_json()

with io.open('models/tokenizer_senti_model.json', 'w', encoding='utf-8') as fp:
    fp.write(json.dumps(tokenizer_json, ensure_ascii=False))

x_train, x_test, y_train, y_test = train_test_split(
    df.review, df.sentiment, test_size=0.2, random_state=42)

x_train = pad_sequences(
    tokenizer.texts_to_sequences(x_train), 
    maxlen=sequence_length, 
    truncating='post')
x_test = pad_sequences(
    tokenizer.texts_to_sequences(x_test), 
    maxlen=sequence_length, 
    truncating='post')
y_train = np.array(y_train).astype(float)
y_test = np.array(y_test).astype(float)


inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(input_dim=vocab_size,
                              output_dim=embed_size,
                              input_length=sequence_length)(inputs)

x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)

outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss=tf.losses.BinaryCrossentropy(),
              optimizer='adam',
              metrics=['accuracy'])

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")
model.summary()

callbacks = [keras.callbacks.ModelCheckpoint("models/senti_model.keras",
                                             save_best_only=True),
             keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=3,
                                           verbose=1,
                                           mode="auto",
                                           baseline=None,
                                           restore_best_weights=True)]
                                                                                        
model.fit(x_train,
          y_train,
          epochs=20,
          validation_split=0.2,
          callbacks=callbacks,
          verbose=1)

test_loss_score, test_acc_score = model.evaluate(x_test, y_test)

print(f'Final Loss: {round(test_loss_score, 2)}.')
print(f'Final Performance: {round(test_acc_score * 100, 2)} %.')

Version:  2.10.1
Eager mode:  True
GPU is available
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding_2 (Embedding)     (None, None, 128)         640000    
                                                                 
 bidirectional_4 (Bidirectio  (None, None, 128)        98816     
 nal)                                                            
                                                                 
 bidirectional_5 (Bidirectio  (None, 128)              98816     
 nal)                                                            
                                                                 
 dense_2 (Dense)             (None, 1)                 129       
                                                                 
Total p
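**The `Param #` column can be checked by hand. An `Embedding` layer stores one `embed_size`-dimensional vector per vocabulary entry; an `LSTM` with `u` units and input size `d` has four gates, each with an input kernel, a recurrent kernel, and a bias, giving `4 * (u * (d + u) + u)` weights; a `Bidirectional` wrapper doubles that; and a `Dense` layer has one weight per input plus a bias. A short sketch of the arithmetic, using the hyperparameters from the training cell:**

```python
def embedding_params(vocab_size, embed_size):
    # one trainable vector per vocabulary entry
    return vocab_size * embed_size

def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def bidirectional_lstm_params(units, input_dim):
    # a forward and a backward LSTM with separate weights
    return 2 * lstm_params(units, input_dim)

def dense_params(input_dim, units):
    return input_dim * units + units

embed = embedding_params(5000, 128)            # vocab_size=5000, embed_size=128
bilstm_1 = bidirectional_lstm_params(64, 128)  # input: 128-d embeddings
bilstm_2 = bidirectional_lstm_params(64, 128)  # input: 64 forward + 64 backward features
dense = dense_params(128, 1)                   # single sigmoid output
print(embed, bilstm_1, bilstm_2, dense, embed + bilstm_1 + bilstm_2 + dense)
# 640000 98816 98816 129 837761
```

**The per-layer values match the summary above, and the embedding alone accounts for roughly three quarters of the weights.**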

**If you do not want to train the model, you can load the trained version in the cell below. First, however, you need to download the model and tokenizer files (instructions in the `models` folder).**

In [17]:
import json
import torch
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer, tokenizer_from_json

model = keras.models.load_model('models/senti_model.keras')

with open('models/tokenizer_senti_model.json') as fp:
    data = json.load(fp)
    tokenizer = tokenizer_from_json(data)

strings = [
    'this explanation is really bad',
    'i did not like this tutorial 2/10',
    'this tutorial is garbage i wont my money back',
    'is nice to see philosophers doing machine learning',
    'this is a great and wonderful example of nlp',
    'this tutorial is great one of the best tutorials ever made'
]

preds = model.predict(
    keras.preprocessing.sequence.pad_sequences(
        tokenizer.texts_to_sequences(strings),
        maxlen=250,
        truncating='post'),
    verbose=0)
    
for i, string in enumerate(strings):
    print(f'Review: "{string}"\n(Negative 😔 {round((1 - preds[i][0]) * 100)}% | Positive 😊 {round(preds[i][0] * 100)}%)\n')


Review: "this explanation is really bad"
(Negative 😔 95% | Positive 😊 5%)

Review: "i did not like this tutorial 2/10"
(Negative 😔 81% | Positive 😊 19%)

Review: "this tutorial is garbage i wont my money back"
(Negative 😔 93% | Positive 😊 7%)

Review: "is nice to see philosophers doing machine learning"
(Negative 😔 3% | Positive 😊 97%)

Review: "this is a great and wonderful example of nlp"
(Negative 😔 0% | Positive 😊 100%)

Review: "this tutorial is great one of the best tutorials ever made"
(Negative 😔 0% | Positive 😊 100%)



**The model seems to be working fine! Now, let us try to break it.** 🙃

**With `textattack`, we can _wrap_ a model (e.g., a Keras, TensorFlow, scikit-learn, or AllenNLP model) by subclassing the `ModelWrapper` class. Its `__call__` method receives a list of strings and returns the model's prediction scores for each of them.**

**How you implement this method depends on the native output format of your model. Below, you can see how to turn the output of a `sigmoid function` (the last layer of our `bi-lstm`) into a torch tensor that contains the probability of each sentiment class (`0` for negative, `1` for positive).**

In [19]:
from textattack.models.wrappers import ModelWrapper

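# Note: this subclass reuses the name `ModelWrapper`, shadowing textattack's base class
class ModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        # Tokenize with the (global) tokenizer from above and pad exactly as during training
        text_array = tokenizer.texts_to_sequences(text_input_list)
        padded_text_array = keras.preprocessing.sequence.pad_sequences(
            text_array,
            maxlen=250,
            truncating='post'
        )
        # Shape (N, 1): sigmoid probability of the positive class
        preds = self.model.predict(padded_text_array, verbose=0)
        logits = torch.tensor(preds)     # despite the name, these are probabilities
        logits = logits.squeeze(dim=-1)  # shape (N,)
        # Shape (N, 2): [P(negative), P(positive)] for each input
        final_preds = torch.stack((1 - logits, logits), dim=1)
        return final_preds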
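**The key step in `__call__` is `torch.stack((1 - logits, logits), dim=1)`: the single sigmoid output `p` (the probability of the positive class) becomes the pair `[1 - p, p]`, one row per input and one column per class. A framework-free sketch of the same reshaping, with made-up sigmoid outputs (`sigmoid_to_class_probs` is a hypothetical helper, not part of `textattack`):**

```python
def sigmoid_to_class_probs(sigmoid_outputs):
    """Turn shape-(N, 1) sigmoid outputs into shape-(N, 2) rows [P(negative), P(positive)].

    Pure-Python illustration of the torch.stack((1 - p, p), dim=1) step in the wrapper.
    """
    rows = []
    for (p,) in sigmoid_outputs:  # each model output row holds a single probability
        rows.append([1.0 - p, p])
    return rows


# Made-up outputs, shaped like model.predict(...) for two inputs
print(sigmoid_to_class_probs([[0.25], [0.75]]))
# [[0.75, 0.25], [0.25, 0.75]]
```

**Each row sums to 1, and the predicted class is simply the column with the larger value.**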


**Now, let us see the outputs of our `ModelWrapper`.**

In [21]:
ModelWrapper(model)([
    'this explanation is really bad',
    'i did not like this tutorial 2/10',
    'this tutorial is garbage i wont my money back',
    'is nice to see philosophers doing machine learning',
    'this is a great and wonderful example of nlp',
    'this tutorial is great one of the best tutorials ever made'
])

tensor([[0.9459, 0.0541],
        [0.8122, 0.1878],
        [0.9319, 0.0681],
        [0.0289, 0.9711],
        [0.0024, 0.9976],
        [0.0015, 0.9985]])

**Exactly what we wanted: the probabilities agree with the predictions we obtained above. Now we can simply call one of the `Attack Recipes` available in `textattack`.**

**However, we need something to attack. `textattack` can use `HuggingFace` datasets for the attack, and you can also use your own dataset.**

**The `textattack.datasets.Dataset` class takes as input a list of tuples, e.g., `[('some text', label_1), ('some other text', label_2)]`. Below, we turn the examples used above into a mini-dataset, which `textattack` will use to create adversarial examples against our model.**

In [22]:
data = [
    ('this explanation is really bad', 0),
    ('this tutorial is garbage i wont my money back', 0),
    ('i did not like this tutorial 2/10', 0),
    ('is nice to see philosophers doing machine learning', 1),
    ('this is a great and wonderful example of nlp', 1),
    ('this tutorial is great one of the best tutorials ever made', 1)
]

**You could also convert a portion of your own dataset into a list of `(text, label)` tuples. Any list of labeled text samples can be turned into a `textattack.datasets.Dataset`.**

In [23]:
import textattack
from sklearn.model_selection import train_test_split

df = pd.read_csv('sentiment_analysis_dataset.csv')

_, x_test, _, y_test = train_test_split(
   list(df.review), list(df.sentiment), test_size=0.2, random_state=42)

y_test = np.array(y_test).astype(float)

data=[(x_test[i], int(y_test[i])) for i in range(len(x_test))]
np.random.shuffle(data)
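**`np.random.shuffle(data)` reorders the whole test split without a fixed seed, so each run attacks a different set of reviews. If you want a reproducible subset instead, one option is to draw a fixed-size sample with a seeded generator. This is a sketch using only the standard library; `sample_attack_set` is a hypothetical helper and the toy reviews are made up.**

```python
import random

def sample_attack_set(texts, labels, n, seed=42):
    """Return n reproducible (text, int_label) tuples for textattack.datasets.Dataset."""
    pairs = [(text, int(label)) for text, label in zip(texts, labels)]
    rng = random.Random(seed)  # local generator: does not touch the global random state
    return rng.sample(pairs, k=min(n, len(pairs)))


texts = ["great app", "awful service", "love it", "never again"]
labels = [1.0, 0.0, 1.0, 0.0]
subset = sample_attack_set(texts, labels, n=2)
print(len(subset), subset == sample_attack_set(texts, labels, n=2))
# 2 True
```

**The result could then be passed to `textattack.datasets.Dataset(...)` just like the hand-written `data` list.**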


**Now that we have a dataset, we can call one of the attack recipes from `textattack`. All available recipes implement attacks from the adversarial ML literature.**

**Attack recipes allow you to create an `Attack` object whose goal function (the condition under which the attack is considered successful), transformation (the adversarial perturbations applied to the samples of the dataset), constraints (the limitations imposed on these transformations), and search method (the strategy used to explore candidate perturbations) are those specified in the original paper.**
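**To make these four components concrete, here is a deliberately tiny, self-contained sketch. It is not `textattack` code and not any published recipe: the lexicon "model" (`WEIGHTS`) and the synonym table (`SYNONYMS`) are made up, and the search is a plain greedy loop (the `IGAWang2019` recipe used below searches with a genetic algorithm instead). The 20% word budget mirrors the `MaxWordsPerturbed` constraint with `max_percent` 0.2 printed in the attack output below; rounding the budget up with `ceil` is our assumption.**

```python
import math

# Hypothetical toy model: a bag-of-words scorer, NOT the notebook's Bi-LSTM.
WEIGHTS = {"bad": -2.0, "garbage": -2.0, "poor": -1.0, "great": 2.0, "wonderful": 2.0}
# Hypothetical synonym table standing in for word-embedding neighbours.
SYNONYMS = {"bad": ["poor", "adverse"], "great": ["enormous"], "wonderful": ["unbelievable"]}


def positive_prob(words):
    """Toy model: sigmoid of the summed word weights = P(positive)."""
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def goal_reached(words, true_label):
    """Goal function (untargeted): success once the predicted label changes."""
    predicted = int(positive_prob(words) >= 0.5)
    return predicted != true_label


def transform(words):
    """Transformation: every single-word synonym swap of the current text."""
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            yield words[:i] + [synonym] + words[i + 1:]


def within_budget(original, candidate, max_percent=0.2):
    """Constraint: at most ceil(max_percent * len) words may differ from the original."""
    changed = sum(a != b for a, b in zip(original, candidate))
    return changed <= math.ceil(max_percent * len(original))


def greedy_attack(text, true_label, max_percent=0.2):
    """Search method: greedily keep the swap that most lowers P(true label)."""
    original = text.split()
    current = list(original)

    def true_label_prob(words):
        p = positive_prob(words)
        return p if true_label == 1 else 1.0 - p

    while not goal_reached(current, true_label):
        candidates = [c for c in transform(current) if within_budget(original, c, max_percent)]
        if not candidates:
            return None  # failed attack: no allowed perturbation left
        best = min(candidates, key=true_label_prob)
        if true_label_prob(best) >= true_label_prob(current):
            return None  # failed attack: no swap improves on the current text
        current = best
    return " ".join(current)


print(greedy_attack("this explanation is really bad", 0))
# this explanation is really adverse
```

**Swapping any single component (a different transformation, a tighter constraint, a smarter search) yields a different attack, which is exactly how the recipes below differ from one another.**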

**Here is a list of _fast_ attack recipes from `textattack`:**

- **`PWWSRen2019`: in this attack, words are replaced by synonyms in an order determined by a combination of each word's saliency score (i.e., _how much the word affects the model's prediction_) and the maximum effectiveness of its synonym swap (proposed in "[Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency](https://aclanthology.org/P19-1103/)").**
- **`CheckList2020`: this attack applies several language perturbations, like contractions, extensions, and changes to names, numbers, and locations (proposed in "[Beyond Accuracy: Behavioral Testing of NLP models with CheckList](https://aclanthology.org/2020.acl-main.442/)").**
- **`DeepWordBugGao2018`: this attack performs simple character-level transformations (_changes certain letters of a word_) to the highest-ranked tokens (proposed in [Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers](https://arxiv.org/abs/1801.04354)).**
- **`IGAWang2019`: this attack can be characterized as a synonym substitution-based attack that preserves the syntactic structure and semantic information of the original text (proposed in [Natural Language Adversarial Attacks and Defenses in Word Level](http://arxiv.org/abs/1909.06723)).**
- **`InputReductionFeng2018`: this attack does not cause the model to misclassify a sample. However, it removes words with low saliency scores, creating nonsensical sentences that the model classifies with high confidence as the original predicted class (proposed in [Pathologies of Neural Models Make Interpretations Difficult](https://arxiv.org/abs/1804.07781)).**
- **`Pruthi2019`: this attack focuses on a small number of character-level changes that simulate common typos, like _swapping neighboring characters, deleting characters, inserting characters,_ and _swapping characters for adjacent keys_ on a QWERTY keyboard (proposed in [Combating Adversarial Misspellings with Robust Word Recognition](https://arxiv.org/abs/1905.11268)).**
- **`TextBuggerLi2018`: this is a general attack framework for generating adversarial texts that combines character-level and word-level perturbations (proposed in [TextBugger: Generating Adversarial Text Against Real-world Applications](https://arxiv.org/abs/1812.05271)).**

**In the example below, we will use the `IGAWang2019` recipe.**

**The `Attacker` is configured with a `textattack.AttackArgs` object, which accepts additional arguments (full list [here](https://textattack.readthedocs.io/en/latest/api/attacker.html#attackargs)). Below, we set `log_to_csv` to the name of a `.csv` file, where all attack results will be saved.**

**For clarity, all perturbed words in the output are highlighted with [[ ]].**
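**For word-swap attacks, where the original and perturbed texts have the same number of words, this highlighting amounts to a word-by-word comparison. A minimal sketch (`highlight_swaps` is our own hypothetical helper, not the function `textattack` uses internally):**

```python
def highlight_swaps(original, perturbed):
    """Wrap every word that differs between two equal-length texts in [[ ]]."""
    orig_words, pert_words = original.split(), perturbed.split()
    if len(orig_words) != len(pert_words):
        raise ValueError("this sketch only handles one-to-one word swaps")
    marked_orig, marked_pert = [], []
    for a, b in zip(orig_words, pert_words):
        if a != b:
            a, b = f"[[{a}]]", f"[[{b}]]"
        marked_orig.append(a)
        marked_pert.append(b)
    return " ".join(marked_orig), " ".join(marked_pert)


print(*highlight_swaps("this explanation is really bad",
                       "this explanation is really adverse"), sep="\n")
# this explanation is really [[bad]]
# this explanation is really [[adverse]]
```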


In [25]:
model_wrapper = ModelWrapper(model)

import textattack
from textattack.attack_recipes import IGAWang2019
from textattack import Attacker

data = [
    ('this explanation is really bad', 0),
    ('this tutorial is garbage i wont my money back', 0),
    ('i did not like this tutorial 2/10', 0),
    ('is nice to see philosophers doing machine learning', 1),
    ('this is a great and wonderful example of nlp', 1),
    ('this tutorial is great one of the best tutorials ever made', 1)
]

dataset = textattack.datasets.Dataset(data)
attack = IGAWang2019.build(model_wrapper)
attack_args = textattack.AttackArgs(
    num_examples=6,
    log_to_csv="textattack_logs_IGAWang2019.csv"
)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()

textattack: Unknown if model of class <class 'keras.engine.functional.Functional'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
textattack: Logging to CSV at path textattack_logs_IGAWang2019.csv


Attack(
  (search_method): ImprovedGeneticAlgorithm(
    (pop_size):  60
    (max_iters):  20
    (temp):  0.3
    (give_up_if_no_improvement):  False
    (post_crossover_check):  False
    (max_crossover_retries):  20
    (max_replace_times_per_index):  5
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapEmbedding(
    (max_candidates):  50
    (embedding):  WordEmbedding
  )
  (constraints): 
    (0): MaxWordsPerturbed(
        (max_percent):  0.2
        (compare_against_original):  True
      )
    (1): WordEmbeddingDistance(
        (embedding):  WordEmbedding
        (max_mse_dist):  0.5
        (cased):  False
        (include_unknown_words):  True
        (compare_against_original):  False
      )
    (2): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  17%|█▋        | 1/6 [00:00<00:02,  2.42it/s]

--------------------------------------------- Result 1 ---------------------------------------------

this explanation is really [[bad]]

this explanation is really [[adverse]]




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  33%|███▎      | 2/6 [00:01<00:02,  1.60it/s]

--------------------------------------------- Result 2 ---------------------------------------------

this tutorial is [[garbage]] i wont my [[money]] back

this tutorial is [[detritus]] i wont my [[financial]] back




[Succeeded / Failed / Skipped / Total] 2 / 1 / 0 / 3:  50%|█████     | 3/6 [00:02<00:02,  1.18it/s]

--------------------------------------------- Result 3 ---------------------------------------------

i did not like this tutorial 2/10




[Succeeded / Failed / Skipped / Total] 3 / 1 / 0 / 4:  67%|██████▋   | 4/6 [00:03<00:01,  1.28it/s]

--------------------------------------------- Result 4 ---------------------------------------------

is [[nice]] to see philosophers doing machine learning

is [[handsome]] to see philosophers doing machine learning




[Succeeded / Failed / Skipped / Total] 4 / 1 / 0 / 5:  83%|████████▎ | 5/6 [00:04<00:00,  1.12it/s]

--------------------------------------------- Result 5 ---------------------------------------------

this is a [[great]] and [[wonderful]] example of nlp

this is a [[enormous]] and [[unbelievable]] example of nlp




[Succeeded / Failed / Skipped / Total] 5 / 1 / 0 / 6: 100%|██████████| 6/6 [00:06<00:00,  1.12s/it]

--------------------------------------------- Result 6 ---------------------------------------------

this tutorial is [[great]] [[one]] of the [[best]] tutorials ever made

this tutorial is [[giant]] [[eden]] of the [[higher]] tutorials ever made



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 5      |
| Number of failed attacks:     | 1      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 16.67% |
| Attack success rate:          | 83.33% |
| Average perturbed word %:     | 20.84% |
| Average num. words per input: | 8.33   |
| Avg num queries:              | 398.33 |
+-------------------------------+--------+





[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182c7b2adf0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182a12eba00>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x182962be880>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182c6fc74f0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182aa7a8be0>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x1829c6db520>]
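**The summary table can be reproduced from the per-example outcomes. The sketch below assumes the definitions that are consistent with the numbers above: original accuracy counts every non-skipped example as correctly classified, accuracy under attack counts only failed attacks as still correct, the success rate ignores skipped examples, and the perturbed-word percentage is averaged over successful attacks only. `attack_summary` is a hypothetical helper, and the `(changed, total)` word counts are our own counts of the successful results shown above.**

```python
def attack_summary(outcomes, perturbed_words):
    """Recompute textattack-style summary metrics (as percentages).

    outcomes: "success" / "failed" / "skipped", one per attacked example.
    perturbed_words: (num_changed_words, num_words) for each successful attack.
    Assumes at least one successful attack and at least one non-skipped example.
    """
    total = len(outcomes)
    succeeded = outcomes.count("success")
    failed = outcomes.count("failed")
    skipped = outcomes.count("skipped")
    return {
        "original_accuracy": 100.0 * (total - skipped) / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": 100.0 * succeeded / (succeeded + failed),
        "avg_perturbed_word_pct": 100.0 * sum(c / n for c, n in perturbed_words) / len(perturbed_words),
    }


# IGAWang2019 run above: results 1, 2, 4, 5, 6 succeeded and result 3 failed
iga = attack_summary(
    ["success", "success", "failed", "success", "success", "success"],
    [(1, 5), (2, 9), (1, 8), (2, 9), (3, 11)],
)
print({name: round(value, 2) for name, value in iga.items()})
# {'original_accuracy': 100.0, 'accuracy_under_attack': 16.67, 'attack_success_rate': 83.33, 'avg_perturbed_word_pct': 20.84}
```

**Note that "accuracy under attack" and "attack success rate" are two views of the same outcome: with no skipped examples, they add up to 100%.**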

**Now let us try another recipe!**

In [26]:
from textattack.attack_recipes import DeepWordBugGao2018

attack = DeepWordBugGao2018.build(model_wrapper)
attack_args = textattack.AttackArgs(
    num_examples=6,
    log_to_csv="textattack_logs_DeepWordBugGao2018.csv"
)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()

textattack: Unknown if model of class <class 'keras.engine.functional.Functional'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
textattack: Logging to CSV at path textattack_logs_DeepWordBugGao2018.csv


Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  unk
  )
  (goal_function):  UntargetedClassification
  (transformation):  CompositeTransformation(
    (0): WordSwapNeighboringCharacterSwap(
        (random_one):  True
      )
    (1): WordSwapRandomCharacterSubstitution(
        (random_one):  True
      )
    (2): WordSwapRandomCharacterDeletion(
        (random_one):  True
      )
    (3): WordSwapRandomCharacterInsertion(
        (random_one):  True
      )
    )
  (constraints): 
    (0): LevenshteinEditDistance(
        (max_edit_distance):  30
        (compare_against_original):  True
      )
    (1): RepeatModification
    (2): StopwordModification
  (is_black_box):  True
) 



[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  17%|█▋        | 1/6 [00:00<00:01,  3.09it/s]

--------------------------------------------- Result 1 ---------------------------------------------

this explanation is really [[bad]]

this explanation is really [[baEd]]




[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  33%|███▎      | 2/6 [00:00<00:01,  3.06it/s]

--------------------------------------------- Result 2 ---------------------------------------------

this tutorial is [[garbage]] i wont my [[money]] back

this tutorial is [[garabge]] i wont my [[lmoney]] back




[Succeeded / Failed / Skipped / Total] 2 / 1 / 0 / 3:  50%|█████     | 3/6 [00:01<00:01,  2.84it/s]

--------------------------------------------- Result 3 ---------------------------------------------

i did not like this tutorial 2/10




[Succeeded / Failed / Skipped / Total] 3 / 1 / 0 / 4:  67%|██████▋   | 4/6 [00:01<00:00,  3.05it/s]

--------------------------------------------- Result 4 ---------------------------------------------

is [[nice]] to see philosophers doing machine learning

is [[ince]] to see philosophers doing machine learning




[Succeeded / Failed / Skipped / Total] 3 / 2 / 0 / 5:  83%|████████▎ | 5/6 [00:01<00:00,  2.76it/s]

--------------------------------------------- Result 5 ---------------------------------------------

this is a great and wonderful example of nlp




[Succeeded / Failed / Skipped / Total] 4 / 2 / 0 / 6: 100%|██████████| 6/6 [00:02<00:00,  2.61it/s]

--------------------------------------------- Result 6 ---------------------------------------------

this tutorial is [[great]] [[one]] of the [[best]] tutorials [[ever]] made

this tutorial is [[grea]] [[on]] of the [[bYest]] tutorials [[Vever]] made



+-------------------------------+--------+
| Attack Results                |        |
+-------------------------------+--------+
| Number of successful attacks: | 4      |
| Number of failed attacks:     | 2      |
| Number of skipped attacks:    | 0      |
| Original accuracy:            | 100.0% |
| Accuracy under attack:        | 33.33% |
| Attack success rate:          | 66.67% |
| Average perturbed word %:     | 22.77% |
| Average num. words per input: | 8.33   |
| Avg num queries:              | 15.67  |
+-------------------------------+--------+





[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182a217d100>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182c8e82370>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x182c8aaae50>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182c6a775b0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x182a179c190>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x182a28c5a90>]
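**The `DeepWordBugGao2018` perturbations above are single-character edits, matching the four transformations listed in the attack's configuration (neighboring-character swap, substitution, deletion, and insertion). The sketch below reproduces several of them deterministically (`textattack` picks positions and characters at random; here we pass them explicitly) and adds a textbook Levenshtein distance to show how far each perturbed word is from the original. All helpers are our own, not `textattack` functions.**

```python
def swap_neighbors(word, i):
    """Swap the characters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute(word, i, char):
    """Replace the character at position i."""
    return word[:i] + char + word[i + 1:]

def delete(word, i):
    """Remove the character at position i."""
    return word[:i] + word[i + 1:]

def insert(word, i, char):
    """Insert a character before position i."""
    return word[:i] + char + word[i:]

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute (free if equal)
        previous = current
    return previous[-1]

# Reproducing edits seen in the DeepWordBugGao2018 output above
pairs = [
    ("garbage", swap_neighbors("garbage", 3)),
    ("nice", swap_neighbors("nice", 0)),
    ("bad", insert("bad", 2, "E")),
    ("money", insert("money", 0, "l")),
    ("great", delete("great", 4)),
]
for original, perturbed in pairs:
    print(original, "->", perturbed, "| edit distance:", levenshtein(original, perturbed))
# garbage -> garabge | edit distance: 2
# nice -> ince | edit distance: 2
# bad -> baEd | edit distance: 1
# money -> lmoney | edit distance: 1
# great -> grea | edit distance: 1
```

**Under plain Levenshtein distance a neighboring-character swap costs 2 edits. Every perturbation in this run stays within a few edits, far below the `max_edit_distance` of 30 in the recipe's `LevenshteinEditDistance` constraint, which is why such typos remain readable for humans while still flipping the model's prediction.**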

**Language models are the foundation of applications such as question answering, chatbots, machine translation, and text classification. However, the security vulnerabilities of ML-trained language models are still poorly understood, which is highly concerning.**

**To remedy this, developers must use the same tools that attackers use to fool models. For example, adversarial examples created with libraries like `textattack` (_which also supports data augmentation_) can be collected into adversarial datasets used to fine-tune language models and make them more robust.**
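**As a concrete starting point, the successful attacks saved with `log_to_csv` can be turned into extra training pairs that keep their original (correct) labels, which is the `adversarial training` idea described at the beginning of this notebook. The sketch below assumes the log has `perturbed_text`, `ground_truth_output`, and `result_type` columns and that perturbed words are wrapped in `[[ ]]`; check the header of your own CSV before relying on it. `adversarial_pairs` is a hypothetical helper and the sample rows are made up.**

```python
import csv
import io
import re

def adversarial_pairs(csv_file):
    """Collect (clean perturbed text, original label) pairs from successful attacks.

    Assumes columns perturbed_text, ground_truth_output and result_type (check your log).
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        if row["result_type"] != "Successful":
            continue  # failed/skipped attacks produced no adversarial example
        text = re.sub(r"\[\[(.*?)\]\]", r"\1", row["perturbed_text"])  # drop [[ ]] markers
        pairs.append((text, int(float(row["ground_truth_output"]))))
    return pairs


# Made-up log content with the assumed columns
log = io.StringIO(
    "perturbed_text,ground_truth_output,result_type\n"
    "this explanation is really [[adverse]],0,Successful\n"
    "i did not like this tutorial 2/10,0,Failed\n"
)
print(adversarial_pairs(log))
# [('this explanation is really adverse', 0)]
```

**These pairs would then be tokenized and padded like the rest of the data and appended to the training set before retraining.**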

**At the same time, other strategies are possible. Since many attacks in the literature are synonym-substitution attacks, [Xiaosen Wang](https://arxiv.org/search/cs?searchtype=author&query=Wang%2C+X), [Hao Jin](https://arxiv.org/search/cs?searchtype=author&query=Jin%2C+H), [Yichen Yang](https://arxiv.org/search/cs?searchtype=author&query=Yang%2C+Y), and [Kun He](https://arxiv.org/search/cs?searchtype=author&query=He%2C+K) proposed the [Synonym Encoding Method (SEM)](http://arxiv.org/abs/1909.06723), which maps each cluster of synonyms to a unique encoding before the input reaches the model, thus neutralizing synonym-substitution perturbations.**
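**The core idea of synonym encoding can be illustrated with a toy lookup: every word in a synonym cluster is replaced by one shared code before the text reaches the model, so a synonym swap no longer changes the model's input. This is only a sketch of the idea, not the SEM implementation: the clusters below are hand-written and hypothetical, whereas SEM derives its clusters from distances in a word-embedding space.**

```python
# Hypothetical synonym clusters; the first word of each cluster serves as its shared code.
CLUSTERS = [
    ["great", "enormous", "giant"],
    ["bad", "adverse", "poor"],
    ["wonderful", "unbelievable"],
]
ENCODING = {word: cluster[0] for cluster in CLUSTERS for word in cluster}


def synonym_encode(text):
    """Map every word to its cluster's code; words outside all clusters are kept as-is."""
    return " ".join(ENCODING.get(word, word) for word in text.split())


clean = "this is a great and wonderful example of nlp"
attacked = "this is a enormous and unbelievable example of nlp"  # IGAWang2019 output above
print(synonym_encode(clean) == synonym_encode(attacked))
# True
```

**A model trained and queried on encoded text sees the same input for both sentences, so this particular swap cannot change its prediction. The trade-off is that genuine differences between words in the same cluster are also erased, so cluster quality matters.**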

---

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).