# TextAttack on Keras Model

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/Example_6_Keras.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/Example_6_Keras.ipynb)

Please remember to run **pip3 install textattack[tensorflow]** in your notebook environment before running the code below.

This notebook runs TextAttack on a trained Keras model.

## Training

The code below trains a basic neural network on a series of movie reviews from the IMDB dataset, loaded using TensorFlow's datasets module. Each review is encoded as a sequence of integers, where each integer is a word's index in the vocabulary. Class labels are provided, denoting a positive or negative sentiment.

See [here](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data) for more information on the IMDB dataset. 
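Before the training code, here is a small sketch of the integer encoding described above. It assumes the `start_char=1`, `oov_char=2`, and `index_from=3` settings that this notebook passes to `load_data`; `toy_word_index` and `decode_review` are hypothetical names introduced only for illustration, and the toy mapping stands in for the much larger one returned by `get_word_index`.

```python
# Toy stand-in for tf.keras.datasets.imdb.get_word_index(): word -> frequency
# rank, starting at 1. The real mapping is far larger (assumption: toy data).
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

INDEX_FROM = 3  # load_data(index_from=3) shifts every word rank by this offset
START_CHAR = 1  # marks the beginning of a review
OOV_CHAR = 2    # replaces words that fall outside the kept vocabulary


def decode_review(token_ids, word_index, index_from=INDEX_FROM):
    """Map a list of encoded token ids back to a readable string."""
    id_to_word = {rank + index_from: word for word, rank in word_index.items()}
    id_to_word[START_CHAR] = "<START>"
    id_to_word[OOV_CHAR] = "<OOV>"
    return " ".join(id_to_word.get(i, "<UNK>") for i in token_ids)


encoded = [1, 4, 5, 6, 2, 7]
decoded = decode_review(encoded, toy_word_index)
print(decoded)
```

The point of the sketch is the offset: a word with rank `r` in the word index appears in an encoded review as `r + 3`, because ids 0 to 2 are reserved.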


In [1]:
import tensorflow as tf
import keras
import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Dropout

from nltk.tokenize import word_tokenize, RegexpTokenizer

from textattack.models.wrappers import ModelWrapper
from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import PWWSRen2019

Below, we load the IMDB dataset from TensorFlow and transform each review into a Bag-of-Words vector for our classifier.

In [2]:
NUM_WORDS = 1000

(x_train_tokens, y_train), (x_test_tokens, y_test) = tf.keras.datasets.imdb.load_data(
    path="imdb.npz",
    num_words=NUM_WORDS,
    skip_top=0,
    maxlen=None,
    seed=113,
    start_char=1,
    oov_char=2,
    index_from=3,
)


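def transform(x):
    """Convert lists of word indices into Bag-of-Words count vectors."""
    x_transform = []
    for word_indices in x:
        BoW_array = np.zeros((NUM_WORDS,))
        for index in word_indices:
            # Count each index that falls inside the NUM_WORDS-sized vocabulary.
            if index < len(BoW_array):
                BoW_array[index] += 1
        x_transform.append(BoW_array)
    return np.array(x_transform)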


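# Train on the first 90% of the training reviews and validate on the last 10%
# of the test reviews (both splits contain the same number of reviews).
index = int(0.9 * len(x_train_tokens))
x_train = transform(x_train_tokens)[:index]
x_test = transform(x_test_tokens)[index:]
y_train = np.array(y_train[:index])
y_test = np.array(y_test[index:])
# One-hot encode the 0/1 labels to match the model's two output units.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)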

vocabulary = tf.keras.datasets.imdb.get_word_index(path="imdb_word_index.json")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
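To make the Bag-of-Words step concrete, the sketch below runs the same counting logic on two toy reviews and checks it against an equivalent formulation based on `np.bincount`. The names `transform_loop`, `transform_bincount`, and `toy_reviews` are hypothetical, and the vocabulary size of 10 is chosen only to keep the vectors readable; the notebook itself uses `NUM_WORDS = 1000`.

```python
import numpy as np

NUM_WORDS = 10  # toy vocabulary size for illustration; the notebook uses 1000


def transform_loop(x, num_words=NUM_WORDS):
    """Same counting logic as the notebook's transform()."""
    rows = []
    for word_indices in x:
        bow = np.zeros((num_words,))
        for index in word_indices:
            if index < num_words:
                bow[index] += 1
        rows.append(bow)
    return np.array(rows)


def transform_bincount(x, num_words=NUM_WORDS):
    """Equivalent counts computed with np.bincount for each review."""
    rows = []
    for word_indices in x:
        ids = np.asarray(word_indices, dtype=np.int64)
        ids = ids[ids < num_words]  # drop indices outside the vocabulary
        rows.append(np.bincount(ids, minlength=num_words).astype(float))
    return np.array(rows)


# Two toy "reviews" given as lists of word indices; 12 is out of range.
toy_reviews = [[1, 4, 4, 7], [1, 2, 9, 9, 9, 12]]
bow = transform_bincount(toy_reviews)
print(bow)
```

Each row has one slot per vocabulary index and stores how often that index occurs in the review, so word order is discarded and out-of-range indices are ignored.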


With our data successfully loaded, we can now design and train our model.

In [3]:
# Model Created with Keras
model = Sequential()
model.add(Dense(512, activation="relu", input_dim=NUM_WORDS))
model.add(Dropout(0.3))
model.add(Dense(100, activation="relu"))
model.add(Dense(2, activation="sigmoid"))
opt = keras.optimizers.Adam(learning_rate=0.00001)

model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])


results = model.fit(
    x_train, y_train, epochs=18, batch_size=512, validation_data=(x_test, y_test)
)


print(results.history)

Epoch 1/18
Epoch 2/18
Epoch 3/18
Epoch 4/18
Epoch 5/18
Epoch 6/18
Epoch 7/18
Epoch 8/18
Epoch 9/18
Epoch 10/18
Epoch 11/18
Epoch 12/18
Epoch 13/18
Epoch 14/18
Epoch 15/18
Epoch 16/18
Epoch 17/18
Epoch 18/18
{'loss': [0.9584308862686157, 0.9078119993209839, 0.8743314146995544, 0.8533967733383179, 0.8329190015792847, 0.816802442073822, 0.7941828966140747, 0.7797670960426331, 0.7623777985572815, 0.7523201107978821, 0.7390732765197754, 0.7265127897262573, 0.714047372341156, 0.7041717767715454, 0.6944125294685364, 0.6798228025436401, 0.6702008247375488, 0.6643370985984802], 'accuracy': [0.49871110916137695, 0.5064444541931152, 0.5264000296592712, 0.5385333299636841, 0.5563555359840393, 0.5614666938781738, 0.5766666531562805, 0.5895110964775085, 0.6000000238418579, 0.6095555424690247, 0.6185333132743835, 0.6323555707931519, 0.6407999992370605, 0.647599995136261, 0.6585777997970581, 0.6676889061927795, 0.6765778064727783, 0.6834222078323364], 'val_loss': [0.731362521648407, 0.7148647904396057

## Attacking

With our model trained, we can create a `ModelWrapper` that will allow us to run TextAttack on a custom Keras model. Each `ModelWrapper` must implement a single method, `__call__`, which takes a list of strings and returns a `List`, `np.ndarray`, or `torch.Tensor` of predictions, one row per input string.

In [4]:
class CustomKerasModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        # Encode each input string as a Bag-of-Words vector of length NUM_WORDS.
        x_transform = []
        for review in text_input_list:
            tokens = [x.strip(",") for x in review.split()]
            BoW_array = np.zeros((NUM_WORDS,))
            for word in tokens:
                # Skip words outside the vocabulary or beyond the first NUM_WORDS indices.
                if word in vocabulary and vocabulary[word] < len(BoW_array):
                    BoW_array[vocabulary[word]] += 1
            x_transform.append(BoW_array)
        x_transform = np.array(x_transform)
        # The model returns an array of shape (len(text_input_list), 2): one score per class.
        prediction = self.model.predict(x_transform)
        return prediction


CustomKerasModelWrapper(model)(["bad bad bad bad bad", "good good good good"])

array([[0.44404104, 0.5262513 ],
       [0.49010894, 0.49974558]], dtype=float32)
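One detail worth noting: `load_data(index_from=3)` shifts every word index in the training data by 3, while `get_word_index` returns unshifted ranks. The following is a minimal sketch, under that assumption, of an encoder that applies the same offset before counting, wrapped in a duck-typed class with the same `__call__` contract. `texts_to_bow`, `StubModel`, `SketchWrapper`, and `toy_vocabulary` are hypothetical names; in real use the wrapper would subclass `ModelWrapper` and hold the trained Keras model.

```python
import numpy as np

NUM_WORDS = 12  # toy vocabulary size for illustration; the notebook uses 1000
INDEX_FROM = 3  # same offset that load_data(index_from=3) applies

# Toy stand-in for get_word_index(): word -> frequency rank, starting at 1.
toy_vocabulary = {"the": 1, "movie": 2, "was": 3, "good": 4, "bad": 5}


def texts_to_bow(texts, vocabulary, num_words=NUM_WORDS, index_from=INDEX_FROM):
    """Encode strings as Bag-of-Words rows using offset-shifted word ids."""
    bow = np.zeros((len(texts), num_words))
    for row, text in enumerate(texts):
        for token in text.split():
            word = token.strip(",")
            if word in vocabulary:
                token_id = vocabulary[word] + index_from
                if token_id < num_words:
                    bow[row, token_id] += 1
    return bow


class StubModel:
    """Hypothetical stand-in for a trained Keras model: uniform class scores."""

    def predict(self, x):
        return np.full((len(x), 2), 0.5)


class SketchWrapper:
    """Duck-typed wrapper: takes a list of strings, returns one score row each."""

    def __init__(self, model, vocabulary):
        self.model = model
        self.vocabulary = vocabulary

    def __call__(self, text_input_list):
        return self.model.predict(texts_to_bow(text_input_list, self.vocabulary))


bow = texts_to_bow(["bad bad movie", "good"], toy_vocabulary)
scores = SketchWrapper(StubModel(), toy_vocabulary)(["bad bad movie", "good"])
print(bow)
print(scores.shape)
```

Whatever encoding is chosen, the wrapper should use the same one as training, otherwise the counts land in different input slots than the ones the model learned from.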

With our `ModelWrapper` constructed, we can use TextAttack's `HuggingFaceDataset` class to load reviews for testing, alongside TextAttack's `PWWSRen2019` recipe to serve as our attack.

The code below uses TextAttack's `Attacker` class, which runs an attack against an entire dataset.


In [5]:
from textattack import AttackArgs
from textattack.datasets import Dataset
from textattack import Attacker

model_wrapper = CustomKerasModelWrapper(model)
dataset = HuggingFaceDataset("rotten_tomatoes", None, "test", shuffle=True)

attack = PWWSRen2019.build(model_wrapper)

attack_args = AttackArgs(num_examples=10, checkpoint_dir="checkpoints")

attacker = Attacker(attack, dataset, attack_args)

attacker.attack_dataset()

Using custom data configuration default
Reusing dataset rotten_tomatoes_movie_review (/p/qdata/jy2ma/.cache/textattack/datasets/rotten_tomatoes_movie_review/default/1.0.0/9c411f7ecd9f3045389de0d9ce984061a1056507703d2e3183b1ac1a90816e4d)
textattack: Loading datasets dataset rotten_tomatoes, split test.
textattack: Unknown if model of class <class 'tensorflow.python.keras.engine.sequential.Sequential'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
[Succeeded / Failed / Skipped / Total] 0 / 0 / 1 / 1:  10%|█         | 1/10 [00:00<00:00, 17.58it/s]

Attack(
  (search_method): GreedyWordSwapWIR(
    (wir_method):  weighted-saliency
  )
  (goal_function):  UntargetedClassification
  (transformation):  WordSwapWordNet
  (constraints): 
    (0): RepeatModification
    (1): StopwordModification
  (is_black_box):  True
) 

--------------------------------------------- Result 1 ---------------------------------------------
Negative (50%) --> [SKIPPED]

lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .




[Succeeded / Failed / Skipped / Total] 0 / 1 / 1 / 2:  20%|██        | 2/10 [00:00<00:00,  8.59it/s]

--------------------------------------------- Result 2 ---------------------------------------------
Positive (50%) --> [FAILED]

consistently clever and suspenseful .




[Succeeded / Failed / Skipped / Total] 1 / 1 / 3 / 5:  50%|█████     | 5/10 [00:00<00:00,  5.88it/s]

--------------------------------------------- Result 3 ---------------------------------------------
Positive (50%) --> Negative (50%)

it's [[like]] a " big chill " reunion of the baader-meinhof [[gang]] , only these guys are more harmless pranksters than political activists .

it's [[similar]] a " big chill " reunion of the baader-meinhof [[bunch]] , only these guys are more harmless pranksters than political activists .


--------------------------------------------- Result 4 ---------------------------------------------
Negative (51%) --> [SKIPPED]

the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .


--------------------------------------------- Result 5 ---------------------------------------------
Negative (50%) --> [SKIPPED]

red dragon " never cuts corners .




[Succeeded / Failed / Skipped / Total] 2 / 1 / 5 / 8:  80%|████████  | 8/10 [00:01<00:00,  6.08it/s]

--------------------------------------------- Result 6 ---------------------------------------------
Positive (50%) --> Negative (51%)

fresnadillo has something serious to [[say]] about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .

fresnadillo has something serious to [[tell]] about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .


--------------------------------------------- Result 7 ---------------------------------------------
Negative (51%) --> [SKIPPED]

throws in enough clever and unexpected twists to make the formula feel fresh .


--------------------------------------------- Result 8 ---------------------------------------------
Negative (51%) --> [SKIPPED]

weighty and ponderous but every bit as filling as the treat of the title .




[Succeeded / Failed / Skipped / Total] 3 / 1 / 5 / 9:  90%|█████████ | 9/10 [00:01<00:00,  4.89it/s]

--------------------------------------------- Result 9 ---------------------------------------------
Positive (50%) --> Negative (50%)

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance [[company]] office .

a real audience-pleaser that will strike a chord with anyone who's ever waited in a doctor's office , emergency room , hospital bed or insurance [[society]] office .




[Succeeded / Failed / Skipped / Total] 4 / 1 / 5 / 10: 100%|██████████| 10/10 [00:02<00:00,  4.86it/s]

--------------------------------------------- Result 10 ---------------------------------------------
Positive (51%) --> Negative (50%)

generates an enormous [[feeling]] of empathy for its characters .

generates an enormous [[look]] of empathy for its characters .



+-------------------------------+-------+
| Attack Results                |       |
+-------------------------------+-------+
| Number of successful attacks: | 4     |
| Number of failed attacks:     | 1     |
| Number of skipped attacks:    | 5     |
| Original accuracy:            | 50.0% |
| Accuracy under attack:        | 10.0% |
| Attack success rate:          | 80.0% |
| Average perturbed word %:     | 7.24% |
| Average num. words per input: | 15.4  |
| Avg num queries:              | 103.2 |
+-------------------------------+-------+
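The three percentages in the table follow from the succeeded / failed / skipped counts. The sketch below reproduces that arithmetic; `summarize_attack` is a hypothetical helper written for illustration, not TextAttack's own logging code. It assumes that skipped examples are the ones the model already misclassified before any attack, which is consistent with the numbers reported above.

```python
def summarize_attack(succeeded, failed, skipped):
    """Derive accuracy and success-rate percentages from attack outcome counts."""
    total = succeeded + failed + skipped
    attempted = succeeded + failed  # examples the model classified correctly
    return {
        # Share of examples classified correctly before the attack.
        "original_accuracy": 100.0 * attempted / total,
        # Share of examples still classified correctly after the attack.
        "accuracy_under_attack": 100.0 * failed / total,
        # Share of attempted attacks that flipped the prediction.
        "attack_success_rate": 100.0 * succeeded / attempted if attempted else 0.0,
    }


# Counts taken from the results table: 4 succeeded, 1 failed, 5 skipped.
summary = summarize_attack(succeeded=4, failed=1, skipped=5)
print(summary)
```

With only 10 examples and a model at 50% original accuracy, the 80% success rate rests on 5 attempted attacks, so it should be read as a smoke test rather than a robustness measurement.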





[<textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d494c67f0>,
 <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7f2d40ab9520>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d46675be0>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d4740da60>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d40aca130>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d4289e9d0>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d42fa9820>,
 <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7f2d3b54eb50>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d4905ce50>,
 <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f2d49032940>]

## Conclusion

Great! We trained a binary classifier, created a custom `ModelWrapper` for Keras models, and successfully ran adversarial attacks against our trained Keras model. This serves as a basic demo of how to use TextAttack within your own environment.
