# TensorFlow and TextAttack

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/Example_0_tensorflow.ipynb)

[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/Example_0_tensorflow.ipynb)

## Training



The following code trains a text classification model using TensorFlow (through its Keras API). It comes from the TensorFlow documentation ([see here](https://www.tensorflow.org/tutorials/keras/text_classification_with_hub)).

This cell loads the IMDB dataset (using `tensorflow_datasets`, not `datasets`), initializes a simple classifier, and trains it using Keras.

In [12]:
import numpy as np

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import matplotlib.pyplot as plt

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

train_data, test_data = tfds.load(name="imdb_reviews", split=["train", "test"], 
                                  batch_size=-1, as_supervised=True)

train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)

model = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(model, output_shape=[20], input_shape=[], 
                           dtype=tf.string, trainable=True)
hub_layer(train_examples[:3])

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()

x_val = train_examples[:10000]
partial_x_train = train_examples[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

model.compile(optimizer='adam',
              loss=tf.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=40,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)

INFO:absl:No config specified, defaulting to first: imdb_reviews/plain_text
INFO:absl:Overwrite dataset info from restored data version.
INFO:absl:Reusing dataset imdb_reviews (/root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0)
INFO:absl:Constructing tf.data.Dataset for split ['train', 'test'], from /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0


Version:  2.2.0
Eager mode:  True
Hub version:  0.8.0
GPU is NOT AVAILABLE
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer_1 (KerasLayer)   (None, 20)                400020    
_________________________________________________________________
dense_2 (Dense)              (None, 16)                336       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 17        
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________
Epoch 1/40
...
Epoch 26/40


## Attacking

For each input, our classifier outputs a single number (a raw logit) that indicates how positive or negative the model finds the input. For binary classification, TextAttack expects two numbers for each input: one score per class, negative and positive. We have to post-process each output to fit this format. To add this post-processing, we implement a custom model wrapper class instead of using the built-in `textattack.models.wrappers.TensorFlowModelWrapper`.

Each `ModelWrapper` must implement a single method, `__call__`, which takes a list of strings and returns a `List`, `np.ndarray`, or `torch.Tensor` of predictions.

In [13]:
import numpy as np
import torch

from textattack.models.wrappers import ModelWrapper

class CustomTensorFlowModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        text_array = np.array(text_input_list)
        preds = self.model(text_array).numpy()
        # The model returns one raw logit per input; a sigmoid turns it
        # into the probability of the positive class.
        probs = torch.sigmoid(torch.tensor(preds))
        probs = probs.squeeze(dim=-1)
        # Since this model only has a single output per input, we have to
        # add the second dimension: [P(negative), P(positive)].
        final_preds = torch.stack((1 - probs, probs), dim=1)
        return final_preds
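
The conversion inside `__call__` can also be looked at in isolation. The following is a minimal sketch using only the Python standard library, with no TensorFlow or PyTorch involved. The helper names `sigmoid` and `to_two_class_scores` and the sample logits are made up for illustration and are not part of TextAttack.

```python
import math


def sigmoid(z):
    """Logistic function for a single float, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_two_class_scores(logits):
    """Map one logit per input to a row [P(negative), P(positive)]."""
    scores = []
    for z in logits:
        p_pos = sigmoid(z)
        scores.append([1.0 - p_pos, p_pos])
    return scores


# Hypothetical logits: clearly negative, undecided, clearly positive.
scores = to_two_class_scores([-2.0, 0.0, 3.0])
for row in scores:
    print([round(v, 4) for v in row])
```

A logit of zero maps to equal scores for both classes, and each row sums to one, which is the two-column layout described above.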


Let's test our model wrapper out to make sure it can use our model to return predictions in the correct format.

In [14]:
CustomTensorFlowModelWrapper(model)(['I hate you so much', 'I love you'])

tensor([[0.2745, 0.7255],
        [0.0072, 0.9928]])
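
If you prefer a programmatic check over reading the output by eye, something like the sketch below could be used. It assumes the scores have already been converted to nested Python lists (for a `torch.Tensor`, `.tolist()` does this). The function name `check_two_class_output` and the tolerance are illustrative choices, not part of TextAttack.

```python
import math


def check_two_class_output(scores, tol=1e-3):
    """Raise ValueError unless every row holds two probabilities summing to ~1."""
    for i, row in enumerate(scores):
        if len(row) != 2:
            raise ValueError(f"row {i} has {len(row)} entries, expected 2")
        if any(not (0.0 <= v <= 1.0) for v in row):
            raise ValueError(f"row {i} contains a value outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {i} does not sum to 1")
    return True


# The values printed by the wrapper above, rounded to four decimals.
example = [[0.2745, 0.7255], [0.0072, 0.9928]]
ok = check_two_class_output(example)
print(ok)
```

The tolerance is loose on purpose so that rounded values, like the ones copied from the printed tensor, still pass.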

Looks good! Now we can initialize our model wrapper with the model we trained and pass it to an instance of `textattack.attack.Attack`. 

We'll use the `PWWSRen2019` recipe as our attack, and attack 10 samples.

In [15]:
model_wrapper = CustomTensorFlowModelWrapper(model)

from textattack.datasets import HuggingFaceDataset
from textattack.attack_recipes import PWWSRen2019

dataset = HuggingFaceDataset("rotten_tomatoes", None, "test", shuffle=True)
attack = PWWSRen2019.build(model_wrapper)

results_iterable = attack.attack_dataset(dataset, indices=range(10))
for result in results_iterable:
  print(result.__str__(color_method='ansi'))

textattack: Loading nlp dataset rotten_tomatoes, split test.
textattack: Unknown if model of class <class '__main__.CustomTensorFlowModelWrapper'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.


Positive (60%) --> [SKIPPED]

kaufman's script is never especially clever and often is rather pretentious .

Negative (98%) --> Positive (59%)

an unfortunate title for a film that has nothing endearing about it .

an inauspicious title for a film that has zip endearing about it .

Negative (73%) --> Positive (59%)

sade achieves the near-impossible : it turns the marquis de sade into a dullard .

sade achieves the near-impossible : it tour the marquis de sade into a dullard .

Negative (98%) --> [SKIPPED]

. . . planos fijos , tomas largas , un ritmo pausado y una sutil observación de sus personajes , sin estridencias ni grandes revelaciones .

Negative (97%) --> Positive (62%)

charly comes off as emotionally manipulative and sadly imitative of innumerable past love story derisions .

charly comes off as emotionally manipulative and

## Conclusion 

We trained a TensorFlow model, wrapped it in a custom `ModelWrapper` so that its outputs match the format TextAttack expects, and used that wrapper in an attack. The same approach applies to a model built with any other library: implement `__call__` so that it takes a list of strings and returns one score per class for each input.