# _Cloning_ Language Models with Data Augmentation via TextAttack

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**_Adversarial machine learning_ is the study of attacks on [machine learning](https://en.wikipedia.org/wiki/Machine_learning "Machine learning") algorithms and of the defenses against such attacks. Recent surveys show that practitioners report a dire need for better protection of machine learning systems in real-world applications.**

**In this notebook we will explore a type of attack called model extraction (_cloning_). But _what is model extraction?_ A model extraction attack targets intellectual property and privacy: an adversary steals a trained model deployed in the cloud using only its predictions, i.e., by querying it and learning from its answers.**
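
**Here is a deliberately tiny sketch of the idea, outside the text setting. All names, the data, and the perceptron surrogate are illustrative assumptions, not part of this notebook's pipeline. A hypothetical victim hides a linear rule, and the attacker sees only its answers to random queries. The attacker trains a perceptron on those answers and then checks how often the clone agrees with the victim on fresh inputs.**

```python
import random


def victim_predict(point):
    """Hypothetical black-box victim: its rule is hidden from the attacker."""
    x0, x1 = point
    return 1 if 2.0 * x0 - x1 > 0.5 else 0


def train_perceptron(samples, labels, epochs=200):
    """Fit a linear rule w0*x0 + w1*x1 + b > 0 with the classic perceptron update."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x0, x1), y in zip(samples, labels):
            predicted = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            if predicted != y:
                step = 1.0 if y == 1 else -1.0
                w0 += step * x0
                w1 += step * x1
                b += step
                mistakes += 1
        if mistakes == 0:  # every stolen label is reproduced: stop early
            break
    return lambda p: 1 if w0 * p[0] + w1 * p[1] + b > 0 else 0


rng = random.Random(0)

# 1. Query the victim on unlabeled inputs: its answers are our only training signal.
queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
stolen_labels = [victim_predict(q) for q in queries]

# 2. Train a surrogate on the (input, stolen label) pairs.
surrogate = train_perceptron(queries, stolen_labels)

# 3. Measure how often the clone agrees with the victim on fresh inputs.
fresh = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
agreement = sum(surrogate(p) == victim_predict(p) for p in fresh) / len(fresh)
print(f"Agreement with the victim: {agreement:.1%}")
```

**The rest of this notebook follows the same three steps (query, train on the answers, evaluate), with sentences instead of points and an LSTM instead of a perceptron.**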

![extraction](https://vitalab.github.io/article/images/stealml/fig1.jpeg)


**_The Unprotected Model ..._**


```python
import json
import tensorflow as tf
from tensorflow import keras
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer, tokenizer_from_json

# Load the victim model and its tokenizer (forward slashes keep the paths portable).
model_api = keras.models.load_model('models/model_api.h5')

with open('models/model_api.json') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
    word_index = tokenizer.word_index


def api_call(string):
    """Simulate a call to the victim's API: returns a (1, 3) array of class probabilities."""
    api_response = model_api.predict(
        pad_sequences(
            tokenizer.texts_to_sequences([string]),
            maxlen=256,
            truncating='post'
        )
    )
    return api_response
```


**Let's assume that our victim has a model that we can call via an API. This particular model is a _sentiment classifier_ (a kind of language model) that we would like to clone. As attackers, _we do not have a large budget_ (i.e., we must limit the number of calls we make to the model/API), and _we do not have a database of hundreds of thousands of labeled examples_ (if we did, we probably wouldn't need to clone this model).**


```python
# 'this model is kind of normal'
# 'it is an ok model'
# 'nothing special about this model'
# 'this is a great example of NLP'
# 'is nice to see philosophers doing machine learning'
# 'this model is great, one of the best models ever done by a human'
# 'i hard to say something about a model so simple'
# 'you call this NLP, please, my nana can do it better in pascal'
# 'this model is garbage, i wont my money back'

request = 'this is a great example of NLP'

api_response = api_call(request)

print(f'\nREQUEST\n{"*" * 50}\n\n"{request}" \n\nAPI RESPONSE\n{"*" * 50}\n')
print(f'Negative Sentiment 😔 {round(api_response[0][0] * 100)}%')
print(f'Neutral Sentiment 😐 {round(api_response[0][1] * 100)}%')
print(f'Positive Sentiment 😊 {round(api_response[0][2] * 100)}%')
```

```
REQUEST
**************************************************

"this is a great example of NLP" 

API RESPONSE
**************************************************

Negative Sentiment 😔 11%
Neutral Sentiment 😐 11%
Positive Sentiment 😊 78%
```


**The model looks good, and we want to clone it.**

**_How should an attacker proceed?_**

**To start, if we don't want to write all our initial samples by hand, we need some data. Through [web scraping](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/bbe9c0a77499fa68de7c6d53bf5ef7e0b43a25e0/ML%20Explainability/NLP%20Interpreter%20(en)/scrape_en.ipynb) or public data repositories (e.g., [Kaggle](https://www.kaggle.com/)), we were able to assemble an initial dataset of 3,000 unlabeled samples (_not enough to train a good sentiment classifier_).**

**This is our `proto_dataset.csv`.**

**This is a _black-box attack_: we have no access to the model's _parameters/gradients/architecture_. To us, it is just something that produces outputs after receiving inputs. However, we can use these outputs to label our `proto_dataset`. In this way, information about the target model is (indirectly) transferred to our samples. In effect, we are stealing the model's predictive power so that we can later try to replicate it.**
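
**Here is a minimal sketch of what labeling with the victim's outputs amounts to. It uses hypothetical probability vectors in the API's (negative, neutral, positive) order. The notebook keeps only the most probable class (`np.argmax`) as a hard label. The full vector could in principle also serve as a soft training target, as in knowledge distillation, but that variant is not used here.**

```python
# Hypothetical probability vectors, in the API's order (negative, neutral, positive).
api_outputs = {
    "this model is garbage": [0.85, 0.10, 0.05],
    "it is an ok model": [0.20, 0.65, 0.15],
    "this is a great example of NLP": [0.11, 0.11, 0.78],
}
CLASS_NAMES = ["negative", "neutral", "positive"]


def argmax(values):
    """Index of the largest value (first one on ties, like np.argmax)."""
    return max(range(len(values)), key=values.__getitem__)


# Hard labels: all that the notebook keeps from each API answer.
hard_labels = {text: argmax(probs) for text, probs in api_outputs.items()}
# The confidence that is discarded when only the hard label is kept.
confidence = {text: max(probs) for text, probs in api_outputs.items()}

for text, label in hard_labels.items():
    print(f"{text!r}: {CLASS_NAMES[label]} ({confidence[text]:.0%})")
```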


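```python
import pandas as pd
import numpy as np

df = pd.read_csv('data/proto_dataset.csv')

# Keep only rows whose text is actually a string (drops NaN and other non-text entries).
df = df[df['clean_text'].apply(lambda text: isinstance(text, str))].reset_index(drop=True)

# Query the victim API once per sample and keep the predicted class (argmax of the probabilities).
df['proba'] = df.clean_text.apply(api_call)
df['class'] = df.proba.apply(np.argmax)

display(df)
```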


**How we label our `proto_dataset` will depend on the constraints imposed by the victim's API (e.g., _cost per call, the limit of calls per minute, etc._). In the end, we have a (_small_) dataset labeled by the target model. And if this model is indeed good (_why else would we want to clone it?_), our samples should be accurately labeled.**
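
**One way to respect such constraints is to wrap every query in a small budget guard. The sketch below uses a hypothetical helper: `BudgetedOracle`, `fake_api`, and the prices are made up for illustration. It caches answers so that a repeated text is never paid for twice, counts billable calls, and refuses to exceed a fixed budget. A per-minute rate limit could be enforced in the same place.**

```python
class BudgetExceeded(RuntimeError):
    """Raised when a new query would exceed the allowed number of paid calls."""


class BudgetedOracle:
    """Hypothetical wrapper around a paid prediction API."""

    def __init__(self, predict_fn, max_calls, cost_per_call=0.0):
        self.predict_fn = predict_fn
        self.max_calls = max_calls
        self.cost_per_call = cost_per_call
        self.calls = 0       # billable calls made so far
        self._cache = {}     # text -> previous answer

    @property
    def spent(self):
        return self.calls * self.cost_per_call

    def __call__(self, text):
        if text in self._cache:  # repeated text: answer locally, no charge
            return self._cache[text]
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"budget of {self.max_calls} calls exhausted")
        self.calls += 1
        answer = self.predict_fn(text)
        self._cache[text] = answer
        return answer


def fake_api(text):
    """Hypothetical stand-in for the victim's API."""
    return [0.1, 0.1, 0.8] if "great" in text else [0.6, 0.3, 0.1]


oracle = BudgetedOracle(fake_api, max_calls=2, cost_per_call=0.01)
for text in ["a great model", "a great model", "a boring model"]:
    oracle(text)
print(oracle.calls, oracle.spent)
```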

**Now we need to "multiply our data". We are assuming that the attacker does not have a large initial database, and it is not feasible to classify 30000 samples using the API of the target model (either by price or other restrictions).**

**_[Data augmentation](https://en.wikipedia.org/wiki/Data_augmentation)_ refers to techniques that increase the amount of data by adding slightly modified copies of existing data or new synthetic data derived from it. _This is exactly what we need._ One library that does this for us (specifically in NLP) is [TextAttack](https://textattack.readthedocs.io/en/latest/index.html).**

**[TextAttack](https://github.com/QData/TextAttack) is a Python framework for adversarial attacks, adversarial training, and data augmentation in NLP.**

**The part of TextAttack that interests us right now is its _data augmentation part_. Below we list some of the many ready-made augmentation classes from this library:**

> **Transformation tools $\rightarrow$ text transformations (e.g., _swapping words, like names and places_) used to create an `Augmenter` object.**

- `CompositeTransformation`: used to combine multiple transformations.
- `WordInsertionRandomSynonym`: inserts synonyms of words that are already in the sequence.
- `WordInsertionMaskedLM`: generates potential insertions for a word using a masked language model.
- `WordSwapHowNet`: transforms an input by replacing its words with synonyms from the synonym bank generated by OpenHowNet (needs a Python version > 3.8.1).
- `WordSwapEmbedding`: transforms an input by replacing its words with synonyms in the word embedding space.
- `WordSwapHomoglyphSwap`: transforms an input by replacing its words with visually similar words using homoglyph swaps.
- `WordSwapQWERTY`: simulates common typos by swapping characters with adjacent keys on a QWERTY keyboard.
- `WordSwapContract`: transforms an input by contracting recognized word combinations (e.g., is not -> isn't).
- `WordSwapChangeLocation`: changes a location described in text (e.g., Brazil -> Argentina).
- `WordSwapChangeNumber`: changes a number mentioned in text (e.g., 7 -> 13).
- `WordSwapChangeName`: changes a name mentioned in text (e.g., Alice -> Bob).
- `WordSwapInflections`: transforms an input by replacing its words with their inflections.
- `WordSwapMaskedLM`: generates potential replacements for a word using a masked language model.
- `WordSwapRandomCharacterDeletion`: transforms an input by deleting its characters (`random_one=True, skip_first_char=True, skip_last_char=True` works well!).
- `WordSwapRandomCharacterInsertion`: transforms an input by inserting a random character (`random_one=True, skip_first_char=True, skip_last_char=True` works well!).
- `WordSwapRandomCharacterSubstitution`: transforms an input by replacing one character in a word with a random new character.
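
**To give an idea of what the character-level transformations do, here is a simplified, hypothetical re-implementation of two of them in plain Python. This is not TextAttack's code. The QWERTY neighbor map is deliberately partial, and the `skip_first_char`/`skip_last_char` behavior is our reading of those parameters.**

```python
import random

# Deliberately partial, hypothetical map of adjacent keys on a QWERTY keyboard.
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wrsd", "i": "ujko",
    "o": "iklp", "s": "awedxz", "t": "rfgy",
}


def random_char_deletion(word, rng, skip_first_char=True, skip_last_char=True):
    """Delete one random character, optionally protecting the first/last one."""
    start = 1 if skip_first_char else 0
    end = len(word) - 1 if skip_last_char else len(word)
    if end <= start:
        return word  # too short to modify under these restrictions
    k = rng.randrange(start, end)
    return word[:k] + word[k + 1:]


def qwerty_typo(word, rng):
    """Replace one character with an adjacent key, if its neighbors are known."""
    positions = [k for k, ch in enumerate(word) if ch in QWERTY_NEIGHBORS]
    if not positions:
        return word
    k = rng.choice(positions)
    return word[:k] + rng.choice(QWERTY_NEIGHBORS[word[k]]) + word[k + 1:]


rng = random.Random(42)
print(random_char_deletion("upbringing", rng))
print(qwerty_typo("mindset", rng))
```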

> **Constraints $\rightarrow$ constraints determine whether or not a given augmentation is valid, consequently enhancing the quality of the augmentations.**

- `RepeatModification`: a constraint disallowing the modification of words that have already been modified.
- `StopwordModification`: a constraint disallowing the modification of stopwords.

> **Augmentation parameters $\rightarrow$ control parameters of the augmenter object.**

- `pct_words_to_swap`: percentage of words to swap per augmented example. The default is 0.1 (10%).
- `transformations_per_example`: maximum number of augmentations per input. The default is 1 (one augmented sentence per original input).
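
**The following toy augmenter sketches how a transformation, constraints, and the two augmentation parameters interact. It is a simplification under stated assumptions, not TextAttack's `Augmenter`. The synonym bank and stopword list are hypothetical. The rule `max(1, int(pct_words_to_swap * n_words))` for the number of words to change is also our assumption.**

```python
import random

# Hypothetical miniature resources (TextAttack relies on WordNet, embeddings, etc.).
STOPWORDS = {"a", "an", "the", "is", "of", "this", "it"}
SYNONYMS = {
    "great": ["excellent", "superb"],
    "example": ["instance", "sample"],
    "model": ["system", "classifier"],
}


def augment(text, rng, pct_words_to_swap=0.1, transformations_per_example=1):
    """Toy synonym-swap augmenter with a stopword constraint."""
    words = text.split()
    # Constraint: only non-stopwords that have a synonym may be modified.
    candidates = [i for i, w in enumerate(words)
                  if w not in STOPWORDS and w in SYNONYMS]
    # Assumed rule for how many words to change per augmented sentence.
    n_swap = min(max(1, int(pct_words_to_swap * len(words))), len(candidates))
    results = set()
    for _ in range(100):  # bounded number of attempts
        if len(results) >= transformations_per_example:
            break
        # Sampling positions without replacement: no word is changed twice.
        chosen = rng.sample(candidates, n_swap)
        new_words = list(words)
        for i in chosen:
            new_words[i] = rng.choice(SYNONYMS[words[i]])
        candidate = " ".join(new_words)
        if candidate != text:
            results.add(candidate)
    return sorted(results)


rng = random.Random(0)
for sentence in augment("this is a great example of a model", rng,
                        pct_words_to_swap=0.5, transformations_per_example=3):
    print(sentence)
```

**Because word positions are sampled without replacement, no word is changed twice within one augmented sentence. This loosely plays the role of `RepeatModification`, while the stopword filter mirrors `StopwordModification`.**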

> **Ready Recipes $\rightarrow$ in addition to creating your own augmenter, you could also use pre-built augmentation recipes. These [recipes are implemented from published papers](https://textattack.readthedocs.io/en/latest/3recipes/augmenter_recipes.html) and are very convenient to use.**

- `CheckListAugmenter`: augments words by using the transformation methods provided by **CheckList INV testing**, which combines **Name Replacement, Location Replacement, Number Alteration, and Contraction/Extension**.
- `WordNetAugmenter`: augments text by replacing words with synonyms from WordNet (`high_yield=True, enable_advanced_metrics=True` works well!).


```python
from textattack.augmentation import Augmenter
from textattack.transformations import CompositeTransformation, WordInsertionRandomSynonym, WordSwapContract
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification

# Combine two transformations: random synonym insertion + contractions.
transformation = CompositeTransformation(
    [WordInsertionRandomSynonym(), WordSwapContract()])
# Never modify the same word twice, and never modify stopwords.
constraints = [RepeatModification(), StopwordModification()]

# Up to 10 augmented sentences per input, with pct_words_to_swap=0.5 (half of the words).
aug = Augmenter(transformation=transformation,
                constraints=constraints,
                pct_words_to_swap=0.5,
                transformations_per_example=10)

# Try the augmenter on a single sample from the labeled proto_dataset.
request = df['clean_text'][1058]
aug_request = aug.augment(request)
for generated_data in aug_request:
    print(generated_data + '\n')
```

```
modi constitute teli obc embody because his upbringing and mindset outlook can hold chowkidar had there been his kids they too would have become besides chowkidar get gatekeeper

modi teli obc because his fosterage upbringing and mindset upbringing can embody chowkidar had there been his ingest kids they too also would rearing have become chowkidar gatekeeper

modi teli obc because his upbringing and mentality mindset can personify chowkidar pot had consume there been his kids they too would embody have go become chowkidar gatekeeper

modi teli obc because his upbringing child and suit mindset can chowkidar had there fostering been his doorkeeper kids they too would porter have nipper become chowkidar gatekeeper

modi teli obc because his upbringing too and mindset can chowkidar breeding had excessively there been his kids they likewise too doorkeeper would have become outlook chowkidar gatekeeper

modi teli obc because thither his upbringing and mindset can chowkidar there had live th
```

**For each labeled sample in our `proto_dataset`, we will generate $10$ augmented copies.**


```python
labels = []
generated_sentences = []

for i in range(len(df)):
    if i % 250 == 0:
        print(f'{i} samples augmented ...')
    request = df['clean_text'][i]
    label = df['class'][i]
    # Each augmented copy inherits the label the victim API gave to its source sentence.
    for sentence in aug.augment(request):
        generated_sentences.append(sentence)
        labels.append(label)

print(f'{len(df)} samples augmented. Augmentation Complete.')

data = {'clean_text': generated_sentences,
        'class': labels}

generated_data = pd.DataFrame(data)
generated_data.to_csv('augmented_dataset.csv')
generated_data
```

```
0 samples augmented ...
250 samples augmented ...
500 samples augmented ...
750 samples augmented ...
1000 samples augmented ...
1250 samples augmented ...
1500 samples augmented ...
1750 samples augmented ...
2000 samples augmented ...
2250 samples augmented ...
2500 samples augmented ...
2750 samples augmented ...
```

| Unnamed: 0 | clean_text | class |
|---|---|---|
| 0 | direct when year modi promised “minimum politi... | 0 |
| 1 | posit when modi promised “minimum release gove... | 0 |
| 2 | when get modi nation promised “age minimum min... | 0 |
| 3 | when modi promised “class minimum government g... | 0 |
| 4 | when modi promised “commonwealth minimum gover... | 0 |
| ... | ... | ... |
| 29263 | kamre modi brand stiff remove marque first let... | 0 |
| 29264 | kamre modi stay brand remove corpse first miss... | 0 |
| 29265 | missive kamre modi brand remove offset first l... | 0 |
| 29266 | take kamre modi brand remove first absent lett... | 0 |


**We repeat this process a second time, now increasing the percentage of words changed in each sentence (`pct_words_to_swap=0.8`) and adding the `WordSwapQWERTY` transformation to simulate common typing errors. After removing duplicates, we arrive at a `final_dataset` with $59258$ samples. Any imbalance in the distribution of samples across classes is simply a _mirror image of the biases of the original model_ (e.g., most of the samples in the `proto_dataset` were labeled _"negative sentiment"_). Creating this dataset took $5$ hours in total.**
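
**Here is a sketch of the merge-and-deduplicate step on hypothetical data, in plain Python. With pandas, `pd.concat([...]).drop_duplicates(subset='clean_text')` would do the same job. When the same text appears twice, the first label is kept, which is a simple but arbitrary choice. Counting labels afterwards shows the class balance inherited from the victim.**

```python
from collections import Counter


def merge_and_deduplicate(*datasets):
    """Concatenate lists of (text, label) pairs, keeping the first copy of each text."""
    seen = set()
    merged = []
    for dataset in datasets:
        for text, label in dataset:
            if text not in seen:
                seen.add(text)
                merged.append((text, label))
    return merged


# Hypothetical outputs of the two augmentation passes (0 = negative, 1 = neutral, 2 = positive).
first_pass = [("modi promised minimum government", 0), ("a great example", 2)]
second_pass = [("modi promised minimum government", 0),  # duplicate, dropped
               ("a grwat example", 2),
               ("it is an ok model", 1)]

final = merge_and_deduplicate(first_pass, second_pass)
print(len(final), Counter(label for _, label in final))
```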

**Moreover, because the API returns the full _probability distribution_ produced by the victim model's `softmax` output, there is even more information to extract. For example, we could [recover the model's logits from its probability predictions to approximate gradients](https://arxiv.org/abs/2011.14779). However, in this notebook/toy example, we will limit ourselves to the vanilla version of this attack.**
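
**This is why probabilities leak more than labels. Since $p_k = e^{z_k} / \sum_j e^{z_j}$, taking logarithms gives $z_k = \log p_k + C$ with the same constant $C = \log \sum_j e^{z_j}$ for every class. So the logits are recovered up to that constant. The sketch below uses made-up logits and fixes $C$ by centering the result. Differences between logits come back exactly. An API that rounds its outputs (like the percentages shown earlier) only allows an approximate recovery, and a probability of exactly zero cannot be inverted.**

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_from_probabilities(probs):
    """Recover logits up to an additive constant, centered to have mean zero."""
    logs = [math.log(p) for p in probs]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


true_logits = [1.5, -0.3, 2.7]    # hidden inside the victim (hypothetical values)
probs = softmax(true_logits)       # what the API exposes
recovered = logits_from_probabilities(probs)

# Logit differences relative to class 0 match the hidden ones.
print([round(r - recovered[0], 6) for r in recovered])
print([round(t - true_logits[0], 6) for t in true_logits])
```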

```python
df = pd.read_csv('data/final_dataset.csv')

display(df)
```

| Unnamed: 0.1 | Unnamed: 0 | clean_text | class |
|---|---|---|---|
| 0 | 0 | State when sodi Dromised “start minimur goveJn... | 0 |
| 1 | 1 | behave wBen need modo promised “whC minimum as... | 0 |
| 2 | 2 | tabernacle non when modi promised “minimum gov... | 0 |
| 3 | 3 | when mMdi promiseO “minimum gBvernment maximum... | 0 |
| 4 | 4 | when mbdi promised “minimum government maximum... | 0 |
| ... | ... | ... | ... |
| 59253 | 59253 | kamre modi brand stiff remove marque first let... | 0 |
| 59254 | 59254 | kamre modi stay brand remove corpse first miss... | 0 |
| 59255 | 59255 | missive kamre modi brand remove offset first l... | 0 |
| 59256 | 59256 | take kamre modi brand remove first absent lett... | 0 |


**Now we can train our surrogate model the _old-fashioned way_:**

- **Load & Split the `dataset`**;
- **Build & Save the `tokenizer`**;
- **Train the `surrogate_model`**.
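
**Before running the Keras cell, it may help to see what the tokenizer and padding settings used below actually do. The following plain-Python sketch imitates them in simplified form. It skips punctuation filtering and is not Keras's own implementation. Words are ranked by frequency, with index `0` reserved for padding and `1` for the `<OOV>` token. Words ranked at or beyond `num_words` become `<OOV>`. `truncating='post'` keeps the first `maxlen` tokens, and the default pre-padding puts zeros in front.**

```python
from collections import Counter


def fit_vocab(texts, oov_token="<OOV>"):
    """Rank words by frequency: 0 is left for padding, 1 for the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index


def to_sequence(text, word_index, num_words, oov_index=1):
    """Map words to indices; unknown or too-rare words (index >= num_words) become OOV."""
    sequence = []
    for word in text.lower().split():
        index = word_index.get(word, oov_index)
        sequence.append(index if index < num_words else oov_index)
    return sequence


def pad(sequence, maxlen):
    """truncating='post' keeps the first maxlen tokens; zeros are added in front."""
    sequence = sequence[:maxlen]
    return [0] * (maxlen - len(sequence)) + sequence


word_index = fit_vocab(["good model good", "bad model"])
print(word_index)
print(pad(to_sequence("Good bad unknown", word_index, num_words=4), maxlen=5))
```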


```python
import io
import json
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras.preprocessing.text import Tokenizer, tokenizer_from_json
from keras_preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Load & split the augmented dataset (half for training, half for testing).
df = pd.read_csv('data/final_dataset.csv')

x = list(df.clean_text)
y = list(df['class'])

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=42)

y_train = np.array(y_train).astype(float)
y_test = np.array(y_test).astype(float)

vocab_size = 3000
embed_size = 50
max_len = 280

# Build the tokenizer on the training split only.
tokenizer = Tokenizer(num_words=vocab_size,
                      filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                      lower=True,
                      split=" ",
                      oov_token="<OOV>")

tokenizer.fit_on_texts(x_train)
training_sequences = tokenizer.texts_to_sequences(x_train)
training_padded = pad_sequences(
    training_sequences, maxlen=max_len, truncating='post')

# Save the tokenizer in models/, where the comparison cell below loads it from.
tokenizer_json = tokenizer.to_json()
with io.open('models/surrogate_model.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))

# Surrogate architecture: embedding -> two stacked bidirectional LSTMs -> softmax over 3 classes.
inputs = tf.keras.Input(shape=(None,), dtype="int32")
hidden = tf.keras.layers.Embedding(input_dim=vocab_size,
                                   output_dim=embed_size,
                                   input_length=max_len)(inputs)
hidden = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(hidden)
hidden = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(hidden)
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(training_padded,
          y_train,
          epochs=10,
          verbose=1)

test_sequences = tokenizer.texts_to_sequences(x_test)
test_padded = pad_sequences(test_sequences, maxlen=max_len, truncating='post')

test_loss_score, test_acc_score = model.evaluate(test_padded, y_test)

print(f'Final Loss: {round(test_loss_score, 2)}.')
print(f'Final Performance: {round(test_acc_score * 100, 2)} %.')
model.save('models/surrogate_model.h5')
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Final Loss: 0.2.
Final Performance: 94.71 %.

INFO:tensorflow:Assets written to: surrogate_model_3\assets
```


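**In a real-life situation, we could not compare the original model with our clone. But since this is just a toy example, we can! 🙃 Note that the $94.71\%$ reported above was measured on a random split of the augmented data. Augmented copies of the same source sentence can end up in both the training and the test set, so that figure likely overstates how well the clone generalizes.**

**For this comparison, we use a test dataset that neither model has seen.**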
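
**Besides each model's accuracy against the true labels, extraction work often also reports how often the clone agrees with the victim (sometimes called fidelity). Here is a minimal sketch with hypothetical predicted classes. In the Keras setting, these would be the argmax of each model's `predict` output.**

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def agreement(preds_a, preds_b):
    """Fraction of inputs on which two models predict the same class."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Hypothetical predicted classes on the same six held-out texts.
true_labels = [0, 0, 1, 2, 2, 0]
api_preds = [0, 0, 1, 2, 1, 0]
clone_preds = [0, 1, 1, 2, 1, 0]

print(f"API accuracy:   {accuracy(api_preds, true_labels):.2f}")
print(f"Clone accuracy: {accuracy(clone_preds, true_labels):.2f}")
print(f"Agreement:      {agreement(api_preds, clone_preds):.2f}")
```

**In this made-up example, the clone agrees with the victim even on an input where both are wrong (the fifth one), which accuracy alone does not reveal.**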


```python
test_dataset = pd.read_csv('data/compare_models_dataset.csv')
test_dataset = test_dataset.dropna(how='any', axis=0)

x = list(test_dataset.clean_text)
y = np.array(list(test_dataset['class'])).astype(float)

# Reload both models, each with its own tokenizer.
model_api = keras.models.load_model('models/model_api.h5')

with open('models/model_api.json') as f:
    data = json.load(f)
    tokenizer_api = tokenizer_from_json(data)
    word_index_api = tokenizer_api.word_index

surrogate_model = keras.models.load_model('models/surrogate_model.h5')

with open('models/surrogate_model.json') as f:
    data = json.load(f)
    tokenizer_surrogate = tokenizer_from_json(data)
    word_index_surrogate = tokenizer_surrogate.word_index

# Each model gets the sequence length it was trained with (256 for the API, 280 for the clone).
test_sequences_api = tokenizer_api.texts_to_sequences(x)
test_padded_api = pad_sequences(
    test_sequences_api, maxlen=256, truncating='post')

_, test_acc_score = model_api.evaluate(test_padded_api, y)

print(f'\nAccuracy of the API MODEL: {round(test_acc_score * 100, 2)} %.\n')

test_sequences_surrogate = tokenizer_surrogate.texts_to_sequences(x)
test_padded_surrogate = pad_sequences(
    test_sequences_surrogate, maxlen=280, truncating='post')

_, test_acc_score = surrogate_model.evaluate(test_padded_surrogate, y)

print(
    f'\nAccuracy of the SURROGATE MODEL: {round(test_acc_score * 100, 2)} %.\n')
```

```
Accuracy of the API MODEL: 79.37 %.

Accuracy of the SURROGATE MODEL: 63.87 %.
```



**Our clone is $15.5$ percentage points less accurate than the original model on this benchmark, but it is still a usable model. Changes to the architecture and further data augmentation could improve the performance of our `surrogate_model`. We now have our own language model for sentiment classification, and (_supposedly_) we have spent less than $10\%$ of what was invested in creating the original model.**
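
**The gap can be read in two ways, and the difference matters when reporting results. Using the two accuracies printed above, the absolute gap is $79.37 - 63.87 = 15.5$ percentage points. Relative to the API model's accuracy, that is roughly $19.5\%$ lower:**

```python
api_accuracy = 79.37        # % (API model, printed above)
surrogate_accuracy = 63.87  # % (surrogate model, printed above)

absolute_gap = api_accuracy - surrogate_accuracy   # in percentage points
relative_gap = absolute_gap / api_accuracy * 100    # in % of the API's accuracy

print(f"Absolute gap: {absolute_gap:.2f} percentage points")
print(f"Relative gap: {relative_gap:.1f}% lower than the API model")
```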

**Now let's put our `surrogate_model` into production.**


```python
def surrogate_api_call(string):
    """Query our clone exactly as we queried the victim: returns a (1, 3) array of probabilities."""
    surrogate_api_response = surrogate_model.predict(
        pad_sequences(
            tokenizer_surrogate.texts_to_sequences([string]),
            maxlen=280,
            truncating='post'
        )
    )
    return surrogate_api_response

# 'nothing special about this model'
# 'this model is great, one of the best models ever done by a human'
# 'this model is garbage, i wont my money back'


request = 'nothing special about this model'

surrogate_api_response = surrogate_api_call(request)

print(
    f'\nREQUEST\n{"*" * 50}\n\n"{request}" \n\nSURROGATE API RESPONSE\n{"*" * 50}\n')
print(f'Negative Sentiment 😔 {round(surrogate_api_response[0][0] * 100)}%')
print(f'Neutral Sentiment 😐 {round(surrogate_api_response[0][1] * 100)}%')
print(f'Positive Sentiment 😊 {round(surrogate_api_response[0][2] * 100)}%')
```

```
REQUEST
**************************************************

"nothing special about this model" 

SURROGATE API RESPONSE
**************************************************

Negative Sentiment 😔 1%
Neutral Sentiment 😐 98%
Positive Sentiment 😊 0%
```


**Model extraction attacks pose a threat to intellectual property and privacy. Developers who expose a model in the cloud, whether as a service or through an API, must architect that access carefully if they do not want to fall victim to this kind of attack. 🐱‍💻**
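
**On the defensive side, one simple measure is to limit what each answer reveals. The hypothetical helper below returns either the top label only or coarsely rounded probabilities. This has clear limits. Returning only labels blocks the logit-recovery route mentioned earlier, but it would not stop the attack in this notebook, which used only the argmax label. Query budgets, rate limits, and approaches such as calibrated proof of work (see the literature below) instead raise the cost of querying.**

```python
def harden_response(probabilities, mode="label", decimals=1):
    """Hypothetical output filter for a prediction API.

    mode="label":   return only the index of the most probable class.
    mode="rounded": return probabilities rounded to `decimals` places.
    """
    if mode == "label":
        return max(range(len(probabilities)), key=probabilities.__getitem__)
    if mode == "rounded":
        return [round(p, decimals) for p in probabilities]
    raise ValueError(f"unknown mode: {mode!r}")


raw = [0.1123, 0.1089, 0.7788]  # hypothetical raw softmax output
print(harden_response(raw))             # 2
print(harden_response(raw, "rounded"))  # [0.1, 0.1, 0.8]
```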

**For more information on the subject, check the literature listed below:**

- [A Framework for Understanding Model Extraction Attack and Defense](https://arxiv.org/abs/2206.11480);
- [Increasing the Cost of Model Extraction with Calibrated Proof of Work](https://arxiv.org/abs/2201.09243);
- [Data-Free Model Extraction](https://arxiv.org/abs/2011.14779);
- [MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI](https://arxiv.org/abs/2107.08909);
- [DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories](https://arxiv.org/abs/2111.04625);
- [Model Extraction and Defenses on Generative Adversarial Networks](https://arxiv.org/abs/2101.02069).

