<a href="https://colab.research.google.com/github/chong-z/NLG-project/blob/master/CS269_NLG_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS269 NLG Project: Generating Semi-Restricted Natural Language Adversarial Examples
Group Member: Chong Zhang

## Setup
Install the dependencies and clone the repo. This may take a few minutes.

In [1]:
!pip install pytorch-pretrained-bert==0.6.2 nlp torch nltk numpy tensorboardX pandas lm-scorer

# lm-scorer installs a different version of transformers.
!pip install transformers==3.0.2

!git clone https://github.com/chong-z/NLG-project.git
%cd NLG-project
!sh dowloaddata.sh

Collecting transformers==3.0.2
  Downloading https://files.pythonhosted.org/packages/27/3c/91ed8f5c4e7ef3227b4119200fc0ed4b4fd965b1f0172021c25701087825/transformers-3.0.2-py3-none-any.whl (769kB)
Collecting pytorch-pretrained-bert==0.6.2
  Downloading https://files.pythonhosted.org/packages/d7/e0/c08d5553b89973d9a240605b9c12404bcf8227590de62bae27acbcfe076b/pytorch_pretrained_bert-0.6.2-py3-none-any.whl (123kB)
Collecting nlp
  Downloading https://files.pythonhosted.org/packages/09/e3/bcdc59f3434b224040c1047769c47b82705feca2b89ebbc28311e3764782/nlp-0.4.0-py3-none-any.whl (1.7MB)
Collecting tensorboardX
  Downloading https://files.pythonhosted.org/packages/af/0c/4f41bcd45db376e6fe5c619c01100e9b7531c55791b7244815bac6eac32c/tensorboardX-2.1-py2.py3-none-any.whl (308kB)

## Explore SST-2
We focus our study on the SST-2 dataset. Here is a peek at the training examples.

Labels 0 and 1 denote negative and positive sentiment, respectively.

In [2]:
import nlp
import pandas as pd
pd.options.display.min_rows = 20
pd.options.display.max_colwidth = 200

sst2_data = nlp.load_dataset('glue', 'sst2')['train']
df = pd.DataFrame(sst2_data)
display(df)

Downloading and preparing dataset glue/sst2 (download: 7.09 MiB, generated: 4.81 MiB, post-processed: Unknown sizetotal: 11.90 MiB) to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/637080968c182118f006d3ea39dd9937940e81cfffc8d79836eaae8bba307fc4...


Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/637080968c182118f006d3ea39dd9937940e81cfffc8d79836eaae8bba307fc4. Subsequent calls will reuse this data.


| | idx | label | sentence |
|---|---|---|---|
| 0 | 0 | 0 | hide new secretions from the parental units |
| 1 | 1 | 0 | contains no wit , only labored gags |
| 2 | 2 | 1 | that loves its characters and communicates something rather beautiful about human nature |
| 3 | 3 | 0 | remains utterly satisfied to remain the same throughout |
| 4 | 4 | 0 | on the worst revenge-of-the-nerds clichés the filmmakers could dredge up |
| 5 | 5 | 0 | that 's far too tragic to merit such superficial treatment |
| 6 | 6 | 1 | demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop . |
| 7 | 7 | 1 | of saucy |
| 8 | 8 | 0 | a depressed fifteen-year-old 's suicidal poetry |
| 9 | 9 | 1 | are more deeply thought through than in most ` right-thinking ' films |


## Run adversarial attacks
`semi_attack.py` is the main attack script. It takes in a few parameters:

 - `-c models/sample-GRU/E9.pytorch`: Use our pre-trained GRU VAE for generating interpolations. Please refer to the next section on how to train your own VAE.
 - `--iter 2`: Use 2 iterations for the 'binary' search.
 - `--steps 10`: Sample 10 interpolations per iteration.
 - `--victim_model "distilbert-base-uncased-finetuned-sst-2-english"`: Use the pre-trained ["distilbert-base-uncased-finetuned-sst-2-english"](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model from HuggingFace. Other models such as "textattack/roberta-base-SST-2" can also be used.
 - `--victim_sentence "i study at ucla"`: The victim sentence under attack. Our goal is to find an adversarial sentence that is similar to the victim sentence but receives a different prediction.
 - `--reference_sentence "i finished my final exam at ucla"`: The reference sentence providing the hint for the desired style.
 - Please refer to the script for additional parameters.
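
As a rough illustration of what `--steps` controls, the sketch below samples evenly spaced points on a straight line between two latent vectors. It is a minimal sketch and not the code in `semi_attack.py`: the function name `interpolate`, the 2-dimensional toy vectors, the choice of linear interpolation, and the inclusion of both endpoints are all assumptions made for illustration.

```python
from typing import List


def interpolate(start: List[float], end: List[float], steps: int) -> List[List[float]]:
    """Return `steps` interior points plus both endpoints on the line from start to end.

    Hypothetical helper: the real script may interpolate differently.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if len(start) != len(end):
        raise ValueError("latent vectors must have the same dimension")
    total = steps + 1  # number of equal segments between the endpoints
    return [
        [s + (e - s) * i / total for s, e in zip(start, end)]
        for i in range(total + 1)
    ]


# Toy latent codes standing in for the encoded victim and reference sentences.
z_victim = [0.0, 1.0]
z_reference = [1.0, -1.0]

path = interpolate(z_victim, z_reference, steps=3)
for point in path:
    print(point)
```

In the actual attack, each interpolated latent vector would be decoded into a sentence by the VAE and scored by the victim model.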

 
### Example 1

In this example, the VAE generates only movie reviews regardless of the input (as expected). However, the generated sentences do share some similarities with the victim and reference sentences. For instance, they usually start with an "i" and have a similar length.

The final output is:
```
-------Attack Result-------
Victim Sentence: i study at ucla pred:0.8251549005508423
Best Adv Sentence: this woefully hackneyed movie with flailing bodily movements <eos> pred:0.00024062106967903674
```
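
To check programmatically whether an attack changed the predicted label, the `pred:` values can be pulled out of the result lines. The sketch below is illustrative only: the helpers `parse_pred` and `label_flipped` are hypothetical, and it assumes that `pred` is the positive-class probability and that 0.5 is the decision threshold.

```python
import re

# Matches the trailing "pred:<number>" field printed by the attack script.
PRED_PATTERN = re.compile(r"pred:([0-9.eE+-]+)\s*$")


def parse_pred(line: str) -> float:
    """Extract the prediction score from one result line (hypothetical helper)."""
    match = PRED_PATTERN.search(line)
    if match is None:
        raise ValueError(f"no pred field found in: {line!r}")
    return float(match.group(1))


def label_flipped(victim_line: str, adv_line: str, threshold: float = 0.5) -> bool:
    """Return True if the two lines fall on opposite sides of the assumed threshold."""
    victim_positive = parse_pred(victim_line) >= threshold
    adv_positive = parse_pred(adv_line) >= threshold
    return victim_positive != adv_positive


victim_line = "Victim Sentence: i study at ucla pred:0.8251549005508423"
adv_line = (
    "Best Adv Sentence: this woefully hackneyed movie with flailing bodily "
    "movements <eos> pred:0.00024062106967903674"
)

print(label_flipped(victim_line, adv_line))  # prints True
```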

In [3]:
!python semi_attack.py -c models/sample-GRU/E9.pytorch --iter 2 --steps 10 --rseed 7 --most_similar -v \
  --victim_model "distilbert-base-uncased-finetuned-sst-2-english" \
  --victim_sentence "i study at ucla" \
  --reference_sentence "i finished my final exam at ucla"

2020-12-07 21:18:21.078838: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
VALID preprocessed file not found at data/ptb.valid.json. Creating new.
Model loaded from models/sample-GRU/E9.pytorch
Downloading: 100% 629/629 [00:00<00:00, 813kB/s]
Downloading: 100% 232k/232k [00:00<00:00, 15.5MB/s]
Downloading: 100% 268M/268M [00:03<00:00, 67.0MB/s]

-------Initial Inputs-------
Victim Sentence: i study at ucla pred:0.8251549005508423
Reference Sentence: i finished my final exam at ucla pred:0.07655958086252213

-------ITERATION 0-------
Best Adv Sentence: i finished my final exam at ucla pred:0.07655958086252213
-------PREDICTIONS-------
0.825 & i study at ucla \\
0.000 & a movie filled with unlikable , spiteful idiots  \\
1.000 & a movie that will enthrall the whole family  \\
0.003 & i ' ve seen before i saw this movie ,  \\
0.000 & i saw this movie , i think it ' s just another crime movie  \\
0.001 & i saw this movi

### Example 2
In this example, we pass in appropriate movie reviews to the script. We find adversarial examples in 2 iterations:

1. In the first iteration, our method finds the sentence `an intriguing story , but ultimately purposeless , ...`, which has a different prediction than the victim example. We use it as the reference example for the next iteration.
2. After the second iteration, our method outputs the best adversarial example `the story is bogus and directed by joel ...`, which also has a different prediction but is even closer to the victim sentence.

The final output is:
```
-------Attack Result-------
Victim Sentence: a strangely compelling and brilliantly acted psychological drama . pred:0.999883770942688
Best Adv Sentence: the story is bogus and directed by joel schumacher and a half dozen young men who has been overexposed , redolent of the plot device <eos> pred:0.0007369006052613258
```

Please note that we may need additional iterations to find a more similar adversarial example.
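
The effect of extra iterations can be seen in a toy version of the search. The sketch below is not the implementation in `semi_attack.py`; it assumes a one-dimensional latent space, a made-up classifier `toy_predict`, absolute distance as the similarity measure, and a 0.5 threshold. In each iteration it samples points between the victim and the current reference, keeps those whose label differs from the victim's, and promotes the one closest to the victim to be the next reference.

```python
from typing import Callable


def refine(
    victim: float,
    reference: float,
    predict: Callable[[float], float],
    iterations: int,
    steps: int,
    threshold: float = 0.5,
) -> float:
    """Pull the adversarial point toward the victim while its label stays flipped."""
    victim_label = predict(victim) >= threshold
    best = reference
    for _ in range(iterations):
        # Interior points on the segment between the victim and the current best.
        candidates = [
            victim + (best - victim) * i / (steps + 1) for i in range(1, steps + 1)
        ]
        # Keep only candidates that are classified differently from the victim.
        flipped = [c for c in candidates if (predict(c) >= threshold) != victim_label]
        if not flipped:
            break  # no closer adversarial point was found in this round
        best = min(flipped, key=lambda c: abs(c - victim))
    return best


def toy_predict(x: float) -> float:
    """Made-up classifier: 'positive' below 0.3, 'negative' otherwise."""
    return 1.0 if x < 0.3 else 0.0


one_round = refine(0.0, 1.0, toy_predict, iterations=1, steps=4)
two_rounds = refine(0.0, 1.0, toy_predict, iterations=2, steps=4)
print(one_round, two_rounds)
```

With these toy settings the second round lands nearer to the decision boundary than the first, which mirrors why a larger `--iter` can yield an adversarial sentence that is more similar to the victim.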

In [4]:
!python semi_attack.py -c models/sample-GRU/E9.pytorch --iter 2 --steps 10 --rseed 3 --most_similar -v \
  --victim_model "distilbert-base-uncased-finetuned-sst-2-english" \
  --victim_sentence "a strangely compelling and brilliantly acted psychological drama ." \
  --reference_sentence "an absurdist sitcom about alienation , separation and loss ."

2020-12-07 21:18:48.481691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Model loaded from models/sample-GRU/E9.pytorch

-------Initial Inputs-------
Victim Sentence: a strangely compelling and brilliantly acted psychological drama . pred:0.999883770942688
Reference Sentence: an absurdist sitcom about alienation , separation and loss . pred:0.00250418484210968

-------ITERATION 0-------
Best Adv Sentence: an absurdist sitcom about alienation , separation and loss . pred:0.00250418484210968
-------PREDICTIONS-------
1.000 & a strangely compelling and brilliantly acted psychological drama . \\
0.999 & a quietly introspective portrait of pure misogynist evil  \\
1.000 & a quietly moving portrait of an intelligent screenplay  \\
0.999 & an intriguing story , but ultimately purposeless and satisfying heroine  \\
0.001 & an intriguing story , but ultimately purposeless , and ultimately empty examination of the modern ru

## Train VAE
Train an LSTM-based VAE for 10 epochs with default settings. This may take half an hour to run. Please refer to `train.py` for additional parameters.
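
The model summary printed by `train.py` lists `hidden2mean` and `hidden2logv` layers with 16 output features. In a standard VAE, such outputs parameterize a Gaussian over the latent code, which is sampled with the reparameterization trick and regularized with a KL term. The sketch below shows that textbook formulation in plain Python. It is an assumption that `train.py` follows it exactly and that `logv` denotes the log-variance; the function names are hypothetical.

```python
import math
import random
from typing import List


def sample_latent(mean: List[float], logv: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mean + std * eps, with eps drawn from N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # std = exp(0.5 * log-variance)
        for m, lv in zip(mean, logv)
    ]


def kl_divergence(mean: List[float], logv: List[float]) -> float:
    """KL divergence between N(mean, exp(logv)) and the standard normal prior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mean, logv))


LATENT_SIZE = 16  # matches out_features of hidden2mean in the printed model

rng = random.Random(0)
mean = [0.0] * LATENT_SIZE
logv = [0.0] * LATENT_SIZE  # log-variance 0 means unit variance

z = sample_latent(mean, logv, rng)
print(len(z))
```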

In [7]:
!python train.py --data_dir data --epochs 10 --rnn_type lstm -tb

100% 478750579/478750579 [00:10<00:00, 47853190.10B/s]
100% 656/656 [00:00<00:00, 626614.31B/s]
100% 815973/815973 [00:00<00:00, 28416838.83B/s]
100% 458495/458495 [00:00<00:00, 18688156.93B/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.
TRAIN preprocessed file not found at data/ptb.train.json. Creating new.
2020-12-07 21:43:48.096056: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Vocablurary of 14221 keys created.
SentenceVAE(
  (embedding): Embedding(14221, 300)
  (embedding_dropout): Dropout(p=0.5, inplace=False)
  (encoder_rnn): LSTM(300, 256, batch_first=True)
  (decoder_rnn): LSTM(300, 256, batch_first=True)
  (hidden2mean): Linear(in_features=256, out_features=16, bias=True)
  (hidden2logv): Linear(in_features=256, out_features=16, bias=True)
  (latent2hidden): Linear(in_features=16, out_features=256, bias=True)
  (outputs2vocab): Linear(in_features=256, out_features=142

Test your model with an adversarial attack. Replace `bin/2020-Dec-07-21:43:33/E9.pytorch` with the path to your checkpoint, as shown in the training output.

In [21]:
!python semi_attack.py -c bin/2020-Dec-07-21:43:33/E9.pytorch --rnn_type lstm --iter 2 --steps 10 --rseed 7 --most_similar -v \
  --victim_model "distilbert-base-uncased-finetuned-sst-2-english" \
  --victim_sentence "a strangely compelling and brilliantly acted psychological drama ." \
  --reference_sentence "an absurdist sitcom about alienation , separation and loss ."

2020-12-07 22:35:09.916066: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Model loaded from bin/2020-Dec-07-21:43:33/E9.pytorch

-------Initial Inputs-------
Victim Sentence: a strangely compelling and brilliantly acted psychological drama . pred:0.999883770942688
Reference Sentence: an absurdist sitcom about alienation , separation and loss . pred:0.00250418484210968

-------ITERATION 0-------
Best Adv Sentence: an absurdist sitcom about alienation , separation and loss . pred:0.00250418484210968
-------PREDICTIONS-------
1.000 & a strangely compelling and brilliantly acted psychological drama . \\
1.000 & , it ' s a very good yarn .  \\
1.000 & , it ' s a very good viewing alternative .  \\
0.998 & a fascinating curiosity piece of filmmaking  \\
0.003 & an absurdist sitcom about alienation , separation and loss . \\

-------ITERATION 1-------
Best Adv Sentence: an absurdist sitcom about alienation , separation an