Plausible looking adversarial examples for text classification

This is a proof of concept that aims to produce "imperceptible" adversarial examples for text classifiers.

For instance, these are some adversarial examples produced by this code for a classifier that predicts a tweet author's gender from the tweet's text:

[Image: examples of adversarial examples]

Setup

System

You need Python 3 and any system dependencies that may be required by

  • Keras
  • NLTK
  • SpaCy

Python

pip install -r requirements.txt

NLP Data

  • SpaCy English language model:
    python -m spacy download en
    
  • NLTK datasets (a download prompt will appear when you first run paraphrase.py; see the one-liner after this list to fetch the data up front)
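
If you prefer to fetch the NLTK data ahead of time instead of waiting for the prompt, a one-liner along these lines should work. Downloading wordnet is an assumption here, inferred from the paraphraser's use of WordNet synonyms; the prompt itself lists exactly which corpora are needed.

python -c "import nltk; nltk.download('wordnet')"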

Model

To train with the default parameters, simply run

python run_training.py

By default, the script looks for the CSV data set at ./data/twitter_gender_data.csv and saves the model weights to ./data/model.dat.

The model should attain about 66% accuracy on the validation set for gender recognition.
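
The actual network is defined in model.py; purely to illustrate the expected data flow (CSV in, weights out), a minimal Keras pipeline might look like the sketch below. The column names, label scheme, tokenizer settings, and architecture are assumptions made for the example, not the repository's real model.

# Illustrative sketch only -- the real pipeline lives in run_training.py
# and model.py. Column names ('text', 'gender') and the architecture
# are assumptions.
import pandas as pd
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

df = pd.read_csv('./data/twitter_gender_data.csv')
df = df[df['gender'].isin(['male', 'female'])]  # assumed label scheme

tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(df['text'])
x = pad_sequences(tokenizer.texts_to_sequences(df['text']), maxlen=40)
y = (df['gender'] == 'female').astype(int).values  # binary target

model = Sequential([
    Embedding(20000, 64, input_length=40),
    LSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, validation_split=0.2, epochs=5)
model.save_weights('./data/model.dat')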

Data

This model uses the Kaggle Twitter User Gender Classification data set.

Demo

To run the adversarial crafting script:

python run_demo.py

The success rate of crafting adversarial examples should be about 17%. By default, the script writes the crafted examples to ./data/adversarial_texts.csv.
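
Conceptually, crafting pairs the paraphraser with the trained classifier: generate candidate paraphrases of the input and keep the first one that flips the predicted label. The sketch below is only an outline of that idea; paraphrase() and predict() are stand-ins, not the actual functions in adversarial_tools.py.

# Conceptual outline of the crafting loop; the real logic lives in
# adversarial_tools.py and run_demo.py. paraphrase() and predict()
# stand in for the repository's actual functions.
def craft_adversarial(text, predict, paraphrase):
    original_label = predict(text)
    for candidate in paraphrase(text):
        if predict(candidate) != original_label:
            return candidate  # label flipped: adversarial example found
    return None  # crafting failed for this input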

Paraphrasing

This module is rather reusable, although not immensely useful for anything practical. It provides a function that "paraphrases" a text by replacing some of its words with their WordNet synonyms, ranked by GloVe similarity between each synonym and the original word's context window. It relies on SpaCy and NLTK.
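
In outline, the idea can be reproduced with a few lines of SpaCy and NLTK. The sketch below is a hedged illustration, not the implementation in paraphrase.py, which additionally decides which words are eligible for replacement and how the context window is sized.

# Minimal sketch of the synonym-ranking idea. paraphrase.py's real
# logic (POS filtering, replacement strategy, window size) differs.
import spacy
from nltk.corpus import wordnet

nlp = spacy.load('en')  # assumes the installed English model has word vectors

def ranked_synonyms(doc, i, window=3):
    """Rank WordNet synonyms of doc[i] by similarity to its context window."""
    context = doc[max(0, i - window):i + window + 1]
    synonyms = {
        lemma.name().replace('_', ' ')
        for synset in wordnet.synsets(doc[i].text)
        for lemma in synset.lemmas()
        if lemma.name().lower() != doc[i].text.lower()
    }
    return sorted(synonyms, key=lambda s: nlp(s).similarity(context),
                  reverse=True)

doc = nlp(u'The weather is lovely today')
print(ranked_synonyms(doc, 3))  # replacement candidates for "lovely"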

Example of paraphrase:

[Image: paraphrase example]

Citing notes

Please use the Zenodo link to cite textfool. Note that this work is not published and not peer-reviewed. textfool has no relationship to "Deep Text Classification Can be Fooled" by B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi.