Merge pull request #99 from QData/quick-grad-rename

Quick grad rename
jxmorris12 committed May 15, 2020
2 parents cd28217 + 92bb29e commit c61d840
Showing 15 changed files with 439 additions and 357 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -39,7 +39,7 @@ environment variable `TA_CACHE_DIR`.

### Running Attacks

The [`examples/`](examples/) folder contains notebooks walking through examples of basic usage of TextAttack, including building a custom transformation and a custom constraint.
The [`examples/`](docs/examples/) folder contains notebooks walking through examples of basic usage of TextAttack, including building a custom transformation and a custom constraint. These examples can also be viewed through the [documentation website](https://textattack.readthedocs.io/en/latest).

We also have a command-line interface for running attacks. See help info and list of arguments with `python -m textattack --help`.

7 changes: 4 additions & 3 deletions docs/conf.py
@@ -21,7 +21,7 @@
author = 'UVA QData Lab'

# The full version, including alpha/beta/rc tags
release = '0.0.1'
release = '0.0.1.9'

# Set master doc to `index.rst`.
master_doc = 'index'
@@ -35,7 +35,8 @@
'sphinx.ext.viewcode',
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
"sphinx_rtd_theme"
'sphinx_rtd_theme',
'nbsphinx'
]

# Add any paths that contain templates here, relative to this directory.
@@ -48,7 +49,7 @@


# Mock language_check to stop issues with Sphinx not loading it
autodoc_mock_imports = ["language_check"]
autodoc_mock_imports = []



6 changes: 3 additions & 3 deletions docs/constraints/constraint.rst
@@ -8,11 +8,11 @@ Constraints determine whether a given transformation is valid. Since transformat

We split constraints into three main categories:

:ref:`semantics`: Check meaning of sentence
:ref:`semantics`: Based on the meaning of input and perturbation

:ref:`syntactical`: Check part-of-speech and grammar
:ref:`grammaticality`: Based on syntactic properties like part-of-speech and grammar

:ref:`overlap`: Measure edit distance
:ref:`overlap`: Based on character-based properties, like edit distance

.. automodule:: textattack.constraints.constraint
:members:
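
To make the interface concrete before diving into the categories, here is a minimal sketch of a constraint in this spirit. The `__call__(x, x_adv)` signature comes from the notebook tutorial later in this diff; the class itself is a toy illustration, not TextAttack's actual base class:

```python
class MaxWordsChangedConstraint:
    """Toy constraint: a perturbation `x_adv` is valid only if it changes
    at most `max_changed` words of the original input `x`.
    (Illustrative sketch only, not part of textattack.constraints.)"""

    def __init__(self, max_changed=1):
        self.max_changed = max_changed

    def __call__(self, x, x_adv):
        original, perturbed = x.split(), x_adv.split()
        if len(original) != len(perturbed):
            return False  # only word-for-word substitutions are comparable here
        changed = sum(o != p for o, p in zip(original, perturbed))
        return changed <= self.max_changed
```
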
docs/constraints/grammaticality.rst
@@ -1,10 +1,11 @@
.. _syntactical:

==============================
Constraints based on Syntax
Grammaticality
==============================

Syntactic constraints determine if a transformation is valid based on the resulting syntax.
Grammaticality constraints determine if a transformation is valid based on
syntactic properties of the perturbation.
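
A simple proxy for grammaticality, separate from the language-model and LanguageTool constraints documented below, is to require that a word swap preserve the sentence's part-of-speech sequence. A minimal sketch using NLTK (which the notebook in this commit also uses):

```python
import nltk

# Requires: nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

def pos_tags(sentence):
    """Part-of-speech tag for each token in `sentence`."""
    return [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence))]

def preserves_pos_sequence(x, x_adv):
    """Treat a perturbation as grammatical if the POS sequence is unchanged."""
    return pos_tags(x) == pos_tags(x_adv)
```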

Language Models
################
@@ -14,14 +15,9 @@ Language Models
.. automodule:: textattack.constraints.grammaticality.language_models.gpt2
:members:

Google Language Models
************************
.. automodule:: textattack.constraints.grammaticality.language_models.google_language_model.google_language_model
:members:

.. automodule:: textattack.constraints.grammaticality.language_models.google_language_model.alzantot_goog_lm
:members:

Language Tool
##############
.. automodule:: textattack.constraints.grammaticality.language_tool
2 changes: 1 addition & 1 deletion docs/constraints/overlap.rst
@@ -1,7 +1,7 @@
.. _overlap:

==========================================
Constraints based on Overlap
Overlap
==========================================

Overlap constraints determine if a transformation is valid based on character-level analysis.
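
Edit distance is the canonical example; a self-contained sketch of the Levenshtein distance such a constraint might threshold:

```python
def edit_distance(a, b):
    """Levenshtein distance between strings `a` and `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]
```

A constraint would then reject any `x_adv` whose distance from `x` exceeds a fixed bound.
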
5 changes: 3 additions & 2 deletions docs/constraints/semantics.rst
@@ -1,10 +1,11 @@
.. _semantics:

================================
Constraints based on Semantics
Semantics
================================

Semantic constraints determine if a transformation is valid based on similarity of the semantics between the original input and the transformed input.
Semantic constraints determine if a transformation is valid based on similarity
of the semantics between the original input and the transformed input.
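
The constraints notebook in this commit describes one such check: encode the original and perturbed inputs with a sentence encoder and require the encodings to stay close. A minimal sketch of that idea, where `encode` is a hypothetical stand-in for any sentence encoder rather than a TextAttack class:

```python
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

class SentenceEncoderConstraint:
    """Accept `x_adv` only if its encoding stays close to that of `x`.
    `encode` maps a sentence to a fixed-size numpy vector (hypothetical)."""

    def __init__(self, encode, threshold=0.8):
        self.encode = encode
        self.threshold = threshold

    def __call__(self, x, x_adv):
        return cosine_similarity(self.encode(x), self.encode(x_adv)) >= self.threshold
```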

Word Embedding Distance
########################
File renamed without changes.
328 changes: 328 additions & 0 deletions docs/examples/1_Introduction_and_Transformtions.ipynb

Large diffs are not rendered by default.

docs/examples/2_Constraints.ipynb
@@ -14,7 +14,7 @@
"\n",
"- **Overlap constraints** determine if a perturbation is valid based on character-level analysis. For example, some attacks are constrained by edit distance: a perturbation is only valid if it perturbs some small number of characters (or fewer).\n",
"\n",
"- **Syntactic constraints** filter inputs based on their syntax. For example, an attack may require that adversarial perturbations do not introduce grammatical errors.\n",
"- **Grammaticality constraints** filter inputs based on syntactical information. For example, an attack may require that adversarial perturbations do not introduce grammatical errors.\n",
"\n",
"- **Semantic constraints** try to ensure that the perturbation is semantically similar to the original input. For example, we may design a constraint that uses a sentence encoder to encode the original and perturbed inputs, and enforce that the sentence encodings be within some fixed distance of one another. (This is what happens in subclasses of `textattack.constraints.semantics.sentence_encoders`.)"
]
@@ -35,7 +35,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A custom constraint\n",
"### A custom constraint\n",
"\n",
"\n",
"For fun, we're going to see what happens when we constrain an attack to only allow perturbations that substitute out a named entity for another. In linguistics, a **named entity** is a proper noun, the name of a person, organization, location, product, etc. Named Entity Recognition is a popular NLP task (and one that state-of-the-art models can perform quite well). \n",
@@ -45,21 +45,59 @@
"\n",
"**NLTK**, the Natural Language Toolkit, is a Python package that helps developers write programs that process natural language. NLTK comes with predefined algorithms for lots of linguistic tasks– including Named Entity Recognition.\n",
"\n",
"First, we're going to write a constraint class. In the `__call__` method, we're going to use NLTK to find the named entities in both `x` and `x_adv`. We will only return `True` (that is, our constraint is met) if `x_adv` has substituted one named entity in `x` for another."
"First, we're going to write a constraint class. In the `__call__` method, we're going to use NLTK to find the named entities in both `x` and `x_adv`. We will only return `True` (that is, our constraint is met) if `x_adv` has substituted one named entity in `x` for another.\n",
"\n",
"Let's import NLTK and download the required modules:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /u/jm8wx/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n",
"[nltk_data] Downloading package maxent_ne_chunker to\n",
"[nltk_data] /u/jm8wx/nltk_data...\n",
"[nltk_data] Package maxent_ne_chunker is already up-to-date!\n",
"[nltk_data] Downloading package words to /u/jm8wx/nltk_data...\n",
"[nltk_data] Package words is already up-to-date!\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import nltk\n",
"nltk.download('punkt') # The NLTK tokenizer\n",
"nltk.download('maxent_ne_chunker') # NLTK named-entity chunker\n",
"nltk.download('words') # NLTK list of words"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NLTK NER Example\n",
"### NLTK NER Example\n",
"\n",
"Here's an example of using NLTK to find the named entities in a sentence:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -90,8 +128,6 @@
}
],
"source": [
"import nltk\n",
"\n",
"sentence = ('In 2017, star quarterback Tom Brady led the Patriots to the Super Bowl, '\n",
" 'but lost to the Philadelphia Eagles.')\n",
"\n",
@@ -115,7 +151,7 @@
},
{
"cell_type": "code",
"execution_count": 51,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -145,14 +181,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Putting it all together: getting a list of Named Entity Labels from a sentence\n",
"### Putting it all together: getting a list of Named Entity Labels from a sentence\n",
"\n",
"Now that we know how to tokenize, parse, and detect named entities using NLTK, let's put it all together into a single helper function. Later, when we implement our constraint, we can query this function to easily get the entity labels from a sentence. We can even use `@functools.lru_cache` to try and speed this process up."
]
},
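
The code cell that follows is collapsed in this view. A sketch of what such a cached NER helper could look like, with the function name and cache size as illustrative guesses rather than the notebook's exact code:

```python
import functools

import nltk

@functools.lru_cache(maxsize=2**14)
def get_entities(sentence):
    """Tokenize, POS-tag, and NE-chunk `sentence`, caching repeated queries."""
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # ne_chunk returns a tree whose subtrees are labeled with entity types
    # like PERSON, ORGANIZATION, or GPE. (Uses the NLTK resources
    # downloaded earlier in the notebook.)
    return nltk.ne_chunk(tagged)
```
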
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -178,7 +214,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -200,7 +236,7 @@
" ('.', '.')]"
]
},
"execution_count": 37,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -221,14 +257,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating our NamedEntityConstraint\n",
"### Creating our NamedEntityConstraint\n",
"\n",
"Now that we know how to detect named entities using NLTK, let's create our custom constraint."
]
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
@@ -273,16 +309,28 @@
"collapsed": true
},
"source": [
"## Testing our constraint\n",
"### Testing our constraint\n",
"\n",
"We need to create an attack and a dataset to test our constraint on. We went over all of this in the first tutorial, so let's gloss over this part for now."
]
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": 13,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34;1mtextattack\u001b[0m: Downloading https://textattack.s3.amazonaws.com/models/classification/lstm/yelp_polarity.\n",
"100%|██████████| 297M/297M [00:06<00:00, 48.3MB/s] \n",
"\u001b[34;1mtextattack\u001b[0m: Unzipping file path_to_zip_file to unzipped_folder_path.\n",
"\u001b[34;1mtextattack\u001b[0m: Successfully saved models/classification/lstm/yelp_polarity to cache.\n",
"\u001b[34;1mtextattack\u001b[0m: Goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'> matches model LSTMForYelpSentimentClassification.\n"
]
}
],
"source": [
"# Import the dataset.\n",
"from textattack.datasets.classification import YelpSentiment\n",
Expand All @@ -296,7 +344,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -319,7 +367,7 @@
],
"source": [
"from textattack.transformations import WordSwapEmbedding\n",
"from textattack.attack_methods import GreedyWordSwap\n",
"from textattack.search_methods import GreedyWordSwap\n",
"\n",
"# We're going to the `WordSwapEmbedding` transformation. Using the default settings, this\n",
"# will try substituting words with their neighbors in the counter-fitted embedding space. \n",
29 changes: 24 additions & 5 deletions docs/index.rst
@@ -11,10 +11,22 @@ TextAttack
Features
-----------

- **Search Methods**: Explores the transformation space and attempts to find a successful attack
- **Transformations**: Takes a text input and transforms it by replacing words and phrases while attempting to retain the meaning
- **Constraints**: Determines if a given transformation is valid
- **Built-in Datasets** and **Pre-trained Models** for ease of use
TextAttack isn't just a Python library; it's a framework for constructing adversarial attacks in NLP. TextAttack builds attacks from four components (sketched below):

- **Goal Functions** stipulate the goal of the attack, such as changing the prediction score of a classification model or changing all of the words in a translation output
- **Search Methods** explore the space of transformations and attempt to find a successful perturbation
- **Transformations** take a text input and transform it by replacing characters, words, or phrases
- **Constraints** determine whether a potential perturbation is valid with respect to the original input

TextAttack provides a set of **attack recipes** that assemble attacks from the literature out of these four components.
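
As a rough illustration of how the components fit together, here is a minimal sketch; the class names below appear elsewhere in this diff, but the import paths, constructor signatures, and argument names are assumptions rather than documented API:

```python
# Illustrative assembly of the four components. The class names appear in
# this commit's notebook, but import paths and signatures are assumptions.
from textattack.datasets.classification import YelpSentiment
from textattack.goal_functions.classification.untargeted_classification import (
    UntargetedClassification,
)
from textattack.models.classification.lstm import LSTMForYelpSentimentClassification
from textattack.search_methods import GreedyWordSwap
from textattack.transformations import WordSwapEmbedding

model = LSTMForYelpSentimentClassification()     # pre-trained Yelp sentiment LSTM
goal_function = UntargetedClassification(model)  # succeed when the predicted label flips
transformation = WordSwapEmbedding()             # swap words for embedding-space neighbors
attack = GreedyWordSwap(goal_function, transformation)  # greedy search over word swaps
dataset = YelpSentiment()                        # built-in dataset to attack
```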

TextAttack has some other features that make it a pleasure to use:

- **Data augmentation** using transformations & constraints
- **Built-in Datasets** for running attacks without supplying your own data
- **Pre-trained Models** for testing attacks and evaluating constraints
- **Built-in tokenizers** so you don't have to worry about tokenizing the inputs
- **Visualization options** like Visdom and Weights & Biases


.. toctree::
@@ -23,6 +35,13 @@ Features

quickstart/installation
quickstart/overview

.. toctree::
:maxdepth: 2
:caption: Examples

examples/1_Introduction_and_Transformtions.ipynb
examples/2_Constraints.ipynb


.. toctree::
@@ -44,7 +63,7 @@

constraints/constraint
constraints/semantics
constraints/syntax
constraints/grammaticality
constraints/overlap


1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
nbsphinx
