Merge pull request #543 from QData/doc-minor
add custom dataset API use example in doc
qiyanjun committed Oct 8, 2021
2 parents caacc1c + 42d0192 commit 3f0d529
Showing 2 changed files with 19 additions and 6 deletions.
14 changes: 10 additions & 4 deletions README.md
@@ -499,15 +499,21 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.


#### Dataset loading via other mechanisms, see: [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)

```python
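import textattack

# A list of (text, label) pairs; the two entries below are placeholder samples.
my_dataset = [("Today was....", 1), ("This movie is...", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)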
```
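
Because `textattack.datasets.Dataset` is handed a plain list of `(text, label)` pairs, it can help to check the list's shape first. The following is only a minimal sketch in plain Python: `validate_pairs` is a hypothetical helper, not part of TextAttack, and it assumes integer class labels as in the samples above.

```python
from typing import List, Sequence, Tuple


def validate_pairs(samples: Sequence) -> List[Tuple[str, int]]:
    """Hypothetical helper: check that every sample is a (text, label) pair.

    Assumes classification data with integer labels; this is an
    illustration and not a function provided by TextAttack.
    """
    cleaned = []
    for i, sample in enumerate(samples):
        # Each sample must be a two-element tuple or list.
        if not (isinstance(sample, (tuple, list)) and len(sample) == 2):
            raise ValueError(f"sample {i} is not a (text, label) pair: {sample!r}")
        text, label = sample
        if not isinstance(text, str):
            raise TypeError(f"sample {i}: text must be a str, got {type(text).__name__}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            raise TypeError(f"sample {i}: label must be an int, got {type(label).__name__}")
        cleaned.append((text, label))
    return cleaned


my_dataset = validate_pairs([("Today was....", 1), ("This movie is...", 0)])
```

The returned list keeps the same order and can then be passed on as `my_dataset` in the snippet above.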


#### Dataset via AttackedText class

To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
that maintains both a list of tokens and the original text, including punctuation. We use this object instead of a list of words or raw text alone.


### Attacks and how to design a new attack
11 changes: 9 additions & 2 deletions docs/1start/FAQ.md
@@ -110,14 +110,21 @@ You can then run attacks on samples from this dataset by adding the argument `--



#### Dataset loading via other mechanisms, see: [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)

```python
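import textattack

# A list of (text, label) pairs; the two entries below are placeholder samples.
my_dataset = [("Today was....", 1), ("This movie is...", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)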
```


#### Custom Dataset via AttackedText class

To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
that maintains both a list of tokens and the original text, including punctuation. We use this object instead of a list of words or raw text alone.
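
To illustrate why keeping both the tokens and the original text matters, here is a toy sketch of the idea. `ToyAttackedText` is a hypothetical stand-in written for this example; it is not TextAttack's `AttackedText`, and its regex tokenization is an assumption chosen for brevity.

```python
import re
from typing import List


class ToyAttackedText:
    """Toy illustration only; not TextAttack's AttackedText implementation."""

    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    _WORD = re.compile(r"[A-Za-z0-9']+")

    def __init__(self, text: str):
        self.text = text  # original text, punctuation included
        # Character span of every word, so words can be swapped in place.
        self._spans = [m.span() for m in self._WORD.finditer(text)]

    @property
    def words(self) -> List[str]:
        return [self.text[start:end] for start, end in self._spans]

    def replace_word_at_index(self, index: int, new_word: str) -> "ToyAttackedText":
        # Splice the new word into the original string; everything outside
        # the word's span (spaces, punctuation) is left untouched.
        start, end = self._spans[index]
        return ToyAttackedText(self.text[:start] + new_word + self.text[end:])


original = ToyAttackedText("This movie is great, isn't it?")
perturbed = original.replace_word_at_index(3, "awful")
print(perturbed.text)  # This movie is awful, isn't it?
```

With a bare list of words, the comma and question mark would be lost after the swap; with raw text alone, there would be no word index to target.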


#### Custom Dataset via Data Frames or other Python data objects (*coming soon*)


### 4. Benchmarking Attacks
