ValueError: Unsupported dataset schema #449 #529
I suggested a fix that you haven't tried yet. A quick diagnosis tells me you should be using our `HuggingFaceDataset`:

```python
from textattack.datasets import HuggingFaceDataset

train_dataset = HuggingFaceDataset('squad', split='train')
eval_dataset = HuggingFaceDataset('squad', split='validation')
```
Thank you, Jack. Things are working now. In the same code above, when I try the yelp dataset, it shows that it will take several days to complete because there are about 560,000 examples. Is it possible to reduce the number of examples to about 10k so that it would go faster?
Yes! I would try using the rotten_tomatoes dataset instead. It's much smaller.
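If you'd rather stay with yelp, one way to cap it at around 10k examples is to load just a slice of the split with the HuggingFace `datasets` library and wrap it in TextAttack's generic dataset class. This is only a sketch, assuming the textattack 0.3.x `Dataset` API and the yelp_polarity column names (`text`, `label`):

```python
import datasets
import textattack

# Pull only the first 10,000 training examples; split slicing like
# "train[:10000]" is a feature of the `datasets` library.
raw_train = datasets.load_dataset("yelp_polarity", split="train[:10000]")

# Wrap the (text, label) pairs in TextAttack's generic Dataset class.
train_dataset = textattack.datasets.Dataset(
    [(row["text"], row["label"]) for row in raw_train]
)
```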
Great. Many thanks. I really appreciate it.
I am running the following code to test IMDB on the WordCNN model, and it gives me this error: `NameError: name 'model_wrapper' is not defined`

```python
!pip install textattack

# We only use DeepWordBugGao2018 for demonstration purposes.
attack = textattack.attack_recipes.DeepWordBugGao2018.build(model_wrapper)

# Train for 3 epochs with 1 initial clean epoch, 1000 adversarial examples per epoch,
# a learning rate of 5e-5, and an effective batch size of 32 (8x4).
training_args = textattack.TrainingArgs(
```
uhh, yeah, you still need this piece of the code (it has to run before the `DeepWordBugGao2018.build(model_wrapper)` call, since that call needs `model_wrapper` to already be defined):

```python
model = transformers.AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)
```
That worked. Many thanks!
I ran the training on LSTM using this command:

```
textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50 --learning-rate 1e-5
```
Pretty sure you have to create a model wrapper file and use the ...
When I try to run an attack using my saved model, I use this command:

```
!textattack attack --recipe textfooler --num-examples 100 --model ./outputs/2021-09-15-06-37-33-327512/best_model --dataset-from-huggingface imdb --dataset-split test
```

but it gives me this error:

```
ValueError: Error: unsupported TextAttack model ./outputs/2021-09-15-06-37-33-327512/best_model
```

Do you know what could be going wrong?
You're using ...
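For reference, the usual workaround for the "unsupported TextAttack model" error with a locally saved checkpoint is to load the model yourself in a small Python file and point the CLI at that file. The sketch below assumes the checkpoint is a HuggingFace-format model (as in the bert-base-uncased example earlier in this thread) and that the saved directory also contains the tokenizer files; the file name `my_model.py` is just an example:

```python
# my_model.py
# The attack CLI expects this file to define `model` and `tokenizer`
# variables, which it then wraps automatically.
import transformers

checkpoint = "./outputs/2021-09-15-06-37-33-327512/best_model"
model = transformers.AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
```

Then, if I'm remembering the flag correctly, the attack would be run with `--model-from-file` instead of `--model`:

```
textattack attack --recipe textfooler --num-examples 100 --model-from-file my_model.py --dataset-from-huggingface imdb --dataset-split test
```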
I am trying to run an attack on a pretrained, fine-tuned model as follows:

but it's giving me the following error:

I am not sure why it would not take the pretrained model above. Is there anything I am doing wrong here?
I am running adversarial training on NLP models and I am getting the error "ValueError: Unsupported dataset schema" when I run the following code:

```python
import textattack
import transformers
from textattack.datasets import HuggingFaceDataset

model = transformers.AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# We only use DeepWordBugGao2018 for demonstration purposes.
attack = textattack.attack_recipes.DeepWordBugGao2018.build(model_wrapper)

train_dataset = HuggingFaceDataset('squad', split='train')
eval_dataset = HuggingFaceDataset('squad', split='validation')

# Train for 3 epochs with 1 initial clean epoch, 1000 adversarial examples per epoch,
# a learning rate of 5e-5, and an effective batch size of 32 (8x4).
training_args = textattack.TrainingArgs(
    num_epochs=3,
    num_clean_epochs=1,
    num_train_adv_examples=1000,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    log_to_tb=True,
)

trainer = textattack.Trainer(
    model_wrapper,
    "classification",
    attack,
    train_dataset,
    eval_dataset,
    training_args,
)
trainer.train()
```
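As a side note on the error itself: squad is a question-answering dataset with no classification label column, so (as far as I can tell) it cannot be mapped onto the (text, label) schema the classification trainer expects, which is what raises "Unsupported dataset schema" at the `HuggingFaceDataset('squad', ...)` lines. A minimal change that sidesteps this, using the rotten_tomatoes dataset suggested above, would be:

```python
from textattack.datasets import HuggingFaceDataset

# rotten_tomatoes is a binary sentiment dataset with plain (text, label)
# columns, so it maps cleanly onto the classification schema.
train_dataset = HuggingFaceDataset('rotten_tomatoes', split='train')
eval_dataset = HuggingFaceDataset('rotten_tomatoes', split='validation')
```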
@jxmorris12