Glue MNLI task fails due to missing 'validation' key in dataset #213
Comments
Huggingface |
I'd submit a PR if I knew what approach is recommended to accommodate this edge case. |
To anyone else trying to reproduce this, this only works if you run
I'd change lightning_transformers/core/data.py to something like this:
def train_dataloader(self) -> DataLoader:
    # Prefer an explicit split name from the config; fall back to the default key.
    if hasattr(self.cfg, "train_dataset"):
        dataset_name = self.cfg.train_dataset
    elif "train" in self.ds:
        dataset_name = "train"
    else:
        raise KeyError("'train' subset not found in dataset")
    return DataLoader(
        self.ds[dataset_name],
        batch_size=self.batch_size,
        num_workers=self.cfg.num_workers,
        collate_fn=self.collate_fn,
    )

def val_dataloader(self) -> DataLoader:
    if hasattr(self.cfg, "valid_dataset"):
        dataset_name = self.cfg.valid_dataset
    elif "validation" in self.ds:
        dataset_name = "validation"
    else:
        raise KeyError("'validation' subset not found in dataset")
    return DataLoader(
        self.ds[dataset_name],
        batch_size=self.batch_size,
        num_workers=self.cfg.num_workers,
        collate_fn=self.collate_fn,
    )

def test_dataloader(self) -> Optional[DataLoader]:
    if hasattr(self.cfg, "test_dataset"):
        dataset_name = self.cfg.test_dataset
    elif "test" in self.ds:
        dataset_name = "test"
    else:
        raise KeyError("'test' subset not found in dataset")
    return DataLoader(
        self.ds[dataset_name],
        batch_size=self.batch_size,
        num_workers=self.cfg.num_workers,
        collate_fn=self.collate_fn,
    )
This way, I can specify subsets if possible. This should work on any dataset, not just MNLI. You'd have to mention somewhere in the docs that this is possible, but yeah, you get the gist. Specifying this in Hydra should be as easy as:
!pl-transformers-train \
    task=nlp/text_classification \
    dataset=nlp/text_classification/glue \
    dataset.cfg.dataset_config_name=mnli \
    ++dataset.cfg.valid_dataset=validation_matched |
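The three dataloader methods proposed above repeat the same lookup, so the split resolution could probably be factored into one helper. The following is only a minimal sketch under stated assumptions: `resolve_split` is a hypothetical name (not part of lightning_transformers), the config is modeled with a plain `SimpleNamespace`, and the dataset is modeled as a dict keyed by split name. Unlike the snippet above, it also checks that an explicitly configured split actually exists.

```python
from types import SimpleNamespace
from typing import Any, Mapping


def resolve_split(cfg: Any, ds: Mapping[str, Any], override_attr: str, default: str) -> str:
    """Pick the split to load: the config override if present, else the default key.

    Hypothetical helper for illustration; not an existing library function.
    """
    name = getattr(cfg, override_attr, None)
    if name is None:
        name = default
    if name not in ds:
        raise KeyError(f"'{name}' subset not found in dataset; available: {sorted(ds)}")
    return name


# Stand-in for the split keys of GLUE MNLI (no download needed for the sketch).
mnli_like = {
    "train": [],
    "validation_matched": [],
    "validation_mismatched": [],
    "test_matched": [],
    "test_mismatched": [],
}
cfg = SimpleNamespace(valid_dataset="validation_matched")

print(resolve_split(cfg, mnli_like, "valid_dataset", "validation"))
print(resolve_split(cfg, mnli_like, "train_dataset", "train"))
```

Each dataloader method would then only need one call such as `resolve_split(self.cfg, self.ds, "valid_dataset", "validation")` before building the `DataLoader`.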
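Interesting, when I run your notebook with lightning >= 1.5, I get:
TypeError: Error instantiating 'pytorch_lightning.trainer.trainer.Trainer' : __init__() got an unexpected keyword argument 'truncated_bptt_steps'
I assume that once that error is fixed, the next error will be the missing validation key, though. |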
There is already an issue open for the above error: #212.
That said, were you still looking to do a pull request for this feature? Because if not, I just might do it as well. |
Oh, I'd like to make the PR then! Thanks for the opportunity.
On Fri, Dec 3, 2021, 01:27 Jv Kyle Eclarin wrote:
Interesting, when I run your notebook with lightning >= 1.5, I get:
TypeError: Error instantiating 'pytorch_lightning.trainer.trainer.Trainer' : __init__() got an unexpected keyword argument 'truncated_bptt_steps'
I assume that when that error is fixed, the next error will be the validation key missing though. When I run it locally with git clone and pip install . and lightning >= 1.5, I didn't get this error. Regardless, it's probably better to open a new issue for the above error...
There already is an issue open for the above error: #212
That said, were you still looking to do a pull request for this feature? Because if not, I just might do it as well.
|
Sorry for the delay, been through a tough week. I've thought about two approaches to solve this more universally.
I'll PR the second approach, and welcome any comments. |
@mariomeissner, I agree with the second approach, and I'm curious to see how you implement it |
Created a PR #214 |
I like it! Very neat. |
🐛 Bug
MNLI has two validation and two test sets, called validation_matched, validation_mismatched, test_matched and test_mismatched. I assume that this was not taken into account in the datamodule.
To Reproduce
Steps to reproduce the behavior:
Run the following command:
Expected behavior
I would expect the dataloader to handle the special case of MNLI and load validation_matched and test_matched by default. Maybe add an option to additionally test on test_mismatched as well, when desired.
Environment
A standard pip install from source, as of 2021.12.01. Fails with or without GPU.
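As a rough illustration of the default behavior requested under "Expected behavior", one option would be to fall back to suffixed splits when the plain key is absent. This is a sketch only: `candidate_splits` is a hypothetical helper, the split names are hard-coded from the list in this issue rather than loaded from the dataset, and nothing here is claimed to match how the library or any PR handles it.

```python
from typing import Iterable, List


def candidate_splits(split_names: Iterable[str], base: str) -> List[str]:
    """Return splits named `base` or `base_<suffix>`, exact match first.

    Hypothetical helper for illustration; not an existing library function.
    """
    names = list(split_names)
    exact = [n for n in names if n == base]
    prefixed = sorted(n for n in names if n.startswith(base + "_"))
    return exact + prefixed


# Split names as listed in this issue for GLUE MNLI.
mnli_splits = [
    "train",
    "validation_matched",
    "validation_mismatched",
    "test_matched",
    "test_mismatched",
]

val_candidates = candidate_splits(mnli_splits, "validation")
test_candidates = candidate_splits(mnli_splits, "test")
print(val_candidates[0])   # first candidate would serve as the default validation split
print(test_candidates)     # remaining entries could be offered as optional extra test sets
```

With alphabetical ordering the `_matched` split comes before `_mismatched`, so taking the first candidate gives the defaults asked for here; a dataset with a plain `validation` key would keep using that key.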