
Data Preparation Problem for conll2003 #2

Closed
YiSyuanChen opened this issue Mar 4, 2022 · 3 comments

Comments


YiSyuanChen commented Mar 4, 2022

Hi authors,
First of all, I would like to thank you for your great work. I've installed the packages according to the requirements file, and I ran the command for data preparation as:
python -c "import fewshot; fewshot.make_challenge('flex');"

However, an error shows when downloading the conll2003 dataset:

Downloading and preparing dataset conll2003/conll2003 (download: 4.63 MiB, generated: 9.78 MiB, post-processed: Unknown size, total: 14.41 MiB) to /home/yisyuan/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/40e7cb6bcc374f7c349c83acd1e9352a4f09474eb691f64f364ee62eb65d0ca6...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 13, in make
return registry.make(id, **evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 108, in make
return self.get_spec(id).make(**evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 29, in make
return Evaluator(config_name=self.id, hash=self.hash, **evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/eval.py", line 93, in init
split='test',
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/load.py", line 1707, in load_dataset
use_auth_token=use_auth_token,
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/builder.py", line 595, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/builder.py", line 690, in _download_and_prepare
) from None
OSError: Cannot find data file.
Original error:
Error instantiating 'fewshot.datasets.store.Store' : Couldn't find file at https://github.com/davidsbatista/NER-datasets/raw/master/CONLL2003/train.txt

It seems like the error comes from the Hugging Face datasets library, since there is a related issue that has already been solved (huggingface/datasets#3582). However, I've changed the version of datasets from 1.8.0 (as in the requirements) to 1.18.3 (the current version), and the error still happens. Also, I've tried downloading the conll2003 dataset directly with version 1.18.3, and it works fine:
datasets.load_dataset("conll2003")
So I'm not quite sure what goes wrong. I would really appreciate any suggestions. Thank you!
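For context, the file the loader fails to fetch (train.txt) is in the standard CoNLL-2003 column format: one token per line with space-separated POS, chunk, and NER tags, blank lines between sentences, and -DOCSTART- lines marking document boundaries. A minimal sketch of parsing that format (the parse_conll helper is hypothetical, not part of the flex codebase):

```python
def parse_conll(text):
    """Parse CoNLL-2003 text into sentences of (token, ner_tag) pairs.

    Each non-blank line is 'token POS chunk NER'; blank lines separate
    sentences; '-DOCSTART-' lines mark document boundaries and are skipped.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        token, pos, chunk, ner = line.split()
        current.append((token, ner))
    if current:
        sentences.append(current)
    return sentences

sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
print(parse_conll(sample))
# [[('EU', 'B-ORG'), ('rejects', 'O'), ('German', 'B-MISC'), ('call', 'O')]]
```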

@YiSyuanChen
Author

Hi, I found a workaround by changing "HF_SCRIPTS_VERSION" in fewshot/store/base.py from 1.5.0 to 1.18.3. However, does this affect the expected evaluation results?
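For anyone else hitting this: in datasets 1.x, loading scripts such as conll2003.py (with their hard-coded download URLs) are fetched from the huggingface/datasets repository at a pinned git ref, so the version string controls which copy of the script runs. A rough sketch of that mapping, assuming the 1.x repository layout (the script_url helper is hypothetical; the real resolution happens inside datasets.load_dataset):

```python
def script_url(dataset_name: str, scripts_version: str) -> str:
    # In the datasets 1.x repo layout, each dataset's loading script lives at
    # datasets/<name>/<name>.py, so an old version pin (e.g. 1.5.0) can point
    # at a script whose data URLs have since gone dead.
    return (
        "https://raw.githubusercontent.com/huggingface/datasets/"
        f"{scripts_version}/datasets/{dataset_name}/{dataset_name}.py"
    )

print(script_url("conll2003", "1.5.0"))
print(script_url("conll2003", "1.18.3"))
```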

@jbragg
Collaborator

jbragg commented Mar 19, 2022

Thank you for reporting!
Should be fixed now. Please reopen if you still have issues.
To be safe, I have updated the HF_SCRIPTS_VERSION only for that dataset.

@jbragg jbragg closed this as completed Mar 19, 2022
@neichfeldt

neichfeldt commented Mar 28, 2022

Sorry, I'm a newbie and my task is to learn how to train models with flair. I just copied the code:

import torch

# 1. get the corpus
from flair.datasets import CONLL_03_GERMAN
corpus = CONLL_03_GERMAN()

and this doesn't work at all, because I have to mount something from somewhere (ECI Multilingual...) but I can't really find and understand what to do.
I also tried

from datasets import load_dataset
dataset = load_dataset("conll2003")

and it downloaded the data, but at the latest step 3 (make the tag dictionary from the corpus):

tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

it fails because "corpus" is not defined. Could someone please help! Thank you so much!
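The NameError in step 3 is an ordering problem: make_tag_dictionary is a method on the corpus object, so step 1 (corpus = CONLL_03_GERMAN()) must succeed before step 3 can run; if step 1 raised an error, the name corpus was never assigned. Since CONLL_03_GERMAN needs manually obtained source data, here is a flair-free toy that mirrors the same step order with an in-memory corpus (ToyCorpus and its make_tag_dictionary are stand-ins, not the flair API):

```python
class ToyCorpus:
    """Stand-in for a flair corpus: a list of sentences of (token, tag) pairs."""
    def __init__(self, sentences):
        self.sentences = sentences

    def make_tag_dictionary(self, tag_type="ner"):
        # Collect the set of tags seen in the corpus, loosely mimicking
        # what flair builds for the requested tag type.
        tags = set()
        for sentence in self.sentences:
            for _token, tag in sentence:
                tags.add(tag)
        return sorted(tags)

# 1. get the corpus -- this assignment must succeed first
corpus = ToyCorpus([[("EU", "B-ORG"), ("rejects", "O")]])

# 3. make the tag dictionary from the corpus -- this only works once
#    'corpus' is defined; otherwise Python raises NameError
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")
print(tag_dictionary)  # ['B-ORG', 'O']
```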
