
Data Preparation Problem for conll2003 #2

Closed
YiSyuanChen opened this issue Mar 4, 2022 · 3 comments

Comments


YiSyuanChen commented Mar 4, 2022

Hi authors,
First of all, I would like to thank you for your great work. I've installed the packages according to the requirements file, and I ran the command for data preparation as:
python -c "import fewshot; fewshot.make_challenge('flex');"

However, an error shows when downloading the conll2003 dataset:

Downloading and preparing dataset conll2003/conll2003 (download: 4.63 MiB, generated: 9.78 MiB, post-processed: Unknown size, total: 14.41 MiB) to /home/yisyuan/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/40e7cb6bcc374f7c349c83acd1e9352a4f09474eb691f64f364ee62eb65d0ca6...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 13, in make
return registry.make(id, **evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 108, in make
return self.get_spec(id).make(**evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/registration.py", line 29, in make
return Evaluator(config_name=self.id, hash=self.hash, **evaluator_kwargs)
File "/home/yisyuan/Workspace_2_250GB_SSD/researches/flex/fewshot/challenges/eval.py", line 93, in init
split='test',
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/load.py", line 1707, in load_dataset
use_auth_token=use_auth_token,
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/builder.py", line 595, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/home/yisyuan/Venv/flex/lib/python3.7/site-packages/datasets/builder.py", line 690, in _download_and_prepare
) from None
OSError: Cannot find data file.
Original error:
Error instantiating 'fewshot.datasets.store.Store' : Couldn't find file at https://github.com/davidsbatista/NER-datasets/raw/master/CONLL2003/train.txt

It seems like the error comes from the Hugging Face datasets library, since there is a related issue that has already been solved (huggingface/datasets#3582). However, I've changed the version of datasets from 1.8.0 (as in the requirements) to 1.18.3 (the current version), and the error still happens. Also, I've tried downloading the conll2003 dataset directly with version 1.18.3, and it works fine:
datasets.load_dataset("conll2003")
So I'm not quite sure what goes wrong. I would really appreciate any suggestions. Thank you!
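For context, the file the loader fails to fetch (train.txt) is in the standard CoNLL-2003 column format: one token per line with space-separated POS, chunk, and NER tags, blank lines between sentences, and -DOCSTART- lines marking document boundaries. A minimal sketch of parsing that format (the parse_conll helper is hypothetical, not part of the flex codebase):

```python
def parse_conll(text):
    """Parse CoNLL-2003 text into sentences of (token, ner_tag) pairs.

    Each non-blank line is 'token POS chunk NER'; blank lines separate
    sentences; '-DOCSTART-' lines mark document boundaries and are skipped.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        token, pos, chunk, ner = line.split()
        current.append((token, ner))
    if current:
        sentences.append(current)
    return sentences

sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
print(parse_conll(sample))
# [[('EU', 'B-ORG'), ('rejects', 'O'), ('German', 'B-MISC'), ('call', 'O')]]
```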

@YiSyuanChen
Author

Hi, I found a workaround by changing "HF_SCRIPTS_VERSION" in fewshot/store/base.py from 1.5.0 to 1.18.3. However, does this affect the expected evaluation results?
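For anyone else hitting this: in datasets 1.x, loading scripts such as conll2003.py (with their hard-coded download URLs) are fetched from the huggingface/datasets repository at a pinned git ref, so the version string controls which copy of the script runs. A rough sketch of that mapping, assuming the 1.x repository layout (the script_url helper is hypothetical; the real resolution happens inside datasets.load_dataset):

```python
def script_url(dataset_name: str, scripts_version: str) -> str:
    # In the datasets 1.x repo layout, each dataset's loading script lives at
    # datasets/<name>/<name>.py, so an old version pin (e.g. 1.5.0) can point
    # at a script whose data URLs have since gone dead.
    return (
        "https://raw.githubusercontent.com/huggingface/datasets/"
        f"{scripts_version}/datasets/{dataset_name}/{dataset_name}.py"
    )

print(script_url("conll2003", "1.5.0"))
print(script_url("conll2003", "1.18.3"))
```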

@jbragg
Collaborator

jbragg commented Mar 19, 2022

Thank you for reporting!
Should be fixed now. Please reopen if you still have issues.
To be safe, I have updated the HF_SCRIPTS_VERSION only for that dataset.

@jbragg jbragg closed this as completed Mar 19, 2022
@neichfeldt

neichfeldt commented Mar 28, 2022

Sorry, I'm a newbie and my task is to learn how to train models with flair. I just copied the code:

import torch

# 1. get the corpus
from flair.datasets import CONLL_03_GERMAN
corpus = CONLL_03_GERMAN()

and this doesn't work at all, because I have to mount something from somewhere (ECI Multilingual...) but I can't really find and understand what to do.
I also tried

from datasets import load_dataset
dataset = load_dataset("conll2003")

and it downloaded the data, but at the latest step 3 (make the tag dictionary from the corpus):

tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

it fails because "corpus" is not defined. Could someone please help! Thank you so much!
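The NameError in step 3 is an ordering problem: make_tag_dictionary is a method on the corpus object, so step 1 (corpus = CONLL_03_GERMAN()) must succeed before step 3 can run; if step 1 raised an error, the name corpus was never assigned. Since CONLL_03_GERMAN needs manually obtained source data, here is a flair-free toy that mirrors the same step order with an in-memory corpus (ToyCorpus and its make_tag_dictionary are stand-ins, not the flair API):

```python
class ToyCorpus:
    """Stand-in for a flair corpus: a list of sentences of (token, tag) pairs."""
    def __init__(self, sentences):
        self.sentences = sentences

    def make_tag_dictionary(self, tag_type="ner"):
        # Collect the set of tags seen in the corpus, loosely mimicking
        # what flair builds for the requested tag type.
        tags = set()
        for sentence in self.sentences:
            for _token, tag in sentence:
                tags.add(tag)
        return sorted(tags)

# 1. get the corpus -- this assignment must succeed first
corpus = ToyCorpus([[("EU", "B-ORG"), ("rejects", "O")]])

# 3. make the tag dictionary from the corpus -- this only works once
#    'corpus' is defined; otherwise Python raises NameError
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")
print(tag_dictionary)  # ['B-ORG', 'O']
```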
