(NER) Error when training a custom dataset with document-level #22
Comments
Thank you for your fast answer. Unfortunately, I only had time to test it today. It seems another error is showing up.
Oops, fixed that.
Hello everyone,
I'm getting an error when trying to train a document-level model for the NER task on a custom dataset. I copied the configuration file xlnet-doc-en-ner-finetune.yaml and modified it, following the documentation provided in this repository, to use my custom dataset.
This custom dataset has 3 files: train, dev, and test. There are 10 different labels (IOB2 annotation scheme). I created a tag_dictionary the same way this file was created, just adding more tags to fit my needs. The format and structure are identical to the CONLL2003 data used in this repository. I've been using this custom dataset with sentence-level models with no problems at all; the errors only happen when I use document-level models. I should also mention that the files I'm using are up to date with the latest commit of this repository.
This is the error:
The error "index 0 is out of bounds for dimension 0 with size 0" seems to happen in this line.
harem_default.train
harem_default.dev
harem_default.test
ner_tags_harem.pkl
xlnet-doc-ner-test.yaml
You can find all files here.
I'm very thankful for your patience and help.
[EDIT]: I forgot to mention that this error does not occur when I'm using the CONLL2003 dataset, which confuses me because both datasets seem to have an identical structure. In my custom dataset, the third column (chunking) has random values. I don't think that's a problem, since I don't remember reading anything about this column being used for the output predictions.
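One way to test the claim that both datasets share the same structure is to summarize each file and compare the results. The sketch below is only an illustration and not part of this repository: summarize_conll is a hypothetical helper, and it assumes whitespace-separated columns, the NER tag in the last column, blank lines between sentences, and document boundaries marked by -DOCSTART- lines as in the original CoNLL-2003 files. Whether the document-level code here actually relies on those markers is an assumption that has not been confirmed in this thread.

```python
from collections import Counter

# Assumption: documents are delimited by "-DOCSTART-" lines, as in CoNLL-2003.
DOCSTART = "-DOCSTART-"


def summarize_conll(text):
    """Summarize the structure of CoNLL-style text (hypothetical helper).

    Counts documents, sentences, columns per token line, and tags, and
    records IOB2 violations (an I-X tag not preceded by B-X or I-X).
    """
    summary = {
        "documents": 0,
        "sentences": 0,
        "column_counts": Counter(),
        "tags": Counter(),
        "iob2_violations": [],
    }
    prev_tag = "O"
    in_sentence = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence.
            in_sentence = False
            prev_tag = "O"
            continue
        if fields[0] == DOCSTART:
            # Document marker: counted as a document, not as a sentence.
            summary["documents"] += 1
            in_sentence = False
            prev_tag = "O"
            continue
        if not in_sentence:
            summary["sentences"] += 1
            in_sentence = True
        summary["column_counts"][len(fields)] += 1
        tag = fields[-1]  # assumption: the NER tag is the last column
        summary["tags"][tag] += 1
        if tag.startswith("I-") and prev_tag[2:] != tag[2:]:
            summary["iob2_violations"].append((line_no, tag))
        prev_tag = tag
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_conll.py harem_default.train <other file> ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path, summarize_conll(handle.read()))
```

Running it on harem_default.train and on the CONLL2003 training file and comparing the two summaries would show whether they differ in document count, column count, or tag set. A document count of zero in only one of them would be a difference worth checking first, since the problem only appears with document-level models.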