
Unable to run NER in inference mode #54

Closed
soumyavhasure opened this issue Oct 30, 2019 · 10 comments

Comments

@soumyavhasure

I trained the BioBERT model using the v1.1 weights found in the NAVER Github repository. I followed the instructions provided, but had to modify "--init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt" part of the code to "--init_checkpoint=$BIOBERT_DIR/model.ckpt-1000000." Once the model had been trained, I tried running it in inference mode with "--do_train=false --do_eval=true --do_predict=true" and everything else the same as when I trained the model. I am able to see the token-level evaluation result printed as stdout format. However, this does not create the "token_test.txt and label_test.txt in output_dir." I'm not sure if I'm doing something wrong. Any help would be appreciated!

@wonjininfo
Member

Hi, please check this comment from issue #39.
Thanks!
Wonjin

@yrahul3910

yrahul3910 commented Nov 3, 2019

I can confirm this issue. Trying the workaround with --num_train_epochs=0.1 doesn't seem to work. The issue is with the two files token_test.txt and label_test.txt not existing. Is there something we're missing? What generates these files?

Edit: I dug around a bit. Running grep token_test.txt * in the source directory shows that the file is only referenced in run_ner.py, where the path is called token_path. On line 586 the file is removed if it exists, but on line 610 it is read, and none of the lines in between ever write to it (so there is no way for the file to be created).
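To make the failure mode concrete, here is a hypothetical reconstruction of the pattern described above. The names (token_path, output_dir) follow run_ner.py, but this is a sketch, not the actual BioBERT code:

```python
import os
import tempfile

# Stand-in for FLAGS.output_dir, just so the sketch is runnable:
output_dir = tempfile.mkdtemp()
token_path = os.path.join(output_dir, "token_test.txt")

# ~line 586: remove token_test.txt if it exists
if os.path.exists(token_path):
    os.remove(token_path)

# ...no code in between writes to token_path...

# ~line 610: try to read it back. In a predict-only run nothing has
# created the file, so this fails unless a prior fine-tuning run
# (with the same output_dir) left one behind.
try:
    with open(token_path) as reader:
        tokens = reader.read().splitlines()
except FileNotFoundError:
    print("token_test.txt was never created")
```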

@wonjininfo
Member

wonjininfo commented Nov 3, 2019

Hi all,
Would you please check this recent comment?
In short, you need to fine-tune first, and use that "pretrained" weight in inference mode for your next experiments. (token_test.txt will be generated while you fine-tune our weights.)

@Anushka1610

I experienced similar behavior because my test file didn't have a blank newline at the end. DataProcessor._read_data() expects one, and will return an empty list of token/label pairs if it processes the entire file without finding one. Adding the newline fixed this issue for me.
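To illustrate the failure mode with a minimal sketch (a simplification of my own, not the actual DataProcessor._read_data() code): if the reader only flushes a sentence when it hits a blank line, then a file without a trailing newline silently drops its last sentence.

```python
def read_conll(lines):
    """Group (token, label) pairs into sentences, flushing on blank lines.

    Mirrors the failure mode: the last sentence is emitted only when a
    blank line follows it, so a file without a trailing blank line
    loses that sentence entirely.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # blank line ends a sentence
            if current:
                sentences.append(current)
            current = []
        else:
            token, label = line.split("\t")
            current.append((token, label))
    return sentences  # note: `current` is never flushed at EOF

# Without a final blank line, the only sentence disappears:
print(read_conll(["word\tO\n"]))        # []
print(read_conll(["word\tO\n", "\n"]))  # [[('word', 'O')]]
```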

@yrahul3910

@Anushka1610 Interesting. Could you describe the structure of your test file? Intuitively, I assume it should only hold one word or one sentence per line. Is this correct?

@Anushka1610

Anushka1610 commented Nov 15, 2019

@yrahul3910 The structure of your .tsv should mimic that of the ones provided. You're correct that it should be one word/label pair per line, separated by a tab:

word\tlabel\n

For example, let's cat NERdata/NCBI-disease/train.tsv | head -10:

Identification  O
of      O
APC2    O
,       O
a       O
homologue       O
of      O
the     O
adenomatous     B
polyposis       I

@yrahul3910

@Anushka1610 Thanks for clarifying! How do you run prediction then? Given a sequence of words, I'd like to predict their class; so it feels odd to me that I need to provide classes for the test files. Is there a way around this? Alternatively, do you think I could simply give them all a random class and just look at the predictions generated by the model?

@yrahul3910

I still can't get this to work. In my NER_DIR, I have train.tsv, train_dev.tsv, devel.tsv, and test.tsv. The first three are copied as-is from the s800 data set provided in the repo; the last one I formatted to match the format described above. If it makes any difference, in my test.tsv I labeled everything as class B (since I do not know the labels and prediction is my goal).
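One workaround for the missing-labels problem (my own assumption, not an official recipe) is to give every token the placeholder label O rather than B, since O ("outside") is always a valid tag and doesn't falsely mark entity starts, and then simply ignore the gold column in the output. A sketch of a hypothetical helper, write_placeholder_tsv, that builds such a file:

```python
# Hypothetical helper: build a test.tsv where every token gets the
# placeholder label "O", since only the model's predictions matter.
def write_placeholder_tsv(sentences, path):
    with open(path, "w") as f:
        for sentence in sentences:
            for token in sentence.split():
                f.write(token + "\tO\n")
            f.write("\n")  # blank line after each sentence (incl. at EOF)

write_placeholder_tsv(["APC2 is a homologue"], "test.tsv")
```

The trailing blank line after the last sentence also satisfies the end-of-file expectation mentioned earlier in this thread.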

@demongolem

demongolem commented Dec 23, 2019

Hi all,
Would you please check this recent comment?
In short, you need to fine-tune first, and use that "pretrained" weight in inference mode for your next experiments. (token_test.txt will be generated while you fine-tune our weights.)

I would like to add a question to this, though. Line 586 says if FLAGS.do_predict, so it seems that during prediction token_test.txt is deleted. I agree that token_test.txt is created during the fine-tuning process, and I have created that file, but why would it be deleted during prediction? I cannot see anywhere, before the fatal error on line 615 where we try to read token_test.txt, that the file would be created again. If I comment out the remove portion, I get the same token_test.txt with the timestamp of when the prediction completed. Must a new token_test.txt be generated during this prediction phase? If I comment it out as indicated, the detokenizer in biocodes will not work because of the unequal lengths of the label and token files.

@Mayar2009

As I understand it, in short, for training:
python run_ner.py \
    --vocab_file=$BIOBERT_DIR/vocab.txt \
    --bert_config_file=$BIOBERT_DIR/bert_config.json \
    --init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt \
    --data_dir=$NER_DIR/ \
    --do_train=true \
    --do_eval=true \
    --num_train_epochs=10.0 \
    --output_dir=/tmp/bioner/(pre-trained dir)

and for predicting:

python run_ner.py \
    --vocab_file=$BIOBERT_DIR/vocab.txt \
    --bert_config_file=$BIOBERT_DIR/bert_config.json \
    --init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt \
    --data_dir=$NER_DIR/ \
    --do_train=true \
    --do_predict=true \
    --num_train_epochs= 4.0 \
    --output_dir=/tmp/bioner/(pre-trained dir)

So the only change at the predicting stage is writing ( --do_predict=true ) instead of ( --do_eval=true ) and using a smaller number for num_train_epochs, right?

For me, I have:
NER_DIR = "/content/drive/My Drive/Colab Notebooks/Bert/BioBert/NERdata/BC5CDR-chem"
BIOBERT_DIR = "/content/drive/My Drive/Colab Notebooks/Bert/BioBert/biobert_v1.1_pubmed/biobert_v1.1_pubmed"
Output_Dir = "/content/drive/My Drive/Colab Notebooks/Bert/BioBert/output/"

So when I fine-tuned and then tried to predict as written above, I got this error:

WARNING:tensorflow:From /content/drive/My Drive/Colab Notebooks/Bert/BioBert/biobert-master/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

/usr/local/lib/python3.6/dist-packages/absl/flags/_validators.py:359: UserWarning: Flag --task_name has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
'command line!' % flag_name)
WARNING:tensorflow:From /content/drive/My Drive/Colab Notebooks/Bert/BioBert/biobert-master/run_ner.py:646: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flag.py", line 181, in _parse
return self.parser.parse(argument)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_argument_parser.py", line 152, in parse
val = self.convert(argument)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_argument_parser.py", line 213, in convert
return float(argument)
ValueError: could not convert string to float:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/content/drive/My Drive/Colab Notebooks/Bert/BioBert/biobert-master/run_ner.py", line 646, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 293, in run
flags_parser,
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 362, in _run_init
flags_parser=flags_parser,
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 212, in _register_and_parse_flags_with_usage
args_to_main = flags_parser(original_argv)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 31, in _parse_flags_tolerate_undef
return flags.FLAGS(_sys.argv if argv is None else argv, known_only=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/flags.py", line 112, in __call__
return self.__dict__['__wrapped'].__call__(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 626, in __call__
unknown_flags, unparsed_args = self._parse_args(args, known_only)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 774, in _parse_args
flag.parse(value)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flag.py", line 166, in parse
self.value = self._parse(argument)
File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flag.py", line 184, in _parse
'flag --%s=%s: %s' % (self.name, argument, e))
absl.flags._exceptions.IllegalFlagValueError: flag --num_train_epochs=: could not convert string to float:

What should I do in this case?
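For what it's worth, my reading of the traceback above (an interpretation, not a confirmed fix): absl ultimately calls float() on the flag's raw value, and the space in --num_train_epochs= 4.0 makes the shell pass "--num_train_epochs=" and "4.0" as separate arguments, so the flag receives an empty string.

```python
# absl's float flag parser ends up calling float() on the raw value.
# With "--num_train_epochs= 4.0" the flag's value is "", reproducing
# the error in the traceback:
try:
    float("")
except ValueError as e:
    print(e)  # same error message as in the traceback

# Removing the space gives the parser a proper value:
print(float("4.0"))  # 4.0
```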

wonjininfo added a commit that referenced this issue Apr 10, 2020
Fix known bugs about NER task.
1. NER inference mode (only predict) #39 #50 #54
2. Check OUTPUT dir and make it if not exists
3. Fixed "missing labels" problem.
3-1. See Line 631 of run_ner.py for [PAD] related problem
3-2. See biocodes/ner_detokenize.py for max_seq_length related problems
4. Refactored a few lines (ex. os.path.join, replaced **NULL** with [PAD])
5. Functionize detokenizer (See biocodes/ner_detokenize.py for details)
6. misc

If you wish to use the previous version, please use tag v20200409: https://github.com/dmis-lab/biobert/tree/v20200409