NER predict only mode #39

Closed
UrszulaCzerwinska opened this issue Aug 6, 2019 · 2 comments

Comments

@UrszulaCzerwinska

I would like to use the NER model I fine-tuned with

python run_ner.py \
    --do_train=true \
    --do_eval=true \
...

to make predictions on sentences without labels (without evaluation, of course).

But

python run_ner.py \
    --do_train=false \
    --do_eval=false \
    --do_predict=true \
...

gives me this error:

raise ValueError("At least one of `do_train` or `do_eval` must be True.")
ValueError: At least one of `do_train` or `do_eval` must be True.

How can I make predictions without running evaluation?

@jhyuklee
Member

jhyuklee commented Aug 8, 2019

Hi @UrszulaCzerwinska,
you can just comment out the lines below:

biobert/run_ner.py

Lines 477 to 478 in 7a3c96e

if not FLAGS.do_train and not FLAGS.do_eval:
  raise ValueError("At least one of `do_train` or `do_eval` must be True.")

Be aware that this could cause other issues if do_eval and do_predict have unintended dependencies.
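
A less invasive alternative to commenting the check out is to widen it so that do_predict alone also counts as a valid mode. This is only a sketch under that assumption, not the fix that was applied in the repository. The function name validate_run_modes is hypothetical; in run_ner.py the three values would come from FLAGS.

```python
def validate_run_modes(do_train: bool, do_eval: bool, do_predict: bool) -> None:
    """Raise ValueError only when no run mode at all is selected.

    Hypothetical helper: a relaxed version of the check in run_ner.py
    that also accepts predict-only runs.
    """
    if not (do_train or do_eval or do_predict):
        raise ValueError(
            "At least one of `do_train`, `do_eval` or `do_predict` must be True."
        )


# Predict-only mode passes the relaxed check without raising.
validate_run_modes(do_train=False, do_eval=False, do_predict=True)
```

The warning about hidden dependencies between do_eval and do_predict still applies to this variant.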
Thanks.

@wonjininfo
Member

wonjininfo commented Oct 25, 2019

Hi,
I am writing this comment because issue #50 referenced this one.

One simple way to bypass the "At least one of do_train or do_eval must be True." error is to set
(1) --do_train=true
(2) --num_train_epochs=0.1 (or any value smaller than the number of epochs the pre-trained checkpoint was trained for)
(3) --output_dir=(the directory containing your pre-trained model)
(4) (optional) --init_checkpoint=(your pre-trained model)
With these settings the training steps are skipped.

For example,

python run_ner.py \
    --vocab_file=$BIOBERT_DIR/vocab.txt \
    --bert_config_file=$BIOBERT_DIR/bert_config.json \
    --init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt \
    --data_dir=$NER_DIR/ \
    --do_train=true \
    --do_predict=true \
    --num_train_epochs=0.1 \
    --output_dir=/tmp/bioner/(pre-trained dir) 

How it works

Our code (and also BERT) checks whether the model in the output folder has already been trained.
Therefore, setting --num_train_epochs to fewer epochs than the model in the output folder has already been trained for makes the code skip training and perform only prediction.
(You will see the message INFO:tensorflow:Skipping training since max_steps has already saved. in the middle of the console output.)
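
The arithmetic behind this can be sketched as follows. The step formula mirrors how the BERT run scripts derive the number of training steps from the example count, batch size and epoch count. The skip condition mirrors the TensorFlow Estimator comparing max_steps with the global step stored in the latest checkpoint. The function names and the example numbers (5000 examples, batch size 32, 10 original epochs) are made up for illustration.

```python
def planned_train_steps(num_examples: int, batch_size: int, num_epochs: float) -> int:
    """Number of training steps requested for a run (truncated to an int)."""
    return int(num_examples / batch_size * num_epochs)


def training_is_skipped(saved_global_step: int, max_steps: int) -> bool:
    """True when the checkpoint in output_dir has already reached max_steps."""
    return max_steps <= saved_global_step


# Hypothetical numbers: the model in output_dir was trained for 10 epochs.
saved_step = planned_train_steps(5000, 32, 10)    # 1562
requested = planned_train_steps(5000, 32, 0.1)    # 15

# The new run asks for fewer steps than are already saved, so training is skipped.
print(training_is_skipped(saved_step, requested))  # True
```

Any --num_train_epochs value that yields no more steps than the saved global step has the same effect, which is why 0.1 is a safe choice.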

I will check with my colleagues about those lines.
Please use this bypass method until then.
Thanks!
Wonjin

wonjininfo added a commit that referenced this issue Apr 10, 2020
Fix known bugs about NER task.
1. NER inference mode (only predict) #39 #50 #54
2. Check OUTPUT dir and make it if not exists
3. Fixed "missing labels" problem.
3-1. See Line 631 of run_ner.py for [PAD] related problem
3-2. See biocodes/ner_detokenize.py for max_seq_length related problems
4. Refactored a few lines (ex. os.path.join, replaced **NULL** with [PAD])
5. Functionize detokenizer (See biocodes/ner_detokenize.py for details)
6. misc

If you wish to use the previous version, please use tag v20200409: https://github.com/dmis-lab/biobert/tree/v20200409