How to use pretrained models for prediction? Is there a tutorial? #13

Closed
new2scala opened this issue May 20, 2017 · 5 comments
@new2scala

new2scala commented May 20, 2017

Have been trying for 2 days with little progress. Seems that the tool works only with this specific format:

Japan NNP B-NP B-LOC
began VBD B-VP O
the DT B-NP O
defence NN I-NP O
of IN B-PP O
their PRP$ B-NP O
Asian JJ I-NP B-MISC
Cup NNP I-NP I-MISC
...

Ideally, the input for the prediction mode should be plain text without format requirement. I've tried to use spaCy to convert plain text to this format. This is what I have so far (no idea how to generate the 3rd column):

Google NNP B-PROPN O
’s NNP B-PROPN O
second JJ B-ADJ O
generation NN B-NOUN O
TPU NNP B-PROPN O
chips NNS B-NOUN O
...

And it seems that the result actually depends on the supplied label (the 4th column), contrary to what's said in the documentation, but I'm probably missing something here...

@Franck-Dernoncourt
Owner

Franck-Dernoncourt commented May 20, 2017

> the input for the prediction mode should be plain text without format requirement.

Plain text with no format requirements corresponds to the BRAT format with no annotations, which NeuroNER supports.

> the result actually depends on the supplied label

You should set the following in the src/parameters.ini configuration file:

train_model = False
use_pretrained_model = True

Here are some instructions on how to use the trained_models/conll_2003_en pretrained model on new, unannotated text:

  1. Create a file [NeuroNER folder]/data/dataset/deploy/phrase.txt that contains the following text: I have been trying for 2 days with little progress in the United States. Seems that the tool works only with this specific format.
  2. Change the following in [NeuroNER folder]/src/parameters.ini:
# At least one of use_pretrained_model and train_model must be set to True.
train_model = False
use_pretrained_model = True
pretrained_model_folder = ../trained_models/conll_2003_en

[dataset]
dataset_text_folder = ../data/dataset
  3. Run python3.5 main.py (it should take one or two minutes)
  4. In the output folder (e.g., [NeuroNER folder]/output/dataset_2017-05-20_14-54-04-457488), you should find a file 000_deploy.txt containing:
I phrase 0 1 O O
have phrase 2 6 O O
been phrase 7 11 O O
trying phrase 12 18 O O
for phrase 19 22 O O
2 phrase 23 24 O O
days phrase 25 29 O O
with phrase 30 34 O O
little phrase 35 41 O O
progress phrase 42 50 O O
in phrase 51 53 O O
the phrase 54 57 O O
United phrase 58 64 O B-LOC
States phrase 65 71 O I-LOC
. phrase 71 72 O O

Seems phrase 73 78 O O
that phrase 79 83 O O
the phrase 84 87 O O
tool phrase 88 92 O O
works phrase 93 98 O O
only phrase 99 103 O O
with phrase 104 108 O O
this phrase 109 113 O O
specific phrase 114 122 O O
format phrase 123 129 O O
. phrase 129 130 O O
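
To turn an output file like the one above into a list of entities, something along these lines may help. It is only a sketch, not part of NeuroNER: the function name parse_prediction_lines is hypothetical, and it assumes that each non-empty line has six whitespace-separated columns (token, source file name, start offset, end offset, supplied label, predicted label), which is how the listing above reads.

```python
from typing import Dict, List


def parse_prediction_lines(lines: List[str]) -> List[Dict]:
    """Group B-/I- tagged tokens into entity spans.

    Assumed column layout (inferred from the listing, not from NeuroNER docs):
    token, file name, start offset, end offset, supplied label, predicted label.
    """
    entities: List[Dict] = []
    current = None  # entity currently being built, if any

    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            # Blank line (sentence boundary) or unexpected row: close any open entity.
            if current is not None:
                entities.append(current)
                current = None
            continue

        token, _file_name, start, end, _supplied, predicted = parts
        start, end = int(start), int(end)

        if predicted.startswith("I-") and current is not None \
                and current["type"] == predicted[2:]:
            # Continuation of the open entity: extend its span.
            current["end"] = end
            current["text"] += " " + token
        elif predicted.startswith(("B-", "I-")):
            # Start of a new entity (a stray I- tag is treated like B-).
            if current is not None:
                entities.append(current)
            current = {"type": predicted[2:], "start": start,
                       "end": end, "text": token}
        else:
            # "O" label: close any open entity.
            if current is not None:
                entities.append(current)
                current = None

    if current is not None:
        entities.append(current)
    return entities
```

Reading the file with open(path, encoding="utf-8").read().splitlines() and passing the result to this function should give one dictionary per predicted entity. Note that the text field joins tokens with a single space, so it may differ slightly from the original text around punctuation.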

Please let me know if that answers your question!

Point taken, we should make it clearer in the documentation.

@new2scala
Author

Yes. Now it works!

IBM phrase 0 3 O B-ORG
and phrase 4 7 O O
Google phrase 8 14 O B-ORG
’s phrase 14 16 O O
second phrase 17 23 O O
generation phrase 24 34 O O
TPU phrase 35 38 O B-ORG
chips phrase 39 44 O O
takes phrase 45 50 O O
machine phrase 51 58 O O
learning phrase 59 67 O O
processing phrase 68 78 O O
to phrase 79 81 O O
a phrase 82 83 O O
new phrase 84 87 O O
level phrase 88 93 O O
. phrase 93 94 O O

Here's what I missed:

  1. Instead of creating a text file in the deploy folder, I had created a file named deploy.txt;
  2. I didn't know where the prediction results were stored (going through the [NeuroNER folder]/output/ folder, I actually found the results from my previous runs last night).
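
For anyone hitting the same two pitfalls, here is a minimal sketch of both steps in Python. The helper names write_deploy_text and latest_output_folder are hypothetical and not part of NeuroNER. The sketch assumes the layout described in this thread: text files live inside a deploy subfolder of the dataset folder, and each run creates a timestamped folder under output whose names sort chronologically.

```python
from pathlib import Path
from typing import Optional


def write_deploy_text(dataset_folder: Path, name: str, text: str) -> Path:
    """Write `text` to <dataset_folder>/deploy/<name>.txt and return the path.

    The file goes *inside* a folder called deploy; it is not a file
    called deploy.txt.
    """
    deploy_dir = Path(dataset_folder) / "deploy"
    deploy_dir.mkdir(parents=True, exist_ok=True)
    target = deploy_dir / f"{name}.txt"
    target.write_text(text, encoding="utf-8")
    return target


def latest_output_folder(output_root: Path) -> Optional[Path]:
    """Return the most recent run folder under `output_root`, or None.

    Assumption: folder names end in a timestamp such as
    dataset_2017-05-20_14-54-04-457488, so for a shared prefix the
    lexicographically largest name is the newest run.
    """
    output_root = Path(output_root)
    if not output_root.is_dir():
        return None
    folders = [p for p in output_root.iterdir() if p.is_dir()]
    if not folders:
        return None
    return max(folders, key=lambda p: p.name)
```

Picking the newest folder this way avoids reading results left over from an earlier run. If runs use different dataset folder names, sorting by modification time would probably be a safer choice than sorting by name.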

Many thanks, Franck - you just saved my day!

@Franck-Dernoncourt
Owner

Thanks for the feedback!

@spate141

spate141 commented Oct 9, 2017

@Franck-Dernoncourt How can I load a pretrained model into memory and classify new sentences on the fly? I have followed this approach, and I can classify sentences as long as they are in the proper folder/file location.

I was wondering whether I can keep the model loaded in memory and somehow call predict(self, text) in the neuroner.py file to classify a list of sentences.

UPDATE: I think I missed this. With the code below I can get results for the input sentence:

nn = NeuroNER(**arguments)
# nn.fit()  # left commented out: prediction only, no training
nn.predict(text="I love Chicago, IL")
text:
I love Chicago, IL
entity: {'start': 7, 'text': 'Chicago', 'type': 'ORG', 'id': 'T1', 'end': 14}

I think I can change the predict method to accept a list of texts and return all results without saving them to a temp directory. Let me know whether I am on the right track. Thanks!
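
Short of modifying predict itself, a thin wrapper that reuses one loaded model might look like the sketch below. The name predict_many is hypothetical. It assumes that predict(text=...) returns the entities it finds; the output above only shows them being printed, so that return value is an assumption. The wrapper also does nothing about any temporary files that predict may write internally.

```python
from typing import Any, Dict, Iterable, List


def predict_many(model: Any, texts: Iterable[str]) -> List[Dict[str, Any]]:
    """Run model.predict on each text, reusing the already loaded model.

    `model` is any object with a predict(text=...) method. Assumption:
    predict returns the entities for that text.
    """
    results = []
    for text in texts:
        entities = model.predict(text=text)
        results.append({"text": text, "entities": entities})
    return results
```

With the nn object from the snippet above, predict_many(nn, ["I love Chicago, IL", "IBM is in New York"]) would then load the model once and call predict once per sentence.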

@trinh-hoang-hiep

Did you change the predict method to accept a list of texts and return all results without saving them to a temp directory? Let me know how you did it. Thanks! @new2scala
