
Batch processing #6

Closed
Subh1m opened this issue Aug 19, 2017 · 5 comments

Subh1m commented Aug 19, 2017

It is a great wrapper.
Can you make it run as a batch process? It is too slow to run this each time for a new sentence; I need to dependency-parse several sentences within seconds.
Please look into the issue.

@scottwthompson

There is a way, I think: set the 'ssplit.eolonly' property and join the text with \n. I didn't get it to work yet, but you can try investigating. Another way to get a speed improvement is to send asynchronous POST requests, which runs about 6-7x faster for me but is still slightly slow; it might also be throttled on the server side, so it could be even quicker if you run multiple servers.
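A minimal sketch of the batching idea, assuming a CoreNLP server is already listening on localhost:9000 (the helper names here are hypothetical, not part of the wrapper's API):

```python
import json
import urllib.parse
import urllib.request

def build_batch_request(sentences, server_url='http://localhost:9000'):
    # One request for many sentences: join them with '\n' and tell
    # CoreNLP to split sentences only at newlines (ssplit.eolonly).
    properties = {
        'annotators': 'depparse',
        'ssplit.eolonly': 'true',
        'outputFormat': 'json',
    }
    url = server_url + '/?properties=' + urllib.parse.quote(json.dumps(properties))
    data = '\n'.join(sentences).encode('utf-8')
    return url, data

def batch_dependency_parse(sentences, server_url='http://localhost:9000'):
    # Sends the whole batch in a single HTTP round trip.
    url, data = build_batch_request(sentences, server_url)
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.loads(resp.read().decode('utf-8'))['sentences']
```

Because the whole corpus travels in one request, the per-request overhead is paid once instead of once per sentence.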

Owner

Lynten commented Dec 15, 2017

@Subh1m Thanks for your advice. I have tested the following code:

# coding=utf-8
import time

from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:/JavaLibraries/stanford-corenlp-full-2016-10-31/')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
begin = time.time()
nlp.dependency_parse(sentence)
print(time.time() - begin)

corpus = [sentence] * 1000
begin = time.time()
for sent in corpus:
    nlp.dependency_parse(sent)
print(time.time() - begin)
Out:
26.315443992614746
23.550291299819946

That is, it takes about 26 seconds to load the model on the first call, and about 24 seconds to parse 1000 sentences after that.

The project is just a wrapper that parses the JSON data returned by the Java server backend, and the "async request" method suggested by @scottwthompson may be a good way to speed it up.
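As a rough illustration of that concurrent-request idea, calls can be fanned out over a thread pool so several requests are in flight at once; parse_fn below is a stand-in for whatever issues the request (e.g. nlp.dependency_parse), and the helper name is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_concurrently(sentences, parse_fn, max_workers=8):
    # Each worker thread issues its own request, so up to max_workers
    # requests are in flight at once instead of one at a time.
    # Results come back in the same order as the input sentences.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_fn, sentences))
```

Whether this helps depends on how many requests the server processes in parallel; if it handles one at a time, the threads just queue up.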

Author

Subh1m commented Jan 12, 2018

Thanks for the code @Lynten. Just had a question: in the line nlp.dependency_parse(sentence), we can pass anything in place of sentence, right? Or do we need to pass that exact sentence in order to load the model?

Owner

Lynten commented Jan 15, 2018

@Subh1m The Java server initializes the model the first time you call nlp.dependency_parse(sentence), and after that it runs fast. And yes, you can pass any text in place of "sentence".
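Given that lazy initialization, the first call should be treated as a warm-up when benchmarking. A sketch of the pattern (parse_fn stands in for nlp.dependency_parse; the helper is hypothetical):

```python
import time

def timed_parse(parse_fn, sentences):
    # One throwaway call pays the one-off model-loading cost up front,
    # so the timing below measures parsing only.
    parse_fn('Warm-up sentence.')
    begin = time.time()
    results = [parse_fn(s) for s in sentences]
    return results, time.time() - begin
```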

Author

Subh1m commented Jan 15, 2018

Thanks @Lynten, this helped a lot.

@Subh1m Subh1m closed this as completed Jan 15, 2018
@Lynten Lynten mentioned this issue Mar 7, 2018