Unnecessary POS Tagging causing slow speed? #17

Open
rbracco opened this issue Aug 13, 2020 · 1 comment

rbracco commented Aug 13, 2020

Hi, thank you for this awesome library; I wish I'd discovered it months ago. The way you've set it up makes converting to ARPAbet seamless. The one thing keeping me from integrating it into my work is speed. It takes around 1 ms to translate an in-vocabulary phrase like "Hello this is a test of the public broadcasting system". The code I'm currently using, which doesn't do POS tagging and requires an extra preprocessing step for OOV items, takes 32 µs for the same phrase.

Has anyone tried to optimize the library? I think the bottleneck is that nltk.pos_tag is called for all of the words, but if I'm not mistaken the tags are only used when one of the words is in the homograph dictionary (homograph2features). Could the code be changed to do POS tagging only if at least one of the words in the label has multiple matches?
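
To make the suggestion concrete, here is a rough sketch of what I mean. It is not the library's actual code: `CMU`, `HOMOGRAPHS`, and `to_arpabet` are hypothetical stand-ins with toy data, and the tagger is passed in as an argument (in practice that could be `nltk.pos_tag`). The idea is just to skip tagging entirely unless the phrase contains a homograph.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical toy tables standing in for the library's dictionaries.
CMU: Dict[str, List[str]] = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "this": ["DH", "IH1", "S"],
    "is": ["IH1", "Z"],
    "a": ["AH0"],
    "test": ["T", "EH1", "S", "T"],
}
# word -> (pronunciation if the tag starts with the prefix, fallback, POS prefix)
HOMOGRAPHS: Dict[str, Tuple[List[str], List[str], str]] = {
    "read": (["R", "EH1", "D"], ["R", "IY1", "D"], "VBD"),
}

# A tagger takes tokens and returns (token, tag) pairs, like nltk.pos_tag.
Tagger = Callable[[Sequence[str]], List[Tuple[str, str]]]


def to_arpabet(words: Sequence[str], tagger: Tagger) -> List[List[str]]:
    """Return one phoneme list per (lowercase) word, tagging only when needed."""
    tags: Optional[List[str]] = None
    # Only pay for POS tagging if some word is actually ambiguous.
    if any(w in HOMOGRAPHS for w in words):
        tags = [tag for _, tag in tagger(words)]

    result: List[List[str]] = []
    for i, w in enumerate(words):
        if w in HOMOGRAPHS and tags is not None:
            pron1, pron2, pos1 = HOMOGRAPHS[w]
            result.append(pron1 if tags[i].startswith(pos1) else pron2)
        elif w in CMU:
            result.append(CMU[w])
        else:
            # OOV handling is out of scope for this sketch.
            raise KeyError(f"out-of-vocabulary word: {w}")
    return result
```

With something like this, a phrase with no homographs never reaches the tagger, so the common case should cost roughly a few dictionary lookups per word.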



rbracco commented Aug 13, 2020

Relevant: https://stackoverflow.com/questions/33829160/why-is-pos-tag-so-painfully-slow-and-can-this-be-avoided
