Unnecessary POS Tagging causing slow speed? #17

rbracco · 2020-08-13T14:45:49Z

Hi, thank you for this awesome library; I wish I'd discovered it months ago. The way you've set it up makes it seamless to convert to arpa. The one thing keeping me from integrating it into my work is the speed. It takes around 1 ms to translate an in-vocabulary phrase like "Hello this is a test of the public broadcasting system". The code I'm currently using, which doesn't do POS tagging and requires an extra preprocessing step for OOV items, takes 32 µs for the same phrase.

Has anyone tried to optimize the library? I think the bottleneck is that nltk.pos_tag is called for all of the words, but if I'm not mistaken, the tags are only used when one of the words is in the homograph dictionary (homograph2features). Could the code be changed to do POS tagging only if one of the words in the label has multiple matches?
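To make the suggestion concrete, here is a rough sketch of the check I have in mind. It is not the library's actual code: `tag_if_needed` and the small `HOMOGRAPHS` set are hypothetical names standing in for wherever the library calls the tagger and for the keys of homograph2features, and the tagger is passed in as a parameter so the sketch runs without NLTK installed.

```python
from typing import Callable, Collection, List, Optional, Tuple

# Hypothetical stand-in for the keys of the library's homograph2features dict.
HOMOGRAPHS = {"read", "lead", "live", "wind"}

# A tagger takes a list of tokens and returns (token, tag) pairs,
# which is the shape nltk.pos_tag returns.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def tag_if_needed(
    words: List[str],
    tagger: Tagger,
    homographs: Collection[str] = HOMOGRAPHS,
) -> List[Tuple[str, Optional[str]]]:
    """Run the POS tagger only when at least one word is a known homograph."""
    if any(w.lower() in homographs for w in words):
        # The whole phrase is tagged, since the tagger uses context.
        return tagger(words)
    # No homographs present: skip the tagger and attach a placeholder tag.
    return [(w, None) for w in words]
```

With NLTK available, this could be called as `tag_if_needed(words, nltk.pos_tag)`. The set lookup per word should be far cheaper than tagging, so phrases without homographs would skip the slow path entirely, though I haven't measured this against the library itself.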

rbracco · 2020-08-13T15:09:36Z

Relevant: https://stackoverflow.com/questions/33829160/why-is-pos-tag-so-painfully-slow-and-can-this-be-avoided
