Hi, thank you for this awesome library; I wish I'd discovered it months ago. The way you've set it up makes converting to ARPAbet seamless. The one thing keeping me from integrating it into my work is speed. It takes around 1 ms to translate an in-vocabulary phrase like "Hello this is a test of the public broadcasting system". The code I'm currently using, which doesn't do POS tagging and requires an extra preprocessing step for OOV items, takes 32 µs for the same phrase.
Has anyone tried to optimize the library? I think the bottleneck is that nltk.pos_tag is called for all of the words, but if I'm not mistaken its output is only used when one of the words is in the homograph dictionary (homograph2features). Could the code be changed to do POS tagging only if one of the words in the label has multiple matches?
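To make the suggestion concrete, here is a minimal sketch of the idea, not the library's actual code. The function name `tag_only_if_needed` and the `tagger` parameter are hypothetical, and `homographs` stands in for the keys of homograph2features. It assumes the tags are only consumed for homograph words, which is my reading of the code and may be wrong.

```python
from typing import Callable, Collection, List, Optional, Tuple

# A tagger takes a list of tokens and returns (token, tag) pairs,
# which is the shape nltk.pos_tag returns.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def tag_only_if_needed(
    words: List[str],
    homographs: Collection[str],
    tagger: Tagger,
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: run the POS tagger only when it can matter.

    If no word is a known homograph, skip tagging entirely and return
    None as the tag for every word.
    """
    if any(word.lower() in homographs for word in words):
        # At least one word has several pronunciations, so tag the
        # whole phrase (the tagger needs the full context).
        return tagger(words)
    # No homographs: the tags would never be consulted.
    return [(word, None) for word in words]
```

With NLTK installed, `nltk.pos_tag` could be passed as `tagger`. Whether this recovers most of the gap between 1 ms and 32 µs would need to be measured.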