Add the ability to pass a custom tokenizer to disambiguate via an argument #35

davedgd · 2017-09-03T00:33:29Z

It would be nice to be able to pass a different tokenizer to the disambiguate function for better compatibility with other tools (e.g., simply splitting pre-tokenized text on whitespace, or using an alternative to NLTK's word_tokenize such as the Stanford tokenizer). This matters because tokenizers apply different internal rules and can produce a different set or number of tokens, so the tokens returned by disambiguate may not line up with those from other tools.
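
To illustrate the mismatch, here is a minimal sketch of what a tokenizer argument could look like. It does not use the library's real code: `disambiguate_sketch` and `punct_tokenizer` are hypothetical stand-ins (the latter only imitates a rule-based tokenizer that splits off punctuation), and the sense slot is left as `None` because no actual disambiguation is performed.

```python
import re
from typing import Callable, List, Optional, Tuple

Tokenizer = Callable[[str], List[str]]


def punct_tokenizer(text: str) -> List[str]:
    """Hypothetical rule-based tokenizer: separates punctuation from words."""
    return re.findall(r"\w+|[^\w\s]", text)


def disambiguate_sketch(
    sentence: str, tokenizer: Tokenizer = punct_tokenizer
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical stand-in for disambiguate.

    Returns one (token, sense) pair per token produced by `tokenizer`.
    The sense is always None here; only the tokenization is illustrated.
    """
    return [(token, None) for token in tokenizer(sentence)]


sentence = "I went to the bank, then home."

# Default tokenizer splits "bank," and "home." into word + punctuation.
default_tagged = disambiguate_sketch(sentence)

# Passing str.split keeps the output aligned with whitespace-pre-tokenized text.
whitespace_tagged = disambiguate_sketch(sentence, tokenizer=str.split)
pretokenized = sentence.split()

print(len(default_tagged), len(whitespace_tagged), len(pretokenized))
```

With the default tokenizer the output has more items than the whitespace-split input, so the two cannot be zipped together token by token; passing the same tokenizer used upstream keeps them aligned.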

alvations · 2019-04-24T09:46:18Z

It's a year late but still thank you @davedgd!

Merged via 42bdc97 since you no longer have the repo on your GitHub account =)

davedgd · 2019-04-25T23:53:17Z

Thank you! No worries about being late; I appreciate your having implemented this. I came up with an easier solution to fix the problem in my code, but this is even better! =)

Update allwords_wsd.py

ae8147b

alvations closed this Apr 24, 2019
