Add the ability to pass a custom tokenizer to disambiguate via an argument #35

Closed
wants to merge 1 commit into from

davedgd commented Sep 3, 2017

It would be nice to be able to pass a different tokenizer to the disambiguate function, for better compatibility with other tools (e.g., splitting pre-tokenized text on whitespace, or using an alternative to NLTK's word_tokenize, such as the Stanford tokenizer). This matters because tokenizers apply different internal rules and can produce a different set or number of tokens, so the tokens returned by disambiguate may not line up with the tokens produced by other tools.
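
To illustrate the mismatch described above, here is a minimal sketch. It does not use the library's real API: `disambiguate_sketch`, its `tokenizer` parameter, and `regex_tokenize` are hypothetical names made up for this example, and the sense lookup is left out entirely. It only shows how a tokenizer passed as an argument could keep the output aligned with pre-tokenized input.

```python
import re
from typing import Callable, List, Optional, Tuple


def regex_tokenize(text: str) -> List[str]:
    """Stand-in for a rule-based tokenizer: separates punctuation from words."""
    return re.findall(r"\w+|[^\w\s]", text)


def disambiguate_sketch(
    sentence: str,
    tokenizer: Callable[[str], List[str]] = regex_tokenize,
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical stand-in for disambiguate.

    Returns one (token, sense) pair per token. The sense is always None
    here because only the tokenization step is being illustrated.
    """
    return [(token, None) for token in tokenizer(sentence)]


# Pre-tokenized text: tokens are already separated by single spaces.
sentence = "I can't eat fish , said Tom ."

default_pairs = disambiguate_sketch(sentence)
whitespace_pairs = disambiguate_sketch(sentence, tokenizer=str.split)

# The rule-based tokenizer splits "can't" further, so the counts differ;
# passing str.split keeps one output pair per pre-tokenized token.
print(len(default_pairs), len(whitespace_pairs))
```

With the whitespace tokenizer passed in, the number of returned pairs matches `sentence.split()`, which is the consistency with other tools that this request is after.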

@alvations
Owner

It's a year late but still thank you @davedgd!

Merged via 42bdc97 since you no longer have the repo on your GitHub account =)

@alvations alvations closed this Apr 24, 2019
@davedgd
Author

davedgd commented Apr 25, 2019

Thank you! No worries about being late; I appreciate your having implemented this. I came up with an easier solution to fix the problem in my code, but this is even better! =)
