This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Have Tokenizers return a Token object #286

Closed
matt-gardner opened this issue Sep 11, 2017 · 0 comments · Fixed by #311
@matt-gardner
Contributor

This will let us get rid of the nasty offset return value, because the offset will just be a field on the Token, and it will let us include POS tags for POS tag embeddings.

It's probably easiest to just return spaCy's token representation directly, rather than trying to roll our own, and have other word splitters mimic spaCy's API. Or we could just have them crash; I'm not sure we really need the other word splitters at this point. We could simplify things a lot by putting spaCy directly into WordTokenizer. Anybody have any thoughts on that?
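
For illustration, here is a rough sketch of what a non-spaCy word splitter might return if it mimicked spaCy's token attributes. The `Token` class and `split_words` function are hypothetical names made up for this example, not the actual implementation; only the attribute names `text`, `idx`, and `pos_` are taken from spaCy's `Token`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """Hypothetical token type; attribute names mirror spaCy's Token."""
    text: str
    idx: int                    # character offset of the token in the input
    pos_: Optional[str] = None  # POS tag, left unset when no tagger is run


def split_words(sentence: str) -> List[Token]:
    """Hypothetical whitespace splitter that returns Token objects."""
    # Each regex match carries its own start offset, so no separate
    # offset list needs to be returned alongside the tokens.
    return [Token(m.group(), m.start()) for m in re.finditer(r"\S+", sentence)]


tokens = split_words("The quick  fox")
offsets = [(t.text, t.idx) for t in tokens]
```

Callers that only need strings can read `token.text`, while anything that needs offsets or POS tags reads the extra fields from the same object.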
