
Conversation

@bact (Member) commented Apr 15, 2019

Handling of custom dictionaries for pythainlp.tokenize.dict_word_tokenize().
Each engine's segment() may take a custom dictionary in a different type (Trie, List[str], etc.).
dict_word_tokenize() will try to convert the custom dictionary to the proper type when possible, while keeping the lower-level segment() functions as simple as possible.
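
The conversion idea is easier to picture with a small sketch. This is not the actual PyThaiNLP code: every name below is hypothetical, and the segmenter is a toy longest-match stand-in included only to make the normalization step concrete.

```python
# Minimal sketch of normalizing a custom dictionary before calling an engine.
# Hypothetical names; not the PyThaiNLP implementation.
from typing import Iterable, List, Union


def _load_words(path: str) -> List[str]:
    """Read a dictionary file, assumed to have one word per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def _normalize_custom_dict(custom_dict: Union[Iterable[str], str]) -> List[str]:
    """Accept a Trie, an Iterable[str], or a str (path) and return a word list.

    A trie is typically iterable over its keys, so it falls into the generic
    iterable branch; only str needs special handling (treated as a file path).
    """
    if isinstance(custom_dict, str):
        return _load_words(custom_dict)
    return list(custom_dict)


def _segment_longest_match(text: str, words: List[str]) -> List[str]:
    """Toy longest-matching stand-in for an engine-level segment()."""
    vocab = set(words)
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:  # no dictionary word matched at position i
            tokens.append(text[i])
            i += 1
    return tokens


def dict_word_tokenize_sketch(text: str, custom_dict, engine: str = "newmm") -> List[str]:
    """Normalize custom_dict once, then hand the engine whatever shape it expects."""
    words = _normalize_custom_dict(custom_dict)
    # Each real engine may want a different container (word list, Trie, ...);
    # this sketch reuses the same toy segmenter for every engine.
    return _segment_longest_match(text, words)


if __name__ == "__main__":
    # "ตากลม" can segment as ตา|กลม or ตาก|ลม; longest-match picks ตาก|ลม here.
    print(dict_word_tokenize_sketch("ตากลม", ["ตา", "กลม", "ตาก", "ลม"]))
```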

bact added 2 commits April 15, 2019 17:25
deepcut + dict_word_tokenize
…gine. Handles Trie, Iterable[str], and str (path to dictionary).
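
A hedged usage sketch for the three accepted custom-dictionary forms mentioned in the commit message. The import locations, parameter names, and argument order of dict_word_tokenize/dict_trie in the PyThaiNLP release current at the time may differ from what is shown; my_dict.txt is a hypothetical file with one word per line.

```python
# Assumption: names and parameter order follow my reading of the 2019-era
# PyThaiNLP API and may not match it exactly.
from pythainlp.tokenize import dict_trie, dict_word_tokenize

words = ["ปักกิ่ง", "เมืองหลวง"]
text = "ปักกิ่งเป็นเมืองหลวง"

# 1) Iterable[str]: a plain list of words
print(dict_word_tokenize(text, custom_dict=words))

# 2) Trie: build it once up front and reuse it across calls
trie = dict_trie(words)
print(dict_word_tokenize(text, custom_dict=trie, engine="newmm"))

# 3) str: path to a dictionary file (hypothetical file, one word per line)
print(dict_word_tokenize(text, custom_dict="my_dict.txt", engine="deepcut"))
```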
@bact bact requested a review from wannaphong April 15, 2019 17:05
@coveralls commented Apr 15, 2019


Coverage increased (+0.3%) to 81.163% when pulling a5525c3 on bact:dev into 529f4c0 on PyThaiNLP:dev.

@bact bact merged commit 968537b into PyThaiNLP:dev Apr 17, 2019