Merged
1 change: 0 additions & 1 deletion pythainlp/corpus/words_th.txt
@@ -61186,7 +61186,6 @@
แอกน้อย
แอด ๆ
แอบ ๆ
๒,๕๔๐ รายการ
โอ้กอ้าก
โอฆ
โอฆชล
6 changes: 4 additions & 2 deletions pythainlp/tokenize/__init__.py
@@ -71,7 +71,7 @@ def dict_word_tokenize(
:meth:`dict_word_tokenize` tokenizes words based on a dictionary you provide. The dictionary must be in the trie data structure.
:param str text: text to be tokenized
:param dict custom_dict: a dictionary trie
:param str engine: choose between different options of engine to token (newmm, longest)
:param str engine: choose between different options of engine to token (newmm, mm, longest and deepcut)
:return: list of words
**Example**::
>>> from pythainlp.tokenize import dict_word_tokenize, dict_trie
@@ -90,9 +90,11 @@ def dict_word_tokenize(
        from .longest import segment
    elif engine == "mm" or engine == "multi_cut":
        from .multi_cut import segment
    elif engine == "deepcut":
        from .deepcut import segment
        return segment(text, list(custom_dict))
    else:  # default, use "newmm" engine
        from .newmm import segment

    return segment(text, custom_dict)


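The engine-selection logic in `dict_word_tokenize` above can be sketched as a standalone dispatch. The engine names mirror the diff, but the stub segmenters below are hypothetical stand-ins for pythainlp's real modules:

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the real segmenter modules
# (pythainlp.tokenize.newmm, pythainlp.tokenize.longest, ...).
def _newmm_segment(text: str) -> List[str]:
    return text.split()

def _longest_segment(text: str) -> List[str]:
    return text.split()

_ENGINES: Dict[str, Callable[[str], List[str]]] = {
    "newmm": _newmm_segment,
    "longest": _longest_segment,
}

def tokenize(text: str, engine: str = "newmm") -> List[str]:
    # Unknown engine names fall back to the default segmenter,
    # mirroring the final else branch in the diff.
    segment = _ENGINES.get(engine, _newmm_segment)
    return segment(text)
```

A table-driven dispatch like this avoids a growing if/elif chain, but it imports every engine up front; the lazy per-branch imports in the diff keep optional dependencies such as deepcut from being required at import time.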
4 changes: 3 additions & 1 deletion pythainlp/tokenize/deepcut.py
@@ -8,5 +8,7 @@
import deepcut


def segment(text: str) -> List[str]:
def segment(text: str, dict_source: List[str] = None) -> List[str]:
    if dict_source is not None:
        return deepcut.tokenize(text, custom_dict=dict_source)
    return deepcut.tokenize(text)
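The change above forwards a custom dictionary to `deepcut.tokenize` only when one is supplied. The same optional-argument pattern can be sketched without the deepcut dependency, using a hypothetical stub tokenizer in its place:

```python
from typing import List, Optional

def _stub_tokenize(text: str, custom_dict: Optional[List[str]] = None) -> List[str]:
    # Hypothetical stand-in for deepcut.tokenize: split on spaces,
    # and keep only dictionary words when a custom_dict is given.
    words = text.split()
    if custom_dict is not None:
        return [w for w in words if w in custom_dict]
    return words

def segment(text: str, dict_source: Optional[List[str]] = None) -> List[str]:
    # Pass the custom dictionary through only when the caller
    # provided one, as in the patched deepcut.py.
    if dict_source is not None:
        return _stub_tokenize(text, custom_dict=dict_source)
    return _stub_tokenize(text)
```

Keeping the no-dictionary call as a separate branch means the wrapped tokenizer's own default behavior is used when `dict_source` is omitted, rather than passing `custom_dict=None` explicitly.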