Add graph size limit in _onecut() to avoid long wait for ambiguous text #333
Add a `graph_size` variable to stop the graph from growing beyond a set limit. If the limit is exceeded, cut off immediately, emit the words found so far as output, and then continue. This avoids accumulating a bloated graph that turns into a long loop.

- fixed "pythainlp.word_tokenize: problem tokenizing long continuous sentences without spaces [newmm]" #241 (same issue as in "Some sentences get stuck in a loop at _bfs_paths_graph" #326) in a more elegant way
- introduce `graph_size` to keep track of the size of `graph` in `_onecut()`; if it is greater than `_MAX_GRAPH_SIZE` (now set to 50), then cut off
- `graph` will be used by `_bfs_paths_graph()`; if `graph` grows too big, a long loop can occur
- the "newmm-safe" approach relies on a heuristic based on string length (and is more a workaround than an actual fix to the real issue), while this approach (#333) looks at the size of the word graph, which addresses the cause of the long loop more directly
- rename variables in `_onecut()` to make them more explicit
- also refactor `segment()` to reduce indent depth

If possible, I would like to suggest that we push this to PyThaiNLP 2.1.1.
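
For readers who want to see the cutoff idea in isolation, here is a minimal sketch. It is not the code in this PR: `segment_sketch`, `shortest_path`, and the plain word list are hypothetical stand-ins, and only the limit of 50 is taken from the description above. The sketch builds a graph of candidate word boundaries, counts its edges, and as soon as the count exceeds the limit it emits the words found so far and starts a fresh graph from there.

```python
from collections import defaultdict, deque
from heapq import heappop, heappush

MAX_GRAPH_SIZE = 50  # value mentioned in the PR; the rest is illustrative


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError("goal is not reachable from start")


def segment_sketch(text, words, max_graph_size=MAX_GRAPH_SIZE):
    """Dictionary-based segmentation with a cap on the word graph size."""
    words = [w for w in words if w]  # ignore empty strings
    tokens = []
    start = 0  # beginning of the part of text not yet emitted
    while start < len(text):
        graph = defaultdict(list)  # begin position -> possible end positions
        graph_size = 0             # number of edges added to this graph
        heap, queued = [start], {start}
        farthest = start
        while heap and graph_size <= max_graph_size:
            pos = heappop(heap)
            for word in words:
                if not text.startswith(word, pos):
                    continue
                end = pos + len(word)
                graph[pos].append(end)
                graph_size += 1
                farthest = max(farthest, end)
                if end not in queued:
                    queued.add(end)
                    heappush(heap, end)
                if graph_size > max_graph_size:
                    break  # graph is too big: force a cutoff
        if farthest == start:
            # no dictionary word starts here: emit a single character
            tokens.append(text[start])
            start += 1
            continue
        # emit the words found so far, then continue with a fresh graph
        path = shortest_path(graph, start, farthest)
        for begin, end in zip(path, path[1:]):
            tokens.append(text[begin:end])
        start = farthest
    return tokens
```

With a highly ambiguous input such as `"a" * 30` and the words `a`, `aa`, `aaa`, a small `max_graph_size` keeps each graph handed to the path search small, at the cost of choosing word boundaries chunk by chunk rather than over the whole string.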