Update crfcut.py to modify the sentence-splitting logic.
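Modified how sentence splitting handles empty strings and spaces: a whitespace-only or empty token is now relabeled "I" only when it is the first token or directly follows a token labeled "E". Previously every such token was relabeled "I".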
varunkatiyar819 committed Apr 3, 2024
1 parent 41558bb commit d1b64a7
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion pythainlp/tokenize/crfcut.py
@@ -204,7 +204,7 @@ def segment(text: str) -> List[str]:
         if toks[idx].strip().endswith(("!", ".", "?")):
             labs[idx] = "E"
         # Spaces or empty strings would no longer be treated as end of sentence.
-        elif toks[idx].strip() == "":
+        elif (idx == 0 or labs[idx-1] == "E") and toks[idx].strip() == "":
             labs[idx] = "I"
 
     sentences = []
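
The effect of the changed condition can be sketched in isolation. The snippet below is a minimal illustration, not the library's code: the helper name relabel, the patched flag, and the sample tokens and labels are all hypothetical, and the starting labels stand in for whatever the CRF tagger would return. Only the if/elif logic is taken from the hunk above.

```python
from typing import List

TERMINAL_PUNCTUATION = ("!", ".", "?")


def relabel(toks: List[str], labs: List[str], patched: bool) -> List[str]:
    """Apply the post-tagging relabeling pass from the hunk to a copy of labs.

    patched=False mimics the removed line, patched=True the added line.
    This helper is hypothetical and exists only for this illustration.
    """
    labs = list(labs)  # copy so the caller's list is left untouched
    for idx, tok in enumerate(toks):
        if tok.strip().endswith(TERMINAL_PUNCTUATION):
            labs[idx] = "E"
        elif tok.strip() == "":
            # Before the commit: every blank token becomes "I".
            # After the commit: only a blank token at index 0 or right
            # after an "E" label becomes "I"; others keep the tagger's label.
            if not patched or idx == 0 or labs[idx - 1] == "E":
                labs[idx] = "I"
    return labs


# Hypothetical tokens and tagger output (the tagger marked both spaces "E").
toks = ["Hello", ".", " ", "world", " ", "again", "!"]
labs = ["I", "I", "E", "I", "E", "I", "E"]

print(relabel(toks, labs, patched=False))
# ['I', 'E', 'I', 'I', 'I', 'I', 'E']
print(relabel(toks, labs, patched=True))
# ['I', 'E', 'I', 'I', 'E', 'I', 'E']
```

In this made-up input, the space after "world" keeps the tagger's "E" label under the new condition, because the preceding token is not labeled "E"; under the old condition it was overwritten with "I".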