Merge pull request #912 from S2P2/fix-join-broken-num
Fix empty string ('') added (in some cases) when using word_tokenize with join_broken_num=True
bact committed May 11, 2024
2 parents a38fd5e + dcd2b47 commit fd4175e
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions pythainlp/tokenize/_utils.py
@@ -61,8 +61,8 @@ def rejoin_formatted_num(segments: List[str]) -> List[str]:
                 connected_token += segments[segment_idx]
                 pos += len(segments[segment_idx])
                 segment_idx += 1
-
-            tokens_joined.append(connected_token)
+            if connected_token:
+                tokens_joined.append(connected_token)
             match = next(matching_results, None)
         else:
             tokens_joined.append(token)
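
The effect of the guard can be reproduced with a minimal, self-contained sketch. Only the lines inside the hunk come from this commit; the rest of the loop, the `_FORMATTED_NUM` pattern, and the `skip_empty` switch are assumptions added for illustration and may differ from the actual code in pythainlp/tokenize/_utils.py. The input is contrived: one segment overshoots the first number and swallows the second, so the second regex match has nothing left to join.

```python
import re
from typing import List

# Assumed pattern for this sketch only: digit groups joined by '.', ',' or ':'.
# The real pattern in pythainlp may differ.
_FORMATTED_NUM = re.compile(r"\d+(?:[.,:]\d+)+")


def rejoin_formatted_num(segments: List[str], skip_empty: bool = True) -> List[str]:
    """Rejoin segments that together form a formatted number.

    skip_empty is a hypothetical switch: True mimics the behaviour after
    this commit, False mimics the behaviour before it.
    """
    original = "".join(segments)
    matching_results = _FORMATTED_NUM.finditer(original)
    tokens_joined: List[str] = []
    pos = 0
    segment_idx = 0

    match = next(matching_results, None)
    while segment_idx < len(segments) and match:
        is_span_beginning = pos >= match.start()
        token = segments[segment_idx]
        if is_span_beginning:
            connected_token = ""
            # Consume segments until the current match is fully covered.
            while pos < match.end() and segment_idx < len(segments):
                connected_token += segments[segment_idx]
                pos += len(segments[segment_idx])
                segment_idx += 1
            # The guard added by the commit: if an earlier join already
            # covered this match, connected_token is still empty.
            if connected_token or not skip_empty:
                tokens_joined.append(connected_token)
            match = next(matching_results, None)
        else:
            tokens_joined.append(token)
            segment_idx += 1
            pos += len(token)
    tokens_joined += segments[segment_idx:]
    return tokens_joined


segments = ["12", ",", "345 1.5", "x"]
print(rejoin_formatted_num(segments, skip_empty=False))  # ['12,345 1.5', '', 'x']
print(rejoin_formatted_num(segments))                    # ['12,345 1.5', 'x']
```

With `skip_empty=False`, the second match ("1.5") starts at a position the cursor has already passed, the inner loop never runs, and an empty string is appended. With the guard in place, that empty token is dropped and the remaining output is unchanged.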
