Copilot AI commented Jan 15, 2026

Implements preprocessing for Thai TTS to handle numbers and the repetition character (ๆ). Without preprocessing, TTS models synthesize digits literally ("1-2-3") and cannot interpret the Thai repetition marker.

Changes

  • pythaitts/preprocess.py: New module with three functions
    • num_to_thai(): Converts numerals to Thai words (supports 0 to 9,999,999, plus decimals and negatives)
    • expand_maiyamok(): Expands ๆ by repeating the preceding word/syllable
    • preprocess_text(): Combines both transformations; each can be disabled selectively
  • pythaitts/__init__.py:
    • Added preprocess parameter to TTS.tts() (enabled by default)
    • Exported preprocessing functions at package level
  • Documentation: Updated README and Sphinx docs with usage examples
  • Tests: 17 unit tests covering number conversion, mai yamok expansion, and edge cases
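
To illustrate the number-to-words step listed above, here is a minimal sketch for non-negative integers in the stated 0 to 9,999,999 range. It is not the code merged in this PR: the function name int_to_thai_words is hypothetical, decimals and negatives are left out, and reading a trailing 1 as "เอ็ด" whenever the number has more than one digit is an assumed convention.

```python
# Illustrative sketch only; NOT the num_to_thai() implementation from this PR.
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน", "ล้าน"]


def int_to_thai_words(n: int) -> str:
    """Spell out an integer in the range 0..9,999,999 in Thai words."""
    if not 0 <= n <= 9_999_999:
        raise ValueError("only 0..9,999,999 is handled in this sketch")
    if n == 0:
        return DIGITS[0]

    digits = str(n)
    length = len(digits)
    words = []
    for i, ch in enumerate(digits):
        d = int(ch)
        place = length - 1 - i  # 0 = ones, 1 = tens, ..., 6 = millions
        if d == 0:
            continue  # zeros inside a number are not pronounced
        if place == 1 and d == 1:
            words.append("สิบ")  # 10 is "สิบ", not "หนึ่งสิบ"
        elif place == 1 and d == 2:
            words.append("ยี่สิบ")  # 20 uses the special form "ยี่"
        elif place == 0 and d == 1 and length > 1:
            words.append("เอ็ด")  # assumed rule: trailing 1 reads "เอ็ด"
        else:
            words.append(DIGITS[d] + PLACES[place])
    return "".join(words)


print(int_to_thai_words(123))
```

The value 123 used here is the same one as in the Usage example, where the expected reading is "หนึ่งร้อยยี่สิบสาม".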

Usage

from pythaitts import TTS, preprocess_text

# Automatic preprocessing
tts = TTS()
tts.tts("มี 5 คนๆ")  # Internally: "มี 5 คนๆ" → "มี ห้า คนคน"

# Direct preprocessing
preprocess_text("ฉันมี 123 บาท")  # → "ฉันมี หนึ่งร้อยยี่สิบสาม บาท"

# Disable if needed
tts.tts("มี 5 คนๆ", preprocess=False)
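
The mai yamok step in the usage above ("คนๆ" → "คนคน") can be approximated with a regular expression. This is a rough sketch rather than the PR's expand_maiyamok(): the name expand_maiyamok_naive is hypothetical, and it assumes the word to repeat is the whitespace-delimited run directly before ๆ. Unsegmented Thai text would need a real word tokenizer to find the correct word boundary.

```python
import re

MAIYAMOK = "\u0e46"  # the Thai repetition character ๆ

# A run of non-space characters (excluding ๆ itself), optional spaces, then ๆ.
_PATTERN = re.compile(rf"([^\s{MAIYAMOK}]+)[ \t]*{MAIYAMOK}")


def expand_maiyamok_naive(text: str) -> str:
    """Repeat the whitespace-delimited chunk that precedes each ๆ.

    Naive assumption: the chunk before ๆ is exactly one word. A ๆ with
    nothing in front of it is left untouched.
    """
    return _PATTERN.sub(r"\1\1", text)


print(expand_maiyamok_naive("มี 5 คนๆ"))
```

Numbers are left alone in this sketch; in the PR, digit conversion is described as a separate transformation combined by preprocess_text().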
Original prompt

This section details the original issue you should resolve

<issue_title>[TODO] Add preprocessing text for Thai text</issue_title>
<issue_description>[TODO] Add preprocessing text for Thai text

  • Number to Thai text

...</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 5 commits January 15, 2026 18:46
…expansion

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
…ttern

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add preprocessing for Thai text conversion" to "Add Thai text preprocessing: number-to-text conversion and mai yamok expansion" Jan 15, 2026
@wannaphong wannaphong marked this pull request as ready for review January 15, 2026 18:56
@wannaphong wannaphong merged commit d8db593 into dev Jan 15, 2026
1 check passed
@wannaphong wannaphong deleted the copilot/add-preprocessing-thai-text branch January 15, 2026 18:56
Copilot AI requested a review from wannaphong January 15, 2026 18:57

Development

Successfully merging this pull request may close these issues.

[TODO] Add preprocessing text for Thai text
