Implement newmm tokenizer with PyThaiNLP-compatible API #4

Merged
wannaphong merged 4 commits into make from copilot/fork-newmm-tokenizer-thainlp
Jan 10, 2026
Contributor

Copilot AI commented Jan 10, 2026

Implementation Plan for newmm Tokenizer

  • Create Trie data structure class for dictionary management
  • Create TCC (Thai Character Cluster) tokenizer implementation
  • Create newmm tokenizer implementation following PyThaiNLP's API
  • Add a basic Thai word dictionary file
  • Create Word tokenizer class to match PyThaiNLP's API
  • Add comprehensive unit tests for the new tokenizer
  • Update Program.cs to demonstrate the newmm tokenizer usage
  • Run tests to validate the implementation (22/22 tests passing)
  • Add comprehensive README documentation
  • Update GitHub Actions workflow to use .NET 8.0
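
As a rough illustration of how the Trie and the dictionary-based tokenizer in the plan fit together, here is a minimal Python sketch. It is not the code from this pull request (which targets .NET), and it is deliberately simpler than newmm: it uses greedy longest matching against the dictionary and ignores Thai Character Cluster boundaries. The names `Trie`, `prefixes`, and `longest_match_tokenize` are hypothetical and chosen only for this example.

```python
class Trie:
    """Prefix tree holding dictionary words (illustrative, not the PR's class)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-word marker

    def prefixes(self, text, start=0):
        """Yield every dictionary word that begins at text[start], shortest first."""
        node = self.root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                return
            if None in node:
                yield text[start:i + 1]


def longest_match_tokenize(text, trie):
    """Greedy longest matching; a simplification, not the newmm algorithm."""
    tokens = []
    pos = 0
    unknown_start = None  # start of a run of characters not covered by the dictionary
    while pos < len(text):
        candidates = list(trie.prefixes(text, pos))
        if candidates:
            if unknown_start is not None:
                tokens.append(text[unknown_start:pos])
                unknown_start = None
            word = candidates[-1]  # prefixes() yields in increasing length
            tokens.append(word)
            pos += len(word)
        else:
            if unknown_start is None:
                unknown_start = pos
            pos += 1
    if unknown_start is not None:
        tokens.append(text[unknown_start:])
    return tokens


trie = Trie(["ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"])
print(longest_match_tokenize("ฉันรักภาษาไทย", trie))
# prints ['ฉัน', 'รัก', 'ภาษาไทย']
```

Characters that match no dictionary entry are grouped into a single unknown token, so mixed input such as "abcไทย" still splits at the first dictionary word.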
Original prompt

This section details the original issue you should resolve

<issue_title>Fork newmm tokenizer to thainlp.net</issue_title>
<issue_description>Fork newmm tokenizer to thainlp.net

The API should be like PyThaiNLP's.

PyThaiNLP: https://github.com/PyThaiNLP/pythainlp</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits January 10, 2026 12:40
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fork newmm tokenizer to thainlp.net" to "Implement newmm tokenizer with PyThaiNLP-compatible API" Jan 10, 2026
Copilot AI requested a review from wannaphong January 10, 2026 12:44
@wannaphong
Member

@copilot Fix .github/workflows/dotnetcore.yml

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 10, 2026

@copilot Fix .github/workflows/dotnetcore.yml

Updated the workflow to use .NET 8.0 (setup-dotnet@v4, checkout@v4, dotnet-version: '8.0.x') to match the project's target framework. Verified with a Release build, with all 22 tests passing. (23d375e)

wannaphong marked this pull request as ready for review January 10, 2026 12:54
wannaphong merged commit c4068a5 into make Jan 10, 2026
1 check passed
Development

Successfully merging this pull request may close these issues.

Fork newmm tokenizer to thainlp.net

2 participants