Refactor thai2transformers as utility package for transformers #66

cstorm125 · 2021-12-15T12:15:44Z

transformers is currently the de facto way to train NLP models (and perhaps speech and image models soon). For Thai, the default settings cause some difficulties; for example, sequence-based metrics such as BLEU tokenize on spaces, which does not work for Thai because Thai text is written without spaces between words. We also want to include some quality-of-life functions, such as helpers for easily loading datasets into datasets objects and the preprocessing functions that are available in the tutorial notebooks.

Thai-language specific metrics
Preprocessing functions
Load datasets
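
To illustrate the metric problem described above, here is a minimal sketch of a score that takes the tokenizer as a parameter. Everything in it is an assumption made for illustration: the names `whitespace_tokenize`, `make_longest_match_tokenizer`, and `unigram_precision` are hypothetical and are not part of thai2transformers or transformers, the metric is a simplified unigram precision rather than full BLEU, and the toy dictionary segmenter only stands in for a real Thai word tokenizer (for example PyThaiNLP's `word_tokenize`).

```python
from collections import Counter
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Space tokenization, the default behaviour that fails on Thai."""
    return text.split()


def make_longest_match_tokenizer(vocab: Sequence[str]) -> Tokenizer:
    """Build a toy dictionary-based longest-matching segmenter.

    Hypothetical stand-in for a real Thai word tokenizer.
    """
    # Try longer dictionary words first so the longest match wins.
    words = sorted(set(vocab), key=len, reverse=True)

    def tokenize(text: str) -> List[str]:
        tokens: List[str] = []
        i = 0
        while i < len(text):
            if text[i].isspace():
                i += 1
                continue
            match = next((w for w in words if text.startswith(w, i)), None)
            if match is None:
                match = text[i]  # unknown character: emit it on its own
            tokens.append(match)
            i += len(match)
        return tokens

    return tokenize


def unigram_precision(prediction: str, reference: str, tokenize: Tokenizer) -> float:
    """Clipped unigram precision of prediction against reference."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[tok]) for tok, count in pred_counts.items())
    return overlap / total


if __name__ == "__main__":
    thai_tokenize = make_longest_match_tokenizer(["ฉัน", "กิน", "ข้าว", "ปลา"])
    reference = "ฉัน" + "กิน" + "ข้าว"   # "I eat rice"
    prediction = "ฉัน" + "กิน" + "ปลา"   # "I eat fish"
    # Each sentence is a single "word" under space tokenization, so nothing overlaps.
    print(unigram_precision(prediction, reference, whitespace_tokenize))
    # With word segmentation, two of the three predicted words match.
    print(unigram_precision(prediction, reference, thai_tokenize))
```

Passing the tokenizer as an argument keeps the metric itself language-agnostic, so a Thai-specific metric could reuse the same scoring code with a different segmenter.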

PR #67

cstorm125 added the enhancement New feature or request label Dec 15, 2021

cstorm125 self-assigned this Dec 15, 2021
