Retraining Machine Translation model for Thai-English and English-Thai #899

Open
wannaphong opened this issue Feb 5, 2024 · 4 comments

Comments

@wannaphong
Member

Hello! I am working on training a new machine translation model for Thai-English and English-Thai. It may not be finished by the v5.0.0 deadline, but I hope the new model will be included in the next release of PyThaiNLP (v5.0.1 or a later one).

The new models are not Generative Pre-trained Transformer (GPT) models, and they will work with Hugging Face Transformers.
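
As a rough illustration of what "working with Hugging Face Transformers" could look like, here is a minimal sketch. It assumes the new models are encoder-decoder (seq2seq) checkpoints that load through `AutoModelForSeq2SeqLM`, which this issue does not confirm. The `model_name` argument is a placeholder, since no checkpoint has been published or named here, and `load_translator` and `translate` are hypothetical helper names.

```python
from typing import List


def load_translator(model_name: str):
    """Load a tokenizer and a seq2seq model from the Hugging Face Hub.

    `model_name` is a placeholder: the issue does not name a checkpoint.
    Assumes an encoder-decoder architecture.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of sentences with an already loaded tokenizer and model."""
    # Tokenize the batch, padding to the longest sentence.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate target-language token IDs.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Convert token IDs back to text, dropping special tokens such as padding.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would be along the lines of `tokenizer, model = load_translator(name)` followed by `translate(["สวัสดี"], tokenizer, model)`, where `name` is whatever checkpoint is eventually released.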

Dataset: https://github.com/vistec-AI/thai2nmt/releases/tag/scb-mt-en-th-2020%2Bmt-opus_v1.0
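
Because the plan covers both Thai-English and English-Thai, each sentence pair in a parallel corpus can serve as a training example in both directions. The sketch below shows one way to do that with the standard library only. It assumes the corpus is available as CSV with an English column and a Thai column; the column names `en_text` and `th_text` and the helper `iter_direction_pairs` are assumptions for illustration, not something stated in this issue.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_direction_pairs(
    csv_file: TextIO,
    en_col: str = "en_text",  # assumed column name
    th_col: str = "th_text",  # assumed column name
) -> Iterator[Dict[str, str]]:
    """Yield one training example per translation direction for each CSV row."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        en = (row.get(en_col) or "").strip()
        th = (row.get(th_col) or "").strip()
        # Skip rows where either side is missing or empty.
        if not en or not th:
            continue
        yield {"direction": "en-th", "source": en, "target": th}
        yield {"direction": "th-en", "source": th, "target": en}


# Small in-memory example instead of the real dataset files.
demo = io.StringIO("en_text,th_text\nHello,สวัสดี\n")
for example in iter_direction_pairs(demo):
    print(example["direction"], example["source"], "->", example["target"])
```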

@wannaphong added this to the 5.0 milestone on Feb 5, 2024
@wannaphong self-assigned this on Feb 5, 2024
@wannaphong added this to In progress in the PyThaiNLP project on Feb 5, 2024
@pavaris-pm
Contributor

@wannaphong this is an excellent project to work on! Could I ask a quick question about the current model that we use for the translation task, so that I can do some further research on different methods?

@wannaphong
Member Author

> @wannaphong this is an excellent project to work on! Could I ask a quick question about the current model that we use for the translation task, so that I can do some further research on different methods?

It supports various domains, such as product reviews, laws, reports, news, spoken dialogues, and SMS messages.

For details, you can read the paper "scb-mt-en-th-2020: A Large English-Thai Parallel Corpus".

@bact
Member

bact commented Feb 11, 2024

May be relevant to #903.

@bact added the refactoring label (a technical improvement which does not add any new features or change existing features) on Feb 11, 2024
@bact changed the milestone from 5.0 to Future on Feb 11, 2024
@wannaphong changed the milestone from Future to 5.1 on Feb 27, 2024
@wannaphong
Member Author

wannaphong commented Mar 4, 2024

I am having computing problems, so this issue will be postponed to a future release.
