use Multilingual pretrain model Bert #40

Open
kewin1807 opened this issue Oct 27, 2019 · 6 comments

Comments

@kewin1807

Please tell me: can I use the multilingual pretrained model from BERT to train on custom data with the ALBERT code?

@brightmart
Owner

You can give it a try.
Be aware that there are some differences between BERT and ALBERT in modeling.py.
Why do you want to train a multilingual model?

@kewin1807
Author

I want to use the model with the Vietnamese language. I know the important change is parameter sharing. How can I train it with my language? Thanks for the support =)
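For context, the parameter sharing mentioned above is ALBERT's key structural change relative to BERT: one set of transformer-layer weights is reused at every layer, so the parameter count does not grow with depth. A minimal sketch (a hypothetical simplified "encoder", not the repo's actual code) to illustrate the idea:

```python
import numpy as np

class SharedEncoder:
    """Toy encoder that applies ONE weight matrix at every "layer",
    mimicking ALBERT-style cross-layer parameter sharing."""

    def __init__(self, hidden_size, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        # A single parameter set, reused by every layer.
        self.W = rng.normal(scale=0.02, size=(hidden_size, hidden_size))
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = np.tanh(x @ self.W)  # same W at every depth
        return x

enc = SharedEncoder(hidden_size=8, num_layers=12)
out = enc.forward(np.ones((2, 8)))
print(enc.W.size)   # 64 -- one 8x8 matrix regardless of num_layers
print(out.shape)    # (2, 8)
```

Because the shared weights are language-agnostic structure, retraining for Vietnamese mainly means supplying a Vietnamese vocabulary and corpus, as described in the next comment.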

@brightmart
Owner

1. You can change vocab.txt in ./albert_config, then set non_chinese to True when creating the pretraining data with create_pretraining_data.py.
2. Then do the pre-training with run_pretraining.py.
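The two steps above might look like the following commands. This is only a sketch: the input/output paths and hyperparameter values are placeholders, and the flag names follow the BERT-style scripts this repo is based on (apart from non_chinese, which is named in the comment above), so check the flag definitions in the repo's scripts before running.

```shell
# Step 1: build pretraining examples from your corpus.
# Assumes vocab.txt in ./albert_config was replaced with a Vietnamese vocab.
python create_pretraining_data.py \
  --input_file=./data/corpus_vi.txt \
  --output_file=./data/tf_examples.tfrecord \
  --vocab_file=./albert_config/vocab.txt \
  --non_chinese=True \
  --max_seq_length=512

# Step 2: pre-train on the generated examples.
python run_pretraining.py \
  --input_file=./data/tf_examples.tfrecord \
  --output_dir=./albert_vi_output \
  --do_train=True \
  --bert_config_file=./albert_config/albert_config_base.json
```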

@kewin1807
Author

Okay, thanks for the support. Best repo =)

@kewin1807
Author

I have tried pretraining with my dataset, but the loss is very small while the accuracy does not improve. How can I improve the result?

@geekboood

geekboood commented Nov 24, 2019

@brightmart Can we have a multilingual model for just Chinese and English? In practical scenarios we encounter many English words in app names, song titles, all of Apple's product names, and so on, and Google's multilingual model covers too many languages. Our daily life can't avoid English; you can see Apple trying to use purely Chinese in its products, such as replacing Finder with 访达, which I think is a total mess.
A language model for just Chinese and English could have a huge impact on both research and industry, and many multilingual tasks could benefit from it.
