Languages in the pretrain #25

Closed
averkij opened this issue Nov 6, 2023 · 4 comments
Labels: dataset, doc-required, question

Comments

@averkij

averkij commented Nov 6, 2023

Hello 🖐️

Did you explicitly filter other languages (non-English and non-Chinese) out of the pretraining dataset?

If not, what are their proportions?

@renxiaoyi
Contributor

Yes, we explicitly filtered other languages out of the training dataset for this model.

ZhaoFancy added the question and dataset labels Nov 7, 2023
jiangchengSilent pushed a commit that referenced this issue Nov 9, 2023
@averkij
Author

averkij commented Nov 9, 2023

That's a bit sad.

@chuan298

@ZhaoFancy Do you have any plans to build a multilingual model?

@ZhaoFancy
Contributor

> @ZhaoFancy Do you have any plans to build a multilingual model?

Yes, we've been working on it, but we don't have a clear timeline for releasing it to the public.

Yimi81 added the doc-required label Mar 8, 2024
5 participants