Languages in the pretrain #25
Labels: dataset, doc-required, question
Hello 🖐️
Did you explicitly filter out other languages (i.e., languages other than English and Chinese) from the pretraining dataset?
If not, what are their proportions?