Punctuation and Capitalization on non-English text #2297

Answered by ekmb
iry47 asked this question in Q&A

We haven't run any experiments with this model on non-English data, but the model should work for other languages out of the box.
The quickest way to try this model with French data would be to use a pre-trained BERT-like model, for example by setting model.language_model.pretrained_model_name=bert-base-multilingual-cased or amine/bert-base-5lang-cased. To prepare data for the punctuation and capitalization tasks, please see this tutorial. The Tatoeba dataset contains French sentences as well; you would need to modify this line to get French data.
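
To make the data preparation step a bit more concrete, here is a rough sketch of what it could look like for French. This is not the script from the tutorial, just an illustration under a few assumptions: that the Tatoeba export is tab-separated as `id<TAB>lang<TAB>text` with French tagged `fra`, and that the model expects one lowercased, punctuation-free text line plus one label line, where each word gets a two-character label (punctuation mark after the word or `O`, then `U` for capitalized or `O` otherwise). The helper names `filter_language` and `to_text_and_labels` and the label set `,.?` are made up for this example.

```python
import string

# Assumed label set: only these marks are predicted; everything else is dropped.
PUNCT = ",.?"
# Characters stripped from word edges (ASCII punctuation plus a few typographic ones).
STRIP_CHARS = string.punctuation + "«»…“”’"


def filter_language(tsv_lines, lang="fra"):
    """Yield sentence text from lines assumed to look like 'id<TAB>lang<TAB>text'."""
    for line in tsv_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3 and parts[1] == lang:
            yield parts[2]


def to_text_and_labels(sentence):
    """Return (text, labels): lowercased words and one 2-char label per word."""
    words, labels = [], []
    for token in sentence.split():
        stripped_right = token.rstrip(STRIP_CHARS)
        tail = token[len(stripped_right):]
        # First trailing mark that belongs to the label set, else "O".
        mark = next((c for c in tail if c in PUNCT), "O")
        core = stripped_right.lstrip(STRIP_CHARS)
        if not core:
            # Standalone mark, e.g. French "Paris ?" with a space before "?":
            # attach it to the previous word's label.
            if words and mark != "O":
                labels[-1] = mark + labels[-1][1]
            continue
        cap = "U" if core[0].isupper() else "O"
        words.append(core.lower())
        labels.append(mark + cap)
    return " ".join(words), " ".join(labels)


if __name__ == "__main__":
    sample = ["1\teng\tHello there.\n", "2\tfra\tQuand part le prochain vol pour Paris ?\n"]
    for sent in filter_language(sample):
        text, labels = to_text_and_labels(sent)
        print(text)
        print(labels)
```

Note the special case for French typography: a space usually precedes `?`, so the mark arrives as its own token and has to be attached to the previous word. Please check the tutorial for the exact file layout and label set before training on the output.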

Answer selected by ekmb