
How to train a model on another language? #8

Closed
Archelunch opened this issue Jan 25, 2021 · 1 comment

Comments

@Archelunch

Hi! If I want to train Bort on another language, do I first need to pretrain BERT on that language and then extract the sub-model from it? Or can I just train Bort from scratch without a pretrained BERT?

@adewynter
Contributor

Hi! That's a very good question, actually :)

The short version is that it depends on what you want. If you want to preserve the optimality guarantees, you probably want to extract another subarchitecture (for the sake of good science!), but you might not see much difference if you just jump ahead and do knowledge distillation (KD) with the Bort architecture (not the pretrained model, just the architecture; see the sketch below). In fact, if you just need a fast LM that works best for your target language, I'd focus my efforts on the pre-training/fine-tuning.
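To illustrate what "just the architecture" means in practice, here is a minimal sketch using Hugging Face's transformers rather than the MXNet/GluonNLP code in this repo. The hyperparameters below are my reading of the subarchitecture reported in the Bort paper (D=4, A=8, H=1024, I=768); double-check them against the repo's config before relying on them.

```python
# Sketch: instantiate a Bort-sized model with random weights (architecture only,
# no pretrained parameters), using Hugging Face transformers as an illustration.
# The sizes follow the subarchitecture reported in the Bort paper; verify against
# this repo's config.
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=50265,        # replace with your target-language tokenizer's vocab size
    num_hidden_layers=4,     # D: encoder depth
    num_attention_heads=8,   # A: attention heads
    hidden_size=1024,        # H: hidden dimension
    intermediate_size=768,   # I: feed-forward dimension
)

student = RobertaForMaskedLM(config)  # randomly initialized; ready for KD pre-training
```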

Here's why: when I extracted Bort (the OSE/FPTAS step), I used the English RoBERTa and an English dataset. This means that the error is minimized over an English-based dataset. However, the output model (Bort) is untrained, and the "error" minimization happens only because the algorithm prefers faster-converging architectures. It will probably return the same result (or something very close to it) if you change the dataset. Since it is an approximation algorithm, "very close" probably means you'll get the same answer for large enough approximation parameters.
Indeed, when Danny pre-trained it with KD, he also used an English-based dataset. However, again, the fast convergence was due to the OSE/FPTAS step, so beyond some hyperparameter tuning I'd wager you'll be able to find similar speedups/results.
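To make the KD step concrete, here is a minimal sketch of the usual soft-label distillation loss (KL divergence between temperature-softened teacher and student MLM distributions). This is an illustration, not the actual pre-training code from this repo, and it assumes teacher and student share the same tokenizer so their logits align over the vocabulary.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between temperature-softened
    teacher and student distributions over the MLM vocabulary.
    Assumes both models use the same tokenizer, so the logits align."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Usage sketch: the teacher is frozen, the student is the randomly initialized
# Bort-sized model from the snippet above.
# with torch.no_grad():
#     teacher_logits = teacher(**batch).logits
# student_logits = student(**batch).logits
# loss = distillation_loss(student_logits, teacher_logits)
```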

However, since Bort is such a small architecture, it will be very hard to fine-tune. We also have some heavy fine-tuning algorithms -- if I were creating Bort in some other language, that's where I'd focus 85% of my efforts.

Hope this helps!

PS: for KD, there are a few language-specific BERTs available in Hugging Face's transformers library that could serve as teachers.
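For example (just an illustration; "bert-base-german-cased" is one such checkpoint on the Hub, swap in whichever model matches your target language):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Any language-specific BERT from the Hub can serve as the KD teacher.
teacher_name = "bert-base-german-cased"  # example checkpoint; pick your target language
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForMaskedLM.from_pretrained(teacher_name)
teacher.eval()  # the teacher stays frozen during distillation
```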
