Commit b23edd6
update FAQ
airaria committed Mar 2, 2020
1 parent 2827a40 commit b23edd6
Showing 2 changed files with 8 additions and 2 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -312,7 +312,11 @@ For more details, see the explanations in [API documentation](API.md)

## FAQ

TBA
**Q**: How should the student model be initialized?

**A**: The student model can be randomly initialized (i.e., with no prior knowledge) or initialized with pre-trained weights.
For example, when distilling a BERT-base model into a 3-layer BERT, you could initialize the student model with [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) (for Chinese tasks) or with the first three layers of BERT (for English tasks) to avoid the cold-start problem.
We recommend that users use pre-trained student models whenever possible to take full advantage of large-scale pre-training.
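
As a minimal sketch of one way to initialize such a student (assuming the HuggingFace `transformers` library; the model identifiers and attribute paths below are illustrative and not taken from this repository):

```python
# Sketch: build a 3-layer student BERT and initialize it from the first
# three layers of a pre-trained BERT-base (HuggingFace Transformers assumed).
from transformers import BertConfig, BertModel

# Pre-trained model used as the source of the student's initial weights.
teacher = BertModel.from_pretrained("bert-base-uncased")

# Same architecture as BERT-base, but with only 3 Transformer layers.
student_config = BertConfig.from_pretrained("bert-base-uncased", num_hidden_layers=3)
student = BertModel(student_config)

# Copy the embeddings and the first three encoder layers from the teacher.
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for i in range(3):
    student.encoder.layer[i].load_state_dict(teacher.encoder.layer[i].state_dict())

# For Chinese tasks, one could instead load a pre-trained small model directly,
# e.g. BertModel.from_pretrained("hfl/rbt3") (model identifier assumed).
```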

## Citation

4 changes: 3 additions & 1 deletion README_ZH.md
@@ -308,7 +308,9 @@ The Distiller performs the actual distillation process. The following distillers are currently implemented:

## FAQ

TBA
**Q**: How should the student model be initialized?

**A**: Knowledge distillation is essentially a process of a teacher teaching a student. The student model can be randomly initialized (i.e., with no prior knowledge) or loaded from pre-trained weights. For example, when distilling a BERT-base model into a 3-layer BERT, you can first load the [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) weights (for Chinese tasks) or the first three layers of BERT (for English tasks) and then perform distillation, which avoids the cold-start problem. We recommend that users use pre-trained student models whenever possible to take full advantage of large-scale pre-training.

## 引用

