Commit b23edd6
update FAQ
airaria committed Mar 2, 2020
1 parent 2827a40 commit b23edd6
Showing 2 changed files with 8 additions and 2 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -312,7 +312,11 @@ For more details, see the explanations in [API documentation](API.md)

## FAQ

TBA
**Q**: How should the student model be initialized?

**A**: The student model can be randomly initialized (i.e., with no prior knowledge) or initialized with pre-trained weights.
For example, when distilling a BERT-base model into a 3-layer BERT, you could initialize the student model with [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) (for Chinese tasks) or with the first three layers of BERT (for English tasks) to avoid the cold-start problem.
We recommend that users use pre-trained student models whenever possible to take full advantage of large-scale pre-training.
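
As a minimal sketch of one way to initialize such a student (assuming the HuggingFace `transformers` library; the model identifiers and attribute paths below are illustrative and not taken from this repository):

```python
# Sketch: build a 3-layer student BERT and initialize it from the first
# three layers of a pre-trained BERT-base (HuggingFace Transformers assumed).
from transformers import BertConfig, BertModel

# Pre-trained model used as the source of the student's initial weights.
teacher = BertModel.from_pretrained("bert-base-uncased")

# Same architecture as BERT-base, but with only 3 Transformer layers.
student_config = BertConfig.from_pretrained("bert-base-uncased", num_hidden_layers=3)
student = BertModel(student_config)

# Copy the embeddings and the first three encoder layers from the teacher.
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for i in range(3):
    student.encoder.layer[i].load_state_dict(teacher.encoder.layer[i].state_dict())

# For Chinese tasks, one could instead load a pre-trained small model directly,
# e.g. BertModel.from_pretrained("hfl/rbt3") (model identifier assumed).
```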

## Citation

4 changes: 3 additions & 1 deletion README_ZH.md
@@ -308,7 +308,9 @@ The Distiller performs the actual distillation process. The following distillers are currently implemented:

## FAQ

TBA
**Q**: How should the student model be initialized?

**A**: Knowledge distillation is essentially a process of a teacher teaching a student. The student model can be randomly initialized (i.e., with no prior knowledge) or loaded from pre-trained weights. For example, when distilling a BERT-base model into a 3-layer BERT, you can first load the [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) weights (for Chinese tasks) or the first three layers of BERT (for English tasks) and then perform distillation, which avoids the cold-start problem. We recommend that users use pre-trained student models whenever possible to take full advantage of large-scale pre-training.

## 引用

