
Commit

fix style
airaria committed Apr 26, 2020
1 parent 3f3e441 commit c6ce9de
Showing 4 changed files with 12 additions and 12 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -199,7 +199,7 @@ with distiller:
**Examples can be found in the `examples` directory:**

* [examples/random_token_example](examples/random_token_example) : a simple runnable toy example that demonstrates the usage of TextBrewer. This example performs distillation on a text classification task with random tokens as inputs.
- * [examples/cmrc2018\_example](examples/cmrc2018_example) (Chinese): distillation on CMRC2018, a Chinese MRC task, using DRCD as data augmentation.
+ * [examples/cmrc2018\_example](examples/cmrc2018_example) (Chinese): distillation on CMRC 2018, a Chinese MRC task, using DRCD as data augmentation.
* [examples/mnli\_example](examples/mnli_example) (English): distillation on MNLI, an English sentence-pair classification task. This example also shows how to perform multi-teacher distillation.
* [examples/conll2003_example](examples/conll2003_example) (English): distillation on the CoNLL-2003 English NER task, which takes the form of sequence labeling.
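
For context, the `with distiller:` fragment in the hunk header above comes from the quick-start snippet earlier in this README, which the listed examples all build on. The sketch below illustrates that usage pattern with a toy classifier and random data; the toy model, the synthetic batch, and the exact `distiller.train(...)` keyword arguments are illustrative assumptions (the signature has varied across TextBrewer releases), so treat the quick start in the README itself as authoritative.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset
from textbrewer import GeneralDistiller, TrainingConfig, DistillationConfig

# Toy teacher/student classifiers over random token ids, in the spirit of
# examples/random_token_example.  A real run would plug in transformer models.
class ToyClassifier(torch.nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.embed = torch.nn.Embedding(100, hidden_size)
        self.classifier = torch.nn.Linear(hidden_size, 2)

    def forward(self, input_ids, labels=None):
        logits = self.classifier(self.embed(input_ids).mean(dim=1))
        return {'logits': logits}

teacher, student = ToyClassifier(64), ToyClassifier(16)
dataset = TensorDataset(torch.randint(0, 100, (32, 8)),   # random token ids
                        torch.randint(0, 2, (32,)))       # random labels
dataloader = DataLoader(dataset, batch_size=8)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def simple_adaptor(batch, model_outputs):
    # Tell the distiller which model outputs are the logits to be matched.
    return {'logits': model_outputs['logits']}

os.makedirs('toy_distill_output', exist_ok=True)  # student checkpoints are written here
distiller = GeneralDistiller(
    train_config=TrainingConfig(device='cpu', output_dir='toy_distill_output'),
    distill_config=DistillationConfig(temperature=4),
    model_T=teacher, model_S=student,
    adaptor_T=simple_adaptor, adaptor_S=simple_adaptor)

with distiller:
    distiller.train(optimizer, dataloader, num_epochs=1, callback=None)
```

In the CMRC 2018 and MNLI examples, the adaptor typically also returns hidden states, which lets intermediate-layer matching losses be configured through `DistillationConfig(intermediate_matches=...)`.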

@@ -342,8 +342,8 @@ The results are listed below.

**Note**:

- 1. Learning rate decay is not used in distillation on CMRC2018 and DRCD.
- 2. CMRC2018 and DRCD take each other as the augmentation dataset in the distillation.
+ 1. Learning rate decay is not used in distillation on CMRC 2018 and DRCD.
+ 2. CMRC 2018 and DRCD take each other as the augmentation dataset in the distillation.
3. The training settings of the Electra-base teacher model can be found at [**Chinese-ELECTRA**](https://github.com/ymcui/Chinese-ELECTRA).
4. The Electra-small student model is initialized with the [pretrained weights](https://github.com/ymcui/Chinese-ELECTRA).

6 changes: 3 additions & 3 deletions docs/source/ExperimentResults.md
@@ -115,9 +115,9 @@
| T3-small (student) | 88.1 |
| T4-tiny (student) | 88.4 |

- ### CMRC2018 and DRCD
+ ### CMRC 2018 and DRCD

- | Model | CMRC2018 | DRCD |
+ | Model | CMRC 2018 | DRCD |
| --------------- | ---------------- | ------------ |
| **RoBERTa-wwm-ext** (teacher) | 68.8 / 86.4 | 86.5 / 92.5 |
| T3 (student) | 63.4 / 82.4 | 76.7 / 85.2 |
@@ -127,7 +127,7 @@
| T4-tiny (student) | 54.3 / 76.8 | 75.5 / 84.9 |
|   + DA | 61.8 / 81.8 | 77.3 / 86.1 |

- **Note**: CMRC2018 and DRCD take each other as the augmentation dataset in these experiments.
+ **Note**: CMRC 2018 and DRCD take each other as the augmentation dataset in these experiments.

## Chinese Datasets (Electra-base as the teacher)

10 changes: 5 additions & 5 deletions examples/cmrc2018_example/README.md
@@ -1,17 +1,17 @@
[**中文说明**](README_ZH.md) | [**English**](README.md)

- This example demonstrates distillation on the CMRC2018 task, using the DRCD dataset for data augmentation.
+ This example demonstrates distillation on the CMRC 2018 task, using the DRCD dataset for data augmentation.


- * run_cmrc2018_train.sh : trains a teacher model (roberta-wwm-base) on CMRC2018.
- * run_cmrc2018_distill_T3.sh : distills the teacher to T3 with the CMRC2018 and DRCD datasets.
- * run_cmrc2018_distill_T4tiny.sh : distills the teacher to T4tiny with the CMRC2018 and DRCD datasets.
+ * run_cmrc2018_train.sh : trains a teacher model (roberta-wwm-base) on CMRC 2018.
+ * run_cmrc2018_distill_T3.sh : distills the teacher to T3 with the CMRC 2018 and DRCD datasets.
+ * run_cmrc2018_distill_T4tiny.sh : distills the teacher to T4tiny with the CMRC 2018 and DRCD datasets.

Modify the following variables in the shell scripts before running:

* BERT_DIR : the directory where RoBERTa-wwm-base is stored, including vocab.txt, pytorch_model.bin, bert_config.json
* OUTPUT_ROOT_DIR : this directory stores logs and trained model weights
- * DATA_ROOT_DIR : the directory containing the CMRC2018 and DRCD datasets:
+ * DATA_ROOT_DIR : the directory containing the CMRC 2018 and DRCD datasets:
* \$\{DATA_ROOT_DIR\}/cmrc2018/squad-style-data/cmrc2018_train.json
* \$\{DATA_ROOT_DIR\}/cmrc2018/squad-style-data/cmrc2018_dev.json
* \$\{DATA_ROOT_DIR\}/drcd/DRCD_training.json
2 changes: 1 addition & 1 deletion examples/cmrc2018_example/README_ZH.md
@@ -1,6 +1,6 @@
[**中文说明**](README_ZH.md) | [**English**](README.md)

- This example demonstrates distillation on the CMRC2018 reading comprehension task, using the DRCD dataset for data augmentation.
+ This example demonstrates distillation on the CMRC 2018 reading comprehension task, using the DRCD dataset for data augmentation.

* run_cmrc2018_train.sh : trains the teacher model (roberta-wwm-base) on the cmrc2018 dataset
* run_cmrc2018_distill_T3.sh : distills the teacher model to T3 on the cmrc2018 and drcd datasets
