This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[SCRIPT] Reproduce MNLI tasks based on BERT #571

Merged: 14 commits merged into dmlc:master on Feb 19, 2019

Conversation

vanewu
Contributor

@vanewu vanewu commented Jan 26, 2019

Description

Fine-tune on the MNLI task using the pre-trained BERT model, and add the ability to load and save checkpoint files.
This task is built on top of #481. Thanks to everyone for their work.

Results [log]:
On dev_matched.tsv : 0.846
On dev_mismatched.tsv : 0.847

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

Comments

  • I will try finer-grained hyperparameters with different BERT models and update the log.

@vanewu vanewu requested a review from szha as a code owner January 26, 2019 10:39
@eric-haibin-lin eric-haibin-lin added the release focus Progress focus for release label Jan 27, 2019
@codecov

codecov bot commented Jan 27, 2019

Codecov Report

Merging #571 into master will decrease coverage by 0.38%.
The diff coverage is 6.21%.

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
- Coverage   71.59%   71.21%   -0.39%     
==========================================
  Files         125      126       +1     
  Lines       10595    10665      +70     
==========================================
+ Hits         7586     7595       +9     
- Misses       3009     3070      +61
Flag Coverage Δ
#PR564 ?
#PR567 ?
#PR571 71.21% <6.21%> (?)
#master ?
#notserial 48.09% <6.21%> (-0.25%) ⬇️
#py2 70.98% <6.21%> (-0.39%) ⬇️
#py3 71.07% <6.21%> (-0.39%) ⬇️
#serial 55.07% <6.21%> (-0.29%) ⬇️

@codecov

codecov bot commented Jan 27, 2019

Codecov Report

Merging #571 into master will decrease coverage by 1.3%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
- Coverage   65.85%   64.55%   -1.31%     
==========================================
  Files         147      148       +1     
  Lines       13090    13803     +713     
==========================================
+ Hits         8621     8910     +289     
- Misses       4469     4893     +424
Flag Coverage Δ
#PR420 71.41% <100%> (+4.25%) ⬆️
#PR435 71.45% <100%> (+0.05%) ⬆️
#PR466 72.05% <100%> (+0.03%) ⬆️
#PR493 67.22% <100%> (+0.75%) ⬆️
#PR505 68.18% <100%> (?)
#PR529 71.37% <100%> (-0.04%) ⬇️
#PR539 69.8% <100%> (-0.04%) ⬇️
#PR571 71.22% <100%> (?)
#PR573 ?
#PR582 ?
#PR587 ?
#PR588 72.02% <100%> (+1%) ⬆️
#master 71.3% <100%> (-0.11%) ⬇️
#notserial 40.3% <100%> (-1.55%) ⬇️
#py2 62.51% <100%> (-2.43%) ⬇️
#py3 64.44% <100%> (-1.28%) ⬇️
#serial 51.1% <100%> (-1.23%) ⬇️

@eric-haibin-lin
Member

What is the command to reproduce the result?

dataset = 'book_corpus_wiki_en_uncased'
bert, vocabulary = bert_12_768_12(dataset_name=dataset,
                                  pretrained=True, ctx=ctx, use_pooler=True,
                                  use_decoder=False, use_classifier=False)
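
As an aside, here is a minimal sketch of how the pooled output from the snippet above can be wrapped into a three-way MNLI classifier. The class below is illustrative only and may differ from the wrapper actually defined in finetune_classifier.py.

# Sketch only: dropout + dense head on top of BERT's pooled [CLS] output.
from mxnet.gluon import nn, Block

class BERTClassifier(Block):
    """Illustrative wrapper; not necessarily the class used in the script."""
    def __init__(self, bert, num_classes=3, dropout=0.1, prefix=None, params=None):
        super(BERTClassifier, self).__init__(prefix=prefix, params=params)
        self.bert = bert
        with self.name_scope():
            self.classifier = nn.HybridSequential()
            self.classifier.add(nn.Dropout(rate=dropout))
            self.classifier.add(nn.Dense(units=num_classes))

    def forward(self, inputs, token_types, valid_length=None):
        # With use_pooler=True the backbone returns (sequence_output, pooled_output).
        _, pooled = self.bert(inputs, token_types, valid_length)
        return self.classifier(pooled)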
Member

Can we also add an option to load from a specific checkpoint file? For example, users can specify --load_checkpoint ~/.mxnet/bert/xxx.params

Member

@eric-haibin-lin
GLUE tasks only require 3 to 5 epochs of fine-tuning, so I don't think checkpointing is needed at the moment.

Member

Good question. What I mean is that if someone uses our pre-training script to further pre-train on their own dataset (work in progress, not merged yet), they will want to load that BERT checkpoint and fine-tune it on other tasks.

Contributor Author

Yes, adding this option would make the script useful in that more specific situation, and it is easy to implement. I can add it later.
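
A rough sketch of what such an option could look like; the flag name follows the later "model_parameters" commit messages in this PR, but the exact interface of the merged script may differ.

# Sketch only; flag name and helper functions are assumptions, not the merged code.
import argparse

def add_checkpoint_args(parser):
    """Expose a flag for resuming from a previously saved checkpoint."""
    parser.add_argument('--model_parameters', type=str, default=None,
                        help='Load a fine-tuned checkpoint before training, '
                             'e.g. ~/.mxnet/bert/xxx.params')
    return parser

def maybe_load_checkpoint(model, args, ctx):
    """If --model_parameters was given, restore all parameters from that file."""
    if args.model_parameters:
        model.load_parameters(args.model_parameters, ctx=ctx)
    return model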

@haven-jeon
Member

Can you show training logs? 😃

@vanewu
Contributor Author

vanewu commented Jan 29, 2019

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

@mli
Member

mli commented Jan 30, 2019

Job PR-571/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/3/index.html

@mli
Member

mli commented Jan 30, 2019

Job PR-571/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/4/index.html

@haven-jeon
Member

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@mli
Member

mli commented Feb 4, 2019

Job PR-571/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/5/index.html

@vanewu
Contributor Author

vanewu commented Feb 12, 2019

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@haven-jeon I could not find the value of this parameter in the original paper. For fine-tuning I set max_len to 80 because, from rough statistics on the data used for the MNLI task, samples of length 80 cover about 98% of the data set.
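
A small sketch of how such a coverage statistic can be computed; the TSV column names and the tokenizer argument are assumptions about the GLUE MNLI layout, not code from this PR.

# Sketch only: estimate the fraction of MNLI pairs that fit in max_len BERT tokens.
import csv

def length_coverage(tsv_path, tokenize, max_len=80):
    """Fraction of premise/hypothesis pairs within max_len tokens (assumed columns)."""
    total, within = 0, 0
    with open(tsv_path, encoding='utf-8') as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        header = next(reader)
        i1, i2 = header.index('sentence1'), header.index('sentence2')
        for row in reader:
            # [CLS] premise [SEP] hypothesis [SEP] adds 3 special tokens.
            n_tokens = len(tokenize(row[i1])) + len(tokenize(row[i2])) + 3
            total += 1
            within += int(n_tokens <= max_len)
    return within / total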

@mli
Member

mli commented Feb 13, 2019

Job PR-571/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/6/index.html

@mli
Member

mli commented Feb 13, 2019

Job PR-571/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/2/index.html

@haven-jeon
Member

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@haven-jeon I could not find the value of this parameter in the original paper. For fine-tuning I set max_len to 80 because, from rough statistics on the data used for the MNLI task, samples of length 80 cover about 98% of the data set.

@kenjewu max_len=80 looks like a good hyper-parameter choice; with a smaller value, performance drops. Thanks for your explanation.

(Review threads on scripts/bert/finetune_classifier.py resolved.)
@mli
Member

mli commented Feb 14, 2019

Job PR-571/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/7/index.html

Update description of model_parameters

fix context
@mli
Member

mli commented Feb 14, 2019

Job PR-571/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/9/index.html

@eric-haibin-lin
Member

@haven-jeon could you help do another round of review and approve/request changes?

@mli
Member

mli commented Feb 18, 2019

Job PR-571/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/10/index.html

@eric-haibin-lin
Member

@kenjewu are you able to reproduce the result with the latest commit? I am only able to get 84.1/84.4 using commit dac3940

@vanewu
Contributor Author

vanewu commented Feb 19, 2019

@kenjewu are you able to reproduce the result with the latest commit? I am only able to get 84.1/84.4 using commit dac3940

ok, I will try it again.

@vanewu
Contributor Author

vanewu commented Feb 19, 2019

In this task, setting the optimizer epsilon to 1e-8 gives better results.
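
For reference, a sketch of where that epsilon enters the optimizer setup. Plain 'adam' is used here for illustration and the other hyper-parameters are placeholders; the script's actual optimizer and values may differ.

# Sketch only: epsilon is the small constant in the Adam update denominator.
import mxnet as mx

def make_trainer(model, lr=2e-5, epsilon=1e-8):
    # learning_rate and wd below are placeholder values, not the script's settings.
    return mx.gluon.Trainer(model.collect_params(), 'adam',
                            {'learning_rate': lr, 'epsilon': epsilon, 'wd': 0.01})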

@mli
Member

mli commented Feb 19, 2019

Job PR-571/11 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/11/index.html

@mli
Member

mli commented Feb 19, 2019

Job PR-571/12 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/12/index.html

@eric-haibin-lin eric-haibin-lin merged commit 97896c6 into dmlc:master Feb 19, 2019
@eric-haibin-lin
Member

Merged. Many thanks

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* first commit MNLI

* tune some params

* add double dev for script

* retrigger ci

* Add MNLI based on dmlc#481

Add MNLI based on dmlc#481

fix MNLIDataset

* Add load/save checkpoint

* fix dev dataloader bug

* add load pretrained bert checkpoints and update index.rst

* fix the loading of parameters

* Update description of model_parameters

Update description of model_parameters

fix context

* add epsilon argument

* reset default value of epsilon
Labels: release focus (Progress focus for release)
6 participants