This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[SCRIPT] Reproduce MNLI tasks based on BERT #571

Merged: 14 commits merged into dmlc:master on Feb 19, 2019

Conversation

vanewu
Contributor

@vanewu vanewu commented Jan 26, 2019

Description

Fine-tune on the MNLI task using the pre-trained BERT model, and add the ability to load and save checkpoint files.
This task is built on top of #481. Thanks to everyone for their work.

Results [log]:
On dev_matched.tsv : 0.846
On dev_mismatched.tsv : 0.847

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

Comments

  • I will try finer-grained hyperparameters with different BERT models and update the log.

@vanewu vanewu requested a review from szha as a code owner January 26, 2019 10:39
@eric-haibin-lin eric-haibin-lin added the release focus Progress focus for release label Jan 27, 2019
@codecov

codecov bot commented Jan 27, 2019

Codecov Report

Merging #571 into master will decrease coverage by 0.38%.
The diff coverage is 6.21%.

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
- Coverage   71.59%   71.21%   -0.39%     
==========================================
  Files         125      126       +1     
  Lines       10595    10665      +70     
==========================================
+ Hits         7586     7595       +9     
- Misses       3009     3070      +61
Flag Coverage Δ
#PR564 ?
#PR567 ?
#PR571 71.21% <6.21%> (?)
#master ?
#notserial 48.09% <6.21%> (-0.25%) ⬇️
#py2 70.98% <6.21%> (-0.39%) ⬇️
#py3 71.07% <6.21%> (-0.39%) ⬇️
#serial 55.07% <6.21%> (-0.29%) ⬇️

@codecov

codecov bot commented Jan 27, 2019

Codecov Report

Merging #571 into master will decrease coverage by 1.3%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
- Coverage   65.85%   64.55%   -1.31%     
==========================================
  Files         147      148       +1     
  Lines       13090    13803     +713     
==========================================
+ Hits         8621     8910     +289     
- Misses       4469     4893     +424
Flag Coverage Δ
#PR420 71.41% <100%> (+4.25%) ⬆️
#PR435 71.45% <100%> (+0.05%) ⬆️
#PR466 72.05% <100%> (+0.03%) ⬆️
#PR493 67.22% <100%> (+0.75%) ⬆️
#PR505 68.18% <100%> (?)
#PR529 71.37% <100%> (-0.04%) ⬇️
#PR539 69.8% <100%> (-0.04%) ⬇️
#PR571 71.22% <100%> (?)
#PR573 ?
#PR582 ?
#PR587 ?
#PR588 72.02% <100%> (+1%) ⬆️
#master 71.3% <100%> (-0.11%) ⬇️
#notserial 40.3% <100%> (-1.55%) ⬇️
#py2 62.51% <100%> (-2.43%) ⬇️
#py3 64.44% <100%> (-1.28%) ⬇️
#serial 51.1% <100%> (-1.23%) ⬇️

@eric-haibin-lin
Member

What is the command to reproduce the result?

dataset = 'book_corpus_wiki_en_uncased'
bert, vocabulary = bert_12_768_12(dataset_name=dataset,
                                  pretrained=True, ctx=ctx, use_pooler=True,
                                  use_decoder=False, use_classifier=False)
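
As an aside, here is a minimal sketch of how the pooled output from the snippet above can be wrapped into a three-way MNLI classifier. The class below is illustrative only and may differ from the wrapper actually defined in finetune_classifier.py.

# Sketch only: dropout + dense head on top of BERT's pooled [CLS] output.
from mxnet.gluon import nn, Block

class BERTClassifier(Block):
    """Illustrative wrapper; not necessarily the class used in the script."""
    def __init__(self, bert, num_classes=3, dropout=0.1, prefix=None, params=None):
        super(BERTClassifier, self).__init__(prefix=prefix, params=params)
        self.bert = bert
        with self.name_scope():
            self.classifier = nn.HybridSequential()
            self.classifier.add(nn.Dropout(rate=dropout))
            self.classifier.add(nn.Dense(units=num_classes))

    def forward(self, inputs, token_types, valid_length=None):
        # With use_pooler=True the backbone returns (sequence_output, pooled_output).
        _, pooled = self.bert(inputs, token_types, valid_length)
        return self.classifier(pooled)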
Member

Can we also add an option to load from a specific checkpoint file? For example, users can specify --load_checkpoint ~/.mxnet/bert/xxx.params

Member

@eric-haibin-lin
GLUE tasks only require 3 to 5 epochs of fine-tuning, so I don't think checkpointing is needed at the moment.

Member

Good question. What I mean is that if someone uses our pre-training script to further pre-train on their own dataset (work in progress, not merged yet), they will want to load that BERT checkpoint and fine-tune it on other tasks.

Contributor Author

Yes, adding this option would make the script useful in that more specific situation, and it is easy to implement. I can add it later.
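
A rough sketch of what such an option could look like; the flag name follows the later "model_parameters" commit messages in this PR, but the exact interface of the merged script may differ.

# Sketch only; flag name and helper functions are assumptions, not the merged code.
import argparse

def add_checkpoint_args(parser):
    """Expose a flag for resuming from a previously saved checkpoint."""
    parser.add_argument('--model_parameters', type=str, default=None,
                        help='Load a fine-tuned checkpoint before training, '
                             'e.g. ~/.mxnet/bert/xxx.params')
    return parser

def maybe_load_checkpoint(model, args, ctx):
    """If --model_parameters was given, restore all parameters from that file."""
    if args.model_parameters:
        model.load_parameters(args.model_parameters, ctx=ctx)
    return model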

@haven-jeon
Member

Can you show training logs? 😃

@vanewu
Contributor Author

vanewu commented Jan 29, 2019

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

@mli
Member

mli commented Jan 30, 2019

Job PR-571/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/3/index.html

@mli
Member

mli commented Jan 30, 2019

Job PR-571/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/4/index.html

@haven-jeon
Member

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@mli
Member

mli commented Feb 4, 2019

Job PR-571/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/5/index.html

@vanewu
Contributor Author

vanewu commented Feb 12, 2019

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@haven-jeon I could not find the value of this parameter in the original paper. For fine-tuning I set max_len to 80 because, from rough statistics on the data used for the MNLI task, samples of length 80 cover about 98% of the data set.
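
A small sketch of how such a coverage statistic can be computed; the TSV column names and the tokenizer argument are assumptions about the GLUE MNLI layout, not code from this PR.

# Sketch only: estimate the fraction of MNLI pairs that fit in max_len BERT tokens.
import csv

def length_coverage(tsv_path, tokenize, max_len=80):
    """Fraction of premise/hypothesis pairs within max_len tokens (assumed columns)."""
    total, within = 0, 0
    with open(tsv_path, encoding='utf-8') as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        header = next(reader)
        i1, i2 = header.index('sentence1'), header.index('sentence2')
        for row in reader:
            # [CLS] premise [SEP] hypothesis [SEP] adds 3 special tokens.
            n_tokens = len(tokenize(row[i1])) + len(tokenize(row[i2])) + 3
            total += 1
            within += int(n_tokens <= max_len)
    return within / total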

@mli
Member

mli commented Feb 13, 2019

Job PR-571/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/6/index.html

@mli
Member

mli commented Feb 13, 2019

Job PR-571/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/2/index.html

@haven-jeon
Member

Can you show training logs?

Of course, I will open a PR later to upload the log to dmlc/web-data. Then link it here.

I just checked; max_len seems to be an important hyper-parameter for MNLI. Is that right?

@haven-jeon I could not find the value of this parameter in the original paper. For fine-tuning I set max_len to 80 because, from rough statistics on the data used for the MNLI task, samples of length 80 cover about 98% of the data set.

@kenjewu max_len=80 looks like a good hyper-parameter choice; with a smaller value, performance drops. Thanks for your explanation.

(Review threads on scripts/bert/finetune_classifier.py resolved.)
@mli
Member

mli commented Feb 14, 2019

Job PR-571/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/7/index.html

Update description of model_parameters

fix context
@mli
Member

mli commented Feb 14, 2019

Job PR-571/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/9/index.html

@eric-haibin-lin
Member

@haven-jeon could you help do another round of review and approve/request changes?

@mli
Member

mli commented Feb 18, 2019

Job PR-571/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/10/index.html

@eric-haibin-lin
Member

@kenjewu are you able to reproduce the result with the latest commit? I am only able to get 84.1/84.4 using commit dac3940

@vanewu
Contributor Author

vanewu commented Feb 19, 2019

@kenjewu are you able to reproduce the result with the latest commit? I am only able to get 84.1/84.4 using commit dac3940

ok, I will try it again.

@vanewu
Contributor Author

vanewu commented Feb 19, 2019

In this task, setting the optimizer epsilon to 1e-8 gives better results.
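
For reference, a sketch of where that epsilon enters the optimizer setup. Plain 'adam' is used here for illustration and the other hyper-parameters are placeholders; the script's actual optimizer and values may differ.

# Sketch only: epsilon is the small constant in the Adam update denominator.
import mxnet as mx

def make_trainer(model, lr=2e-5, epsilon=1e-8):
    # learning_rate and wd below are placeholder values, not the script's settings.
    return mx.gluon.Trainer(model.collect_params(), 'adam',
                            {'learning_rate': lr, 'epsilon': epsilon, 'wd': 0.01})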

@mli
Member

mli commented Feb 19, 2019

Job PR-571/11 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/11/index.html

@mli
Member

mli commented Feb 19, 2019

Job PR-571/12 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-571/12/index.html

@eric-haibin-lin eric-haibin-lin merged commit 97896c6 into dmlc:master Feb 19, 2019
@eric-haibin-lin
Member

Merged. Many thanks

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* first commit MNLI

* tune some params

* add double dev for script

* retrigger ci

* Add MNLI based on dmlc#481

Add MNLI based on dmlc#481

fix MNLIDataset

* Add load/save checkpoint

* fix dev dataloader bug

* add load pretrained bert checkpoints and update index.rst

* fix the loading of parameters

* Update description of model_parameters

Update description of model_parameters

fix context

* add epsilon argument

* reset default value of epsilon
Labels: release focus (Progress focus for release)
6 participants