@eric-haibin-lin eric-haibin-lin released this Mar 18, 2019 · 45 commits to master since this release

News

  • The GluonNLP tutorial proposal has been accepted at EMNLP 2019 (Hong Kong) and KDD 2019 (Anchorage).

Models and Scripts

  • BERT pre-training on BooksCorpus and English Wikipedia with mixed precision and gradient accumulation on GPUs. Fine-tuning the resulting checkpoint gave the following results on the validation sets (#482, #505, #489). Thank you @haven-jeon

      Dataset   MRPC     SQuAD 1.1 (EM/F1)   SST-2   MNLI-mm
      Score     87.99%   80.99/88.60         93%     83.6%
  • BERT fine-tuning on various sentence classification datasets with checkpoints converted from the official repository (#600, #571, #481). Thank you @kenjewu @haven-jeon

      Dataset   MRPC    RTE     SST-2   MNLI-m/mm
      Score     88.7%   70.8%   93%     84.55%/84.66%
  • BERT fine-tuning on question answering datasets with checkpoints converted from the official repository (#493). Thank you @fierceX

      Dataset   SQuAD 1.1        SQuAD 1.1         SQuAD 2.0
      Model     bert_12_768_12   bert_24_1024_16   bert_24_1024_16
      F1/EM     88.53/80.98      90.97/84.05       81.02/77.96
  • BERT model conversion scripts for checkpoints from the original TensorFlow repository, and more converted models (#456, #461, #449). Thank you @fierceX:

    • Multilingual Wikipedia (cased, BERT Base)
    • Chinese Wikipedia (cased, BERT Base)
    • BooksCorpus & English Wikipedia (uncased, BERT Large)
  • Scripts and command line interface for BERT embedding of raw sentences (#587, #618). Thank you @imgarylai

  • Scripts for exporting BERT model for deployment (#624)
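
The pre-training entry above mentions gradient accumulation. As a rough illustration of the idea only, and not code from the GluonNLP pre-training script, the sketch below uses a hypothetical one-parameter least-squares model in plain Python. It shows that averaging the gradients of equally sized micro-batches before a single update gives the same gradient as one large batch. The names `grad`, `accumulated_grad`, and the toy data are assumptions made for this example.

```python
# Hypothetical toy example: gradient accumulation for y ~ w * x with squared loss.
# This is an illustration of the technique, not the GluonNLP implementation.

def grad(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, micro_batches):
    """Average the per-micro-batch gradients.

    This equals the full-batch gradient when all micro-batches
    have the same size.
    """
    total = 0.0
    for micro_batch in micro_batches:
        total += grad(w, micro_batch)  # accumulate instead of updating
    return total / len(micro_batches)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]  # two micro-batches of size 2

w = 0.5
full = grad(w, data)                        # gradient of one large batch
accum = accumulated_grad(w, micro_batches)  # accumulated gradient
w_new = w - 0.1 * accum                     # single update after accumulation
```

Accumulating this way lets a large effective batch size fit on a GPU whose memory only holds one micro-batch at a time.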

New Features

  • [API] Add BERTVocab (#509) thanks @kenjewu
  • [API] Add Transforms for BERT (#526) thanks @kenjewu
  • [API] Add data parallel for Transformer (#387)
  • [FEATURE] Add SQuAD 2.0 dataset (#551) thanks @fierceX
  • [FEATURE] Add NumpyDataset (#498)
  • [FEATURE] Add TruncNorm initializer for BERT (#548) thanks @Ishitori
  • [FEATURE] Add split sampler for distributed training (#494)
  • [FEATURE] Custom metric for masked accuracy (#503)
  • [FEATURE] Support custom sampler in SimpleDatasetStream (#507)
  • [FEATURE] Clip gradient norm by parameter (#470)
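
For the masked accuracy metric listed above (#503), the general idea is presumably to score only the positions selected by a mask, as in masked language model pre-training where most tokens carry no label. The sketch below is a minimal plain-Python version under that assumption. The function name `masked_accuracy` and its signature are hypothetical and are not the GluonNLP API.

```python
def masked_accuracy(predictions, labels, mask):
    """Fraction of correct predictions among positions where mask is truthy.

    Hypothetical helper for illustration; returns 0.0 when nothing is masked in.
    """
    correct = 0
    counted = 0
    for pred, label, keep in zip(predictions, labels, mask):
        if keep:
            counted += 1
            correct += int(pred == label)
    return correct / counted if counted else 0.0

preds = [3, 1, 4, 1, 5]
labels = [3, 1, 0, 1, 9]
mask = [1, 0, 1, 1, 0]  # only positions 0, 2, and 3 are scored

acc = masked_accuracy(preds, labels, mask)  # 2 correct out of 3 scored
```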

Bug Fixes

  • [BUGFIX] Fix Data Preprocessing for Translation Data (#568)
  • [FIX] Fix parameter clip (#527)
  • [FIX] Fix divergence in Transformer training (#543)
  • [FIX] Fix documentation and a bug in NCE Block (#558)
  • [FIX] Fix hashing single ngrams in NGramHashes (#450)
  • [FIX] Fix weight dying in BERTModel.decoder for BERT pre-training (#500)
  • [BUGFIX] Modifying the FastText Classification training for accurate mean pooling (#529) thanks @sravanbabuiitm

API Changes

  • [API] BERT return intermediate encodings per layer (#606) thanks @Ishitori
  • [API] Better handle case when backoff is not possible in TokenEmbedding (#459)
  • [FIX] Rename wiki_cn/wiki_multilingual to wiki_cn_cased/wiki_multilingual_uncased (#594) thanks @kenjewu
  • [FIX] Update default value of BERTAdam epsilon to 1e-6 (#601)
  • [FIX] Fix BERT decoder API for masked language model prediction (#501)
  • [FIX] Remove bias correction term in BERTAdam (#499)
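
The two BERTAdam entries above change the default epsilon to 1e-6 and drop the bias correction term. To make concrete what dropping bias correction means, here is a hedged sketch of one Adam-style step on a single scalar, with the correction switchable. The function `adam_step` and its parameters are hypothetical names for this example, weight decay is omitted, and this is not the BERTAdam source.

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, bias_correction=False):
    """One Adam-style update of scalar weight w given gradient g.

    m and v are the running first and second moments, t is the 1-based step.
    With bias_correction=False the raw moments are used directly.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    if bias_correction:
        # Standard Adam rescales the zero-initialized moments.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        m_hat, v_hat = m, v
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w0, g = 1.0, 0.5
w_plain, _, _ = adam_step(w0, g, 0.0, 0.0, t=1)
w_corr, _, _ = adam_step(w0, g, 0.0, 0.0, t=1, bias_correction=True)
```

In this toy first step the two variants take different step sizes, which is why the choice matters when reproducing results from another implementation.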

Enhancements

  • [BUGFIX] Use glove.840B.300d for NLI experiments (#567)
  • [API] Add debug option for parallel (#584)
  • [FEATURE] Skip dropout layer in Transformer when rate=0 (#597) thanks @TaoLv
  • [FEATURE] Update sharded loader (#468)
  • [FIX] Update BERTLayerNorm Implementation (#485)
  • [TUTORIAL] Use FixedBucketSampler in BERT tutorial for better performance (#506) thanks @Ishitori
  • [API] Add BERT tokenizer to transforms.py (#464) thanks @fierceX
  • [FEATURE] Add data parallel to big RNN LM script (#564)

Minor Fixes
