Add MPNet Model #869

JunnYu · 2021-08-10T02:07:29Z

Submission for the PaddlePaddle Paper Reproduction Challenge (Round 4): a reproduction of MPNet: Masked and Permuted Pre-training for Language Understanding.

gongel

Please add docstrings for all your classes and methods which might be utilized by users. You can refer to paddlenlp.transformers.bert.

gongel · 2021-08-31T02:46:02Z

examples/language_model/mpnet/README.md

+[MPNet: Masked and Permuted Pre-training for Language Understanding - Microsoft Research](https://www.microsoft.com/en-us/research/publication/mpnet-masked-and-permuted-pre-training-for-language-understanding/)
+
+**摘要:**
+BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pretraining to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence and thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB text corpora) and fine-tune on a variety of down-streaming tasks (GLUE, SQuAD, etc). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-trained methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. The code and the pre-trained models are available at: https://github.com/microsoft/MPNet.


Please translate to Chinese.

gongel · 2021-08-31T02:46:16Z

examples/language_model/mpnet/README.md

+
+## 快速开始
+
+### 模型精度对齐


Please remove.

gongel · 2021-08-31T02:47:05Z

examples/language_model/mpnet/compare.py

@@ -0,0 +1,50 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.


Please delete compare.py

gongel

Thanks for your contribution and please give us feedback.

gongel · 2021-09-10T09:50:44Z

paddlenlp/transformers/mpnet/tokenizer.py

+    pretrained_resource_files_map = {
+        "vocab_file": {
+            "mpnet-base":
+            "https://paddlenlp.bj.bcebos.com/models/transformers/mpnet/mpnet-base/vocab.txt",


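The vocab file doesn't seem to be ready yet?

Also, the tokenizer returns token_type_ids by default, but MPNet doesn't need them.

The vocab file doesn't seem to be ready yet?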

done
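
For illustration only, here is a minimal sketch of the token_type_ids point raised in this thread. It is not the tokenizer from this PR: the `encode` helper and its `return_token_type_ids` flag are hypothetical names chosen for the example. The idea is that the segment ids are left out of the output unless the caller asks for them.

```python
def encode(token_ids, return_token_type_ids=False):
    """Toy encoder output: hypothetical helper, not the PaddleNLP API.

    MPNet has no segment embeddings, so token_type_ids are opt-in here.
    """
    encoded = {"input_ids": list(token_ids)}
    if return_token_type_ids:
        # All zeros: the model would not distinguish segments anyway.
        encoded["token_type_ids"] = [0] * len(token_ids)
    return encoded


default_output = encode([0, 5, 6, 2])
explicit_output = encode([0, 5, 6, 2], return_token_type_ids=True)
```

With the flag off, callers that only feed `input_ids` to the model never see an unused key.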

gongel · 2021-09-10T10:11:50Z

paddlenlp/transformers/mpnet/modeling.py

+        embedding_output = self.embeddings(input_ids, position_ids)
+
+        encoder_outputs, _ = self.encoder(embedding_output,
+                                          extended_attention_mask)


Rename extended_attention_mask to attention_mask.
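
As background for the mask passed to the encoder in the hunk above, here is a rough sketch of how BERT-style encoders commonly build an additive attention mask from padded input ids. This is a general pattern, not code taken from this PR; the function name, the pad id of `1`, and the `-1e9` fill value are all assumptions made for the example.

```python
def build_additive_attention_mask(input_ids, pad_token_id=1, neg=-1e9):
    """Return a [batch, 1, 1, seq_len] nested list.

    Padding positions get a large negative value so that softmax gives them
    near-zero attention weight; real tokens get 0.0. pad_token_id and neg are
    assumed values for illustration.
    """
    return [
        [[[neg if tok == pad_token_id else 0.0 for tok in row]]]
        for row in input_ids
    ]


# One sequence of four real tokens followed by two padding tokens.
mask = build_additive_attention_mask([[0, 5, 6, 2, 1, 1]])
```

The two singleton dimensions let the mask broadcast over attention heads and query positions when it is added to the attention scores.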

gongel · 2021-09-10T10:31:07Z

paddlenlp/transformers/mpnet/modeling.py

+        mpnet (:class:`MPNetModel`):
+            An instance of MPNetModel.
+        num_classes (int, optional):
+            The number of classes. Defaults to `2`.


The default values for the parameters below are missing.

yingyibiao

The image can be removed; keeping just the table is enough.

JunnYu · 2021-09-10T10:57:59Z

OK, I'll fix it shortly.

gongel

Please resolve the merge conflicts.

gongel · 2021-09-11T02:15:17Z

paddlenlp/transformers/mpnet/tokenizer.py

+            return len(cls + token_ids_0 + sep) * [0]
+        return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]
+
+    def __call__(self,


Please move this to right after def __init__.
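
For readers following the hunk above, the two `return` lines can be read as a standalone function. The sketch below reuses those two expressions; the function signature and the `cls_id`/`sep_id` values are placeholders assumed for the example, not values confirmed by this PR.

```python
def create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None,
                                         cls_id=0, sep_id=2):
    """All-zero segment ids for one sequence or a pair of sequences.

    Layout implied by the hunk: cls A sep, or cls A sep sep B sep.
    cls_id and sep_id are hypothetical ids used only for illustration.
    """
    cls, sep = [cls_id], [sep_id]
    if token_ids_1 is None:
        return len(cls + token_ids_0 + sep) * [0]
    return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]


single = create_token_type_ids_from_sequences([5, 6, 7])
pair = create_token_type_ids_from_sequences([5, 6, 7], [8, 9])
```

A single sequence gains two special tokens and a pair gains four, so the results here have lengths 5 and 9; every entry is 0 because both segments share the same type id.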

gongel

LGTM😊

gongel · 2021-09-11T03:46:53Z

paddlenlp/transformers/mpnet/modeling.py

@@ -0,0 +1,903 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.


Google AI Language Team?

I was lazy and copied it from another file; I'll fix it shortly.

JunnYu added 2 commits August 10, 2021 11:00

add mpnet

5a088b4

update

5331f7d

yingyibiao self-assigned this Aug 10, 2021

update tokenizer and update readme

3b214db

gongel self-requested a review August 31, 2021 02:34

gongel self-assigned this Aug 31, 2021

gongel requested changes Aug 31, 2021

JunnYu added 3 commits September 7, 2021 12:07

Merge branch 'develop' into add_mpnet

1686759

update readme & add docs

e073781

rm unused figure

5232e53

yingyibiao requested a review from gongel September 10, 2021 07:19

gongel requested changes Sep 10, 2021

yingyibiao reviewed Sep 10, 2021

update

74c5691

JunnYu requested review from gongel and yingyibiao September 10, 2021 13:10

gongel previously approved these changes Sep 11, 2021

JunnYu added 2 commits September 11, 2021 10:29

Merge branch 'develop' into add_mpnet

5a98758

update

37aef51

JunnYu dismissed gongel’s stale review via 37aef51 September 11, 2021 02:31

gongel previously approved these changes Sep 11, 2021

gongel reviewed Sep 11, 2021

update copyright

67953ef

JunnYu dismissed gongel’s stale review via 67953ef September 11, 2021 04:50

Merge branch 'develop' into add_mpnet

2ff2a03

gongel approved these changes Sep 11, 2021

gongel merged commit b9a4cb1 into PaddlePaddle:develop Sep 11, 2021
