
[ErnieDoc P0] add PretrainedConfig and unit test #5210

Merged 3 commits into develop on Mar 14, 2023

Conversation

ZwhElliott
Contributor

PR types

PR changes

Description

@paddle-bot
paddle-bot bot commented Mar 13, 2023

Thanks for your contribution!

@@ -75,6 +70,8 @@ class ErnieDocTokenizer(ErnieTokenizer):
"ernie-doc-base-zh": {"do_lower_case": True},
}

# max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
Contributor Author

The PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES entry for this pretrained model is missing here, so its max_model_input_sizes is inherited from ErnieTokenizer, which makes test_pretrained_model_lists fail.
However, I could not find the POSITIONAL_EMBEDDINGS_SIZES for the "ernie-doc-base-zh" model in the documentation, so I have not added it yet.

Collaborator

Just align this with max_position_embeddings in the configurations.
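
A minimal sketch of what that alignment could look like inside ErnieDocTokenizer. The 512 value below is an assumption for illustration; the actual number should be copied from max_position_embeddings in the ErnieDoc configuration:

# Hypothetical fix, inside ErnieDocTokenizer: mirror max_position_embeddings
# from configuration.py so the class stops inheriting ErnieTokenizer's sizes
# and test_pretrained_model_lists passes. 512 is an assumed value.
max_model_input_sizes = {"ernie-doc-base-zh": 512}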

Collaborator
@sijunhe left a comment

Please keep d_model as is rather than renaming it to hidden_size.
The PR quality is quite high; only minor changes are needed.
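
For reference, one way a PretrainedConfig subclass can keep a model-specific name like d_model while still answering to the standard one is an attribute_map alias. This is a sketch only, assuming PaddleNLP's PretrainedConfig supports attribute_map the way its HuggingFace counterpart does, and is not necessarily what this PR ends up doing:

from paddlenlp.transformers.configuration_utils import PretrainedConfig

class ErnieDocConfig(PretrainedConfig):
    model_type = "ernie_doc"
    # config.hidden_size transparently reads/writes config.d_model,
    # so d_model can stay as the canonical attribute name.
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=768, **kwargs):
        super().__init__(**kwargs)
        self.d_model = d_model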

class ErnieTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    tokenizer_class = ErnieDocTokenizer
    # fast_tokenizer_class = ErnieFastTokenizer
Collaborator

There is no fast tokenizer, so this line can be deleted.

@codecov

codecov bot commented Mar 14, 2023

Codecov Report

Merging #5210 (75eb5f4) into develop (c7395fb) will increase coverage by 0.75%.
The diff coverage is 100.00%.

❗ Current head 75eb5f4 differs from pull request most recent head d9f1207. Consider uploading reports for the commit d9f1207 to get more accurate results

@@             Coverage Diff             @@
##           develop    #5210      +/-   ##
===========================================
+ Coverage    50.93%   51.69%   +0.75%     
===========================================
  Files          461      467       +6     
  Lines        65731    66629     +898     
===========================================
+ Hits         33481    34444     +963     
+ Misses       32250    32185      -65     
Impacted Files Coverage Δ
paddlenlp/transformers/__init__.py 100.00% <100.00%> (ø)
paddlenlp/transformers/ernie_doc/configuration.py 100.00% <100.00%> (ø)
paddlenlp/transformers/ernie_doc/modeling.py 96.55% <100.00%> (+77.18%) ⬆️
paddlenlp/transformers/ernie_doc/tokenizer.py 89.47% <100.00%> (+2.51%) ⬆️

... and 25 files with indirect coverage changes


Comment on lines 303 to 305

    std=self.initializer_range
    if hasattr(self, "initializer_range")
-   else self.ernie_doc.config["initializer_range"],
+   else self.ernie_doc.initializer_range,
Collaborator

std=self.config.initializer_range
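
With the migration to PretrainedConfig, the hasattr fallback can drop out entirely, since every module reaches the shared config object. A sketch of that pattern, following the usual PaddleNLP init_weights convention rather than this PR's exact code:

import paddle
import paddle.nn as nn

# Method of the pretrained-model base class; assumes self.config is the
# ErnieDocConfig instance attached by the framework.
def init_weights(self, layer):
    if isinstance(layer, (nn.Linear, nn.Embedding)):
        # Draw fresh weights from N(0, initializer_range^2).
        layer.weight.set_value(
            paddle.tensor.normal(
                mean=0.0,
                std=self.config.initializer_range,
                shape=layer.weight.shape,
            )
        )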

Collaborator
@sijunhe left a comment

lgtm!
