
[Hackathon 4th No.102] Add support for a new model architecture to AutoConverter: AlbertModel #5626

Merged
merged 8 commits into PaddlePaddle:develop
Apr 14, 2023

Conversation

megemini
Contributor

PR types

New features

PR changes

APIs

Description

[Hackathon 4th No.102] Add support for a new model architecture to AutoConverter: AlbertModel

The Hackathon 4th No.102 task covers 5 models. I plan to submit a separate PR for each model; this PR handles the albert model.

  • Test model used: hf-internal-testing/tiny-random-AlbertModel

A few issues to report:

  • The AlbertForQuestionAnswering model cannot be tested; this model in transformers differs slightly from the one in paddlenlp:

    class AlbertForQuestionAnswering(AlbertPreTrainedModel):
        _keys_to_ignore_on_load_unexpected = [r"pooler"]
    
        def __init__(self, config: AlbertConfig):
            super().__init__(config)
            self.num_labels = config.num_labels
    
            self.albert = AlbertModel(config, add_pooling_layer=False)  # note this
            self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
    
            # Initialize weights and apply final processing
            self.post_init()  

    The transformers albert does not add a pooling layer, while the Paddle one does, so the name mapping cannot be done.
    How should this be handled? (See the sketch after this list.)

  • As with BertModel, AlbertForMaskedLM and AlbertForPretraining likewise cannot be tested.
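
A minimal sketch of how the mismatch shows up (the transformers calls are standard; the checkpoint name is illustrative, since this PR only confirms hf-internal-testing/tiny-random-AlbertModel):

# Hedged sketch, not part of this PR: AlbertForQuestionAnswering builds
# AlbertModel with add_pooling_layer=False, so its torch state dict ships no
# pooler weights, and there is nothing for the Paddle pooler to map to.
from transformers import AlbertForQuestionAnswering

hf_model = AlbertForQuestionAnswering.from_pretrained(
    "hf-internal-testing/tiny-random-AlbertForQuestionAnswering"  # illustrative name
)
pooler_keys = [k for k in hf_model.state_dict() if "pooler" in k]
print(pooler_keys)  # expected: [] -- no pooler weights in the checkpoint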

@wj-Mcat Please review, thanks! :)

@paddle-bot

paddle-bot bot commented Apr 12, 2023

Thanks for your contribution!

@codecov

codecov bot commented Apr 12, 2023

Codecov Report

Merging #5626 (af80b45) into develop (413fd2f) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #5626      +/-   ##
===========================================
+ Coverage    59.58%   59.59%   +0.01%     
===========================================
  Files          483      483              
  Lines        68102    68121      +19     
===========================================
+ Hits         40581    40600      +19     
  Misses       27521    27521              
Impacted Files Coverage Δ
paddlenlp/transformers/albert/modeling.py 86.19% <100.00%> (+0.61%) ⬆️


@megemini
Contributor Author

I just looked into why PaddleNLP-CI failed; it seems some other test cases are failing?

@wj-Mcat
Contributor

wj-Mcat commented Apr 13, 2023

I just looked into why PaddleNLP-CI failed; it seems some other test cases are failing?

Yes, you can ignore that for now.

@wj-Mcat
Contributor

wj-Mcat commented Apr 13, 2023

The transformers albert does not add a pooling layer, while the Paddle one does, so the name mapping cannot be done. How should this be handled?

In the transformers source code, add_pooling_layer also controls whether the pooler is added, so a pooler layer may be present. For now, add the pooler mapping here anyway; it will just throw a warning.

Handle AlbertForMaskedLM and AlbertForPretraining the same way.
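
A rough sketch of this suggestion (model_mappings is the mapping list assumed inside _get_name_mappings; the triples follow the same shape used later in this thread):

# Hedged sketch: include the pooler entries unconditionally; when the torch
# checkpoint was exported with add_pooling_layer=False, the converter only
# warns about the unmatched keys instead of failing the name mapping.
model_mappings.extend(
    [
        ["pooler.weight", "pooler.weight", "transpose"],
        ["pooler.bias", "pooler.bias"],
    ]
)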

@@ -357,6 +358,118 @@ class AlbertPretrainedModel(PretrainedModel):
pretrained_init_configuration = ALBERT_PRETRAINED_INIT_CONFIGURATION
pretrained_resource_files_map = ALBERT_PRETRAINED_RESOURCE_FILES_MAP

@classmethod
def _get_name_mappings(cls, config: AlbertConfig) -> List[StateDictNameMapping]:
mappings: list[StateDictNameMapping] = []
Contributor

This variable is defined here but never used, so it can be removed.

# ("AlbertForMaskedLM",), TODO: need to tie weights
# ("AlbertForPretraining",), TODO: need to tie weights
("AlbertForMultipleChoice",),
# ("AlbertForQuestionAnswering",), TODO: transformers NOT add the last pool layer before qa_outputs
Contributor

The pooler mapping can be controlled via architectures.
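
A hedged sketch of this idea (it assumes config.architectures is populated from the checkpoint's config.json, as in the config shown later in this thread; this is not the code that was merged):

# Sketch only: gate the pooler mapping on the exported architecture, since
# AlbertForQuestionAnswering builds AlbertModel with add_pooling_layer=False
# and therefore ships no pooler weights to map.
architectures = getattr(config, "architectures", None) or []
if "AlbertForQuestionAnswering" not in architectures:
    model_mappings.extend(
        [
            ["pooler.weight", "pooler.weight", "transpose"],
            ["pooler.bias", "pooler.bias"],
        ]
    )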

Comment on lines +429 to +430
# ("AlbertForMaskedLM",), TODO: need to tie weights
# ("AlbertForPretraining",), TODO: need to tie weights
Contributor

Once #5623 is merged, tie_weights can be used here.

@megemini
Contributor Author

  1. I found why AlbertForQuestionAnswering previously failed the tests: it was not a pool-layer issue, but a mistake in the earlier name mapping. It has been fixed, and the unit test now passes.
  2. Regarding add_pooling_layer, this parameter is hard-coded in the transformers code:
class AlbertForQuestionAnswering(AlbertPreTrainedModel):
  _keys_to_ignore_on_load_unexpected = [r"pooler"]

  def __init__(self, config: AlbertConfig):
      super().__init__(config)
      self.num_labels = config.num_labels

      self.albert = AlbertModel(config, add_pooling_layer=False)  # note this
      self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

      # Initialize weights and apply final processing
      self.post_init()  

and the transformers config.json does not contain this option either:

{
"_name_or_path": "tiny_models/albert/AlbertForQuestionAnswering",
"architectures": [
  "AlbertForQuestionAnswering"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 2,
"classifier_dropout_prob": 0.1,
"embedding_size": 128,
"eos_token_id": 3,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 36,
"initializer_range": 0.02,
"inner_group_num": 1,
"intermediate_size": 37,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "albert",
"num_attention_heads": 6,
"num_hidden_groups": 6,
"num_hidden_layers": 6,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.28.0.dev0",
"type_vocab_size": 16,
"vocab_size": 30000
}

whereas in paddlenlp.transformers it is a config option:

class AlbertModel(AlbertPretrainedModel):
  def __init__(self, config: AlbertConfig):
      ...
      if config.add_pooling_layer:  # note this
          self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
          self.pooler_activation = nn.Tanh()
      else:
          self.pooler = None
          self.pooler_activation = None

      self.init_weights()

When the model is loaded with from_pretrained, add_pooling_layer defaults to True.

This is usually fine, because the model's logits output does not go through the pool layer. The problem is that the pool layer is then randomly initialized, so if someone relies on its output, the result is unpredictable.

So I added a check for the pool layer:

      if config.add_pooling_layer:
          model_mappings.extend(
              [
                  ["pooler.weight", "pooler.weight", "transpose"],
                  ["pooler.bias", "pooler.bias"],
              ]
          )

Could you please take a look and see whether this approach is OK?

Thanks!
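
For context, a condensed and purely illustrative sketch of where the conditional fragment above would sit inside _get_name_mappings (signature taken from the diff; all entries other than the pooler check are placeholders, and the final wrapping into StateDictNameMapping objects may differ from the merged code):

# Illustrative sketch only -- the real mapping list in the PR is much longer.
@classmethod
def _get_name_mappings(cls, config: AlbertConfig) -> List[StateDictNameMapping]:
    model_mappings = [
        ["embeddings.word_embeddings.weight", "embeddings.word_embeddings.weight"],
        # ... remaining embedding / encoder entries elided ...
    ]
    if config.add_pooling_layer:
        model_mappings.extend(
            [
                ["pooler.weight", "pooler.weight", "transpose"],
                ["pooler.bias", "pooler.bias"],
            ]
        )
    # Wrap the raw entries into StateDictNameMapping objects (the exact
    # constructor arguments follow the existing PaddleNLP convention and may
    # differ from this sketch).
    return [StateDictNameMapping(*mapping) for mapping in model_mappings]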

@sijunhe
Collaborator

sijunhe commented Apr 13, 2023

AlbertForMaskedLM and AlbertForPretraining are not needed for now; just handling AlbertForQuestionAnswering is enough.

@megemini
Contributor Author

AlbertForMaskedLM and AlbertForPretraining are not needed for now; just handling AlbertForQuestionAnswering is enough.

AlbertForQuestionAnswering is already handled~

@sijunhe
Collaborator

sijunhe commented Apr 14, 2023

You'll need to merge develop into this branch.

@megemini
Contributor Author

You'll need to merge develop into this branch.

Done. Please review, thanks!

@sijunhe
Collaborator

sijunhe commented Apr 14, 2023

Could you please take a look and see whether this approach is OK?

Thanks!

No problem.

@sijunhe sijunhe dismissed wj-Mcat’s stale review April 14, 2023 09:40

PR is good now

Collaborator

@sijunhe sijunhe left a comment

lgtm

@sijunhe sijunhe merged commit 65db5e5 into PaddlePaddle:develop Apr 14, 2023
4 checks passed