
[Hackathon 4th No.102] Add support for a new model architecture to AutoConverter: AlbertModel #5626

Merged
merged 8 commits into PaddlePaddle:develop
Apr 14, 2023

Conversation

megemini
Contributor

PR types

New features

PR changes

APIs

Description

[Hackathon 4th No.102] Add support for a new model architecture to AutoConverter: AlbertModel

The Hackathon 4th No.102 task covers 5 models. I plan to submit a separate PR for each model; this PR handles the albert model.

  • Test model used: hf-internal-testing/tiny-random-AlbertModel

A few issues to report:

  • The AlbertForQuestionAnswering model cannot be tested; this model in transformers differs slightly from the one in paddlenlp:

    class AlbertForQuestionAnswering(AlbertPreTrainedModel):
        _keys_to_ignore_on_load_unexpected = [r"pooler"]
    
        def __init__(self, config: AlbertConfig):
            super().__init__(config)
            self.num_labels = config.num_labels
    
            self.albert = AlbertModel(config, add_pooling_layer=False)  # note this
            self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
    
            # Initialize weights and apply final processing
            self.post_init()  

    The transformers albert does not add a pooling layer, while the Paddle one does, so the name mapping cannot be done.
    How should this be handled? (See the sketch after this list.)

  • As with BertModel, AlbertForMaskedLM and AlbertForPretraining likewise cannot be tested.
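
A minimal sketch of how the mismatch shows up (the transformers calls are standard; the checkpoint name is illustrative, since this PR only confirms hf-internal-testing/tiny-random-AlbertModel):

# Hedged sketch, not part of this PR: AlbertForQuestionAnswering builds
# AlbertModel with add_pooling_layer=False, so its torch state dict ships no
# pooler weights, and there is nothing for the Paddle pooler to map to.
from transformers import AlbertForQuestionAnswering

hf_model = AlbertForQuestionAnswering.from_pretrained(
    "hf-internal-testing/tiny-random-AlbertForQuestionAnswering"  # illustrative name
)
pooler_keys = [k for k in hf_model.state_dict() if "pooler" in k]
print(pooler_keys)  # expected: [] -- no pooler weights in the checkpoint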

@wj-Mcat Please review, thanks! :)

@paddle-bot

paddle-bot bot commented Apr 12, 2023

Thanks for your contribution!

@codecov

codecov bot commented Apr 12, 2023

Codecov Report

Merging #5626 (af80b45) into develop (413fd2f) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #5626      +/-   ##
===========================================
+ Coverage    59.58%   59.59%   +0.01%     
===========================================
  Files          483      483              
  Lines        68102    68121      +19     
===========================================
+ Hits         40581    40600      +19     
  Misses       27521    27521              
Impacted Files Coverage Δ
paddlenlp/transformers/albert/modeling.py 86.19% <100.00%> (+0.61%) ⬆️


@megemini
Contributor Author

I just looked into why PaddleNLP-CI failed; it seems some other test cases are failing?

@wj-Mcat
Contributor

wj-Mcat commented Apr 13, 2023

I just looked into why PaddleNLP-CI failed; it seems some other test cases are failing?

Yes, you can ignore that for now.

@wj-Mcat
Contributor

wj-Mcat commented Apr 13, 2023

The transformers albert does not add a pooling layer, while the Paddle one does, so the name mapping cannot be done. How should this be handled?

In the transformers source code, add_pooling_layer also controls whether the pooler is added, so a pooler layer may be present. For now, add the pooler mapping here anyway; it will just throw a warning.

Handle AlbertForMaskedLM and AlbertForPretraining the same way.
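
A rough sketch of this suggestion (model_mappings is the mapping list assumed inside _get_name_mappings; the triples follow the same shape used later in this thread):

# Hedged sketch: include the pooler entries unconditionally; when the torch
# checkpoint was exported with add_pooling_layer=False, the converter only
# warns about the unmatched keys instead of failing the name mapping.
model_mappings.extend(
    [
        ["pooler.weight", "pooler.weight", "transpose"],
        ["pooler.bias", "pooler.bias"],
    ]
)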

@@ -357,6 +358,118 @@ class AlbertPretrainedModel(PretrainedModel):
pretrained_init_configuration = ALBERT_PRETRAINED_INIT_CONFIGURATION
pretrained_resource_files_map = ALBERT_PRETRAINED_RESOURCE_FILES_MAP

@classmethod
def _get_name_mappings(cls, config: AlbertConfig) -> List[StateDictNameMapping]:
mappings: list[StateDictNameMapping] = []
Contributor

This variable is defined here but never used, so it can be removed.

# ("AlbertForMaskedLM",), TODO: need to tie weights
# ("AlbertForPretraining",), TODO: need to tie weights
("AlbertForMultipleChoice",),
# ("AlbertForQuestionAnswering",), TODO: transformers NOT add the last pool layer before qa_outputs
Contributor

The pooler mapping can be controlled via architectures.
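
A hedged sketch of this idea (it assumes config.architectures is populated from the checkpoint's config.json, as in the config shown later in this thread; this is not the code that was merged):

# Sketch only: gate the pooler mapping on the exported architecture, since
# AlbertForQuestionAnswering builds AlbertModel with add_pooling_layer=False
# and therefore ships no pooler weights to map.
architectures = getattr(config, "architectures", None) or []
if "AlbertForQuestionAnswering" not in architectures:
    model_mappings.extend(
        [
            ["pooler.weight", "pooler.weight", "transpose"],
            ["pooler.bias", "pooler.bias"],
        ]
    )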

Comment on lines +429 to +430
# ("AlbertForMaskedLM",), TODO: need to tie weights
# ("AlbertForPretraining",), TODO: need to tie weights
Contributor

Once #5623 is merged, tie_weights can be used here.

@megemini
Contributor Author

  1. I found why AlbertForQuestionAnswering previously failed the tests: it was not a pool-layer issue, but a mistake in the earlier name mapping. It has been fixed, and the unit test now passes.
  2. Regarding add_pooling_layer, this parameter is hard-coded in the transformers code:
class AlbertForQuestionAnswering(AlbertPreTrainedModel):
  _keys_to_ignore_on_load_unexpected = [r"pooler"]

  def __init__(self, config: AlbertConfig):
      super().__init__(config)
      self.num_labels = config.num_labels

      self.albert = AlbertModel(config, add_pooling_layer=False)  # note this
      self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

      # Initialize weights and apply final processing
      self.post_init()  

and the transformers config.json does not contain this option either:

{
"_name_or_path": "tiny_models/albert/AlbertForQuestionAnswering",
"architectures": [
  "AlbertForQuestionAnswering"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 2,
"classifier_dropout_prob": 0.1,
"embedding_size": 128,
"eos_token_id": 3,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 36,
"initializer_range": 0.02,
"inner_group_num": 1,
"intermediate_size": 37,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "albert",
"num_attention_heads": 6,
"num_hidden_groups": 6,
"num_hidden_layers": 6,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.28.0.dev0",
"type_vocab_size": 16,
"vocab_size": 30000
}

whereas in paddlenlp.transformers it is a config option:

class AlbertModel(AlbertPretrainedModel):
  def __init__(self, config: AlbertConfig):
      ...
      if config.add_pooling_layer:  # note this
          self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
          self.pooler_activation = nn.Tanh()
      else:
          self.pooler = None
          self.pooler_activation = None

      self.init_weights()

When the model is loaded with from_pretrained, add_pooling_layer defaults to True.

This is usually fine, because the model's logits output does not go through the pool layer. The problem is that the pool layer is then randomly initialized, so if someone relies on its output, the result is unpredictable.

So I added a check for the pool layer:

      if config.add_pooling_layer:
          model_mappings.extend(
              [
                  ["pooler.weight", "pooler.weight", "transpose"],
                  ["pooler.bias", "pooler.bias"],
              ]
          )

Could you please take a look and see whether this approach is OK?

Thanks!
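
For context, a condensed and purely illustrative sketch of where the conditional fragment above would sit inside _get_name_mappings (signature taken from the diff; all entries other than the pooler check are placeholders, and the final wrapping into StateDictNameMapping objects may differ from the merged code):

# Illustrative sketch only -- the real mapping list in the PR is much longer.
@classmethod
def _get_name_mappings(cls, config: AlbertConfig) -> List[StateDictNameMapping]:
    model_mappings = [
        ["embeddings.word_embeddings.weight", "embeddings.word_embeddings.weight"],
        # ... remaining embedding / encoder entries elided ...
    ]
    if config.add_pooling_layer:
        model_mappings.extend(
            [
                ["pooler.weight", "pooler.weight", "transpose"],
                ["pooler.bias", "pooler.bias"],
            ]
        )
    # Wrap the raw entries into StateDictNameMapping objects (the exact
    # constructor arguments follow the existing PaddleNLP convention and may
    # differ from this sketch).
    return [StateDictNameMapping(*mapping) for mapping in model_mappings]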

@sijunhe
Collaborator

sijunhe commented Apr 13, 2023

AlbertForMaskedLM and AlbertForPretraining are not needed for now; just handling AlbertForQuestionAnswering is enough.

@megemini
Contributor Author

AlbertForMaskedLM and AlbertForPretraining are not needed for now; just handling AlbertForQuestionAnswering is enough.

AlbertForQuestionAnswering is already handled~

@sijunhe
Collaborator

sijunhe commented Apr 14, 2023

You'll need to merge develop into this branch.

@megemini
Contributor Author

You'll need to merge develop into this branch.

Done. Please review, thanks!

@sijunhe
Collaborator

sijunhe commented Apr 14, 2023

Could you please take a look and see whether this approach is OK?

Thanks!

No problem.

@sijunhe sijunhe dismissed wj-Mcat’s stale review April 14, 2023 09:40

PR is good now

Collaborator

@sijunhe sijunhe left a comment

lgtm

@sijunhe sijunhe merged commit 65db5e5 into PaddlePaddle:develop Apr 14, 2023
4 checks passed