
[MegatronBERT P0] add PretrainedConfig and unit test #4912

Merged 12 commits into PaddlePaddle:develop on Feb 24, 2023

Conversation

ZwhElliott (Contributor)

PR types

PR changes

Description

paddle-bot (bot) commented Feb 21, 2023

Thanks for your contribution!

CLAassistant commented Feb 21, 2023

CLA assistant check
All committers have signed the CLA.

sijunhe changed the title from "commit megatronbert configuration" to "[MegatronBERT P0] add PretrainedConfig and unit test" on Feb 21, 2023
sijunhe (Collaborator) commented Feb 21, 2023

Hello, and welcome to the PaddlePaddle Hackathon. This PR is a good start; as a next step you will need to enable the test_modeling and test_tokenizer unit tests.

sijunhe (Collaborator) left a comment:

Good progress. Once your environment is set up, you can run the tests locally with pytest tests/transformers/megatronbert/test_modeling.py and pytest tests/transformers/megatronbert/test_tokenizer.py.
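
For reference, the same two test runs can also be launched from a single Python snippet via pytest's programmatic entry point (a convenience sketch; running the two pytest commands above directly from the shell is equivalent):

    import pytest

    # Run the MegatronBERT modeling and tokenizer test modules mentioned above.
    pytest.main([
        "tests/transformers/megatronbert/test_modeling.py",
        "tests/transformers/megatronbert/test_tokenizer.py",
    ])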

ZwhElliott (Contributor, Author)

Hello, I found a bug in tokenizer.py: when megatronbert_tokenizer inherits from bert_tokenizer, some parameters cannot be passed through. This breaks the tokenizer's argument passing and caused test_tokenizer to fail. After fixing it, the test passes.
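
For illustration only, a hypothetical sketch of the kind of fix being described; the argument names are placeholders rather than the actual MegatronBertTokenizer signature in PaddleNLP. The idea is that the subclass forwards extra keyword arguments to the BertTokenizer base class instead of dropping them:

    # Hypothetical sketch; argument names are illustrative, not the real signature.
    from paddlenlp.transformers import BertTokenizer

    class MegatronBertTokenizer(BertTokenizer):
        def __init__(self, vocab_file, do_lower_case=True, **kwargs):
            # Forward any remaining keyword arguments (e.g. special-token
            # overrides) to the base class instead of silently dropping them.
            super().__init__(vocab_file, do_lower_case=do_lower_case, **kwargs)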

Comment on lines 204 to 209
  self.self = MegatronBertSelfAttention(
-     num_attention_heads=num_attention_heads,
-     attention_probs_dropout_prob=attention_probs_dropout_prob,
-     max_position_embeddings=max_position_embeddings,
-     position_embedding_type=position_embedding_type,
+     num_attention_heads=config.num_attention_heads,
+     attention_probs_dropout_prob=config.attention_probs_dropout_prob,
+     max_position_embeddings=config.max_position_embeddings,
+     position_embedding_type=config.position_embedding_type,
  )
Collaborator:

This should be self.self = MegatronBertSelfAttention(config).
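
A minimal sketch of the config-driven constructor pattern being requested, assuming MegatronBertConfig, MegatronBertSelfAttention and MegatronBertSelfOutput are defined in the same module as in the snippets above:

    import paddle.nn as nn

    # Sketch only: the attention wrapper takes the whole config object and
    # passes it on to its sub-modules instead of unpacking individual arguments.
    class MegatronBertAttention(nn.Layer):
        def __init__(self, config: MegatronBertConfig):
            super().__init__()
            self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=config.layer_norm_eps)
            self.self = MegatronBertSelfAttention(config)
            self.output = MegatronBertSelfOutput(config)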

  super(MegatronBertAttention, self).__init__()
- self.layer_norm = nn.LayerNorm(hidden_size, epsilon=layer_norm_eps)
+ self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=layer_norm_eps)
Collaborator:

Use config.layer_norm_eps here.

max_position_embeddings=config.max_position_embeddings,
position_embedding_type=config.position_embedding_type,
)
self.output = MegatronBertSelfOutput(
Collaborator:

This should be self.output = MegatronBertSelfOutput(config).

Comment on lines 251 to 257
  self.attention = MegatronBertAttention(
-     hidden_size=hidden_size,
-     num_attention_heads=num_attention_heads,
-     hidden_dropout_prob=hidden_dropout_prob,
-     attention_probs_dropout_prob=attention_probs_dropout_prob,
-     max_position_embeddings=max_position_embeddings,
-     position_embedding_type=position_embedding_type,
+     hidden_size=config.hidden_size,
+     num_attention_heads=config.num_attention_heads,
+     hidden_dropout_prob=config.hidden_dropout_prob,
+     attention_probs_dropout_prob=config.attention_probs_dropout_prob,
+     max_position_embeddings=config.max_position_embeddings,
+     position_embedding_type=config.position_embedding_type,
Collaborator:

This should be self.attention = MegatronBertAttention(config).

)

- self.layer_norm = nn.LayerNorm(hidden_size, epsilon=layer_norm_eps)
+ self.layer_norm = nn.LayerNorm(config.hidden_size, epsilon=layer_norm_eps)
Collaborator:

Use config.layer_norm_eps here.

Comment on lines 261 to 266
  self.intermediate = MegatronBertIntermediate(
-     hidden_size=hidden_size, intermediate_size=intermediate_size, hidden_act=hidden_act
+     hidden_size=config.hidden_size, intermediate_size=config.intermediate_size, hidden_act=config.hidden_act
  )
  self.output = MegatronBertOutput(
-     intermediate_size, hidden_dropout_prob=hidden_dropout_prob, hidden_size=hidden_size
+     config.intermediate_size, hidden_dropout_prob=config.hidden_dropout_prob, hidden_size=config.hidden_size
  )
Collaborator:

All of these constructors should take only config as their input.

-     position_embedding_type=None,
-     num_hidden_layers=24,
- ):
+ def __init__(self, config: MegatronBertConfig):
      super(MegatronBertEncoder, self).__init__()
      self.layer = nn.LayerList(
          [
              MegatronBertLayer(
Collaborator:

The input argument should be config.
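
As a sketch, the same config-only pattern applied to the encoder (assuming MegatronBertLayer has already been converted to accept a config argument):

    import paddle.nn as nn

    # Sketch only: build num_hidden_layers identical layers, each constructed
    # from the shared config object.
    class MegatronBertEncoder(nn.Layer):
        def __init__(self, config: MegatronBertConfig):
            super().__init__()
            self.layer = nn.LayerList(
                [MegatronBertLayer(config) for _ in range(config.num_hidden_layers)]
            )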

- self.initializer_range = initializer_range
+ self.num_hidden_layers = config.num_hidden_layers
+ self.pad_token_id = config.pad_token_id
+ self.initializer_range = config.initializer_range
  self.embeddings = MegatronBertEmbeddings(
Collaborator:

The input argument should be config.

type_vocab_size=config.type_vocab_size,
max_position_embeddings=config.max_position_embeddings,
hidden_dropout_prob=config.hidden_dropout_prob,
position_embedding_type=config.position_embedding_type,
)
self.encoder = MegatronBertEncoder(
Collaborator:

The input argument should be config.

sijunhe (Collaborator) commented Feb 23, 2023

Hello, I found a bug in tokenizer.py: when megatronbert_tokenizer inherits from bert_tokenizer, some parameters cannot be passed through. This breaks the tokenizer's argument passing and caused test_tokenizer to fail. After fixing it, the test passes.

No problem. This model's code has gone unmaintained for a long time, so bugs are expected; just fix them in this PR together with your other changes.

ZwhElliott (Contributor, Author)

Hello, does this mean the model is now complete?

codecov (bot) commented Feb 24, 2023

Codecov Report

Merging #4912 (e2ba41a) into develop (8d0801e) will increase coverage by 1.23%.
The diff coverage is 98.49%.

@@             Coverage Diff             @@
##           develop    #4912      +/-   ##
===========================================
+ Coverage    46.32%   47.56%   +1.23%     
===========================================
  Files          448      453       +5     
  Lines        64616    65421     +805     
===========================================
+ Hits         29936    31116    +1180     
+ Misses       34680    34305     -375     
Impacted Files Coverage Δ
paddlenlp/transformers/megatronbert/tokenizer.py 100.00% <ø> (+9.09%) ⬆️
paddlenlp/transformers/megatronbert/modeling.py 94.94% <98.14%> (+70.37%) ⬆️
paddlenlp/transformers/__init__.py 100.00% <100.00%> (ø)
...ddlenlp/transformers/megatronbert/configuration.py 100.00% <100.00%> (ø)
paddlenlp/trainer/integrations.py 68.22% <0.00%> (-18.37%) ⬇️
paddlenlp/taskflow/text_classification.py 72.15% <0.00%> (-2.85%) ⬇️
paddlenlp/transformers/chineseclip/modeling.py 82.94% <0.00%> (-2.54%) ⬇️
paddlenlp/taskflow/task.py 48.78% <0.00%> (-1.40%) ⬇️
paddlenlp/transformers/ernie_vil/modeling.py 76.36% <0.00%> (-0.94%) ⬇️
paddlenlp/trainer/training_args.py 69.28% <0.00%> (-0.75%) ⬇️
... and 21 more


sijunhe (Collaborator) left a comment:

Nice work! Just address the small issues I commented on and this can be merged.

  self.decoder_weight = self.create_parameter(
-     shape=[vocab_size, hidden_size], dtype=self.transform.weight.dtype, is_bias=False
+     shape=[config.vocab_size, config.hidden_size], dtype=nn.Embedding(1, 1)._dtype, is_bias=False
Collaborator:

Please keep dtype=self.transform.weight.dtype here.

ZwhElliott (Contributor, Author):

Hello, the reason I changed this is the error 'MegatronBertPredictionHeadTransform' object has no attribute 'weight': in the original modeling file, MegatronBertPredictionHeadTransform has no weight attribute.

Collaborator:

Understood. In that case, try changing it to dtype=self.transform.dense.weight.dtype.
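
A sketch of what the prediction head could look like with that suggestion applied; the class name here is hypothetical and only the snippets in this thread are assumed, with self.transform.dense taken to be the inner nn.Linear projection of MegatronBertPredictionHeadTransform:

    import paddle.nn as nn

    # Hypothetical sketch; the real class contains more logic than shown here.
    class MegatronBertLMPredictionHead(nn.Layer):
        def __init__(self, config: MegatronBertConfig):
            super().__init__()
            self.transform = MegatronBertPredictionHeadTransform(config)
            # Reuse the dtype of the dense projection inside the transform head
            # instead of creating a throwaway nn.Embedding just to read its dtype.
            self.decoder_weight = self.create_parameter(
                shape=[config.vocab_size, config.hidden_size],
                dtype=self.transform.dense.weight.dtype,
                is_bias=False,
            )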

Comment on lines 303 to 311
@parameterized_class(
    ("return_dict", "use_labels"),
    [
        [False, False],
        [False, True],
        [True, False],
        [True, True],
    ],
)
Collaborator:

The current model does not implement return_dict or labels, so this parameterized_class block is not needed and can be deleted.

Comment on lines 58 to 61
"megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/"
"megatron-bert/megatronbert-cased/model_state.pdparams",
"megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/"
"megatron-bert/megatronbert-cased/model_state.pdparams",
Collaborator:

Suggested change
- "megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/"
- "megatron-bert/megatronbert-cased/model_state.pdparams",
- "megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/"
- "megatron-bert/megatronbert-cased/model_state.pdparams",
+ "megatronbert-cased": "http://bj.bcebos.com/paddlenlp/models/transformers/megatron-bert/megatronbert-cased/model_state.pdparams",
+ "megatronbert-uncased": "http://bj.bcebos.com/paddlenlp/models/transformers/megatron-bert/megatronbert-uncased/model_state.pdparams",

sijunhe (Collaborator) commented Feb 24, 2023

Hello, does this mean the model is now complete?

Very close. Just address the small issues I commented on and it can be merged.

sijunhe (Collaborator) left a comment:

LGTM. Thank you for completing the upgrade of this model!

sijunhe merged commit a27b4b1 into PaddlePaddle:develop on Feb 24, 2023