【PaddlePaddle Hackathon 4】[103] Add tie_weights capability: code and unit tests #5193
Conversation
Thanks for your contribution!
Codecov Report
```
@@            Coverage Diff             @@
##           develop    #5193      +/-   ##
===========================================
- Coverage    53.39%   51.96%    -1.43%
===========================================
  Files          476      469        -7
  Lines        67568    66751      -817
===========================================
- Hits         36078    34688     -1390
- Misses      31490    32063      +573
```
... and 42 files with indirect coverage changes
Received. Thanks for submitting the PR; we will review it as soon as possible.
A very good start!
docs/community/rfcs/20230304_api_design_for_tie_weight_task_103.md
```diff
@@ -13,6 +13,7 @@
 # 一、概述
```
same as above
OK, I will remove it in the next commit.
```python
return None

# raise NotImplementedError(
#     f"model of {type(base_model)} has not implemented the `get_input_embeddings`"
#     " or `set_input_embeddings` method"
# )
```
If this is not implemented, it still needs to raise an error.
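(For illustration, the behavior being asked for might look like the sketch below; `base_model_prefix` is the PaddleNLP-style attribute for locating the wrapped base model, and the thread later walks this decision back to returning None for some models.)
```python
import paddle.nn as nn

class PretrainedModel(nn.Layer):
    # hypothetical: set by subclasses, e.g. "albert"
    base_model_prefix = ""

    def get_input_embeddings(self):
        # delegate to the wrapped base model when this is a task head
        base_model = getattr(self, self.base_model_prefix, self)
        if base_model is not self:
            return base_model.get_input_embeddings()
        # no base model and no override: fail loudly instead of returning None
        raise NotImplementedError(
            f"model of {type(base_model)} has not implemented the `get_input_embeddings`"
            " or `set_input_embeddings` method"
        )
```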
```python
if hasattr(model, 'get_input_embeddings') and hasattr(model, 'get_output_embeddings'):
    input_embeddings = model.get_input_embeddings()
    output_embeddings = model.get_input_embeddings()
```
get_output_embeddings?
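(i.e. the second call looks like a copy-paste slip; presumably it should read:)
```python
if hasattr(model, 'get_input_embeddings') and hasattr(model, 'get_output_embeddings'):
    input_embeddings = model.get_input_embeddings()
    output_embeddings = model.get_output_embeddings()  # fixed: was get_input_embeddings()
```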
```python
# add
model.tie_weights()
```
This doesn't really have anything to do with tie_weights, does it?
```python
# add
model.tie_weights()
```
This doesn't really have anything to do with tie_weights, does it?
```python
if hasattr(self, "get_output_embeddings") and hasattr(self, "get_input_embeddings") and tie_word_embeddings:
    output_embeddings = self.get_output_embeddings()
    if output_embeddings is not None and output_embeddings is not None:
        self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())
```
With this implementation, wouldn't the set_input/output_embeddings APIs no longer be needed?
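(For context, the HF-style flow being compared against is roughly the sketch below, paraphrased from memory rather than taken from PaddleNLP code. The getters only locate the modules, while set_input_embeddings remains the public mutator used elsewhere, e.g. when resizing embeddings, so the two APIs are complementary rather than redundant.)
```python
def tie_weights(self):
    # rough HF-style sketch: locate the modules via the getter APIs, then
    # share (or clone) the parameters between them
    if getattr(self.config, "tie_word_embeddings", True):
        output_embeddings = self.get_output_embeddings()
        if output_embeddings is not None:
            self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())
```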
(force-pushed from f71695b to 681f199)
@sijunhe I have revised and committed fixes for the issues raised above, but one issue is still unaddressed and I'm unsure about it:
Should we implement it the way HF does?
This can be handled by adding a switch (flag) as I proposed above.
(force-pushed from 2235b24 to c5acf56)
(force-pushed from c5acf56 to 5d41677)
@sijunhe Hi, I have re-implemented it following your earlier suggestions.
For a model to bind its input and output embeddings via tie_weights(), the following conditions now need to be met:
@qiuwenbogdut Hi, I've reviewed your implementation. Very well done!
For the models that don't pass, apart from those without unit tests implemented, you can slightly modify the model code to satisfy the tie_weights requirements.
This PR is close to ready; a few points remain:
- You need to pass our lint (code style check). The specific style errors are listed under the details of the lint job in our CI. If you use our pre-commit hook, the checks should run automatically on every git commit.
- The test CI needs to pass in full. Besides reformer, a few models (blip, chinese_clip, clip, dpt, ernie_vil) still fail because of get_output_embeddings. Having thought it over, the base model's get_output_embeddings can indeed fall back to the earlier approach of returning None rather than raising NotImplementedError.
- Besides roberta and electra, which already pass, we still need to check the remaining models. You can search for MaskedLM and CausalLM under the paddlenlp/transformers path, test every model that supports those two tasks, and unify the tie_weights implementation. If a model can be supported and its tests pass, turn the flag on; if the change is too large or the tests fail, turn the flag off and leave a comment (as with reformer). See the flag-gated test sketch after this list.
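(A hypothetical sketch of how such a flag-gated check could sit in the shared tester mixin; `self.model_class` and `self.get_config()` are assumed helpers, not necessarily the PR's exact names.)
```python
class ModelTesterMixin:
    # each model's test class flips this flag on/off; the mixin is combined
    # with unittest.TestCase, which provides skipTest/assertIs
    test_tie_weights = False

    def test_tied_word_embeddings(self):
        if not self.test_tie_weights:
            self.skipTest("tie_weights is disabled for this model")
        model = self.model_class(self.get_config())  # assumed helpers
        model.tie_weights()
        # after tying, input and output embeddings must share one weight tensor
        self.assertIs(
            model.get_input_embeddings().weight,
            model.get_output_embeddings().weight,
        )
```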
```diff
@@ -640,6 +641,7 @@ class ReformerLSHAttnModelTest(ReformerTesterMixin, ModelTesterMixin, unittest.T
     test_pruning = False
     test_headmasking = False
     test_torchscript = False
+    test_tie_weights = True
```
Suggested change:
```diff
-test_tie_weights = True
+test_tie_weights = False  # reformer tie_weights implementation is problematic for now
```
(force-pushed from d68f5d2 to 51f1c2b)
@sijunhe Two more questions I'd like your advice on:
You can install our pre-commit tool for this: just run make install in the repo root, and every subsequent git commit will trigger the lint checks automatically.
There is a reason this test fails; I explain it in an inline comment.
very nice. Very close now.
```diff
@@ -761,7 +761,7 @@ def __init__(self, config: AlbertConfig):
         [config.vocab_size], is_bias=True, default_initializer=nn.initializer.Constant(value=0)
     )
     self.dense = nn.Linear(config.hidden_size, config.embedding_size)
-    self.decoder = nn.Linear(config.embedding_size, config.vocab_size)
+    self.decoder = nn.Linear(config.vocab_size, config.embedding_size)
```
Here (and everywhere else) the shape of the decoder Linear layer cannot be changed directly. With tie_weights=True the change works, because decoder.weight gets overwritten, so the shape used at definition time is irrelevant. But with tie_weights=False, changing the shape of self.decoder means pretrained checkpoints can no longer be loaded, because the matrix sizes no longer line up.
This needs to be restored to the original; later call sites can then use an operation like self.decoder.weight.T instead.
```diff
@@ -771,7 +771,7 @@ def forward(self, hidden_states):
     hidden_states = self.dense(hidden_states)
     hidden_states = self.activation(hidden_states)
     hidden_states = self.layer_norm(hidden_states)
-    hidden_states = self.decoder(hidden_states)
+    hidden_states = paddle.matmul(hidden_states, self.decoder.weight, transpose_y=True) + self.bias
```
e.g. here you could try self.decoder.weight.T.
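(The two spellings compute the same thing; a quick standalone check, using plain paddle and nothing from the PR:)
```python
import paddle

x = paddle.randn([2, 8])    # hidden states: (batch, embedding_size)
w = paddle.randn([11, 8])   # decoder weight: (vocab_size, embedding_size)

a = paddle.matmul(x, w, transpose_y=True)  # explicit transpose flag
b = paddle.matmul(x, w.T)                  # same result via the transposed view
print(paddle.allclose(a, b))               # prints a boolean tensor: True
```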
```diff
@@ -1982,48 +1982,6 @@ def init_weights(self):
         """
         # Initialize weights
         self.apply(self._init_weights)
-        # Tie weights if needed
```
Since the previous code here didn't work, keep this part as it was.
(force-pushed from 403e096 to bcc5d9b)
The issues I raised here need to be addressed before the unit tests can pass.
@sijunhe
It is indeed this problem. Let me discuss it with other PaddleNLP colleagues; we'll get back to you in a few days.
OK, thank you for your hard work.
hello, we have now reached agreement on our side.
To keep the original network structure unchanged when weights are not tied, so that existing pretrained models can still be loaded, you can try the following scheme (using albert as an example).
When defining the decoder:
```python
if config.tie_word_embeddings:
    # tie_weights() will tie decoder weight with input embeddings
    self.decoder = nn.Linear(config.vocab_size, config.embedding_size)
else:
    # use legacy decoder shape in order to load pretrained weights
    self.decoder = nn.Linear(config.embedding_size, config.vocab_size)
```
In the decoder's forward:
```python
if config.tie_word_embeddings:
    hidden_states = paddle.matmul(hidden_states, self.decoder.weight, transpose_y=True) + self.bias
else:
    hidden_states = self.decoder(hidden_states)
```
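(Put together, a self-contained version of this scheme could look like the sketch below; the class and config names are illustrative stand-ins, not the exact PaddleNLP modules.)
```python
import paddle
import paddle.nn as nn

class MLMHead(nn.Layer):
    """Illustrative albert-style LM head following the scheme above."""

    def __init__(self, config):
        super().__init__()
        self.tie_word_embeddings = config.tie_word_embeddings
        self.bias = self.create_parameter(
            [config.vocab_size], is_bias=True,
            default_initializer=nn.initializer.Constant(value=0),
        )
        if self.tie_word_embeddings:
            # weight shape (vocab_size, embedding_size) matches the input
            # embedding, so tie_weights() can simply share the tensor
            self.decoder = nn.Linear(config.vocab_size, config.embedding_size)
        else:
            # legacy shape, so existing pretrained checkpoints still load
            self.decoder = nn.Linear(config.embedding_size, config.vocab_size)

    def forward(self, hidden_states):
        if self.tie_word_embeddings:
            return paddle.matmul(hidden_states, self.decoder.weight, transpose_y=True) + self.bias
        return self.decoder(hidden_states)
```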
OK, I'll revise again following the suggested approach. Thanks.
@sijunhe Hi, I made the changes as suggested, but the unit test still fails. Here is my analysis: the likely cause is that albert's config.tie_word_embeddings defaults to True, so the output embedding's shape becomes (config.vocab_size, config.embedding_size); but in the unit test … so the shapes conflict again.
Hi, I've received your debugging analysis. I'll discuss it with the colleague responsible for prompt tomorrow and get back to you~
hello, this change touches many parts of the prompt model and has a steep learning curve, so for now we will simply comment out this test case and our team will revise it internally later.
lgtm!
@sijunhe Hi, this logic has now been added to the albert model:
But other models, such as bigbird, Roberta, FNet, and Funnel, have not yet been updated.
Some clean-up is still needed afterwards; it's fairly tedious, so our team will finish it internally. This task counts as completed on your end. If you're interested, you're welcome to take on task 104 (aligning the generation API with HF, including sample and contrastive_search).
PR types
New features
PR changes
APIs
Description
【PaddlePaddle Hackathon 4】[103] Add the tie_weights capability; submit code and unit tests. See the RFC document for details.
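(A sketch of how the new capability is meant to be exercised; the model name and the shared-weight check are illustrative, assuming the getters discussed in this thread landed as described.)
```python
from paddlenlp.transformers import AlbertForMaskedLM

model = AlbertForMaskedLM.from_pretrained("albert-base-v2")
model.tie_weights()  # bind the output embeddings to the input embeddings

# with config.tie_word_embeddings enabled, both getters now return
# modules backed by the same weight tensor
assert model.get_input_embeddings().weight is model.get_output_embeddings().weight
```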