【PaddlePaddle Hackathon 4】[103] Add tie_weights capability: code and unit tests #5193

qiuwenbogdut · 2023-03-11T00:06:56Z

PR types

New features

PR changes

APIs

Description

【PaddlePaddle Hackathon 4】 [103] Adds the tie_weights capability: code and unit tests. See the RFC document for details.

paddle-bot · 2023-03-11T00:07:00Z

Thanks for your contribution!

codecov · 2023-03-11T15:31:08Z

Codecov Report

Merging #5193 (681f199) into develop (e307c96) will decrease coverage by 1.43%.
The diff coverage is 92.00%.

❗ Current head 681f199 differs from pull request most recent head c00fcbd. Consider uploading reports for the commit c00fcbd to get more accurate results

@@             Coverage Diff             @@
##           develop    #5193      +/-   ##
===========================================
- Coverage    53.39%   51.96%   -1.43%     
===========================================
  Files          476      469       -7     
  Lines        67568    66751     -817     
===========================================
- Hits         36078    34688    -1390     
- Misses       31490    32063     +573

Impacted Files	Coverage Δ
paddlenlp/transformers/model_utils.py	`79.13% <89.47%> (+3.92%)`	⬆️
paddlenlp/transformers/codegen/modeling.py	`89.43% <100.00%> (+0.40%)`	⬆️
paddlenlp/transformers/roberta/modeling.py	`92.66% <100.00%> (+0.58%)`	⬆️

... and 42 files with indirect coverage changes


gongel · 2023-03-13T04:33:44Z

Received. Thanks for submitting the PR; we will review it as soon as possible.

sijunhe

A very good start!

docs/community/rfcs/20230304_api_design_for_tie_weight_task_103.md

sijunhe · 2023-03-13T05:57:56Z

docs/community/rfcs/api_design_template.md

@@ -13,6 +13,7 @@
 # 一、概述


same as above

OK, I will remove it in the next commit.

sijunhe · 2023-03-13T06:03:06Z

paddlenlp/transformers/model_utils.py

+        return None
+
+        # raise NotImplementedError(
+        #     f"model of {type(base_model)} has not implemented the `get_input_embeddings`"
+        #     " or `set_input_embeddings` method"
+        # )


If this is not implemented, it should still raise an error.

paddlenlp/transformers/model_utils.py

sijunhe · 2023-03-13T06:10:24Z

tests/transformers/test_modeling_common.py

+
+            if hasattr(model, 'get_input_embeddings') and hasattr(model, 'get_output_embeddings'):
+                input_embeddings = model.get_input_embeddings()
+                output_embeddings = model.get_input_embeddings()


get_output_embeddings?

tests/transformers/test_modeling_common.py

sijunhe · 2023-03-13T06:19:27Z

paddlenlp/transformers/model_utils.py

+        # add
+        model.tie_weights()


This doesn't seem related to tie_weights, does it?

sijunhe · 2023-03-13T06:19:44Z

paddlenlp/transformers/model_utils.py

+        # add
+        model.tie_weights()
+


This doesn't seem related to tie_weights, does it?

sijunhe · 2023-03-13T06:23:16Z

paddlenlp/transformers/model_utils.py

+        if hasattr(self, "get_output_embeddings") and hasattr(self, "get_input_embeddings") and tie_word_embeddings:
+            output_embeddings = self.get_output_embeddings()
+            if output_embeddings is not None and output_embeddings is not None:
+                self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())


With the current implementation, is the set_input/output_embeddings API no longer needed?

qiuwenbogdut · 2023-03-15T06:26:45Z

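@sijunhe I have revised and pushed fixes for the issues raised above.

One issue is still unresolved, and I have a question about it:
On the point that get_output_embeddings() in paddlenlp/transformers/model_utils.py should raise an error when a subclass does not override it:

Many model classes currently do not override get_output_embeddings(), so tie_weights() raises and the unit tests fail.
In Hugging Face transformers, get_output_embeddings() in the base model class simply returns None, and any model that needs tie_weights overrides get_output_embeddings.

Should we follow the HF approach as well?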

sijunhe · 2023-03-15T08:22:43Z

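This can be solved with the switch I proposed above: add a flag test_tie_weights that defaults to False, and run this test only when test_tie_weights=True. For models where we have already implemented tie_weights, set it to True. If a model has not implemented get_output_embeddings, turn test_tie_weights off in its unit test. If someone later calls get_output_embeddings on such a model, they will get an unimplemented error, which is clear and unambiguous.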
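The opt-in flag described above could look roughly like the following. This is a minimal sketch in plain `unittest`, not the code from this PR: `FakeLayer`, `FakeTiedModel`, and `TieWeightsTesterMixin` are hypothetical stand-ins, and skipping the test when the flag is off is an assumption about how "only run when True" might be expressed.

```python
import unittest


class FakeLayer:
    """Hypothetical stand-in for a layer that owns a ``weight``."""

    def __init__(self, weight):
        self.weight = weight


class FakeTiedModel:
    """Hypothetical model whose output layer reuses the input embedding weight."""

    def __init__(self):
        shared = [[0.1, 0.2], [0.3, 0.4]]
        self._input = FakeLayer(shared)
        self._output = FakeLayer(shared)

    def get_input_embeddings(self):
        return self._input

    def get_output_embeddings(self):
        return self._output


class TieWeightsTesterMixin:
    # Off by default: a model test class opts in explicitly.
    test_tie_weights = False
    model_class = None

    def test_tie_weight(self):
        if not self.test_tie_weights:
            self.skipTest("tie_weights check disabled for this model")
        model = self.model_class()
        input_embeddings = model.get_input_embeddings()
        output_embeddings = model.get_output_embeddings()
        # Tied weights must be the very same object, not merely equal values.
        self.assertIs(input_embeddings.weight, output_embeddings.weight)


class FakeTiedModelTest(TieWeightsTesterMixin, unittest.TestCase):
    test_tie_weights = True
    model_class = FakeTiedModel
```

A model whose tying is known to be broken would set `test_tie_weights = False` with a short comment explaining why.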

qiuwenbogdut · 2023-03-18T23:44:10Z

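@sijunhe Hello, I have re-implemented this following your earlier suggestions.
The changes are as follows:

Implemented the tie_weights function in the base class in model_utils.py, so every model can inherit it.
Implemented a unit test in test_modeling_common.py that verifies the tying actually takes effect; the "test_tie_weights" flag controls whether a model runs this test.
Tested the models that had already implemented tie_weights (reformer, convbert, electra). None of them passed the tie_weights unit test, for the following reasons:
1. reformer: the weight is assigned through the set_values function, which only copies the values and does not actually bind the two weights.
2. convbert: no corresponding test file was found, so the tie_weights test could not be run.
3. electra: get_output_embeddings is not overridden, so the tie_weights test could not be run either.
Added a tie_weights() call to the existing RoBERTa model; it passes the test, which shows the binding works.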
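The difference between copying values and actually binding weights, as reported for reformer above, can be illustrated without any framework. In this sketch `Param` and its `set_value` method are hypothetical stand-ins, not Paddle classes.

```python
class Param:
    """Hypothetical stand-in for a framework parameter holding values."""

    def __init__(self, values):
        self.values = list(values)

    def set_value(self, other):
        # Copies the numbers only; the two parameters remain separate objects.
        self.values = list(other.values)


embedding = Param([1.0, 2.0])

copied = Param([0.0, 0.0])
copied.set_value(embedding)  # value copy: equal now, but not bound

bound = embedding            # binding: both names refer to one object

embedding.values[0] = 9.0    # simulate a training update to the embedding
```

After the update, `bound` sees the new value while `copied` still holds the old one, which is why an identity-based tie_weights test fails for a copy-based implementation.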

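Currently, for a model to bind its input embedding and output embedding by calling tie_weights(), the following conditions must hold:
(1) The model overrides the get_output_embeddings() method.
(2) The object returned by get_output_embeddings() is a Layer object that has a weight attribute.
(3) For the object AA returned by get_output_embeddings() and the object BB returned by get_input_embeddings(), AA.weight and BB.weight have the same shape.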
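The three conditions above can be sketched as a guard around the binding step. This is an illustration only, not the function merged in this PR: `Weight`, `Layer`, `ToyModel`, and `try_tie_weights` are hypothetical names, and returning a boolean is an assumption made so the outcome is easy to check.

```python
class Weight:
    """Hypothetical stand-in for a parameter; only its shape matters here."""

    def __init__(self, shape):
        self.shape = list(shape)


class Layer:
    """Hypothetical stand-in for a layer with a ``weight`` attribute."""

    def __init__(self, shape):
        self.weight = Weight(shape)


class ToyModel:
    def __init__(self, in_shape, out_shape):
        self.embeddings = Layer(in_shape)
        self.lm_head = Layer(out_shape)

    def get_input_embeddings(self):
        return self.embeddings

    def get_output_embeddings(self):
        return self.lm_head


def try_tie_weights(model):
    """Bind output weight to input weight if all conditions hold; report success."""
    get_out = getattr(model, "get_output_embeddings", None)
    get_in = getattr(model, "get_input_embeddings", None)
    if get_out is None or get_in is None:
        return False  # condition (1): the accessors must exist
    out_layer, in_layer = get_out(), get_in()
    if out_layer is None or in_layer is None:
        return False
    if not hasattr(out_layer, "weight") or not hasattr(in_layer, "weight"):
        return False  # condition (2): both objects expose a weight
    if out_layer.weight.shape != in_layer.weight.shape:
        return False  # condition (3): shapes must match
    out_layer.weight = in_layer.weight  # share the same object
    return True
```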

sijunhe

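@qiuwenbogdut Hello, I reviewed your implementation. Very nicely done!

For the models that did not pass, apart from those without unit tests, you can adjust the model code slightly to satisfy the tie_weights requirements.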

tests/transformers/test_modeling_common.py

qiuwenbogdut · 2023-03-20T15:11:35Z

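@sijunhe Hello, I made the following further changes based on your suggestions:

Added logic so that only models whose class names end with CausalLM or MaskedLM are checked.
reformer: the shape of the output embedding does not match the shape of the input embedding, so they cannot be tied. Forcing the shapes to match might affect model training, so I have left it unchanged for now.
electra: after reading the electra model file more closely, I found that it passes the input embedding object directly to the output embedding. Both share the same weight object, so the tying does succeed.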

sijunhe

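This PR is close to ready. A few things are still needed:

You need to pass our lint (code style check). You can see the specific style errors by clicking the details of the lint job in our CI. If you use our pre-commit, the check should run automatically on every git commit.
The test CI job needs to pass in full. Besides reformer, some other models (blip, chinese_clip, clip, dpt, ernie_vil) currently fail because get_output_embeddings raises. Having thought about it, get_output_embeddings in the base model can indeed go back to the earlier approach of returning None instead of raising UnimplementedError.
Besides roberta and electra, which already pass, we need to check the remaining models. You can search for MaskedLM and CausalLM under paddlenlp/transformers, test every model that supports these two tasks, and unify how tie_weights is implemented. If a model can be supported and its test passes, turn the flag on. If the change is too large or the test fails, turn the flag off and add a comment (as with reformer).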

sijunhe · 2023-03-21T03:15:59Z

tests/transformers/reformer/test_modeling.py

@@ -640,6 +641,7 @@ class ReformerLSHAttnModelTest(ReformerTesterMixin, ModelTesterMixin, unittest.T
    test_pruning = False
    test_headmasking = False
    test_torchscript = False
+    test_tie_weights = True


Suggested change

test_tie_weights = True

test_tie_weights = False # reformer tie_weights implementation is problematic for now

qiuwenbogdut · 2023-03-23T05:35:18Z

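@sijunhe I have two more questions:

(1) The code style check only reports that there is an error, not which line it is on. Is there a way to locate the offending line?

(2) I did not modify the script involved in the last failing unit test, and it runs without errors locally. I would appreciate some suggestions on how to fix it. Thanks.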

sijunhe · 2023-03-23T07:32:11Z

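(1) The code style check only reports that there is an error, not which line it is on. Is there a way to locate the offending line?

You can install our pre-commit tool by running make install in the repository root; every subsequent git commit will then trigger the lint check automatically.
If that is too much trouble, the check failing here is black. So after pip install black, just run black <my/file/name>.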

sijunhe · 2023-03-23T07:35:06Z

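(2) I did not modify the script involved in the last failing unit test, and it runs without errors locally. I would appreciate some suggestions on how to fix it. Thanks.

This test fails for a reason; I explain it in the inline comments.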

sijunhe

Very nice. Getting very close.

sijunhe · 2023-03-23T07:38:30Z

paddlenlp/transformers/albert/modeling.py

@@ -761,7 +761,7 @@ def __init__(self, config: AlbertConfig):
            [config.vocab_size], is_bias=True, default_initializer=nn.initializer.Constant(value=0)
        )
        self.dense = nn.Linear(config.hidden_size, config.embedding_size)
-        self.decoder = nn.Linear(config.embedding_size, config.vocab_size)
+        self.decoder = nn.Linear(config.vocab_size, config.embedding_size)


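The shape of the decoder linear layer must not be changed directly, here or anywhere else. With tie_weights=True the change is harmless: decoder.weight gets overwritten, so the shape you define it with does not matter. With tie_weights=False, however, it changes the shape of self.decoder's matrix, so loading a pretrained model fails to line up because the matrix sizes no longer match.

This needs to be restored to its original form; in the later call you can then use something like self.decoder.weight.T.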

sijunhe · 2023-03-23T07:39:01Z

paddlenlp/transformers/albert/modeling.py

@@ -771,7 +771,7 @@ def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.layer_norm(hidden_states)
-        hidden_states = self.decoder(hidden_states)
+        hidden_states = paddle.matmul(hidden_states, self.decoder.weight, transpose_y=True) + self.bias


E.g., you could try self.decoder.weight.T here.

tests/transformers/test_modeling_common.py

sijunhe · 2023-03-23T07:40:51Z

paddlenlp/transformers/reformer/modeling.py

@@ -1982,48 +1982,6 @@ def init_weights(self):
        """
        # Initialize weights
        self.apply(self._init_weights)
-        # Tie weights if needed


The earlier code in this block doesn't work anyway, so leave this block as it was.

sijunhe

The issues I raised here need to be addressed before the unit tests can pass.

qiuwenbogdut · 2023-03-24T12:02:25Z

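@sijunhe
The earlier tie_weight() function ties input_embedding and output_embedding only when their shapes match; if the shapes differ, no tying happens.
If this is changed back to the previous shape, nn.Linear(config.embedding_size, config.vocab_size), the weights can no longer be tied.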

sijunhe · 2023-03-27T09:27:57Z

That is indeed the problem. Let me discuss it with other PaddleNLP colleagues; I will get back to you in a few days.

qiuwenbogdut · 2023-03-27T10:11:03Z

OK. Thank you for your hard work.

sijunhe

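Hello, we have reached agreement on our side.
To keep the original network structure unchanged when tie_weights is not used, so that existing models can still be loaded, you can try the following scheme (using albert as an example):

When defining the decoder: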

# tie_weights() will tie decoder weight with input embeddings
if config.tie_word_embeddings:
  self.decoder = nn.Linear(config.vocab_size, config.embedding_size)
# use legacy decoder shape in order to load pretrained weights
else:
  self.decoder = nn.Linear(config.embedding_size, config.vocab_size)

In the decoder forward:

if config.tie_word_embeddings:
  hidden_states = paddle.matmul(hidden_states, self.decoder.weight, transpose_y=True) + self.bias
else:
  hidden_states = self.decoder(hidden_states)
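Why the two branches above are interchangeable can be checked with small matrices. The sketch below uses plain Python lists rather than Paddle tensors; `transpose`, `matmul`, and `decoder_forward` are hypothetical helpers, and adding the same bias in both branches is a simplification. It assumes the layouts discussed in this thread: the tied weight is [vocab_size, embedding_size] and the legacy decoder weight is [embedding_size, vocab_size].

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Multiply a [n, k] matrix by a [k, m] matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def decoder_forward(hidden, weight, bias, tie_word_embeddings):
    if tie_word_embeddings:
        # weight is [vocab_size, embedding_size]: multiply by its transpose.
        projected = matmul(hidden, transpose(weight))
    else:
        # weight is [embedding_size, vocab_size]: multiply directly.
        projected = matmul(hidden, weight)
    return [[value + b for value, b in zip(row, bias)] for row in projected]


embedding_weight = [[1, 0], [0, 1], [1, 1]]   # vocab_size=3, embedding_size=2
legacy_weight = transpose(embedding_weight)   # embedding_size=2, vocab_size=3
hidden = [[2, 3]]                             # one token, embedding_size=2
bias = [0, 0, 0]

tied_logits = decoder_forward(hidden, embedding_weight, bias, True)
legacy_logits = decoder_forward(hidden, legacy_weight, bias, False)
```

Both calls produce one row of vocab_size logits, so the tied layout can match the embedding shape without changing the output of the forward pass.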

qiuwenbogdut · 2023-03-29T08:15:36Z

OK, I will rework it following these suggestions. Thanks.

qiuwenbogdut · 2023-04-05T08:24:35Z

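@sijunhe Hello, I made the changes as suggested, but the unit tests still do not pass. Here is my analysis:

The error from the failing unit test is shown in the screenshot below:

The likely cause is that the albert model's config.tie_word_embeddings defaults to True, so the output embedding has shape (config.vocab_size, config.embedding_size).

However, in the unit test,
paddlenlp/prompt/verbalizer.py:421: in _create_init_weight weight = paddle.index_select(weight, token_ids.reshape([-1]), axis=1).reshape(word_shape) extracts values from the albert model's output embedding weight while treating its shape as (config.embedding_size, config.vocab_size).

So the shapes conflict again.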
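The axis mismatch described above can be reproduced with plain lists. This is not the verbalizer code and not a proposed fix for it: `index_select` here is a hypothetical list-based helper that mimics selecting along an axis, and `select_token_columns` with its `vocab_first` argument is an assumed way of handling both layouts.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def index_select(matrix, indices, axis):
    """Pick rows (axis=0) or columns (axis=1) of a list-of-lists matrix."""
    if axis == 0:
        return [list(matrix[i]) for i in indices]
    if axis == 1:
        return [[row[i] for i in indices] for row in matrix]
    raise ValueError("axis must be 0 or 1")


def select_token_columns(weight, token_ids, vocab_first):
    """Return an [embedding_size, len(token_ids)] slice for either layout."""
    if vocab_first:
        # Tied layout [vocab_size, embedding_size]: tokens are rows.
        return transpose(index_select(weight, token_ids, axis=0))
    # Legacy layout [embedding_size, vocab_size]: tokens are columns.
    return index_select(weight, token_ids, axis=1)


legacy = [[1, 2, 3], [4, 5, 6]]  # embedding_size=2, vocab_size=3
tied = transpose(legacy)         # vocab_size=3, embedding_size=2
```

Selecting token id 2 along axis=1 works on `legacy` but is out of range on `tied`, which has only two columns; that is the same kind of conflict the failing test runs into.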

sijunhe · 2023-04-05T09:13:56Z

Hello, I have received your analysis. I will discuss it tomorrow with the colleague responsible for prompt and get back to you.

sijunhe · 2023-04-06T03:13:55Z

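Hello, this change touches a lot of the prompt module and has a steep learning curve, so let's just comment out this test case for now; our team will fix it internally later.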

sijunhe

lgtm!

qiuwenbogdut · 2023-04-06T13:19:50Z

@sijunhe Hello, the albert model now includes this logic:

# tie_weights() will tie decoder weight with input embeddings
if config.tie_word_embeddings:
  self.decoder = nn.Linear(config.vocab_size, config.embedding_size)
# use legacy decoder shape in order to load pretrained weights
else:
  self.decoder = nn.Linear(config.embedding_size, config.vocab_size)

However, other models such as bigbird, Roberta, Fnet, and Funnel do not have it yet.
Should this logic be added to them as well?

sijunhe · 2023-04-06T14:01:39Z

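However, other models such as bigbird, Roberta, Fnet, and Funnel do not have it yet. Should this logic be added to them as well?

Some further clean-up is still needed, and it is fairly tedious, so our team will handle it internally. We consider this task completed on your side. If you are interested, you are welcome to take on task 104 (aligning the generation API with HF, including sample and contrastive_search).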

qiuwenbogdut added 3 commits March 4, 2023 22:28

[103] Add tie_weights capability: submit RFC document

9cb4f54

[103] Add tie_weights capability: submit RFC document v2

6929649

[103] Add tie_weights capability: submit RFC document v3

a52af67

paddle-bot bot added contributor status: proposed labels Mar 11, 2023

qiuwenbogdut mentioned this pull request Mar 13, 2023

【PaddlePaddle Hackathon 4】 Task overview PaddlePaddle/Paddle#51281

Closed

chenxiaozeng requested review from sijunhe and gongel March 13, 2023 03:49

chenxiaozeng added the hackathon label Mar 13, 2023

sijunhe reviewed Mar 13, 2023


qiuwenbogdut force-pushed the dev_qiuwenbo branch from f71695b to 681f199 Compare March 15, 2023 05:36

qiuwenbogdut added 2 commits March 18, 2023 16:37

[code] Add the tie weight function and a first version of the unit test

2617183

[code] Final draft

0bb1520

qiuwenbogdut force-pushed the dev_qiuwenbo branch from 2235b24 to c5acf56 Compare March 18, 2023 13:33

[103] Implement tie_weight, add unit tests, and verify that tying works in models that already implemented tie_weight

5d41677

qiuwenbogdut force-pushed the dev_qiuwenbo branch from c5acf56 to 5d41677 Compare March 18, 2023 15:23

qiuwenbogdut requested review from sijunhe and removed request for gongel March 20, 2023 02:22

sijunhe reviewed Mar 20, 2023


qiuwenbogdut added 2 commits March 20, 2023 23:06

[103] Add logic to check only models whose names end with CausalLM or MaskedLM; model code may be adjusted slightly to meet the tie_weights requirements

247052c

Merge branch 'develop' into dev_qiuwenbo

5f1d7eb

sijunhe reviewed Mar 21, 2023


sijunhe mentioned this pull request Mar 21, 2023

Add blip2 models #5025

Merged

qiuwenbogdut force-pushed the dev_qiuwenbo branch from d68f5d2 to 51f1c2b Compare March 22, 2023 23:24

qiuwenbogdut requested a review from sijunhe March 23, 2023 06:44

sijunhe reviewed Mar 23, 2023


[103] Try to fix the black issues

bcc5d9b

qiuwenbogdut force-pushed the dev_qiuwenbo branch from 403e096 to bcc5d9b Compare March 23, 2023 13:13

[103] Revert changes to the reformer model

6e24e3d

qiuwenbogdut requested a review from sijunhe March 24, 2023 10:33

sijunhe reviewed Mar 24, 2023


qiuwenbogdut requested a review from sijunhe March 24, 2023 12:02

sijunhe reviewed Mar 29, 2023


qiuwenbogdut added 4 commits March 29, 2023 21:09

[103] Adapt to both the tied and untied cases

5f3fde7

[103] test black

96bd8a9

[103] Fix bug

afa490e

[103] Test

065b787

qiuwenbogdut requested a review from sijunhe April 5, 2023 08:24

[103] Modify the prompt unit test

c00fcbd

sijunhe approved these changes Apr 6, 2023


sijunhe merged commit 019ce12 into PaddlePaddle:develop Apr 6, 2023
1 of 2 checks passed
