
Converting weights from other models to the RoFormer model #2

Closed
WENGSYX opened this issue Apr 21, 2021 · 6 comments

Comments

WENGSYX commented Apr 21, 2021

Hello, I'm training a model on long texts, but the original RoFormer model is too small and doesn't perform well, so I'd like to try a large version of RoFormer.
Since no such large model exists, I want to convert the open-source 'hfl/chinese-macbert-large' weights into a RoFormer model and experiment with long-text training.
Su Jianlin converted WoBERT into RoFormer by replacing its absolute position encoding with RoPE, so I used the same code (https://github.com/ZhuiyiTechnology/roformer/blob/main/train.py):
# Imports as in the referenced bert4keras train.py; config_path / checkpoint_path
# point to the chinese-macbert-large config and TF checkpoint, and CrossEntropy is
# the Loss subclass defined earlier in train.py.
from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.optimizers import (
    Adam,
    extend_with_weight_decay,
    extend_with_piecewise_linear_lr,
    extend_with_gradient_accumulation,
)

bert = build_transformer_model(
    config_path,
    checkpoint_path=None,
    model='roformer',
    with_mlm='linear',
    ignore_invalid_weights=True,
    return_keras_model=False
)
model = bert.model

y_in = keras.layers.Input(shape=(None,), name='Input-Label')
outputs = CrossEntropy(1)([y_in, model.output])
train_model = keras.models.Model(model.inputs + [y_in], outputs)

AdamW = extend_with_weight_decay(Adam, name='AdamW')
AdamWLR = extend_with_piecewise_linear_lr(AdamW, name='AdamWLR')
AdamWLRG = extend_with_gradient_accumulation(AdamWLR, name='AdamWLRG')
optimizer = AdamWLRG(
    learning_rate=1e-5,
    weight_decay_rate=0.01,
    exclude_from_weight_decay=['Norm', 'bias'],
    grad_accum_steps=4,
    lr_schedule={20000: 1}
)
train_model.compile(optimizer=optimizer)
train_model.summary()

bert.load_weights_from_checkpoint(checkpoint_path)
model.save_weights('romac/bert_model.weights')

This produced a MacBERT-based TF checkpoint. I then tried to convert it to a PyTorch version with your convert_roformer_original_tf_checkpoint_to_pytorch.py, but it raises the error below. Is there a problem with the weights I converted, or can they simply not be converted directly?

convert_tf_checkpoint_to_pytorch('romac/bert_model.weights', 'romac/bert_config.json', 'romac/1')

Error:
Traceback (most recent call last):
  File "C:\Users\14301\miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 24, in <module>
    romac/1')
  File "", line 16, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_roformer(model, config, tf_checkpoint_path)
  File "C:\Users\14301\miniconda3\lib\site-packages\roformer\modeling_roformer.py", line 115, in load_tf_weights_in_roformer
    pointer.shape == array.shape
  File "C:\Users\14301\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'RoFormerForPreTraining' object has no attribute 'shape'

JunnYu (Owner) commented Apr 22, 2021

@WENGSYX

Either convert hfl's TF version of the chinese-macbert-large weights directly to RoFormer:

python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxx/chinese-macbert-large/chinese_macbert_large.ckpt \
    --roformer_config_file=xxxx/chinese-macbert-large/macbert_large_config.json \
    --pytorch_dump_path=xxxx/chinese-macbert-large/pytorch_model.bin 

Or convert the PyTorch weights of hfl/chinese-macbert-large to RoFormer:

import torch
from collections import OrderedDict
DICT = OrderedDict()
# manually download hfl/chinese-macbert-large
state_dict = torch.load("chinese-macbert-large/pytorch_model.bin")
for k, v in state_dict.items():
    # position_ids and position_embeddings in the weights are not needed
    if "position_ids" in k or "position_embeddings" in k:
        continue
    # rename the keys
    DICT[k.replace("bert", "roformer")] = v
torch.save(DICT, "romac/pytorch_model.bin")
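
An optional sanity check of the renamed state dict (a minimal sketch; the key-prefix assumptions about the hfl checkpoint are mine, not from the reply above):

import torch

# Assumes the hfl checkpoint stores the encoder under "bert."-prefixed keys plus
# "cls."-prefixed MLM-head keys: no "bert." or positional keys should survive.
converted = torch.load("romac/pytorch_model.bin", map_location="cpu")
assert not any(k.startswith("bert.") for k in converted)
assert not any("position_ids" in k or "position_embeddings" in k for k in converted)
print(f"{len(converted)} tensors ready for RoFormer")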

config.json

{
  "architectures": [
    "RoFormerForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size":1024,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "model_type": "roformer",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}
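
For completeness, a minimal loading sketch, assuming the converted weights and the config above are saved as romac/pytorch_model.bin and romac/config.json, and that the installed roformer package exposes RoFormerConfig and RoFormerForMaskedLM (adjust the import to the actual module layout):

import torch
from roformer import RoFormerConfig, RoFormerForMaskedLM  # assumed import path

config = RoFormerConfig.from_json_file("romac/config.json")
model = RoFormerForMaskedLM(config)
state_dict = torch.load("romac/pytorch_model.bin", map_location="cpu")
# strict=False tolerates head weights named differently; inspect both lists to
# confirm only expected keys are missing or unexpected.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)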

JunnYu closed this as completed Apr 24, 2021
renjunxiang commented May 26, 2021

Hi, converting the base model works fine for me, but converting large fails with an error.

RuntimeError: Error(s) in loading state_dict for RoFormerModel:
	size mismatch for roformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 1024]) from checkpoint, the shape in current model is torch.Size([21128, 768]).
	size mismatch for roformer.embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([2, 1024]) from checkpoint, the shape in current model is torch.Size([2, 768]).
	size mismatch for roformer.embeddings.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for roformer.embeddings.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).

Looking at modeling_roformer.py, it seems embedding_size defaults to 768. Should this be changed so that when the config doesn't contain embedding_size it falls back to hidden_size, or should I write embedding_size: 1024 into config.json?

@renjunxiang

Adding embedding_size to config.json fixed it for me.

JunnYu (Owner) commented May 26, 2021

@renjunxiang I added embedding_size mainly to support ELECTRA/ALBERT-style models, where embedding_size and hidden_size can differ.
In BERT and RoFormer, embedding_size and hidden_size are usually the same.

@renjunxiang

Thanks for the answer! It should be this part that supports ELECTRA/ALBERT-style models, right?

        if config.embedding_size != config.hidden_size:
            self.embeddings_project = nn.Linear(config.embedding_size, config.hidden_size)

JunnYu (Owner) commented May 26, 2021

@renjunxiang Yes. When the two are equal it behaves just like the original BERT; when they differ it works like ALBERT/ELECTRA, with an extra embedding projection.
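
To make that concrete, a toy sketch of the conditional projection (illustrative only, not the actual modeling_roformer.py code):

import torch
import torch.nn as nn

class ToyEmbeddingProject(nn.Module):
    # Only when embedding_size != hidden_size is an extra Linear inserted,
    # mirroring the ALBERT/ELECTRA-style behaviour described above.
    def __init__(self, embedding_size, hidden_size):
        super().__init__()
        self.needs_projection = embedding_size != hidden_size
        if self.needs_projection:
            self.embeddings_project = nn.Linear(embedding_size, hidden_size)

    def forward(self, embeddings):
        if self.needs_projection:
            embeddings = self.embeddings_project(embeddings)
        return embeddings

# embedding_size == hidden_size: passes through unchanged (BERT / RoFormer-large case)
print(ToyEmbeddingProject(1024, 1024)(torch.randn(2, 5, 1024)).shape)  # torch.Size([2, 5, 1024])
# embedding_size != hidden_size: projected up to hidden_size (ALBERT / ELECTRA case)
print(ToyEmbeddingProject(128, 1024)(torch.randn(2, 5, 128)).shape)    # torch.Size([2, 5, 1024])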
