
Converting weights from other models to the RoFormer model #2

Closed
WENGSYX opened this issue Apr 21, 2021 · 6 comments

Comments

WENGSYX commented Apr 21, 2021

Hello, I'm training a model on long texts, but the original RoFormer model is too small and doesn't perform well, so I'd like to try a large version of RoFormer.
Since no such large model exists, I want to convert the open-source 'hfl/chinese-macbert-large' weights into a RoFormer model and experiment with long-text training.
Su Jianlin converted WoBERT into RoFormer by replacing its absolute position encoding with RoPE, so I used the same code (https://github.com/ZhuiyiTechnology/roformer/blob/main/train.py):
# Imports as in the referenced bert4keras train.py; config_path / checkpoint_path
# point to the chinese-macbert-large config and TF checkpoint, and CrossEntropy is
# the Loss subclass defined earlier in train.py.
from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.optimizers import (
    Adam,
    extend_with_weight_decay,
    extend_with_piecewise_linear_lr,
    extend_with_gradient_accumulation,
)

bert = build_transformer_model(
    config_path,
    checkpoint_path=None,
    model='roformer',
    with_mlm='linear',
    ignore_invalid_weights=True,
    return_keras_model=False
)
model = bert.model

y_in = keras.layers.Input(shape=(None,), name='Input-Label')
outputs = CrossEntropy(1)([y_in, model.output])
train_model = keras.models.Model(model.inputs + [y_in], outputs)

AdamW = extend_with_weight_decay(Adam, name='AdamW')
AdamWLR = extend_with_piecewise_linear_lr(AdamW, name='AdamWLR')
AdamWLRG = extend_with_gradient_accumulation(AdamWLR, name='AdamWLRG')
optimizer = AdamWLRG(
    learning_rate=1e-5,
    weight_decay_rate=0.01,
    exclude_from_weight_decay=['Norm', 'bias'],
    grad_accum_steps=4,
    lr_schedule={20000: 1}
)
train_model.compile(optimizer=optimizer)
train_model.summary()

bert.load_weights_from_checkpoint(checkpoint_path)
model.save_weights('romac/bert_model.weights')

This produced a MacBERT-based TF checkpoint. I then tried to convert it to a PyTorch version with your convert_roformer_original_tf_checkpoint_to_pytorch.py, but it raises the error below. Is there a problem with the weights I converted, or can they simply not be converted directly?

convert_tf_checkpoint_to_pytorch('romac/bert_model.weights', 'romac/bert_config.json', 'romac/1')

Error:
Traceback (most recent call last):
  File "C:\Users\14301\miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 24, in <module>
    romac/1')
  File "", line 16, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_roformer(model, config, tf_checkpoint_path)
  File "C:\Users\14301\miniconda3\lib\site-packages\roformer\modeling_roformer.py", line 115, in load_tf_weights_in_roformer
    pointer.shape == array.shape
  File "C:\Users\14301\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'RoFormerForPreTraining' object has no attribute 'shape'

JunnYu (Owner) commented Apr 22, 2021

@WENGSYX

Either convert hfl's TF version of the chinese-macbert-large weights directly to RoFormer:

python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxx/chinese-macbert-large/chinese_macbert_large.ckpt \
    --roformer_config_file=xxxx/chinese-macbert-large/macbert_large_config.json \
    --pytorch_dump_path=xxxx/chinese-macbert-large/pytorch_model.bin 

Or convert the PyTorch weights of hfl/chinese-macbert-large to RoFormer:

import torch
from collections import OrderedDict
DICT = OrderedDict()
# manually download hfl/chinese-macbert-large
state_dict = torch.load("chinese-macbert-large/pytorch_model.bin")
for k, v in state_dict.items():
    # position_ids and position_embeddings in the weights are not needed
    if "position_ids" in k or "position_embeddings" in k:
        continue
    # rename the keys
    DICT[k.replace("bert", "roformer")] = v
torch.save(DICT, "romac/pytorch_model.bin")
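
An optional sanity check of the renamed state dict (a minimal sketch; the key-prefix assumptions about the hfl checkpoint are mine, not from the reply above):

import torch

# Assumes the hfl checkpoint stores the encoder under "bert."-prefixed keys plus
# "cls."-prefixed MLM-head keys: no "bert." or positional keys should survive.
converted = torch.load("romac/pytorch_model.bin", map_location="cpu")
assert not any(k.startswith("bert.") for k in converted)
assert not any("position_ids" in k or "position_embeddings" in k for k in converted)
print(f"{len(converted)} tensors ready for RoFormer")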

config.json

{
  "architectures": [
    "RoFormerForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size":1024,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "model_type": "roformer",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}
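
For completeness, a minimal loading sketch, assuming the converted weights and the config above are saved as romac/pytorch_model.bin and romac/config.json, and that the installed roformer package exposes RoFormerConfig and RoFormerForMaskedLM (adjust the import to the actual module layout):

import torch
from roformer import RoFormerConfig, RoFormerForMaskedLM  # assumed import path

config = RoFormerConfig.from_json_file("romac/config.json")
model = RoFormerForMaskedLM(config)
state_dict = torch.load("romac/pytorch_model.bin", map_location="cpu")
# strict=False tolerates head weights named differently; inspect both lists to
# confirm only expected keys are missing or unexpected.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)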

JunnYu closed this as completed Apr 24, 2021
renjunxiang commented May 26, 2021

Hi, converting the base model works fine for me, but converting large fails with an error.

RuntimeError: Error(s) in loading state_dict for RoFormerModel:
	size mismatch for roformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 1024]) from checkpoint, the shape in current model is torch.Size([21128, 768]).
	size mismatch for roformer.embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([2, 1024]) from checkpoint, the shape in current model is torch.Size([2, 768]).
	size mismatch for roformer.embeddings.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for roformer.embeddings.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).

Looking at modeling_roformer.py, it seems embedding_size defaults to 768. Should this be changed so that when the config doesn't contain embedding_size it falls back to hidden_size, or should I write embedding_size: 1024 into config.json?

@renjunxiang

Adding embedding_size to config.json fixed it for me.

JunnYu (Owner) commented May 26, 2021

@renjunxiang I added embedding_size mainly to support ELECTRA/ALBERT-style models, where embedding_size and hidden_size can differ.
In BERT and RoFormer, embedding_size and hidden_size are usually the same.

@renjunxiang

Thanks for the answer! It should be this part that supports ELECTRA/ALBERT-style models, right?

        if config.embedding_size != config.hidden_size:
            self.embeddings_project = nn.Linear(config.embedding_size, config.hidden_size)

JunnYu (Owner) commented May 26, 2021

@renjunxiang Yes. When the two are equal it behaves just like the original BERT; when they differ it works like ALBERT/ELECTRA, with an extra embedding projection.
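
To make that concrete, a toy sketch of the conditional projection (illustrative only, not the actual modeling_roformer.py code):

import torch
import torch.nn as nn

class ToyEmbeddingProject(nn.Module):
    # Only when embedding_size != hidden_size is an extra Linear inserted,
    # mirroring the ALBERT/ELECTRA-style behaviour described above.
    def __init__(self, embedding_size, hidden_size):
        super().__init__()
        self.needs_projection = embedding_size != hidden_size
        if self.needs_projection:
            self.embeddings_project = nn.Linear(embedding_size, hidden_size)

    def forward(self, embeddings):
        if self.needs_projection:
            embeddings = self.embeddings_project(embeddings)
        return embeddings

# embedding_size == hidden_size: passes through unchanged (BERT / RoFormer-large case)
print(ToyEmbeddingProject(1024, 1024)(torch.randn(2, 5, 1024)).shape)  # torch.Size([2, 5, 1024])
# embedding_size != hidden_size: projected up to hidden_size (ALBERT / ELECTRA case)
print(ToyEmbeddingProject(128, 1024)(torch.randn(2, 5, 128)).shape)    # torch.Size([2, 5, 1024])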
