Converting other models' weights to the RoFormer model #2
Convert hfl's TF version of chinese-macbert-large weights to RoFormer directly:

```bash
python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxx/chinese-macbert-large/chinese_macbert_large.ckpt \
    --roformer_config_file=xxxx/chinese-macbert-large/macbert_large_config.json \
    --pytorch_dump_path=xxxx/chinese-macbert-large/pytorch_model.bin
```

Or convert the weights of hfl's PyTorch version, hfl/chinese-macbert-large, to RoFormer:

```python
import torch
from collections import OrderedDict

DICT = OrderedDict()
# download hfl/chinese-macbert-large manually first
state_dict = torch.load("chinese-macbert-large/pytorch_model.bin")
for k, v in state_dict.items():
    # the position_ids and position_embeddings weights are not needed
    if "position_ids" in k or "position_embeddings" in k:
        continue
    # rename the keys: bert -> roformer
    DICT[k.replace("bert", "roformer")] = v
torch.save(DICT, "romac/pytorch_model.bin")
```

config.json:

```json
{
  "architectures": [
    "RoFormerForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size": 1024,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "model_type": "roformer",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}
```
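As a quick sanity check after the PyTorch-side conversion, you can load the renamed state dict and inspect the unmatched keys. This is only a sketch: the paths are hypothetical, and it assumes the `roformer` pip package seen in the traceback below exposes the usual transformers-style classes.

```python
# Sanity-check sketch (hypothetical paths; assumes the `roformer` package
# provides RoFormerConfig / RoFormerForMaskedLM like transformers does).
import torch
from roformer import RoFormerConfig, RoFormerForMaskedLM

config = RoFormerConfig.from_json_file("romac/config.json")
model = RoFormerForMaskedLM(config)
state_dict = torch.load("romac/pytorch_model.bin", map_location="cpu")
# strict=False reports mismatches instead of raising; both lists should be
# (near) empty if the bert -> roformer renaming above worked.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```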
Hi, converting the base model works for me, but converting large throws an error.
Looking at modeling_roformer.py, embedding_size seems to default to 768. Should this be changed so that, when the config has no embedding_size, it falls back to hidden_size? Or should I just write embedding_size: 1024 into config.json?
Adding embedding_size to config.json fixed it for me.
@renjunxiang I added embedding_size mainly to support ELECTRA/ALBERT-style models, where embedding_size and hidden_size can differ.
Thanks for the answer! So it's used here to support ELECTRA/ALBERT-like models, right?
@renjunxiang Yes. When the two are equal it behaves exactly like the original BERT; when they differ it behaves like ALBERT/ELECTRA, with an extra embedding projection.
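A minimal sketch of that behaviour (illustrative only; the class and attribute names here are made up, not the actual modeling_roformer.py code):

```python
import torch.nn as nn

class RoFormerStyleEmbeddings(nn.Module):
    """Illustrative sketch of the embedding projection discussed above."""

    def __init__(self, vocab_size, embedding_size, hidden_size):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)
        # embedding_size == hidden_size -> behaves like original BERT
        # embedding_size != hidden_size -> ALBERT/ELECTRA-style projection
        self.embeddings_project = (
            nn.Linear(embedding_size, hidden_size)
            if embedding_size != hidden_size
            else nn.Identity()
        )

    def forward(self, input_ids):
        return self.embeddings_project(self.word_embeddings(input_ids))

# e.g. a large config with embedding_size = hidden_size = 1024 needs no
# projection; an ELECTRA-style (128, 1024) pair gets a 128 -> 1024 Linear.
```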
Hello, I'm training models on long text, but the original RoFormer model is too small and performs poorly, so I'd like to try a large RoFormer.
Since no large model has been released, I want to convert the open-source hfl/chinese-macbert-large weights into a RoFormer model to experiment with long-text training.
Su Jianlin (苏神) converted WoBERT, with its absolute position embeddings replaced by RoPE, into RoFormer, so I used the same code (https://github.com/ZhuiyiTechnology/roformer/blob/main/train.py) to do the conversion:
```python
# Adapted from https://github.com/ZhuiyiTechnology/roformer/blob/main/train.py;
# CrossEntropy is the custom MLM loss layer defined in that script, and
# config_path / checkpoint_path point to the chinese-macbert-large files.
from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.optimizers import (
    Adam,
    extend_with_weight_decay,
    extend_with_piecewise_linear_lr,
    extend_with_gradient_accumulation,
)

# build a RoFormer-shaped model, tolerating weights that do not match
bert = build_transformer_model(
    config_path,
    checkpoint_path=None,
    model='roformer',
    with_mlm='linear',
    ignore_invalid_weights=True,
    return_keras_model=False
)
model = bert.model

y_in = keras.layers.Input(shape=(None,), name='Input-Label')
outputs = CrossEntropy(1)([y_in, model.output])
train_model = keras.models.Model(model.inputs + [y_in], outputs)

AdamW = extend_with_weight_decay(Adam, name='AdamW')
AdamWLR = extend_with_piecewise_linear_lr(AdamW, name='AdamWLR')
AdamWLRG = extend_with_gradient_accumulation(AdamWLR, name='AdamWLRG')
optimizer = AdamWLRG(
    learning_rate=1e-5,
    weight_decay_rate=0.01,
    exclude_from_weight_decay=['Norm', 'bias'],
    grad_accum_steps=4,
    lr_schedule={20000: 1}
)
train_model.compile(optimizer=optimizer)
train_model.summary()

# load the macbert checkpoint into the RoPE model, then save the result
bert.load_weights_from_checkpoint(checkpoint_path)
model.save_weights('romac/bert_model.weights')
```
This produced a macbert version of the TF weights. I then wanted to convert this checkpoint to a PyTorch version with your convert_roformer_original_tf_checkpoint_to_pytorch.py, but it throws an error. Is the problem with the weights I converted, or can the weights simply not be converted directly like this?

```python
convert_tf_checkpoint_to_pytorch('romac/bert_model.weights', 'romac/bert_config.json', 'romac/1')
```
Error:

```
Traceback (most recent call last):
  File "C:\Users\14301\miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 24, in <module>
    romac/1')
  File "", line 16, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_roformer(model, config, tf_checkpoint_path)
  File "C:\Users\14301\miniconda3\lib\site-packages\roformer\modeling_roformer.py", line 115, in load_tf_weights_in_roformer
    pointer.shape == array.shape
  File "C:\Users\14301\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'RoFormerForPreTraining' object has no attribute 'shape'
```
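For what it's worth, the converter reads checkpoints through TensorFlow's `tf.train` API, so one quick check (a sketch, reusing the path from the snippet above) is whether the saved file can be listed as a TF checkpoint at all:

```python
# Diagnostic sketch: convert_roformer_original_tf_checkpoint_to_pytorch.py
# iterates TF checkpoint variables by name, so the input must be listable here.
import tensorflow as tf

for name, shape in tf.train.list_variables("romac/bert_model.weights"):
    print(name, shape)
# If this fails, the file is a Keras weights file (from model.save_weights()),
# not a TF checkpoint. If it succeeds but the names are Keras layer paths
# rather than bert/... scopes, the loader's getattr walk never descends into
# the model, which matches the AttributeError on 'shape' above.
```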