
Some questions about transformers #14

Closed
yclzju opened this issue Sep 8, 2021 · 9 comments

Comments

yclzju commented Sep 8, 2021

Hi, I'd like to ask: is calling the roformer models through transformers now exactly the same as using this repository?
If I want to use the roformer-sim models, which interface should I use? RoFormerForMaskedLM?
After loading with transformers' RoFormerForMaskedLM, I found that some parameters were not loaded (probably the pooler-related ones). For inference it works quite well out of the box, but when using it as a backbone for training, the loss won't go down (the same code trains fine with roberta). I wonder if that's because the pooler is missing. I couldn't find a roformer-sim example in your examples; could you help clear this up when you have time? Thanks!

JunnYu (Owner) commented Sep 8, 2021

@yclzju The roformer in the huggingface repository does not include pooling. The code in this repository adds pooling and can be loaded directly.

https://github.com/JunnYu/RoFormer_pytorch/blob/f0ca803094eab5dacb3f657d93b629d81a0981c0/src/roformer/modeling_roformer.py#L1086 — the pooling layer is added here.
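For context, the pooling layer linked above follows the familiar BERT-style design: a dense projection of the first ([CLS]) token's hidden state followed by an activation. A minimal standalone sketch (the 384 hidden size matches `roformer_chinese_sim_char_small`; a tanh activation is assumed here, as in BERT — the actual activation is configurable):

```python
import torch
import torch.nn as nn

# Minimal sketch of a BERT-style pooler: a dense projection of the
# first token's hidden state plus an activation (tanh assumed here).
class Pooler(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        first_token = hidden_states[:, 0]  # (batch, hidden) slice of [CLS]
        return self.activation(self.dense(first_token))

sequence_output = torch.randn(1, 5, 384)  # (batch, seq_len, hidden)
pooled = Pooler(384)(sequence_output)
print(pooled.shape)  # torch.Size([1, 384])
```

When these weights are absent from a checkpoint (as with the huggingface roformer), the pooled vector comes from a randomly initialized layer, which would explain poor similarity behavior until it is trained.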

JunnYu (Owner) commented Sep 8, 2021

Install this repository's code: `pip install roformer==0.2.1`

```python
import torch
from roformer import RoFormerModel

model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_sim_char_small", add_pooling_layer=True)
model.eval()
x = torch.tensor([[5, 6, 7, 8, 9]])
output = model(x)
print(output.pooler_output.shape)
# torch.Size([1, 384])
```

yclzju (Author) commented Sep 8, 2021

It works now.
I was originally loading the model with transformers and adding a CLS pooling operation myself. For evaluation it looked good on STS tasks, but during training the loss eventually stopped decreasing. After switching to the example you gave, it works again.

yclzju (Author) commented Sep 8, 2021

> Install this repository's code: `pip install roformer==0.2.1`
>
> ```python
> import torch
> from roformer import RoFormerModel
>
> model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_sim_char_small", add_pooling_layer=True)
> model.eval()
> x = torch.tensor([[5, 6, 7, 8, 9]])
> output = model(x)
> print(output.pooler_output.shape)
> # torch.Size([1, 384])
> ```

Why is `RoFormerModel` used here rather than `RoFormerForMaskedLM`?

JunnYu (Owner) commented Sep 8, 2021

```python
self.roformer = RoFormerModel(config, add_pooling_layer=False)
```

As the line above shows, the masked-LM model does not enable the pooler. If you need it, you can modify the original code yourself and then pass `return_dict=False` when calling the model to get the pooler output.

yclzju (Author) commented Sep 14, 2021

Hi, thanks for the answer.
Do you plan to add the pooler in transformers as well? Or is there a relatively quick way to load the pooler parameters on top of the transformers model?

yclzju (Author) commented Sep 14, 2021

By the way, is most of the code in transformers the same as yours, e.g. `RoFormerSelfAttention`? So I should only need to modify a few classes such as `RoFormerModel` on top of transformers, right?

JunnYu (Owner) commented Sep 14, 2021

(1) The code in the transformers library is basically identical to the code here; it is only missing the pooler part.
(2) There is no good way. The best option is to use this repository's code (which adds the pooler) and just call it.

JunnYu (Owner) commented Apr 2, 2022

If you want generation results, download the roformer package below and you can generate them.

roformer.zip

```python
import torch
import numpy as np
from roformer import RoFormerTokenizer, RoFormerForCausalLM, RoFormerConfig

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pretrained_model = "junnyu/roformer_chinese_sim_char_base"
tokenizer = RoFormerTokenizer.from_pretrained(pretrained_model)
config = RoFormerConfig.from_pretrained(pretrained_model)
config.is_decoder = True
config.eos_token_id = tokenizer.sep_token_id
config.pooler_activation = "linear"
model = RoFormerForCausalLM.from_pretrained(pretrained_model, config=config)
model.to(device)
model.eval()

def gen_synonyms(text, n=100, k=20):
    """Generate n sentences similar to `text`, then return the k most similar ones.
    Approach: generate with seq2seq, then score and rank with the encoder.
    """
    # Generate candidate similar sentences.
    r = []
    inputs1 = tokenizer(text, return_tensors="pt")
    for _ in range(n):
        inputs1.to(device)
        output = tokenizer.batch_decode(
            model.generate(**inputs1, top_p=0.95, do_sample=True, max_length=128),
            skip_special_tokens=True,
        )[0].replace(" ", "").replace(text, "")  # strip spaces and the original text
        r.append(output)

    # Rank the candidates by similarity to the original text.
    r = [i for i in set(r) if i != text and len(i) > 0]
    r = [text] + r
    inputs2 = tokenizer(r, padding=True, return_tensors="pt")
    with torch.no_grad():
        inputs2.to(device)
        outputs = model(**inputs2)
        Z = outputs.pooler_output.cpu().numpy()
    Z /= (Z**2).sum(axis=1, keepdims=True) ** 0.5  # L2-normalize each row
    argsort = np.dot(Z[1:], -Z[0]).argsort()       # most similar first

    return [r[i + 1] for i in argsort[:k]]

out = gen_synonyms("广州和深圳哪个好？")
print(out)
# ['深圳和广州哪个好？',
#  '广州和深圳哪个好',
#  '深圳和广州哪个好',
#  '深圳和广州哪个比较好。',
#  '深圳和广州哪个最好？',
#  '深圳和广州哪个比较好',
#  '广州和深圳那个比较好',
#  '深圳和广州哪个更好？',
#  '深圳与广州哪个好',
#  '深圳和广州，哪个比较好',
#  '广州与深圳比较哪个好',
#  '深圳和广州哪里比较好',
#  '深圳还是广州比较好？',
#  '广州和深圳哪个地方好一些？',
#  '广州好还是深圳好？',
#  '广州好还是深圳好呢？',
#  '广州与深圳哪个地方好点？',
#  '深圳好还是广州好',
#  '广州好还是深圳好',
#  '广州和深圳哪个城市好？']
```
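The ranking step inside `gen_synonyms` can be illustrated in isolation: each pooled vector is L2-normalized, so the dot product between rows is their cosine similarity; negating the products before `argsort` yields a most-similar-first order. A toy sketch with made-up 2-d vectors standing in for real pooler outputs:

```python
import numpy as np

# Toy pooled vectors: row 0 is the query, rows 1-3 are candidates.
Z = np.array([[1.0, 0.0],   # query sentence
              [0.9, 0.1],   # candidate 0: nearly the same direction
              [0.0, 1.0],   # candidate 1: orthogonal (least similar)
              [0.7, 0.7]])  # candidate 2: in between
Z /= (Z ** 2).sum(axis=1, keepdims=True) ** 0.5  # unit-normalize each row
# Negated cosine similarities, so ascending argsort = most similar first.
argsort = np.dot(Z[1:], -Z[0]).argsort()
print(argsort.tolist())  # [0, 2, 1]
```

The `r[i + 1]` indexing in the real function then maps these candidate indices back past the query at position 0.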

@JunnYu JunnYu closed this as completed Apr 7, 2022