
BiLLa-7B-LLM generated-text problem #16

Open
chk4991 opened this issue May 23, 2023 · 5 comments
chk4991 commented May 23, 2023

Using BiLLa-7B-LLM, following issue #8, I replaced the original weights with the pytorch_model-33-of-33.bin from commit 887dd5e259104ed6fe7816cd0c0997ab68bbb94e, and did not run embedding_convert.py.
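For anyone reproducing this setup, one way to fetch that exact shard is to pin the commit through huggingface_hub. This is a sketch only: the repo id is an assumption (it is not stated in the issue), and the download needs network access, so the call is wrapped in a function rather than executed here.

```python
COMMIT = "887dd5e259104ed6fe7816cd0c0997ab68bbb94e"

def fetch_fixed_shard(repo_id="Neutralzz/BiLLa-7B-LLM"):
    """Download only the replaced weight shard, pinned to the commit
    referenced in issue #8 (repo_id is assumed, not taken from the issue)."""
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id=repo_id,
        filename="pytorch_model-33-of-33.bin",
        revision=COMMIT,  # pin to the exact commit, not the branch head
    )
```

The returned path can then replace the corresponding shard in the local checkpoint directory.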


The test code is as follows:
from transformers import LlamaForCausalLM, LlamaTokenizer

CKPT = 'BiLLa-7B-LLM'
DEVICE = 'cuda:0'
tokenizer = LlamaTokenizer.from_pretrained(CKPT, add_special_tokens=True)
model = LlamaForCausalLM.from_pretrained(CKPT).to(DEVICE)

prompts = ["我看见一群人走在大", "今天是个阳光明媚的", "这件事情的发展出乎意"]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(DEVICE)
    output_ids = model.generate(input_ids, max_new_tokens=100)[0]
    out_text = tokenizer.decode(output_ids)
    print(out_text)

The generated results are as follows:
<s> 我看见一群人走在大马路可以指:</s>
<s> 今天是个阳光明媚的日子,可以指:</s>
<s> 这件事情的发展出乎意料 </s>


Is this result normal?

@lucasjinreal

That doesn't look right.

@Neutralzz
Owner

It does look a bit strange. I won't be able to find time to verify it again until Sunday.

@lucasjinreal

I've also found that this LLM (the non-SFT one, with the embedding already converted) produces odd generations.

It always outputs:

可以指:

Or, given the input Hi or Hello, it returns nothing, or returns "world".

Could the owner please check whether the wrong model was uploaded....

@Neutralzz
Owner

The uploaded model should be correct, but when I inspect the outputs, I see the same behavior you do.

I compared the original model files against the uploaded ones and confirmed the parameters are identical. I also computed the perplexity of both models, and it matches the results in the README (it is even slightly lower).

My current guess is that this behavior is related to how the model was trained:

  • Language modeling was trained as conditional generation; see the code.
  • Most of the task data depends heavily on the prompt; only WebQA and the translation data have relatively short prompts.

So increasing the prompt length (to roughly 100-200 characters) should most likely avoid this behavior.
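The suggested workaround can be sketched as a small helper that embeds a short continuation prefix in a longer instruction-style context before calling generate. The helper name and the template text below are illustrative, not taken from the repo:

```python
def lengthen_prompt(prefix: str) -> str:
    """Wrap a short continuation prefix in a longer instruction-style
    context, aiming for the ~100-200 character prompts suggested above.
    The template is illustrative, not from BiLLa's training data."""
    context = (
        "下面是一个中文续写任务。请仔细阅读给出的开头,"
        "然后续写一段通顺、连贯、内容详实的文字,"
        "保持与开头相同的语气和主题,并且不要重复开头本身。"
        "需要续写的开头是:"
    )
    return context + prefix

# The longer prompt is then tokenized and passed to model.generate()
# exactly as in the test code earlier in this thread.
```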

@luohao123

The several LoRAs I trained here all basically collapse and cannot produce normal output. The training data is the usual SFT sets (COIG, alpaca_zh, etc.), all with fairly long outputs.
It seems safe to conclude this is tied to how the model was trained; it is hard to train a usable LoRA on top of it.
