
Output is not ideal ~ #12

Closed
frog-game opened this issue May 8, 2023 · 10 comments

Comments

@frog-game

Here is the inference script I wrote:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

model_path = "/root/pandallm/vicuna-7b"
model = LlamaForCausalLM.from_pretrained(model_path, device_map='auto', low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

print("Human:")
line = input()
while line:
    # Build the prompt and tokenize it
    inputs = 'Human: ' + line.strip() + '\n\nAssistant:'
    input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
    outputs = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=True,
        top_k=30,
        top_p=0.85,
        temperature=0.5,
        repetition_penalty=1.0,
        eos_token_id=2,
        bos_token_id=1,
        pad_token_id=0,
    )
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    # Strip the prompt from the decoded text before printing the reply
    print("Assistant:\n" + rets[0].strip()[len(inputs) + len(line):])
    print("\n------------------------------------------------\nHuman:")
    line = input()

This is the llama-7b-hf obtained after converting the original 7B model weights:

[screenshot]

This is the vicuna model obtained after merging llama-7b-hf with your delta weights at https://huggingface.co/chitanda/llama-panda-zh-7b-delta:
[screenshot]

And this is the result; the output is not ideal:
[screenshot]

@frog-game frog-game changed the title Complete nonsense ~ Output is not ideal ~ May 8, 2023
@SparkJiao
Collaborator

  1. You don't need to add the "Human:" or "Assistant:" prefixes in inputs; just ask the question directly.
  2. For the 7B model, use either greedy search or beam search. Don't add a temperature and don't sample. If you want more diverse answers, use beam search and take the top-3 (see the sketch below).
  3. Judging from your example, there are two main problems: factual errors and failure to stop generating. Both are currently common across the LLaMA family, even for the English models. You can try the 13B model later.
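
A minimal sketch of point 2, reusing the model / tokenizer / input_ids names from the script above (the parameter values are illustrative, not prescribed by this repo):

# Greedy search: deterministic, no temperature, no sampling
outputs = model.generate(input_ids, max_new_tokens=500, do_sample=False)

# Beam search, keeping the top-3 beams for more diverse answers
outputs = model.generate(
    input_ids,
    max_new_tokens=500,
    do_sample=False,
    num_beams=3,
    num_return_sequences=3,
)
rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)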

@frog-game
Author

OK, thanks for the answer ~
I'll try removing the temperature and sampling.

@SparkJiao
Collaborator

Also, your code for removing the input part doesn't look right: len(inputs) + len(line) counts line twice?

Can't you just replace inputs out of the batch-decode result directly?

outputs = [seq.replace(self.tokenizer.decode(input_ids[i], skip_special_tokens=True), "")
           for i, seq in enumerate(rets)]
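
An adjacent pattern, not from this thread but common with decoder-only models: skip the string surgery entirely and decode only the newly generated tokens, using the outputs, input_ids, and tokenizer variables from the script above:

# Sketch (assumption: outputs is the tensor returned by model.generate above);
# everything after the prompt tokens is the model's reply.
gen_ids = outputs[0][input_ids.shape[1]:]
reply = tokenizer.decode(gen_ids, skip_special_tokens=True)
print("Assistant:\n" + reply.strip())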

@frog-game
Author

That's because the returned text also includes the input at the front, so the slicing is just a temporary workaround for the output. For example, if you ask it for a line of Li Bai's poetry, it will also print that Li Bai line at the very beginning of the output.

@SparkJiao
Collaborator

SparkJiao commented May 8, 2023

I know, that's because the model is decoder-only. What I mean is: when you strip the input, why do you slice with rets[len(inputs) + len(line):] instead of rets[len(inputs):]?
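
A toy illustration of this point with made-up strings (not from the thread): inputs already contains line once, so slicing by len(inputs) is enough.

line = "your question"
inputs = 'Human: ' + line + '\n\nAssistant:'    # inputs already contains line
ret = inputs + ' model reply...'                # decoded output starts with the prompt
print(ret[len(inputs):])                        # -> ' model reply...'
print(ret[len(inputs) + len(line):])            # cuts len(line) extra characters off the reply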

@frog-game
Author

I know, that's because the model is decoder-only. What I mean is: when you strip the input, why do you slice with rets[len(inputs) + len(line):] instead of rets[len(inputs):]?

Let me try that, I get what you mean now ~

@frog-game
Author

I know, that's because the model is decoder-only. What I mean is: when you strip the input, why do you slice with rets[len(inputs) + len(line):] instead of rets[len(inputs):]?

Because that line is also included ~

@frog-game
Author

I know, that's because the model is decoder-only. What I mean is: when you strip the input, why do you slice with rets[len(inputs) + len(line):] instead of rets[len(inputs):]?

I mean there are two unwanted stretches of characters in the output: one is the content of inputs, and the other is the content of line.

@SparkJiao
Collaborator

Reopen this if you need further help.

@frog-game
Author

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "/root/vicuna-7b"
model = LlamaForCausalLM.from_pretrained(
    model_path, device_map='auto', low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

print('Human:')
line = input().strip()

while line:
    batch = tokenizer(line, return_tensors="pt")
    # Generate with sampling; the returned sequence still contains the prompt tokens
    output_ids = model.generate(
        batch["input_ids"].cuda(),
        do_sample=True,
        top_k=50,
        max_length=100,
        top_p=0.95,
        temperature=1.0,
    )[0]
    print('\n\nAssistant:' + tokenizer.decode(output_ids))
    print("\n------------------------------------------------\nHuman:")
    line = input().strip()

[screenshot]

The output is still not ideal.
