
VideoChat2: heavy repetition in video descriptions after fine-tuning only stage 3 #146

Open
ruishuzhao opened this issue Mar 14, 2024 · 3 comments

@ruishuzhao

Hello authors,
After fine-tuning stage 3 of VideoChat2 (with training data drawn entirely from the datasets listed in the paper), the resulting model produces a large amount of repetition when describing videos.
The prompt was: "Describe the following video clip in detail."
An example answer: "The video clip shows a woman wearing a black shirt and black pants standing in a dark room. She is holding a white shirt and putting it into a washing machine. The woman is wearing a black shirt and black pants and is standing in a dark room. She is holding a white shirt and putting it into a washing machine. [the same two sentences repeat until the response is cut off] ... The woman is wearing a black shirt and black pants and is standing in a dark room. She is holding a white shirt and putting it into a"
The sentence "The woman is wearing a black shirt and black pants and is standing in a dark room. She is holding a white shirt and putting it into a washing machine." is repeated over and over.

Is there a good way to fix this kind of repetition after fine-tuning?

Thanks a lot for your help!

@Andy1621
Collaborator

This is an interesting research topic. We have also seen similar behavior when running inference with some of our earlier models, but the exact cause is still unclear. You could try different prompts, or some multi-turn dialogue prompting, to narrow down the cause or find a workaround.
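
For example, a minimal sketch of such prompt probing, assuming the demo-style Chat wrapper from the VideoChat2 repo (the ask() call, the return value of answer(), and the variables chat, conv, and img_list are assumptions based on the usual demo setup, not guaranteed to match the exact code):

import copy

# Hypothetical sketch: try several prompts on the same video to see whether
# the repetition is prompt-dependent. `chat`, `conv`, and `img_list` are assumed
# to come from the demo setup (the video already uploaded into `img_list`).
prompts = [
    "Describe the following video clip in detail.",
    "Summarize the main actions in this video in two or three sentences.",
    "What happens in the video? Answer step by step without repeating yourself.",
]
for prompt in prompts:
    test_conv = copy.deepcopy(conv)   # fresh conversation state for each prompt
    chat.ask(prompt, test_conv)       # assumed demo API for adding a user turn
    answer = chat.answer(test_conv, img_list, max_new_tokens=200)
    print(f"--- {prompt}\n{answer}\n")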

@ruishuzhao
Author

Thanks again!

@yinanhe
Member

yinanhe commented Mar 14, 2024

This "stuck-on-repeat" behavior already exists in plain LLMs, and there are some write-ups and explorations of it.
For VideoChat, you can try adjusting num_beams, do_sample, top_p, repetition_penalty, and temperature in the answer function. Adjusting repetition_penalty is likely to have the biggest effect, but it also affects the randomness and diversity of the generated text. You could try values of repetition_penalty above 1 and experiment.

def answer(self, conv, img_list, max_new_tokens=200, num_beams=1, min_length=1, top_p=0.9,
           repetition_penalty=1.0, length_penalty=1, temperature=1.0):
    # Append an empty assistant turn, then build the multimodal context embeddings.
    conv.messages.append([conv.roles[1], None])
    embs = self.get_context_emb(conv, img_list)
    # Sampling-based generation; repetition_penalty > 1.0 discourages repeated tokens.
    outputs = self.model.llama_model.generate(
        inputs_embeds=embs,
        max_new_tokens=max_new_tokens,
        stopping_criteria=self.stopping_criteria,
        num_beams=num_beams,
        do_sample=True,
        min_length=min_length,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        length_penalty=length_penalty,
        temperature=temperature,
    )
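
As a concrete starting point, a call to answer could raise the repetition penalty above 1 (1.2 below is only an illustrative value, not a tuned recommendation; chat, conv, and img_list are again assumed to come from the demo setup):

# Same answer() as above, with repetition_penalty raised above the default 1.0.
# Larger values suppress repeated tokens more aggressively, at some cost to
# fluency and diversity, so it is worth sweeping a few values (e.g. 1.1 to 1.5).
output = chat.answer(
    conv=conv,
    img_list=img_list,
    max_new_tokens=200,
    num_beams=1,
    top_p=0.9,
    temperature=1.0,
    repetition_penalty=1.2,   # > 1.0 penalizes tokens that were already generated
)

The underlying Hugging Face generate call also accepts no_repeat_ngram_size, which hard-blocks exact n-gram repeats; the answer() wrapper shown above would need to be extended to pass it through, but it may be worth trying if repetition_penalty alone is not enough.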
