
Redundant outputs when performing in-context learning #77

chan-ming opened this issue Jan 16, 2024 · 3 comments

@chan-ming
Hello, thanks for the great work!
I am trying to run some VQA tasks with Emu2. I selected the in-context examples following the approach described in the paper and used the same prompt, and I called the `generate` function in the same way as the inference example in the README.
However, the model sometimes repeats the question and generates redundant outputs when the number of in-context examples is 4 or 8, as in the screenshot below. With 16 in-context examples, the redundant outputs disappear. Is this normal, or have I made a mistake somewhere? If so, how can I fix it?
Thanks in advance!
[screenshot: the model's output repeats the question and appends redundant text after the answer]

@yqy2001 (Member) commented Jan 17, 2024

Yes, this is a normal phenomenon, as the output format of a pretrained model is hard to control. The behavior is similar to that of LLaMA-1-30B, especially when the model is given only a limited number of in-context examples.

We adjust the generation hyperparameters and post-process the model's outputs to refine the answers:

  1. Hyperparameters for questions requiring short answers (e.g. VQAv2); see the sketch after the post-processing function below for how they can be passed to `generate`:
     - `max_new_tokens`: 10
     - `min_length`: 1
     - `num_beams`: 5
     - `length_penalty`: -1.0

  2. Post-processing to refine the output format:

```python
def short_answer(answer):
    # shorten pretrained model's uncontrollable output for benchmark evaluation
    # keep only the first line / sentence / quoted segment / clause
    answer = answer.split('\n')[0]
    answer = answer.split('. ')[0]
    answer = answer.split('\"')[0]
    answer = answer.split(', ')[0]
    answer = answer.strip()
    answer = answer.lower()
    # drop a trailing period
    answer = answer if len(answer) == 0 or answer[-1] != '.' else answer[:-1]
    # strip common leading phrases and articles
    answer = answer.replace('it is ', '', 1) if answer.startswith('it is ') else answer
    answer = answer.replace('it\'s ', '', 1) if answer.startswith('it\'s ') else answer
    answer = answer.replace('a ', '', 1) if answer.startswith('a ') else answer
    answer = answer.replace('an ', '', 1) if answer.startswith('an ') else answer
    answer = answer.replace('the ', '', 1) if answer.startswith('the ') else answer
    return answer
```
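
For reference, here is a minimal sketch of how the two pieces above can fit together. It assumes the Hugging Face-style `generate` call from the README inference example; `model`, `tokenizer`, and `inputs` are placeholders for the objects built there, not a verbatim copy of the README code.

```python
# Sketch only: generation settings for short-answer benchmarks (e.g. VQAv2),
# followed by the post-processing above. `model`, `tokenizer`, and `inputs`
# are placeholders for the objects built as in the README inference example.
gen_kwargs = dict(
    max_new_tokens=10,    # short answers only
    min_length=1,
    num_beams=5,
    length_penalty=-1.0,  # discourage long, rambling continuations
)

outputs = model.generate(**inputs, **gen_kwargs)

# Decode the generated tokens and normalize the text for benchmark scoring.
raw_answer = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = short_answer(raw_answer)
```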

By the way, the evaluation code will be released in the near future. We will notify you, and you can also refer to our implementation then.

@chan-ming (Author)

Thanks for your reply! I used the same hyperparameters and followed the post-processing procedure and the VQAv2 evaluation code. However, the VQA score is only 49 on the OKVQA dataset with 16 in-context examples. I wonder where the problem is.

@sachit-menon

@yqy2001 thank you for the post-processing and hyperparameters! Do you have an approximate estimate for when the evaluation code might be released?
