
Please provide full narration hyper-parameters #5

Open · maximotus opened this issue Apr 24, 2024 · 3 comments


maximotus commented Apr 24, 2024

Hi ;)

For comparability, it would be beneficial for the community to have insight into the full hyper-parameter setup.
I am especially interested in the LaViLa captioning config used with your provided FAIR checkpoint for EgoSchema.
In detail, I need the following information:

  1. You say you use nucleus sampling with top_p=0.95 and choose k=5 to obtain 5 caption candidates. What temperature do you use there?
  2. In the paper you report temperature=0.0 for the LLMs, but the README's example command for the summarization task uses temperature=1.0. So do you use temperature=1.0 for the LLM in the summarization task and temperature=0.0 in the QA task? (See the parameter sketch below.)
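
To make the two questions concrete, here is a minimal sketch of the parameters being asked about. The class and field names are hypothetical and not taken from the LLoVi or LaViLa code, and the temperature values marked as assumptions are exactly the unknowns this issue is trying to clarify.

```python
# Hypothetical sketch of the hyper-parameters under discussion.
# Class/field names are illustrative, NOT from the LLoVi/LaViLa codebases.
from dataclasses import dataclass


@dataclass
class CaptionSamplingConfig:
    """LaViLa caption generation (question 1)."""
    top_p: float = 0.95        # nucleus sampling, as stated in the paper
    num_candidates: int = 5    # k = 5 caption candidates per clip
    temperature: float = 1.0   # assumption -- the value question 1 asks for


@dataclass
class LLMConfig:
    """LLM calls (question 2)."""
    summarization_temperature: float = 1.0  # per the README example command (assumed)
    qa_temperature: float = 0.0             # per the paper (assumed)


if __name__ == "__main__":
    print(CaptionSamplingConfig())
    print(LLMConfig())
```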

Clarification would be much appreciated! :)

Cheers,
Maximotus

CeeZh (Owner) commented May 15, 2024 via email

maximotus (Author) commented

Thank you very much for the information; that is very helpful!

May I ask why you do not provide GPT-4 results with your sum + qa pipeline (or, if you have tested it, could you share the results)?
I also wonder why you used GPT-4 for NextQA and IntentQA, but there without your proposed sum + qa pipeline, only with the plain QA prompt. Did you also test sum + qa on these datasets?

More insights would be much appreciated! :)

Cheers,
Maximotus

CeeZh (Owner) commented Jul 8, 2024

Hi Maximotus,

Sorry for the late reply.

I briefly tried sum + qa with GPT-4 on the EgoSchema subset and did not observe a large improvement from the pipeline. This is probably because: 1) hyper-parameters such as num_words need more tuning, and 2) GPT-4 itself is strong enough that the extra prompting no longer brings improvements. Considering that GPT-4 is much more expensive than GPT-3.5 (it would cost a lot of money to make sum + qa work), we decided to focus on the GPT-3.5 experiments.

For NextQA, I tried sum + qa but did not get an improvement, so I just used the standard prompt. In our latest submission, we reported both GPT-3.5 and GPT-4 results on NextQA.
[Screenshot: NextQA results with GPT-3.5 and GPT-4]
Hope this information helps you.
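
For readers following the thread, here is a minimal hypothetical sketch of the two settings being compared above: direct QA versus the sum + qa pipeline, which first condenses the captions to roughly num_words words and then answers from the summary. The prompt wording, function names, and the num_words default are illustrative assumptions, not the actual LLoVi prompts.

```python
# Illustrative only: prompt wording and function names are assumptions,
# not taken from the LLoVi repository.
from typing import Callable, List, Tuple


def direct_qa_prompt(captions: str, question: str, options: List[str]) -> str:
    """Standard (direct) QA: the LLM sees the raw captions and answers in one call."""
    opts = "\n".join(f"{i}. {o}" for i, o in enumerate(options))
    return (
        f"Here are descriptions of a video:\n{captions}\n\n"
        f"Question: {question}\nOptions:\n{opts}\n"
        "Answer with the number of the correct option."
    )


def sum_then_qa_prompts(
    captions: str,
    question: str,
    options: List[str],
    num_words: int = 500,  # placeholder default; the tuned value is dataset-specific
) -> Tuple[str, Callable[[str], str]]:
    """sum + qa: call 1 condenses the captions to ~num_words words,
    call 2 answers the question from that summary."""
    summarize = (
        f"Summarize the following video descriptions in about {num_words} words:\n"
        f"{captions}"
    )

    def qa_from_summary(summary: str) -> str:
        return direct_qa_prompt(summary, question, options)

    return summarize, qa_from_summary
```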
