A question about the paper's ActivityNet results when using only the BCE loss #1

Closed
starmemda opened this issue Dec 17, 2021 · 3 comments

Comments

@starmemda

Hello,
Your work proposes a nice paradigm for organizing contrastive learning for video grounding, and the results on all the datasets are impressive.
I have a small question about the ablation study in your paper.
It seems that, using only the BCE loss, your model already reaches (R@1, IoU=0.5) = 46.75 on ActivityNet, which is about two points higher than the original 2D-TAN. Is it fair to attribute these two points to BERT?

@zhenzhiwang
Collaborator

Hi,

We reproduced 2D-TAN + DistilBERT on Charades-STA, and its performance is almost identical to the original 2D-TAN; we do not have results on ActivityNet at the moment. Two variables could affect the final performance: DistilBERT and the late fusion (metric learning).

Although our Recall@1 at IoU=0.3/0.5 differs noticeably from 2D-TAN, the other four metrics are almost the same as the original 2D-TAN, and Recall@5 is even slightly lower. Note that, for the same network, R@1 and R@5 usually do not move in the same direction (when changing the NMS parameters, improving one often degrades the other), whereas a genuinely stronger network should improve both. I therefore tend to believe that DistilBERT may have some influence, but the difference is more likely brought by the late fusion, which would explain why some metrics stay the same while others change.

You could try reproducing 2D-TAN + DistilBERT on ActivityNet; the modification based on the transformers library is fairly simple. Please try to open issues in English so that everyone can understand them, thanks. If you have other questions, feel free to leave further comments in this issue.
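For reference, here is a minimal sketch of what such a swap could look like with the HuggingFace transformers library: a DistilBERT-based query encoder that replaces an LSTM text encoder. The module and variable names (`DistilBertQueryEncoder`, `hidden_dim`, the mean-pooling step) are illustrative assumptions, not taken from the 2D-TAN codebase.

```python
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class DistilBertQueryEncoder(nn.Module):
    """Hypothetical drop-in query encoder using DistilBERT instead of an LSTM."""

    def __init__(self, hidden_dim=512, pretrained="distilbert-base-uncased"):
        super().__init__()
        self.tokenizer = DistilBertTokenizerFast.from_pretrained(pretrained)
        self.bert = DistilBertModel.from_pretrained(pretrained)
        # Project DistilBERT's 768-d features to whatever dimension the
        # downstream 2D temporal map / fusion module expects.
        self.proj = nn.Linear(self.bert.config.dim, hidden_dim)

    def forward(self, queries):
        # queries: list of raw sentence strings
        tokens = self.tokenizer(
            queries, padding=True, truncation=True, return_tensors="pt"
        )
        device = next(self.bert.parameters()).device
        tokens = {k: v.to(device) for k, v in tokens.items()}
        out = self.bert(**tokens).last_hidden_state          # (B, L, 768)
        # Mask-aware mean pooling as a sentence-level feature; the original
        # LSTM encoder's pooling strategy may differ.
        mask = tokens["attention_mask"].unsqueeze(-1).float()
        pooled = (out * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
        return self.proj(pooled)                              # (B, hidden_dim)


# Usage:
# encoder = DistilBertQueryEncoder(hidden_dim=512)
# feats = encoder(["a person opens the door", "someone is cooking"])
```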

@starmemda
Author

Thanks for your answer. I am not asking this to challenge you; it is just that we performed the BERT replacement ourselves, and it turned out to improve results on ActivityNet while decreasing them on Charades.
I have no more questions. Thanks again!

@zhenzhiwang
Collaborator

In fact, I have some thoughts about your results: ActivityNet is a very large dataset with a rich vocabulary, while Charades has a much smaller vocabulary. My guess is that BERT-family models tend to improve performance when the vocabulary is large, whereas an LSTM is good enough for a small vocabulary. For a comparison between the datasets, you can refer to Table 1 of this paper (https://openaccess.thecvf.com/content/ICCV2021W/CVEU/papers/Soldan_VLG-Net_Video-Language_Graph_Matching_Network_for_Video_Grounding_ICCVW_2021_paper.pdf).

I will close this issue. Please raise another issue if you have other independent questions.
