
On the roles of the experts #18

Closed
spidercatfly opened this issue Mar 23, 2024 · 2 comments

Comments

@spidercatfly

Hello, and thank you for this very inspiring work.

Could you give a rough description of what each expert is responsible for? It seems the experts do not each target a different modality; rather, they capture different aspects of the image modality, which would explain why modalities close to image, such as video, are more sensitive to the number of experts.

Also, is this behavior related to the frozen image encoder used in the encoding stage, which constrains what the other modalities can learn? Put differently, is this effectively a soft alignment that aligns the other modalities to the image modality?

@csuhan
Owner

csuhan commented Mar 24, 2024

Thank you for your interest in our work!

Your understanding is reasonable. We first train the Image-to-LLM projection module, then gradually merge the X-to-LLM projections for the other modalities into the same module. In essence, an Image-to-LLM module is finetuned so that it adapts to X-to-LLM alignment.

Here the frozen image encoder serves as a general high-level semantic feature extractor, which does limit what the other modalities can learn to some extent. Roughly speaking, this can be viewed as aligning the other modalities to the image modality. However, since the projection module is trained jointly on many kinds of data, its final state is likely a compromise rather than a pure alignment of the other modalities to images.
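The setup described above can be sketched in code. This is a minimal, hypothetical illustration (not the paper's actual implementation or API): a single expert-weighted projection module shared by all modalities, fed by a frozen encoder that stands in for the fixed high-level feature extractor. All names (`SharedProjection`, `W_enc`) are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SharedProjection:
    """One X-to-LLM projection module shared across all modalities."""
    def __init__(self, feat_dim, llm_dim, num_experts=3):
        # Each expert is a linear map from encoder features to LLM inputs.
        self.experts = [rng.standard_normal((feat_dim, llm_dim)) * 0.02
                        for _ in range(num_experts)]
        # The router mixes experts per input, so experts can specialize
        # in different aspects of the (image-centric) feature space.
        self.router = rng.standard_normal((feat_dim, num_experts)) * 0.02

    def __call__(self, x):
        w = softmax(x @ self.router)                             # (B, E)
        outs = np.stack([x @ e for e in self.experts], axis=1)   # (B, E, D)
        return (w[:, :, None] * outs).sum(axis=1)                # (B, D)

# Frozen image-encoder stand-in: fixed weights, never updated, acting
# as a generic semantic feature extractor for every modality.
W_enc = rng.standard_normal((32, 64)) * 0.02

proj = SharedProjection(feat_dim=64, llm_dim=128)

# Stage 1 would train `proj` on image features only; stage 2 finetunes
# the *same* object on mixed-modality features, so the final state is a
# compromise rather than a pure alignment of other modalities to images.
image_feats = rng.standard_normal((4, 32)) @ W_enc
tokens = proj(image_feats)
print(tokens.shape)  # (4, 128)
```

Because every modality passes through the same module, tuning it jointly on mixed data pulls the projection toward a shared middle ground, which matches the "compromise state" described above.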

@spidercatfly
Author

Got it, thanks for the explanation!
A very nice approach indeed.

This issue was closed.