Excellent work! I have a question about the Spatial Routing Module (SRM). #1

Closed
Threegood-student opened this issue Jun 1, 2024 · 1 comment

Comments

@Threegood-student

Dear author, in the Spatial Routing Module (SRM), num_experts=4 and use_experts=2. Could you explain how you decided on these values for num_experts and use_experts?

@Yaziwel
Owner

Yaziwel commented Jun 1, 2024

Thank you for your appreciation of our work!

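The use_experts/num_experts setting in SRM mainly takes two factors into account: (1) the number of tasks, and (2) the possible interference and synergy between tasks.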

(1) There are currently 3 tasks, so num_experts ≥ 3;

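(2) Interference between tasks is handled mainly through Top-K gating, which selects the expert networks that process each token. Current MoE work commonly uses Top-1 gating (use_experts=1) or Top-2 gating (use_experts=2). We chose use_experts=2 because it accounts for both interference and synergy between tasks at a finer granularity: tokens from images of two different tasks can select different experts to handle their differences, and can also select the same expert to handle what they may have in common. With 3 tasks and use_experts=2, num_experts=3 is not suitable either: if each of the 3 experts were responsible for one task, use_experts=2 would force a task to use another task's expert, which would still cause interference. So num_experts=4 is the most basic setting.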
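To make the reasoning above concrete, here is a minimal pure-Python sketch. It is an illustration only, not the repository's actual SRM code. The names `top_k_gate` and `has_clean_assignment` are hypothetical, and so is the "clean assignment" criterion (each task has one expert of its own, and any other expert it uses is shared by all tasks), which is just one assumed way to formalize the intuition about avoiding interference.

```python
import math
from itertools import combinations, product


def top_k_gate(logits, k):
    """Keep the k largest router logits and softmax over those only.

    Returns {expert_index: weight}; unselected experts receive no weight.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def has_clean_assignment(num_tasks, num_experts, use_experts):
    """Brute-force check of an ASSUMED criterion (not taken from the paper):
    every task uses exactly `use_experts` experts, owns at least one expert
    that no other task uses, and any other expert it uses is shared by all
    tasks, so no task leans on an expert dedicated to a different task.
    """
    subsets = list(combinations(range(num_experts), use_experts))
    for choice in product(subsets, repeat=num_tasks):
        # Map each expert to the set of tasks that selected it.
        users = {}
        for task, experts in enumerate(choice):
            for e in experts:
                users.setdefault(e, set()).add(task)
        if all(
            any(users[e] == {task} for e in experts)
            and all(len(users[e]) in (1, num_tasks) for e in experts)
            for task, experts in enumerate(choice)
        ):
            return True
    return False


# Two tokens from different tasks under Top-2 gating over 4 experts
# (the logits are made-up numbers for illustration).
token_a = top_k_gate([2.0, 0.1, -1.0, 1.5], k=2)  # selects experts 0 and 3
token_b = top_k_gate([0.0, 2.2, -0.5, 1.8], k=2)  # selects experts 1 and 3
shared = set(token_a) & set(token_b)              # expert 3 is common to both

print(shared)                         # {3}
print(has_clean_assignment(3, 3, 2))  # False
print(has_clean_assignment(3, 4, 2))  # True
```

In this toy setup the two tokens each keep a different expert (0 and 1) and also share expert 3, which mirrors the "handle differences and commonalities" argument. Under the assumed criterion, 3 experts cannot serve 3 tasks with use_experts=2, while 4 experts can.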

Of course, all of the above is based on intuition and theoretical analysis. We will study the expert settings in more depth in future work.

@Yaziwel Yaziwel closed this as completed Jun 20, 2024