Hello! In the Spatial Routing Module (SRM), num_experts=4 and use_experts=2. Could you explain how num_experts and use_experts were chosen, and how one should decide what values to set?
Thank you for your appreciation of our work!
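The use_experts/num_experts settings in SRM are chosen mainly based on two factors: (1) the number of tasks, and (2) the interference and synergy that may exist between tasks.
(1) There are currently 3 tasks, so num_experts ≥ 3.
(2) Interference between tasks is handled mainly through Top-K gating, which selects the expert networks that process each token. Current MoE work commonly uses Top-1 gating (use_experts=1) or Top-2 gating (use_experts=2). We chose use_experts=2 because it accounts for both interference and synergy between tasks at a finer granularity: tokens from images of two different tasks can select different experts to handle their differences, and can also select the same expert to handle what they potentially have in common. With 3 tasks and use_experts=2, num_experts=3 is also unsuitable: if each of the 3 experts is responsible for one task, use_experts=2 forces a task to use another task's expert, which still causes interference. So num_experts=4 is the most basic setting.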
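To make the Top-K selection described in (2) concrete, here is a minimal sketch in plain Python. It is not the repository's SRM code. The function name `top_k_gating`, the router scores, and the choice to renormalize with a softmax over only the selected experts are all assumptions made for illustration.

```python
import math

def top_k_gating(logits, use_experts):
    """Pick the top-K experts for one token (illustrative, not the SRM code).

    logits: one router score per expert, so len(logits) == num_experts.
    use_experts: number of experts the token is routed to (K).
    Returns (expert_indices, weights) with weights summing to 1.
    """
    if not 1 <= use_experts <= len(logits):
        raise ValueError("use_experts must be in [1, num_experts]")
    # Indices of the K largest router scores, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:use_experts]
    # Softmax over the selected experts only (assumed normalization);
    # subtracting the max keeps exp() numerically stable.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    return top, weights

# Hypothetical router scores for two tokens from different tasks, num_experts=4.
token_a = [2.0, 0.1, 1.5, -1.0]
token_b = [-0.5, 1.8, 1.6, 0.0]

experts_a, weights_a = top_k_gating(token_a, use_experts=2)
experts_b, weights_b = top_k_gating(token_b, use_experts=2)

# Experts chosen by both tokens: the "shared" capacity for common structure.
shared = set(experts_a) & set(experts_b)
```

With these made-up scores, token A is routed to experts 0 and 2 and token B to experts 1 and 2. Each token keeps one expert of its own and both use expert 2, which is the mix of separation and sharing that use_experts=2 is meant to allow.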
Of course, all of the above is based on intuition and theoretical analysis. In future work we will study the expert settings in more depth.
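The counting part of that intuition can be checked by brute force. The sketch below is one possible formalization, not something taken from the paper or the code: it assumes each task uses a fixed set of use_experts experts and asks whether every task can keep at least one expert that no other task touches. The function name `has_private_expert_assignment` is hypothetical.

```python
from itertools import combinations, product

def has_private_expert_assignment(num_tasks, num_experts, use_experts):
    """Return True if each task can use `use_experts` experts while keeping
    at least one expert that no other task uses (assumed formalization)."""
    subsets = [frozenset(c) for c in combinations(range(num_experts), use_experts)]
    # Try every way of giving each task one subset of experts.
    for assignment in product(subsets, repeat=num_tasks):
        every_task_has_private = True
        for t, own in enumerate(assignment):
            # Experts used by all the other tasks.
            others = set().union(*(s for u, s in enumerate(assignment) if u != t))
            if not (own - others):
                every_task_has_private = False
                break
        if every_task_has_private:
            return True
    return False

three_experts = has_private_expert_assignment(num_tasks=3, num_experts=3, use_experts=2)
four_experts = has_private_expert_assignment(num_tasks=3, num_experts=4, use_experts=2)
```

Under this formalization, `three_experts` is `False` and `four_experts` is `True`. With 3 experts, a task's second expert must be another task's dedicated expert. With 4 experts, each task can pair its own expert with one shared expert.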