
Question about the ablation #1

Open
Richar-Du opened this issue May 20, 2023 · 2 comments

@Richar-Du

Thanks for your awesome work! VisionLLM opens a path toward generalist vision-and-language models.

However, the single-task vs. multi-task comparison in the ablation study suggests that multi-task training hurts performance. What do you think causes this? Is the training data not large enough? OFA also introduces coordinate tokens and finds that multi-task learning can improve performance. Thanks in advance :)

@czczup (Member) commented Jun 1, 2023

Hi, thanks for this question, and apologies for the delayed response. Several factors could contribute to the performance degradation observed in multi-task training. First, we only used COCO data, which may not be enough. Second, multi-task training may require a longer training schedule to achieve comparable performance. Third, sharing parameters across tasks suffers from the task-interference issue.

As described in UniPerceiver-MoE:

Compared to specialized models with specific parameters for each task, generalist models with shared parameters would suffer from the task-interference issue — different tasks with shared parameters may conflict with each other [88]. The same issue is also observed in multilingual NLP models [4, 81, 83]. We argue that the task-interference issue is mainly caused by the inconsistent optimization in multi-task learning. As shown in Tab. 1, during the training phase of generalist models, the gradient directions of different tasks would be inconsistent or even opposite. Thus, if multiple tasks share parameters, the optimal update direction of the shared parameters will be uncertain, resulting in sub-optimal performance.
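
To make the gradient-conflict argument concrete, here is a minimal PyTorch sketch (not from either paper; the backbone, heads, and losses are hypothetical stand-ins) that measures the cosine similarity between two tasks' gradients on a shared backbone. A negative similarity corresponds to the "opposite gradient directions" described above, in which case no single update of the shared parameters serves both tasks well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shared backbone with two task-specific heads.
backbone = nn.Linear(16, 8)
head_a = nn.Linear(8, 4)    # stand-in for task A (e.g., detection)
head_b = nn.Linear(8, 10)   # stand-in for task B (e.g., captioning)

x = torch.randn(32, 16)
feat = backbone(x)

# Dummy per-task losses, just to produce gradients for illustration.
loss_a = head_a(feat).pow(2).mean()
loss_b = head_b(feat).pow(2).mean()

def shared_grad(loss):
    # Gradient of one task's loss w.r.t. the *shared* parameters only.
    grads = torch.autograd.grad(loss, backbone.parameters(), retain_graph=True)
    return torch.cat([g.flatten() for g in grads])

cos = F.cosine_similarity(shared_grad(loss_a), shared_grad(loss_b), dim=0)
print(f"gradient cosine similarity between tasks: {cos.item():.3f}")
# cos < 0 means the two tasks pull the shared parameters in
# conflicting directions -- the task-interference issue quoted above.
```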

@Richar-Du (Author)

OK, thanks for your reply :)
