
Why is mean pooling better? #42

Closed
code10086web opened this issue Nov 29, 2021 · 1 comment


@code10086web

Thanks for open-sourcing such good work. Why is mean pooling better than the other methods? In theory, LSTM and Transformer exploit the relationships along the time dimension, so their results should be better. Could you give some theoretical analysis or tips? Thanks a lot. By the way, can the code save the weights after each epoch?

@ArrowLuo
Owner

Hi @code10086web, good question. To be honest, I have no theoretical analysis to explain the reason. Intuitively, LSTM and Transformer can model the sequential representation and should obtain better performance, and in some cases the LSTM and Transformer results of the Sequential type are indeed comparable. However, mean pooling is stable. Our guess is that mean pooling is parameter-free, so the CLIP weights are updated consistently by gradient propagation. LSTM and Transformer, on the other hand, introduce randomly initialized weights, which make the model hard to train (e.g., we need to use a different learning rate for these new weights), and CLIP is learning-rate-sensitive when transferred to the retrieval task.
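
For illustration, here is a minimal PyTorch sketch of the contrast described above. It is not the repository's actual code: `SeqTransfHead`, `clip_backbone`, and the learning-rate values are all assumptions. The point is that mean pooling adds no new parameters, while a sequential head introduces freshly initialized weights that then need their own (larger) learning rate.

```python
import torch
import torch.nn as nn

def mean_pool(frame_feats: torch.Tensor) -> torch.Tensor:
    """Parameter-free aggregation: (batch, frames, dim) -> (batch, dim).

    Gradients flow only into CLIP itself, so every update is driven
    by the pretrained weights.
    """
    return frame_feats.mean(dim=1)

class SeqTransfHead(nn.Module):
    """A sequential head that adds randomly initialized weights on top of CLIP."""
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        return self.encoder(frame_feats).mean(dim=1)

# Two learning rates, as mentioned above: a tiny lr for the pretrained CLIP
# weights and a larger one for the new head. The values are illustrative,
# and `clip_backbone` is a stand-in for the real CLIP model.
clip_backbone = nn.Linear(512, 512)
head = SeqTransfHead()
optimizer = torch.optim.AdamW([
    {"params": clip_backbone.parameters(), "lr": 1e-7},
    {"params": head.parameters(), "lr": 1e-4},
])
```

With the parameter-free head, the only trainable weights are CLIP's own, which is why its updates stay consistent with the pretraining.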

Uncomment Line #543 to save weights.
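
As for saving after each epoch, a per-epoch checkpoint loop looks roughly like the sketch below. This is the generic PyTorch pattern rather than the script's actual code; the model stand-in, `output_dir`, and the file naming are hypothetical.

```python
import os
import torch
import torch.nn as nn

model = nn.Linear(512, 512)   # stand-in for the actual retrieval model
num_epochs = 5
output_dir = "ckpts"          # hypothetical output directory
os.makedirs(output_dir, exist_ok=True)

for epoch in range(num_epochs):
    # ... run one training epoch here ...
    # Save the raw state dict so it can be reloaded with load_state_dict().
    ckpt_path = os.path.join(output_dir, f"pytorch_model.bin.{epoch}")
    torch.save(model.state_dict(), ckpt_path)
```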
