Thanks for open-sourcing such good work. Why is mean pooling better than the other methods? In theory, LSTM and Transformer exploit the temporal relationship between frames, so their results could be better. Could you give some theoretical analysis or tips? Thanks a lot. By the way, can the code save the weights after each epoch?
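On the side question, a minimal sketch of saving weights after each epoch, assuming a standard PyTorch training loop (`model`, `optimizer`, and `train_one_epoch` below are hypothetical stand-ins, not names from this repo):

```python
import torch

# Hypothetical stand-ins for the real model and training step.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_one_epoch(model, optimizer):
    pass  # placeholder for the actual training step

num_epochs = 5
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)
    # Save model and optimizer state so training can resume from any epoch.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        f"checkpoint_epoch_{epoch}.pt",
    )
```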
Hi @code10086web, good question. To be honest, I have no theoretical analysis that explains the reason. Intuitively, LSTM and Transformer can model the sequential representation and should obtain better performance, and in some cases the results of LSTM and Transformer in the Sequential type are indeed comparable. However, mean pooling is stable. We guess that because mean pooling is parameter-free, the CLIP weights change consistently with gradient propagation, whereas LSTM and Transformer introduce randomly initialized weights that make the model harder to train (e.g., we use different learning rates for these new weights). And CLIP is learning-rate-sensitive when transferred to the retrieval task.
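To make this intuition concrete, here is a minimal PyTorch sketch under assumed shapes and illustrative learning rates (the linear backbone below is a stand-in for the pretrained CLIP visual encoder, not the repo's actual model): mean pooling adds no parameters, while a Transformer head adds randomly initialized weights that get their own, larger learning rate via optimizer parameter groups.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained CLIP visual encoder (hypothetical; the real
# weights would come from the CLIP checkpoint). Maps frames to 512-d features.
clip_backbone = nn.Linear(768, 512)

# Randomly initialized sequential head: the "new weights" discussed above.
seq_head = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)

# Parameter-free aggregation: no new weights, so gradients flow straight
# back into the CLIP backbone and its weights change consistently.
def mean_pool(frame_feats: torch.Tensor) -> torch.Tensor:
    # frame_feats: [batch, num_frames, dim] -> [batch, dim]
    return frame_feats.mean(dim=1)

# Different learning rates for pretrained vs. newly added parameters
# (the values here are illustrative, not the repo's exact settings).
optimizer = torch.optim.AdamW([
    {"params": clip_backbone.parameters(), "lr": 1e-7},  # small LR: CLIP is LR-sensitive
    {"params": seq_head.parameters(), "lr": 1e-4},       # larger LR: random init
])

frames = torch.randn(2, 12, 768)                    # [batch, num_frames, raw_dim]
frame_feats = clip_backbone(frames)                 # [batch, num_frames, 512]
video_emb_mean = mean_pool(frame_feats)             # parameter-free pooling
video_emb_seq = seq_head(frame_feats).mean(dim=1)   # sequential head, then pool
```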