Thanks for open-sourcing such good work. Why is mean pooling better than the other methods? In theory, LSTM and Transformer exploit the temporal relationship between frames, so their results could be better. Could you give some theoretical analysis or tips? Thanks a lot. By the way, can the code save the weights after each epoch?
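On the side question, a minimal sketch of saving weights after each epoch, assuming a standard PyTorch training loop (`model`, `optimizer`, and `train_one_epoch` below are hypothetical stand-ins, not names from this repo):

```python
import torch

# Hypothetical stand-ins for the real model and training step.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_one_epoch(model, optimizer):
    pass  # placeholder for the actual training step

num_epochs = 5
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)
    # Save model and optimizer state so training can resume from any epoch.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        f"checkpoint_epoch_{epoch}.pt",
    )
```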
Hi @code10086web, good question. To be honest, I have no theoretical analysis that explains the reason. Intuitively, LSTM and Transformer can model the sequential representation and should obtain better performance, and in some cases the results of LSTM and Transformer in the Sequential type are indeed comparable. However, mean pooling is stable. We guess that because mean pooling is parameter-free, the CLIP weights change consistently with gradient propagation, whereas LSTM and Transformer introduce randomly initialized weights that make the model harder to train (e.g., we use different learning rates for these new weights). And CLIP is learning-rate-sensitive when transferred to the retrieval task.
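To make this intuition concrete, here is a minimal PyTorch sketch under assumed shapes and illustrative learning rates (the linear backbone below is a stand-in for the pretrained CLIP visual encoder, not the repo's actual model): mean pooling adds no parameters, while a Transformer head adds randomly initialized weights that get their own, larger learning rate via optimizer parameter groups.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained CLIP visual encoder (hypothetical; the real
# weights would come from the CLIP checkpoint). Maps frames to 512-d features.
clip_backbone = nn.Linear(768, 512)

# Randomly initialized sequential head: the "new weights" discussed above.
seq_head = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)

# Parameter-free aggregation: no new weights, so gradients flow straight
# back into the CLIP backbone and its weights change consistently.
def mean_pool(frame_feats: torch.Tensor) -> torch.Tensor:
    # frame_feats: [batch, num_frames, dim] -> [batch, dim]
    return frame_feats.mean(dim=1)

# Different learning rates for pretrained vs. newly added parameters
# (the values here are illustrative, not the repo's exact settings).
optimizer = torch.optim.AdamW([
    {"params": clip_backbone.parameters(), "lr": 1e-7},  # small LR: CLIP is LR-sensitive
    {"params": seq_head.parameters(), "lr": 1e-4},       # larger LR: random init
])

frames = torch.randn(2, 12, 768)                    # [batch, num_frames, raw_dim]
frame_feats = clip_backbone(frames)                 # [batch, num_frames, 512]
video_emb_mean = mean_pool(frame_feats)             # parameter-free pooling
video_emb_seq = seq_head(frame_feats).mean(dim=1)   # sequential head, then pool
```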