Hi @nreimers @kwang2049,
First of all, thanks for sharing your great work on sentence transformers!
Regarding the TSDAE implementation, I understand that CLS pooling was used because it gave the best results, or at least nearly the same results as mean pooling, with the advantage of keeping position information. But I was wondering whether you have any theoretical insight to explain this empirical result, given that:
- Mean pooling was considered the better method in previous SBERT implementations (right?)
- I don't really see why position information is useful for this training
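
To make sure we're comparing the same things, here is a minimal NumPy sketch of the two pooling strategies as I understand them (this is my own illustration, not the actual sentence-transformers code):

```python
import numpy as np

def cls_pooling(token_embeddings, attention_mask):
    # Use the embedding of the first ([CLS]) token as the sentence vector.
    # The CLS vector is produced via self-attention over all positions,
    # so it can in principle retain word-order (position) information.
    return token_embeddings[:, 0]

def mean_pooling(token_embeddings, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    # Averaging is permutation-invariant over the (contextualized) tokens,
    # which is one intuition for why it may discard some order information.
    mask = attention_mask[:, :, None].astype(float)
    return (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

# Toy example: batch of 2 sequences, 4 token positions, hidden size 3
emb = np.random.randn(2, 4, 3)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
print(cls_pooling(emb, mask).shape)   # (2, 3)
print(mean_pooling(emb, mask).shape)  # (2, 3)
```
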
Thanks in advance!
Thomas