Near line 1129 of modeling.py:

#first_token_tensor = sequence_output[:, 0]
first_token_tensor, pool_index = torch.max(sequence_output, dim=1)

While debugging I saw that sequence_output is a tensor of shape [8, 46, 1034]. Why is max over dim=1 used to pool it? That mixes values from different tokens within the last (hidden) dimension. Isn't the usual approach to take the CLS token as the whole-sentence representation, as in the commented-out line?

What's more, I reproduced the results with the current code and they are excellent, which puzzles me even more: why does it work so well without using CLS? Thanks, author!
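To make the difference concrete, here is a minimal sketch (with a small toy tensor, not the actual [8, 46, 1034] one from the debugger) contrasting CLS pooling with max pooling over the token dimension. The `pool_index` output shows exactly the "mixing" the question describes: each hidden dimension of the pooled vector can come from a different token.

```python
import torch

# Toy stand-in for BERT's sequence_output: (batch, seq_len, hidden).
sequence_output = torch.randn(2, 5, 4)

# CLS pooling: the hidden state of the first token only.
cls_pooled = sequence_output[:, 0]                           # (2, 4)

# Max pooling over dim=1 (the token dimension): for each hidden
# dimension independently, keep the largest value across all tokens.
max_pooled, pool_index = torch.max(sequence_output, dim=1)   # both (2, 4)

# pool_index[b, h] is the token index that supplied dimension h of
# example b -- different hidden dimensions may come from different
# tokens, so the pooled vector is not any single token's state.
print(cls_pooled.shape, max_pooled.shape, pool_index.shape)
```

So max pooling does not corrupt anything; it is an element-wise reduction over tokens, a standard alternative to CLS pooling for building a fixed-size sentence vector.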
I have the same question. Posts online say that CLS mostly carries classification-oriented information and does not represent the whole sentence well, but BertForSequenceClassification is a classification model, so wouldn't using CLS be an even better fit?