Near line 1129 of modeling.py:

#first_token_tensor = sequence_output[:, 0]
first_token_tensor, pool_index = torch.max(sequence_output, dim=1)

While debugging I saw that sequence_output is a tensor of shape [8, 46, 1034]. Why is max over dim=1 used to pool it? That mixes values from different tokens within the last (hidden) dimension. Isn't the usual approach to take the CLS token as the whole-sentence representation, as in the commented-out line?

What's more, I reproduced the results with the current code and they are excellent, which puzzles me even more: why does it work so well without using CLS? Thanks, author!
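To make the difference concrete, here is a minimal sketch (with a small toy tensor, not the actual [8, 46, 1034] one from the debugger) contrasting CLS pooling with max pooling over the token dimension. The `pool_index` output shows exactly the "mixing" the question describes: each hidden dimension of the pooled vector can come from a different token.

```python
import torch

# Toy stand-in for BERT's sequence_output: (batch, seq_len, hidden).
sequence_output = torch.randn(2, 5, 4)

# CLS pooling: the hidden state of the first token only.
cls_pooled = sequence_output[:, 0]                           # (2, 4)

# Max pooling over dim=1 (the token dimension): for each hidden
# dimension independently, keep the largest value across all tokens.
max_pooled, pool_index = torch.max(sequence_output, dim=1)   # both (2, 4)

# pool_index[b, h] is the token index that supplied dimension h of
# example b -- different hidden dimensions may come from different
# tokens, so the pooled vector is not any single token's state.
print(cls_pooled.shape, max_pooled.shape, pool_index.shape)
```

So max pooling does not corrupt anything; it is an element-wise reduction over tokens, a standard alternative to CLS pooling for building a fixed-size sentence vector.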
I have the same question. Posts online say that CLS mostly carries classification-oriented information and does not represent the whole sentence well, but BertForSequenceClassification is a classification model, so wouldn't using CLS be an even better fit?