
I don't understand the decoder's input #1

Closed
iceabing opened this issue Jan 29, 2024 · 2 comments

Comments

@iceabing

Hi, what exactly is the input to your decoder? The paper says its input is "the first 𝑇 − 1 time steps of the input sequence of the encoder", so what is the difference between the encoder's and decoder's inputs? Why use that as the input? Don't standard transformer models use the encoder's output as the decoder's input?

@defineZYP
Owner

Hi, the decoder here is just the original Transformer decoder. The decoder does not only take the encoder's output as input (that happens in the middle, at the second attention module); it also needs the sequence from the previous time steps as its initial input.

As an example, suppose the current sequence has n steps, [t1, t2, ..., tn]. The decoder was originally designed to predict the next step; since we want the decoder to reconstruct the current n-step sequence, we feed it the first n-1 steps. In general, a decoder has two main modules: self-attention and encoder-decoder attention. The former receives the first n-1 steps of the sequence; the latter performs attention over the former's output and the encoder's output.

The paper's wording may be a bit convoluted, but it simply means shifting the current sequence right and then padding it. You may find machine-translation examples more intuitive, e.g. https://zhuanlan.zhihu.com/p/338817680.
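The "shift right then pad" construction described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's actual code; the function name `make_decoder_input` and the pad value are my own choices:

```python
def make_decoder_input(sequence, pad_value=0.0):
    """Build the decoder input by shifting the sequence right by one step.

    The decoder reconstructs the current n-step sequence, so at step t it
    should only see steps before t. Dropping the last step and prepending
    a padding value gives exactly "the first n-1 steps, shifted right".
    """
    # [t1, t2, ..., tn] -> [pad, t1, t2, ..., t(n-1)]
    return [pad_value] + sequence[:-1]


encoder_input = [1.0, 2.0, 3.0, 4.0]        # the full n-step sequence
decoder_input = make_decoder_input(encoder_input)
print(decoder_input)                        # [0.0, 1.0, 2.0, 3.0]
```

The decoder's self-attention consumes `decoder_input`, while the encoder-decoder attention then combines that result with the encoder's output, as described in the reply above.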

@iceabing
Author

Great, thanks for the explanation!
