Why use the expected output in decoder training? #76

Closed
xiongma opened this issue Feb 25, 2019 · 10 comments

Comments

@xiongma

xiongma commented Feb 25, 2019

@Kyubyong I have a question: why is decoder_input used in decoder training? I think it will influence the model's output.

@xiongma

xiongma commented Feb 26, 2019 via email

@ty5491003

In the paper section 3.1:

We also modify the self-attention sub-layer in the decoder stack to prevent positions from attending to subsequent positions. This masking, combined with the fact that the output embeddings are offset by one position, ensures that the predictions for position i can depend only on the known outputs at positions less than i.

A masking mechanism is used to prevent each position from seeing the later parts of decoder_input. As for why decoder_input is used at all, I think it is needed to calculate the loss value.
@policeme @moonlight1776
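The shifted decoder input and look-ahead mask described in the quoted passage can be sketched as follows (a toy illustration with assumed token ids, not code from this repo):

```python
import numpy as np

# Assumed special token ids for this toy example (not from the repo).
BOS, EOS = 1, 2

# The decoder input is the expected output shifted right by one position,
# with <s> prepended, so position i is predicted from tokens before i.
target = [5, 8, 3, EOS]              # expected output y
decoder_input = [BOS] + target[:-1]  # -> [1, 5, 8, 3]

# Look-ahead (causal) mask: True where attention is allowed, i.e. each
# position may attend only to itself and earlier positions.
T = len(decoder_input)
mask = np.tril(np.ones((T, T), dtype=bool))

print(decoder_input)       # [1, 5, 8, 3]
print(mask.astype(int))
```

With this mask, the model can be trained on all positions in parallel while each prediction still depends only on known outputs.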

@ywl0911

ywl0911 commented Mar 19, 2019

I am also confused about this question.

@xiongma

xiongma commented Mar 19, 2019

@ywl0911 if you figure it out, please contact me, thanks!

@ywl0911

ywl0911 commented Mar 19, 2019


@ty5491003
Hi, could you explain why the decoder uses decoder_input as its input? If that's the case, how can we run the test, since decoder_input is not available at test time?

@ty5491003

@ywl0911 You'll find the reason pretty funny when I tell you.
In the data-processing part of the test.py code:
test_batches, num_test_batches, num_test_samples = get_batch(hp.test1, hp.test1,
Look closely at the two arguments: the same hp.test1 is passed in twice.

@ywl0911

ywl0911 commented Mar 20, 2019


Hmm... so does that mean this part of the code is wrong? Shouldn't it be changed so that the decoder's output at the previous time step is fed in as the input at the next time step?

@ni1lloc

ni1lloc commented Mar 20, 2019


See Transformer.eval(), at model.py:152.
In the inference section, we feed ['<s>'] + y[0..t-1] into the model, and it returns y[0..t]; both have length t+1. This is repeated until every sentence in the batch has output '<pad>' (or the maximum length is reached).

The second input is unused for predictions. Just check the definitions of get_batch() at data_load.py:132 and input_fn() at data_load.py:92.
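That feed-the-prefix-back-in loop can be sketched like this (fake_model is a hypothetical stand-in for the real Transformer, and the token ids are assumed):

```python
# Toy sketch of the autoregressive loop described above. fake_model is a
# hypothetical stand-in for the real Transformer: given a decoder input of
# length t+1 it returns predictions y[0..t], also of length t+1.
BOS, PAD = 1, 0                          # assumed ids for '<s>' and '<pad>'
max_len = 10

def fake_model(decoder_input):
    canned = [7, 4, 9, PAD]              # pretend model predictions
    return canned[:len(decoder_input)]

ys = [BOS]                               # start from ['<s>']
preds = []
for _ in range(max_len):
    preds = fake_model(ys)               # returns y[0..t]
    ys = [BOS] + preds                   # next input: ['<s>'] + y[0..t]
    if preds[-1] == PAD:                 # stop once '<pad>' is produced
        break

print(preds)  # [7, 4, 9, 0]
```

The key point is that at test time the model only ever sees tokens it generated itself, so the second argument to get_batch() never influences the predictions.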

@trx14

trx14 commented Apr 12, 2019


But why isn't this autoregressive method used in training? Is it because it would be very slow?

@Pydataman

During training, the expected output is fed to the decoder to accelerate convergence; this technique is called teacher forcing.
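A minimal numpy sketch of what teacher forcing buys (all names and sizes here are assumed for illustration): the decoder consumes the shifted ground truth, so logits for every position come out of one parallel pass, and the loss compares them with the expected output.

```python
import numpy as np

# Teacher forcing: the decoder input is the ground truth shifted right, so
# predictions for all T positions are produced in a single parallel pass
# (instead of T sequential decoding steps) and compared with the target.
rng = np.random.default_rng(0)
vocab_size, T = 6, 4
target = np.array([5, 3, 4, 2])            # expected output y (toy ids)
logits = rng.normal(size=(T, vocab_size))  # one forward pass, all T steps

# Per-position softmax cross-entropy against the expected output:
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
loss = -np.log(probs[np.arange(T), target]).mean()
print(loss > 0.0)  # True
```

Besides speed, teacher forcing keeps early training stable: the model is never conditioned on its own (initially random) predictions.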

@xiongma xiongma closed this as completed Jun 26, 2019