Using trg[:,:-1] during training #136

Closed
wajihullahbaig opened this issue Sep 4, 2020 · 7 comments

Comments

@wajihullahbaig

wajihullahbaig commented Sep 4, 2020

Thank you for this awesome repo you have made public. I had one question: during the training loop, you perform the following step:
output, _ = model(src, trg[:,:-1])

I was wondering: why are we doing the trg[:,:-1] step?

Kind regards
Wajih

@bentrevett
Owner

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).

Thus, we input trg[:,:-1] and compare the predictions against trg[:,1:] to calculate our losses.

Let me know if this needs clarifying.
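A minimal sketch of that shift (not the repo's exact training code; the token ids, shapes, and the model/criterion calls here are made up for illustration), assuming trg is a batch-first LongTensor of token indices:

```python
import torch

# assume <sos>=2, <eos>=3, and A/B/C map to 10/11/12 (made-up ids)
trg = torch.tensor([[2, 10, 11, 12, 3]])  # [<sos>, A, B, C, <eos>], shape [1, 5]

decoder_input = trg[:, :-1]   # [<sos>, A, B, C]   -> fed into the decoder
decoder_target = trg[:, 1:]   # [A, B, C, <eos>]   -> what the decoder should predict

# output, _ = model(src, decoder_input)   # output: [batch, trg_len - 1, vocab_size]
# loss = criterion(output.reshape(-1, output.shape[-1]), decoder_target.reshape(-1))
```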

@wajihullahbaig
Author

> This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).
>
> Thus, we input trg[:,:-1] and compare the predictions against trg[:,1:] to calculate our losses.
>
> Let me know if this needs clarifying.

Oh, I understand now. Thanks indeed for the detailed reply.

Wajih

@fabio-deep

fabio-deep commented Nov 20, 2020

> This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).
>
> Thus, we input trg[:,:-1] and compare the predictions against trg[:,1:] to calculate our losses.
>
> Let me know if this needs clarifying.

Hi, how does this work when the trg sentence is padded? In that case I imagine the <eos> token would no longer be in the last position, right? Or am I missing something?

EDIT: never mind, I figured it out. In case anyone else is wondering: it works with padded inputs anyway because of ignore_index in the loss function.

@bentrevett
Owner

> This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).
> Thus, we input trg[:,:-1] and compare the predictions against trg[:,1:] to calculate our losses.
> Let me know if this needs clarifying.
>
> Hi, how does this work when the trg sentence is padded? In that case I imagine the <eos> token would no longer be in the last position, right? Or am I missing something?
>
> EDIT: never mind, I figured it out. In case anyone else is wondering: it works with padded inputs anyway because of ignore_index in the loss function.

Sorry for the late reply - it seems like you've figured it out now, but just in case someone else is reading this, I'll explain.

When we have padding, our trg sequence will be something like [<sos>, A, B, C, <eos>, <pad>, <pad>]. So the sequence input into the decoder is [<sos>, A, B, C, <eos>, <pad>] (trg[:,:-1]) and our decoder will be trying to predict the sequence [A, B, C, <eos>, <pad>, <pad>] (trg[:,1:]).

This means that yes, the <eos> token is input into the model even though it shouldn't be - because why should you predict something after the end of the sequence? - but there is no way to avoid this when padding sequences. However, because we set the ignore_index of our CrossEntropyLoss to be the index of the padding token, whenever the decoder's target token is a <pad> token we don't calculate losses over that token.

So in the above example, we only calculate the losses where the decoder's input is [<sos>, A, B, C], because the <eos> and <pad> input tokens both have a target token of <pad>. This means we calculate our losses (and thus update our parameters) as if the padding tokens didn't exist (sort of - we still waste some computation on them, but this is offset by the fact that we can use batches instead of feeding in examples one at a time or only making batches where every sequence is exactly the same length).
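A small standalone check of that ignore_index behaviour (the token ids and vocab size below are made up; only the <pad> index matters): positions whose target is <pad> contribute nothing to the loss, so the loss over the full padded sequence equals the loss computed over just the non-pad positions.

```python
import torch
import torch.nn as nn

PAD_IDX = 1
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# targets = trg[:,1:] for trg = [<sos>, A, B, C, <eos>, <pad>, <pad>]
# with assumed ids A=4, B=5, C=6, <eos>=3
targets = torch.tensor([4, 5, 6, 3, PAD_IDX, PAD_IDX])
logits = torch.randn(6, 8)  # 6 decoder outputs over a made-up vocab of size 8

loss_full = criterion(logits, targets)                         # <pad> targets ignored
loss_nonpad = nn.CrossEntropyLoss()(logits[:4], targets[:4])   # drop pad positions by hand
print(torch.allclose(loss_full, loss_nonpad))  # True
```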

@liuxiaoqun

liuxiaoqun commented Jun 16, 2021

> This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).
>
> Thus, we input trg[:,:-1] and compare the predictions against trg[:,1:] to calculate our losses.
>
> Let me know if this needs clarifying.

I have a question. The sentences are padded after <eos>, so they look like:

sos y1 y2 eos pad pad pad
sos y1 y2 y3 y4 y5 eos
sos y1 y2 y3 y4 eos pad

The size of trg is [3, 7].
If we take trg[:,:-1], the sentences are cut like:

sos y1 y2 eos pad pad
sos y1 y2 y3 y4 y5
sos y1 y2 y3 y4 eos

so it does not cut off all of the eos tokens.

I checked torchtext: the sentences are concatenated as
sos sentence eos pad
so trg[:,:-1] will not cut off all of the eos tokens.

If the sentences were instead concatenated as
sos sentence pad eos
then it would cut off all of the eos tokens.
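A quick illustration of that point (made-up indices: <sos>=2, <eos>=3, <pad>=1, y1..y5 = 10..14): trg[:,:-1] only drops the last column, so rows that end in padding keep their <eos> in the decoder input. As explained above, this is fine because the targets at those positions are <pad> and are skipped by ignore_index.

```python
import torch

trg = torch.tensor([
    [2, 10, 11, 3,  1,  1,  1],   # <sos> y1 y2 <eos> <pad> <pad> <pad>
    [2, 10, 11, 12, 13, 14, 3],   # <sos> y1 y2 y3 y4 y5 <eos>
    [2, 10, 11, 12, 13, 3,  1],   # <sos> y1 y2 y3 y4 <eos> <pad>
])  # shape [3, 7]

decoder_input = trg[:, :-1]   # rows 1 and 3 still contain an <eos> as input
decoder_target = trg[:, 1:]   # but wherever a row's input token is <eos> or <pad>,
                              # the corresponding target is <pad>, so that position
                              # is masked out of the loss
```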

@ProxJ

ProxJ commented Feb 27, 2022

For anyone who finds this in the future: output, _ = model(src, trg[:,:-1]) seems to no longer be there; instead, the decoder loop in the Seq2Seq class only feeds trg[0] through trg[trg_len - 2] as inputs. It's currently written as for t in range(1, trg_len):, where the input at the start of each iteration is always the token at t - 1 (it is updated at the end of the loop).
Took me a minute to figure out where the [:,:-1] went.

#182 # more in-depth explanation of trg[:,:-1] and how it interacts with padding.
#43 (comment) # impact of <sos> and <eos> tokens on src -> the model learns to ignore them.
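A hedged sketch of the loop shape described above (not necessarily the repo's exact code; decoder, hidden, and outputs are placeholders for an RNN decoder, its state, and a preallocated output tensor, and the teacher-forcing-ratio branch is omitted): decoding starts at t = 1 and the decoder is always fed the token at position t - 1, so the final token of trg is never used as an input, which has the same effect as slicing with [:,:-1].

```python
import torch

def decode_with_teacher_forcing(decoder, hidden, trg, outputs):
    # trg: [trg_len, batch_size] of token indices, trg[0] being the <sos> row
    trg_len = trg.shape[0]
    input_token = trg[0]                 # first decoder input is <sos>
    for t in range(1, trg_len):
        prediction, hidden = decoder(input_token, hidden)
        outputs[t] = prediction          # prediction for position t
        input_token = trg[t]             # next iteration feeds trg[t], i.e. index (t+1) - 1
    return outputs
```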

@wajihullahbaig
Author

> For anyone who finds this in the future: output, _ = model(src, trg[:,:-1]) seems to no longer be there; instead, the decoder loop in the Seq2Seq class only feeds trg[0] through trg[trg_len - 2] as inputs. It's currently written as for t in range(1, trg_len):, where the input at the start of each iteration is always the token at t - 1 (it is updated at the end of the loop). Took me a minute to figure out where the [:,:-1] went.
>
> #182 # more in-depth explanation of trg[:,:-1] and how it interacts with padding. #43 (comment) # impact of <sos> and <eos> tokens on src -> the model learns to ignore them.

You are correct. It seems to have been updated now.
