Questions about decoder input and positional encoding #7

Closed
TimHo0331 opened this issue Jun 18, 2021 · 7 comments

@TimHo0331

Hi,

  1. On page 4, it is said that 'The decoder inputs are the trajectory proposals, which are initialized by a set of learnable positional encoding'. And on page 9, it is said that 'The decoder receives proposals (randomly initialized), positional encoding of proposals, as well as encoder memory...'
     So, what is the input of the first decoder layer? Is it the randomly initialized proposals plus a learnable positional encoding? And what is the initialization distribution?
  2. On page 9, it is said that 'In encoder, spatial positional encoding are added to the queries and keys at each MHSA layer'.
     Is the positional encoding in the encoder fixed or learnable? And is it used in all three of the motion extractor, map aggregator, and social constructor, or only in one of them?

Thank you.
@sparshgarg23

sparshgarg23 commented Jun 23, 2021

Don't know if this helps, but from what I could figure out: the decoder's input is a set of predefined anchors/waypoints/trajectories, randomly initialized (or learned from heuristics), along with the positional encoding of the proposals. These two, together with the set of trajectory histories, serve as the input for the first transformer, which outputs a list of candidate proposals.

Now let's talk about point 2. In the map stage we have 3 inputs:

  1. The context from the map
  2. The motion extractor's output
  3. The positional encoding (this can be skipped, and it is only used in the map aggregator and motion extractor). And yes, it's learnable.

Of course, it would be better if the authors of the paper or anyone else could correct me. See the sketch below for how I picture the map stage.
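
If it helps, here is a rough PyTorch sketch of how I picture the map stage (purely my own guess, with made-up names and sizes, not the authors' code):

```python
import torch
import torch.nn as nn

class MapAggregator(nn.Module):
    """My guess at the map stage: proposal features coming out of the
    motion extractor cross-attend to the encoded map context."""

    def __init__(self, d_model=128, nhead=8, num_layers=3):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, proposal_feats, map_memory, pos=None):
        # proposal_feats: (batch, K, d_model), the motion extractor's output
        # map_memory:     (batch, M, d_model), encoded map context
        # pos: optional (learnable) positional encoding; can be skipped
        if pos is not None:
            proposal_feats = proposal_feats + pos
        return self.decoder(proposal_feats, map_memory)
```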

@jzhang538
Collaborator

Sorry for the late reply.

For the first question, I should clarify that the trajectory proposals are learnable parameters initialized with torch.nn.Parameter(). They keep updating during the training process and become more and more meaningful. The input of the first decoder layer (the motion extractor) is only the initialized trajectory proposals, together with the encoder memory of the trajectory history.

For the second question, similar to DETR, the initialized proposals (created via nn.Parameter) are also used as the positional encoding for each decoder layer. This is done in all three transformer modules with no distinction.
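
For concreteness, here is a minimal PyTorch sketch of that setup (illustrative only, not the released code; all module names and sizes are made up). Note that the stock nn.TransformerDecoder has no separate slot for a query positional encoding, so this sketch simply feeds the learnable proposals in as the decoder input; DETR's reference implementation instead adds the learnable embedding to the queries and keys inside each attention layer.

```python
import torch
import torch.nn as nn

class ProposalDecoder(nn.Module):
    """Sketch: K learnable trajectory proposals fed to a transformer
    decoder, DETR-style. Illustrative, not the authors' code."""

    def __init__(self, num_proposals=6, d_model=128, nhead=8, num_layers=3):
        super().__init__()
        # Registered as a parameter, so the optimizer updates the proposals
        # every step and they become more meaningful over training.
        self.proposals = nn.Parameter(torch.randn(num_proposals, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, history_memory):
        # history_memory: (batch, seq_len, d_model), the encoder memory
        # of the trajectory history.
        batch = history_memory.size(0)
        # The same learnable tensor serves as the initial decoder input
        # (and, per the reply above, as the proposals' positional encoding).
        queries = self.proposals.unsqueeze(0).expand(batch, -1, -1)
        return self.decoder(queries, history_memory)  # (batch, K, d_model)
```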

@jzhang538
Collaborator

Thanks so much for the kind reply! For point 2, we also add a positional encoding to each trajectory proposal, which is its initialized representation.

@sparshgarg23

sparshgarg23 commented Jul 7, 2021

Thanks for the reply. Any idea when the code will be made public?
Maybe you could release some data visualization code first and then slowly make the rest public.

@TimHo0331
Author

Thanks for your kind reply.
For the second question, I think you misunderstood my meaning. I wanted to ask: in the encoder of all three transformer modules, do you use positional encoding? Is it fixed or learnable?

@jzhang538
Collaborator

According to our experiments, the positional encoding in the encoder part doesn't affect the final results much. You can ignore it for simplicity and the result will be almost the same.
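
For anyone who wants to see this in code, here is a minimal sketch (not the actual implementation; names and sizes are made up) of one encoder MHSA layer with an optional learnable spatial positional encoding added to the queries and keys only, as in DETR. Adding it to q and k but not to v biases where attention looks without changing the content that gets mixed in.

```python
import torch
import torch.nn as nn

class EncoderSelfAttention(nn.Module):
    """Sketch of one encoder MHSA layer. With use_pos=False the positional
    encoding is skipped, which per the reply above barely changes results."""

    def __init__(self, seq_len, d_model=128, nhead=8, use_pos=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.pos = nn.Parameter(torch.randn(seq_len, d_model)) if use_pos else None

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        # The encoding is added to the queries and keys but not the values.
        qk = x if self.pos is None else x + self.pos
        out, _ = self.attn(qk, qk, x)
        return out
```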

@jzhang538
Collaborator

Thanks so much for the suggestion. We will try to make the visualization code public soon.
