Questions about decoder input and positional encoding #7
Don't know if this helps, but from what I could figure out: the decoder's input is a set of predefined anchors/waypoints/trajectory proposals, either randomly initialized or learned from heuristics, along with the positional encoding of those proposals. These, together with the set of trajectory histories, serve as the input to the first transformer, which outputs a list of candidate proposals.
Sorry for the late reply. For the first question, I should clarify that the trajectory proposals are learnable parameters initialized by torch.nn.Parameter(). They keep updating during training and become more and more meaningful. The input of the first decoder layer (motion extractor) is only the initialized trajectory proposals together with the encoded memory of the trajectory history. For the second question: similar to DETR, the initialized proposals (created via nn.Parameter) are also used as the positional encoding for each decoder layer. They are used in all three transformer modules with no distinction.
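To make the reply concrete, here is a minimal sketch of what "learnable proposals initialized by nn.Parameter, fed as queries to the first decoder layer over the history memory" could look like. All class and argument names (ProposalDecoder, num_proposals, etc.) are illustrative, not taken from the released code:

```python
import torch
import torch.nn as nn

class ProposalDecoder(nn.Module):
    """Sketch: K learnable trajectory proposals act as decoder queries;
    the decoder cross-attends to the encoded trajectory-history memory."""
    def __init__(self, num_proposals=6, d_model=128, nhead=8):
        super().__init__()
        # Learnable proposals: updated by backprop like any other weight,
        # so they become "more and more meaningful" during training.
        self.proposals = nn.Parameter(torch.randn(num_proposals, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)

    def forward(self, history_memory):  # history_memory: (B, T, d_model)
        B = history_memory.size(0)
        # Broadcast the shared proposals across the batch as queries.
        queries = self.proposals.unsqueeze(0).expand(B, -1, -1)
        return self.decoder(queries, history_memory)  # (B, K, d_model)

dec = ProposalDecoder()
out = dec(torch.randn(2, 10, 128))  # 2 agents, 10 history steps
```

In the DETR-style variant the reply describes, the same initialized proposal tensor would additionally be re-added as a positional encoding at each decoder layer.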
Thanks so much for the kind reply! For point 2, we also add a positional encoding to each trajectory proposal, which is its initialized representation.
Thanks for the reply. Any idea when the code will be made public?
Thanks for your kind reply.
According to our experiments, the positional encoding in the encoder does not affect the final results much. You can omit it for simplicity and the results will be almost the same.
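For reference, the standard fixed sinusoidal positional encoding (Vaswani et al.) is one common choice for the encoder side; since the reply says the encoder PE barely matters here, either this fixed form, a learnable embedding, or none at all should give similar results. This is a generic sketch, not the repository's implementation:

```python
import math
import torch

def sinusoidal_pe(seq_len, d_model):
    """Fixed sinusoidal positional encoding of shape (seq_len, d_model)."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    # Geometric progression of frequencies across the embedding dimensions.
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dims: sine
    pe[:, 1::2] = torch.cos(pos * div)  # odd dims: cosine
    return pe

pe = sinusoidal_pe(seq_len=10, d_model=16)  # added to history embeddings
```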
Thanks so much for the suggestion. We will try to make the visualization code public soon. |
Hi,
So, what is the input of the first decoder layer? Is it randomly initialized proposals plus a learnable positional encoding? And what is the initialization distribution?
Is the positional encoding in the encoder fixed or learnable? Is this positional encoding used in all of the motion extractor, map aggregator, and social constructor, or only one of them?
Thank you.