[SYSTEMDS-3303] NN Builtin: Attention Layer #1625
Conversation
…nd try the attention layer.
modified attention_layer toy example
…ction documentation regarding inputs and outputs for forward and backward, prototype of backward pass
…tion layer, fixed backward pass for attention layer
Overall I like this PR. There are some things that need to be addressed:
- Performance: remove the for loop.
- Consistency: add the arguments for the attention matrix and the gradients to the forward and backward calls, so the functions match the other nn operations.
- The example does not verify that the attention behaves correctly when query and value have different dimensions; it would be nice to cover this.
- I do not understand the test and how it verifies that the method works; some comments in the tests would help.
- Remove the data from the PR, add a download script, and change .gitignore to ignore the downloaded file.
best regards
Sebastian
Inline comment on the line: attention = function() {
I do not understand what this method is testing; maybe a comment would help.
This test verifies the gradient of the backward pass numerically.
The comments are copied and adapted from the other test cases in this file.
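For context, a numerical gradient check compares the analytical gradient from the backward pass against a finite-difference approximation of a scalar loss. The DML sketch below is a minimal, generic illustration of that technique, not the actual test in this PR; the names forward, backward, X, and the sum-of-squares loss are placeholders.

```
# Minimal sketch of a finite-difference gradient check (illustrative only).
# Assumes a layer with forward(X) -> out and backward(dout, X) -> dX,
# and a simple sum-of-squares loss; these names are placeholders.
h = 1e-5
out = forward(X)
dout = 2 * out                      # gradient of sum(out^2) w.r.t. out
dX = backward(dout, X)              # analytical gradient to be checked

dX_num = matrix(0, rows=nrow(X), cols=ncol(X))
for (i in 1:nrow(X)) {
  for (j in 1:ncol(X)) {
    old = as.scalar(X[i, j])
    X[i, j] = old + h
    loss_ph = sum(forward(X) ^ 2)   # loss at X[i,j] + h
    X[i, j] = old - h
    loss_mh = sum(forward(X) ^ 2)   # loss at X[i,j] - h
    X[i, j] = old
    dX_num[i, j] = (loss_ph - loss_mh) / (2 * h)
  }
}
rel_err = max(abs(dX - dX_num) / max(abs(dX) + abs(dX_num), 1e-8))
print("max relative error: " + rel_err)
```

A small relative error (e.g. below 1e-5) indicates that the analytical and numerical gradients agree.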
My intention in commenting was for you to add a comment in the code. If you only explain it here, I am the only one who benefits from the explanation.
@Baunsgaard Thanks for the feedback; we'll address it shortly.
…the AttentionExample.dml file.
@Baunsgaard I think we addressed all issues, with the exception of the for loop in the layer.
…ttentionExample.sh script can be found.
Baunsgaard left a comment:
Thanks for addressing my comments,
I will take it from here.
Closing for merging.
This commit adds a new neural network builtin layer for attention (AMLS project SS2022).

Closes apache#1625
Closes apache#1679

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>
This PR implements an attention layer using scaled dot-product as the alignment score function, as well as an example showing how to use the attention layer (as self-attention) in combination with an LSTM layer for text classification on real data.
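For reference, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The DML sketch below is a minimal single-sequence illustration of that formula; it is not the batched implementation added by this PR, and the function and argument names are placeholders.

```
# Minimal single-sequence sketch of scaled dot-product attention (illustrative only).
# query: (n x d), key: (m x d), value: (m x dv); all names are placeholders.
scaled_dot_product_attention = function(matrix[double] query,
                                        matrix[double] key,
                                        matrix[double] value)
    return (matrix[double] context) {
  d = ncol(key)
  scores = (query %*% t(key)) / sqrt(d)        # alignment scores, (n x m)
  scores = scores - rowMaxs(scores)            # stabilize the row-wise softmax
  probs = exp(scores) / rowSums(exp(scores))   # attention weights, rows sum to 1
  context = probs %*% value                    # weighted sum of values, (n x dv)
}
```

Self-attention, as used in the example, would pass the same sequence matrix for query, key, and value.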