
Conversation

@A-Postl (Contributor) commented Jun 2, 2022

This PR implements an attention layer that uses scaled dot-product as the alignment score function,
together with an example that applies the layer (as self-attention) in combination with an LSTM layer
to a text classification problem on real data.
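
For reference, scaled dot-product attention computes alignment scores as the scaled dot products between queries and keys, turns them into weights with a row-wise softmax, and returns the weighted sum of the values. Below is a minimal NumPy sketch of that math; the function name and shapes are illustrative, not the layer's DML interface:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)                            # alignment scores, (n_q, n_k)
    scores = scores - scores.max(axis=1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V, weights                              # output (n_q, d_v), attention weights

# Self-attention as in the example: query, key, and value are all the
# same sequence, e.g. the hidden states produced by an LSTM layer.
H = np.random.rand(10, 8)                                    # 10 time steps, hidden size 8
out, A = scaled_dot_product_attention(H, H, H)
```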

SteveineiterTU and others added 30 commits April 11, 2022 09:51
modified attention_layer toy example
…ction documentation regarding inputs and outputs for forward and backward, prototype of backward pass
…tion layer, fixed backward pass for attention layer
@Baunsgaard (Contributor) left a comment

Overall I like this PR.
There are some things that need to be addressed:

  • Performance: remove the for loop (a vectorization sketch follows this comment).
  • Consistency: add the arguments for the attention matrix and the gradients to the forward and backward calls, so that the functions match the other operations.
  • The example does not verify that the attention behaves correctly when query and value have different dimensions; it would be nice if we could cover this.
  • I do not understand the test and how it verifies that the method works, but I think some comments in the tests would help.
  • Remove the data from the PR, add a download script instead, and change .gitignore to ignore the downloaded file.

best regards
Sebastian
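
Regarding the first point above: the concrete DML fix may look different, but the general idea of removing a per-example for loop from an attention computation is to replace it with batched matrix operations. A hedged NumPy sketch of that idea (all names and shapes are illustrative, not the PR's code):

```python
import numpy as np

def attention_looped(Q, K, V):
    # Q: (B, n_q, d), K and V: (B, n_k, d); one matmul chain per batch element.
    B, n_q, d = Q.shape
    out = np.empty((B, n_q, V.shape[2]))
    for b in range(B):
        S = Q[b] @ K[b].T / np.sqrt(d)
        W = np.exp(S - S.max(axis=1, keepdims=True))
        W = W / W.sum(axis=1, keepdims=True)
        out[b] = W @ V[b]
    return out

def attention_batched(Q, K, V):
    # Identical math, but batched products replace the loop over examples.
    S = np.einsum('bqd,bkd->bqk', Q, K) / np.sqrt(Q.shape[2])
    W = np.exp(S - S.max(axis=2, keepdims=True))
    W = W / W.sum(axis=2, keepdims=True)
    return np.einsum('bqk,bkv->bqv', W, V)

Q = np.random.rand(4, 10, 8); K = np.random.rand(4, 12, 8); V = np.random.rand(4, 12, 8)
assert np.allclose(attention_looped(Q, K, V), attention_batched(Q, K, V))
```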

attention = function() {

@Baunsgaard (Contributor) commented:

I do not understand what this method is testing, maybe a comment would help.

@A-Postl (Contributor, Author) replied:

This test verifies the gradient of the backward pass numerically.
The comments are copied and adapted from the other test cases in this file.
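
For context, a numerical gradient check of this kind compares the analytic gradient from the backward pass against central finite differences of the forward pass. A minimal NumPy sketch of the idea; the names here are illustrative, not the actual test code:

```python
import numpy as np

def grad_check(f, grad_f, X, eps=1e-5):
    # f: scalar-valued loss, grad_f: its analytic gradient w.r.t. X.
    analytic = grad_f(X)
    numeric = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps; plus = f(X)              # f(x + eps)
        X[idx] = old - eps; minus = f(X)             # f(x - eps)
        X[idx] = old                                 # restore the entry
        numeric[idx] = (plus - minus) / (2 * eps)    # central difference
    denom = np.abs(analytic) + np.abs(numeric) + 1e-12
    # Max relative error; a tiny value (e.g. < 1e-6) means backward matches forward.
    return np.max(np.abs(analytic - numeric) / denom)

# Example: loss = sum(X^2) has gradient 2*X, so the error should be tiny.
X = np.random.rand(3, 4)
print(grad_check(lambda A: np.sum(A ** 2), lambda A: 2 * A, X))
```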

@Baunsgaard (Contributor) replied:

My intention in commenting was that you add a comment in the code. If you only add the explanation here, I am the only one who benefits from it.

@A-Postl (Contributor, Author) commented Jun 6, 2022

@Baunsgaard Thanks for the feedback; we'll address it shortly.

@A-Postl (Contributor, Author) commented Jun 15, 2022

@Baunsgaard I think we have addressed all issues, with the exception of the for loop in the layer.

@A-Postl changed the title from "[SYSTEMDS-3303] WIP: NN Builtin: Attention Layer (need feedback pls)" to "[SYSTEMDS-3303] WIP: NN Builtin: Attention Layer" Jun 17, 2022
@A-Postl changed the title from "[SYSTEMDS-3303] WIP: NN Builtin: Attention Layer" to "[SYSTEMDS-3303] : NN Builtin: Attention Layer" Jun 17, 2022
@A-Postl requested a review from Baunsgaard June 17, 2022 11:36

@Baunsgaard (Contributor) left a comment

Thanks for addressing my comments. I will take it from here.

Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Aug 17, 2022
This commit adds a new neural network builtin layer for attention.

AMLS project SS2022

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>

Closes apache#1625
Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Aug 17, 2022
This commit adds a new neural network builtin layer for attention.

AMLS project SS2022

Closes apache#1625

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>

@Baunsgaard (Contributor) commented:

Closing for merging.

@Baunsgaard closed this Aug 17, 2022
Baunsgaard pushed a commit that referenced this pull request Aug 17, 2022
This commit adds a new neural network builtin layer for attention.

AMLS project SS2022

Closes #1625
Closes #1679

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>
fathollahzadeh pushed a commit to fathollahzadeh/systemds that referenced this pull request Dec 7, 2022
This commit adds a new neural network builtin layer for attention.

AMLS project SS2022

Closes apache#1625
Closes apache#1679

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>