[SYSTEMDS-3303] NN Builtin: Attention Layer #1625
Conversation
…nd try the attention layer.
modified attention_layer toy example
…ction documentation regarding inputs and outputs for forward and backward, prototype of backward pass
…tion layer, fixed backward pass for attention layer
Overall I like this PR. There are some things that need to be addressed:
- Performance: remove the for loop.
- Consistency: add the arguments for the attention matrix and the gradients to the forward and backward calls, so the functions match the other nn operations.
- The example does not verify that the attention behaves correctly when query and value have different dimensions; it would be nice to cover this.
- I do not understand the test and how it verifies that the method works; some comments in the tests would help.
- Remove the data from the PR, add a download script, and change .gitignore to ignore the downloaded file.
best regards
Sebastian
Inline comment on the line: attention = function() {
I do not understand what this method is testing; maybe a comment would help.
This test verifies the gradient of the backward pass numerically.
The comments are copied and adapted from the other test cases in this file.
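For context, a numerical gradient check compares the analytical gradient from the backward pass against a finite-difference approximation of a scalar loss. The DML sketch below is a minimal, generic illustration of that technique, not the actual test in this PR; the names forward, backward, X, and the sum-of-squares loss are placeholders.

```
# Minimal sketch of a finite-difference gradient check (illustrative only).
# Assumes a layer with forward(X) -> out and backward(dout, X) -> dX,
# and a simple sum-of-squares loss; these names are placeholders.
h = 1e-5
out = forward(X)
dout = 2 * out                      # gradient of sum(out^2) w.r.t. out
dX = backward(dout, X)              # analytical gradient to be checked

dX_num = matrix(0, rows=nrow(X), cols=ncol(X))
for (i in 1:nrow(X)) {
  for (j in 1:ncol(X)) {
    old = as.scalar(X[i, j])
    X[i, j] = old + h
    loss_ph = sum(forward(X) ^ 2)   # loss at X[i,j] + h
    X[i, j] = old - h
    loss_mh = sum(forward(X) ^ 2)   # loss at X[i,j] - h
    X[i, j] = old
    dX_num[i, j] = (loss_ph - loss_mh) / (2 * h)
  }
}
rel_err = max(abs(dX - dX_num) / max(abs(dX) + abs(dX_num), 1e-8))
print("max relative error: " + rel_err)
```

A small relative error (e.g. below 1e-5) indicates that the analytical and numerical gradients agree.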
My intention in commenting was for you to add a comment in the code. If you only explain it here, I am the only one who benefits from the explanation.
@Baunsgaard Thanks for the feedback; we'll address it shortly.
…the AttentionExample.dml file.
@Baunsgaard I think we addressed all issues, with the exception of the for loop in the layer.
…ttentionExample.sh script can be found.
Baunsgaard left a comment:
Thanks for addressing my comments,
I will take it from here.
Closing for merging.
This commit adds a new neural network builtin layer for attention (AMLS project SS2022).

Closes apache#1625
Closes apache#1679

Co-authored-by: Anton Postl <anton.postl@student.tugraz.at>
Co-authored-by: Stefan Schörkmeier <s.schoerkmeier@student.tugraz.at>
This PR implements an attention layer using scaled dot-product as the alignment score function, as well as an example showing how to use the attention layer (as self-attention) in combination with an LSTM layer for text classification on real data.
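For reference, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The DML sketch below is a minimal single-sequence illustration of that formula; it is not the batched implementation added by this PR, and the function and argument names are placeholders.

```
# Minimal single-sequence sketch of scaled dot-product attention (illustrative only).
# query: (n x d), key: (m x d), value: (m x dv); all names are placeholders.
scaled_dot_product_attention = function(matrix[double] query,
                                        matrix[double] key,
                                        matrix[double] value)
    return (matrix[double] context) {
  d = ncol(key)
  scores = (query %*% t(key)) / sqrt(d)        # alignment scores, (n x m)
  scores = scores - rowMaxs(scores)            # stabilize the row-wise softmax
  probs = exp(scores) / rowSums(exp(scores))   # attention weights, rows sum to 1
  context = probs %*% value                    # weighted sum of values, (n x dv)
}
```

Self-attention, as used in the example, would pass the same sequence matrix for query, key, and value.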