Thanks for sharing this repo. I enjoyed reading your paper, and I am working on a related project. While going through the code, I found one potential issue with the current setup. I will (1) explain the issue and (2) provide a simple test case that I ran on my end. Please help verify.
Issue:
The nn.MultiheadAttention module inside the BinaryEncoder module is constructed with batch_first=True; however, it seems that we are passing in Q, K, V matrices whose first dimension is not the batch dimension.
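For reference, a minimal self-contained sketch of what batch_first=True expects (dimensions here are made up for illustration, not taken from the repo):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# With batch_first=True, inputs must be (batch, seq_len, embed_dim).
x = torch.randn(8, 4, embed_dim)  # batch of 8 candidates, 4 tokens each
out, weights = attn(x, x, x)
print(out.shape)      # torch.Size([8, 4, 16])
print(weights.shape)  # torch.Size([8, 4, 4]) -- attention over the 4 tokens
```

With these shapes, each of the 8 candidates attends only over its own 4 tokens, which is the intended behavior.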
Code Analysis:
In r2d2.py, the encoder is called as follows:
We can see that input_embedding definitely has the batch size as its first dimension, since it is concatenated with the embeddings from the nn.Embedding module. But before we call self.tree_encoder, .transpose(0, 1) moves the batch size to the second dimension. Specifically, the first dimension, in this case, is always 4.
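A small sketch of the shape mismatch this produces, assuming the attention module is a plain nn.MultiheadAttention with batch_first=True (dimensions are hypothetical):

```python
import torch
import torch.nn as nn

embed_dim, num_heads, batch_size = 16, 4, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(batch_size, 4, embed_dim)  # (batch, 4 tokens, dim)
x_t = x.transpose(0, 1)                    # (4, batch, dim)

# Because batch_first=True, the module now treats dim 0 (size 4) as the
# batch and dim 1 (the real batch) as the sequence: attention runs across
# the batch of candidates instead of across the 4 tokens.
_, weights = attn(x_t, x_t, x_t)
print(weights.shape)  # torch.Size([4, 8, 8]) -- attending over 8, not 4
```

The attention-weight shape makes the problem visible: each "token" ends up attending over all 8 candidates rather than over its own 4-token window.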
Testing Done:
I simply added some logs inside TreeEncoderLayer, as follows:
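The original logs aren't reproduced here; a minimal sketch of equivalent shape logging, using a forward pre-hook on a standalone attention module instead of editing TreeEncoderLayer itself (all names and dimensions are hypothetical):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(16, 4, batch_first=True)

# Record the shape of the query tensor each time the module is called.
seen_shapes = []
attn.register_forward_pre_hook(lambda mod, inp: seen_shapes.append(inp[0].shape))

x = torch.randn(4, 8, 16)  # what the module receives after .transpose(0, 1)
attn(x, x, x)
print(seen_shapes[0])  # torch.Size([4, 8, 16]) -- first dim is 4, not batch
```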
Summary:
It seems that in r2d2.py, the self-attention is not over those 4 tokens (2 special prefix tokens plus the left and right children embeddings), but over the full collection of candidates with their children.
I saw that this issue is not present in r2d2_cuda.py:
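For illustration, a sketch (not the repo's actual code) of two equivalent ways to keep attention over the 4 tokens, assuming a plain nn.MultiheadAttention: either keep the batch first and drop the transpose, or build the module with batch_first=False and keep the transpose.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 4, 16)  # (batch, 4 tokens, dim)

# Option 1: batch_first=True, feed (batch, seq, dim) directly.
attn_bf = nn.MultiheadAttention(16, 4, batch_first=True)
_, w1 = attn_bf(x, x, x)

# Option 2: batch_first=False, feed (seq, batch, dim) via transpose.
attn_sf = nn.MultiheadAttention(16, 4, batch_first=False)
x_t = x.transpose(0, 1)    # (4, batch, dim)
_, w2 = attn_sf(x_t, x_t, x_t)

# In both cases the attention weights are (batch, 4, 4):
# each candidate attends over its own 4 tokens.
print(w1.shape, w2.shape)
```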
This is great. I have not checked the rest of the code for r2d2_cuda.py, though. With this, I am wondering whether the numbers in either of your papers need to be updated because of this potential issue. Either way, I am not blocked by it, and I was inspired quite a lot by your codebase. Thanks!
Many thanks for your feedback. Actually, r2d2.py is used in the first paper, accepted at ACL 2021, and r2d2_cuda.py is used in the paper accepted at EMNLP. Since r2d2.py is not used in our latest work, we didn't run a regression test on it, which may have caused some discrepancies. If you are interested in the first work, I've made a tag for it: https://github.com/alipay/StructuredLM_RTDT/tree/r2d2, which is the original code for the first paper. The current branch actually only supports the CUDA version (r2d2_cuda.py). Since r2d2.py is legacy code, we'll consider fixing the discrepancy or removing it directly. But the numbers in the papers were produced with the correct version. If you have a CUDA environment, I suggest you use the latest version (Fast-R2D2), which is almost 30 times faster than R2D2, with better downstream-task performance.
I've checked in a bug-fixed version of r2d2 on the latest branch. We will soon release a fast-r2d2 model pretrained on wiki103; hope that will be helpful to your work :)