Questions about the implementation #2
Comments
Thanks for the comments. Yes, you are right: there is a mistake on line 193, and it should be `rnn_h_enc`. We are still working on the latest version of the code. As for whether the last hidden states/memories should be copied, I think it is a subtle detail that is worth exploring.
Thanks for the answers. I am still a bit confused about the propagation of error gradients from the first hidden state/memory of the decoder to the last hidden state/memory of the encoder. From what I can tell, during the forward pass the initial hidden state/memory of the decoder is not copied from the last hidden state/memory of the encoder but is initialized as a random vector (evaluation, training). I agree that whether these vectors should be copied is more of a research question and should definitely be explored. What I don't quite understand is the apparent discrepancy between the forward and backward passes in the SNLI-attention/LSTMN.lua file.
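To illustrate the discrepancy: a backward pass must mirror its forward pass, so copying decoder-state gradients into the encoder is only correct if the forward pass actually copied the encoder state into the decoder. Here is a minimal Python sketch of this point (not the repo's Lua/Torch code; the scalar model and the names `forward`, `grad_w_e`, `fd` are made up for illustration), checked against finite differences:

```python
def forward(w_e, w_d, x, copy_state):
    """Toy encoder-decoder. Encoder state: h_enc = w_e * x.
    Decoder initial state h0 is either copied from h_enc or a
    fixed 'random' initialization (hypothetical stand-in: 0.7)."""
    h_enc = w_e * x
    h0 = h_enc if copy_state else 0.7
    y = w_d * h0          # decoder output
    return 0.5 * y * y    # loss

def grad_w_e(w_e, w_d, x, copy_state):
    """Manual backward pass that mirrors forward()."""
    h_enc = w_e * x
    h0 = h_enc if copy_state else 0.7
    y = w_d * h0
    dL_dy = y
    dL_dh0 = dL_dy * w_d
    # The gradient-copy step (analogous to copying drnn_*_dec[1] into
    # drnn_*_enc[max_length+1]) is only valid when the forward pass
    # copied the state; otherwise the encoder receives no gradient here.
    dL_dh_enc = dL_dh0 if copy_state else 0.0
    return dL_dh_enc * x

def fd(w_e, w_d, x, copy_state, eps=1e-6):
    """Finite-difference gradient of the loss w.r.t. w_e."""
    return (forward(w_e + eps, w_d, x, copy_state)
            - forward(w_e - eps, w_d, x, copy_state)) / (2 * eps)

for copy_state in (True, False):
    g = grad_w_e(1.3, -0.8, 0.5, copy_state)
    assert abs(g - fd(1.3, -0.8, 0.5, copy_state)) < 1e-5
```

With `copy_state=False`, the loss does not depend on `w_e` at all, so the finite-difference gradient is zero; copying the decoder-state gradient into the encoder in that case would produce a gradient the forward computation never created.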
Ah sorry, I see what you mean. There is clearly a problem here in this implementation, and I will fix it.
Thanks for sharing the research code along with the paper. I am reading the code in parallel with the paper, and I find it very useful for understanding details that are not explicitly mentioned in the paper. I am not sure if this is the right place to ask, but I have two questions regarding the implementation.

1. The module on line 193 takes `{rnn_alpha, rnn_h_dec}` as input. I don't quite understand why `rnn_alpha` is part of the input. Shouldn't the input be the hidden state vectors for the source and target sequences, i.e. `{rnn_h_enc, rnn_h_dec}`?
2. Why does the backward pass copy the gradients `drnn_c_dec[1]` and `drnn_h_dec[1]` to the gradients `drnn_c_enc[max_length+1]` and `drnn_h_enc[max_length+1]` on lines 222-223 of SNLI-attention/LSTMN.lua? After reading the paper and the rest of the implementation, I have the impression that the initial hidden state and memory vectors of the decoder are random vectors and do not depend on the final hidden state and memory vectors of the encoder.