
Questions about the implementation #2

Closed
dogancan opened this issue Feb 8, 2016 · 3 comments
Comments

dogancan commented Feb 8, 2016

Thanks for sharing the research code along with the paper. I am reading the code in parallel with the paper, and it is very useful for understanding details that are not explicitly mentioned in the paper. I am not sure whether this is the right place to ask, but I have two questions about the implementation.

  1. The classifier backprop call on line 193 of SNLI-attention/LSTMN.lua takes {rnn_alpha, rnn_h_dec} as input. I don't quite understand why rnn_alpha is part of the input. Shouldn't the input be the hidden state vectors for the source and target sequences, i.e., {rnn_h_enc, rnn_h_dec}?
  2. Why are the gradients drnn_c_dec[1] and drnn_h_dec[1] added to the gradients drnn_c_enc[max_length+1] and drnn_h_enc[max_length+1] on lines 222-223 of SNLI-attention/LSTMN.lua? After reading the paper and the rest of the implementation, my impression is that the decoder's initial hidden state and memory vectors are random vectors that do not depend on the encoder's final hidden state and memory vectors (see the sketch after this list).
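
To make the second question concrete, here is roughly what I mean (a paraphrase in Torch-style Lua, not the repository code; batch_size and rnn_size are placeholders, the other names follow the variables above):

```lua
-- Forward pass (as I read it): the decoder's initial memory/hidden state
-- is a fresh random vector, not a copy of the encoder's final state.
local init_c_dec = torch.randn(batch_size, rnn_size)
local init_h_dec = torch.randn(batch_size, rnn_size)

-- Backward pass (lines 222-223): the gradients w.r.t. the decoder's initial
-- state are nevertheless accumulated into the gradients of the encoder's
-- final state, as if that state had been copied forward.
drnn_c_enc[max_length + 1]:add(drnn_c_dec[1])
drnn_h_enc[max_length + 1]:add(drnn_h_dec[1])
```
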
@cheng6076 (Owner)

Thanks for the comments. Yes, you are right: there is a mistake on line 193, and it should be rnn_h_enc. We are still working on the latest version of the code. As for whether the last hidden states/memories should be copied, I think that is a subtle detail worth exploring.
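
Concretely, the backprop input on line 193 should use the encoder hidden states rather than the attention weights, roughly as below (schematic only; the classifier module and gradient variable names are placeholders, not the exact code):

```lua
-- Before (schematic): the attention weights are passed as part of the input.
-- classifier:backward({rnn_alpha, rnn_h_dec}, dclassifier_output)

-- After (schematic): backward should see the same input as forward, i.e. the
-- encoder and decoder hidden states.
classifier:backward({rnn_h_enc, rnn_h_dec}, dclassifier_output)
```
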

dogancan (Author) commented Feb 8, 2016

Thanks for the answers. I am still a bit confused about the propagation of error gradients from the decoder's first hidden state/memory to the encoder's last hidden state/memory. From what I can tell, during the forward pass the decoder's initial hidden state/memory is not copied from the encoder's last hidden state/memory but is initialized as a random vector (evaluation, training). I agree that whether these vectors should be copied is more of a research question and should definitely be explored. What I don't quite understand is the apparent discrepancy between the forward and backward passes in the SNLI-attention/LSTMN.lua file (see the sketch below).
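
Just to spell out what I would expect, either of the following would make the two passes consistent (again just a sketch using the variable names from my first comment; rnn_c_enc is my guess at the encoder memory-state name, not necessarily what the code uses):

```lua
-- Option (a): make the forward pass match the existing backward pass by
-- copying the encoder's final state into the decoder's initial state.
init_c_dec:copy(rnn_c_enc[max_length + 1])
init_h_dec:copy(rnn_h_enc[max_length + 1])
-- ...in which case accumulating drnn_c_dec[1]/drnn_h_dec[1] into the encoder
-- gradients on lines 222-223 is the right backward step.

-- Option (b): keep the random initialization in the forward pass and simply
-- drop lines 222-223, so that no gradient flows from the decoder's initial
-- state back into drnn_c_enc[max_length + 1]/drnn_h_enc[max_length + 1].
```
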

@cheng6076 (Owner)

Ah, sorry, I see what you mean. There are clearly some problems in this implementation, and I will fix them.
