Question about GRU-D implementation #1
Comments
Hey ducnx, good catch! It seems like this behavior is not totally in line with the original publication of GRU-D.

It looks like we retained the behavior from the implementation this code is based on (see https://github.com/PeterChe1990/GRU-D) and did not notice the discrepancy with the original GRU-D implementation. To my understanding, the separate dropout of the input mask does not lead to values being imputed that are actually present. Instead, it would reduce the model's reliance on observation patterns compared to the original implementation, because the independent dropout mask makes it harder for the model to differentiate observed and imputed values. Generally, I think it should not have a detrimental effect on GRU-D performance and might even be beneficial to a certain degree.

Anyway, thanks for pointing this out. I will think of a way to make this apparent for the readers of the paper and users of the code. Until then, I will leave this issue open.

Cheers!
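The effect described above can be sketched in a few lines. This is a minimal NumPy illustration, not the actual `gru_d.py` code; the variable names (`x_t`, `m_t`, `dp_mask`, `m_dp_mask`) follow the discussion, and the dropout helper is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(rate, shape, rng):
    # Stand-in inverted-dropout mask: 0 with probability `rate`,
    # else scaled by 1/(1-rate) to preserve the expected value.
    keep = (rng.random(shape) >= rate).astype(float)
    return keep / (1.0 - rate)

x_t = np.array([0.5, -1.2, 0.0, 2.0])  # input features (0.0 = imputed)
m_t = np.array([1.0, 1.0, 0.0, 1.0])   # observation mask (1 = observed)

rate = 0.5
dp_mask = dropout_mask(rate, x_t.shape, rng)    # mask applied to x_t
m_dp_mask = dropout_mask(rate, m_t.shape, rng)  # independent mask for m_t

# Independent masks (behavior in this repo): x_t and m_t can be zeroed
# at different positions, so a feature may be dropped from x_t while its
# "observed" flag in m_t survives, and vice versa -- blurring the
# observed/imputed distinction the model could otherwise exploit.
x_indep, m_indep = x_t * dp_mask, m_t * m_dp_mask

# Shared mask (original GRU-D behavior): value and observation flag are
# always dropped together, keeping them aligned.
x_shared, m_shared = x_t * dp_mask, m_t * dp_mask

# Positions where the two schemes disagree on what was dropped.
print(np.flatnonzero((dp_mask == 0) != (m_dp_mask == 0)))
```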
Thanks for the comment! I agree that the behavior from the original implementation is similar to yours. As you explained, using different dropout masks for `x_t` and `m_t` makes it harder for the model to differentiate observed and imputed values.
Hi there,

I have a question about calculating `dp_mask` for `x_t` and `m_dp_mask` for `m_t` in your GRU-D implementation (file gru_d.py).

First, the `dp_mask` is generated from the GRUCell built-in function `get_dropout_mask_for_cell`: [code]

Then, the dropout mask `m_dp_mask` for the masking vector `m_t` is generated by calling `_generate_dropout_mask`: [code]

By doing so, `dp_mask` and `m_dp_mask` zero out different elements in the two inputs `x_t` and `m_t`. I can reproduce your result; however, I think that the dropout masks should be the same for `x_t` and `m_t`. Can you please clarify this for me? Did I misunderstand something in the core TensorFlow implementation / your implementation?

Thanks for the great work!