Recurrent layers: Accept layer for hid_init #462
Comments
This seems reasonable to me. I think @skaae is better suited to comment, as he added that functionality with a specific use case (next-character prediction) in mind. I think the new recurrent containers will be a better way to handle that use case anyways.
Yes, it's better to require `hid_init` to be a layer. For "my" use case you can just wrap the tensor in an input layer (see the sketch below).
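A minimal sketch of that wrapping, assuming an illustrative shape; the variable names and `num_units=100` are not from the thread:

```python
import theano.tensor as T
from lasagne.layers import InputLayer

# A symbolic tensor holding the initial hidden state, (batch_size, num_units):
hid_init_var = T.matrix('hid_init')

# Wrapping it in an InputLayer turns it into a Layer, which is all the
# proposed hid_init=Layer interface would need:
l_hid_init = InputLayer(shape=(None, 100), input_var=hid_init_var)
```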
Ok, cool. @f0k, can you assign this to me? Unless one of you wants to do it.
You're welcome to tackle this! :) And I'll be glad to review.
Remember to return the parameters when `hid_init`/`cell_init` is a layer. We got a question about this on the mailing list.
No need to do that manually, `get_all_params()` already collects parameters from all layers in the graph.
aah yes :)
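A minimal sketch of the point above, assuming Lasagne's `get_all_params()` helper and illustrative shapes (the layer setup here is an assumption, not code from the thread):

```python
import numpy as np
from lasagne.layers import InputLayer, RecurrentLayer, get_all_params

# (batch, seq_len, features) input sequence:
l_in = InputLayer(shape=(None, 5, 10))
l_rec = RecurrentLayer(l_in, num_units=8,
                       hid_init=np.zeros((1, 8), dtype='float32'),
                       learn_init=True)

# get_all_params() traverses every layer feeding into l_rec, so once
# hid_init can itself be a Layer, its parameters are picked up the same
# way; no manual bookkeeping in the recurrent layer is needed.
params = get_all_params(l_rec, trainable=True)
```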
@craffel: Do you plan to work on this soon? Otherwise I can create a PR.
Missed that :) I'll remove the TensorVariable special case when #522 is merged.
I was planning on it, but I'm glad someone else did it :)
The recurrent layers currently have some custom behaviour when `hid_init` is a TensorVariable rather than a shared variable, callable, or numpy array: they assume `hid_init` is a tensor of one order higher than otherwise, to include the batch dimension, and they assume `hid_init` is not to be learned. With #404, parameters can be arbitrary Theano expressions anyway, and overriding that behaviour to assume a different dimensionality is a bit awkward. As discussed in #11 (comment) and following, a better solution would be allowing `hid_init` to be a `Layer`, and assuming it would include the batch dimension in this case. This would also be a step towards supporting the encoder-decoder architecture discussed in #391 (comment) and following.