
Recurrent layers: Accept layer for hid_init #462

Closed
f0k opened this issue Oct 15, 2015 · 11 comments
@f0k
Member

f0k commented Oct 15, 2015

The recurrent layers currently have some custom behaviour when hid_init is a TensorVariable rather than a shared variable, callable or numpy array: they assume hid_init is a tensor of one order higher than usual (to include the batch dimension), and they assume hid_init is not to be learned.

With #404, parameters can be arbitrary Theano expressions anyway, and overriding that behaviour to assume a different dimensionality is a bit awkward. As discussed in #11 (comment) and following, a better solution would be allowing hid_init to be a Layer, and assuming it would include the batch dimension in this case. This would also be a step towards supporting the encoder-decoder architecture discussed in #391 (comment) and following.
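As a rough sketch, the proposed interface could look like the following (this is not the behaviour at the time of writing; layer names, shapes and unit counts are made up for illustration):

```python
import lasagne

# Main input sequence: (batch_size, seq_len, num_features)
l_in = lasagne.layers.InputLayer(shape=(None, 20, 30))

# Some layer producing a (batch_size, num_units) output, e.g. an encoder summary
l_enc_in = lasagne.layers.InputLayer(shape=(None, 50))
l_enc = lasagne.layers.DenseLayer(l_enc_in, num_units=40)

# Proposed: pass the Layer directly as hid_init; its output (which
# includes the batch dimension) would serve as the initial hidden state.
l_rec = lasagne.layers.RecurrentLayer(l_in, num_units=40, hid_init=l_enc)
```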

@craffel
Member

craffel commented Oct 19, 2015

This seems reasonable to me. I think @skaae is better suited to comment, as he added that functionality with a specific use case (next-character prediction) in mind. I think the new recurrent containers will be a better way to handle that use case anyway.

@skaae
Member

skaae commented Oct 19, 2015

Yes, it's better to require `hid_init` to be a layer. For "my" use case you can just wrap the tensor in an input layer.
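For reference, a minimal sketch of how the existing TensorVariable use case could be expressed once `hid_init` must be a layer (shapes and names are illustrative):

```python
import theano.tensor as T
import lasagne

# Symbolic initial hidden state with an explicit batch dimension:
# shape (batch_size, num_units)
hid_init_var = T.matrix('hid_init')

# Wrap it in an InputLayer so it can be passed as hid_init
l_hid_init = lasagne.layers.InputLayer(shape=(None, 40),
                                       input_var=hid_init_var)
```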

@craffel
Member

craffel commented Oct 19, 2015

Ok, cool. @f0k, can you assign this to me? Unless one of you wants to do it.

@f0k
Member Author

f0k commented Oct 19, 2015

You're welcome to tackle this! :) And I'll be glad to review.

@skaae
Member

skaae commented Oct 26, 2015

Remember to return the parameters when hid_init/cell_init is a layer. We got a question about this on the mailing list.

@f0k
Member Author

f0k commented Oct 29, 2015

Remember to return the parameters when hid_init/cell_init is a layer.

No need to do that manually: hid_init would become part of the layer's incoming layers (i.e., it would be passed to the MergeLayer super constructor). Otherwise it couldn't properly participate in get_output() anyway.
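For instance (a sketch, assuming hid_init is wired in through the MergeLayer constructor as described above; layer names and shapes are made up), any parameters of the hid_init layer would be collected automatically:

```python
import lasagne

l_in = lasagne.layers.InputLayer(shape=(None, 20, 30))
l_init = lasagne.layers.DenseLayer(
    lasagne.layers.InputLayer(shape=(None, 50)), num_units=40)
l_rec = lasagne.layers.RecurrentLayer(l_in, num_units=40, hid_init=l_init)

# Because l_init would be an incoming layer of l_rec, its W and b are
# returned here without any manual bookkeeping:
params = lasagne.layers.get_all_params(l_rec, trainable=True)
```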

@skaae
Member

skaae commented Oct 30, 2015

aah yes :)

@skaae
Member

skaae commented Dec 18, 2015

@craffel: Do you plan to work on this soon? Otherwise I can create a PR.

@f0k
Member Author

f0k commented Dec 18, 2015

Do you plan to work on this soon? Otherwise I can create a PR.

Look above your comment: there is a PR, #522! :)

You could start a PR on top of that which removes the TensorVariable special case, though. (I.e., do not base it on master, but base it on #522, as we're going to merge that anyway.)

@skaae
Member

skaae commented Dec 18, 2015

Missed that :) I'll remove the TensorVariable special case when #522 is merged.

@craffel
Member

craffel commented Dec 18, 2015

@craffel: Do you plan to work on this soon? Otherwise I can create a PR.

I was planning on it, but I'm glad someone else did it :)
