Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added set_s function for rnns #127

Merged
merged 7 commits into from
Oct 28, 2016
Merged

Added set_s function for rnns #127

merged 7 commits into from
Oct 28, 2016

Conversation

pmichel31415
Copy link
Collaborator

C++ and python implementation. Partially addresses #124

WARNING : NOT TESTED (should be ok though, basically copy pasted set_h)

@pmichel31415
Copy link
Collaborator Author

should be ok now, tested with

from dynet import *
m = Model()
builder = LSTMBuilder(1, 10, 10, m)
s0 = builder.initial_state()
s = s0.add_input(vecInput(10,))
ws = parameter(m.add_parameters((5, 10)))

s1 = s.add_input(s.output())  # no problem here
new_o = vecInput((3))
s1n = s1.set_h((new_o,))
print s1n.h()[0].npvalue()
print s1n.s()[0].npvalue()

in python

@neubig
Copy link
Contributor

neubig commented Oct 26, 2016

Great! This would be even greater if you could convert this test code into
a unit test to make sure that the bug doesn't pop up again.

On Wed, Oct 26, 2016 at 10:33 AM, Paul Michel notifications@github.com
wrote:

should be ok now, tested with

from dynet import *
m = Model()
builder = LSTMBuilder(1, 10, 10, m)
s0 = builder.initial_state()
s = s0.add_input(vecInput(10,))
ws = parameter(m.add_parameters((5, 10)))

s1 = s.add_input(s.output()) # no problem here
new_o = vecInput((3))
s1n = s1.set_h((new_o,))
print s1n.h()[0].npvalue()
print s1n.s()[0].npvalue()

in python


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#127 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAYWG7o68tVGP0w6LIbQ3Xj_exu7VKjgks5q32TSgaJpZM4Kgf9b
.

@zhang-wen
Copy link

for your following code:

for (unsigned i = 0; i < layers; ++i) {
Expression h_i = h_new[i];
Expression c_i = c[t - 1][i];
h[t][i] = h_i;
c[t][i] = c_i;
}

in my opinion, it should be :
// should assign value for cell memories and outputs of each layer concurrently for the new state
Expression GRUBuilder::set_h_impl(int prev, const vector& h_new) {
if (h_new.size()) { assert(h_new.size() == layers); }
const unsigned t = h.size();
h.push_back(vector(layers));
c.push_back(vector(layers));
for (unsigned i = 0; i < layers; ++i) {
Expression h_i = h_new[i];
h[t][i] = h_i;
c[t][i] = h_i; // for GRU and SimpleRNN, c[t] == h[t]
}
return h[t].back();
}


张文,在读博士
中国科学院计算技术研究所
中科院智能信息处理重点实验室

自然语言处理研究组

Wen Zhang, Ph.D Candidate
Natural Language Processing Research Group
CAS Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences

在 2016年10月26日,07:33,Paul Michel notifications@github.com 写道:

should be ok now, tested with

from dynet import *
m = Model()
builder = LSTMBuilder(1, 10, 10, m)
s0 = builder.initial_state()
s = s0.add_input(vecInput(10,))
ws = parameter(m.add_parameters((5, 10)))

s1 = s.add_input(s.output()) # no problem here
new_o = vecInput((3))
s1n = s1.set_h((new_o,))
print s1n.h()[0].npvalue()
print s1n.s()[0].npvalue()
in python


You are receiving this because you commented.
Reply to this email directly, view it on GitHub #127 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AO8NmKHRpB79npNEGrfVe-WGE0gOek9Lks5q32TWgaJpZM4Kgf9b.

@pmichel31415
Copy link
Collaborator Author

@neubig Is there already a framework in place for unit tests?

@zhang-wen Not sure what you mean.

  • In case of GRU/simpleRNN there's simply no field c
  • Why would we want to copy the new h in the internal state c?

Maybe I misunderstood what you meant.

Just to make sure we're talking of the same thing, the new behaviour for set_h and set_s are :

  • set_h : add new step, sets h to h_new and copy c[t-1] to c[t]
  • set_s : add new step, sets c to s_new and
    • if s_new.size() == num_layers : copy h[t-1] to h[t]
    • if s_new.size() == num_layers : set h[t] with s_new[num_layers:]

@neubig
Copy link
Contributor

neubig commented Oct 26, 2016

@pmichel31415 Yes, see the tests/ directory.

@neubig
Copy link
Contributor

neubig commented Oct 27, 2016

Shall I merge this? I'd like to prevent the branches from diverging too much and causing problems at merge time. Tests would be great, but they can wait for later.

@zhang-wen
Copy link

@paul Michel sorry about unclear describing.

your understanding is correct, for your two unsure question:

  1. i make a mistake, no field c for GRU/simpleRNN, so we do not need to assign
    c field for GRU/simpleRNN.
  2. for GRU/simpleRNN, it is not correct to copy the new h in the internal state c,
    because no c field for GRU/simpleRNN.

the problem is LSTM, i do not actually understand why do you copy the h and c of
previous state into current state if no new h or new c given ? can this make sure the
internal calculation in LSTM unit correct ?


张文,在读博士
中国科学院计算技术研究所
中科院智能信息处理重点实验室

自然语言处理研究组

Wen Zhang, Ph.D Candidate
Natural Language Processing Research Group
CAS Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences

在 2016年10月26日,09:00,Paul Michel notifications@github.com 写道:

@neubig https://github.com/neubig Is there already a framework in place for unit tests?

@zhang-wen https://github.com/zhang-wen Not sure what you mean.

In case of GRU/simpleRNN there's simply no field c
Why would we want to copy the new h in the internal state c?
Maybe I misunderstood what you meant.

Just to make sure we're talking of the same thing, the new behaviour for set_h and set_s are :

set_h : add new step, sets h to h_new and copy c[t-1] to c[t]
set_s : add new step, sets c to s_new and
if s_new.size() == num_layers : copy h[t-1] to h[t]
if s_new.size() == num_layers : set h[t] with s_new[num_layers:]

You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #127 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AO8NmA8SvacvvKIayHmKIl6x_5ffYf7fks5q33kygaJpZM4Kgf9b.

@pmichel31415
Copy link
Collaborator Author

@neubig Yes, I'll write tests but probably won't be able to do it before next week, I think this should be merged

@zhang-wen The previous behaviour of new_h was to increment the time step of the rnn-state-machine with the new h_new. to be consistent with this behaviour you need to copy the previous state.
If this doesn't anwer your question can you make the modiification in your fork and show me?

Thanks for your feedback!

@neubig neubig merged commit 2a44f15 into clab:master Oct 28, 2016
@zhang-wen
Copy link

@paul Michel i am thinking about the rnn/lstm structure, little confusing about the s and h,
assume that the layer of lstm/rnn is 1, the s() function will output a list [h, o], o is the output of lstm per state,
right ? and what is h ? can you explain this for me , thank you.


张文,在读博士
中国科学院计算技术研究所
中科院智能信息处理重点实验室

自然语言处理研究组

Wen Zhang, Ph.D Candidate
Natural Language Processing Research Group
CAS Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

在 2016年10月27日 星期四,09:03,Paul Michel 写道:

@neubig (https://github.com/neubig) Yes, I'll write tests but probably won't be able to do it before next week, I think this should be merged
@zhang-wen (https://github.com/zhang-wen) The previous behaviour of new_h was to increment the time step of the rnn-state-machine with the new h_new. to be consistent with this behaviour you need to copy the previous state.
If this doesn't anwer your question can you make the modiification in your fork and show me?
Thanks for your feedback!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (#127 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AO8NmCdbLzh0lXAxDF57I_9w8h-3g-CVks5q4MtkgaJpZM4Kgf9b).

@zhang-wen
Copy link

@paul Michel sorry, i make the mistake again, for simple run, the s() is the same as h().
my question is, for LSTM, s() is a series of memory vectors, followed by h(), so, how should i understand
the memory vector?


张文,在读博士
中国科学院计算技术研究所
中科院智能信息处理重点实验室

自然语言处理研究组

Wen Zhang, Ph.D Candidate
Natural Language Processing Research Group
CAS Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

在 2016年10月30日 星期日,08:08,Wen Zhang 写道:

@paul Michel i am thinking about the rnn/lstm structure, little confusing about the s and h,
assume that the layer of lstm/rnn is 1, the s() function will output a list [h, o], o is the output of lstm per state,
right ? and what is h ? can you explain this for me , thank you.


张文,在读博士
中国科学院计算技术研究所
中科院智能信息处理重点实验室

自然语言处理研究组

Wen Zhang, Ph.D Candidate
Natural Language Processing Research Group
CAS Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

在 2016年10月27日 星期四,09:03,Paul Michel 写道:

@neubig (https://github.com/neubig) Yes, I'll write tests but probably won't be able to do it before next week, I think this should be merged
@zhang-wen (https://github.com/zhang-wen) The previous behaviour of new_h was to increment the time step of the rnn-state-machine with the new h_new. to be consistent with this behaviour you need to copy the previous state.
If this doesn't anwer your question can you make the modiification in your fork and show me?
Thanks for your feedback!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (#127 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AO8NmCdbLzh0lXAxDF57I_9w8h-3g-CVks5q4MtkgaJpZM4Kgf9b).

@pmichel31415
Copy link
Collaborator Author

@zhang-wen I'm going to push some documentation on rnn builders sometimes in the beginning of the week, hopefully today. This should make things more clear.

Long story short, there are two vectors stored in LSTMs, h and c, which are respectively the output and the cell state.

get_h() returns h, get_s returns [c,h]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants